Patterns of Cognition: Cognitive Algorithms as Galois Connections Fulfilled by Chronomorphisms On Probabilistically Typed Metagraphs
Ben Goertzel

February 23, 2021
Abstract
It is argued that a broad class of AGI-relevant algorithms can be expressed in a common formal framework, via specifying Galois connections linking search and optimization processes on directed metagraphs whose edge targets are labeled with probabilistic dependent types, and then showing these connections are fulfilled by processes involving metagraph chronomorphisms. Examples are drawn from the core cognitive algorithms used in the OpenCog AGI framework: probabilistic logical inference, evolutionary program learning, pattern mining, agglomerative clustering, and nonlinear-dynamical attention allocation.

The analysis presented involves representing these cognitive algorithms as recursive discrete decision processes involving optimizing functions defined over metagraphs, in which the key decisions involve sampling from probability distributions over metagraphs and enacting sets of combinatory operations on selected sub-metagraphs. The mutual associativity of the combinatory operations involved in a cognitive process is shown to often play a key role in enabling the decomposition of the process into folding and unfolding operations; a conclusion that has some practical implications for the particulars of cognitive processes, e.g. militating toward use of reversible logic and reversible program execution. It is also observed that where this mutual associativity holds, there is an alignment between the hierarchy of subgoals used in recursive decision process execution and a hierarchy of subpatterns definable in terms of formal pattern theory.
Introduction

AGI architectures may be qualitatively categorized into two varieties: single-algorithm-focused versus hybrid/multi-algorithm-focused. In the former case there is a single algorithmic approach – say, deep reinforcement learning or uncertain logic inference – which is taken as the core of generally intelligent reasoning, and then other algorithms are inserted in the periphery providing support of various forms to this core. In the latter case, multiple different algorithmic approaches are assumed to have foundational value for handling different sorts of cognitive problem, and a cognitive architecture is introduced that has the capability of effectively integrating these different approaches, generally including some sort of common knowledge meta-representation and some cross-algorithm methodology for controlling resource allocation. The OpenCog AGI architecture pursued by the author and colleagues [GPG13a] [GPG13b] is an example of the latter, with key component algorithms being probabilistic logical inference [GIGH08], evolutionary program learning [Loo06], pattern mining, concept formation and nonlinear-dynamical attention allocation. In recent years the hybridization of OpenCog with neural-net learning mechanisms has been explored and in some cases practically utilized as well.

In 2016 the notion of "OpenCoggy Probabilistic Programming" – or more formally Probabilistic Growth and Mining of Combinations (PGMC) [Goe16] – was introduced as a way of representing the various core OpenCog AI algorithms as particular instantiations of the same high-level cognitive meta-process. This notion was a specialization of earlier efforts in the same direction such as the notion of forward and backward cognitive processes [Goe06] and the "Cognitive Equation" from
Chaotic Logic [Goe94]. This paper takes the next step in this direction, presenting a formalization that encompasses PGMC in terms of moderately more general, and in some ways more conventional, mathematical and functional programming constructs defined over typed metagraphs.

The ideas presented here are in the spirit of, and draw heavily on, Mu and Oliveira's [MO12] work on "Programming with Galois Connections," which shows how a variety of programming tasks can be handled via creating a concise formal specification using a Galois connection between one preorder describing a search space and another preorder describing an objective function over that search space, and then formally "shrinking" that specification into an algorithm for optimizing the objective function over the search space. This approach represents a simplification and specialization of broader ideas about formal specifications and program derivation that have been developing gradually in the functional programming literature over the past decades, e.g. [BDM96] and its descendants. In Mu and Oliveira [MO12], the Galois connection methodology is applied to greedy algorithms, dynamic programming and scheduling, among other cases. Here we sketch out an approach to applying this methodology to several of the core AI algorithms underlying the OpenCog AGI design: probabilistic reasoning, evolutionary program learning, nonlinear-dynamical attention allocation, and concept formation.

Many of Mu and Oliveira's examples of shrinking involve hylomorphisms over lists or trees. We present a framework called COFO (Combinatory-Operation based Function Optimization) that enables a variety of OpenCog's AGI-related algorithms to be represented as Discrete Decision Systems (DDSs) that can be executed via either greedy algorithms or (generally approximate and stochastic) dynamic programming type algorithms. The representation of a cognitive algorithm as a DDS via COFO entails representing the objective of the cognitive algorithm in a linearly decomposable way (so that the algorithm can be cast roughly in terms of expected utility maximization), and representing the core operations of the cognitive algorithm as the enaction of a set of combinatory operations.

Cognitive algorithms that have a COFO-style representation as greedy algorithms decompose nicely into folding operations (histomorphisms if the algorithms carry long-term memory through their operations) via straightforward applications of results from [MO12]. Cognitive algorithms whose COFO-style representation requires some form of dynamic programming style recursion are a bit subtler, but in the case where the combinatory operations involved are mutually associative, they decompose into chronomorphisms (hylomorphisms with memory). It is also observed that where this mutual associativity holds, there is an alignment between the hierarchy of subgoals used in recursive decision process execution and a hierarchy of subpatterns definable in terms of formal pattern theory. The needed associativity holds very naturally in cases such as agglomerative clustering and pattern mining; to make it hold for logical inference is a little subtler, and the most natural route involves the use of reversible inference rules.

The decisions made by the DDSs corresponding to AGI-oriented cognitive algorithms tend to involve sampling from distributions over typed metagraphs whose types embody key representational aspects of the algorithms such as paraconsistent and probabilistic truth values.
The search-space preorder in the Galois connection describing the problem faced by the DDS encompasses metapaths within metagraphs, and the objective-function preorder represents the particular goal of the AI algorithm in question. Folding over a metagraph via a histomorphism (a catamorphism with memory) encompasses the process of searching a metagraph for suitable metapaths; unfolding over the metagraph via a futumorphism (an anamorphism with memory) encompasses the process of extracting the result of applying an appropriate transformation across metapaths to produce a result. A metagraph chronomorphism composing these fold and unfold operations fulfills the specification implicit in the Galois connection.

The generality of the mathematics underlying these results also suggests some natural extensions to quantum cognitive algorithms, which however are only hinted at here, based on the elegant mappings that exist between the Bellman functional equation from dynamic programming and the Schrödinger equation.

On a more practical level, it is also observed that, in a practical AGI context, it is often not appropriate to enact a single fold or unfold operation across a large metagraph as an atomic action; rather one needs to interleave multiple folds and unfolds across the same metagraph, and sometimes pause those that are not looking sufficiently promising. One way to achieve this is to implement one's folds and unfolds (e.g. histomorphisms and futumorphisms) in continuation-passing style, as David Raab [Raa16] has discussed in the context of simple catamorphisms and anamorphisms.

Because AGI systems necessarily involve dynamic updating of the knowledge base on which cognitive algorithms are acting, in the course of the cognitive algorithm's activity, there is an unavoidable heuristic aspect to the application of the theory given here to real AGI systems. The equivalence of a recursively defined DDS on a metagraph to a folding and unfolding process across that metagraph only holds rigorously if one assumes the metagraph is not changing during the folding – which will not generally be the case. What needs to happen in practice, I suggest, is that the folding and unfolding happen and they do change the metagraph, and one then has a complex self-organizing / self-modifying system that is only moderately well approximated by the idealized case directly addressed by the theory presented here.

One expects that, if this recursive self-modification obeys appropriate statistical properties, one can model the resulting system using appropriate statistical modifications of analyses that work for DDSs that remain static while folding and unfolding occur. But this analytical extension has not yet been carried out.

Due to subtleties of this nature, we can't claim that the analysis of "patterns of cognition" given here is a full mathematical analysis of AGI-oriented cognition. However, we do suggest that it may be a decent enough approximation to be useful for guiding various sorts of practical work – including for instance the detailed design of AGI-oriented programming languages. In the penultimate section we explore some potential implications of the ideas presented here for the design of an interpreter of an AGI language – a topic highly pertinent to current practical work designing the Atomese 2 language for use within the OpenCog Hyperon system [Goe20c].
Framing Probabilistic Cognitive Algorithms as Approximate Stochastic Greedy or Dynamic-Programming Optimization
We explain here how a broad class of cognitive algorithms related to estimating probability distributions using combinational operators can be expressed in terms of a recursive algorithmic framework incorporating stochastic greedy optimization and approximate stochastic dynamic programming. This observation does not directly provide a large degree of practical help in implementing these cognitive algorithms efficiently, but it may help in putting multiple diverse-looking cognitive algorithms in a common framework, and in guiding formal analysis of these algorithms – which analysis may in turn help with optimization.

In Section 5.2 we will review Mu and Oliveira's general formulation of greedy algorithms and dynamic programming in terms of Galois connections, which the ideas of this section then render applicable to a variety of probabilistic cognitive algorithms. The Galois connection formalism then ties in with representations of algorithms in terms of hylomorphisms and chronomorphisms on underlying knowledge stores such as typed metagraphs – representations with clear implications for practical implementation.

(Some text in this section is borrowed, with paraphrasing and other changes, from the Wikipedia article on Stochastic Dynamic Programming [Wik20].)

Consider a discrete decision system (DDS) defined on $n$ stages in which each stage $t = 1, \ldots, n$ is characterized by:

• an initial state $s_t \in S_t$, where $S_t$ is the set of feasible states at the beginning of stage $t$;

• an action or "decision variable" $x_t \in X_t$, where $X_t$ is the set of feasible actions at stage $t$ – note that $X_t$ may be a function of the initial state $s_t$;

• an immediate cost/reward function $p_t(s_t, x_t)$, representing the cost/reward at stage $t$ if $s_t$ is the initial state and $x_t$ the action selected;

• a state transition function $g_t(s_t, x_t)$ that leads the system towards state $s_{t+1} = g_t(s_t, x_t)$.

In this setup we can frame greedy optimization and also both deterministic and stochastic dynamic programming in a reasonably general way.

The greedy case is immediate to understand. One begins with an initial state, chosen based on prior knowledge, purely at random, or via appropriately biased stochastic selection. Then one chooses an action with a probability proportional to its immediate reward. In a concurrent-processing variant, one introduces atomic actions $w_t \in W_t$ and then defines the members of $X_t$ as subsets of $W_t$; in this case each action $x_t$ represents a set of $w_t$ being executed concurrently.

Let $f_t(s_t)$ represent the optimal cost/reward obtained by following an optimal policy over stages $t, t+1, \ldots, n$. Without loss of generality, in what follows we will consider a reward maximization setting. In deterministic dynamic programming one usually deals with functional equations taking the following structure

$$f_t(s_t) = \max_{x_t \in X_t} \{\, p_t(s_t, x_t) + f_{t+1}(s_{t+1}) \,\}$$

where $s_{t+1} = g_t(s_t, x_t)$, and the boundary condition of the system is

$$f_n(s_n) = \max_{x_n \in X_n} \{\, p_n(s_n, x_n) \,\}.$$

The aim is to determine the set of optimal actions that maximize $f_1(s_1)$. Given the current state $s_t$ and the current action $x_t$, we know with certainty the reward secured during the current stage and – thanks to the state transition function $g_t$ – the future state towards which the system transitions.
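To make the greedy case concrete, here is a minimal Haskell sketch of a DDS record together with a greedy rollout; the record type, its field names and the deterministic argmax choice are illustrative assumptions rather than part of the formalism (a stochastic greedy variant would sample actions with probability proportional to immediate reward).

```haskell
import Data.List (maximumBy)
import Data.Ord  (comparing)

-- A discrete decision system over states s and actions x, indexed by stage t.
data DDS s x = DDS
  { feasible   :: Int -> s -> [x]          -- X_t(s_t): feasible actions at stage t
  , reward     :: Int -> s -> x -> Double  -- p_t(s_t, x_t): immediate reward
  , transition :: Int -> s -> x -> s       -- g_t(s_t, x_t): next state
  }

-- Greedy rollout over stages 1..n: at each stage take the action with
-- maximal immediate reward.  Returns total reward and the action sequence.
greedyRollout :: DDS s x -> Int -> s -> (Double, [x])
greedyRollout dds n = go 1
  where
    go t s
      | t > n || null xs = (0, [])
      | otherwise =
          let x            = maximumBy (comparing (reward dds t s)) xs
              (rest, path) = go (t + 1) (transition dds t s x)
          in (reward dds t s x + rest, x : path)
      where
        xs = feasible dds t s
```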
Stochastic Dynamic Programming

Where does the stochastic aspect of this kind of optimization come in? Basically: even if we know the state of the system at the beginning of the current stage as well as the decision taken, the state of the system at the beginning of the next stage and the current period reward are often random variables that can be observed only at the end of the current stage.

Stochastic dynamic programming deals with problems in which the current period reward and/or the next period state are random, i.e. with multi-stage stochastic systems. The decision maker's goal is to maximize expected (discounted) reward over a given planning horizon. In their most general form, stochastic dynamic programs deal with functional equations taking the following structure

$$f_t(s_t) = \max_{x_t \in X_t(s_t)} \Big\{ \big(\text{expected reward during stage } t \mid s_t, x_t\big) + \alpha \sum_{s_{t+1}} \Pr(s_{t+1} \mid s_t, x_t)\, f_{t+1}(s_{t+1}) \Big\}$$

where

• $f_t(s_t)$ is the maximum expected reward that can be attained during stages $t, t+1, \ldots, n$, given state $s_t$ at the beginning of stage $t$;

• $x_t$ belongs to the set $X_t(s_t)$ of feasible actions at stage $t$ given initial state $s_t$;

• $\alpha$ is the discount factor;

• $\Pr(s_{t+1} \mid s_t, x_t)$ is the conditional probability that the state at the beginning of stage $t+1$ is $s_{t+1}$, given current state $s_t$ and selected action $x_t$.

Markov decision processes represent a special class of stochastic dynamic programs in which the underlying stochastic process is a stationary process that features the Markov property. But in AGI or other real-world AI contexts, stationarity is often not an acceptable assumption.

Exactly solving the above equations is rarely possible in complex real-world situations – leading on to approximate stochastic dynamic programming [Pow09], and of course to reinforcement learning (RL), which is commonly considered as a form of approximate stochastic dynamic programming. Basically, in an RL approach, one doesn't try to explicitly solve the dynamic-programming functional equations even approximately, but instead one seeks to find a "policy" for determining actions based on states, which will approximately give the same results as a solution to the functional equations. If the policy is represented as a neural net, for example, then it may be learned via iterative algorithms for adjusting the weights in the network. However, existing RL methods are only effective approximations given fairly particular choices of the reward function and state and action space. For the rewards, states and actions that will interest us most here, current RL methods are not particularly useful and other sorts of approximation techniques must be sought.
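The following Haskell sketch implements the stochastic functional equation above by naive backward recursion, assuming finite, explicitly enumerable state and action sets; the deterministic case is recovered when each transition distribution is a point mass. Type and field names are illustrative assumptions.

```haskell
-- A stochastic DDS with an explicit transition distribution Pr(s' | s, x)
-- over a finite set of successor states.
data SDDS s x = SDDS
  { actions   :: Int -> s -> [x]                -- X_t(s_t)
  , expReward :: Int -> s -> x -> Double        -- E[reward at stage t | s_t, x_t]
  , trans     :: Int -> s -> x -> [(s, Double)] -- [(s_{t+1}, Pr(s_{t+1} | s_t, x_t))]
  , alpha     :: Double                         -- discount factor
  }

-- f_t(s) = max_{x in X_t(s)} { E[r | s,x] + alpha * sum_s' Pr(s'|s,x) f_{t+1}(s') }
-- Naive backward recursion over a horizon of n stages; a serious
-- implementation would memoise f over (t, s).
fOpt :: SDDS s x -> Int -> Int -> s -> Double
fOpt sys n t s
  | t > n     = 0
  | null xs   = 0
  | otherwise = maximum [ q x | x <- xs ]
  where
    xs  = actions sys t s
    q x = expReward sys t s x
        + alpha sys * sum [ p * fOpt sys n (t + 1) s' | (s', p) <- trans sys t s x ]
```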
Dynamic Cognition Breaks Expected Reward Maximization

One key assumption underlying the specific approach to DDSs outlined above is that the overall goal of the system is to maximize expected reward, defined as the sum of the immediate rewards. So for instance if one is trying to find the min-cost path from A to B in a weighted graph, the path cost is the sum of the costs of the edges along the path. It's worth recalling that this is actually a very special type of goal function for a system to have. Even sticking within the framework of optimization-based choice-making, a more general approach would be to look at a system that makes decisions based on trying to maximize some function $\Phi$ defined over its whole future history – without demanding that $\Phi$ be decomposed as a sum of immediate rewards [Ben09].

One can argue that any such $\Phi$ could be decomposed into immediate rewards, by defining the immediate reward of an action as the maximum achievable degree of $\Phi$ consistent with that action – but this requires the immediate reward $p_t(s_t, x_t)$ to involve not only straightforward data about the state $s_t$ and action $x_t$ but also data regarding the past history and possible futures following on from $s_t$. That is, it fits into the DDS framework only if one assumes a setup in which each state $s_t$ contains very rich information about past and future history, and exists not as a delimited package of information about a system at one point in time, but rather as a sort of pointer to a certain position within a broader past/future history. In this sort of setup one can define immediate rewards that depend on past and future history. This sort of setup allows one to bypass certain conceptual pathologies associated with conventional RL and "wireheading" (wherein a system achieves its goals in a misleading way by modifying itself in such a way as to modify its interpretation of its goals) [Ben09]. However, it also eliminates the simple practicality of the standard interpretations of the DDS / dynamic programming / RL framework, and one could argue eliminates most of the value of this framework.

Another approach to generalizing the standard dynamic programming formulae would be to stick with the localized definition of states and the more restricted sort of immediate reward – but to accept that the actions a system takes may affect its long term memory, in such a way that after a system has chosen a certain action at a certain point in execution of a DDS process, the knowledge generated by this action might in some cases affect the result that would be obtained if it were to redo some of the immediate-reward calculations it has already made earlier while executing the DDS process.
In cases like this, one cannot guarantee that the decision ultimately made by the system while pursuing immediate reward step-by-step is going to result in maximization of the overall reward for the system, as judged by the system based on all the knowledge it has in its memory at the end of the DDS process.

One could represent this sort of process schematically via assuming that the states $s_t$ contain knowledge that is mutable based on chosen actions, and looking at a modified functional equation of the form

$$f_t(s_t) = \max_{x_t \in X_t(s_t)} \Big\{ \big(\text{expected reward during stage } t \mid s_t, x_t\big) + \alpha \sum_{s_{t+1}} \Pr(s_{t+1} \mid s_t, x_t)\, f_{t+1}(s_{t+1}) + \alpha \sum_{s_{t-1}} \Pr(s_{t-1} \mid s_t, x_t)\, f_{t-1}(s_{t-1}) \Big\}$$

where $\Pr(s_{t-1} \mid s_t, x_t)$ denotes the probability of $s_{t-1}$ being the state that leads up to $s_t$, calculated across the possible worlds where $s_{t-1}$ references a knowledge base incorporating the changes that were made in the course of choosing action $x_t$ during $s_t$. However this sort of modified time-symmetric Bellman equation lacks an elegant iterative solution like the ordinary Bellman equation.

The practical approach we suggest for modeling currently relevant real-world cognitive systems is to take the ordinary expected-reward-based DDS as formulated above, with states and immediate rewards defined in a temporally localized way as usual, as a first approximation to cognitive dynamics. It is not a fully accurate model of cognitive dynamics, because states do sometimes encapsulate future and past histories in subtle ways, and because actions impact knowledge in ways that impact systems' retrospective views of previously calculated immediate rewards. However it's a reasonable model of a lot of cognitive dynamics. Large deviations from this approximation are cognitively interesting and sometimes important. Small deviations may be modeled by stochastic variation on conventional dynamic programming, though we do not pursue this in depth here.

In particular, we suggest, deliberative abstract cognition is sometimes able to deviate from the simplistic DDS approach and incorporate a more holistic view of action selection incorporating past and future history. This may be important for the dynamics of top-level goals in deliberative cognitive systems, and e.g. for the ability of advanced cognitive systems to avoid wireheading traps that expected reward maximizers will more easily fall into. On the other hand, we hypothesize that effective large-scale cognition using limited computational resources generally depends on setting up situations where a more straightforward greedy or dynamic-programming-like DDS approach applies.

Now we consider a particular sort of computational challenge, involving the making of a series of decisions about how best to use a set of combinational operators $C_i$ to gain information about maximizing a function $F$ (or Pareto optimizing a set of functions $\{F_i\}$) via sampling evaluations of $F$ (resp. $\{F_i\}$). For simplicity we'll present this process – which we call COFO, for Combinatory-Operation Based Function Optimization – in the case of a single function $F$, but the same constructs work for the multiobjective case. We will show here how COFO can be represented as a Discrete Decision Process, which can then be enacted in greedy or dynamic programming style.

Given a function $F : X \rightarrow \mathbb{R}$ (where $X$ is any space with a probability measure on it and $\mathbb{R}$ is the reals), let $D$ denote a "dataset" comprising a finite subset of the graph of $F$, i.e. a set of pairs $(x, F(x))$.
We want to introduce a measure $q_F(D)$ which measures how much guidance $D$ gives toward the goal of finding $x$ that make $F(x)$ large.

The best measure will often be application-specific. However, it is also worth articulating a sort of "default" general-purpose approach. One way to do this is to assume a prior probability measure over the space $\mathcal{F}$ of all measurable functions $f$ with the same domain and range as $F$. Given a threshold value $\rho$ and a function $f \in \mathcal{F}$, let $M_\rho(f)$ denote the set of $x$ so that $f(x) \geq f(y)$ with probability at least $\rho$ for a randomly selected $y \in X$. I.e., $M_\rho(f)$ is the set of arguments $x$ for which $f(x)$ is in the top $\rho$ values for $f$. One can then set $q_{\rho,F}(D)$ equal to the entropy of the set

$$M^D_\rho = \bigcup_{f \,:\, D \subset \mathrm{graph}(f)} M_\rho(f),$$

i.e. the union taken over functions $f$ that agree with $F$ on $D$.

If this entropy is large, then knowing that $f$ agrees with $F$ on $D$ doesn't narrow things down much in terms of finding $x$ that make $F$ big – it still leaves a wide spread of possibilities. If this entropy is small, then the set of $x$ that make $f$ big, for functions $f$ that agree with $F$ on $D$, is a strongly localized distribution in $X$ – so that knowing $D$ meaningfully focuses the search space for $x$ that make $F$ big.

Given two such datasets $D_1$ and $D_2$, one can of course look at the relative quality $q_{\rho,F}(D_1, D_2)$, equal to the relative entropy of $M^{D_1}_\rho$ given $M^{D_2}_\rho$.

We can then look at greedy or dynamic programming processes aimed at gradually building a set $D$ in a way that will maximize $q_{\rho,F}(D)$. Specifically, in a cognitive algorithmics context it is interesting to look at processes involving combinatory operations $C_i : X \times X \rightarrow X$ with the property that

$$P\big(C_i(x,y) \in M^D_\rho \mid x \in M^D_\rho,\ y \in M^D_\rho\big) \gg P\big(z \in M^D_\rho \mid z \in X\big).$$

That is, given $x, y \in M^D_\rho$, combining $x$ and $y$ using $C_i$ has surprisingly high probability of yielding $z \in M^D_\rho$.

Given combinatory operators of this nature, one can then approach gradually building a set $D$ in a way that will maximize $q_{\rho,F}(D)$, via a route of successively applying combinatory operators $C_i$ to the members of a set $D_j$ to obtain a set $D_{j+1}$.

Framing this COFO process in terms of the above-described Discrete Decision Process:

1. A state $s_t$ is a dataset $D$ formed from function $F$
2. An action is the formation of a new entity $z$ by:

(a) Sampling $x, y$ from $X$ and $C_i$ from the set of available combinatory operators, in a manner that is estimated likely to yield $z = C_i(x, y)$ with $z \in M^D_\rho$

i. As a complement or alternative to directly sampling, one can perform probabilistic inference of various sorts to find promising $(x, y, C_i)$. This probabilistic inference process itself may be represented as a COFO process, as we show below via expressing PLN forward and backward chaining in terms of COFO

(b) Evaluating $F(z)$, and setting $D^* = D \cup \{(z, F(z))\}$

3. The immediate reward is an appropriate measure of the amount of new information about making $F$ big that was gained by the evaluation $F(z)$. The right measure may depend on the specific COFO application; one fairly generic choice would be the relative entropy $q_{\rho,F}(D^*, D)$

4. State transition: setting the new state $s_{t+1} = D^*$

A concurrent-processing version of this would replace step 2a with a similar step in which multiple pairs $(x, y)$ are concurrently chosen and then evaluated.

In the case where one pursues COFO via dynamic programming, it becomes stochastic dynamic programming because of the probabilistic sampling in the action. The sampling step in the above can be specified in various ways, and incorporates the familiar (and familiarly tricky) exploration/exploitation tradeoff. If probabilistic inference is used along with sampling, then one may have a peculiar sort of stochastic dynamic programming in which the step of choosing an action involves making an estimation that itself may be usefully carried out by stochastic dynamic programming (but with a different objective function than the objective function for whose optimization the action is being chosen).

Basically, in the COFO framework we are looking at the process of optimizing $F$ as an explicit dynamical decision process conducted via sequential application of an operation in which: operations $C_i$ that combine inputs chosen from a distribution induced by prior objective function evaluations are used to get new candidate arguments to feed to $F$ for evaluation. The reward function guiding this exploration is the quest for reduction of the entropy of the set of guesses at arguments that look promising to make $F$ near-optimal based on the evaluations made so far.

As noted at the start, the same COFO process can be applied equally well to the case of Pareto-optimizing a set of objective functions. The definition of $M^D_\rho$ must be modified accordingly and then the rest follows.

Actually carrying out an explicit stochastic dynamic programming algorithm along the lines described above will prove computationally intractable in most realistic cases. However, we shall see below that the formulation of the COFO process as dynamic programming (or simpler greedy sequential-choice-based optimization) provides a valuable foundation for theoretical analysis.

The next thread in the tapestry we're weaving here is the observation that the cognitive algorithms centrally used in the OpenCog AGI architecture – which encompass well known AI paradigms like logical reasoning, evolutionary learning and clustering – can be formulated in COFO terms, where the functions $F$ involved are defined on subgraphs of typed metagraphs (TMGs).
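Before turning to those specific algorithms, the following Haskell sketch shows the bare COFO loop in greedy form. It is a drastic simplification under stated assumptions: the promising region $M^D_\rho$ is approximated by the top-$\rho$ fraction of the arguments already evaluated, the sampling of $(x, y, C_i)$ is replaced by exhaustive enumeration, and the best raw objective value stands in for the entropy-based information-gain reward.

```haskell
import Data.List (foldl', maximumBy, sortBy)
import Data.Ord  (Down (..), comparing)

-- The COFO state: a dataset D of evaluated points (x, F x).
type Dataset a = [(a, Double)]

-- Crude stand-in for M^D_rho: the top-rho fraction of arguments evaluated
-- so far (the real definition quantifies over all f agreeing with F on D).
promising :: Double -> Dataset a -> [a]
promising rho d =
  let k = max 1 (round (rho * fromIntegral (length d)))
  in map fst (take k (sortBy (comparing (Down . snd)) d))

-- One greedy COFO action: apply every combinator C_i to every pair of
-- promising arguments, evaluate F on the results, keep the best new point.
cofoStep :: Eq a => (a -> Double) -> [a -> a -> a] -> Double -> Dataset a -> Dataset a
cofoStep f combinators rho d
  | null fresh = d
  | otherwise  = let z = maximumBy (comparing f) fresh in (z, f z) : d
  where
    cands = [ c x y | c <- combinators, x <- promising rho d, y <- promising rho d ]
    fresh = [ z | z <- cands, z `notElem` map fst d ]

-- Run the greedy loop for a fixed evaluation budget.
cofo :: Eq a => (a -> Double) -> [a -> a -> a] -> Double -> Dataset a -> Int -> Dataset a
cofo f cs rho d0 budget = foldl' (\d _ -> cofoStep f cs rho d) d0 [1 .. budget]
```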
This COFO formulation allows us to argue that these AI algorithms can be viewed as judicious, computationally practical approximations to greedy optimization or stochastic dynamic programming for COFO in a TMG context – somewhat as classic RL is an effective approximation to dynamic programming in other cases.

Looking forward to Section 5.2 below, this will then allow us to represent these cognitive operations as Galois connections, thus enabling us to represent their solutions elegantly in terms of hylomorphisms and chronomorphisms on TMGs. The role of dynamic programming here is then, in part, to serve as the connecting formalism between a diverse set of cognitive algorithms and a general formulation in terms of chronomorphisms.

Given that standard RL is also an effective approximation of stochastic dynamic programming for different sorts of states and actions, it seems this formal approach may also provide an effective way to think about the integration of RL with other cognitive algorithms such as logical inference and evolutionary learning. I.e., if one considers states that include the sorts of real vectors that conventional RL works well with, along with the sub-TMGs that various other cognitive algorithms work well with, then one can think about an approximate approach to solving the corresponding dynamic programming problem that incorporates common RL-type model learning mechanisms with combinatory-operation-based learning mechanisms as are embodied in these other cognitive algorithms.

We will focus here on the algorithmic processes involved in the various cognitive functions considered, rather than on knowledge representation issues, but this is because we are assuming the context of the OpenCog framework [GPG13a] [GPG13b], in which the various cognitive algorithms considered represent their input and output data and their internal states and structures in terms of the optionally-directed typed metagraph called the Atomspace. We also assume that the type system on the Atomspace metagraph includes in its vocabulary the paraconsistent and probabilistic types described in [Goe21c]. The core actions involved in most of the cognitive algorithms described here involve sampling from distributions on a metagraph – i.e. "metagraph probabilistic programming" as described in [Goe21c]. What we are doing here is leveraging the process of sampling from distributions on subgraphs of typed metagraphs to create cognitive algorithms, via observing that a great variety of important cognitive algorithms can be represented in terms of a fairly stereotyped discrete decision process involving pursuit of function optimization via iterative application of combinatory operators acting on sampled sub-metagraphs.

Forward Chaining Inference

Forward chaining inference in Probabilistic Logic Networks [GIGH08] or other similar frameworks may be based on a quality metric measuring the interestingness or surprisingness of an inference conclusion (see [Goe21b] for a formalization of surprisingness in an OpenCog context).
The general process involved is then, qualitatively:

• Choose reasonably interesting premises and combine them with an inference rule to get a conclusion that is hopefully also interesting.

• Select the inference rule based on past experience regarding which rules tend to work for generating interesting conclusions from those sorts of premises.

In this case the objective function $F$ of the COFO process is the interestingness measure.

The use of probabilistic inference for choosing actions within the action-selection step of a stochastic dynamic programming approach to COFO may be formulated in this way. Here one is seeking $(x, y, C_i)$ so that $\chi_{M^D_\rho}(C_i(x, y))$ is large, where $\chi_{M^D_\rho}(z)$ is the fuzzy membership degree of $z$ in $M^D_\rho$ considered as a fuzzy set. So one has $F(x, y, C_i) = \chi_{M^D_\rho}(C_i(x, y))$, which in context is a measure of the interestingness of $(x, y, C_i)$.

We then have a DDS formalization via:

1. A state $s_t$ is a set of statements labeled with interestingness values

2. An action is the formation of a new statement by:

(a) Sampling premise statements $x, y$ from the knowledge base and $C_i$ from the set of available inference rules, in a manner that is estimated likely to yield $z = C_i(x, y)$ that is interesting

i. As a complement or alternative to directly sampling, one can perform probabilistic inference to find promising premise / inference-rule combinations in context. This is "inference-based inference control."

(b) Doing the selected inference to create a statement $z$, evaluating this statement's interestingness, and creating an updated state by adding the statement labeled with its interestingness to the previous state

3. The immediate reward is the amount of new information gained about how to find interesting statements, via the process of finding $z$ and evaluating its interestingness

4. State transition: setting the new state equal to the updated state just created, and then iterating again by returning to "Sampling premise statements..." etc.

A concurrent version of this would sample multiple $(x, y, C_i)$ concurrently in step 2a. Similar strategies work for making all the cognitive processes considered below concurrent, so we will not explicitly note this in each case.

The key conceptual constraint placed on the measure of interestingness used here is that it must be meaningful to assess the interestingness of an inference dag as the sum of the interestingnesses of the nodes in the dag. For instance one could define the interestingness of an inference result via mutual information. A probabilistic logical statement $S(v)$, with variable-arguments $v$ drawn from a knowledge base (such as an OpenCog Atomspace), can be associated with a probability distribution $\nu$ over the knowledge base in which $\nu(v)$ is proportional to $S(v)$.

One can then calculate entropy (information) and mutual information among such distributions. Interestingness of an inference result can then be measured as the conditional information-theoretic surprisingness (information gain) of the result, where the conditioning is done on a combination of the predecessors of the inference in the inference dag plus the assumed background knowledge.
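As a concrete grounding of this reward, the following Haskell sketch turns statement values over a finite knowledge base into the induced distributions and computes an entropy and an information gain. Reading "conditional information-theoretic surprisingness" as a KL divergence is an illustrative assumption, not necessarily the exact measure used in PLN.

```haskell
-- A probabilistic statement S(v) over a finite knowledge base, given as the
-- list of its values on the knowledge-base elements v, induces the
-- distribution nu with nu(v) proportional to S(v).
normalize :: [Double] -> [Double]
normalize ws = let z = sum ws in if z <= 0 then ws else map (/ z) ws

-- Shannon entropy of a distribution, in bits.
entropy :: [Double] -> Double
entropy ps = negate (sum [ p * logBase 2 p | p <- ps, p > 0 ])

-- Information gain of the conclusion's induced distribution relative to the
-- distribution induced by its predecessors plus background knowledge.
-- Both lists index the same knowledge-base elements v.
infoGain :: [Double] -> [Double] -> Double
infoGain conclusion conditioning =
  let ps = normalize conclusion
      qs = normalize conditioning
  in sum [ pv * logBase 2 (pv / qv) | (pv, qv) <- zip ps qs, pv > 0, qv > 0 ]
```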
The linear decomposition needed here then emerges from the algebra of mutual information, if the knowledge base is assumed constant throughout the inference.

Alternatively, one could define a distinction graph [Goe19] on the knowledge base via drawing a link between $v$ and $w$ if $|S(v) - S(w)| < \epsilon$, and then calculate absolute and conditional graphtropies of these distinction graphs. Interestingness of an inference result can then be measured as the conditional graphtropy of the result, where the conditioning is done on a combination of the predecessors of the inference in the inference dag plus the assumed background knowledge. The needed linear decomposition follows from graphtropy algebra, if the knowledge base is assumed constant throughout the inference.

A more efficient alternative to trying to explicitly solve the stochastic dynamic programming functional equation, in this case, is to build a large model of which series of choices $(x, y, C_i)$ seem to be effective in which contexts. This is what PLN "adaptive inference control" aims to do [GIGH08]. Doing explicit stochastic dynamic programming but using the (2a.i) "inference based inference control" option is one way of implementing this, as each instance of the underlying inference can make use of knowledge generated and saved when applying similar inference elsewhere in the dynamic programming process.

Backward Chaining Inference

To give a similar analysis for uncertain backward chaining, two kinds of BC inference need to be considered: TV inference, aimed at estimating the truth value of a statement; and SS inference, aimed at finding some entity that satisfies a predicate (lies in the SatisfyingSet of the predicate).

To formalize backward chaining TV inference as stochastic dynamic programming, we start with the idea of a "backward inference dag" (BID), defined as a binary dag whose internal nodes $N$ are labeled with $(N_s, N_r)$ = (statement, inference rule) pairs. A leaf node may either be similar to an internal node, or may have a label of the form $(N_s, N_D)$ where $D$ is a dataset on which the truth value of $N_s$ may be directly estimated. A BID encapsulates a set of inferences leading from a set of premises (the leaves of the tree) to a conclusion (the root of the tree). This is uncertain inference, so the statements involved are labeled with truth values. The semantics is that the statement at a node $N$ is derived from the statements at its children, using the rule at $N$. Leaf nodes that aren't labeled with datasets have empty inference rules in their labels; internal nodes cannot have empty inference rules.

To formalize backward chaining SS inference as stochastic dynamic programming, we use a slightly different BID whose internal nodes $N$ are labeled with $(N_P, N_r, N_g)$ tuples. Here $N_P$ is a predicate, $N_r$ is an inference rule, and $N_g$ is a goal qualifier. As in the TV case, leaf nodes have empty inference rules in their labels; internal nodes cannot. The goal qualifier may e.g. specify that the goal of the inference is to work toward finding some entity that makes the predicate $N_P$ true, or rather to work toward finding some entity that makes $N_P$ false. Leaf nodes are of the form $(N_P, N_E, N_g)$ where $E$ is some specific entity that fulfills the predicate $N_P$ according to $N_g$ (to an uncertain degree encapsulated in the associated truth value).

We will assume here use of PLN Simple Truth Values (STV) including strength (probability) values $s$ and confidence values $c$ (representing "amount of evidence" scaled into $[0, 1]$).
To quantify the amount of reward achieved in expanding the BID and thus moving from a prior truth value estimate $(s, c)$ to a new truth value estimate $(s', c')$ for the root statement, we introduce the CWIG (Confidence Weighted Information Gain) measure. Referring to the fundamental theory of STVs as outlined in [GIGH08], let $\gamma_{(s,c)}$ denote the second-order distribution corresponding to the pair $(s, c)$. We then define the CWIG as the information gain in the distribution $\gamma_{(s',c')}$ conditioned on $\gamma_{(s,c)}$ plus background knowledge; we can also make similar calculations using graphtropy, yielding a Confidence Weighted Graphtropy Gain (CWGG).

For TV inference we then have:

1. A state $s_t$ is a BID (which may be viewed from a COFO perspective as a set of sub-BIDs paired with CWIG/CWGG values)

2. An action is the formation of a new statement by:

(a) Sampling a leaf node $N$ from the current BID, along with premises $x, y$ from the knowledge base and a rule $C_i$ from the set of available inference rules, in a manner that is estimated likely to yield $N_s = C_i(x, y)$ that has a high truth value confidence

i. As a complement or alternative to directly sampling, one can perform probabilistic inference to find promising premise / inference-rule combinations in context. This is "inference-based inference control."

(b) Doing the selected inference to estimate the truth value of $N_s$ and then updating the BID by adding children to $N$ corresponding to $x$ and $y$
3. The immediate reward is the CWIG (or CWGG) achieved regarding the truth value of the statement at the root node of the BID, via the process of estimating the truth value of $N_s$ via $N_s = C_i(x, y)$

4. State transition: setting the new BID equal to the updated BID

For SS inference, the story is similar:

1. A state $s_t$ is a BID

2. An action is the formation of a new statement by:

(a) Sampling a leaf node $N$ from the current BID, along with premise predicates $x, y$ from the knowledge base and a rule $C_i$ from the set of available inference rules, in a manner that is estimated likely to yield $N_P = C_i(x, y)$ such that there is a high $s \cdot c$ value attached to the implication that if some entity $E$ fulfills predicates $x$ and $y$ it also fulfills $N_P$.

i. As a complement or alternative to directly sampling, one can perform probabilistic inference to find promising premise / inference-rule combinations in context. This is "inference-based inference control."

(b) Updating the BID by adding children to $N$ corresponding to $x$ and $y$
3. The immediate reward is the CWIG (or CWGG) gained regarding the satisfaction of the predicate at the root of the BID by some entity, via the process of extending the BID with the inference of $N_P$ from $N_P = C_i(x, y)$

4. State transition: setting the new BID equal to the updated BID
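A minimal Haskell sketch of the state involved in backward-chaining TV inference follows. The BID is simplified here to a tree, and the names and the string-based statement representation are illustrative assumptions; choosing which leaf, premises and rule to expand is exactly the sampling / inference-based-inference-control step of the DDS, so here that choice is simply passed in.

```haskell
type RuleName    = String
type DatasetName = String

data Statement = Statement
  { stmtText :: String
  , stmtTV   :: (Double, Double)   -- (strength, confidence)
  } deriving Show

-- A backward-inference structure for TV inference: internal nodes carry the
-- statement plus the rule deriving it from the children; leaves either await
-- justification or ground out in a dataset.
data BID
  = Internal Statement RuleName [BID]   -- derived from children via a rule
  | OpenLeaf Statement                  -- not yet justified
  | DataLeaf Statement DatasetName      -- truth value estimated directly on a dataset
  deriving Show

-- One backward-chaining expansion: justify the open leaf carrying the target
-- statement by attaching two premise children and the rule deriving it.
expandLeaf :: String -> RuleName -> (Statement, Statement) -> BID -> BID
expandLeaf target rule (p1, p2) bid = case bid of
  OpenLeaf s
    | stmtText s == target -> Internal s rule [OpenLeaf p1, OpenLeaf p2]
  Internal s r kids         -> Internal s r (map (expandLeaf target rule (p1, p2)) kids)
  other                     -> other
```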
Evolutionary optimization, in a fairly general form, can be straightforwardly cast in terms of COFO and greedy optimization. Here we consider an evolutionary approach to finding $x$ that maximize, or come close to maximizing, the "fitness function" or objective function $F : X \rightarrow \mathbb{R}$:

1. A state $s_t$ is a population of candidate solutions (genotypes)

2. An action is the addition of a new member to the population via one of the operations:

(a) Sampling, in the form of either:

i. Sampling pairs $(x, y)$ of genotypes from the population and $C_i$ from the set of available crossover operators, in a manner that is estimated likely to yield $z = C_i(x, y)$ that is fit

ii. Sampling an individual genotype $x$ from the population and $C_i$ from the set of available mutation operators, in a manner that is estimated likely to yield $z = C_i(x)$ that is fit

(b) As a complement or alternative to directly sampling, one can perform probabilistic inference to find promising genotypes and crossover or mutation operators or combinations thereof in context. Or one can bypass these operators and choose a new genotype $z$ using a generative model learned from the set of relatively highly fit population members. The latter gives one an "Estimation of Distribution Algorithm" [PGL02].

3. Enacting the selected approach to create a new genotype $z$, evaluating this genotype's fitness and creating an updated population by adding the new element to the population

4. The immediate reward is the amount of new information gained about how to find large values of $F$, via the process of finding $z$ and evaluating its fitness

5. State transition: setting the new population equal to the population as updated above.

Ordinarily evolutionary algorithms follow a single "greedy" path from an initial population, growing the population till it comes to contain a good enough solution. An explicit dynamic programming approach would make sense only in a situation where, say, the fitness function was constantly changing, so that one had to keep re-evaluating the population members – in which case one would have a pressure militating in favor of a smaller population, so that sculpting an optimal population rather than simply growing the population endlessly would make sense. Of course the dynamic fitness function scenario has resonance with the properties of many real ecological niches – in real evolutionary ecology, the fitness function does change as the environment changes, and population size is restricted by resource limitations.

It is interesting to think about variations on evolutionary algorithms that relate to dynamic programming in roughly the same way standard RL algorithms do. I.e., one can update, during an evolutionary optimization process, a probabilistic model of "what is a good population" for exploring the given function's graph with a view toward optimization – and then grow a population, using mutation and crossover and EDA-type modeling, in a direction according to this probabilistic model. This is a way of implicitly doing what the dynamic programming approach is doing, but without the massive time overhead.

A concrete algorithm along these lines is the Info-Evo approach suggested in [Goe21a]. In this approach one looks at the space of weighting distributions over the elements of an evolving population, estimates the natural gradient in this space, and then at each stage seeks to evolve the population along the direction of the natural gradient.
In terms of the above flow, this corresponds to Option 2b, where the probabilistic inference used to find promising genotypes and operators involves estimating the natural gradient on population-weighting-distribution space and then preferentially choosing genotypes and operators that are likely to move the population as a whole along the natural gradient.

OpenCog's MOSES algorithm adds an extra layer to evolutionary optimization. One defines a "deme" as a set of program dags, and does evolution by mutation and crossover and probabilistic modeling within each deme. Then there is a deme level of learning, in which poorly performing demes are killed and highly performing demes are copied.

1. A state $s_t$ is a meta-population of demes

2. An action is one of:

(a) Sampling, in the form of either:

i. Selecting a deme from the meta-population that appears to have a high probability of failure, and removing it

ii. Selecting a deme from the meta-population that appears to have a high probability of success, and cloning it

iii. Selecting a deme from a uniform distribution on the meta-population and advancing its growth by carrying out a sequence of actions within its own evolutionary process

(b) As a complement or alternative to directly sampling, one can perform probabilistic inference to find promising or unpromising demes for appropriate action. This gives one an EDA on the deme level, which has not yet been experimented with.

3. Updating the deme population, or the modified deme, accordingly

4. The immediate reward is the amount of new information gained about how to find large values of $F$, via the process of curating or updating demes

5. State transition: setting the new deme meta-population equal to the meta-population as updated above.
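A minimal Haskell sketch of one greedy step of the basic (single-deme) evolutionary DDS above follows. It is illustrative only: the toy genotype type, the fixed elite size and the deterministic argmax replace the stochastic sampling of parents and operators, and the DDS reward would properly be the information gained about $F$ rather than raw fitness.

```haskell
import Data.List (maximumBy, sortBy)
import Data.Ord  (Down (..), comparing)

-- Toy genotype representation; in MOSES the genotypes would be program dags.
type Genotype = [Bool]

-- One greedy step: form candidate offspring by applying crossover operators
-- to pairs drawn from the fitter part of the population and mutation
-- operators to single parents, then add the best candidate to the population.
evoStep :: (Genotype -> Double)                 -- fitness function F
        -> [Genotype -> Genotype -> Genotype]   -- crossover operators
        -> [Genotype -> Genotype]               -- mutation operators
        -> [Genotype]                           -- population (state s_t)
        -> [Genotype]                           -- new population (state s_{t+1})
evoStep fit crossovers mutations pop
  | null candidates = pop
  | otherwise       = maximumBy (comparing fit) candidates : pop
  where
    elite      = take 5 (sortBy (comparing (Down . fit)) pop)
    candidates = [ c x y | c <- crossovers, x <- elite, y <- elite ]
              ++ [ m x   | m <- mutations,  x <- elite ]
```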
Economic Attention Allocation (ECAN) is the process in OpenCog that spreads Short-Term Importance (STI) and Long-Term Importance (LTI) values among Atoms in theAtomspace. The simplest way to view the use of these values is: STI values guidewhich Atoms get processing power, LTI values guide which Atoms remain in RAM.STI corresponds conceptually to a DDS such as:1. A state 𝑠 𝑡 is a set of Atoms labeled with short-term importance (STI) values2. An action is the process of(a) Sampling an Atoms 𝑥 from the knowledge base and another Atom 𝑦 con-nected to 𝑥 via some Atomspace link17b) Subtracting 𝑄 from 𝑥 ’s STI and adding 𝑄 to 𝑦 ’s STI3. The immediate reward is the amount of utility obtained by the overall system,during the immediately future interval, that is causally attributable to 𝑥 and 𝑦 (viawhatever causal attribution mechanisms are operative).4. State transition : setting the new state equal to the updated state just created,with the new STI values for 𝑥 and 𝑦 The story for LTI is basically the same, the difference between the two cases beingwrapped up in the causal attribution estimate used to calculate immediate reward. In thecase of LTI we assume the background activity of a "forgetting" process that removesAtoms with overly low LTI from the Atomspace.To make this work in terms of expected reward maximization, we need the mea-sure of causality to be linearly decomposable, so that the amount of causal influenceattributed to a causal chain is equal to the sum of the causal influences attributed tothe elements in the chain. This generally works with measures of causality associatedwith Judea Pearl style causal networks [Pea09] and also with treatment of causationas conditional implication plus temporal precedence as discussed in [GIGH08]. If onetreats causation as conditional implication plus temporal predecence plus existence ofa plausible causal mechanism (as also discussed in [GIGH08]), then one needs to takecare that one’s methods of assessing existence of plausible causal mechanism adhere todecomposability; e.g. if one uses uncertain logic to assess plausiblity of causal mech-anisms then this can be made to work along the lines of the treatment of inferenceprocesses described above.In this case, actually doing attention allocation using dynamic programing wouldbe insanely expensive, but would allow one to figure out an optimal path for attentionallocation in a given Atomspace. What is actually done in OpenCog systems is to im-plement a fairly simple activation-spreading-based method for diffusing STI and LTIthrough the Atomspace. This activation-spreading method is intended as a crude butcheap approximation of the attention allocation path that dynamic programming wouldgive.
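The state update in the STI case is simple enough to show directly; the following Haskell sketch uses an illustrative flat association-list representation of Atoms and their STI values (a drastic simplification of an Atomspace). Selecting which linked pair $(x, y)$ to act on, and crediting the transfer with subsequently observed system utility, are the sampling and reward parts of the DDS and are not shown.

```haskell
type AtomId = Int
type STI    = Double

-- One ECAN-style action: move a quantum q of STI from atom x to a
-- neighbouring atom y; all other atoms are left untouched.
transferSTI :: STI -> AtomId -> AtomId -> [(AtomId, STI)] -> [(AtomId, STI)]
transferSTI q x y = map adjust
  where
    adjust (a, s)
      | a == x    = (a, s - q)
      | a == y    = (a, s + q)
      | otherwise = (a, s)
```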
Agglomerative clustering follows a similar logic to forward-chaining inference as considered above – cluster quality plays the role of interestingness, and the application of a merge operator takes the place of application of an inference rule.

1. A state $s_t$ is a "clustering", meaning a set of Atoms grouped into clusters

2. An action is the formation of a new clustering by:

(a) Sampling clusters $x, y$ from the current clustering and $C_i$ from the set of available cluster-merge rules, in a manner that is estimated likely to yield $z = C_i(x, y)$ that increases clustering quality

i. As a complement or alternative to directly sampling, one can perform probabilistic inference to find promising cluster combinations in context. This is "inference-based clustering control."

(b) Performing the selected merger to create a new cluster $z$, evaluating the clustering quality after replacing $x, y$ with $z$, and creating an updated overall clustering

3. The immediate reward is the increase in clustering quality obtained via the agglomeration of $x$ and $y$ into $z$

4. State transition: setting the new clustering equal to the updated clustering
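A minimal Haskell sketch of one greedy agglomerative step follows, with an exhaustive search over cluster pairs standing in for the sampling step and a caller-supplied quality measure; all names are illustrative. Because the merge operator is just (disjoint) union, repeated merges are trivially associative, which is what licenses the fold-based decomposition discussed below.

```haskell
import Data.List (maximumBy)
import Data.Ord  (comparing)

type Cluster a    = [a]
type Clustering a = [Cluster a]

-- One greedy agglomerative step: among all pairs of clusters, perform the
-- merge that yields the highest-quality clustering.
mergeStep :: (Clustering a -> Double) -> Clustering a -> Clustering a
mergeStep quality clustering
  | length clustering < 2 = clustering
  | otherwise             = maximumBy (comparing quality) candidates
  where
    indexed    = zip [0 :: Int ..] clustering
    candidates =
      [ (x ++ y) : [ c | (k, c) <- indexed, k /= i, k /= j ]
      | (i, x) <- indexed
      , (j, y) <- indexed
      , i < j
      ]
```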
Pattern mining follows a similar template to evolutionary learning, with crossover/mutation operations replaced by operations for expanding known patterns by adding new terms to them.

1. A state $s_t$ is a population of patterns mined from an Atomspace

2. An action is the addition of a new pattern to the population via:

(a) selecting a pattern $x$ from the population, with a probability proportional to its pattern-quality, then selecting a pattern-expansion operator $C_i$ with a probability proportional to the estimated odds that applying it to $x$ will yield a high-quality pattern

(b) applying $C_i$ to $x$ to obtain a new candidate pattern $z = C_i(x)$

(c) evaluating the new pattern $z$'s pattern-quality

3. The immediate reward is the amount the total pattern-quality of the set of patterns in the population has increased, as a result of adding $z$ to the population

4. State transition: adding $z$ to the pattern population

As in the evolutionary learning case, here the default approach is to proceed in a greedy way, but approximations to a richer stochastic dynamic programming strategy could also be interesting.

There is an interesting correspondence between the subgoal hierarchies involved in the dynamic-programming-type DDSs described above, and the subpattern hierarchies described in [Goe20b]. In the latter paper it is shown that, if one has a collection of mutually associative combinatory operations, then one can use these to describe a subpattern hierarchy, i.e. a dag in which $x$ is a child of $y$ if there is some $z$ and some combinatory operator $C$ so that

• $C(y, z) = x$

• $\sigma(y) + \sigma(z) + \sigma^*(C, y, z) < \sigma(x)$

where $(\sigma, \sigma^*)$ is a simplicity measure.

Formally speaking, then, if one has a DDS based on a set of combinatory operators $C_i$ that are mutually associative, there is a subpattern hierarchy connected with that DDS. There are also alternate routes to getting subpattern hierarchies out of combinatory operator collections that don't require mutual associativity, and these may relate to some sorts of DDSs also, but the mutually-associative case is the most relevant one here.

Just because a dynamic-programming-based DDS formally corresponds to a subpattern hierarchy doesn't mean this correspondence necessarily has to be meaningful in terms of the operation of the DDS. However, it happens that for cognitively interesting DDSs there often is a meaningful correspondence of this nature. This sort of correspondence might be described as a cognitive synergy between perception and action – or, put more precisely, between pattern-recognition and action-selection.

In the case of DDSs embodying uncertain inference, for example, the relevant simplicity measure is the count, i.e. the number of items of evidence on which a given judgment or hypothesis relies. Suppose $C$ is an inference rule used to combine premises $x$ and $y$ to obtain some conclusion. If $x$ and $y$ are based on disjoint bodies of evidence, then the count of $C(x, y)$ can be effectively defined as the sum of the count of $x$ and the count of $y$. This is an upper bound for the count of $C(x, y)$ in the general case.

Are the multiple inference rules $C_i$ involved in a logic system like PLN mutually associative? This turns out to depend sensitively on how the rules are formulated. In most logics, ordinary one-way (conditional) implication is not associative, but two-way (biconditional) implication is associative.
So if one formulates one's inference rules as equivalences rather than one-way implications, then – at least within the domain of crisp inference rules, setting aside quantitative uncertainty management – one obtains a formulation of logic in terms of mutually associative rules. Another way to say this is: a reversible formulation of logic will tend to involve mutually associative inference rules.

Reversible forms of Boolean logic gates are well known [PN05] and play interesting roles in quantum computing [Per85]. Reversible forms of predicate logic have also been fleshed out in detail, e.g. Sparks and Sabry's Superstructural Reversible Logic [SS14], which – just as linear logic adds a second version of "and" – adds a second version of "or". The second "or" operator is used to ensure the conservation of choice (or – equivalently – the conservation of temporal resources), just as linear logic ensures the conservation of spatial resources. There would appear to be an isomorphic mapping between this logic and Constructible Duality logic with its new exponentials that form a mirror image of the standard linear logic exponentials [PPA98].

Reversible logic maps via a Curry-Howard type correspondence into reversible computing, via the observation that every reversible-logic proof corresponds to a computation witnessing a type isomorphism. One thus concludes that, if one wants an inference process that corresponds naturally to a subpattern hierarchy, one wants to use reversible inference rules, which then turns one's inference process into a process of executing a reversible computer program.

The elegance of the achievement of mutually associative inference operations via use of co-implication somewhat recedes when one incorporates uncertainty into the picture. A chain of co-implications $A \overset{\nu_1}{\longleftrightarrow} B \overset{\nu_2}{\longleftrightarrow} C \overset{\nu_3}{\longleftrightarrow} D$, where the $\nu_i$ are probability distributions characterizing the co-implications, may indeed give different results depending on how the co-implications are parenthetically grouped, related to the probabilistic dependencies between the distributions. If all the distributions are mutually independent then associativity will hold, but obviously this won't always be the case. One can try to achieve as much independence as possible, and one can also try to arrange dependencies in an order copacetic with the processing one actually needs to do, a topic to be revisited below.

The coordination of subgoal hierarchy with subpattern hierarchy can also be seen in various other cognitive processes, including some considered above.

In the context of agglomerative clustering, one may define the simplicity of a clustering in information-theoretic terms, e.g. as the logical entropy of the partition constituted by the clustering. Looking at cluster merge operations that act to map a clustering into a less fine-grained clustering via joining certain clusters together, it's obvious that the merge operations are associative (they are just unioning disjoint subsets, and union is associative).

For pattern mining, relevant simplicity measures would be the frequency of a pattern or the information-theoretic surprisingness of a pattern. Pattern combination is associative.
For instance, if patterns are represented as metagraphs (which is the case in OpenCog's Pattern Miner), then we can combine two patterns with metagraph Heyting algebra conjunction and disjunction operations, in which case associativity follows via the rules of Heyting algebra.

The process of pattern growth is then representable as a matter of taking patterns that are both small in size (as metagraphs) and simple (by the simplicity measure), and combining them conjunctively and disjunctively to form new patterns, which are then pruned based on their simplicity – and then iteratively repeating this process. One can then spell out algebraic relationships and inequalities regarding the frequency or surprisingness of the conjunction or disjunction of two patterns, providing a connection between the subgoal hierarchy used in pattern growth during the pattern mining process, and the subpattern hierarchy implied by the simplicity measure.
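As a small executable rendering of the subpattern-hierarchy relation defined above, the following Haskell sketch checks whether $x$ is a child of $y$ given a set of named combinatory operators and a simplicity measure $(\sigma, \sigma^*)$. Restricting the search for $z$ to a supplied finite universe is an illustrative assumption made so that the check terminates.

```haskell
-- x is a child of y iff there exist an operator C and a z with C y z == x
-- and  sigma y + sigma z + sigma* (C, y, z) < sigma x.
childOf :: Eq a
        => [(String, a -> a -> a)]       -- named combinatory operators C
        -> (a -> Double)                 -- sigma
        -> (String -> a -> a -> Double)  -- sigma*, applied to (C, y, z)
        -> [a]                           -- finite universe to search for z
        -> a                             -- candidate child x
        -> a                             -- candidate parent y
        -> Bool
childOf ops sigma sigmaStar universe x y =
  or [ c y z == x && sigma y + sigma z + sigmaStar name y z < sigma x
     | (name, c) <- ops
     , z <- universe
     ]
```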
Cognitive Algorithms as Galois Connections Representing Search and Optimization on Typed Metagraphs
One interesting aspect of the formulation of multiple cognitive algorithms as DDSs embodying COFO processes is the guidance it gives regarding practical implementation of these cognitive algorithms. Following the ideas on Galois-connection-based program derivation outlined in [MO12], we will trace here a path from the DDS formulation of cognitive algorithms to a representation of these cognitive algorithms in terms of folding and unfolding operations.

First, as a teaser, we will leverage the Greedy Theorem from [MO12] to show that greedy steepest-ascent (or steepest-descent) optimization is represented naturally as a fold operation. Then, proceeding to dynamic-programming-type algorithms, we will show that if one has a DDS embodying COFO based on mutually associative combinatory operations, it can be implemented as a hylomorphism (a fold composed with an unfold) on metagraphs or, if one wants to take advantage of memory caching to avoid repetitive computation, a metagraph chronomorphism (a history-carrying fold composed with a history-carrying unfold). This suggests that one strategy for efficient implementation of cognitive algorithms is to create a language in which (pausable and resumable) metagraph folds and unfolds of various sorts are as efficient as possible in both concurrent and distributed processing settings – an important conclusion for the design of next-generation AGI systems like OpenCog Hyperon [Goe20c].
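Before proceeding, it may help to fix the fold/unfold vocabulary with a minimal list-based sketch; the paper's actual setting is folds and unfolds over typed metagraphs as formalized in [Goe20a], and the list case is only the simplest instance. An unfold grows a structure of subproblems from a seed, a fold consumes such a structure, and their composition is a hylomorphism.

```haskell
-- Minimal list-based sketch of fold, unfold and their composition (hylomorphism).
-- Purely illustrative; the metagraph case replaces lists with sub-metagraph
-- decomposition and dangling-target joins.

-- Unfold: grow a structure of subproblems from a seed.
unfold' :: (b -> Maybe (a, b)) -> b -> [a]
unfold' g seed = case g seed of
  Nothing        -> []
  Just (x, rest) -> x : unfold' g rest

-- Fold: consume the structure back down to a single value.
fold' :: (a -> r -> r) -> r -> [a] -> r
fold' _ z []       = z
fold' f z (x : xs) = f x (fold' f z xs)

-- Hylomorphism: a fold composed with an unfold.
hylo :: (a -> r -> r) -> r -> (b -> Maybe (a, b)) -> b -> r
hylo f z g = fold' f z . unfold' g

-- Toy usage: sum of squares 1^2 + ... + n^2, generated by unfold, consumed by fold.
sumSquares :: Int -> Int
sumSquares n = hylo (+) 0 step n
  where step k = if k <= 0 then Nothing else Just (k * k, k - 1)
```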
Suppose we are concerned with maximizing a function f ∶ X → ℝ via a "pattern search" approach. That is, we assume an algorithm that repeatedly iterates a pattern search operation such as: generate a set of candidate next-steps from the current focus point a, evaluate the candidates, and then, using the results of this evaluation, choose a new focus point a*. Steepest ascent obviously has this format, but so do a variety of derivative-free optimization methods, as reviewed e.g. in [Tor95].

Evolutionary optimization may be put in this framework if one shifts attention to a population-level function f_P ∶ X^N → ℝ, where X^N is a population of N elements of X, and defines f_P(x) for x ∈ X^N as e.g. the average of f across the members of x (the average population fitness, in genetic algorithm terms). The focus point a is a population, which evolves into a new population a* via crossover or mutation – a process that is then repeatedly iterated as outlined above.

The basic ideas to be presented here work for most any topological space X, but we are most interested in the case where X is a metagraph. In this case the pattern search iteration can be understood as a walk across the metagraph, moving from some initial position in the graph to another position, then another one, etc.

We can analyze this sort of optimization algorithm via the Greedy Theorem from [MO12]:

Theorem 1 (Theorem 1 from [MO12]). ⦇S ↾ R⦈ ⊆ ⦇S⦈ ↾ R, provided R is transitive and S satisfies the "monotonicity condition" S ⋅ FR◦ ⊆ R◦ ⋅ S.

Here:
• in general, "S is monotonic with respect to R" means S ⋅ FR ⊆ R ⋅ S
• ⦇S⦈ denotes the operation of folding S
• ⟨μX ∶∶ f X⟩ denotes the least fixed point of f
• T◦ denotes the converse of T, i.e. (b, a) ∈ T◦ ≡ (a, b) ∈ T
• S ↾ R denotes "S shrunk by R", i.e. S ∩ R∕S◦

Here S represents the local candidate-generation operation used in the pattern-search optimization algorithm, and R represents the operation of evaluating a candidate point in X according to the objective function being optimized.

The rhs of the inclusion in the Theorem describes a process in which the candidate-generation process is folded across all of X and then all the candidates generated are evaluated; i.e. this is basically an exhaustive search. In the case where X is a metagraph, we have in [Goe20a] given a detailed formulation of what it means to fold across a metagraph, based on a model of metagraphs as composed of sub-metagraphs joined together via dangling-target-joining operations. The exhaustive search is then a matter of folding candidate-generation across all the sub-metagraphs of the metagraph. This is a valid fold operation so long as the set of candidates generated from the join of (say) three sub-metagraphs is independent of the order in which the joining occurs.

The lhs of the inclusion describes the folding across X of a process in which the candidate-generation process (S) is done locally at one location, and evaluation (R) is then carried out, resulting in a single candidate being selected.

The monotonicity condition, in this case, basically enforces that the objective function is convex. In that case we lose nothing by computing local optima at each point and then folding this process across the space X (e.g. the metagraph). If the specifics of the pattern-search step depend on the prior history of the optimization process, then we have a histomorphism rather than just an ordinary catamorphism (fold) – e.g. this is the case if there are optimization parameters that get adapted based on optimization history.
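As a schematic illustration of the left-hand-side reading of the theorem – generate candidates locally with S, then shrink by R – the following sketch implements greedy ascent as a fold over a fixed number of iterations. The neighbor generator and objective function are hypothetical stand-ins for a walk over a metagraph.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Greedy pattern search as a fold: at each step, generate candidate neighbors
-- of the current focus point (the role of S), evaluate them under the objective
-- (the role of R and the shrink ↾ R), and keep the best one.
greedyAscent :: (a -> Double)   -- objective function f
             -> (a -> [a])      -- candidate generator (S)
             -> Int             -- number of iterations
             -> a               -- initial focus point
             -> a
greedyAscent f neighbors steps start =
  foldl step start (replicate steps ())
  where
    step focus () =
      let cands = focus : neighbors focus
      in maximumBy (comparing f) cands   -- shrink by R: keep the best candidate

-- Toy usage: maximize f(x) = -(x - 3)^2 over the integers by unit steps.
example :: Int
example = greedyAscent (\x -> negate (fromIntegral (x - 3) ^ 2))
                       (\x -> [x - 1, x + 1])
                       10
                       0
```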
If the objective function is not convex, then the theorem does not hold, but greedy pattern-search optimization may still be valuable in a heuristic sense. This is the case, for instance, in nearly all real-world applications of evolutionary programming, steepest ascent or classical derivative-free optimization methods.

Next we consider how to represent dynamic-programming-based execution of DDSs using folds and unfolds. Here our approach is to leverage Theorem 2 in [MO12], which is stated as:

Theorem 2 (Theorem 2 from [MO12]). Assume S is monotonic with respect to R, i.e. S ⋅ FR ⊆ R ⋅ S, and dom(T) ⊆ dom(S ⋅ FM). Then

M = (⦇S⦈ ⋅ ⦇T⦈◦) ↾ R  ⇒  ⟨μX ∶∶ (S ⋅ FX ⋅ T◦) ↾ R⟩ ⊆ M

Conceptually, T◦ transforms input into subproblems, e.g.:
• for backward chaining inference, it chooses (x, y, C) so that z = C(x, y) has high quality (e.g. CWIG)
• for forward chaining, it chooses (x, y, C) so that z = C(x, y) has high interestingness (e.g. CWIG)

FX figures out recursively which combinations give maximum immediate reward according to the relevant measure. These optimal solutions are combined and then the best one is picked by ↾ R, which is the evaluation on the objective function. Caching results to avoid overlap may be important here in practice (and is what will give us histomorphisms and futumorphisms instead of simple folds and unfolds).

The fixed-point-based recursion/iteration specified by the theorem can of course be approximately rather than precisely solved – and doing this approximation via statistical sampling yields stochastic dynamic programming. Roughly speaking, the approach symbolized by M = (⦇S⦈ ⋅ ⦇T⦈◦) ↾ R begins by applying all the combinatory operations to achieve a large body of combinations-of-combinations-of-combinations-…, and then shrinks this via the process of optimality evaluation. On the other hand, the least-fixed-point version on the rhs of the Theorem iterates through the combination process step by step (executing the fold).

The key point required to make the conditions of the theorem work for COFO processes is that the repeated combinatory processes are foldable. This is achieved if they are mutually associative, as in that case a series of repeated combinations can be carried out step by step in any order, including the order implicit in a given fold operation. Given that the series of combinatory operations on the lhs decomposes into a fold (which is the case if the operations are mutually associative), then we can do fold fusion, and we have made the "easy part" ⦇S⦈ ⋅ ⦇T⦈◦ of the specification of the dynamic programming process a fold.
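The computational shape described by the theorem – decompose into subproblems via T◦, solve them recursively, recombine via S, and keep the best via ↾ R, caching sub-results so each is solved only once – can be sketched as a memoized recursion. The toy problem below (rod-cutting-style value maximization) is only a stand-in for COFO-style combination; the cache is what, in the categorical picture, motivates histomorphisms and futumorphisms in place of plain folds and unfolds.

```haskell
import qualified Data.Map.Strict as M
import Control.Monad.State (State, gets, modify, evalState)

-- Dynamic-programming shape of Theorem 2 as a memoized recursion:
-- the subproblem split plays the role of T◦, recombination the role of S,
-- and taking the maximum plays the role of the shrink ↾ R.
type Cache = M.Map Int Double

bestValue :: (Int -> Double)  -- value of an uncut piece of a given length
          -> Int              -- total length
          -> Double
bestValue price n0 = evalState (go n0) M.empty
  where
    go :: Int -> State Cache Double
    go n
      | n <= 0    = return 0
      | otherwise = do
          cached <- gets (M.lookup n)
          case cached of
            Just v  -> return v            -- reuse an already-solved subproblem
            Nothing -> do
              -- split into subproblems, solve recursively, recombine
              subs <- mapM (\k -> fmap (price k +) (go (n - k))) [1 .. n]
              let v = maximum subs         -- shrink by R: keep the best combination
              modify (M.insert n v)
              return v
```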
We thus obtain, as a corollary of Theorem 2:

Theorem 3. A COFO DDS whose combinatory operations Cᵢ are mutually associative can be implemented as a chronomorphism.

In the PLN inference context, for example, the approach to PLN inference using relaxation rather than chaining outlined in [GP08] is one way of finding the fixed point of the recursion. What the theorem suggests is that folding PLN inferences across the knowledge metagraph is another way, basically boiling down to forward and backward chaining as outlined above – but, as we have observed above, this can only work reasonably cleanly for crisp inference if one uses PLN rules formulated as co-implications rather than one-way implications.

When dealing with the uncertainty-management aspects of PLN rules, one is no longer guaranteed associativity merely by adopting reversibility of individual inference steps, and one is left with a familiar sort of heuristic reasoning: one tries to arrange one's inferences as series of co-implications whose associated distributions have favorable independence relationships. For instance, if one is trying to fold forward through a series of probabilistically labeled co-implications, one will do well if each co-implication is independent of its ancestors conditional on its parents (as in a Bayes net); this allows one to place the parentheses in the same place the fold naturally does. The ability of chronomorphisms to fulfill the specifications implicit in the relevant Galois connections becomes merely an approximate heuristic guide, though we suspect still a very valuable one.
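The following toy sketch illustrates the Bayes-net point numerically: when each probabilistically labeled link in a chain is conditionally independent of earlier links given its immediate parent, forward propagation is just a left fold, so the parenthesization the fold imposes is the correct one. Binary variables and made-up numbers only; this is not PLN's actual truth-value arithmetic.

```haskell
-- Forward probability propagation through a chain of links A - B - C - ...,
-- assuming each link is conditionally independent of earlier links given its
-- immediate parent (Markov/Bayes-net chain assumption). Under that assumption
-- the propagation is a plain left fold.

-- A link is given by P(child = True | parent = True) and P(child = True | parent = False).
data Link = Link { pGivenTrue :: Double, pGivenFalse :: Double }

-- Propagate P(X = True) across one link.
propagate :: Double -> Link -> Double
propagate pParent (Link pt pf) = pParent * pt + (1 - pParent) * pf

-- Fold forward through the whole chain.
chainForward :: Double -> [Link] -> Double
chainForward = foldl propagate

-- Toy usage: P(A = True) = 0.9, followed by two noisy links.
example :: Double
example = chainForward 0.9 [Link 0.8 0.3, Link 0.7 0.4]
```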
An interesting variation on these themes is provided by the concept of "implicit programming" from Patterson's thesis on Constructible Duality logic [PPA98]. The basic idea is that sometimes, when the least fixed point of an equation (defined in CD logic) doesn't exist, there is a notion of an "optimal fixed point" that can be used instead. Roughly, if there is only one assignment of values to variables that could possibly serve as a fixed point, then this strategy assumes that it actually is the fixed point – whether or not it can consistently be described as minimal.

The implication of this concept for our present considerations is that, given a reversible paraconsistent proof, we can look at both the least and the optimal fixed point of the dynamic programming equation (the least-fixed-point expression on the rhs of Theorem 2 above) corresponding to the proof as being equivalent to the proof. The former gives the direct demonstration and the latter gives a derivation based on negated information.

The reason this is interesting in general is that sometimes the least fixed point is undefined, but the optimal fixed point might still be defined. For instance, sometimes when the consistency condition of the DP theorem fails, there could still be a solution via implicit programming. This might occur when there is some circular reasoning involved, for example – a situation worthy of further analysis, to be deferred to a later paper. There may be a connection with reasoning about uncertain self-referential statements such as we have previously modeled using infinite-order probabilities [Goe10] – statements like "This sentence is false with probability .7", reformulated in terms of CD logic, may serve as toy examples where implicit programming yields intuitively sensible answers via optimal rather than minimal fixed points.
As a brief aside to our core considerations here, it is interesting to explore potential extensions of the above ideas from the classical to the quantum computing domain. The foundations of such an extension may be found in the observation that the basic dynamic programming functional equation is mappable isomorphically into the Schrödinger equation [LL19] [CPV17] [OHS94].

Given this mathematics, the question of how to frame the optimal-control interpretation of the Schrödinger equation in a cognitive science and AGI context becomes interesting. One begins with the result that, under appropriate assumptions, minimizing action in quantum mechanics is isomorphic to minimizing expected reward in dynamic programming. Conceptually one then concludes that:

• At the classical-stochastic-dynamics level, one views the multiverse as being operated by some entity that is choosing histories (multiverse-evolution-paths) with probability proportional to their expected action (where expected action serves the role of expected reward).

• Then at the quantum level, one views the multiverse as being operated by some entity that is assigning histories amplitudes proportional to their expected action (where expected action serves the role of expected reward). The amplitudes are summed, and the stationary-action histories include those that in the "classical limit" give you the maximal-expected-action paths.

From a quantum computing view, if one has a classical algorithm that operates according to dynamic programming with a particular utility function U, one would map it into a quantum algorithm designed so that expected action maximization corresponds to expected U maximization.

For instance, suppose we start on the classical side with the time-symmetric Bellman equation briefly discussed above. Then on the quantum level we seem to obtain something similar to the representation used in the Consistent Histories interpretation [Gri84]. That is, the forward and backward operators composed in the consistent-histories class operator appear to correspond to a solution of the fixed-point equation corresponding to time-symmetric dynamic programming. Many other connections arise here; e.g., in a quantum computing context:

• where graphtropy is invoked above in defining reward functions for inference processes, one can potentially insert quangraphtropy [Goe19] instead

• where standard PLN inference formulas manipulate real probabilities, one can substitute similarly defined formulas manipulating complex-valued amplitudes

One sees here the seeds of a potential approach to deriving quantum cognition algorithms via Galois connections, but further validation of this potential will be left for later research.

In [Goe20c] it is asked what sort of programming language best suits implementation of algorithms and structures intended for use in an integrative AGI system. Among the conclusions are that one probably wants a purely functional language with higher-order dependent types and gradual typing; and that the bulk of processing time is likely to be taken up with pattern-matching queries against a large dynamic knowledge-base, which means e.g. that computational expense regarding type-system mechanics is not an extremely sensitive point.

The two prequels to this paper [Goe21c] [Goe20a] add some particulars to these conclusions, e.g. that an AGI language should include:
• Efficient implementation of folding and unfolding across metagraphs, including
  – Memory-carrying folds and unfolds, i.e. histomorphisms, futumorphisms, chronomorphisms
  – Continuation-passing-style implementations of all these folds and unfolds, to elegantly enable pause and restart within a fold (a small sketch of this idea is given at the end of this section)
• First- and second-order probabilistic types
• Paraconsistent types, implemented via an individual node or edge having separate type expressions corresponding to positive vs. negative evidence

The explorations pursued here add a few more points to the list, e.g.:

• Simple tools for expressing DDSs, including those to be pursued via greedy, dynamic programming, or stochastic/approximate dynamic programming approaches
• Simple, efficient tools for sampling from distributions defined across typed metagraphs (as the COFO DDSs outlined here mainly rely on sampling of this nature in their decision step)
• Simple tools for estimating measurements of probability distributions defined over typed metagraphs, e.g. entropy, logical entropy, graphtropy
• Syntactic and semantic tools for handling reversible operations, e.g. reversible logical inference rules and reversible program execution steps
  – This entails some work specific to particular cognitive algorithms, e.g. figuring out the most pragmatically convenient way to represent common PLN inference rules as reversible rules
• Tools making it easy to generate and maintain subpattern hierarchies, and to coordinate subpattern hierarchies and the hierarchies implicit in the optimization of DDSs using dynamic programming type approaches
• Tools making it easy to express algebraic properties of sets of combinatory operations, e.g. mutual associativity
• Ability to concisely express derivations of specific algorithms from case-specific instantiations of theorems such as the Greedy Theorem and Dynamic Programming Theorem from [MO12] presented and leveraged above
  – This provides a relatively straightforward path to enabling the system to derive new algorithms optimized for its own utilization in particular cases.

Putting these various pieces together, in the context of the OpenCog Hyperon (next-generation OpenCog) project, we are getting quite close to an abstract sort of requirements list for the deep guts of an Atomese 2 interpreter.
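Regarding the continuation-passing-style folds item above ([Raa16]), the following minimal list-based sketch shows one way a fold can be paused after each element and later resumed, possibly with newly arrived items injected – a stand-in for pausable, resumable folds over a changing metagraph, not a proposal for the actual Atomese 2 mechanism.

```haskell
-- Minimal sketch of a pausable/resumable fold over a list. The metagraph case
-- would replace the list with sub-metagraph decomposition.

-- A fold in progress either has finished with a result, or is paused with the
-- current accumulator and a continuation that resumes the remaining work
-- (optionally injecting further elements discovered while paused).
data Step a r = Done r | Paused r ([a] -> Step a r)

-- A fold that pauses after consuming each element.
pausableFold :: (r -> a -> r) -> r -> [a] -> Step a r
pausableFold _ acc []       = Done acc
pausableFold f acc (x : xs) = Paused acc' (\more -> pausableFold f acc' (xs ++ more))
  where acc' = f acc x

-- Drive a paused fold to completion without injecting new work.
runAll :: Step a r -> r
runAll (Done r)     = r
runAll (Paused _ k) = runAll (k [])
```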
Conclusion and Future Directions
We have presented here a common formal framework for expressing a variety of AGI-oriented algorithms coming from diverse theoretical paradigms and practical traditions. This framework is neither so specialized as to allow only highly restrictive versions of the algorithms considered, nor so general as to be vacuous (e.g. the AGI-oriented algorithms we find of interest can be represented especially concisely and simply in this framework, much more so than would be the case for arbitrary computational algorithms).

Our explorations here have led to some interesting and nontrivial formal conclusions, e.g. regarding the power of the requirement of mutual associativity among combinatory operations in a combinatory-operation-based function-optimization setting. Mutual associativity, it turns out, both leads to a correspondence between subgoal hierarchies in goal-oriented processing and subpattern hierarchies in a knowledge store, and enables execution of combinatory-operation-based function optimization via futumorphisms and histomorphisms.

On the other hand, it is also clear there is a heuristic aspect to the application of these theoretical results to practical AGI systems – because in the case of real AGI systems, the knowledge base (e.g. the OpenCog Atomspace) is constantly changing as a result of cognitive actions, so that one usually can't precisely calculate a fold of a cognitive operator across a knowledge base without the knowledge base changing midway, before the fold is done. This is not necessarily problematic in practice, but it does mean that most of the theoretical conclusions drawn here are only precisely applicable in the limiting case where real-time knowledge base changes in the course of a fold/unfold operation (or other sort of cognitive operation) are minimal.

Formal treatment of the general case of greedy or approximate stochastic DP style execution of a DDS, in the case where many DDS action steps entail significant knowledge base revisions, remains a step for the future. Even without that, however, it appears that the formulation given here does provide some clarity regarding the parallels and interrelationships between different cognitive algorithms, and the sort of infrastructure needed for effectively implementing these algorithms.
References

[BDM96] Richard Bird and Oege De Moor. The algebra of programming. In NATO ASI DPD, pages 167–203, 1996.

[Ben09] Ben Goertzel. Reinforcement learning: Some limitations of the paradigm, 2009.

[CPV17] Mauricio Contreras, Rely Pellicer, and Marcelo Villena. Dynamic optimization and its relation to classical and quantum constrained systems. Physica A: Statistical Mechanics and its Applications, 479:12–25, 2017.

[GIGH08] B. Goertzel, M. Ikle, I. Goertzel, and A. Heljakka. Probabilistic Logic Networks. Springer, 2008.

[Goe94] Ben Goertzel. Chaotic Logic. Plenum, 1994.

[Goe06] Ben Goertzel. A System-theoretic Analysis of Focused Cognition, and its Implications for the Emergence of Self and Attention. 2006.

[Goe10] Ben Goertzel. Infinite-order probabilities and their application to modeling self-referential semantics. In Proceedings of Conference on Advanced Intelligence 2010, Beijing, 2010.

[Goe16] Ben Goertzel. Probabilistic growth and mining of combinations: a unifying meta-algorithm for practical general intelligence. In International Conference on Artificial General Intelligence, pages 344–353. Springer, 2016.

[Goe19] Ben Goertzel. Distinction graphs and graphtropy: A formalized phenomenological layer underlying classical and quantum entropy, observational semantics and cognitive computation. CoRR, abs/1902.00741, 2019.

[Goe20a] Ben Goertzel. Folding and unfolding on metagraphs. 2020. https://arxiv.org/abs/2012.01759.

[Goe20b] Ben Goertzel. Grounding Occam's razor in a formal theory of simplicity. arXiv preprint arXiv:2004.05269, 2020.

[Goe20c] Ben Goertzel. What kind of programming language best suits integrative AGI? In International Conference on Artificial General Intelligence, pages 142–152. Springer, 2020.

[Goe21a] Ben Goertzel. Info-evo: Using information geometry to guide evolutionary program learning. arxiv.org/TBD, 2021.

[Goe21b] Ben Goertzel. Interesting pattern mining. 2021. https://wiki.opencog.org/w/Interesting_Pattern_Mining.

[Goe21c] Ben Goertzel. Paraconsistent foundations for probabilistic reasoning, programming and concept formation, 2021.

[GP08] Ben Goertzel and Cassio Pennachin. How might probabilistic reasoning emerge from the brain? In Proceedings of the First AGI Conference, volume 171, page 149. IOS Press, 2008.

[GPG13a] Ben Goertzel, Cassio Pennachin, and Nil Geisweiller. Engineering General Intelligence, Part 1: A Path to Advanced AGI via Embodied Learning and Cognitive Synergy. Springer: Atlantis Thinking Machines, 2013.

[GPG13b] Ben Goertzel, Cassio Pennachin, and Nil Geisweiller. Engineering General Intelligence, Part 2: The CogPrime Architecture for Integrative, Embodied AGI. Springer: Atlantis Thinking Machines, 2013.

[Gri84] Robert B Griffiths. Consistent histories and the interpretation of quantum mechanics. Journal of Statistical Physics, 36(1):219–272, 1984.

[LL19] Jussi Lindgren and Jukka Liukkonen. Quantum mechanics can be understood through stochastic optimization on spacetimes. Scientific Reports, 9(1):1–8, 2019.

[Loo06] Moshe Looks. Competent Program Evolution. PhD Thesis, Computer Science Department, Washington University, 2006.

[MO12] Shin-Cheng Mu and José Nuno Oliveira. Programming from Galois connections. The Journal of Logic and Algebraic Programming, 81(6):680–704, 2012.

[OHS94] Akira Ohsumi. Nonlinear filtering, Bellman equations, and Schrödinger equations. 885:1–7, 1994.

[Pea09] Judea Pearl. Causality. Cambridge University Press, 2009.

[Per85] Asher Peres. Reversible logic and quantum computers. Physical Review A, 32(6):3266, 1985.

[PGL02] Martin Pelikan, David Goldberg, and Fernando Lobo. A survey of optimization by building and using probabilistic models. Computational Optimization and Applications, 21:5–20, 2002.

[PN05] W David Pan and Mahesh Nalasani. Reversible logic. IEEE Potentials, 24(1):38–41, 2005.

[Pow09] Warren B Powell. What you should know about approximate dynamic programming. Naval Research Logistics (NRL), 56(3):239–249, 2009.

[PPA98] Anna L Patterson, Vaughan Pratt, and Gul Agha. Implicit programming and the logic of constructible duality. 1998.

[Raa16] David Raab. CPS fold. 2016. https://sidburn.github.io/blog/2016/05/07/cps-fold.

[SS14] Zachary Sparks and Amr Sabry. Superstructural reversible logic. In . Citeseer, 2014.

[Tor95] Virginia Torczon. Pattern search methods for nonlinear optimization. In