Entropy, Computing and Rationality
Luis A. Pineda, Universidad Nacional Autónoma de México
Abstract
Making decisions freely presupposes that there is some indeterminacy in the environment and in the decision making engine. The former is reflected in the behavioral changes due to communicating: few changes indicate rigid environments; productive changes manifest a moderate indeterminacy; but a large communicating effort with few productive changes characterizes a chaotic environment. Hence, communicating, effective decision making and productive behavioral changes are related. The entropy measures the indeterminacy of the environment, and there is an entropy range in which communicating supports effective decision making. This conjecture is referred to here as the
Potential Productivity of Decisions. The computing engine that is causal to decision making should also have some indeterminacy. However, computations performed by standard Turing Machines are predetermined. To overcome this limitation, an entropic mode of computing that is called here
Relational-Indeterminate is presented. Its implementation in a table format has been used to model an associative memory. The present theory and experiment suggest the
Entropy Trade-off: there is an entropy range in which computing is effective, but if the entropy is too low computations are too rigid, and if it is too high computations are unfeasible. The entropy trade-off of computing engines corresponds to the potential productivity of decisions of the environment. The theory is related to an Interaction-Oriented Cognitive Architecture. Memory, perception, action and thought involve a level of indeterminacy, and decision making may be free in such degree. The overall theory supports an ecological view of rationality. The entropy of the brain has been measured in neuroscience studies, and the present theory supports that the brain is an entropic machine. The paper is concluded with a number of predictions that may be tested empirically.

Email: [email protected]. Preprint submitted to Elsevier, September 23, 2020.

Keywords:
Communication and computing entropy, Potential Productivity of Decisions, Entropy Trade-Off, Relational-Indeterminate Computing, Table Computing, Minimal Algorithms, Decision Making Trade-Off, Interpretation, Action, Associative Memory, Principle of Rationality, Cognitive Architecture
1. Potential Productivity of Decisions
People and machines make decisions all the time, but the question is whether these are made freely or are rather predetermined. This is the old opposition between determinism and free will. The latter presupposes that there must be some indeterminacy either in the environment in which the decision is made or in the mind of the decision maker, or in both.

The indeterminacy corresponds to the information content of the environment or the control volume. Information is measured in communication theory with Shannon's entropy [1]: s = −∑_{i=1}^{n} p(x_i) × log(p(x_i)). This is a formula of expected value. The term −log(p(x_i)) is the length in bits of a message with probability p(x_i); therefore, the entropy is the average length of a message in the communication environment. The probability of a message corresponds to the probability of the reported event. Events that are certain to occur have probability one, and need not be communicated, or are communicated with "messages of length zero". The length of the message increases with its information content, and the larger the entropy, the greater the amount of information in the control volume. The information entropy reflects the uncertainty of the events that need to be communicated and, consequently, the indeterminacy of the environment.

Communication is achieved through signals carrying messages, but while the former are physical phenomena, the interpretation of the latter belongs to the plane of content. The entropy times the total number of messages in the control volume is proportional to the energy invested in communicating in such an environment over a period of time; hence, the entropy reflects the overall effort that the community invests in communicating. These considerations are explicit or follow directly from Shannon's original presentation.
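As a concrete illustration, the entropy of a control volume can be computed directly from the formula; the message distributions below are toy values assumed for the example, not data from the text:

```python
import math

def shannon_entropy(probs):
    """Average message length in bits: s = -sum_i p(x_i) * log2(p(x_i)).
    Events with probability one contribute a "message of length zero"."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([1.0]))        # 0.0: a certain event needs no message
print(shannon_entropy([0.5, 0.5]))   # 1.0: one bit per message
print(shannon_entropy([0.25] * 4))   # 2.0: four equally likely events
```

Note that the entropy grows as the events become less predictable, matching the reading of entropy as the indeterminacy of the environment.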
Here, this discussion is extended to the relation between communication, decision making and behavioral change.

Communication allows individuals of a society –the family, the school, the office, the working institution, the church, the linguistic community, etc.– to profit from the knowledge and experience of others. Communicative actions underlie the intention to change the knowledge, beliefs, desires, feelings, emotions, intentions and, most fundamentally, the course of action that the interlocutors would undergo without the information provided. Communicating presupposes that it is possible to change behavior. If the interlocutors are to deviate from their normal course of action on the basis of novel information, they must have the choice; hence communication is a precondition for decision making in social environments.

Effective decision making reflects the indeterminacy of the physical but also of the social environment. To enact a decision it must be physically and behaviorally possible: if the world or the society is too rigid, behavior cannot be changed, decisions cannot be enacted, communicating is not reinforced, and the entropy is low. Conversely, productive behavioral changes due to communication reflect effective decision making and that the environment is less determined, with the corresponding larger entropy. However, a large communication effort and high entropy with few productive behavioral changes reflect that decision making is not effective and that the environment is too unpredictable or chaotic. These considerations suggest that the entropy is not only a measure of the indeterminacy of the physical environment but also of the plane of interpretation or content, and that there is a range of entropy in which decision making is effective.
This conjecture is called here The Potential Productivity of Decisions.

The relation between the entropy of communication environments and the potential productivity of decisions, whose value is designated as τ, is illustrated through the following scenarios, each defining a particular control volume:

• Factory production line: workers in a factory line do not communicate, and if they do, communication does not affect the dynamics of the environment. The entropy and τ are very low or zero. There is no decision making.

• Normal daily life: people communicate commonly to change the beliefs and behavior of others; speech acts in normal conversation aim to achieve such an effect. The entropy is low or moderate and τ has an acceptable value. There is space for decision making.

• Creative environments: people communicate effectively to make decisions that, if enacted, have important consequences. The entropy and τ are optimal. There is effective decision making with potential significant impact.

• Crisis situation: an earthquake or a pandemic. People communicate a great deal and the entropy is very high, but the environment is chaotic and τ has a very low value. There may be a large decision making activity, but decisions are not enacted, or do not achieve the intended effect, or do it poorly.

The simplest characterization of the τ profile of environments is a Gaussian function ψ from the entropy s to the potential productivity of decisions, such that τ = ψ(s) = a·e^{−(s−s_o)²/(2c²)}, where s ≥ 0, a is the value of τ at the optimal entropy s_o, and c is the standard deviation. Each of the four scenarios above corresponds to a particular control volume with entropy s_i and productivity of decisions τ_i. These are illustrated by shifting s_i from left to right in Figure 1.

The potential productivity of decisions can be seen as a cost-benefit parameter of communicating in a control volume.
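The Gaussian profile can be sketched directly; the parameter values (a, s_o, c) and the entropies assigned to the four scenarios below are illustrative assumptions, not values from the paper:

```python
import math

def psi(s, a=1.0, s_o=2.0, c=0.75):
    """tau = psi(s) = a * exp(-(s - s_o)**2 / (2 * c**2)): a is the potential
    productivity of decisions at the optimal entropy s_o, and c is the
    standard deviation of the profile."""
    return a * math.exp(-((s - s_o) ** 2) / (2 * c ** 2))

# Shifting the entropy s from left to right over the four scenarios:
for label, s in [("factory line", 0.1), ("daily life", 1.2),
                 ("creative environment", 2.0), ("crisis", 4.0)]:
    print(f"{label}: tau = {psi(s):.3f}")
```

Under these assumed parameters, τ is near zero at both extremes of the entropy and peaks at the optimal entropy s_o, as in Figure 1.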
[Figure 1: Potential Productivity of Decisions]

This is an ecological variable indicating the degree to which communicating provides an adaptation advantage and this behavior is reinforced. In the limiting case, when τ = 0, communication is of little use and agents may be better off by themselves, and when τ is too large communication is reduced to social noise. This conjecture may be tested empirically.

Communication and decision making can be performed by a wide variety of animal species that produce sounds or motor actions with communicative intent. Such "speech acts" can be codified through messages, although most species have a pretty determined "view of the world", have limited abilities to cope with environmental changes, and their entropy and τ may be both very low. From an ecological and evolutionary perspective, the potential productivity of decisions measures the degree to which the plane of content matters for the species. Those that do not communicate, or communicate little, are limited to behaving reactively to the signals sensed in the environment, and there is no reason to suppose that they sustain a plane of content or have a mind; a large communication effort that is not accompanied by productive behavioral changes is not reinforced.
2. Relational-Indeterminate Computing
The computing engine that is causal and essential to decision making must also have a choice. If such an engine is deterministic, decisions are predetermined and decision making is an illusion. According to standard thinking in Computer Science, the Turing Machine (TM) is the most general computing machine –all other general enough computing machines are equivalent to it– and this machine is deterministic. In
Computing Machinery and Intelligence [2] Turing stated that the predictions made by digital computers are "rather nearer to practicality than that considered by Laplace" (Turing, 1950, s.5). Laplace's demon is indeed a TM that computes all future and previous states of the universe given the full set of the physical laws and the initial conditions of a particular state, and it does so instantly. Hence, decisions "made" by digital computers are predetermined.

Computing machines hold physical states and interchange signals, but do not make interpretations and decisions by themselves. Communication, information, knowledge and decision making are objects at the plane of interpretation that is held in the mind of humans and of other animals with a developed enough neural system. For this, computing machinery is designed and used in relation to a standard configuration, which defines the format of representations, and a set of interpretation conventions.

The most basic interpretation convention in the theory of computability is that every TM computes a particular function, and the enumeration of all TMs corresponds to the set of computable functions (e.g., [3]). A function is a relation between two arbitrary sets of objects, the domain and the codomain, whose members are called here the arguments and the values respectively, such that an argument is related to one value at the most, and this relation is given when the function is defined.

Mathematical objects are immutable, and the relation between the arguments and values of all functions is fixed. Algorithms are mechanical procedures that produce the value for a given argument, but the output of the computation is predetermined. Algorithms can be seen as intensional definitions of functions, but the same knowledge can be expressed extensionally, as in tables, and what an algorithm does is simply to render explicit the implicit knowledge.
For all of this, under the standard interpretation conventions, TM computations are predetermined by necessity. Hence, the entropy of a TM is zero, and the notion of entropy is alien to the standard theory of TMs and computability.

The absolute determinism of TMs can be questioned from the perspective of the so-called non-deterministic automata [4]. There are non-deterministic finite state automata that have an equivalent deterministic one, such as those accepting regular languages; in this case, the non-determinism can be seen as a means to express abstractions, but the actual computations are deterministic. However, there are automata that are genuinely non-deterministic, such as those accepting ambiguous languages, or machines that explore a problem space heuristically, such as computer chess programs. In these latter cases the input or argument is associated to more than one value, and the object that is computed is not a function but a mathematical relation.

The determinism of TMs is also questioned from the perspective of stochastic computations. Turing suggested this strategy in the 1950 paper: when the problem space is too large, it can be partitioned in regions, and the value of the function for the given argument can be found by jumping into promising regions, using random numbers. His example was the problem of finding whether a number between 50 and 200 is equal to the sum of the squares of its constituent digits (Turing, 1950, s.7). This problem has no solution, but it illustrates the contrast between the deterministic strategy –iterating from the first to the last number in the domain– and the stochastic one –choosing a number in the domain randomly and making the test until the solution is found. This latter process may never end, unless all trials are recorded and the computation finishes when all the instances of the problem have been tested, but with a very high cost in terms of memory and processing time.
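Turing's example can be sketched to contrast the two strategies; the code below is an illustrative reconstruction, not Turing's own procedure:

```python
import random

def is_solution(n):
    """Is n equal to the sum of the squares of its constituent digits?"""
    return n == sum(int(d) ** 2 for d in str(n))

def systematic(lo=50, hi=200):
    """The deterministic strategy: iterate from the first to the last number."""
    return [n for n in range(lo, hi + 1) if is_solution(n)]

def random_search(lo=50, hi=200):
    """The stochastic strategy: test numbers chosen at random, recording all
    trials so the search terminates once every instance has been tested
    (at a high cost in memory and processing time)."""
    tried = set()
    while len(tried) < hi - lo + 1:
        n = random.randint(lo, hi)
        tried.add(n)
        if is_solution(n):
            return n
    return None   # no solution exists in this range

print(systematic())      # []: the problem has no solution, as noted above
print(random_search())   # None: the random method also exhausts the domain
```

With many satisfactory solutions the random method can stop at the first hit; with none, it only terminates because the trials are recorded.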
Turing called these methods the systematic and the learning respectively, and stated that the latter may be regarded as a search for a form of behavior and that "since there is probably a very large number of satisfactory solutions the random method seems to be better than the systematic" (Turing, 1950, s.7). As there are many solutions, the object of computing is again a relation rather than a function.

The stochastic method requires that there is some indeterminacy in the computing process. This was also made explicit by Turing, who proposed a variant of a digital computer with a random element, and noted that random numbers can be simulated in deterministic computers; for instance, by choosing the next digit in the expansion of the number π (Turing, 1950, s.4). These digits are not known in advance but the sequence is predetermined, and such numbers are pseudo-random rather than random; hence, computations involving them are in the end determined. Genuine random numbers can be and are produced by sensing a property of the external environment –that cannot be predicted– whose value can be seen as a hidden argument of the stochastic algorithm. Although such an argument is not known, the object being computed is nevertheless a function, and the process is still deterministic.

This limitation can be overcome by generalizing the interpretation conventions and postulating that the object of computing is the mathematical relation rather than the function. A relation assigns an arbitrary number of objects of the codomain to each object in the domain. Hence, the value of the relation for a given argument is indeterminate. To address this indeterminacy, the basic notion of evaluating a relation is construed here as choosing randomly one among all the values of the relation for the given argument. This is: R(a_i) = v_j such that v_j is selected randomly –with an appropriate distribution– among the values associated to the argument a_i in the relation R.
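A minimal sketch of this generalized evaluation convention, with a toy relation and a uniform distribution (both assumptions made for the example):

```python
import random

# A finite relation represented as a mapping from each argument to the
# set of values it is related to (toy data).
R = {"a1": {"v1", "v2"}, "a2": {"v2"}, "a3": {"v1", "v2", "v3"}}

def evaluate(R, a):
    """R(a_i) = v_j: select one value at random -- uniformly here -- among
    the values associated to the argument a_i in the relation R."""
    return random.choice(sorted(R[a]))

# Repeated evaluations of the same argument may yield different values:
print({evaluate(R, "a3") for _ in range(100)})
```

When an argument has a single value, as a2 above, the evaluation is determined; otherwise the outcome is genuinely indeterminate at the level of the convention.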
This latter interpretation convention is more general, computations become indeterminate, the computing engine is stochastic, and the machine has an intrinsic computing entropy. This mode is called here Relational-Indeterminate Computing (RIC).

The entropy of a relational-indeterminate computation is defined here as the normalized average indeterminacy of all the arguments of the relation R. Let µ_i be the number of values associated to the argument a_i in R, ν_i = 1/µ_i, and n the cardinality of the domain. In case R is partial and µ_i = 0 for the argument a_i, then ν_i = 1. The computational entropy e –or the entropy of R– is defined here as e(R) = −(1/n) ∑_{i=1}^{n} log(ν_i).

The communication and computational entropies have a common normalized scale. A function has one value for each of its arguments and its entropy is zero. Partial functions do not define a value for all the arguments, but this is fully determined and the entropy of partial functions is also zero.

The relational-indeterminate mode of computing focuses on relational information and has three basic operations: abstraction, containment and reduction. Let the sets A = {a_1, ..., a_n} and V = {v_1, ..., v_m}, of cardinalities n and m, be the domain and the codomain of a finite relation R : A → V. For purposes of notation, for any relation R a function with the same name is defined as follows: R : A × V → {0, 1}, such that R(a_i, v_j) = 1 or true if the argument a_i is related to the value v_j in R, and R(a_i, v_j) = 0 or false otherwise.

Let R_f and R_a be two arbitrary relations from A to V, and f_a a function with the same domain and codomain. The operations are defined as follows:

• Abstraction: λ(R_f, R_a) = R, such that R(a_i, v_j) = R_f(a_i, v_j) ∨ R_a(a_i, v_j) for all a_i ∈ A and v_j ∈ V –i.e., λ(R_f, R_a) = R_f ∪ R_a.
• Containment: η(R_a, R_f) is true if R_a(a_i, v_j) → R_f(a_i, v_j) for all a_i ∈ A and v_j ∈ V (i.e., material implication), and false otherwise.

• Reduction: β(f_a, R_f) = f_v such that, if η(f_a, R_f) holds, f_v(a_i) = R_f(a_i) for all a_i, where the random distribution is centered around f_a(a_i). If η(f_a, R_f) does not hold, β(f_a, R_f) is undefined –i.e., f_v(a_i) is undefined– for all a_i.

Abstraction is a construction operation that produces the union of two relations. A function is a relation and can be an input to the abstraction operation. Any relation can be constructed out of the incremental abstraction of an appropriate set of functions. The construction can be pictured graphically by overlapping the graphical representations of the included functions on an empty table, such that the columns correspond to the arguments, the rows to the values, and the functional relation is depicted by a mark in the intersecting cells.

The containment operation verifies whether all the values associated to an argument a_i of R_a are associated to the same argument of R_f, for all the arguments, such that R_a ⊆ R_f. More generally, the containment relation is false only in case R_a(a_i, v_j) = 1 and R_f(a_i, v_j) = 0 –or if R_a(a_i, v_j) > R_f(a_i, v_j)– for at least one (a_i, v_j).

The set of functions that are contained in a relation, which is referred to here as the constituent functions, may be larger than the set used in its construction. The constituent functions are the combinations that can be formed by taking one value among the ones that the relation assigns to an argument, for all the arguments. The table format allows performing the abstraction operation by direct manipulation and the containment test by inspection. The construction consists in forming a function by taking a value corresponding to a marked cell of each column, for all values and for all columns.
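The three operations and the computational entropy can be sketched over the table format, representing a relation as one set of marked values per column; this is an illustrative toy implementation, not the one reported in the paper:

```python
import random
from math import log2

def abstraction(Rf, Ra):
    """λ: cell-wise disjunction, i.e., the union of the two relations."""
    return [f | a for f, a in zip(Rf, Ra)]

def containment(Ra, Rf):
    """η: true iff R_a(a_i, v_j) -> R_f(a_i, v_j) for every cell (material
    implication): every marked cell of R_a is also marked in R_f."""
    return all(a <= f for a, f in zip(Ra, Rf))

def reduction(fa, Rf):
    """β: if the cue fa is contained in Rf, form a function by choosing one
    marked value per column at random (uniformly here; the paper centers
    the distribution around the cue's value f_a(a_i))."""
    if not containment(fa, Rf):
        return None
    return [{random.choice(sorted(col))} for col in Rf]

def entropy(R):
    """e(R) = -(1/n) * sum_i log2(nu_i), with nu_i = 1/mu_i and mu_i the
    number of values marked for argument a_i (nu_i = 1 when mu_i = 0)."""
    nus = [1.0 if not col else 1.0 / len(col) for col in R]
    return sum(-log2(nu) for nu in nus) / len(R)

# Two functions (one marked value per column) overlapped into a relation:
f1 = [{"x"}, {"y"}, {"z"}]
f2 = [{"x"}, {"z"}, {"y"}]
R = abstraction(f1, f2)
print(containment(f1, R))   # True: f1 is a constituent function of R
print(entropy(f1))          # 0.0: a function is fully determined
print(entropy(R))           # ~0.667: two of the three columns hold two values
```

Note that R also contains functions that were not used in its construction, such as [{"x"}, {"y"}, {"y"}], which anticipates the productivity of abstraction and containment discussed below.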
The containment test is carried out by verifying whether the table representing the function is contained within the table representing the relation, by testing all the corresponding cells through material implication. For this, the abstraction operation and the containment test are productive. This is analogous to the generalization power of standard supervised machine learning algorithms, which recognize not only the objects included in the training set but also other objects that are similar enough to the objects in such a set.

Reduction is the functional application operation. If the argument function f_a is contained in the relation R_f, reduction generates a new function such that the value assigned to each of its arguments is selected randomly from the values assigned to the same argument in R_f. If f_a is not contained in R_f, the value of such a functional application operation is not defined. The reduction operation selects a function included in the relation on the basis of a "cue" function. The lower the entropy of the relation, the larger the similarity between the object retrieved from the relation and the cue. In the limiting case, when the entropy of the relation is zero but reduction is applicable, the reduction selects the cue itself.

The distinction between non-entropic and entropic computing corresponds to the contrast between "local" versus "distributed" representations. According to Hinton [5], TMs hold local representations in which the units of form are related one-to-one to their corresponding objects at the level of meaning or content, while in distributed representations this relation is many-to-many. Representations in Turing machines are strings of symbols on the tape, where each word denotes a particular basic unit of content and non-basic meanings are produced by the composition of the meanings of words into meanings of phrases and sentences, but basic words are never overlapped on the tape or the memory, and representations are local.
In this setting, basic representational objects are mutually independent and the entropy is zero. This contrasts with the representation of a set of functions overlapped on a table, where a marked cell can contribute to the representation of more than one function and a function can share marked cells with other functions. Hence, the representation is distributed and the entropy measures its indeterminacy.

Standard TMs can compute relations by computing their constituent functions. However, the information provided by the intersections is lost if the functions are considered as independent objects. The computing entropy measures the information provided by such intersections. If the functions constituting a relation are mostly independent, there are few intersections and the entropy is high; however, if there is a large number of intersections, the entropy is decreased and computations become more determined accordingly.

The inclusion of the entropy in a theory of computing gives rise to a new computational trade-off that is called here The Entropy Trade-Off: if the entropy of the machine is very low the computation is pretty determined, the TM being the extreme, fully determinate case whose entropy is zero; in the other extreme, if the entropy is too high, the information is confused and computations are not feasible. However, if the entropy is moderate there is a certain amount of indeterminacy in the computing engine that allows multiple behaviors, and computations are still feasible. The entropy trade-off of computing engines corresponds to the potential productivity of decisions in the environment.

The interaction of the agent with the environment introduces some indeterminacy to the computing process. An external input, such as the face resulting from flipping a coin, is input into the agent's control volume, as illustrated in Figure 2, and from its point of view the value of such a variable is genuinely random.

[Figure 2: Embedded Control Volumes]
Control volumes can be embedded in more extensive environments, from the most specific ones to the universe as a whole. Whether spontaneous events can occur within a control volume, or the knowledge of such events is a case of incomplete information, is the deep question whose answer opposes the determinist and the indeterminate views of the world.
3. Table Computing and Associative Memory
The implementation of RIC in a table format is referred to as
Table Computing [6]. This format was used to define and implement an associative memory system, where standard tables are used as
Associative Memory Registers (AMRs) [7]. The system uses three main algorithms that perform operations between the corresponding cells of memory registers directly, which are called here
Memory Register, Memory Recognize and
Memory Retrieve. These algorithms implement the λ, η and β operations respectively, and are so simple that they are called here Minimal Algorithms. The objects stored in the AMRs are distributed representations of individuals or classes of individuals. These representations have an entropy level depending on the information stored in the AMRs at a given time, and the memory system conforms to the entropy trade-off. The memory operations are computed in parallel, both between the corresponding cells of AMRs and between the full set of AMRs included in the memory system. This parallelism is a property of the memory at the computational system level in Marr's sense [8] and not only a contingency of the implementation. The power of the distributed representation comes from the coordinated simultaneous computing at all local memory cells and the AMRs.
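A toy sketch of an associative memory register built on these operations; the method names mirror the minimal algorithms above, but the code is an illustrative reconstruction, not the implementation of [7]:

```python
import random

class AMR:
    """Sketch of an Associative Memory Register: a table with one column per
    feature; a stored relation is a set of marked (feature, value) cells."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.cells = set()          # marked (feature, value) pairs

    def memory_register(self, func):
        """λ: overlap a function (one value per feature) onto the register."""
        for i, v in enumerate(func):
            self.cells.add((i, v))

    def memory_recognize(self, func):
        """η: the cue is accepted iff every one of its cells is marked."""
        return all((i, v) in self.cells for i, v in enumerate(func))

    def memory_retrieve(self, func):
        """β: if the cue is recognized, return a constituent function chosen
        at random (uniformly here; the paper centers the distribution on
        the cue's values)."""
        if not self.memory_recognize(func):
            return None
        return [random.choice(sorted(v for j, v in self.cells if j == i))
                for i in range(self.n_features)]

amr = AMR(n_features=4)
amr.memory_register([1, 2, 3, 4])
amr.memory_register([1, 6, 5, 4])
print(amr.memory_recognize([1, 2, 5, 4]))   # True: a constituent function,
                                            # although it was never registered
print(amr.memory_recognize([1, 2, 3, 7]))   # False: cell (3, 7) is not marked
```

The first recognition illustrates the productivity of the distributed representation: the cue combines cells contributed by two different registered functions.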
The associative memory system was used for storing distributed representations of handwritten digits [7]. In the experiment, an associative register for holding the representation of each one of the ten digits was defined. All instance digits in the training corpus were placed on an input visual buffer, and mapped into a set of features with their corresponding values –which is called here the space of characteristics– through a standard deep convolutional neural network [9]. Intuitively, this corresponds to seeing each digit, mapping it into its corresponding abstract, modality-independent representation through a bottom-up perceptual operation, and registering it into its corresponding associative register through the
Memory Register algorithm.

The memory recognition operation was implemented by mapping the digit to be recognized into its corresponding representation through the same deep neural network, and applying the
Memory Recognize algorithm. The results show that if the entropy is zero or very low, the overall recognition precision and recall are very low; that both precision and recall increase with the entropy, but recall decreases when the entropy is very high; that there is an interval of moderate entropy values in which precision and recall are both very satisfactory; and that the entropy trade-off holds. Such an entropy interval determines the operational characteristics of the associative memory system.

The memory retrieve experiment consists in presenting arbitrary handwritten digits to the memory to be used as cues, and mapping the recovered function in the space of characteristics into a concrete representation in an output visual buffer through transposed convolution, as in standard architectures that generate pictures out of abstract characteristics within the deep neural networks framework [10]. The results show that the objects retrieved from memory are similar to the cue only when the entropy is very low, and that the similarity decreases quite sharply with small entropy increments. Memory retrieve is a constructive operation such that the retrieved objects go from "photographic" images, to similar images, to imaged objects, depending on whether the entropy is zero, moderate or too high, respectively.

The overall functionality of the memory system conforms to the intuition that memory recognition is a flexible and robust capability, but recovering or retrieving objects from memory is a much more selective and restrictive operation.

In the present system, information is stored and recognized locally through the basic logical disjunction and material implication operations between cells of tables or associative registers, in addition to the standard assignment operator and a random generator. The memory system is associative, distributed and declarative, and has a dual symbolic and sub-symbolic interpretation.
The experiment was implemented as a simulation on a standard processor with a GPU graphics board, but the hardware construction of the device should not be problematic with current integrated circuit technology.
4. Entropy and Cognitive Architecture
The computing engine that is causal and essential to cognition and rationality needs to be related to the cognitive architecture of the agent. An architecture including perception, motricity or action, thought, schematic thinking and reactive behavior, which here is called
Interaction-Oriented Cognitive Architecture or IOCA, is illustrated in Figure 3. The architecture is stated in terms of functional modules, in the sense of other cognitive architectures such as SOAR [12] and ACT-R [13]. The modules are stated at the computational theory level in Marr's sense [8], and hence independently of particular representational formats or processing strategies. The purpose of this presentation is to illustrate the roles of the entropy, the potential productivity of decisions and relational-indeterminate computing in cognition.

[Figure 3: Cognitive Architecture]

[Footnote: IOCA is an abstract generalization of the architecture with the same name that was implemented in the service robot Golem-III [11]. Publications and videos about this robot are available at http://golem.iimas.unam.mx . Golem-III was built with standard digital computers and its only source of entropy is the external environment, which is very predictable in the traditional competition settings and benchmarks for this kind of device, and the overall behavior, including the decisions made by the robot, is quite predetermined.]

Decision making is at the center of thought and rationality. Theories of rationality may be classified in three main kinds: i) those that support a general mechanism that works according to first principles; ii) those that support that thinking is schematic and carried out through a rich variety of specific mechanisms that allow an effective interaction with the environment; and iii) those that support both modes. The first and second kinds are depicted by the modules Thought and Schemes in Figure 3. The present architecture is an instance of the third kind.

Theories sustaining that thought uses a general mechanism conform to
Bounded Rationality [14, 15]. The original program used the minimax decision making strategy, although computed with limited computational resources and aided by heuristics. In this respect it departed explicitly from the omniscient rationality and from the notion of economic or administrative man advocated by Von Neumann and Morgenstern in
Theory of Games and Economic Behavior [16]. This research program gave rise to the
Physical Symbol Systems Hypothesis, holding that a system of grounded symbols provides the necessary and sufficient conditions to produce general intelligence [17, 18], and was further developed with Newell's claim that there is a system level directly above the symbol level, that he called the knowledge level, in which the only rule of behavior is what he called
The Principle of Rationality [19].

This program was later developed into the SOAR [12] and ACT-R [13] cognitive architectures, the former focused on modeling AI tasks and applications and the latter on developing a theory of mind and its relation to the brain. In this paradigm, decision making was conceived as a pure thought process where rationality was enacted through symbolic manipulation; interpretations were already available in the symbolic format; and the actions performed with the purpose of satisfying the agent's decisions achieved their intended effect necessarily. In practice, interpretations and actions were performed by human users, and computational thought was detached from the world, as in computer chess and similar programs.

This limitation of the original program of AI has been addressed from different perspectives such as neural networks [20] and probabilistic causal models (e.g., [21, 22]), which reject symbolic representations, and embedded architectures [23], embodied cognition [24] and enactivism [25], which reject representations and symbolic computing altogether. However, regardless of whether or not thought and memory are representational, or implemented through symbolic or sub-symbolic computing, these faculties have to be connected with the world through perception and action in a principled way.

Schematic thinking, on its part, consists in mapping interpretations into actions through a specialized module. Schemes may be innate, although they may be used contingently by analogy to learned concepts, or may be developed through experience and manifested as habitual behavior. The
Schemes module is exemplified by the so-called procedural representations in ArtificialIntelligence, such as Minsky’s Frames [26], that opposed production systems andlogical approaches, which may be considered general reasoning mechanisms.Theories of thought and decision making should also be seen in relation oftwo perspectives: the mechanisms that the agent uses to make decisions andthe constraints of the environment that allow such decisions be enacted in theworld. The potential productivity of decisions measures the overall impact thatdecisions can have in an environment, but particular decisions should take intoaccount such constraints and this knowledge should be available in memory.Since the early work of Simon [27] theories of bounded rationality distin-guish behavioral constraints that relate to the decision maker, from externalconstraints that refer to the ecological structure of the environment. However,the characterization of what is “behavioral” and what is “external” is by nomeans easy to make. In practice, reasoning has traditionally lean to modelbehavioral constraints and the need for an ecologically motivated research pro-gram has been called for (e.g., [28, 29]). This concern has been addressed morerecently within the so-called
Ecological Rationality (e.g., [30]), which investigates the coupling between heuristics and the features of the environment for acting effectively in the world through a continuous interaction cycle. Heuristics in this latter paradigm are commonly described as encapsulated algorithmic processes.

The dual theory, in turn, considers that thinking and problem solving do require a general deliberative mechanism, but thought is an expensive resource and this module may be bypassed by schemes that map the output of perception into the input of the motricity module. Conversely, schemes may be quite rigid and lead to irrational behavior, but this can be avoided if the agent is a competent subject and has the interest and the energy to address the task through deliberative thinking (e.g., [31]).

The observations can also be connected directly with basic actions, rendering reactive behavior, as illustrated by the dotted arrow in Figure 3. In this architecture the agent is always engaged in a cycle of perception, schematic behavior and action, which uses thought on demand and embeds reactive behavior; hence the name IOCA. The architecture is only a schematic idealization, as in natural cognitive architectures the functional modules may overlap and the boundaries between them may be loosely demarcated.
The inputs of the perception module are the observations that the agent senses in the environment and the relevant knowledge stored in memory, and its outputs are the corresponding interpretations. These are in turn the inputs to the thought module, in which reasoning is performed. Thought interacts with memory, where knowledge is registered and recalled, and its outputs are the intentions of the agent, which are construed here as specifications for actions that can be rendered through linguistic or motor behavior. The motricity module has as its inputs the intentions produced by thought, but it also interacts with memory, and its outputs are the actions that the agent performs in the environment.

The perception module is characterized from a Bayesian perspective. The agent is endowed with the capacity to make observations O by organizing the information provided by the senses into a basic entity, property or scene, such as a physician feeling the heat of the patient's body or a chess player identifying the pieces on the board. The observations can be performed by anyone who does not have a particular impairment preventing him or her from doing so, and can be considered innate. Observations are the manifestation of events in the world, and perception produces hypotheses of what such events are on the basis of the observations and the relevant knowledge. The term P(E/O) stands for the probability of the event E given the observation. As there may be many potential events, the best hypothesis is the event that maximizes this value for the given observation. This term is the output of perception, or
The Interpretation. This is computed by Bayes' theorem, which is stated as follows: P(E/O) = ArgMax_E P(O/E) × P(E) / P(O). The term P(O/E) – the likelihood – represents, for instance, the ability of a physician to tell how likely it is that the patient has fever if he or she has typhoid – the observation and the event respectively. This term represents the expertise of the physician, which is acquired through years of practice, and is readily available when an observation is made. Here this term is referred to as
The Perceptual Ability. The term P(E), on its part, usually referred to as the prior, stands for the knowledge of the agent of how likely it is that such an event occurs in the world. For instance, the physician may know that 9 out of 10 children in town have typhoid.

Finally, the term P(O) is the probability of the observation. This probability is in turn P(O/E) × P(E) + P(O/¬E) × P(¬E), where the first and second terms of the sum correspond to the true and the false positives – the times the observation was produced by the event and the times the observation was produced by other events. However, the term P(O) is dropped in the standard formulation of the noisy channel because the computation aims to select the most likely event in relation to the same observation, and the full expression is simplified to P(E/O) =
ArgMax_E P(O/E) × P(E). The resulting value is a weighting factor rather than a probability, but it is enough to compute the best interpretation. The Bayesian expression states simply that the best interpretation hypothesis is the event that maximizes the product of the likelihood and the prior.

The hypothesis that people make interpretations using Bayes' theorem has been tested, and the early empirical evidence suggested that this is not the case [32, 33], and that people normally ignore the priors when individuating or specific information is available, the so-called base-rate neglect or fallacy (e.g., [32]). However, base rates interact with specific information, and whether they are considered depends on their relevance for the task at hand [34]. More recently it has been argued that the experiments that gave rise to such results were based on the implicit assumption that people, for instance physicians, use Bayes' theorem in the standard probability format. In those experiments the posterior probability P(E/O) had to be computed given the prior, the likelihood and the rate of false positives (often called the base rate, the hit rate and the false-alarm rate). The experiments assumed that Bayes' theorem is known and can be used operationally, but as most people, including most physicians, are never taught probability theory and are unfamiliar with the probability format, this knowledge would have to be innate and subconscious. Not surprisingly, people in general do very poorly.

However, presenting the information as natural frequencies does support natural Bayesian reasoning [35, 29], and suggests that base-rate neglect is unfounded [28].
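The simplified maximization above can be sketched in a few lines of code; the diseases, symptom and probability figures are hypothetical, chosen only to echo the physician example:

```python
# Selecting the best interpretation hypothesis: the event E that
# maximizes the product of the likelihood P(O/E) and the prior P(E).
# The denominator P(O) is dropped, as it is the same for every event.

def best_interpretation(likelihood, prior, observation):
    """Return the event maximizing P(observation/E) * P(E)."""
    return max(prior, key=lambda e: likelihood[e].get(observation, 0.0) * prior[e])

# Hypothetical figures for the physician example.
prior = {"typhoid": 0.9, "flu": 0.1}          # P(E)
likelihood = {                                # P(O/E)
    "typhoid": {"fever": 0.8},
    "flu": {"fever": 0.6},
}
print(best_interpretation(likelihood, prior, "fever"))  # → typhoid
```

The weighting factors are 0.8 × 0.9 = 0.72 for typhoid and 0.6 × 0.1 = 0.06 for flu, so the prior dominates even though the likelihoods are close.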
For instance, if instead of the actual probabilities a physician knows the number of people in his or her demarcation, those who have presented an illness and had the symptoms, and those who had the symptoms but did not have the illness, he or she can compute the actual posterior probability, although using a simpler but correct form of Bayes' formula. Gigerenzer [35] calls the latter form of acquiring the information the natural frequency format, and argues very clearly that the same mathematical object – in this case Bayes' theorem – can have different representations and be computed by different algorithms, and that the information should be presented in the appropriate format.

From an ecological and evolutionary perspective, the information format must be available directly in the environment (e.g., [29]), but the interpretation machine must map such a format into one suited to memory and processing too. Clearly, if the information is presented in the probability format but the processor and memory use a natural frequency format, or vice versa, there would have to be a costly and unnatural translation.

An instance of the natural frequency format all the way through is illustrated by the associative memory presented in Section 3. There, the digit instances in the external medium are fed in serially into their corresponding associative registers through a transfer function, implemented by the neural network, that places the information in the standard configuration of the table computing machine.
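The simpler, frequency-based form of Bayes' formula amounts to a single division over counts. A minimal sketch, with made-up figures:

```python
# Posterior probability from natural frequencies: no explicit priors
# or likelihoods, just counts of cases in the physician's demarcation.

def posterior_from_counts(ill_with_symptom, healthy_with_symptom):
    """P(illness/symptom) = true positives / everyone with the symptom."""
    return ill_with_symptom / (ill_with_symptom + healthy_with_symptom)

# Hypothetical counts: 90 ill people with the symptom, 30 healthy
# people who also presented the symptom.
p = posterior_from_counts(90, 30)
print(p)  # → 0.75
```

The same posterior in the probability format would require multiplying a likelihood by a prior and normalizing by P(O); in the frequency format the normalization is already implicit in the counts.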
This format consists of a large number of abstract amodal features that are analogous to their modality-specific input and output representations – the digits on a piece of paper or a computer screen that are input and output through modality-specific buffers – and all instances of the same digit are represented in a table but abstracted through the logical disjunction operation. Hence, the format used by the memory corresponds to a natural frequency format.

The mapping from the input modal buffer into its corresponding amodal representation corresponds to computing a likelihood; recognizing or retrieving the content of the memory registers corresponds to computing a prior; and the memory recognition and recall operations correspond to selecting the object that matches the cue from its right memory register among all the other registers, and "maximizes" "the product" of "the likelihood" and "the prior". For instance, in the experiment in Section 3.1 the likelihood is the descriptor representing the cue provided by the input neural network – which plays the role of bottom-up or low-level perception; the "events" are the digits presented to the agent for recognition or recall, whose representations are stored in their corresponding associative memory registers. These contain the abstractions of all instances of the corresponding digits previously seen and registered; and the maximization operation corresponds to recognizing and retrieving one specific digit among the ten possible ones, on the basis of the cue.

This suggests that Bayesian interpretation does not necessarily mean that people actually compute the posterior probability, and that there may be other modes of computing that implement the process in a more effective way.
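The abstraction-through-disjunction operation can be sketched with binary feature vectors; the register size and feature values below are illustrative and do not reproduce the actual table machine of Section 3:

```python
# Each digit's register abstracts all of its seen instances through
# logical disjunction (bitwise OR) of their binary feature vectors.

def register(instances):
    """OR together the feature vectors of all instances of one digit."""
    features = [False] * len(instances[0])
    for vec in instances:
        features = [a or b for a, b in zip(features, vec)]
    return features

def recognizes(reg, cue):
    """Accept a cue if every feature it turns on is in the register."""
    return all(r or not c for r, c in zip(reg, cue))

# Two hypothetical instances of the same digit, six features each.
reg = register([[True, False, True, False, False, False],
                [True, True, False, False, False, False]])
print(recognizes(reg, [True, True, False, False, False, False]))   # → True
print(recognizes(reg, [False, False, False, True, False, False]))  # → False
```

Disjunction loses which particular instance produced each feature, which is precisely the abstraction: the register holds the union of everything seen, not any individual episode.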
The purpose of the computation is to interpret the observation presented for recognition: a clinical physician is concerned with choosing the most likely disease given the symptoms and his or her knowledge and experience, in order to decide the best treatment, but he or she does not need to know mathematics and make the actual computations, as is commonly the case.

From the neuroscience perspective, perception, action, learning and memory accord with Bayesian principles, the so-called Bayesian brain hypothesis [36]. The implicit Bayesian maximization involved in the production of the interpretation may require a cycle of memory recall operations, such that the current interpretation is compared with the likelihood of the next observation, and the final interpretation is stable enough to be reliable.

There may be a wide variety of formats, representational systems and processing strategies, and the basic intuition underlying Bayes' theorem can be generalized into the proposition that the best interpretation strategy is to select the best hypothesis that results from pondering the information provided by the perceptual abilities with the relevant knowledge available in memory. This intuition is very strong and is referred to here as
The Principle of Interpretation. The Bayesian analysis can also be applied to the action part, or the motricity module that generates the action, as illustrated in Figure 3. In this case the inputs are the intentions I produced by thought, which need to be enacted through the actions that the agent can perform, and the output is the best action that achieves the intention, which is designated by the term P(A/I). The Bayesian expression for the action case is P(A/I) =
ArgMax_A P(I/A) × P(A). As in the interpretation case, the goal is to choose the best action in relation to the same intention, and the probability of the intention can be dropped.

The likelihood P(I/A) represents the extent to which performing a particular action renders the given intention, or the extent to which the intention can be achieved given that the action is performed – for instance, that a melody will be produced given that a sequence of keys is pressed on the piano. The primitive or innate actions A can be performed by anyone, such as pressing the keys, but producing music by playing the piano requires years of rehearsing. The ability is acquired through practice and experience, as in perception, and this latter likelihood is called here The Motor Ability.

The term P(A), on its part, represents the feasibility that the action can be enacted in the environment. This involves behavioral constraints of the agent, such as his or her capacity to perform the action, or the cost that needs to be afforded to achieve it, but also external aspects that depend both on the physical environment and on the dispositions and intentions of other agents or the society. The external aspects correspond to the potential productivity of decisions, but for particular actions. If this knowledge is not considered, the actions may not achieve the intended effect due to factors that are not under the control of the agent.

Finally, the Bayesian law selects the action that maximizes the product of the motor abilities with the cost or feasibility of the action.
As in the interpretation case, the motor ability and the knowledge about actions can be expressed through a variety of formats and computing strategies.

On the basis of these considerations, the Bayesian law can be generalized into the proposition that the best strategy for acting in the world is to select the best action that satisfies the intention by pondering the information provided by the motor abilities with the knowledge about the actions available in memory. This latter proposition is referred to here as The Principle of Action.

Perception and action are commonly modeled through standard algorithms, in which the entropy of the machine is not considered or is zero. However, natural memory is most likely entropic, memory recall is an indeterminate operation, the links from memory into perception and action introduce a level of indeterminacy, and perception and action conform to the entropy trade-off.
Schematic behavior relates perception and action through specific highly specialized modules. A paradigmatic form of schematic behavior may be daily-life intentional behavior – walking, eating, bathing, talking, etc. These kinds of actions bypass thought and relate perceptual and motor abilities directly, as long as the expectations of the agent are met in the world. However, when a spontaneous event occurs in a relevant respect, there is an interruption and rational agents engage in a deliberative thought process. The schematic behavior may be put on hold to attend the event, or may be continued and performed simultaneously with thought, but the attention is focused on this latter process.

In Cognitive Psychology, schematic thinking is exemplified by the use of heuristics that explain biased but systematic behaviors [32], which were presented in opposition to Bayesian inference [35, 29]. In such a view the heuristics handle diverse sorts of common interpretations rightly and very effectively, although on occasion they can lead to irrational behavior.

A more recent illustration, provided within ecological rationality, is the so-called gaze heuristic, which is used to track objects following a trajectory, such as baseball players catching a ball. It consists of fixating the gaze on the ball, starting to run, and adjusting the speed so that the angle of gaze in relation to the ground remains constant [30]. In this behavior there is a continuous interpretation and action process, and the essence of the heuristic is to compute the speed as a function of the angle. In a physical model the full trajectory would have to be computed, but, according to ecological rationality, this is too complex and people are unable to do it. Nevertheless, in such a view it is held at the same time that the angle and the speed are much simpler computations that are actually performed by people in real time.
However, computing these parameters requires significant computational resources, metric systems and measuring devices, which might not be available to people and other highly evolved animals that can achieve these feats; natural computing should proceed by other means.

In the present proposal, computing the scheme can be construed as follows: the perceptual ability produces the angle – represented in a space of abstract and amodal characteristics – out of the image in the input visual buffer; this interpretation is the argument of a scheme mapping the angle into a speed – in the same abstract and amodal space – but implemented through table computing or some form of analogical or diagrammatic reasoning, which does not use a standard algorithm involving costly computations. The speed is in turn the input to the motricity module, which renders the action through the motor actuators.

This basic model does not resort to memory, and although computing the scheme directly may be very effective, the agent may not be able to adapt to even slight changes in the environment. A more robust model would include a perceptual and a motor memory for enriching the input interpretation and considering the potential contingencies of the action, and the overall behavior would be informed by the knowledge of the agent, according to the full use of the principles of interpretation and action.

Schemes compute functions or relations whose arguments and values stand for the interpretations and intentions respectively. Schemes are normally implemented through standard algorithms and computed by TMs, but they can be implemented by other means, such as neural networks, analogical and diagrammatic machines, or the relational-indeterminate mode implemented through table computing, among other specialized modes of computing.
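As a rough illustration of the gaze heuristic as an angle-to-speed scheme, the following sketch uses a made-up proportional adjustment rule; the gain and the angle readings are assumptions for illustration, not part of the theory:

```python
# Gaze heuristic: keep the gaze angle to the ball constant by
# adjusting the running speed, with no trajectory computation.

def adjust_speed(speed, angle, previous_angle, gain=5.0):
    """Speed up if the gaze angle rises, slow down if it falls."""
    return speed + gain * (angle - previous_angle)

# A hypothetical run: the fielder reacts to successive angle readings.
speed, prev = 4.0, 0.60
for angle in [0.62, 0.61, 0.59, 0.60]:
    speed = adjust_speed(speed, angle, prev)
    prev = angle
print(round(speed, 2))  # → 4.0
```

The point of the sketch is that each step uses only the current and previous angle, a local computation; nothing resembling the full physics of the ball's trajectory is ever evaluated.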
If entropic processes are included, behavior would be more flexible and should obey the entropy trade-off.

In any case, motor actions produced through schematic behavior are highly conditioned by perception and the scheme proper. To say that the baseball player makes the decision to increase or decrease his or her speed and the positions of his or her body parts is a manner of speaking. Explicit decision making and deliberative thought allow people to anticipate the world in the short, middle and long term, and the relation between interpretation and intentional action is mediated by knowledge and values. For this reason, deliberative and schematic thinking should be distinguished.
The thought module is shown in detail in Figure 4. Its inputs and outputs are, respectively, the interpretations produced by perception and the intentions that need to be enacted by the motricity module. The thought process consists of a pipeline of diagnosis, decision making and planning inferences.

Figure 4: The Inference Pipeline of Thought

The input to the diagnosis module is the interpretation of the event that caused the interruption of the schematic cycle, and its output is a hypothesis of its cause, which is the main input to the decision making module. This latter module has other inputs, such as the agent's interests, preferences, values, affections, etc., which may have a subjective component, and its output is the decision proper: what to do about the interpretation hypothesis. The decision is in turn the goal of the planning module, whose output is the specification of the actions that need to be carried out in order to enact the decision; the output of the thought process as a whole is the intention. The intentions are enacted as actions, and the agent must verify that the effects of the actions are as expected, and may remain engaged in an inference cycle until the intended effects are achieved. This cycle is also accompanied by learning, so the agent is better prepared to deal with similar contingencies in the future.

For instance, someone enters his or her house and sees a puddle in the living room coming out of the kitchen. This is already an interpretation that can be expressed linguistically. The problem for the house owner is what to do about it, but first he or she needs to find out the cause of the observation, so a diagnostic inference is performed. This process can be thought of in Bayesian terms but also as a case of abductive symbolic reasoning, which rests on the same basic intuition. The diagnosis involves the synthesis of a set of hypotheses, such as that a pipe broke or that his or her child forgot to turn off the faucet.
The output of the diagnosis module is a ranked set of hypothetical causes, which is the main input to the decision making module, in addition to the subjective inputs discussed above and illustrated in Figure 4, and its output is the decision proper. If the diagnosis was that the sink tap was left on, the decision is to turn it off and dry the puddle; but if the diagnosis was that a pipe broke, the decision is to fix it. Another potential decision is to do nothing, but the subjective inputs, such as the interests and values, place the decision in a deeper perspective, and modulate the decisions that particular agents make.

The decision becomes the goal of a plan, which has to be induced and executed. For instance, if the decision is to turn off the tap and dry the puddle, the plan may be to go into the kitchen, stand in front of the sink, turn off the tap, get a mop and dry the floor; but if it is to fix the pipe, the plan may be to turn off the house's main water valve and phone the plumber. The output of the planning module is the specification of the actions that need to be performed so that the decision is enacted. This specification is the intention, which is the input to the motricity module. Some of the actions may be performed linguistically and others by physical motor behavior, but in the end these are all motor actions. The agent needs to monitor whether the effects of the actions are as expected through the main interaction cycle with the world, and remain engaged recurrently until the plan is achieved and the problem that gave rise to the thought process is fixed.

An additional input to the decision making module is the knowledge of whether the decision can be enacted in the world. Some of this information depends on the types of actions that the agent can perform through its motricity module, but there is an additional dimension that depends on knowledge of the causal structure and uncertainty of the world.
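The diagnosis–decision–planning pipeline can be sketched as three chained functions; the hypotheses, decisions and plans are hard-coded from the puddle example for illustration, whereas an actual agent would derive them from knowledge in memory and the subjective inputs:

```python
# A toy version of the thought pipeline: diagnosis -> decision -> planning.

def diagnose(interpretation):
    """Return a ranked list of hypothetical causes."""
    if interpretation == "puddle in the living room":
        return ["tap left on", "broken pipe"]
    return []

def decide(hypotheses):
    """Pick what to do about the top-ranked cause."""
    return {"tap left on": "turn off the tap and dry the puddle",
            "broken pipe": "fix the pipe"}[hypotheses[0]]

def plan(decision):
    """Spell the decision out as a sequence of actions (the intention)."""
    if decision == "turn off the tap and dry the puddle":
        return ["go into the kitchen", "turn off the tap",
                "get a mop", "dry the floor"]
    return ["turn off the main water valve", "phone the plumber"]

intention = plan(decide(diagnose("puddle in the living room")))
print(intention[0])  # → go into the kitchen
```

In the full architecture this composition sits inside the interaction cycle: the intention goes to the motricity module, and the effects of the actions are fed back through perception until the problem is fixed.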
Each particular decision made by a particular agent may or may not be enacted due to environmental or ecological aspects, and a good decision maker should take such constraints into account. (This cycle of inference has been implemented in the robot Golem-III. Videos showing the robot performing simple tasks, but involving the full inferential pipeline, are available at http://golem.iimas.unam.mx/inference-in-service-robots.) This is referred to here as the
Decision Making Trade-off: the best decision is the one that results from pondering the value of the potential decisions with the uncertainty of whether they can be enacted in the environment. The larger the value of the decision for the agent and the smaller the uncertainty that such a decision can be enacted, the better the decision. This is also an application of the Bayesian principle, where the value of a decision to achieve an intended effect can be seen as a likelihood, and the knowledge of whether the effect can be enacted in the world as a prior.

As in perception, action and schematic behavior, it is possible to hypothesize that deliberative thought is performed through direct mechanisms, has an implicit entropy, and conforms to the entropy trade-off.
Knowledge stored in memory opposes perceptual and motor abilities in several dimensions. While the former is available in memory, can be expressed declaratively, and is acquired and learned through language, the latter are deeply embedded in the perceptual and motor structures and are acquired through extensive training. While knowledge is registered, recognized and retrieved from memory and used in language, abilities are used but the information that is causal to their deployment cannot be recalled or remembered. While knowledge is transparent to consciousness, at least when it is stored, retrieved from memory or used in reasoning, and can be the object of reflection and introspection, the experience of deploying an ability is felt but opaque to consciousness; and while knowledge is causal and essential to intentional behavior, reports or explanations of the "knowledge" involved in abilities are a posteriori reconstructions of the processes that are causal to the experience itself. Abilities are embedded deeply in the perceptual and motor engines, but knowledge is better thought of as the object that is held in memory. Schemes are similar to abilities in most of these respects, although they may have a larger innate component, and oppose knowledge along the same dimensions.

In AI symbolic systems, knowledge is held in knowledge bases, which may hold incomplete information, may involve interpretation heuristics and may perform conceptual reasoning directly, and have a mostly symbolic character. Knowledge can be stored and retrieved, used in reasoning, and can inform perception and action, and the functional module that holds this kind of information may be considered a memory proper.

Symbolic knowledge bases oppose sub-symbolic machines, such as neural networks. These latter machines cannot express symbolic structures; hence, they cannot hold information declaratively, and individual or episodic memories cannot be recalled.
For these reasons, sub-symbolic systems are better thought of as transfer functions for classification or prediction, for instance, but should not be considered proper memories (e.g., [37]).

The associative memory in the present architecture has both symbolic and sub-symbolic aspects. The interpretation of a register's content as a whole has a symbolic character, but its structure is sub-symbolic. Memories are entropic computing machines, and the entropy trade-off establishes their operational range. The associative memory registers in the experiment reported in Section 3.1 hold only basic units of content, such as the digits. This is a proof-of-concept experiment, and it is an open question whether other kinds of individual objects can be modeled, or whether larger structures for holding composite contents can be created, but the experiment suggests that the causal and essential engine that gives rise to memory may have such a distributed structure, which holds the information in natural but abstract formats, and that computing is performed through minimal algorithms.

The cognitive architecture also suggests the possibility that logical reasoning can be a memory operation in which the premises and conclusions of arguments are represented in the space of characteristics, such that the representation of the premises is included in the representation of the conclusion. This would allow for a representational view of knowledge, but one in which the actual symbols appear only in the input and output modality-specific buffers, and the "symbolic manipulation" consists of memory operations in the abstract characteristics space.
Natural frequencies do present the information in a manner more akin to the natural mode of computing than other formats, such as standard mathematical notation. However, according to standard thinking in Cognitive Science, computing is nevertheless performed using standard algorithms in a TM or some equivalent machine. The intuition is that the human brain is such a powerful machine that it can easily evaluate the arithmetic operations involved in the computation (e.g., [29, 30]); hence people would use algorithms in the manner that they are computed by digital computers of the standard sort.

However, the format of the Turing Machine is linguistic and propositional – representations are strings of symbols on the tape – and if the brain were a TM, the natural frequency format would have to be translated into the symbolic or propositional one, and computing would be equally hard. Conversely, the probabilistic format is also a propositional representation, and if this were the actual format employed by the machine, probabilistic reasoning in this format would be easy. More generally, for computing to be effective, the format in which the information is presented should be alike to the actual format in which the computations are performed.

This proposition is stated in more general terms in the theory of Turing Machines and computability, and is a fundamental concept in computer science and the construction of computing machinery: TMs' representations are subject to a set of interpretation conventions and to a standard configuration.
The former are needed to interpret the workings of the machine, starting with the most basic convention that a TM computes a function, and that the input and output strings represent the argument and the value respectively.

The standard configuration, in turn, specifies the format of the representations in relation to the finite control, including the scanning device and the computing medium; for instance, whether the medium is a tape, a grid or a set of RAM registers. The configuration specifies as well how the input and output devices, such as the keyboard and the monitor, must place and take the information from the central processing unit, and also allows the output string of one computation to be the input to the next. The interpretation conventions and the standard configuration of table computing described in Section 3 are specified according to these criteria.

Sustaining that the mind uses algorithms computed by a TM is an a posteriori rationalization, not a causal explanation of mental behavior. Mathematical concepts, notations and metric systems are historical and cultural constructs that appeared much later than the machinery used by natural computing, and there is no reason to suppose that the brain and the mind use such constructs, in the same sense that there is no reason to suppose that the rationalizations people provide of the knowledge involved in deploying an ability, such as riding a bicycle, are causal explanations.

Traditional approaches to rationality focus on the general mechanisms and the particular strategies that support rational behavior, and rest on the assumption that computing is performed through a general computing device such as the Turing Machine.
The present theory suggests reversing this view and focusing on the computing engine that is causal to rational behavior.

Postulating mental algorithms is a very productive metaphor that has contributed to great progress in Cognitive Science, but interpreting it literally cannot be sustained unless a specification of the mode or modes of natural computing is provided.

The actual natural computing engine is likely to use a highly distributed format where computing is performed by very simple processing units that compute very simple algorithms. The complexity comes from the coordinated computation of such units. The interpretation of such distributed representations as whole units of content is exemplified by minimal algorithms. This view is quite similar to the initial formulation of neural networks [20], but its implementation in table computing, for instance, can be done with massive arrays that perform computations in parallel in a few computing steps, and does not need to be reduced to TMs and computed through costly algorithms, as has been the case with artificial neural networks to the present day.
5. Levels of Cognition
The cognitive architecture in Figure 3 includes the main modules of cognition, but some of its components or even full modules can be removed, rendering simpler forms of cognition. Here three main levels are distinguished: 1) the full architecture, supporting perception and action, schematic behavior, deliberative thought and memory; 2) the architecture resulting from removing the deliberative thought and memory modules but preserving the schemes; and 3) the one supporting only basic observations and actions, and rendering reactive behavior.

In a level-2 architecture “the intentions” are reduced to the output of the schemes, which drives the action directly. This architecture preserves the perceptual and action abilities, and behavior may be refined through training and rehearsing, but it lacks the input from deliberative thought and memory, and there is no genuine decision making. Agents that have such an architecture do not have knowledge proper, learning is reduced to training, and behavior is schematic and data-driven.

Agents of this kind may be biased due to both the schemes and the training data, and such prejudices cannot be modulated because of the lack of knowledge. This is a strong limitation in relation to the fully rational architecture, where the perceptual and action abilities are also trained on the basis of empirical data, but the knowledge acquired and learned through language allows for the appearance of values and the development of an affective logic, and biases and prejudices can be modulated, reduced and even eliminated.

The level-2 architecture is exemplified by current deep-learning and reinforcement-learning models and their applications [9]; for instance, self-driving vehicles such as cars and drones, and even chess, shogi and go playing programs [38].

The level-3 architecture only supports basic or “innate” observations and motor actions, which are connected directly; agents endowed with these capabilities have very limited or no training capabilities, and deploy mostly reactive behavior. Agents with this kind of architecture do not make interpretations and hence do not communicate.

Agents with a level-1 architecture may bypass thought and use the perceptual and action abilities directly, which can in turn be bypassed by reactive behaviors; hence, the level-1 architecture embeds a level-2 and a level-3 architecture, and agents with a level-2 architecture embed a level-3 too. The hierarchical embedding, along the lines proposed by Brooks [23], may be essential for effective interaction with the world.
6. Principle of Rationality
A computing agent is rational to the degree to which its actions allow it to survive and improve its living conditions, for itself and for other agents in its environment, in the short, middle and long term. A precondition of rationality is that the agent’s intentions reflect its own needs and desires; that the decision making engine conforms to the decision making trade-off; that schematic behavior is effective but biases are prevented; that the agent behaves according to the principles of interpretation and action; and that the potential productivity of decisions allows or provides the space for decisions to be enacted. This is referred to here as
The Principle of Rationality.

Perception produces an interpretation hypothesis about the state of the world; a decision is a hypothesis about the best course of action to achieve a state of the world that the agent believes is needed or desired; the intention underlies such a hypothesis; an action is based on the hypothesis that its consequences will satisfy such an intention; and the whole cycle of perception, thought and/or schematic behavior, and action is hypothetical. As in evolution –where genetic accidents produce traits that enhance, inhibit or create new capabilities, but only those that provide an advantage are preserved, with the consequent impact on the individual agent and the species– there is no objective measure or judgement of how rational behavior is, but only its consequences –benefits or shortcomings– for the agent and its social and physical environment.

Irrational behavior occurs when decisions and actions harm or diminish the quality of life of the agent and/or the environment. These behaviors may be due to impairments in perception and action, to limitations in thought and decision making, or to the use of heuristics that result in unfounded actions.

Rational behavior should conform to the entropy trade-off, and the environment should sustain a moderate level of entropy where the potential productivity of decisions is satisfactory or optimal. If the entropy of the computing engine is too low, in relation to the normal level, the agent will act obsessively or according to stereotypes, and will not have the flexibility required to attend to the changing demands of the environment; but if the entropy is too high in relation to the normal level, behavior will not be focused enough to act productively, as when attention is impaired.
In a sense, a healthy mind and genuine decision making would reflect a moderate or optimal level of entropy.

The present principle of rationality opposes Newell’s corresponding principle. In his view, thought is a “pure process” performed by symbolic manipulation, while in the present formulation thought and memory are related to perception and action in a congruent manner, and the agent is placed or grounded in the world.
7. Brain Entropy
The brain is an entropic machine that sustains a large number of states. A brain state consists of a reliable pattern of brain activity that involves the activation and/or connectivity of multiple large-scale brain networks, and some states, such as rest, alertness and meditation, have been studied [39]. Brain states may have a large number of substates whose ongoing fluctuations strongly influence higher cognitive functions [40], and the brain entropy may be related to the number of states that are accessible for brain functioning [41].

The entropy of a brain state can be measured through functional MRI (fMRI) [42]. The region under study is divided into voxels, each having a unique value of the blood-oxygen-level-dependent (BOLD) signal at the scanning time, which is correlated with the level of activity of the voxel, and maps of the activity of the brain when people are performing a mental task can be created. The technique consists of registering a sequence or window of BOLD values over time and computing the indeterminacy of a voxel, which can be characterized through the so-called Sample Entropy or SampEn. This methodology has been applied to measure the changes of brain entropy while a periodic sensorimotor task was performed, with the following results [42]:

• Brain entropy provides a physiologically and functionally meaningful brain activity measure.

• There was an entropy decrease in the visual and sensorimotor brain regions associated with the task in relation to the rest state.

• The entropy of the neocortex regions is lower than the entropy of the rest of the brain –cerebellum, brain stem, limbic area, etc.

• Brain entropy clusters with particular levels of entropy correspond to anatomical or functional areas of the brain.

The levels of entropy of the neocortex suggest that the brain conforms to the entropy trade-off. There is a range of entropy values in which interpretation, thought and intentional actions can be effective, as sustained by the free-energy principle and the Bayesian brain hypothesis [36]. Other brain structures sustaining higher entropies constitute standard biological machinery that supports interpretation but does not make interpretations proper. The cited experiment suggests that the resting state is flexible enough to address the changing demands of the environment, and hence its relatively higher level of entropy, but the entropy lowers when intentional tasks are performed to achieve the focus and specificity required for performing higher cognitive functions.

The relation between intelligence and brain entropy has also been investigated, and a correlation between entropy at the resting state and intelligence –measured with verbal and performance IQ tests– has been found [41]: the higher the entropy, the higher the IQ. This result is somewhat paradoxical because higher IQ is associated with higher levels of indeterminacy, and the experiment is at odds with the entropy decrease associated with intelligence.
However, the result can be placed in the perspective of the entropy trade-off, which suggests that a lower IQ is associated with rigid or predetermined behavior and a higher IQ requires some degree of indeterminacy, but that if the entropy is increased considerably, performance will decrease accordingly.

Although there is no well-established and widely accepted notion of brain state, and the current measure of brain entropy may be a gross approximation, the present considerations, in conjunction with the notions of relational-indeterminate computing and the computing entropy, suggest that the neocortex functional regions of the brain are entropic computing engines, and hence are not Turing Machines, and that the brain as a whole is not a Turing Machine. These considerations also suggest that older cortical and subcortical regions should have a very high entropy and may not be considered computing engines.
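The SampEn measure used in the cited fMRI studies can be sketched as follows. This is a minimal illustration of the standard definition of sample entropy (template length m, tolerance r), not of the preprocessing pipeline of [42]; the default parameter values are common conventions in the SampEn literature rather than values taken from that study.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy (SampEn) of a 1-D time series.

    SampEn = -ln(A/B), where B counts pairs of length-m templates whose
    Chebyshev distance is within the tolerance r, and A counts the same
    for length m+1 (self-matches excluded). Lower values indicate a more
    regular, more predictable signal.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)   # common convention: 20% of the signal's SD

    def count_matches(length):
        # Templates start at 0..n-m-1 so both lengths use the same set.
        t = np.array([x[i:i + length] for i in range(n - m)])
        c = 0
        for i in range(len(t)):
            for j in range(i + 1, len(t)):        # exclude self-matches
                if np.max(np.abs(t[i] - t[j])) <= r:
                    c += 1
        return c

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")   # undefined when no template matches occur
    return -np.log(a / b)
```

A perfectly periodic signal yields a SampEn of essentially zero, while random noise yields a high value, matching the intuition that lower entropy corresponds to more regular, predictable activity, as in the entropy decrease observed for the sensorimotor task.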
8. Technical Challenges and Predictions
The cognitive architecture can be simulated with a TM, and in such a case the computations are deterministic. If perception, action, schematic behavior, thought and memory are simulated with standard digital computers, the induction of interpretations, the synthesis of actions, and the decision making process are predetermined. There may be some indeterminacy due to the entropy of the environment, but if the process is carried out by a TM its entropy is zero. However, if the simulation is made with other modes of computing that are intrinsically entropic, such as the relational-indeterminate mode, or perhaps analogical or quantum computing, or even holography, there may be a level of indeterminacy of the computing agent in addition to the indeterminacy of the environment, and decision making may be free in such degree.

The present theory poses the challenge of defining and implementing computing processes and associative memories with minimal algorithms that are causal to such kind of behavior.

The present theory also suggests a number of hypotheses that may be tested empirically, for instance:

• Evolutionary psychology and sociology:

– The potential productivity of decisions: it may be possible to define natural control volumes of human and non-human animal environments, measure their entropies, and count the productive behavioral changes due to communication. If the prediction is sound, a τ-profile should be identified. The value of τ could predict social phenomena, such as the size or degree of organization of a social group.

– The entropy should be related to the phylogeny of the brain: the oldest structures should have a large level of entropy, and the entropy level should decrease according to the more varied and flexible behavior of younger structures; the neocortex, associated with the executive functions, should have the lowest levels of entropy in the resting state, as suggested by the results cited above.
– The brain entropy of animals with more developed neural structures in the resting state should be lower than the entropy of less developed animals in analogous brain states.

• Neurosciences:

– The entropy of the brain functional modules: the entropy of brain structures associated with perceptual and motor abilities and schematic behavior should decrease from the resting state to a level low enough to achieve the determinism associated with concrete interpretations and actions. The brain entropy associated with thought and memory should also decrease from the resting state, but it must remain high enough to allow for creative thinking and memory.

– Long term memory should have a higher entropy level than working memory in the resting state. Natural forgetting occurs if the entropy exceeds the operational range of long term memory, as the stored concepts are confused. If the entropy of working memory gets too low, on the other hand, thinking becomes schematic and behavior more predictable.

– Mental disorders: disorders of attentional networks and particular conditions [43, 44] should be associated with abnormal levels of entropy in relation to the resting state; for instance, obsessive-compulsive disorder should be associated with lower than normal levels; Attention-Deficit/Hyperactivity Disorder (ADHD) with higher than normal levels; and depressive and manic states with lower and higher than normal levels, respectively.

• Cognitive Psychology:

– Concrete versus abstract thought: the brain entropy decrease in relation to the resting state should be greater in concrete problem solving than in abstract thinking.

– Propositional versus analogical and diagrammatic reasoning: the brain entropy decrease of symbolic or propositional reasoning, which is more algorithmic, should be larger than the entropy decrease of diagrammatic or analogical reasoning, which may be more natural.
– Higher intelligence is associated with better executive control; hence the entropy level of the central executive should be correlated with intelligence within its operational entropy range. The correlation between higher IQ and higher entropy [41] supports this prediction. However, if the entropy of the central executive exceeds its optimal value, the IQ should decrease according to the entropy trade-off.

A more intriguing and fundamental conjecture is that the mind evolved from communicating. Entities that do not communicate are merely reactive and have zero entropy, hence do not make interpretations and may not sense or experience the world. Schematic behavior is the paradigmatic form of experiencing the world somewhat unconsciously, which is common in human and non-human animals with a developed enough nervous system; and deliberative thought is a form of experience that involves decision making and anticipating the world, where communication is a more productive behavior, as characterized by the potential productivity of decisions. The decrease of entropy, subject to the entropy trade-off, may be correlated with the level of experience and consciousness in humans and in a great variety of animal species.
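The entropic mode of computing invoked above can be given a toy rendering: a relational table in which an argument may be related to several values, evaluation selects one of them at random, and the computing entropy is nonzero whenever some argument has more than one value. The class name and the entropy convention below (the average over arguments of the base-2 logarithm of the number of related values) are illustrative assumptions; the actual formulation of table computing appears in Section 3 and in [7].

```python
import math
import random

class RelationalIndeterminateTable:
    """Toy relational-indeterminate 'function': each argument may be
    related to several values, and evaluation picks one at random."""

    def __init__(self):
        self.rel = {}                  # argument -> set of related values

    def relate(self, a, v):
        self.rel.setdefault(a, set()).add(v)

    def evaluate(self, a, rng=random):
        # Indeterminate evaluation: any related value may be returned.
        return rng.choice(sorted(self.rel[a]))

    def entropy(self):
        # Zero iff every argument has exactly one value, i.e., the table
        # behaves as a standard (deterministic) function.
        if not self.rel:
            return 0.0
        return sum(math.log2(len(vs)) for vs in self.rel.values()) / len(self.rel)

# A deterministic table has zero entropy; adding alternatives raises it.
t = RelationalIndeterminateTable()
t.relate("a", 1)
t.relate("b", 2)
print(t.entropy())                     # 0.0: an ordinary function
t.relate("a", 3)                       # "a" now relates to {1, 3}
print(t.entropy())                     # 0.5: one bit over two arguments
```

In this sketch a TM-style deterministic table sits at entropy zero, while relating further values to the same arguments moves the engine into the nonzero-entropy regime in which, per the entropy trade-off, computations can be flexible without becoming unfeasible.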
9. Acknowledgments
The author thanks Gibrán Fuentes and Rafael Morales for their help in the design and implementation of the experiment in Section 3.1. The author also thanks the partial support of grant PAPIIT-UNAM IN112819, México.
References

[1] C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal 27 (3) (1948) 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x
[2] A. M. Turing, Computing machinery and intelligence, Mind 59 (1950) 433–460.
[3] G. S. Boolos, R. C. Jeffrey, Computability and Logic (Third Edition), Cambridge University Press, 1989.
[4] J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages, and Computation (3rd Edition), Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
[5] G. E. Hinton, J. L. McClelland, D. E. Rumelhart, Distributed representations (chapter 3), in: D. E. Rumelhart, J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, The MIT Press, Cambridge, Mass., 1986.
[6] L. A. Pineda, The mode of computing, CoRR abs/1903.10559. arXiv:1903.10559. URL http://arxiv.org/abs/1903.10559
[7] L. A. Pineda, G. Fuentes, R. Morales, An entropic associative memory, Manuscript in preparation.
[8] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, Henry Holt and Co., Inc., New York, NY, USA, 1982.
[9] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444. doi:10.1038/nature14539
[10] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks (2015). arXiv:1511.06434
[11] L. A. Pineda, N. Hernández, A. Rodríguez, G. Fuentes, R. Cruz, Deliberative and conceptual inference in service robots, Manuscript in preparation.
[12] J. E. Laird, The Soar Cognitive Architecture, MIT Press, Cambridge, MA, 2012.
[13] J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, Y. Qin, An integrated theory of the mind, Psychological Review 111 (2004) 1036–1060.
[14] H. Simon, Models of Man, Social and Rational: Mathematical Essays on Rational Human Behavior in a Social Setting, Wiley, New York, 1957.
[15] G. Wheeler, Bounded rationality, in: E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy, Spring 2020 Edition, Metaphysics Research Lab, Stanford University, 2020.
[16] J. von Neumann, O. Morgenstern, A. Rubinstein, Theory of Games and Economic Behavior (60th Anniversary Commemorative Edition), Princeton University Press, 1944.
[17] A. Newell, H. Simon, Computer science as empirical inquiry: Symbols and search, Communications of the ACM 19 (3) (1976) 113–126.
[18] H. A. Simon, The Sciences of the Artificial, 3rd Edition, MIT Press, Cambridge, MA, 1996.
[19] A. Newell, The knowledge level, Artificial Intelligence 18 (1982) 87–127.
[20] D. E. Rumelhart, J. L. McClelland, the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, The MIT Press, Cambridge, Mass., 1986.
[21] J. Pearl, Causality, 2nd Edition, Cambridge University Press, Cambridge, UK, 2009. doi:10.1017/CBO9780511803161
[22] L. E. Sucar, Probabilistic Graphical Models: Principles and Applications, 1st Edition, Advances in Computer Vision and Pattern Recognition, Springer London, London, 2015.
[23] R. Brooks, Intelligence without representation, Artificial Intelligence 47 (1991) 139–159.
[24] M. L. Anderson, Embodied cognition: A field guide, Artificial Intelligence 149 (2003) 91–130.
[25] T. Froese, T. Ziemke, Enactive artificial intelligence: Investigating the systemic organization of life and mind, Artificial Intelligence 173 (2009) 466–500.
[26] M. Minsky, The Society of Mind, Simon and Schuster, New York, 1986.
[27] H. Simon, A behavioral model of rational choice, The Quarterly Journal of Economics 69 (1) (1955) 99–118.
[28] J. Koehler, The base rate fallacy reconsidered: Descriptive, normative and methodological challenges, Behavioral and Brain Sciences 19 (1996) 1–53. doi:10.1017/S0140525X00041157
[29] L. Cosmides, J. Tooby, Are humans good intuitive statisticians after all? Rethinking some conclusions of the literature on judgment under uncertainty, Cognition 58 (1996) 1–73.
[30] P. M. Todd, G. Gigerenzer, Ecological Rationality: Intelligence in the World, Oxford University Press, New York, 2012. URL http://hdl.handle.net/11858/00-001M-0000-0024-EE01-A
[31] A. K. Barbey, S. A. Sloman, Base-rate respect: From ecological rationality to dual process, Behavioral and Brain Sciences 30 (2007) 241–254. doi:10.1017/S0140525X07001653
[32] A. Tversky, D. Kahneman, Judgment under uncertainty: Heuristics and biases, Science 185 (1974) 1124–1131. doi:10.1126/science.185.4157.1124
[33] W. Casscells, A. Schoenberger, T. B. Graboys, Interpretation by physicians of clinical laboratory results, N Engl J Med 299 (18) (1978) 999–1001. doi:10.1056/NEJM197811022991808
[34] M. Bar-Hillel, The base-rate fallacy in probability judgments, Acta Psychologica 44 (3) (1980) 211–233. doi:10.1016/0001-6918(80)90046-3
[35] G. Gigerenzer, U. Hoffrage, How to improve Bayesian reasoning without instruction: Frequency formats, Psychological Review 102 (1995) 684–704.
[36] K. J. Friston, The free-energy principle: a unified brain theory?, Nature Reviews Neuroscience 11 (2) (2010) 127–138. URL https://doi.org/10.1038/nrn2787
[37] J. A. Fodor, Z. W. Pylyshyn, Connectionism and cognitive architecture: A critical analysis, Cognition 28 (1–2) (1988) 3–71.
[38] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, D. Hassabis, A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, Science 362 (2018) 1140–1144. doi:10.1126/science.aar6404
[39] Y. Y. Tang, M. K. Rothbart, M. Posner, Neural correlates of establishing, maintaining, and switching brain states, Trends in Cognitive Sciences 16 (6) (2012) 330–337. URL https://doi.org/10.1016/j.tics.2012.05.001
[40] E. Zagha, D. A. McCormick, Neural control of brain state, Curr Opin Neurobiol 29 (2014) 178–186. URL https://doi.org/10.1016/j.conb.2014.09.010
[41] G. N. Saxe, D. Calderone, L. J. Morales, Brain entropy and human intelligence: A resting-state fMRI study, PLoS ONE 13 (2) (2018) e0191582. URL https://doi.org/10.1371/journal.pone.0191582
[42] Z. Wang, Y. Li, A. R. Childress, J. A. Detre, Brain entropy mapping using fMRI, PLoS ONE 9 (3) (2014) e89948. URL https://doi.org/10.1371/journal.pone.0089948