Attracting Sets in Perceptual Networks
Robert Prentner
[email protected]
Summary
This document gives a specification for the model used in [1]. It presents a simple way of optimizing mutual information between some input and the attractors of a (noisy) network, using a genetic algorithm. The nodes of this network are modeled as simplified versions of the structures described in the “interface theory of perception” [2]. Accordingly, the system is referred to as a “perceptual network”.

The present paper is an edited version of technical parts of [1] and serves as accompanying text for the Python implementation
PerceptualNetworks, freely available under [3].

1. Prentner, R., and Fields, C. Using AI Methods to Evaluate a Minimal Model for Perception.
Open Philosophy, 2, 503-524.
2. Hoffman, D. D., Prakash, C., and Singh, M. The Interface Theory of Perception.
Psychonomic Bulletin and Review, 22, 1480-1506.
3. Prentner, R.
PerceptualNetworks. https://github.com/RobertPrentner/PerceptualNetworks (accessed September 17, 2020).

Introduction
The model which we propose in this paper, perceptual networks, combines many features of well-known modeling paradigms but gives them a more psychological interpretation. More concretely, perceptual networks are defined as networks of individual and excitable agents that (i) have experiences corresponding to their states, (ii) are located on a graph which encodes their interaction topology, and (iii) whose individual actions affect the performance on the network level. Such agents perceive certain messages and send messages, based on their experiences (states), to other agents in their environment.

Similar to artificial neural networks, perceptual networks can learn from experience by either changing their updating rule or by adjusting their connectivity within the network. These two processes correspond to a “perceptual” learning process (i.e. how to represent incoming information) and an “action-related” learning process (i.e. where to send information to), respectively. The analysis of the information-processing capacities embodied by a perceptual network is afforded by its basic graph structure.
Perceptual networks share this feature with random Boolean networks, and the idea that processes on the individual level affect the global functioning of the network is consistent with the main thrust of agent-based models and cellular automata.

We will demonstrate in the following sections how perceptual networks could address two different but related problems. First is the problem of how such a model gives rise to stable structures. Second is the problem of how these structures enable “perception”, if perception is understood broadly in terms of a representation of some input, which results from the application of (internal) rules embodied in the network under conditions of uncertainty (modeled as noise).
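As a rough sketch of ingredients (i)-(iii), a network of such agents can be collected in a minimal data structure. The names and layout below are hypothetical illustrations, not the API of the PerceptualNetworks repository:

```python
from dataclasses import dataclass

# Minimal sketch of a perceptual network: agents with binary states,
# an adjacency structure, and per-node updating rules. Hypothetical
# names; not the API of the PerceptualNetworks repository.
@dataclass
class PerceptualNetwork:
    adjacency: list   # adjacency[j] = list of neighbors sending to node j
    rules: list       # rules[j]: message count -> next state (0 or 1)
    states: list      # current state x_j of each agent

    def step(self):
        # (i) each agent receives a message summed from its neighbors ...
        messages = [sum(self.states[k] for k in self.adjacency[j])
                    for j in range(len(self.states))]
        # (ii) ... and updates its state by its local rule R_j
        self.states = [self.rules[j](m) for j, m in enumerate(messages)]
        return self.states

# Two agents exciting each other under a simple threshold rule.
net = PerceptualNetwork(adjacency=[[1], [0]],
                        rules=[lambda m: int(m >= 1)] * 2,
                        states=[1, 0])
print(net.step())  # [0, 1]
```

The two-node example already exhibits the kind of oscillating (attracting) behavior discussed below: repeated calls to `step()` cycle between `[1, 0]` and `[0, 1]`.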
Perceptual networks are a simplification of a framework previously proposed to study consciousness [3]. Given an external measurable space (a “world”), a “conscious agent” in this framework is a six-tuple consisting of two measurable spaces (“perception space” and “action space”), three kernels that connect the world and these spaces (“perception”, “decision” and “action”), and a time-counter which counts the number of kernel executions. This definition was refined, and a network consisting entirely of “reduced conscious agents”, a concept introduced to distinguish the “internal” aspects of a conscious agent from its “extrinsic” aspect, was introduced [4]. There it was shown that such a network could be applied to problems in psychology and formally reproduce the architectures of many received models in cognitive science.
Perceptual networks are simplifications of this more general formalism and feature nodes (“agents”) that integrate information (“messages”) and relay this information to neighboring nodes in the network. The mapping of sensory information could be represented by a simple rule $R_j$ which connects the state $m_j$ of a message each agent receives from its neighbors to its future state $x_j$. The value of $m_j$ is determined by the topology of the network $A$:

$$(x_1, x_2, \dots, x_N) \xrightarrow{A} (m_1, m_2, \dots, m_N) \xrightarrow{\{R_j\}} (x'_1, x'_2, \dots, x'_N) \quad (1)$$

For simplicity, we look at deterministic systems, but the model should eventually be extended to model probabilistic inferences. More precisely, the updating process would then correspond to a homogeneous discrete-time Markov chain, which would feature at least one stationary distribution (under the assumption that it is irreducible and non-periodic), cf. Chapter 3 in [5]. The construction allows for learning; learning could affect the rules describing information integration, $\{R_j\}$, as well as the network topology $A$.

The rules define a circular updating scheme from network states to network states. We illustrate the state evolution by a toy network comprising $N = 4$ nodes, as shown in Table 1. The continuous and dotted lines represent two different topologies $A$ and $A'$ (left and middle column), giving rise to two different state evolutions, each starting in the same initial state $(1, 0, 1, 0)$ (right column). Since the network is finite and closed, the state evolution will either land in a stationary state or an attracting set. In the present example an attracting set will be reached in each case. We have used a simple updating rule which takes the form:

$$m_j = (A\vec{x})_j, \qquad x'_j = \begin{cases} 1 & \text{if } m_j \geq 1 \\ 0 & \text{else,} \end{cases} \quad (2)$$

where $A$ is the adjacency matrix of the network. Note that in the actual implementation PerceptualNetworks we use a more refined threshold condition than in the present example (cf. Eq. (10) in 2.1).

Table 1: Example of a perceptual network with two different topologies (continuous and dotted lines, respectively). Initially, nodes 1 and 3 are excited; the rules (given in the text) and the adjacencies of the network (middle column) determine the state evolution (right column). [The table's graphics (network diagrams, adjacencies and state evolutions) are not reproduced here.]
Which attracting set a state is part of (or whether it is a transient state) is determined by the network topology and the rules governing its evolution. In our toy model, the same initial state $(1, 0, 1, 0)$ reaches two different attracting sets (indicated by the colored arrows in Table 1). In the blue case, the initial state is part of an attracting set of size 2, whereas in the red case it is transient. In the blue case, the network realizes a flip-flop circuit which permanently oscillates between the states $(1, 0, 1, 0)$ and $(0, 1, 0, 1)$ (Table 1, right column, left-hand side). In the red case, the network falls into an attracting set which cycles through its states with period 3 (Table 1, right column, right-hand side). In both cases, an initial state of $(0, 0, 0, 0)$ would be stationary and could be interpreted as the ground or terminal state of the network.

The stationary states and attracting sets represent (“kinetically”) stable entities which are associated with the network. This gives an explicit answer to the question how a (transitory) ontology on the individual level could give rise to stable entities on the level of networks. We will now show how such states could also be conceived of as encoding perceptual inferences.
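The toy dynamics can be reproduced in a few lines. Since the topologies of Table 1 are only shown graphically, the adjacency matrix below is a hypothetical stand-in that realizes a flip-flop of period 2 under the rule of Eq. (2):

```python
import numpy as np

# Hypothetical symmetric topology standing in for the "blue" case of
# Table 1 (the paper's toy topologies are only shown graphically):
# nodes 1 and 3 are connected to nodes 2 and 4.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])

def step(x, A):
    """One synchronous update: m = A x, then threshold, cf. Eq. (2)."""
    m = A @ x
    return (m >= 1).astype(int)

def find_attracting_set(x0, A, max_steps=100):
    """Iterate until a state recurs; return the attracting set (cycle)."""
    seen, x = [], np.asarray(x0)
    for _ in range(max_steps):
        key = tuple(int(v) for v in x)
        if key in seen:
            return seen[seen.index(key):]
        seen.append(key)
        x = step(x, A)
    return []

cycle = find_attracting_set([1, 0, 1, 0], A)
print(len(cycle))  # 2: a period-2 "flip-flop" between (1,0,1,0) and (0,1,0,1)
```

Because the state space of a finite, closed network is finite, this loop always terminates in a stationary state (a cycle of length 1) or a longer attracting cycle.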
Mathematically, the process of perception is often modeled as abductive inference using Bayesian probability theory. The perceptual system chooses from a variety of “interpretations” $x$, each consistent with an input signal $i$. A posterior probability can be assigned to each interpretation, given the input. Using Bayes' rule, it can be recovered by multiplying a likelihood function, which assigns a probability to each input given a particular percept, with the prior probability of that percept:

$$p(x|i) = \frac{p(i|x) \cdot p(x)}{p(i)} = \alpha \cdot p(i|x)\, p(x), \quad (3)$$

where $\alpha = 1/p(i)$, with $p(i) = \sum_x p(i|x)\, p(x)$, is a normalization constant, which can be neglected when comparing different posterior probabilities given the same input. The way this scheme is (approximately) realized in a living organism can be quite complicated.

In vision science, it is often assumed that the space of percepts is homomorphic to the space of states in the world, and thus the likelihood function could be given the interpretation of “a mapping” from world states to input states, e.g., the optical projection from a 3D world onto the retina of the eye. Such an approach to visual perception is sometimes called an “inverse-optics” approach, since the task for the visual system is to undo the effects of optical projection [6]. Given such an interpretation and knowing the likelihood and prior probability, it is possible to calculate posterior probabilities using the Bayesian scheme above. To determine which percept $x$ is selected based on the posterior distribution, one usually introduces a loss function that describes the error in the process of choosing an interpretation. The most straightforward and principled loss function is given by a delta-function, centered around the maximum of the posterior distribution (the so-called “MAP” estimate). Other, more involved choices are possible [7].
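A minimal numeric illustration of Eq. (3) and the MAP choice; the percept labels, prior and likelihood values are invented for the example:

```python
import numpy as np

# Toy posterior computation, cf. Eq. (3). Percept labels, prior and
# likelihood values are made up for illustration only.
percepts = ["x1", "x2", "x3"]
prior = np.array([0.5, 0.3, 0.2])        # p(x)
likelihood = np.array([0.1, 0.7, 0.4])   # p(i | x) for one fixed input i

posterior = likelihood * prior           # p(i|x) p(x), still unnormalized
posterior /= posterior.sum()             # divide by p(i) = sum_x p(i|x) p(x)

# Delta loss function: pick the maximum of the posterior (MAP estimate).
map_estimate = percepts[int(np.argmax(posterior))]
print(map_estimate)  # x2
```

Note that the normalization step only rescales the posterior; when comparing interpretations for one and the same input, the MAP choice can be read off the unnormalized products directly.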
In an evolutionary setting, the assumption that world states and scene interpretations are generically homomorphic has been shown to be too strong an assumption [2, 8, 9], and we thus refrain from it.

For a perceptual network we assume that the “sensory input” $i$ corresponds to the initial state of a certain (sub-)set of nodes and “perceptual interpretations” $x$ correspond to the stable states of the network. The set of rules $\{R_j\}$ encodes a “perceptual strategy”, and knowing the rules and the network's topology, we could compute posterior probabilities directly based on priors defined on inputs to maximize a quantity that represents a “fitness payoff” with respect to the problem at hand. We thus introduce a “payoff-function” defined on the space of inputs and perceptions:

$$F: I \times X \to [0,1], \quad (i, x) \mapsto r \in [0,1]. \quad (4)$$

What the network sees is not merely “given” by its input but by the way it is internally processed, reflected by the utility of its strategy. On this account, the evolution of perceptions does not serve as a ladder to the truth but as a way to find strategies that maximize fitness payoffs.

For convenience, one often chooses the interval $[0,1]$ for payoff values, but in general any finite interval defined on the positive reals will suffice. With this definition of a payoff-function we could calculate an “expected payoff”:

$$\langle F \rangle = \sum_{i,x} F(i,x)\, p_A(i,x|\{R_j\}) \quad (5)$$

We want to infer the “average best” perceptual system implemented by a network. (For simplicity, we chose to keep the topology of the graph fixed, which is indicated by the subscript in $p_A$.) Since the posterior probability $p_A(x|i)$ is determined by the set of rules $\{R_j\}$, we need to adjust the rules in order to maximize the expected payoff, which is related to the state of our network:

$$\{R_j\}_{\mathrm{opt}} = \mathrm{argmax}(\langle F \rangle) = \mathrm{argmax} \sum_{i,x} F(i,x)\, p_A(i,x|\{R_j\}) \quad (6)$$

From now on we will drop the conditional dependence on $\{R_j\}$ for simplicity when writing fitness payoffs.

In our model, and in the absence of any further assumptions, we assume a particular form of the payoff function, which equals the logarithm of the probability $p_A(i,x)$ divided by the product of the marginal probabilities $p(i)\, p_A(x)$. This results in a mutual information:

$$\langle F_A \rangle = \mathrm{I}(I;X)_A = \sum_{i,x} p_A(i,x) \log \left( \frac{p_A(i,x)}{p(i)\, p_A(x)} \right) \quad (7)$$
$$= \sum_i p(i) \cdot D_{\mathrm{KL}}(p_A(x|i)\,\|\,p_A(x)) = \langle D_{\mathrm{KL}}(p_A(x|i)\,\|\,p_A(x)) \rangle_i. \quad (8)$$

In other words, the payoff-function is set equal to the difference in self-information between a posterior probability $p_A(x|i)$ and a marginal probability $p_A(x)$. More intuitively, $\langle F_A \rangle$ could be thought of as quantifying the information that is present in the asymptotic states of the perceptual network about the initial state. The maximum value of the mutual information is bounded by the entropy $H(I)$ of the input:

$$\max(\mathrm{I}(I;X)) \leq \max(H(I)). \quad (9)$$

This bound is attained whenever the perceptual rule leads to a one-to-one mapping from inputs to perceptions. In any realistic setting (e.g.
where the space of inputs largely exceeds the space of percepts), this will likely not be the case, and we have to relate input and perception probabilistically and optimize Eq. (6) instead.

Already a noisy input would force us to regard this as a stochastic problem, which will generally lower the maximal value of mutual information achievable in the network. To each sensation there exist several perceptual representations, which all have certain posterior probabilities, which are specified by the network's evolution under noise (simply speaking, “noise” is taken to lead to a perceptual “misrepresentation” of the input).

On this account, the “goal” of perception is being able to distinguish as well as possible between different inputs. Perception, thus understood, amounts to the ability to recognize differences in the world (or, more generally: perception is the ability to recognize “differences that make a difference” [10] for fitness). Finding a set of rules defined for individual nodes which maximizes the mutual information between perception and input of the network is the “perceptual problem” that needs to be solved by our model. For this, we can borrow techniques to search through the rule-set using algorithms from machine learning. One possibility is to use evolutionary programming techniques such as genetic algorithms [11], as discussed below. (The formulation of the problem makes it in principle amenable to optimization techniques involving calculating gradients of a “free-energy” functional, e.g. [12]. However, in the more general setup where arbitrary (non-differentiable and non-continuous) payoff functions are used, such methods are no longer straightforwardly applicable.)

A preliminary experiment
We tested the performance of a perceptual network with $N = 16$ nodes for 3-bit inputs distributed on the first 3 nodes. The system was initialized in the state $X = (i[1], i[2], i[3], 0, \dots, 0)$ and was evolved according to a set of rules $\{R_j\}$ on a fixed topology $A$. The attracting sets were recorded for each input. This allowed us to construct a conditional probability $p(x|i)$ defined on the space of inputs, which later informed mutual information (i.e. fitness payoffs).

For each input, we introduced uncertainty by including a probability ($p = 0.05$) to flip an input bit during the initialization step. We then used a genetic algorithm (GA) to determine the best set of rules in terms of fitness. Each rule is specified by the integer values $\mu_1$ and $\mu_2$ (see Eq. (10) below), which thus defined the “genes” of a perceptual network. We evolved the network for 50 generations, each comprising 25 individuals. The best 20% were kept ($\lambda = 0.2$), with a small mutation probability ($p = 0.01$) of randomly shifting the values of $\mu_1$ and $\mu_2$ by $\pm 1$. To make our model biologically more plausible, we assumed a refractory period of 1, which means that any node, after it has been excited, cannot be excited immediately in the next round.

We chose a rule that assigns the next state of a node solely dependent on the number of messages it receives, independent of source or any other statistical property. More concretely, we assume a rule of the following type:

$$x'_j = \begin{cases} 1 & \text{if } m_j \in [\mu_1, \mu_2] \text{ and } x_j = 0 \\ 0 & \text{else,} \end{cases} \quad (10)$$

with $1 \leq \mu_1 \leq \deg(j)$ and $\mu_1 \leq \mu_2 \leq \deg(j)$, where the messages $m_j$ are computed as $m_j = (A\vec{x})_j$ (cf. the simple example in section 1.2). Any node thus receives a message which is the sum of all states of the adjacent nodes. In total there are $\frac{1}{2}\deg(j)[\deg(j) + 1]$ possible rules per node with degree $\deg(j)$. If we consider a network of size $N$, the total number of possible rule sets is

$$N_R = \prod_{j=1}^{N} \frac{\deg(j)(\deg(j)+1)}{2} \approx \left( \frac{\langle \deg \rangle^2}{2} \right)^N. \quad (11)$$

(Assuming that each node has approximately the same average degree $\langle \deg \rangle$.) So, even though the rule expressed in Eq. (10) is quite simple, the rule space scales exponentially with the size of the network.

In this investigation, we optimized the set of rules $\{R_j\}$ for a fixed network topology which resembled either (i) a 4-lattice, (ii) a scale-free (SF) network constructed using the Barabási-Albert method [13], or (iii) a complete graph where any node is connected to any other node. We also tested our results against randomly generated (Erdős-Rényi) networks with sparse and dense degree distributions.

We compared the evolved rule-set to a variant of a majority rule where each node transitions from 0 to 1 if and only if at least half of its neighbors are in state 1, else it goes to 0.
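The interval rule of Eq. (10) and the rule-space count of Eq. (11) are straightforward to sketch. The following is a schematic reimplementation, not the repository code; note that the refractory period enters only through the $x_j = 0$ condition:

```python
import numpy as np

# Interval-threshold rule of Eq. (10); the refractory period is the
# requirement that a node can only fire from the silent state x_j = 0.
def update(x, A, mu1, mu2):
    m = A @ x                                 # messages m_j = (A x)_j
    fire = (m >= mu1) & (m <= mu2) & (x == 0)
    return fire.astype(int)

def rule_space_size(degrees):
    """Product of deg(j)(deg(j)+1)/2 over all nodes, cf. Eq. (11)."""
    total = 1
    for d in degrees:
        total *= d * (d + 1) // 2
    return total

# Complete graph on 4 nodes: every node has degree 3, so each node
# admits 3*4/2 = 6 interval rules and the network 6**4 = 1296 rule sets.
A = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
x = np.array([1, 0, 0, 0])
print(update(x, A, mu1=1, mu2=1))  # [0 1 1 1]
print(rule_space_size([3, 3, 3, 3]))  # 1296
```

In the genetic algorithm, per-node arrays for `mu1` and `mu2` would play the role of the “genes”; the scalar thresholds above are the special case of a homogeneous rule.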
The majority rule has previously been found to be very efficient in solving the “density classification task” for a cellular automaton when defined on small-world graphs [15]. The density classification task is highly global but involves only a single rule at the local level. It could thus be regarded as a benchmark test for local and parallelized computational architectures.

Note that one could rely on existing Python libraries for genetic algorithms (e.g. [16]) or use optimization methods other than evolutionary ones, e.g. graph neural networks [17]. We decided instead to implement a simple genetic algorithm by hand; our goal was sketching the conceptual ideas of perceptual networks.

Results for some exemplary networks are given in Table 2. The rules which evolved after 50 generations of the GA favor interface strategies [2], which generically lead to no structural similarity between input states and asymptotic states of the network. The perceptual states of the network do not mirror any structure in the input other than a probabilistic relationship given by the posterior $p(x|i)$, which informed the fitness payoff (mutual information).

By contrast, the majority rule mirrors the structure of the messages. More precisely, the majority rule says that the state of a node is a homomorphic representation of the messages it receives: the more content in the message, the higher the probability to undergo a state transition (in our implementation, either yes/no). While the majority rule led to satisfying solutions for particular topologies, it is generically not able to compete with a collection of interface rules $\{R_j\}$ on a random topology.

Different behaviors were observed for different topologies: SF networks lead to a quick emergence of “fit” strategies on a network with only a sparse degree distribution ($\langle \deg \rangle < \sqrt{N}$).
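The mutual-information payoff of Eqs. (7)-(8), which the fitness values above refer to, can be estimated from an empirical joint distribution over inputs and attracting sets. A minimal sketch (not the repository implementation; base-2 logarithms are an assumption here):

```python
import numpy as np

# Estimate I(I; X) from an empirical joint distribution over inputs
# (rows) and attracting sets (columns), as in Eqs. (7)-(8).
def mutual_information(joint):
    p = joint / joint.sum()
    pi = p.sum(axis=1, keepdims=True)   # marginal p(i)
    px = p.sum(axis=0, keepdims=True)   # marginal p_A(x)
    mask = p > 0                        # 0 log 0 := 0
    return float((p[mask] * np.log2(p[mask] / (pi @ px)[mask])).sum())

# A noiseless one-to-one mapping of 4 equiprobable inputs saturates
# the bound of Eq. (9): I(I; X) = H(I) = 2 bits.
print(mutual_information(np.eye(4)))  # 2.0
```

Conversely, a joint distribution in which attractors are independent of the input (e.g. `np.ones((2, 2))`) yields a mutual information of 0: such a network “perceives” nothing about its input.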
Good and fast-converging results have been obtained for complete graphs, due to the intrinsic refractory period of the rule. Since only a few combinations of rules lead to good results, convergence time has been small. However, we expect the estimated computational cost in realistic scenarios to be much higher for such topologies (due to the average degree of $\langle \deg \rangle \approx N$). SF networks thus promise to offer a good compromise between cost and fitness.

Similar results have been obtained for randomized graphs, where networks with sparse degree distributions behaved similarly to the SF networks and networks with dense degree distributions behaved similarly to complete networks. This indicates that the decisive factor is the average degree distribution. But effects for different placements of the initial nodes have been observed in the SF case (roughly: networks where input nodes lie in different communities of the graph behave optimally).

The results have not been assessed rigorously in terms of statistics and should only convey a preliminary sense of what is possible with this method of inquiry.

In the model outlined in the previous section, we have modeled perceptual inference in terms of strategies that compute the mutual information between input and asymptotic states of a network. In general, we found that the use of genetic algorithms leads to quick convergence on interface strategies, i.e. strategies that do not follow any fixed (non-evolvable) rule and which, generically, do not mirror the structure of the input. While we have optimized the rules of the individual nodes, the topology of the network was kept fixed.

Table 2: Results for some randomly initialized networks, defined on a fixed topology visualized in the left column (input states are highlighted in red). The perceptual rules for the fittest strategy after the genetic algorithm are displayed in the middle column by the thresholds $\mu_1$ and $\mu_2$ for each node.
Averaged values for mutual information (fitness) for the initial and final population after the GA, and results for the majority rule, are given in the right column (the maximally achievable value is 2.14). [The table compares a 4-lattice ($\langle \deg \rangle = 4$), an SF network, and a complete graph ($\langle \deg \rangle = 15$) on $N = 16$ nodes; the numerical threshold and mutual-information values are not recoverable here.]

References

[1] Prentner, R., and Fields, C. Using AI Methods to Evaluate a Minimal Model for Perception.
Open Philosophy, 2, 503-524.
[2] Hoffman, D. D., Prakash, C., and Singh, M. The Interface Theory of Perception. Psychonomic Bulletin and Review, 22, 1480-1506.
[3] Hoffman, D. D., and Prakash, C. Objects of consciousness. Frontiers in Psychology, 5:577.
[4] Fields, C., Hoffman, D. D., Prakash, C., and Singh, M. Conscious agent networks: Formal analysis and application to cognition. Cognitive Systems Research, 47, 186-213.
[5] Gebali, F. Analysis of Computer Networks, second edition. Cham: Springer.
[6] Palmer, S. E. Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
[7] Mamassian, P., Landy, M., and Maloney, L. T. Bayesian Modelling of Visual Perception. In Probabilistic Models of the Brain: Perception and Neural Function, edited by R. P. N. Rao et al., 13-36. Cambridge, MA: MIT Press.
[8] Mark, J. T., Marion, B. B., and Hoffman, D. D. Natural selection and veridical perceptions. Journal of Theoretical Biology, 266(4), 504-515.
[9] Prakash, C., Fields, C., Hoffman, D. D., Prentner, R., and Singh, M. Fact, Fiction, and Fitness. Entropy, 22(5), 514.
[10] Bateson, G. Mind and Nature: A Necessary Unity. New York: Dutton.
[11] Holland, J. H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Cambridge, MA: MIT Press.
[12] Friston, K. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
[13] Albert, R., and Barabási, A.-L. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47-97.
[14] Das, R., Mitchell, M., and Crutchfield, J. A genetic algorithm discovers particle-based computation in cellular automata. In Parallel Problem Solving from Nature, edited by Davidor, Y. et al., 344-353. Berlin: Springer.
[15] Watts, D. J., and Strogatz, S. H. Collective dynamics of “small-world” networks. Nature, 393(6684), 440-442.
[16] Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A., Parizeau, M., and Gagné, C. DEAP: Evolutionary Algorithms Made Easy. Journal of Machine Learning Research, 13, 2171-2175. https://github.com/DEAP/deap (accessed September 17, 2020).
[17] Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The Graph Neural Network Model. IEEE Transactions on Neural Networks, 2009.