An information-theoretic approach to self-organisation: Emergence of complex interdependencies in coupled dynamical systems
Fernando Rosas, Pedro A.M. Mediano, Martin Ugarte, Henrik J. Jensen
Fernando Rosas *, Pedro A.M. Mediano, Martin Ugarte and Henrik J. Jensen
Centre of Complexity Science and Department of Mathematics, Imperial College London, UK; Department of Electrical and Electronic Engineering, Imperial College London, UK; Department of Computing, Imperial College London, UK; CoDE Department, Université Libre de Bruxelles, Belgium; Institute of Innovative Research, Tokyo Institute of Technology, Japan
* Correspondence: [email protected]; Tel.: +44 (0)20 7589 5111
Abstract:
Self-organisation lies at the core of fundamental but still unresolved scientific questions, and holds the promise of de-centralised paradigms crucial for future technological developments. While self-organising processes have been traditionally explained by the tendency of dynamical systems to evolve towards specific configurations, or attractors, we see self-organisation as a consequence of the interdependencies that those attractors induce. Building on this intuition, in this work we develop a theoretical framework for understanding and quantifying self-organisation based on coupled dynamical systems and multivariate information theory. We propose a metric of global structural strength that identifies when self-organisation appears, and a multi-layered decomposition that explains the emergent structure in terms of redundant and synergistic interdependencies. We illustrate our framework on elementary cellular automata, showing how it can detect and characterise the emergence of complex structures.
Keywords:
Self-organisation; multivariate information theory; coupled dynamical systems; partial information decomposition; high-order correlations; multi-layer complexity
1. Introduction
It is fascinating how some systems acquire organisation spontaneously, evolving from less to more organised configurations in the absence of centralised control or an external driver. In a world constricted by the second law of thermodynamics and driven by “no free lunch” principles, self-organisation phenomena dazzle us by creating structure seemingly out of nowhere. Besides this aesthetic dimension, self-organisation plays a key role at the core of out-of-equilibrium statistical physics [1], developmental biology [2], and neuroscience [3]. Additionally, self-organisation serves as inspiration for new paradigms of de-centralised organisation where order is established spontaneously without relying on an all-knowing architect or a predefined plan, such as with the Internet of Things [4,5] and blockchain technologies [6]. In this context, self-organisation is regarded as an attractive principle for enabling robustness, adaptability and scalability in the design and management of large-scale complex networks [7–9].

Originally, the notion of self-organisation was introduced in the field of cybernetics [10,11]. These seminal ideas quickly propagated to almost all branches of science, including physics [1,12], biology [2,13], computer science [14,15], language analysis [16,17], network management [18,19], behavioral analysis [20,21] and neuroscience [22,23]. Despite this success, most working definitions of self-organisation still avoid formal definitions and rely on intuitions following an “I know it when I see it” logic, which might eventually prevent further systematic developments [24]. Formulating formal definitions of self-organisation is challenging, partly because self-organisation has been used in diverse contexts and with different purposes [25], and partly due to the fact that the basic notions of “self” and “organisation” are already problematic themselves [26]. The absence of an agreed formal definition, combined with the relevance of this notion for scientific and technological advances, generates a need for further explorations about the principles of self-organisation.
In the spirit of Reference [27], we explore to what extent an information-theoretic perspective can illuminate the inner workings of self-organising processes. Due to the connections between information theory and thermodynamics [28,29], our approach can be seen as an extension of previous works that relate self-organisation and statistical physics (see e.g. [30–32]). In previous research, self-organisation has been associated with a reduction in the system’s entropy [30,33,34] – in contrast, we argue that entropy reduction alone is not a robust predictor of self-organisation, and additional metrics are required. This work establishes a way of understanding self-organising processes that is consistent with the Bayesian interpretation of information theory, as described in Reference [28]. One contribution of our approach is to characterise self-organising processes using multivariate information-theoretic tools – or, put differently, to provide a more fine-grained description of the underlying phenomena behind entropy reduction. We propose that self-organising processes are driven by the spontaneous creation of interdependencies, while the reduction of entropy is a mere side effect of this. Following this rationale, we propose the binding information [35] as a metric of the strength of the interdependencies in out-of-equilibrium dynamical systems.

Another contribution of our framework is to propose a multi-layered metric of organisation, which combines quantitative and qualitative aspects. Most proposed metrics of organisation in the field of complex systems try to map the whole richness of possible structures into a single dimension [36]. In contrast, drawing inspiration from theoretical neuroscience [37,38], we put forward a multi-dimensional framework that allows for a finer and more subtle taxonomy of self-organising systems. Our framework builds on ideas based on the
Partial Information Decomposition (PID) framework [39], which distinguishes various information sharing modes in which the binding information is distributed across the system. This fundamental distinction overcomes counterintuitive issues of existing multiscale metrics for structural complexity, such as the one reported in References [40,41], including negative information values that do not have operational meaning.

A final contribution of this work is to establish a novel connection between information theory and dynamical systems. The standard bridge between these two disciplines includes symbolic dynamics, the Kolmogorov-Sinai entropy, Rényi dimensions and related concepts [42]. In contrast, in this paper we propose to apply information-theoretic analyses over the statistics induced by invariant measures over the attractors. In this way, attractors can be seen as statistical structures that generate interdependencies between the system’s coordinates. This statistical perspective enriches standard analyses of attractors based on fractal dimensions and other geometrical concepts.

The rest of this paper is structured as follows. First, Section 2 briefly introduces the key ideas of this work. Then, Section 3 discusses fundamental aspects of the definition of self-organisation and coupled dynamical systems. Section 4 presents the core ideas of our information-theoretic approach, which are then developed quantitatively in Section 5. Our framework is illustrated in Section 6 with an application to elementary cellular automata. Finally, Section 7 discusses our findings and summarises our main conclusions.
2. Key Intuitions
This section introduces the key ideas of our framework in an intuitive fashion. These ideas are made rigorous in the following sections.
The overall configuration of a group of agents can be represented by a probability distribution over the set of their possible configurations. Independent agents who are maximally random are characterised by “flat” distributions (technically, distributions that satisfy the maximum entropy principle [28]). The temporal evolution of the system then “shapes” this distribution, in the same way as a sculptor shapes a flat piece of marble into a sculpture (c.f. Figure 1). In our view, the shape of the resulting distribution encodes the key properties that emerge from the temporal dynamics of the system, and a substantial part of our framework is to provide tools to measure and describe various types of sculptures. Importantly, just as the sculptor reveals a figure by removing the superfluous marble that is covering it, the temporal evolution generates interdependencies not by adding anything but by reducing the randomness/entropy of the system.
Consider two agents with random and uncorrelated initial states, as in the analogy above. Their joint entropy, which quantifies their collective randomness, can be depicted as two circles, the size of each circle being proportional to how random the corresponding agent is (c.f. Figure 1). The circles are shown disjoint to reflect the fact that the agents are initially uncorrelated. From this initial situation, there are two qualitatively different ways in which their joint entropy can decrease: the state of each agent could become less random in time, while their independency is preserved; or the agents could become correlated while their individual randomness is preserved. Although both cases show overall entropy reduction, one needs to distinguish finer features of the shape of the resulting distribution to discriminate between genuine self-organisation in the latter scenario and mere dissipation in the former.

Figure 1.
Left:
Two maps corresponding to two dynamical systems, denoted by Φ and Φ′, seen as a sculptor who takes away “superfluous” entropy/marble to let structures appear from inside. The figure shows The Atlas and
The Bearded Slave (circa 1525–30) by Michelangelo Buonarroti, who was famous for letting his figures emerge from the marble “as though surfacing from a pool of water” [43] (pictures taken from commons.wikimedia.org).
Right:
Likewise, the joint entropy of two (or more) agents could decrease either because they become less random individually (Φ′), or because they become correlated (Φ). In this article we provide tools to measure how self-organising systems shape distributions as entropy is reduced – or marble is carved out – from the initial state.
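To make the distinction between these two scenarios concrete, the following minimal Python sketch computes the joint entropy and mutual information of two binary agents in the initial, dissipative (Φ′) and correlating (Φ) cases. The specific distributions are hand-picked illustrative choices rather than the output of any particular dynamical system.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def joint_quantities(pxy):
    """Joint entropy H(X,Y) and mutual information I(X;Y) of a 2x2 joint distribution."""
    pxy = np.asarray(pxy, dtype=float)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    H = entropy(pxy.ravel())
    I = entropy(px) + entropy(py) - H
    return H, I

# Initial state: two independent, maximally random binary agents.
p0 = np.full((2, 2), 0.25)

# Scenario Phi': each agent relaxes towards state 0 independently (mere dissipation).
p_dissipation = np.array([[0.81, 0.09],
                          [0.09, 0.01]])   # X and Y independent, each with P(0) = 0.9

# Scenario Phi: the agents become exact copies of each other (correlation).
p_correlation = np.array([[0.5, 0.0],
                          [0.0, 0.5]])

for name, p in [("initial", p0), ("dissipation", p_dissipation), ("correlation", p_correlation)]:
    H, I = joint_quantities(p)
    print(f"{name:12s}  H(X,Y) = {H:.2f} bits   I(X;Y) = {I:.2f} bits")
```

Both evolutions reduce the joint entropy, but only the second creates mutual information, which is precisely the distinction that the framework developed below formalises.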
3. The Goal and Constraints of Self-Organisation
Which dynamical properties enable agents to self-organise? Beyond superficial differences, most studies agree that proper self-organisation requires three fundamental principles to hold:

(i) Global structure: the system evolves from less to more structured collective configurations.
(ii) Autonomy: agents evolve in the absence of external guidance.
(iii) Horizontality: no single agent can determine the evolution of a large number of other agents.

The principles of autonomy and horizontality constitute constraints, in the sense that a system that is not autonomous or horizontal cannot be called self-organising. Conversely, the principle of global structure is closer to a goal to be achieved. Hence, one could reformulate the above definition of self-organisation as the following optimisation problem:

generate Global structure subject to Autonomy and horizontality. (1)

The following subsections provide a formalisation of these three fundamental principles.
An elegant way to formalise these ideas is provided by the literature on coupled dynamical systems. Loosely speaking, a dynamical system is a process that evolves in time, such that its present configuration determines its future evolution following a deterministic rule [44]. Differential equations and finite difference equations are examples of dynamical systems. Furthermore, a collection of dynamical systems are said to be coupled if the future state of each process is affected not only by its own state but also by the state of other processes.

Let us consider a system composed of N parts or subsystems, which we call “agents,” adopting the terminology from the robotics and multi-agent systems literature. However, these agents could equally correspond to different coordinates of the spatial movement of a single entity [45], or to sub-systems of heterogeneous nature. The set of possible states for the k-th agent is denoted as Ω_k, and hence the set of possible configurations of the system is Ω := ∏_{k=1}^N Ω_k, henceforth called “phase space.” The configuration of the system at time t ∈ T ⊂ [0, ∞) is determined by the vector x_t = (x_t^1, ..., x_t^N) ∈ Ω, where x_t^k ∈ Ω_k is the corresponding state of the k-th agent and T is a collection of time indices. By assuming that the agents constitute coupled dynamical systems, the evolution of the group of agents is determined by a collection of maps {φ_t^{(h)}} with h ≥
0, where φ_t^{(h)}: Ω → Ω drives the evolution of the system such that x_{h+t} = φ_t^{(h)}(x_h). Intuitively, h corresponds to an initial time and t is the length of the evolution process.

Please note that the choice of deterministic coupled dynamical systems as the basis of our framework has been made for simplicity of presentation. The generalisation of our ideas and methods to stochastic dynamics is straightforward.

We now discuss aspects of the formalisation of (1) based on the language of dynamical systems.

3.2.1. Autonomy

Intuitively, we say that a system is autonomous if it has no architect or “mastermind” controlling its evolution from the outside. Using the dynamical systems language introduced above, we can readily define necessary conditions for the autonomy of a system: we say that a system is autonomous if the collection of maps {φ_t^{(h)}} is time-invariant – i.e. if its temporal evolution looks the same independently of the initial time h. Technically, autonomy requires that φ_t^{(h_1)}(x_{h_1}) = φ_t^{(h_2)}(x_{h_2}) for any h_1, h_2 ∈ T and x_{h_1} = x_{h_2}. This symmetry ensures that there is no organising influence guiding the system from outside. In the rest of this manuscript time translation symmetry is assumed, which allows us to disregard the starting time and drop the superscript (h), using φ_t as a shorthand notation. Additionally, autonomous systems allow simple descriptions: thanks to the property φ_{t_1}(φ_{t_2}(x)) = φ_{t_1+t_2}(x), autonomous evolutions in discrete time are characterised by the single mapping φ := φ_1, since φ_n = (φ_1)^n, while autonomous evolutions in continuous time can be characterised by a vector field or a set of time-invariant differential equations.

3.2.2. Horizontality and Locality

Locality: agents can only interact with a small number of other agents.

Locality is a sufficient condition for horizontality, since if no agent can interact with many other agents then the direct influence of each agent is limited. Conveniently, locality can be elegantly addressed within the framework of coupled dynamical systems. To do this, let us first introduce the notation φ_t^k for the k-th coordinate of the map φ_t, i.e. φ_t(x) = (φ_t^1(x), ..., φ_t^N(x)). Then, one can define the interaction network between agents as follows: there exists a link from agent i to agent j if φ_t^j(x) is affected by changes in the values of x^i, the i-th coordinate of x. These directed networks can be encoded by an N × N adjacency matrix A = [a_ij], where a_ij = 1 if the i-th agent is connected with the j-th agent and zero otherwise. Locality is, hence, equivalent to A having sparse rows, imposing a fixed bound restricting the number of non-zero entries in each row. In the following we assume locality, and leave the formalisation of horizontality for future work.

3.2.3. Structure

One of the biggest challenges in the formalisation of self-organisation is to address the notion of structure. A large portion of the literature employs this concept without developing a formal definition of it, relying only on intuitive understanding. Furthermore, authors from different fields point towards this same intuition using related but different concepts, including global behaviour, organisation, coordination, or pattern.

Existing approaches to attempt a formalisation of the notion of structure use either attractors, or minimal description length and Kolmogorov complexity. These approaches, and their drawbacks, are discussed in Appendix A. Our own approach, which relies on multivariate information theory, is presented in the next Section.
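To make the autonomy and locality conditions concrete, the sketch below encodes a time-invariant local map on binary agents and its interaction matrix A. The update rule and the ring network are arbitrary toy choices used only for illustration; they are not taken from the systems studied later in the paper.

```python
import numpy as np

N = 5  # number of agents, each with a binary state

def phi(x):
    """Autonomous (time-invariant) update map phi: Omega -> Omega.
    Illustrative local rule: each agent XORs its state with its right neighbour."""
    x = np.asarray(x)
    return x ^ np.roll(x, -1)

# Interaction matrix A = [a_ij]: a_ij = 1 if there is a link from agent i to agent j,
# i.e. if phi^j depends on x^i. Here agent j depends on x^j and x^{j+1}.
A = np.zeros((N, N), dtype=int)
for j in range(N):
    A[j, j] = 1
    A[(j + 1) % N, j] = 1

print("non-zero entries per row of A:", A.sum(axis=1).tolist())  # bounded by 2 -> locality holds
x0 = np.array([1, 0, 1, 1, 0])
print("x_0                :", x0.tolist())
print("x_1 = phi(x_0)     :", phi(x0).tolist())
print("x_2 = phi(phi(x_0)):", phi(phi(x0)).tolist())  # autonomy: phi_2 = phi o phi
```

Every row of A has at most two non-zero entries, so this toy map satisfies the locality requirement while its time-invariance guarantees autonomy.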
4. Structure as Multi-Layered Statistical Interdependency
This section introduces our framework to study the emergence of structure in coupled dynamical systems. The key idea in our approach is to understand structure as statistical interdependency and, hence, to regard patterns as deviations from statistical independence, i.e. as interdependent random variables. As argued below, these statistical interdependencies are best described using tools from multivariate information theory. Adopting an information-theoretic perspective requires a step of abstraction, namely to place the analysis not on trajectories but on ensembles, as explained in Section 4.1. Then, Section 4.2 explores the relationship between the dynamics of the joint Shannon entropy and the increase of statistical interdependency. This discussion is further developed by introducing a decomposition of the Shannon entropy in Sections 4.3 and 4.4. For simplicity of exposition, in the rest of the paper we focus on the case of a discrete phase space Ω. However, most of our results still hold for continuous dissipative systems.

Traditionally, the study of dynamical systems is fundamentally built on how individual trajectories explore the space of possible system configurations. However, the information-theoretic perspective works not over trajectories but over ensembles (i.e. probability distributions); as a matter of fact, the measure-theoretic objects that are most studied within dynamical systems theory (namely, invariant measures [46]) are distributions derived from mean values over trajectories. Moreover, associating entropy values to individual trajectories is usually problematic, as it involves a number of ad-hoc (and often unacknowledged) assumptions: technically speaking, a sequence of symbols in isolation has no Shannon entropy or mutual information because it involves no uncertainty, and the common practice of assigning an entropy value via a stochastic model that most likely generated the sequence relies on strong assumptions (e.g. ergodicity, or independence of successive symbols) which might not hold in practice (for the treatment of this issue by stochastic thermodynamics, see References [47,48]). We make our assumptions explicit, and develop our analysis on an ensemble of systems initialised with stochastic initial conditions. The technicalities behind this approach are developed in the sequel.

Let us consider the case where the initial condition of the system is not a particular configuration x_0 ∈ Ω, but an ensemble of configurations described by a probability distribution µ_0. Interestingly, the map φ_t not only induces a dynamic on the space of configurations Ω, but it also induces a dynamic on the space of all probability distributions over Ω, denoted as M(Ω). Note that there exists a subset of M(Ω) that is isomorphic to Ω, namely the set of distributions of the form {µ_x = δ_x | x ∈ Ω}; therefore, it is consistent to call M(Ω) a generalised state space, which corresponds to the notion of “state” used in quantum mechanics [50]. Consider, as an example, the discrete distribution µ_0 = ∑_{j=1}^∞ c_j δ_{x_j}, where δ_{x_j} is the Dirac delta (or the Kronecker delta if Ω is discrete). For this measure, the probability of a subset of configurations O ⊂ Ω is calculated as µ_0(O) = ∑_{j=1}^∞ c_j 1_{x_j}(O), where 1_{x_j}(O) = 1 if x_j ∈ O and zero otherwise. A natural time-evolution of this probability distribution is given by µ_t = ∑_{j=1}^∞ c_j δ_{φ_t(x_j)}. One can generalise this construction for an arbitrary initial probability distribution µ_0 by introducing the Frobenius-Perron operator [49], which is an operator over M(Ω) defined as

Φ_t{µ}(O) := µ(φ_t^{-1}(O)) = µ({x ∈ Ω | φ_t(x) ∈ O}). (2)

Note that the collection {Φ_t{·}, t ∈ T} generates a dynamic over M(Ω), and hence constitutes a new dynamical system.
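For a finite phase space, the Frobenius-Perron operator of Equation (2) simply transports probability mass along the map. The following sketch pushes an initial ensemble µ_0 forward to µ_t; it assumes, purely for illustration, the same toy XOR map used in the earlier sketch.

```python
import itertools
import numpy as np

N = 3
states = list(itertools.product([0, 1], repeat=N))   # the finite phase space Omega
index = {s: i for i, s in enumerate(states)}

def phi(x):
    """Illustrative deterministic map on Omega (local XOR rule, an arbitrary choice)."""
    x = np.asarray(x)
    return tuple(int(v) for v in x ^ np.roll(x, -1))

def frobenius_perron(mu):
    """One application of Eq. (2): (Phi_1{mu})(y) = sum of mu(x) over all x with phi(x) = y."""
    mu_next = np.zeros_like(mu)
    for s, p in zip(states, mu):
        mu_next[index[phi(s)]] += p
    return mu_next

mu = np.full(len(states), 1.0 / len(states))   # maximum-entropy initial ensemble mu_0
for t in range(4):
    print(f"mu_{t}:", np.round(mu, 3))
    mu = frobenius_perron(mu)
```

Because the map is deterministic, probability mass is never split, only merged; this is the mechanism behind the entropy reduction discussed next.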
The set of probability distributions {µ_t = Φ_t{µ_0}, t ∈ T} induces a corresponding multivariate stochastic process X_t = (X_t^1, ..., X_t^N) = φ_t(X_0), which follows a joint probability distribution p_{X_t} = µ_t (for the complete statistics of X_t and technical details of this correspondence, see Appendix B). Note that the properties of this stochastic process are completely determined by the initial distribution µ_0 and the map φ_t. Each sub-process X_t^k describes the uncertainty related to the state of the agent k at time t, the statistics of which are found by marginalising the joint statistics of p_{X_t}. The aim of the next subsections is to explore the statistical interdependencies that can exist among these sub-processes.

The joint Shannon entropy of the system at time t, given by H(X_t) := −∑_{x∈Ω} p_{X_t}(x) log p_{X_t}(x), corresponds to the information required to resolve the uncertainty about the state of the system at time t (see Appendix C). The uncertainty reflected by this entropy has two sources [51]. One source is stochasticity in the initial condition, i.e. when the initial configuration of the system at time t = 0 is not perfectly known. The other source is stochasticity in the transitions, i.e. when the system can potentially transit from a single starting configuration to two or more different future configurations. Dynamical systems have deterministic transitions, and hence only exhibit the first type of uncertainty.

When considering discrete phase spaces, the deterministic dynamics guarantee that the uncertainty due to random initial conditions cannot increase; it can only decrease or be conserved. As a simple example, let us consider a dynamical system with a single point attractor: even if one does not know where a trajectory starts, one knows that the trajectory ends in the attracting point. In this case, any information encoded in the initial condition is erased by the dynamics, as one cannot find out where trajectories are coming from. We call this phenomenon “information dissipation,” which mathematically can be stated as

H(X_t) ≥ H(X_{t+h}) for all h > 0. (3)

Information dissipation is intimately related to the attractors of the system. Given an attractor A, its basin of attraction B(A) is the largest subset of Ω such that lim_{t→∞} φ_t(x) = A for all x ∈ B(A). Intuitively, any trajectory starting in B(A) asymptotically runs into A. Similarly, the evolution of an initial distribution µ_0 supported on B(A) eventually ends up being supported almost only on A when t is large enough; correspondingly, its Shannon entropy tends to decrease due to the reduced portion of the phase space where the system is confined to dwell. As such, information dissipation (i.e.
entropy decreasing due to the action of attractors) is a necessary condition for self-organisation.

It is tempting to postulate entropy reduction as a strong indicator of self-organisation, based on a loose interpretation of entropy as a metric of disorder. However, the relationship between entropy and disorder is problematic, as disorder has different meanings in various contexts and there exists no single widely accepted definition for it. Moreover, entropy reduction is not a sufficient condition for self-organisation [24]. For example, consider a group of uncoupled damped oscillators initialised with random initial positions and velocities. This system evolves towards the resting state where all velocities are zero, which is the only point attractor of the system – thereby reducing its entropy to zero. However, one would not want to call this an evolution that promotes self-organisation, as the agents never engage in any interaction.

A key idea that emerges from the previous discussion is to relate organisation with agent interdependency. Following this rationale, we propose that self-organisation is related to the increase of interdependency between the agents due to the dynamics. To formalise this intuition, we explore a decomposition of the total entropy into two parts: one that quantifies interaction and one that measures uncorrelated variability.

To introduce the decomposition, let us first consider the following identity:

H(X_t^j) = I(X_t^j; X_t^{-j}) + H(X_t^j | X_t^{-j}), (4)

where we are using the shorthand notation X_t^{-j} = (X_t^1, ..., X_t^{j-1}, X_t^{j+1}, ..., X_t^N), and I(·;·) is the standard Shannon mutual information. This equality states that the entropy of the state of the j-th agent, as quantified by H(X_t^j), can be decomposed into a part that is shared with the other agents, I(X_t^j; X_t^{-j}), and a part that is not, H(X_t^j | X_t^{-j}). This intuition is made rigorous by the Slepian-Wolf coding scheme [55,56], which shows that I(X_t^j; X_t^{-j}) corresponds to information about the j-th agent that can be retrieved by measuring other agents, while H(X_t^j | X_t^{-j}) is information that can only be retrieved by measuring the j-th agent.

Following the above rationale, the total “non-shared information” in the system is nothing more than the sum of the non-shared information of every agent, and corresponds to the residual entropy [35]:

R(X_t) := ∑_{j=1}^N H(X_t^j | X_t^{-j}). (5)

One can verify that the agents are statistically independent at time t if and only if H(X_t) = R(X_t). The complement of the residual entropy corresponds to the binding information [57], which quantifies the part of the joint Shannon entropy that is shared among two or more agents. This can be computed as

B(X_t) := H(X_t) − R(X_t) = H(X_t) − ∑_{j=1}^N H(X_t^j | X_t^{-j}). (6)

Note that the above formula corresponds to a multivariate generalisation of the information-theoretic identity I(X;Y) = H(X,Y) − H(X|Y) − H(Y|X), and captures linear and non-linear dependencies that might exist between two or more agents. As such, the binding information is one of several multivariate generalisations of the mutual information, and is the only one known to enable a non-negative decomposition of the joint entropy [58].

In summary, the binding information provides a natural metric of the strength of the statistical interdependencies within a system.
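Equations (5) and (6) can be evaluated directly for any small discrete system by marginalising the joint distribution. The sketch below is a minimal implementation, under the assumption that the joint distribution is stored as an array with one axis per agent; the XOR test case anticipates Example 1(iii) below.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of an arbitrary-shape array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def residual_and_binding(p_joint):
    """Residual entropy R(X_t) (Eq. 5) and binding information B(X_t) (Eq. 6).

    p_joint has shape (|Omega_1|, ..., |Omega_N|), one axis per agent.
    Uses the identity H(X^j | X^{-j}) = H(X) - H(X^{-j}).
    """
    H_joint = entropy(p_joint)
    R = sum(H_joint - entropy(p_joint.sum(axis=j)) for j in range(p_joint.ndim))
    return R, H_joint - R

# Quick check: three binary agents, the third being the XOR of the first two.
p = np.zeros((2, 2, 2))
for u in (0, 1):
    for v in (0, 1):
        p[u, v, u ^ v] = 0.25

R, B = residual_and_binding(p)
print(f"R = {R:.2f} bits, B = {B:.2f} bits")   # expected: R = 0, B = 2
```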
In fact, this metric is consistent with the intuition that a faithful metric of organisational richness should be small for systems with maximal or minimal joint entropy (see Reference [59] and references therein). On the one hand, maximal entropy takes place when agents are independent, which implies that H(X_t) = R(X_t) and hence B(X_t) = 0. On the other hand, minimal (i.e. zero) joint entropy also forces B(X_t) = 0, since B(X_t) ≤ H(X_t). Moreover, while the dynamics force H(X_t) to be non-increasing, both B(X_t) and R(X_t) can increase or decrease. In contrast with the entropy, an increase in binding information is an unequivocal sign that statistical structures are being generated within the system by its temporal evolution.

Although the binding information provides an attractive information-theoretic metric of organisation strength, a one-dimensional description is not rich enough to describe the range of phenomena observed in self-organising agents. To obtain a more detailed picture we use the Partial Information Decomposition (PID) framework, which allows us to develop a finer decomposition of the binding information and distinguish between different modes of information sharing. Originally, PID was introduced to study various aspects of information-theoretic inference, which consider a target variable predicted using the information provided by a number of information sources (see References [39,60–62] and references therein). A key intuition introduced by these works is to distinguish between various information modes: in particular, redundant information corresponds to information about the target variable that can be retrieved from more than one source, and synergistic information corresponds to information that becomes available only when two or more sources are accessed simultaneously.

Traditional PID approaches divide the variables between target and sources, each of them having a very different role in the framework. Nevertheless, it is possible to propose symmetric decompositions of the joint Shannon entropy using PID principles that avoid these dialectic labellings [58,63,64]. In this case, the total information encoded in the system’s configuration is decomposed into redundant, unique and synergistic components. Redundancy takes place when measuring a single agent allows the observer to predict the state of other agents. Synergy corresponds to high-order statistical effects that can constrain groups of variables without imposing low-order restrictions. This idea of synergistic information is a generalisation of the well-known fact that random variables can be pairwise independent while being jointly interdependent. The relationship between synergistic information and high-order correlations in the context of statistical physics has been explored in References [58,60].

The work reported in [58] describes a decomposition of the binding information for the case of systems of N = 3 variables, which can be extended to systems of arbitrary size as

B(X_t) = ∑_{n=2}^N b_n(X_t), (7)

where b_n(X_t) measures the portion of the binding information that is shared among exactly n agents. The index n refers to the number of agents that are linked by the corresponding relationship. Therefore, b_n(X_t) quantifies the strength of interdependencies that link groups of n agents.

To illustrate these ideas, let us explore some simple examples where this decomposition can be computed directly from our desiderata.

Example 1.
Consider two independent Bernoulli random variables U and V with parameter p = 1/2 (i.e. H(U) = H(V) = 1 bit). Then,

(i) If (X_t^1, X_t^2, X_t^3) = (U, U, U), then R(X_t) = 0 and B(X_t) = H(U). Furthermore, because of the triple identity b_3(X_t) = H(U), and hence b_2(X_t) = 0.

(ii) If (X_t^1, X_t^2, X_t^3) = (U, U, V), then R(X_t) = H(V) and B(X_t) = H(U). In this case, b_2(X_t) = H(U) and hence b_3(X_t) = 0.

(iii) If (X_t^1, X_t^2, X_t^3) = (U, V, U xor V), then R(X_t) = 0 and B(X_t) = H(U) + H(V). Furthermore, due to the triple interdependency b_3(X_t) = H(U) + H(V), and hence b_2(X_t) = 0.

Each term b_n(X_t) can, in turn, be decomposed as

b_n(X_t) = ∑_{i=1}^{n-1} I_i^n(X_t), (8)

where I_i^n(X_t) denotes information that is shared between n agents, and becomes fully available after accessing i < n of the agents involved in the sharing. In other words, i is the smallest number of agents that enables the use of the information that corresponds to I_i^n(X_t) for predicting the state of the remaining n − i agents. Note that the use of superscripts and subscripts differentiates between group sizes and the order of the sharing mode. This decomposition introduces a range of (i, n)-interdependencies, where n is the extension of the interdependency (how many agents are involved) while i is the “degree of synergy.” With this notation, redundancies correspond to i = 1, while synergies correspond to I_i^n(t) with i ≥ 2.

Based on these ideas, another way of decomposing the binding information is by focusing on the possible information sharing modes, i.e. ways in which information can be shared among the agents according to i. By combining Equations (7) and (8), one can then present the following decomposition:

B(X_t) = ∑_{n=2}^N ∑_{i=1}^{n-1} I_i^n(X_t) = ∑_{i=1}^{N-1} m_i(X_t), (9)

where m_i(X_t) := ∑_{n=i+1}^N I_i^n(X_t) corresponds to information sharing modes that are fully accessed when measuring sets of i agents. In particular, m_1(X_t) collects all the “redundancies” of the system, i.e. sharing modes that are fully accessed by measuring only one of the agents involved in the sharing. Correspondingly, the terms m_i(X_t) for i ≥ 2 correspond to synergistic sharing modes of increasing order.

Example 2.
Consider U and V as defined in Example 1. Then,

(i) If (X_t^1, X_t^2, X_t^3) = (U, U, U), then m_1(X_t) = H(U), as the information contained in any variable allows to predict the others, while m_2(X_t) = 0.

(ii) If (X_t^1, X_t^2, X_t^3) = (U, U, V), then similarly as above m_1(X_t) = H(U) and m_2(X_t) = 0. Both cases are redundancies (same i) of dissimilar extension (different n).

(iii) If (X_t^1, X_t^2, X_t^3) = (U, V, U xor V), then measuring one agent does not allow any predictions over the others, while by measuring two agents one can predict the third one. This implies that m_2(X_t) = H(U) + H(V), and hence m_1(X_t) = 0.

For a discussion of the statistical properties of the xor, please see Reference [58].
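The values stated in Examples 1 and 2 can be checked numerically. The sketch below recomputes R, B and all pairwise mutual informations for the three toy systems; the helper functions repeat those of the earlier sketch, and the case labels are the ones used in the examples.

```python
import numpy as np
from itertools import product

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def residual_and_binding(p_joint):
    H = entropy(p_joint)
    R = sum(H - entropy(p_joint.sum(axis=j)) for j in range(p_joint.ndim))
    return R, H - R

def pairwise_mi(p_joint, i, j):
    """I(X^i ; X^j) in bits, with i < j."""
    rest = tuple(k for k in range(p_joint.ndim) if k not in (i, j))
    p_ij = p_joint.sum(axis=rest)
    return entropy(p_ij.sum(axis=1)) + entropy(p_ij.sum(axis=0)) - entropy(p_ij)

def joint_from_map(f):
    """Joint distribution of (X^1, X^2, X^3) = f(U, V) for independent fair bits U, V."""
    p = np.zeros((2, 2, 2))
    for u, v in product((0, 1), repeat=2):
        p[f(u, v)] += 0.25
    return p

cases = {
    "(i)   (U, U, U)":       lambda u, v: (u, u, u),
    "(ii)  (U, U, V)":       lambda u, v: (u, u, v),
    "(iii) (U, V, U xor V)": lambda u, v: (u, v, u ^ v),
}

for name, f in cases.items():
    p = joint_from_map(f)
    R, B = residual_and_binding(p)
    mis = [round(pairwise_mi(p, i, j), 2) for i in range(3) for j in range(i + 1, 3)]
    print(f"{name}: R = {R:.2f}, B = {B:.2f}, pairwise MI = {mis}")
```

As expected, all pairwise mutual informations vanish in case (iii) even though B(X_t) = 2 bits, which is the signature of purely synergistic interdependency.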
5. A Quantitative Method to Study Time-Evolving Organisation
In this section we leverage the ideas discussed in Section 4 to develop a method to conduct a quantitative and qualitative analysis of self-organisation in dynamical systems. The goal of this method is twofold: to detect when self-organisation is taking place, and to characterise it as redundancy- or synergy-dominated. For this, Section 5.1 first develops upper and lower bounds for the terms of the decompositions of the binding information presented in Section 4.4. Then, Section 5.2 outlines a protocol of four steps that can be applied in practical scenarios.

Let us define α_L = (α_1, ..., α_L) to be a vector of L integer indices with 1 ≤ α_1 < α_2 < ··· < α_L ≤ N, and B(X_t^{α_L}) to be the binding information of the agents that correspond to those indices at time t, i.e.

B(X_t^{α_L}) = H(X_t^{α_L}) − ∑_{j=1}^L H(X_t^{α_j} | X_t^{α_1}, ..., X_t^{α_{j−1}}, X_t^{α_{j+1}}, ..., X_t^{α_L}), (10)

where X_t^{α_L} = (X_t^{α_1}, ..., X_t^{α_L}). Also, let us denote as I_L the set of all index vectors α_L of length L, which correspond to the possible subsets of L agents, with cardinality |I_L| = (N choose L).

Recall that b_n(X_t) corresponds to information that is shared by exactly n agents, and hence ∑_{n=2}^L b_n(X_t) is the information shared by L or fewer agents. As B(X_t^{α_L}) corresponds to the information shared between agents α_1, ..., α_L, it is clear that for any L ∈ {2, ..., N} the following bounds hold:

∑_{n=2}^L b_n(X_t) ≤ ∑_{α_L ∈ I_L} B(X_t^{α_L}) ≤ (N choose L) max_{α_L ∈ I_L} B(X_t^{α_L}). (11)

Although these bounds might not be tight, Equation (11) suggests that max_{α_L ∈ I_L} B(X_t^{α_L}) can be useful for sizing the value of ∑_{n=2}^L b_n(X_t). In particular, if max_{α_L ∈ I_L} B(X_t^{α_L}) = 0, then b_n(X_t) = 0 for all n = 2, ..., L, which due to Equation (7) would imply that B(X_t) = ∑_{n=L+1}^N b_n(X_t). These bounds are illustrated in the following example.

Example 3.
Consider U and V as defined in Example 1. Let us focus on L = 2, and note that for this case I_2 = {{1, 2}, {1, 3}, {2, 3}}, and hence

max_{α_2 ∈ I_2} B(X_t^{α_2}) = max_{1≤i<j≤3} I(X_t^i; X_t^j) and ∑_{α_2 ∈ I_2} B(X_t^{α_2}) = ∑_{i=1}^2 ∑_{j=i+1}^3 I(X_t^i; X_t^j). (12)

Using this, it is direct to find that:

(i) If (X_t^1, X_t^2, X_t^3) = (U, U, U), then 3 max_{α_2 ∈ I_2} B(X_t^{α_2}) = ∑_{α_2 ∈ I_2} B(X_t^{α_2}) = 3 H(U), and hence Equation (11) shows that b_2(X_t) ≤ 3 H(U). This bound is not tight, as b_2(X_t) = 0 (c.f. Example 1). Also, note that for L = 3 one finds that max_{α_3 ∈ I_3} B(X_t^{α_3}) = B(X_t) = H(U), showing that the bounds need not be monotonic in L.

(ii) If (X_t^1, X_t^2, X_t^3) = (U, U, V), then max_{α_2 ∈ I_2} B(X_t^{α_2}) = ∑_{α_2 ∈ I_2} B(X_t^{α_2}) = H(U). This bound is tight, as b_2(X_t) = H(U) (c.f. Example 1).

(iii) If (X_t^1, X_t^2, X_t^3) = (U, V, U xor V), then max_{α_2 ∈ I_2} B(X_t^{α_2}) = 0, and hence the bounds determine that b_2(X_t) = 0.

Analogous bounds can be derived for the sharing modes. Recall that m_i(X_t) accounts for the information about other agents that is obtained when measuring groups of i agents, but not less. Similarly, ∑_{i=1}^L m_i(X_t) is the predictability about other agents that is obtained when accessing L or fewer agents. Therefore, one can provide the following bounds, valid for any L ∈ {
1, ..., N − 1}:

ψ_L(t) ≤ ∑_{i=1}^L m_i(X_t) ≤ ∑_{j=1}^N ∑_{α_L ∈ I_L, j ∉ α_L} I(X_t^{α_L}; X_t^j) ≤ N (N−1 choose L) ψ_L(t), (13)

where we have used the shorthand notation

ψ_L(t) := max_{j ∈ {1,...,N}} max_{α_L ∈ I_L, j ∉ α_L} I(X_t^{α_L}; X_t^j). (14)

As in Equation (11), this shows that ψ_L(t) can be used as a proxy for estimating the relevance of ∑_{i=1}^L m_i(X_t). In particular, if ψ_L(t) = 0 then ∑_{i=1}^L m_i(X_t) = 0. Therefore, by using Equation (9), if ψ_L(t) = 0 then B(X_t) = ∑_{i=L+1}^{N−1} m_i(X_t).

Moreover, the shape of ψ_L(t) as a function of L can reveal the distribution of sharing modes across the system. First, note that ψ_L(t) is a non-decreasing function of L: information (in the Shannon sense) “never hurts,” and hence having larger groups of agents for making predictions cannot reduce predictive power. Secondly, in most scenarios ψ_L(t) is concave: the additional predictability obtained by including one more agent usually shows diminishing returns as L grows. In effect, the most informative agents are normally selected first, and hence for large values of L one can just add agents with weak informative power, which can also be redundant with the agents already considered. Accordingly, scenarios where ψ_L(t) as a function of L is concave are called redundancy-dominated. In contrast, scenarios in which ψ_L(t) is convex are called synergy-dominated. Intuitively, in synergy-dominated scenarios agents might be uninformative by themselves, but become informative when grouped together. Therefore, a convex ψ_L(t) is a sign of a synergistic system, one that has larger predictability gains when L grows.

These ideas and bounds are illustrated in the following example.

Example 4.
Consider again U and V as defined in Example 1. Focusing on L = 1, one finds that ψ_1(t) = max_{1≤i<j≤3} I(X_t^i; X_t^j). Therefore, one can find that:

(i) If (X_t^1, X_t^2, X_t^3) = (U, U, U), then ψ_1(t) = H(U). Therefore, the bounds in Equation (13) show that H(U) ≤ m_1(X_t) ≤ 6 H(U).

(ii) If (X_t^1, X_t^2, X_t^3) = (U, U, V), then again ψ_1(t) = H(U), hence the bounds are the same as above.

(iii) If (X_t^1, X_t^2, X_t^3) = (U, V, U xor V), then ψ_1(t) = 0, which in turn guarantees that m_1(X_t) = 0.

By noting that ψ_2(t) = max{I(X_t^1; X_t^2, X_t^3), I(X_t^2; X_t^1, X_t^3), I(X_t^3; X_t^1, X_t^2)}, a direct calculation shows that ψ_2(t) = H(U) for the three above cases. By considering ψ_0(t) := 0, one finds that cases (i) and (ii) are redundancy-dominated, while case (iii) is synergy-dominated.

5.2. Protocol to Analyse Self-Organisation in Dynamical Systems

Wrapping up these results, we propose the following definitions for self-organisation. Note that these are aimed at quantifying organisation, while the constraints of “self” are guaranteed by restricting to autonomous maps with sparse interaction matrices (see Section 3.2).
Definition 1.
Consider a coupled dynamical system with autonomous evolution and a bounded number of non-zero elements per row in its interaction matrix. Then, the system is self-organising if B(X_t) is an increasing function of t. Moreover, the value of B(X_t) is used as a metric of organisation strength.

Definition 2.
A self-organising process is said to be synergy-dominated if lim_{t→∞} ψ_L(t) is convex as a function of L. If lim_{t→∞} ψ_L(t) is concave, the process is said to be redundancy-dominated.

Note that for certain processes lim_{t→∞} ψ_L(t) can exhibit a combination of convex and concave segments, which suggests the coexistence of redundant and synergistic structures at different scales. An example of this is discussed in Section 6.2.4.

Following these definitions, we propose the following protocol for analysing a given dynamical system. The steps are:

(0) Check that the maps satisfy autonomy and locality (Section 3.2).
(1) Consider a random initial condition given by a uniform distribution over the phase space, µ_0, and use it to drive the coupled dynamical system. This involves initialising the system in the least biased initial configuration, i.e. with maximally random and independent agents.
(2) Compute the evolution of the probability distribution given by µ_t = Φ_t{µ_0}. This can be done directly using the map, a master equation [65], or in the case of a finite phase space by computing numerically all the trajectories.
(3) Compute the joint Shannon entropy H(X_t), the residual information R(X_t), and the binding information B(X_t) as a function of t.
(4) For values of t at which B(X_t) > 0, compute ψ_L(t) for L = 1, ..., N.

Note that by considering a flat initial condition in step (1), one ensures that the system initially has no correlations, i.e. B(X_0) = 0. Therefore, if one finds that B(X_t) > 0 for some t > 0, one can be sure that these interdependencies were entirely created by the dynamics of the system. Also, while step (3) clarifies if self-organisation is taking place following Definition 1 (i.e. by checking if B(X_t) > 0 for some t > 0), step (4) characterises the resulting organisation as redundancy- or synergy-dominated following Definition 2.
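As a minimal illustration of steps (1)-(4), the sketch below runs the protocol on a small toy map (the same arbitrary local XOR rule used in earlier sketches, not a system studied in this paper), computing H, R, B and ψ_L by brute force; for realistic system sizes, the more efficient methods of Section 6 and Appendix D would be needed.

```python
import itertools
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def marginal(p, keep):
    """Marginal over the agents listed in `keep` (a tuple of axis indices)."""
    drop = tuple(k for k in range(p.ndim) if k not in keep)
    return p.sum(axis=drop) if drop else p

def mutual_info(p, group, j):
    """I(X^group ; X^j) in bits, where `group` is a tuple of indices not containing j."""
    both = tuple(sorted(group + (j,)))
    return entropy(marginal(p, group)) + entropy(marginal(p, (j,))) - entropy(marginal(p, both))

def h_r_b(p):
    """Joint entropy H, residual entropy R (Eq. 5) and binding information B (Eq. 6)."""
    H = entropy(p)
    R = sum(H - entropy(marginal(p, tuple(k for k in range(p.ndim) if k != j)))
            for j in range(p.ndim))
    return H, R, H - R

def psi(p, L):
    """psi_L of Eq. (14): best prediction of a single agent from a group of L other agents."""
    N = p.ndim
    return max(mutual_info(p, g, j)
               for j in range(N)
               for g in itertools.combinations([k for k in range(N) if k != j], L))

# Toy autonomous, local map on N binary agents (an arbitrary illustrative choice).
N = 4
states = list(itertools.product([0, 1], repeat=N))
phi = lambda x: tuple(int(v) for v in np.asarray(x) ^ np.roll(np.asarray(x), -1))

p = np.full((2,) * N, 1.0 / 2 ** N)        # step (1): maximally random initial ensemble
for t in range(5):
    H, R, B = h_r_b(p)                      # step (3)
    line = f"t = {t}:  H = {H:.2f}  R = {R:.2f}  B = {B:.2f}"
    if B > 1e-9:                            # step (4)
        line += "   psi_L = " + str([round(psi(p, L), 2) for L in range(1, N)])
    print(line)
    p_next = np.zeros_like(p)               # step (2): Frobenius-Perron push-forward
    for s in states:
        p_next[phi(s)] += p[s]
    p = p_next
```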
6. Proof of Concept: Cellular Automata
Cellular Automata (CA) are a well-known class of discrete coupled dynamical systems widely used in the study of complex systems and distributed computation [66]. A CA is a multi-agent system in which every agent has a finite set of possible states, and evolves in discrete time steps following a set of simple rules based on its own and other agents’ states. For simplicity, we focus our analysis on synchronous update CA (for a survey about asynchronous CA, please see Reference [67]). CA are a natural candidate for our measures, since they have often been used in other studies of self-organisation [68], some of them are capable of universal computation [69], and they provide a rich testbed for theories of distributed computation and collective behaviour in complex systems [70].
Our analysis focuses on Elementary Cellular Automata (ECA), which constitute a particular subclass of CA. In ECA, agents (or cells) are arranged in a one-dimensional cyclic array (or tape). The state of each cell at a given time step has two possible values, 0 or 1, and is a boolean function of the state of itself and its immediate neighbours at the previous time step. The same boolean function dictates the time evolution of all agents, inducing a spatial translation symmetry. Hence, each of the 256 different boolean functions of three binary inputs induces a different evolution rule. Rules are then enumerated from 0 to 255 and each ECA, irrespective of its number of agents, can be classified by its rule. Moreover, each rule has an equivalence class of rules, given by the rules obtained by reflection (exchanging right and left) and inversion (exchanging zeros and ones). Keep in mind that all the statistical results discussed in this section are equally valid for all the members of the corresponding equivalence class. For a more detailed description of ECA and their numbering system, see Reference [68].

In our simulations, we followed the protocol outlined in Section 5.2 over arrays of N cells that followed one ECA rule. We initialised one copy of the ECA in each of the 2^N possible initial conditions and numerically computed the temporal evolution of each one of them. As is standard in the ECA literature, the automata were simulated under periodic boundary conditions. The probability distribution at time t, µ_t, was calculated after the system reached a pseudo-stationary regime, which plays the role of a non-equilibrium steady-state [71,72]. These calculations were performed using methods outlined in Appendix D, which allowed us to consider arrays up to size N = 17. For each rule, we report:

(a) The temporal evolution of H(X_t), B(X_t) and R(X_t). These plots show if the ECA shows signs of self-organisation according to Definition 1, and if the joint entropy decreases or remains constant (c.f. Section 4.2).

(b) The interdependency between individual cells through time, as given by the mutual information between a single cell at time t = 0 and each cell at later times (i.e. I(X_0^j; X_t^k) for t ∈ {0, 1, ...} and k ∈ {1, ..., N}). This reflects the predictive power of the state of a cell in the initial condition over the future evolution of the system. To use an analogy, one can think of the information content of a cell as a drop of ink that is thrown into the river of the temporal evolution of the system.

(c) The mutual information between every pair of cells for the pseudo-stationary distribution. Because of the spatial translation symmetry of ECA, it suffices to take any cell and compute its mutual information with each other cell. We call this “spatial correlation,” as it measures interdependencies between cells at the same time t.

(e) The curve ψ_L (c.f. Section 5) for the pseudo-stationary distribution, which is used to characterise a self-organising system as either redundancy- or synergy-dominated as per Definition 2. This curve can also be interpreted as how much of a cell can be predicted by the most informative group of L other cells.

Now we present and discuss the profiles of some well-known rules, which illustrate paradigmatic behaviour. As the behaviour of ECA is known to be sometimes affected by the specific number of agents (see e.g. Reference [73]), we only discuss results that are exhibited consistently for a range of values of N. Figures show results of ECA with N =
17 agents, while extended versions of these results for all rules with N = 4, ..., 17 agents can be found in https://cellautomata.xyz.

6.2.1. Strong redundancy: rule 232

Rule 232 is commonly referred to as the majority rule, as one cell’s next state is 1 if and only if two or more of its predecessors are 1. The dynamics of this rule when starting from a random initial condition are governed by interactions between nearest neighbours, which are resolved after a few steps into stable configurations (Figure 2a). As a result of this brief interaction, the dynamics generate binding information while decreasing the joint entropy, as shown in Figure 2d.

In agreement with those observations, it is found that one cell at the initial condition has high predictive power over the state of itself and its nearest neighbours in the future (Figure 2b). Correspondingly, the profile of pairwise mutual information terms between cells at the pseudo-stationary regime shows exponentially decaying correlations as a function of cell distance (Figure 2c).

The curve of ψ_L shows a concave shape, growing strongly for the first two (nearest) neighbours, growing slightly for the third and fourth nearest neighbours, and remaining then essentially flat (Figure 4). This means that remote neighbours are practically independent, which is consistent with the pairwise correlation profile. Note that knowing all the other cells provides a 75% prediction of a given cell, meaning that there is a non-negligible amount of residual entropy.

In summary, Rule 232 shows the signature of redundancy-dominated self-organisation. This behaviour was found consistently in rules that evolve towards fixed states and rules that evolve towards periodic orbits with relatively short cycle lengths, which are known in the CA literature as Class 1 and Class 2 rules, respectively [69].

6.2.2. Synergistic profile: rule 30

Rule 30 is known for generating complex geometric patterns, and has a sensitive dependence on initial conditions [74]. This rule, among others, has provided key insights to understand how simple rules can generate complex structures. For example, similar patterns can be found in the shell of the conus textile cone snail species. Rule 30 has also been proposed as a stream cipher for cryptography [75], and has been used as a pseudo-random number generator [76].

Visual inspection suggests that the information processing done by this rule is much more complex than that of Rule 232. In effect, Figure 3d shows that this rule generates high B(X_t) through a much longer mixing time. Intriguingly, the predictive information of a single cell seems to disappear after very few steps (Figure 3b), meaning that knowing the state of a single cell of the initial condition is not useful for predicting the state of any cell at later stages. Even more intriguingly, the pseudo-stationary regime shows that each pair of cells is practically independent (Figure 3c), in direct contrast with the high value of B(X_t).

Figure 2.
Combined results for rule 232. (a)
Example of evolution starting from random initial conditions. Note that this example system is larger than the one used in the simulation for plots (b-d). (b)
Mutual information between the initial state of a cell and the future state of the same cell and its neighbours (black is higher). (c)
Profile of pairwise mutual information terms between cells at the pseudo-stationary regime shows a typical exponential decay. (d)
Time evolution generates interaction reflected by B(X_t), which is of the same order of magnitude as R(X_t) = H(X_t) − B(X_t). Both B and H are reported in bits.

These apparent paradoxes are solved when one considers high-order correlations by studying the behaviour of ψ_L (Figure 4). In effect, the convex shape of the curve shows a pronounced synergistic structure: groups of less than 8 cells show no interdependency, but groups of 15 allow almost perfect prediction! This shows that the self-organisation driven by Rule 30 generates high-order structures. In particular, for arrays of 17 cells, most of B(X_t) corresponds to sharing modes of order 10 or more (c.f. Figure 4).

6.2.3. Strong synergy: rule 90

Rule 90 is built from xor logic gates: the future state of each cell corresponds to the xor of its two predecessors. When started from a single active cell, Rule 90 generates a Sierpinski triangle, while when started from a random initial condition it generates irregular triangular patterns. Rule 90 is known for having connections with number theory, as discussed in Ref. [68].

Together with Rule 60, which is also composed of concatenated xors, Rule 90 was found to be the most synergistic rule of all 256 ECA. In fact, for an array of N cells started with random initial conditions, after the second step any group of N − 1 cells is statistically independent. Hence ψ_L = 0 for L < N − 1, and therefore m_L(X_t) = 0 for L < N − 1. Moreover, R(X_t) = 0 and B(X_t) = H(X_t) = N − 1 bits, indicating that the binding information of Rules 60 and 90 corresponds exclusively to synergy of the highest order, i.e. B(X_t) = m_{N−1}(X_t).

Figure 3.
Combined results for rule 30. (a)
Example of evolution starting from random initial conditions. Note that this example system is larger than the one used in the simulation for plots (b-d). (b)
Mutual information between the initial state of a cell and the future state of the same cell and its neighbours (black is higher). (c)
At the pseudo-stationary regime, there exists no mutual information between any pair of cells. (d)
Despite having no significant pairwise correlations, the dynamics generate large amounts of interdependency between the cells, reflected by a high value of B(X_t). Both B and H are reported in bits.

We found that most ECA rules with attractors of length of the order of the size of the phase space (known as Class 3 and 4 in the CA literature [77]) exhibit synergy-dominated self-organisation. Besides rules 30, 60 and 90 (and the ones in their equivalence classes), rules 18 and 146 have the strongest convexity in their ψ_L profiles. Interestingly, the fact that rules 60 and 90 have been found to have the highest synergy is consistent with the crucial role played by xor gates in cryptography (for a discussion of this connection, see Reference [58] and Section 4.2).

6.2.4. Mixed ψ_L profiles

Interestingly, some rules show both convex and concave sections in ψ_L. Examples of this phenomenon are rules 14, 22, 41, 54, 62, 73, 106 and 110, with the shape of ψ_L being sometimes sensitive to the system size. Rules 106 and 110, in particular, show a clear distinction between a convex segment for small L and a concave segment for large L. When compared with rule 106, rule 110 has its inflection point at a smaller L, which could be related to the more localised structures seen in this rule. Based on these results, we hypothesise that a combination of synergy and redundancy within a single system could provide a richer, or more “complex,” structure. However, further investigation in larger systems would be necessary to confirm that the inflection point is actually an intrinsic property of the rule – and not a finite-size effect.

Figure 4.
While the concave shape of ψ_L for Rule 232 shows that correlations are mostly redundant, the convex shape for Rule 30 shows the dominance of synergies of order 10 or more. Rules 60 and 90 are the only rules that generate purely synergistic structure of the highest order. Results for Rule 106 show an inflection point where ψ_L switches from convex to concave, suggesting the coexistence of synergistic small-scale and redundant large-scale structures.
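To connect these ECA results with the protocol of Section 5.2, the following sketch reproduces the basic simulation loop for a single rule on a small ring: it builds the uniform ensemble over all 2^N initial conditions, pushes it forward under the rule, and tracks H(X_t) and B(X_t). The rule number, ring size and number of steps are illustrative parameters only; the results reported above rely on the more efficient methods of Appendix D and larger N.

```python
import itertools
import numpy as np

RULE, N, T = 30, 8, 10          # Wolfram rule number, ring size, number of steps (illustrative)

def eca_step(x, rule=RULE):
    """One synchronous update of an elementary CA with periodic boundaries."""
    n = len(x)
    return tuple((rule >> (4 * x[(i - 1) % n] + 2 * x[i] + x[(i + 1) % n])) & 1
                 for i in range(n))

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def h_and_b(p):
    """Joint entropy H(X_t) and binding information B(X_t) from the joint distribution."""
    H = entropy(p)
    R = sum(H - entropy(p.sum(axis=j)) for j in range(p.ndim))
    return H, H - R

states = list(itertools.product([0, 1], repeat=N))
p = np.full((2,) * N, 1.0 / 2 ** N)          # uniform ensemble over all 2^N initial conditions

for t in range(T + 1):
    H, B = h_and_b(p)
    print(f"t = {t:2d}:  H(X_t) = {H:5.2f} bits   B(X_t) = {B:5.2f} bits")
    p_next = np.zeros_like(p)
    for s in states:
        p_next[eca_step(s)] += p[s]          # deterministic transition of each ensemble member
    p = p_next
```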
7. Discussion
This paper presents an information-theoretic framework to study self-organisation in multi-agent systems, which explores how statistical structures are spontaneously generated by the evolution of coupled dynamical systems. To guarantee the absence of centralised control guiding the process, we restrict ourselves to autonomous systems where each agent can interact directly with only a small number of other agents. To isolate structures that are purely created by the system’s dynamics, we consider the evolution of agents that are initially maximally random and independent.

A fundamental insight behind our framework is the fact that deterministic dynamical systems are able to create correlations by destroying information. In effect, we saw that while the temporal evolution of many dynamical systems reduces their joint Shannon entropy, this condition can be the consequence of two qualitatively opposite scenarios: in one case interdependency is created while the stochasticity of each agent is preserved; in the other, mere information dissipation occurs (each agent becomes less random while remaining independent of the others). Following this line of thought, and diverging from the standard literature, we propose to attribute self-organisation to processes where the strength of interdependencies increases with time. In this work we use the binding information as a metric of global interdependency strength.

As a second step, we propose a multi-layered description of the attained organisation based on synergies and redundancies of various orders. The key idea is to decompose the information stored in the system, as quantified by the joint Shannon entropy, considering two principles: extension (how many agents are linked), and sharing mode (how many agents need to be measured in order to obtain predictive power). The information sharing mode of order 1 corresponds to redundancy, which takes place when by measuring only one agent one can (partially) predict the state of a number of other agents. Synergy takes place when such predictive power is accessible only when measuring two or more agents simultaneously. We proposed these decompositions as formal structures, without providing an explicit way to compute the values of their components for arbitrary probability distributions. Nevertheless, upper and lower bounds for these components are provided, which in some cases can allow a complete determination of the decomposition.

Using the proposed framework, this work is the first – to the best of our knowledge – to demonstrate cases of high-order statistical synergy in relatively large systems. In particular, we showed that the ECA that corresponds to rule 90 generates maximal synergy, which, according to Reference [58] and Section 4.2, could enable the development of interesting cryptographic applications. Moreover, our results suggest that some rules can exhibit a coexistence of redundant and synergistic structures at different scales. However, more work is needed in order to confirm this hypothesis and explore its implications.

Let us remark that our framework does not intend to compare diverse systems on a unidimensional ranking of organisational richness. Accordingly, it would not be correct to claim that rule 90 attains a richer organisation than other ECA. Our framework uses increments in the binding information to detect self-organisation, and then applies a multi-dimensional information decomposition to provide qualitative insight into the result of this process. As a result, different types of structures (e.g.
redundant, synergistic or mixed) are acknowledged in their diversity, without trying to collapse their properties into a single number.

An interesting extension of this work would be to use some of the recently proposed measures of synergy (see e.g. [78–81]) to build exact formulas for the proposed decompositions. This would allow a more precise characterisation of the strength of each information sharing mode. However, this could prove to be challenging, as most of these metrics are designed for systems of three variables, and their extensions to larger systems are not straightforward.

Another natural extension would be to apply the presented framework to study continuous coupled dynamical systems, and also their stochastic counterparts (e.g. stochastic differential equations). Interestingly, while the entropy of continuous systems can be negative, the binding information is still a non-negative quantity and hence its decomposition can be carried out directly using the framework proposed in Section 4.4. Moreover, all the presented results and methods are valid for systems with random dynamics, with the sole exception of the fact that the joint entropy can increase (in contrast to what was discussed in Section 4.2). The main challenge for this would be to develop faithful estimators of the corresponding densities for cases where analytical expressions are not available. This task could, for example, be approached by using well-established methods of Bayesian inference [82] and density estimation [83].

To study the structure of a particular attractor in a non-ergodic system, one could focus the analysis on the corresponding natural invariant measure (c.f. [84]) instead of studying the evolution from the uniform distribution. It would be of interest to explore if well-known chaotic attractors can be explained in terms of the synergies and redundancies they induce in the corresponding coordinates, which could provide a new link between chaos theory and multivariate information theory. These developments could allow one to study real-world phenomena, e.g. sensorimotor control loops [45,85]. Also, this development could enable a bridge between the ideas presented in this paper and the extensive literature on self-organising coupled oscillators (see e.g. [86,87]).

Finally, it is worth emphasising that the statistical character of our proposed framework makes it orthogonal to some well-established self-organisation principles, such as the enslaving principle for multi-scale systems [88] or the free energy principle for autopoietic organisms [23]. As a matter of fact, it remains to be explored to what extent those principles can be enriched by including multi-layered decompositions in terms of redundancies and synergies. Also, please note that the presented approach to self-organisation is restricted to structures that are generated within known possibilities, which is related to the idea of “weak emergence” [89]. An attractive extension would be to include phenomena related to “strong emergence,” i.e. processes in which the evolution can affect the state space itself, generating entirely new configurations for the system to explore. An attractive way of attempting this extension could be to combine the presented framework with the notion of super-exponentially growing phase spaces presented in Reference [90].
Author Contributions:
All authors participated in the development of the concepts, wrote and revised the manuscript. The numerical evaluations over ECA were carried out by M.U. All authors have read and approved the final manuscript.
Funding:
FR was supported by the European Union's H2020 research and innovation programme, under the Marie Skłodowska-Curie grant agreement No. 702981.
Acknowledgments:
The authors thank Carlos Gershenson, Michael Lachmann, Robin Ince and Ryan James for helpful discussions. The authors also acknowledge useful suggestions from Karl Friston, Tiago Pereira and an anonymous reviewer, which greatly improved the paper.
Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A. Alternative approaches to formalising global structure
Structure as Geometrical Properties of Attractors
A natural way to attempt a formalisation of structure is by relating it to the notion of attractor from the dynamical systems literature. To define what an attractor is, let us note that the set {x_t | x_t = φ_t(x_0)} corresponds to the trajectory that originates from the initial condition x_0 ∈ Ω. A set B ⊂ Ω is stable if it contains the trajectories of all its elements, i.e. for all y_0 ∈ B and t ∈ ℕ we have that y_t = φ_t(y_0) ∈ B. An attractor is a set A that is stable and has no stable proper (non-empty) subsets. Common attractors are fixed points, limit cycles (i.e. periodic trajectories) and strange attractors [91].

Since the early efforts of Ashby [10,92], it has been noted that self-organisation is a consequence of the tendency of dynamical systems to evolve towards attractors. The system, hence, becomes more "selective" as time passes. Following this rationale, one can argue that the distinctive properties of these attracting configurations are the ones that emerge within the time evolution.

Could one relate the attractor's geometrical structure to properties of structure and organisation? An attractive fact is that "interesting" dynamics are usually associated with non-linear equations, which in turn generate strange attractors with exotic geometric properties, while on the other hand the attractors of linear dynamics have uninteresting geometrical structure. Following this line of thought, one could attempt to establish relationships between the geometrical structure of attractors (e.g. in terms of fractal structure) and properties of the organisation attained by the agents. Although this is plausible, the route to develop such an endeavour is not straightforward.
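For a finite phase space, the attractor reached from a given initial condition can be found directly by iterating the map until a state repeats. The sketch below merely illustrates the definitions above (the helper name and the toy map are ours, not the paper's); a similar computation underlies the simulation procedure of Appendix D.

```python
def attractor_from(phi, x0):
    """Iterate a map phi on a finite phase space from x0 until a state repeats.
    Returns (p, cycle): the transient length p and the periodic attractor reached."""
    seen = {}          # state -> step of first visit
    trajectory = []
    x, t = x0, 0
    while x not in seen:
        seen[x] = t
        trajectory.append(x)
        x = phi(x)
        t += 1
    p = seen[x]                # step at which the repeated state was first visited
    return p, trajectory[p:]   # the cycle is a stable set with no stable proper subset

# Toy map on Omega = {0, ..., 7}: every trajectory dissipates towards the fixed point 0.
print(attractor_from(lambda x: x // 2, x0=5))   # -> (3, [0])
```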
Structure as Pattern Complexity

Interesting approaches for formalising the concept of structure or pattern can be found in the computer science and signal processing literature. One such approach is to relate pattern strength with the Kolmogorov complexity (KC) [93], which is the length of the shortest computer program that is able to generate the pattern as output. In this way, pattern strength is inversely proportional to the value of the corresponding KC: very structured configurations can be generated by short programs, while random configurations with no structure can only be the output of a program of the same length as the sequence itself.

Using the KC to measure pattern strength is attractive due to its intuitiveness, and because its quantitative nature can allow comparisons between heterogeneous structures [94]. Unfortunately, the KC has been proven to be not computable (this impossibility being related to Gödel's incompleteness theorem [95]), which hinders its practical value; in practice, its estimation can only be attempted via upper bounds, which can be calculated using lossless compression algorithms [94]. Another weakness of this approach is that the KC does not have, to the best of our knowledge, properties relating the complexity of a system to the complexity of its parts, nor properties describing how the KC evolves in time under diverse dynamical conditions. These two limitations are overcome by adopting an information-theoretic framework, as we do in the main body of the paper.
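As a concrete illustration of the compression-based upper bounds just mentioned, the following sketch uses a standard lossless compressor as a crude proxy for the KC of a binary string; the function name and the two example strings are ours.

```python
import random
import zlib

def kc_upper_bound(s: str) -> int:
    """Crude upper bound on the Kolmogorov complexity of s:
    the size in bytes of a losslessly compressed description of s."""
    return len(zlib.compress(s.encode("ascii"), 9))

structured = "01" * 500                                          # strongly patterned configuration
random_like = "".join(random.choice("01") for _ in range(1000))  # no discernible structure
print(kc_upper_bound(structured), kc_upper_bound(random_like))   # the structured string compresses far better
```

Such estimates allow heterogeneous patterns to be compared, but they inherit the limitations discussed above: they say nothing about how the complexity of a configuration relates to that of its parts, nor about how it evolves under the system's dynamics.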
Appendix B. From a Dynamical System to a Stochastic Process
In order to consider probabilities defined over a metric phase space Ω, one needs to introduce a collection of "events," denoted by ℬ, which correspond to measurable subsets of Ω. It is natural to ask this collection to be a σ-field, so that if B_1, B_2 ∈ ℬ then B_1 ∪ B_2 ∈ ℬ and B_1 ∩ B_2 ∈ ℬ are guaranteed [96]. A probability measure µ is a function µ : ℬ → [0, ∞) such that µ(Ω) = 1, which satisfies the relationship

$$ \mu\Big( \bigcup_{j=1}^{\infty} B_j \Big) = \sum_{j=1}^{\infty} \mu(B_j) $$

whenever B_j ∈ ℬ for all j ∈ ℕ and the sets B_j are pairwise disjoint. When considering a map φ_t over the phase space, it is natural to require φ_t and ℬ to match together appropriately, i.e. for all B ∈ ℬ one has φ_t^{-1}(B) = {x ∈ Ω | φ_t(x) ∈ B} ∈ ℬ. In that way, one can guarantee the consistency of the definition given in (2).

Given a probability distribution µ, any measurable function Y : Ω → ℝ can be considered to be a random variable with statistics defined as

$$ P\{ Y \in I \} := \mu\big( Y^{-1}(I) \big). \qquad \text{(A1)} $$

Above, Y^{-1}(I) = {x ∈ Ω | Y(x) ∈ I} and I ⊂ ℝ. Similarly, the multivariate stochastic process X_t = (X_t^1, ..., X_t^n) induced by the map φ_t is defined by the joint statistics

$$ P\{ X_{t_1}^{i_1} \in I_1, \dots, X_{t_m}^{i_m} \in I_m \} := \mu\Big( \bigcap_{j=1}^{m} \big\{ x \in \Omega \;\big|\; \phi_{t_j}^{i_j}(x) \in I_j \big\} \Big), \qquad \text{(A2)} $$

where the i_j ∈ {1, 2, ..., n} are a collection of indices, t_1, ..., t_m ∈ T is a collection of time points, I_j ⊂ ℝ for j = 1, ..., m, and φ_t^j is the j-th coordinate of the map at time t, as defined in Section 3.2. For discrete phase spaces, the joint probability distribution of X_t is given by p_{X_t}(x) = P{X_t = x} for x ∈ Ω.
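For a finite phase space the induced statistics in (A1)–(A2) reduce to pushing the initial measure forward through the map. The sketch below (helper name and toy map are ours) computes the distribution of X_t = φ_t(X_0) when X_0 ∼ µ by summing µ over the preimage of each state.

```python
def pushforward(mu, phi, t):
    """Distribution of X_t = phi^t(X_0) with X_0 ~ mu on a finite phase space:
    P{X_t = y} = mu(phi_t^{-1}({y})), cf. Eq. (A1)."""
    out = {}
    for x, mass in mu.items():
        y = x
        for _ in range(t):      # apply the map t times
            y = phi(y)
        out[y] = out.get(y, 0.0) + mass
    return out

# Maximally random initial condition on Omega = {0, ..., 7}, evolved under x -> x // 2.
mu0 = {x: 1 / 8 for x in range(8)}
print(pushforward(mu0, lambda x: x // 2, t=3))   # -> {0: 1.0}
```

Since this toy map is non-injective, the joint entropy drops to zero while no interdependencies are created, which is exactly the dissipative scenario contrasted with self-organisation in the main text.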
Appendix C. Information and Entropy

The entropy is a functional over the probability distribution that describes the state of knowledge that an observer has with respect to a given system of interest [28]. In this context, uncertainty in the system corresponds to information that can potentially be extracted by performing adequate measurements.

Following this line of thought, the amount of information needed to specify a single configuration within |Ω| possibilities is log |Ω|, where the base of the logarithm can be chosen according to the preferred units for counting information (bits, nats, or others). If a system with a phase space of cardinality |Ω| at time t follows a statistical distribution p_{X_t}, then this information gets divided as follows [58]:

$$ \log|\Omega| = H(X_t) + N(X_t), \qquad \text{(A3)} $$

where H(X_t) := −E{log p_{X_t}(X_t)} is the joint Shannon entropy of the system, and N(X_t) := log|Ω| − H(X_t) is the "negentropy." After an observer comes to know the statistics of the system, as encoded by p_{X_t}, the average amount of information needed to specify a particular configuration decreases from log|Ω| to H(X_t); therefore, the negentropy corresponds to the bits that are disclosed by the knowledge of the statistics. In contrast, the Shannon entropy measures the information that is not disclosed by the statistics, which can only be obtained when the configuration of the system is actually measured.
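A minimal numerical illustration of the split in (A3) follows; the distribution is an arbitrary example of ours over a phase space with |Ω| = 8.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Non-uniform statistics over |Omega| = 8 configurations.
p = [0.5, 0.25, 0.125, 0.125, 0.0, 0.0, 0.0, 0.0]
H = entropy_bits(p)        # information not disclosed by the statistics: 1.75 bits
N = np.log2(8) - H         # negentropy, the bits disclosed by knowing p: 1.25 bits
print(H, N, H + N)         # their sum recovers log2|Omega| = 3 bits, as in Eq. (A3)
```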
Appendix D. Simulation Details
This appendix describes the procedure for calculating the evolution of probability distributions over ECA (cf. Section 7). The possible states of an ECA with N cells were encoded as binary numbers, and hence the phase space corresponds to Ω = {0, ..., 2^N − 1}. Probability distributions over the phase space were stored as an array L_µ = (µ(0), ..., µ(2^N − 1)), where µ(k) ≥ 0 for k ∈ {0, ..., 2^N − 1} and ∑_{k=0}^{2^N − 1} µ(k) = 1.

As a first step, we computed the trajectories that originate from each state of Ω, which correspond to sequences of binary numbers (s_0, s_1, ...) such that s_{k+1} = φ(s_k), with φ : Ω → Ω being the function that encodes the ECA rule. Our interest was to find the step at which φ(·) brings the trajectory back to a state that has already been visited before. Note that all trajectories end up in a periodic attractor, this being a consequence of the finiteness of Ω. From a trajectory starting at s ∈ Ω, we store the pair (p_s, a_s), with p_s being the length of the trajectory until reaching a state in the periodic attractor, and a_s being the length of the periodic attractor (i.e. the number of states between the first and the second appearance of a repeated state). The interest of these numbers lies in the fact that

$$ \phi_t(s) = \phi_{K(t,s)}(s) \quad \forall\, t \in \{ p_s, p_s + 1, \dots \}, \qquad \text{(A4)} $$

where K(t, s) := p_s + (t − p_s) mod a_s. Above, φ_t(s) = φ ∘ ··· ∘ φ(s) is the t-th composition of φ with itself.

The ECA is said to have reached a pseudo-stationary regime when it has been run for a number of steps t_s such that, for any initial distribution µ_0, a trajectory of distributions (µ_0, µ_1, ..., µ_{t_s}) obtained by time evolution would reach a distribution that has already been visited before. The minimal number of steps needed to reach a pseudo-stationary regime, denoted by t_0, can be calculated as

$$ t_0 = \mathrm{LCM}\big( \{ a_s \}_{s \in \Omega} \big) + \max_{s \in \Omega} p_s, \qquad \text{(A5)} $$

where LCM stands for the least common multiple. Above, the last term ensures that each state has entered its periodic attractor, and the former is the smallest number of steps that guarantees a simultaneous full cycle of all the attractors. For the considered ECA with N = 17 cells, the largest values found were t_0 ≈ .

In order to be able to study the statistics of ECA under pseudo-stationary regimes, we developed an efficient way to compute the evolution of a given initial distribution µ over a very large number of steps. Let us represent the initial distribution µ by the array L_µ, and the resulting distribution after t steps as µ′, with its corresponding vectorial representation L_µ′. Our key idea is to compute the trajectory from each s ∈ Ω only for K(t, s) steps, as additional multiples of a_s correspond to mere cycles over its periodic attractor. The general procedure for computing µ′ using this idea goes as follows (a Python sketch of the procedure is given at the end of this appendix):

1. Initialise the components of L_µ′ with zeros.
2. For each s ∈ Ω: compute s′ = φ_{K(t,s)}(s) and then add µ(s) to µ′(s′) (i.e. add L_µ[s] to L_µ′[s′]).

This technique turned out to be very efficient, as the largest values found over all ECA rules for N = 17 were max_{s∈Ω} p_s = and max_{s∈Ω} a_s = .

As mentioned in Section 4.1, the set of all probability distributions µ over Ω together with their dynamics forms a new dynamical system, which also has periodic attractors. From this point of view, t_0 is the smallest integer such that all trajectories of distributions reach their periodic attractor.
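The sketch below is a compact, merely illustrative implementation of this procedure, assuming a small ring of N = 5 cells rather than the N = 17 used in the paper; cycle_data, evolve_distribution and the rule-90 encoding are our own naming, and for t < p_s the sketch simply falls back to plain iteration, since (A4) only applies once the attractor has been entered.

```python
def cycle_data(phi, omega):
    """For each state s return (p_s, a_s): the transient length before entering the
    periodic attractor and the attractor's length, found by explicit cycle detection."""
    data = {}
    for s in omega:
        seen, x, t = {}, s, 0
        while x not in seen:
            seen[x] = t
            x = phi(x)
            t += 1
        p_s = seen[x]              # first visit to the repeated state
        data[s] = (p_s, t - p_s)   # (transient length, attractor length)
    return data

def evolve_distribution(mu, phi, t, data):
    """Push mu forward t steps, iterating each state only K(t, s) = p_s + (t - p_s) mod a_s
    times once its trajectory has entered the attractor (Eq. (A4))."""
    out = {}
    for s, mass in mu.items():
        p_s, a_s = data[s]
        k = t if t < p_s else p_s + (t - p_s) % a_s
        x = s
        for _ in range(k):
            x = phi(x)
        out[x] = out.get(x, 0.0) + mass
    return out

# Rule 90 on a periodic ring of N = 5 cells, with states encoded as integers in {0, ..., 2^N - 1}.
N = 5
def rule90(s):
    bits = [(s >> i) & 1 for i in range(N)]
    return sum((bits[(i - 1) % N] ^ bits[(i + 1) % N]) << i for i in range(N))

omega = range(2 ** N)
mu0 = {s: 1 / 2 ** N for s in omega}              # maximally random initial condition
final = evolve_distribution(mu0, rule90, t=10 ** 9, data=cycle_data(rule90, omega))
print(len(final), max(final.values()))            # support size and largest mass after 10^9 steps
```

The same shortcut makes it feasible to reach the pseudo-stationary regime after t_0 steps, as in Eq. (A5), even when t_0 is astronomically large.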
References

1. Haken, H. Synergetics: an introduction. Non-equilibrium phase transition and self-organisation in physics,chemistry and biology. Phys. Bull. , , doi:10.1088/0031-9112/28/9/027.2. Camazine, S. Self-Organization in Biological Systems . Princeton University Press: Princeton, NJ, USA, 2003.3. Tognoli, E.; Kelso, J.S. The metastable brain.
Neuron , , 35–48.4. Ding, Y.; Jin, Y.; Ren, L.; Hao, K. An Intelligent Self-Organization Scheme for the Internet of Things. IEEEComput. Intell. Mag. , , 41–53. doi:10.1109/MCI.2013.2264251.5. Athreya, A.P.; Tague, P. Network self-organization in the Internet of Things. In Proceeding of theInternational Conference on Sensing, Communications and Networking (SECON), New Orleans, LA, USA,24–24 June 2013, doi:10.1109/SAHCN.2013.6644956.6. MacDonald, T.J.; Allen, D.W.; Potts, J. Blockchains and the boundaries of self-organized economies:Predictions for the future of banking. In Banking Beyond Banks and Money ; Springer Cham: Switzerland,2016; pp. 279–296.7. Prokopenko, M., Ed.
Guided self-organization: Inception ; Vol. 9, Springer Science & Business Media, 2013.8. Kuze, N.; Kominami, D.; Kashima, K.; Hashimoto, T.; Murata, M. Controlling large-scale self-organizednetworks with lightweight cost for fast adaptation to changing environments.
ACM Transa. Auto Adapt.Syst. , , 9.9. Rosas, F.; Hsiao, J.H.; Chen, K.C. A technological perspective on information cascades via social learning. IEEE Access , , 22605–22633.10. Ashby, W.R. Principles of the self-organizing dynamic system. J. Gen. Psychol. , , 125–128.11. Foerster, H.v. On self-organizing systems and their environments. Self-Organizing Syst. , 31–50.12. Haken, H.; Jumarie, G.
A Macroscopic Approach to Complex System . Springer:Berlin, Heidelberg, 2006.13. Crommelinck, M.; Feltz, B.; Goujon, P.
Self-Organization and Emergence in Life Sciences . Springer:Berlin,Heidelberg, 2006.14. Heylighen, F.; Gershenson, C. The meaning of self-organization in computing.
IEEE Intell. Syst. , .15. Mamei, M.; Menezes, R.; Tolksdorf, R.; Zambonelli, F. Case studies for self-organization in computerscience. J. Syst. Archit. , , 443–460.16. De Boer, B. Self-organization in vowel systems. J. phon. , , 441–465.17. Steels, L. Synthesising the origins of language and meaning using co-evolution, self-organisation and levelformation. Appr. Evol. Lang. , pp. 384–404.18. Prehofer, C.; Bettstetter, C. Self-organization in communication networks: principles and design paradigms.
IEEE Commun. Mag. , , 78–85.19. Dressler, F. A study of self-organization mechanisms in ad hoc and sensor networks. Comput. Commun. , , 3018–3029.20. Kugler, P.N.; Kelso, J.S.; Turvey, M. On the concept of coordinative structures as dissipative structures: I.Theoretical lines of convergence. Tutor. motor behav.r , , 3–47.21. Kelso, J.S.; Schöner, G. Self-organization of coordinative movement patterns. Hum. Mov. Sci. , , 27–46.22. Kelso, J.S. Dynamic patterns: The self-organization of brain and behavior ; MIT press:London, UK, 1997.23. Friston, K. The free-energy principle: a unified brain theory?
Nat. Rev. Neur. , , 127.24. Shalizi, C.R.; Shalizi, K.L.; Haslinger, R. Quantifying self-organization with optimal predictors. Phys. Rev.L , , 118701.25. Gershenson, C. Guiding the self-organization of random Boolean networks. Theory Bio. , , 181–191.26. Gershenson, C.; Heylighen, F. When can we call a system self-organizing? In Advances in Artificial Life.Springer:Berlin, Heidelberg 2003, pp.606–614.27. Krakauer, D.; Bertschinger, N.; Olbrich, E.; Ay, N.; Flack, J.C. The information theory of individuality. arXivpreprint arXiv:1412.2447 .28. Jaynes, E.T. Probability Theory: the Logic of Science ; Cambridge university press:London, UK, 2003.29. Mezard, M.; Montanari, A.
Information, Physics, and Computation ; Oxford University Press:New York, NY,USA, 2009.30. Nicolis, G.; Prigogine, I.
Self-Organization in Non-Equilibrium Systems: From Dissipative Structures to OrderThrough Fluctuations ; Wiley, 1977.
31. Heylighen, F.; others. The science of self-organization and adaptivity.
Encyclopedia Life Support Syst. , , 253–280.32. Pulselli, R.; Simoncini, E.; Tiezzi, E. Self-organization in dissipative structures: A thermodynamic theoryfor the emergence of prebiotic cells and their epigenetic evolution. Biosyst. , , 237–241.33. Klimontovich, Y.L. Turbulent Motion. The Structure of Chaos. In Turbulent Motion and the Structure of Chaos ;Springer: Berlin, Germany, 1991; pp. 329–371.34. Gershenson, C.; Fernández, N. Complexity and information: Measuring emergence, self-organization, andhomeostasis at multiple scales.
Complexity , , 29–44.35. Vijayaraghavan, V.S.; James, R.G.; Crutchfield, J.P. Anatomy of a spin: the information-theoretic structureof classical spin systems. Entropy , , 214.36. Lloyd, S. Measures of complexity: a nonexhaustive list. IEEE Control Syst. Mag. , , 7–8.37. Tononi, G.; Sporns, O.; Edelman, G.M. A measure for brain complexity: relating functional segregationand integration in the nervous system. Proc. Nat. Aca. Sci. , , 5033–5037.38. Friston, K.J.; Tononi, G.; Sporns, O.; Edelman, G. Characterising the complexity of neuronal interactions. Hum. Brain Map. , , 302–314.39. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv preprintarXiv:1004.2515 .40. Bar-Yam, Y. Multiscale complexity/entropy. Adv. Complex Syst. , , 47–63.41. Allen, B.; Stacey, B.C.; Bar-Yam, Y. Multiscale information theory and the marginal utility of information. Entropy , , 273.42. Beck, C.; Schögl, F. Thermodynamics of chaotic systems: an introduction ; Cambridge University Press: London,UK, 1995.43. Vasari, G.
The Lives of the Artists ; Vol. 293, Oxford University Press:New York, NY, USA, 1991; pp. 58–59.44. Robinson, R.C.
An introduction to dynamical systems: continuous and discrete . Am. Math. Soc. , .45. Nurzaman, S.; Yu, X.; Kim, Y.; Iida, F. Goal-directed multimodal locomotion through coupling betweenmechanical and attractor selection dynamics. Bioinspir. biomim. , , 025004.46. Schuster, H.G.; Just, W. Deterministic chaos: an introduction ; John Wiley & Sons: Hoboken, NJ, USA, 2006.47. Ao, P. Deterministic Chaos: An Introduction, In
Turbulent Motion. The Structure of Chaos. In: TurbulentMotion and the Structure of Chaos. Fundamental Theories of Physics , Springer: Heidelberg, Berlin, Germany,2006.48. Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines.
Rep. Prog. Phys. , , 126001.49. Ott, E. Chaos in dynamical systems ; Cambridge University Press: London, UK, 2002.50. Breuer, H.P.; Petruccione, F.; others.
The Theory of Open Quantum Systems ; Oxford University Press :NewYork, NY, USA, 2002..51. Schreiber, T.; Kantz, H. Noise in chaotic data: diagnosis and treatment.
Chaos , , 133–142.52. Cover, T.M.; Thomas, J.A. Elements of Information Theory ; John Wiley & Sons: Hoboken, NJ, USA, 2012.53. Schulman, L.S.
Time’s Arrows and Quantum Measurement ; Cambridge University Press:London, UK, 1997.54. Martynov, G. Liouville’s theorem and the problem of the increase of the entropy.
Soviet J. Exp. Theor. Phys. , , 1056–1062.55. Slepian, D.; Wolf, J. Noiseless coding of correlated information sources. IEEE Transa. Inf. Theor. , , 471–480.56. El Gamal, A.; Kim, Y.H. Network Information Theory ; Cambridge university press:London, UK, 2011.57. Te Sun, H. Nonnegative entropy measures of multivariate symmetric correlations.
Inf. Control , , 133–156.58. Rosas, F.; Ntranos, V.; Ellison, C.J.; Pollin, S.; Verhelst, M. Understanding interdependency throughcomplex information sharing. Entropy , , 38.59. Feldman, D.P.; Crutchfield, J.P. Measures of statistical complexity: Why? Phys. Lett. A , , 244–252.60. Olbrich, E.; Bertschinger, N.; Rauh, J. Information decomposition and synergy. Entropy , , 3501–3517.61. Barrett, A.B. Exploration of synergistic and redundant information sharing in static and dynamicalGaussian systems. Phys. Rev. E , , 052802.62. Lizier, J.T.; Bertschinger, N.; Jost, J.; Wibral, M. Information Decomposition of Target Effects fromMulti-Source Interactions: Perspectives on Previous, Current and Future Work. Entropy , .
63. Rosas, F.; Ntranos, V.; Ellison, C.J.; Verhelst, M.; Pollin, S. Understanding high-order correlations using asynergy-based decomposition of the total entropy. In Proceedings of the 5th joint WIC/IEEE Symposiumon Information Theory and Signal Processing in the Benelux, Brussels, Belgium, 5 June 2015, pp. 146–153.64. Ince, R.A. The Partial Entropy Decomposition: Decomposing multivariate entropy and mutual informationvia pointwise common surprisal. arXiv preprint arXiv:1702.01591 .65. Van Kampen, N.G.
Stochastic Processes in Physics and Chemistry.
Elsevier: Oxford, UK, 1992.66. Mitchell, M.; others. Computation in cellular automata: A selected review.
Nonstand. Comput. , pp.95–140.67. Fates, N. A guided tour of asynchronous cellular automata. International Workshop on Cellular Automataand Discrete Complex Systems. Springer:Berlin, Heidelberg, 2013, pp. 15–30.68. Wolfram, S.
A New Kind of Science . Wolfram media: Champaign, IL, USA, 2002.69. Wolfram, S. Universality and complexity in cellular automata.
Phys. D: Nonlinear Phenom. , , 1–35.70. Lizier, J. The local information dynamics of distributed computation in complex systems. PhD thesis,University of Sydney, Australia, 2010.71. Esposito, M.; Van den Broeck, C. Three faces of the second law. I. Master equation formulation. Phys. Rev.E , , 011143.72. Tomé, T.; de Oliveira, M.J. Entropy production in nonequilibrium systems at stationary states. Phys. Rev. L , , 020601.73. Betel, H.; de Oliveira, P.P.; Flocchini, P. Solving the parity problem in one-dimensional cellular automata. Nat. Comput. , , 323–337.74. Cattaneo, G.; Finelli, M.; Margara, L. Investigating topological chaos by elementary cellular automatadynamics. Theor. Comput. Sci. , , 219–241.75. Wolfram, S. Cryptography with cellular automata. Conference on the Theory and Application ofCryptographic Techniques. Springer: New York, NY, USA, 1985, pp. 429–432.76. Wolfram, S. Random sequence generation by cellular automata. Adv. Appl. Math. , , 123–169.77. Martinez, G.J.; Seck-Tuoh-Mora, J.C.; Zenil, H. Computation and universality: class IV versus class IIIcellular automata. arXiv preprint arXiv:1304.1242 .78. Ince, R.A. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy , , 318.79. James, R.G.; Ellison, C.J.; Crutchfield, J.P. dit: a Python package for discrete information theory. J. OpenSource Softw. .80. Makkeh, A.; Theis, D.O.; Vicente, R. BROJA-2PID: A robust estimator for bivariate partial informationdecomposition.
Entropy , , 271.81. Finn, C.; Lizier, J.T. Pointwise partial information decomposition using the specificity and ambiguitylattices. Entropy , , 297.82. Gelman, A.; Stern, H.S.; Carlin, J.B.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis . Chapmanand Hall/CRC:New York, NY, USA, 2013.83. Bishop, C.M.
Pattern Recognition and Machine Learning (Information Science and Statistics) ; Springer-Verlag:Berlin, Heidelberg, 2006.84. Young, L.S. What are SRB measures, and which dynamical systems have them?
Journal of Statistical Physics , , 733–754.85. Sándor, B.; Jahn, T.; Martin, L.; Gros, C. The sensorimotor loop as a dynamical system: how regular motionprimitives may emerge from self-organized limit cycles. Front. Robot. AI , , 31.86. Pikovsky, A.; Rosenblum, M.; Kurths, J.; Kurths, J. Synchronization: A Universal Concept in Nonlinear Sciences .Cambridge University Press: London, UK, 2003.87. Kuramoto, Y.
Chemical Oscillations, Waves, and Turbulence , Springer Science & Business Media: Heidelberg,Berlin, Germany, 2012.88. Haken, H. Synergetics.
Phys. Bull. , , 412.89. Chalmers, D.J. Strong and weak emergence. .Reemerg. Emerg. , pp. 244–256.90. Jensen, H.J.; Pazuki, R.; Pruessner, G.; Tempesta, P. Statistical mechanics of exploding phase spaces: Onticopen systems. J. Phys. A: Math. Theor. , , doi:10.1088/1751-8121/aad57b..91. Strogatz, S.H. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering ;CRC Press: Boca Raton, FL, USA, 2018.
92. Ashby, W.R. Principles of the self-organizing system. In
Principles of Self-Organization: Transactions of theUniversity of Illinois Symposium ; Foerster, H.V.; G. W. Zopf, J., Eds.; Springer: New York, NY, USA, 1962; pp.255–278.93. Kolmogorov, A.N. Three approaches to the quantitative definition ofinformation’.
Prob. Inf. Trans. , , 1–7.94. Li, M.; Vitanyi, P. An introduction to Kolmogorov complexity and its applications , 3rd edition; Springer: NewYork, NY, USA, 2008.95. Chaitin, G.J.
Information, Randomness & Incompleteness: Papers on Algorithmic Information Theory , WorldScientific: Singapore, 1990.96. Loeve, M.