Physical grounds for causal perspectivalism
G. J. Milburn and S. Shrapnel

September 15, 2020
Abstract
In this paper we ground the asymmetry of causal relations in the internal physical states of a special kind of open dissipative physical system, a causal agent. A causal agent is an autonomous physical system, maintained far from equilibrium by a low entropy source of energy, with accurate sensors and actuators. It has a memory to record sensor measurements and actuator operations. It contains a learning system that can access the sensor and actuator records to learn and represent the causal relations. We claim that causal relations are relations between the internal sensor and actuator records, and the causal concept inherent in these correlations is then inscribed in the physical dynamics of the internal learning machine. The existence of contingent internal memory states means each causal agent is in a different physical state. We argue that it is in this sense that causal relations are perspectival. From the outside, averaging over internal states, the causal agents are identical thermodynamic systems.
Is causation in the external, physical world or in our heads? Russell [1] famously denied the former while the latter seems unacceptably subjective. The current interventionist [2, 3] account of causation, viewed as a perspectival account [4], seems to be somewhere in between, "an irenic third way" in the words of Price [5]. We will demonstrate that causal claims depend for their truth on the internal physical states of a special kind of machine, a causal agent. This will enable us to defend a perspectival view of causation that is not anthropocentric, and is dependent on the laws of physics, especially thermodynamics. Our objective is to give empirical support to Price's "causal viewpoint" as "a distinctive mix of knowledge, ignorance and practical ability that a creature must apparently exemplify, if it is to be capable of employing causal concepts." [5]. In Cartwright's terms, we physically "ground the distinction between effective strategies and ineffective ones" [6]. As Ismael writes, "Causes appear in mature science not as necessary connections written into the fabric of nature, but robust pathways that can be utilized as strategic routes to bringing about ends." [7]. We seek these 'robust pathways' in the physical structure of learning machines.

One aim is to relieve the apparent tension between our usual understanding of causation and our best theories of fundamental physics. At heart, our position rests on a kind of fundamentalism about "open systems physics". In doing so, we answer Russell's criticism and put causation firmly back on physical ground. A second aim is to provide a concrete, physically realistic, minimal model of a causal agent as a special kind of machine. This provides a framework from which to consider philosophical questions pertinent to agential theories of causation. Our aims are circumscribed by what can be grounded in physical reasoning.
Whether we should think of causal asymmetry as stemming from free will or intention, or whether a human agent exceeds or is bounded by the features of our minimal model, we leave open.

We take it as given that causal concepts find no application in the fundamental physics of closed systems. This is Russell's view. A system is closed if it cannot interact with anything outside itself. Possibly the only truly closed system is the universe itself. There may be other closed systems, but the quantum theory of measurement tells us we cannot know anything about them, as all observed quantum systems are necessarily open and this openness cannot be made negligible, due to decoherence. Even without resorting to quantum theory, classical physics tells us that when we take general relativity into account there is no closed system other than the universe itself. This is simply a consequence of the fact that the gravitational interaction cannot be screened. We will henceforth regard spatially restricted closed systems as irrelevant to our discussion. We consider partially open subsystems of the universe, and the interactions between them. It is our contention that causal relations can only be understood in terms of open systems.

What then of fundamental physics? The time symmetric dynamical laws of physics enable us to say, to a high degree of accuracy, what will happen in partially open systems, the prototypical example of which is called an experiment. Furthermore, the laws themselves define what we mean by a partially open system by specifying the relevant scale of interactions between subsystems. For example, when we observe the elastic collision between two laboratory scale objects we assume that the gravitational effect of the sun can be ignored, as the gravitational constant appearing in Newton's law is very small compared to the internal electrical forces ultimately responsible for the interaction.
This assumption is vindicated by the high degree to which momentum is conserved in the collision. A system is partially open if its internal interactions are much stronger than any interactions with systems external to it. Of course, this is dependent on experimental accuracy and the time-scale of the experiment: in the long run even small interactions matter.

We are mainly going to talk about a special kind of open system: one that makes measurements upon its environment via sensors and then acts upon that environment via actuators in ways that depend on the measurement results [8]. As we will discuss, there is a key thermodynamic asymmetry between sensors and actuators: actuators do work on the world but the world does work on the sensors.

The argument we make here is summarised as follows. In closed classical and quantum systems, causal relations are irrelevant, just as Russell has argued. Russell's argument does not apply to open subsystems of the universe (with boundaries defined by good physical arguments), as open subsystems are described by irreversible dynamics. Given a boundary, there are two ways to describe an open subsystem: from the inside, with access to all internal states, and from the outside, with access only to external states and coarse grained descriptions of the subsystem. There is a special class of open subsystems called causal agents. A causal agent (CA) is an open system maintained in a non equilibrium steady state, stabilized by access to an external low entropy source of energy, that contains specialised subsystems: sensors, actuators, memories, and a learning machine. Causal relations correspond to relations between internal actuator and sensor records as represented by the states of the internal learning machine. These are objective states from the internal perspective of the CA.
From an external perspective (the thermodynamic perspective) the internal records are unknown and the description is given in terms of a coarse grained interaction between a subsystem driven far from equilibrium and the world around it: causal relations play no role in this description, only thermodynamics. All CAs require access to a source of free energy to function. Thus they can only arise in a universe that is not in thermal equilibrium. This last point was emphasised recently by Rovelli [10]; however, in our model the agent is a physical system that, by external driving and dissipation, is maintained in a non equilibrium steady state.

In section 2 we give a definition of a causal agent. In section 3 we discuss a very simple classical example. (In this paper, free energy means the Helmholtz free energy. A related but distinct concept of free energy has been used to describe the informational aspects of agents by Friston and Stephan [9].)
We now define a causal agent (CA) as a special kind of open physical system interacting with its environment. A causal agent has at least the following features:

1. it is open, finite and localised in space and time;
2. it can persist in one or more non equilibrium stable steady states which are stabilised by access to a low entropy source of energy;
3. it has sensors, the physical state of which is changed by the world external to it (a measurement, in other words);
4. it has actuators capable of changing physical states in the external world;
5. it has a memory capable of storing the record of sensor readings (sensations) and actuator settings (actions);
6. it has a learning component implemented by dissipative physical systems (digital or analogue) that enables the CA to represent the correlations inherent in the sensor and actuator records via the dynamical states of the learning system itself.

The first and second of these features are intended to emphasize that a CA is an open subsystem that interacts only with its local environment, subject to the usual constraints of relativity. That is to say, only physical events in the past and future light cones of the CA are relevant for describing its behaviour. That a CA is finite implies that the available resources (e.g. energy, time) are restricted. The spatio-temporal boundary that delimits the causal agent from the rest of the world is the fundamental origin of the asymmetry between cause and effect. The sensors, actuators, memories and learning can only operate with low error probability if the machine has access to a low entropy source of energy. The reason for this is analogous to the resolution of the Maxwell demon argument [11]. All measurements are subject to noise. A measurement is accurate only if the sensor has lower entropy than the unknown state of the system it is monitoring.
Similarly, an actuator that is in thermal equilibrium with the world it is attempting to control does not work.

The third of these features captures the key point that the CA can make measurements of a finite set of physical quantities in its environment. Such measurements are mediated by local physical interactions between the CA and matter/fields in its local environment. For example, it may make measurements of the position and velocity of nearby particles by emitting optical photons and absorbing the scattered light from the particles. Or it may measure the thermodynamic state (pressure, temperature, etc.) of external subsystems. These measurements are described by the known physical interactions between parts of the CA and the external world. These interactions are irreversible in the sense that the world changes the thermodynamic state of the sensors. Crucially, the measurement records are random variables (i.e. subject to noise and error) that ultimately arise from thermodynamic constraints.

The fourth feature enables the CA to apply forces and do work, or other thermodynamic transformations, on subsystems in the external world. It may use local electrical circuits to apply electromagnetic forces to do work on external subsystems; it may emit radiation to directly heat external subsystems; it may simply act with mechanical elements (levers, hammers, etc.). Like measurements, these actions are subject to noise and error arising from thermodynamic constraints.

The fifth feature is critical to the definition of a CA. It must contain a finite memory to record both measurements (from sensors) and actions (of its actuators). These records are stochastic internal records by which a contingent history of its experience in the world is represented by changes in its internal physical state.
The stochastic nature of measurement results distinguishes the physical state of one CA from another of identical construction acting in an identical environment: the probability that two distinct CAs contain identical records is very small. In this respect each CA is unique. The records stored in memory may be said to encode the internal perspective of an individual CA. We have stipulated that memories are finite. This means that a mechanism must exist to restore the systems implementing a memory to a single fixed state. For example, if the memory is very small it will need to be reset for continued use [11]. In any case the memory needs to be initialised in a known (low entropy) state in order to reliably store information, and it must be subject to very little noise or, if not, capable of being reset using error correction, which requires a low entropy source [8].

The sixth feature enables a CA to act contingently in a way that reflects how sensations determine actions. This could be via a deterministic, or hard wired, 'reflex' connecting measurement to actions (as in a simple thermostat). Alternatively, the rules might be based on an internal look-up table from measurement records to appropriate actions. We will assume that the CA has an internal learning machine of some sort. This may fall short of the universal computational capability of a modern digital computer: possibly a set of special-purpose sophisticated analogue computers may suffice. In any case, the learning machine is capable of physically representing the correlations between sensor and actuator records.

The last feature implies that CAs have an additional, rather subtle feature: an internal mechanism to enable them to sequence actions conditioned on measurement results. It might be as simple as a mechanism to ensure that actions occur after measurements. Perhaps as soon as a bit is switched from zero to one in memory, an action is triggered.
Of course this assumes a fundamental asymmetry between sensors and actuators. In our model they are distinguished by the thermodynamic processes they experience: the world does work on sensors but the actuators do work on the world; they are built that way. We stress that the CA need not be synchronized to any external clock or directly signal its internal state to the external world. For a CA, time is a purely local and perspectival feature. The passing of time is reflected in the gradual accumulation (or erasure) of memory records.

In the rest of this paper we will simply assume that machines exhibiting all these features either currently exist or can be built. Our objective is to show that CAs of this kind can best be described using causal relations grounded on their internal memory records. The behaviour of a CA is exhaustively described in terms of learned relationships between the records stored in memory. The CA discovers and represents these relationships through its internal learning capability. If an external observer does not have access to these internal records (the standard thermodynamic assumption), the behaviour of the machine cannot be described using causal relations. Rather, its behaviour is described using the physical theory of open systems (statistical mechanics, thermodynamics, etc.).

It is possible that animals exhibit some or all of the features of a CA as we have defined them. If they do, clearly they contain much more besides. For example, we do not discuss how a CA draws upon a low entropy source of energy, how it maintains itself in a metastable state in a changing environment, or whether it is an artificially engineered machine, a self-assembled machine, or one that reproduces.

Our construction of a causal agent has considerable overlap with similar constructs by other authors. The importance of sensors and actuators for artificial agents, as the providers of raw data for learning algorithms, is a staple of textbooks on artificial intelligence [14].
Briegel [15] also stresses the importance of sensors and actuators for embodied agents. His novel concept of 'projective simulation' plays the role of a learning machine in our model. He emphasises the role of stochasticity for creative learning agents and possible quantum enhancements. In elucidating his concept of action-based semantics, Floridi [16] describes a two-machine artificial agent (AA). This enabled the relationship between the internal states of the AA to play the role of 'semantic-inducing resources' for what would otherwise be raw bit strings, without resorting to an external semantic crutch. The 'two machines' of his scheme roughly correspond to the actuator/sensor machines and the learning machine in our model. Our view is close to the agent model introduced in a biological setting by Friston and Stephan [9] and subsequent developments [17]. We will return to this in the discussion section.
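The sense-record-learn-act cycle of a causal agent described above can be sketched in code. The following is a minimal, hypothetical sketch: the class structure, the noise model and the toy learning rule are our own illustrative assumptions, not a model specified by the authors.

```python
import random

class CausalAgent:
    """A minimal, hypothetical causal agent: noisy sensor, noisy actuator,
    finite memory of records, and a toy learning rule (all illustrative)."""

    def __init__(self):
        self.sensor_record = []    # memory of sensations (feature 5)
        self.actuator_record = []  # memory of actions (feature 5)
        self.bias = 0.5            # learned propensity to act (feature 6)

    def sense(self, world_state, error=0.05):
        # the world does work on the sensor; the record is noisy
        reading = world_state if random.random() > error else 1 - world_state
        self.sensor_record.append(reading)
        return reading

    def act(self, error=0.05):
        # the actuator does work on the world; the action is noisy too
        intended = 1 if random.random() < self.bias else 0
        self.actuator_record.append(intended)
        return intended if random.random() > error else 1 - intended

    def learn(self):
        # toy rule: move the action bias toward actions that coincided
        # with a rewarding sensation (a 1 in the sensor record)
        rewarded = [a for a, s in zip(self.actuator_record,
                                      self.sensor_record) if s == 1]
        if rewarded:
            self.bias = 0.9 * self.bias + 0.1 * (sum(rewarded) / len(rewarded))
```

Because the records are contingent on noise, two identically built agents run in the same environment will almost certainly accumulate different records, which is the sense in which each CA is unique.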
Consider the thermodynamic system shown in Fig.(1). A local system, the 'source', can emit particles of variable kinetic energy. It is driven by work done from a power supply towards a non equilibrium steady state with its local environment at temperature T_s. The particles travel towards a small potential hill. If they have enough kinetic energy, they can surmount the hill and never return to the source. If they do not have sufficient kinetic energy they will be reflected from the potential hill and return to the source. We will assume that the motion of the particles once they leave the source is entirely conservative, that is to say, particles move without friction. As particles that surmount the barrier are lost, we will assume that the source is supplied by a particle reservoir such that the average particle number of the source is constant in time. But to make very sure that particles with sufficient energy to surmount the barrier do not encounter a yet higher barrier, we will add a particle absorber to the right of the potential hill. Overall, this entire physical arrangement is an irreversible system. See Fig.(1).

To begin we will assume no work is done on the particle source. Suppose the source initially emits particles which can take one of two possible values of kinetic energy (e-, e+), with equal probability, such that e- < E and e+ > E, where E is the height of the potential hill. All particles that have energy e+ are absorbed at the right, thereby raising the internal energy of the absorber. All particles that have energy e- < E are returned to the source and absorbed. The system has access to an environment that emits/absorbs particles locally to keep the average particle number and average energy constant. The entropy of the source is one bit in natural units and its average energy is (e- + e+)/2. The average energy of the absorber is steadily increasing as it absorbs particles from the source. This energy is heat extracted from the environment of the source. From the external perspective the source is simply a source of heat as far as the absorber is concerned: it is not in thermal equilibrium.

Figure 1: The external perspective. The system on the left, the source, is connected to a source of work (increment dW) and also a thermal/particle reservoir (increments dQ, dN respectively). When the work done on the source is zero it is in thermal equilibrium and simply heats a distant absorber. By doing work on the source we can bias it to emit predominantly low energy particles and reduce the heating rate of the absorber.

Now suppose that external work is done on the particle source so that it is driven away from its initial thermal equilibrium to a new non equilibrium steady state such that the probability distribution {p-, p+} changes such that p- >> p+. From this external perspective, it appears that the thermodynamic state of the source has changed: it has lower entropy and lower average energy. Now far fewer particles make it over the barrier and the rate of change of energy in the absorber decreases. From this perspective the physical interpretation is clear: the environment of the source (i.e. its local reservoir of heat and particles) is driving its entropy lower and lowering the emitted power. Ultimately the entropy of the source environment must increase to satisfy the second law.

Let us summarise. By definition, the external perspective assumes that the internal states of the CA are unknown. Only a coarse grained thermodynamic state is known. All similar CAs have the same thermodynamic description.

Let us now imagine how this looks from the internal perspective of the source, if we assume it is a CA. This is depicted in Fig.(2).
The actuator records now track the energy of an emitted particle: recording a 0 if an e- particle was emitted and a 1 if an e+ particle was emitted. The sensor records track whether a particle was received back: recording a 0 if no particle is received back and a 1 if a particle is received back. The CA has access to a (classical) learning machine that rewards a sensor record of 1, that is to say, it gets a reward if an emitted particle is received back. Of course in reality both actuators and sensors have a small error probability, so that the record does not exactly match what actually happened in the actuator and sensor devices.

Initially the actuator emits particles with energy e± with equal probability. The CA can now learn that there is a causal correlation between emitting an e- particle and receiving a particle back. Using its access to an external source of power it then begins to shift the distribution of the energy of emitted particles such that its actuator record is composed entirely of 0s. Due to physical errors this does not mean that every emitted particle had an energy e-. That would be an unphysical state of zero entropy. However, for an effective learning strategy the sensor record is composed overwhelmingly of 1s, provided the power source has sufficiently low entropy (i.e. high free energy). Note that in the steady-state the entropy of the CA has decreased. This is a necessary feature of all CAs that learn [18].

Figure 2: The internal perspective. The same system as described in Fig.(1) but for which we have access to its contingent internal states representing actuator and sensor records.

From this internal perspective the CA has learned something about the outside world, namely the existence of a potential hill with height greater than e-, and this is represented in the steady-state operation of the learning machine.
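The learning loop of this example can be simulated directly. The sketch below is illustrative only: the multiplicative reward rule, the 2% error rate and the random seed are our own assumptions, not part of the authors' model. Actuator record: 0 for an e- emission, 1 for an e+ emission; sensor record: 1 if the particle is reflected back by the hill, 0 if it is absorbed beyond it.

```python
import random

# Toy simulation of the source/barrier learning example (illustrative sketch).
random.seed(1)
p_high = 0.5          # initial probability of emitting an e+ particle
error = 0.02          # sensor/actuator noise from thermodynamic constraints
actuator_rec, sensor_rec = [], []

for _ in range(2000):
    emit_high = random.random() < p_high
    actuator_rec.append(1 if emit_high else 0)
    # only e- particles return; the record is flipped with small probability
    returned = (not emit_high) != (random.random() < error)
    sensor_rec.append(1 if returned else 0)
    # reward = a particle received back; shift the emission distribution to e-
    if returned and not emit_high:
        p_high *= 0.99    # rewarded e- emission: reinforce it
    elif emit_high and not returned:
        p_high *= 0.99    # unrewarded e+ emission: suppress it

# late records: actions almost all 0s, sensations overwhelmingly 1s
print(sum(actuator_rec[-200:]), sum(sensor_rec[-200:]))
```

After the distribution has shifted, the residual 0s in the sensor record come only from the device errors, illustrating why an error free (zero entropy) record is unphysical.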
This is a kind of measurement, and its accuracy depends on how low an error rate can be achieved in the sensors and actuators, that is to say, how low an entropy is associated with their records. From an internal thermodynamic perspective, the entropy of the records has been reduced by the learning machine. Learning machines (at least classical learning machines) necessarily reduce entropy and thus must be supplied by a source of free energy.

The internal records of the CA are completely contingent. As the actuator/sensor records are random binary strings, every CA will, almost certainly, have different records. The physical internal state of each CA is unique but, from the external perspective, the internal records are unknown (by definition), and every CA of this type behaves as an identical thermodynamic machine.

This simple example may be given a quantum description, but this requires that the CA becomes intrinsically uncertain. Firstly, we cannot use states of definite energy, as such states are not localised in space. We must use wave-packet states that are localised in position and momentum. Secondly, such states are dispersive under free evolution. That is to say, Schrödinger evolution leads to wave packet spreading. Even if the average energy of the state is less than the barrier height, it can still pass the barrier by quantum tunnelling. Likewise, even if the average energy of the wave-packet state is greater than the barrier, it can still be reflected from the barrier. This leads to a necessary uncertainty in sensor and actuator records.

There is a difference in the quantum description that is more fundamental if we demand that both sensor and actuator records are classical numbers. In the case of the sensor this means that whatever system is acting as a sensor must make a position measurement on reflected states if it is to record their return to the agent. This requires decoherence, and thus the sensor is necessarily an open system.
Furthermore, the results of such a measurement are intrinsically uncertain. The situation is quite similar for the actuators, which must be controlled by a classical binary digit. Classical control of this kind also requires decoherence [19]. We thus see that randomness and decoherence enter the quantum description in an essential way. This is true for all agents that use quantum states but retain classical records in their sensors and actuators. It is certainly possible for agents to use purely quantum sensors and actuators [20], and there is a uniquely quantum form of control called coherent control, using the theory of cascaded systems in quantum optics [21]. This is a uniquely quantum mode of operation, but it is far from clear how one could incorporate learning into such a scenario.

This simple example illustrates the conditions for the sensors and actuators to operate in a way that enables a CA to work. We will now capture these constraints in a more abstract form. It is clear that there must be a physical asymmetry between the behaviour of sensors and actuators. Actuators act on the world by changing the thermodynamic state of some part of it. Sensors, on the other hand, are acted upon by some part of the world so that their thermodynamic state is changed.

Work done on/by a system is constrained by the change in the (Helmholtz) free energy. The key relation is W ≤ -ΔF: the average work extracted from a system must be less than the decrease in free energy, and if work is done on the system the free energy increases. Physical changes to the sensors increase the free energy of the CA while actuators decrease the free energy of the CA. See Fig.(3). These thermodynamic asymmetries must be built into the physical construction of sensors and actuators.

Average work, and corresponding changes in free energy, are part of the external description of how the thermodynamic state of a system changes. To describe the process from the internal view we make use of the Jarzynski equality [13].
From this perspective, work is a random variable conditioned on contingent physical processes inside the agent.

From the internal perspective of a CA, a time series record is kept of whether a sensor or actuator 'triggered' or not in each time step. This record can be represented by binary strings A = {...}, S = {...} of arbitrary (but finite) length, where 1 indicates the action/sensation did take place and 0 that it did not. This is depicted schematically in Fig.(3). These records may not be faithful representations of what actual physical events took place: both sensors and actuators are subject to noise and may fail to have done what the record indicates. This may seem a simple practical constraint, but in fact it is a fundamental consequence of the second law, which prevents a finite system from achieving a zero entropy state, and an error free state has zero entropy. In physical terms it means that both sensors and actuators are open systems subject to environmental sources of noise. In the classical case this noise is thermal and turns off at zero temperature. In the quantum case there can be noise even at zero temperature due to spontaneous emission.

Let us assume for simplicity that both actuators and sensors have only two physically distinguishable states, labeled by a binary variable x = 0, 1, whose energies E_x are such that E_0 < E_1. We assume that, in the absence of actions and sensations, each system is highly likely to be found in an initial 'ready' state. In the case of a sensor the ready state is the lower energy state, and in the case of an actuator the ready state is the higher energy state.

In order to incorporate errors and noise we will assume that the system representing a sensor or an actuator is coupled to an environment such that the physical state fluctuates, and thus x(t) is a stochastic variable. Let p_x be the occupation probability of each state (p_0 + p_1 = 1). We choose the Helmholtz free energy as we focus on electro-mechanical systems.
Were one describing a causal agent based on biology or chemistry, the Gibbs free energy would be more appropriate.
Figure 3: Schematic of the interaction between a causal agent and a system external to it. The binary string X is a record of actions. The binary string Y is a record of sensations. The work done by the actuator device on the external system is W_A. The work done by the external system on the sensor device is W_S. The change in free energy of the actuator is ΔF_A and the net change in free energy of the external system is ΔF_e. The mutual information between the actuator record and the sensor record, I(X;Y), is a measure of the correlation between the two. The causal relation between them is encoded in the physical states of the learning machine, which acts as a complex feedback control from sensors to actuators.

In the absence of internal (actions) or external (sensations) inputs, the occupation probability for each state is the stationary solution to a birth-death master equation of the form

dp_1/dt = -γ_- p_1 + γ_+ p_0 = -dp_0/dt    (1)

where γ_+ corresponds to the transition 0 → 1 and γ_- corresponds to the transition 1 → 0. The corresponding stationary distributions are then given by

p_1(∞)/p_0(∞) = γ_+/γ_-    (2)

The conditions that distinguish the quiescent states of sensors and actuators (the 'ready' states for sensations and actions) are

γ_+ < γ_-    (sensor)    (3)

γ_+ > γ_-    (actuator)    (4)

In the case of a sensor, prior to a sensation, it is more likely to be found in the lower energy state x = 0 than the higher energy state x = 1. In the case of an actuator, prior to an action, it is more likely to be found in the higher energy state x = 1 than the lower energy state x = 0. This is illustrated in Fig.(4).

It is important to stress, however, that neither the sensor nor the actuator is in thermal equilibrium. They are in non equilibrium steady states due to external driving of a dissipative system.
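The mutual information I(X;Y) appearing in the caption of Fig.(3) can be estimated directly from a pair of binary records. The following is an illustrative sketch using a plug-in (empirical frequency) estimator; the function name is ours.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two
    equal-length binary records (plug-in frequency estimate)."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts for X
    py = Counter(ys)             # marginal counts for Y
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# perfectly correlated binary records carry exactly one bit
print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # → 1.0
```

For independent records the estimate is near zero, so I(X;Y) quantifies how strongly the actuator record predicts the sensor record, which is the raw material the learning machine works with.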
Figure 4: The typical quiescent state (the 'ready' state) stochastic dynamics of the energy (scaled by E_1) for the two-state system. (A) Sensor case, γ_+/γ_- = 0.1. (B) Actuator case, γ_+/γ_- = 10.

In the classical case the transition rates go to zero as the temperature goes to zero; however, in the quantum case they may not go to zero due to dissipative quantum tunnelling (as in optical bistability, for example [12]). This simple two state birth-death master equation model can describe both classical and quantum sensors and actuators.

It will be important for later discussion to note what happens under time reversal of the master equation. In this case the roles of γ_+ and γ_- are switched. In other words, under time reversal an actuator becomes a sensor and a sensor becomes an actuator.

These stationary distributions describe an ensemble of identical systems, but an individual system is certainly not stationary. In fact it is switching between the two states at rates determined by γ_+, γ_-. The distinction between the ensemble states and the individual stochastic states corresponds to the view from the outside and the view from the inside, respectively. In a long time average the ratio of the times spent in each state is given by the ratio of the transition probabilities

τ_1/τ_0 = γ_+/γ_-    (5)

in the limit that the total time τ_0 + τ_1 → ∞. Thus, prior to actions and sensations, the sensor spends more time in the lower energy state x = 0 while the actuator spends more time in the higher energy state x = 1.

From the internal perspective, we need to describe the energy of a single system. In this example x(t) is a stochastic process known as a random telegraph signal. Technically this means that there are two Poisson processes dN_±(t) that take the values 0 or 1 in each interval t to t + dt. This means that in most small time intervals the state of the system does not change, but every now and then the system can jump from one state to the other.
If ajump does happen, one or the other of dN ( t ) x = 1 in the infinitesimal time interval t → t + dt .The probability to take the value 1 in time interval dt is then simply P ( dN x ( t ) = 1; t + dt, t ) = E ( dN x ) = γ x dt (6)where E defines an ensemble average over many realisations. These equations imply that thecontinuous record of the the state label x ( t ) satisfies the stochastic differential equation, dx ( t ) = (1 − x ( t )) dN + ( t ) + x ( t ) dN − ( t ) (7)A simulation of this stochastic process is shown in Fig.(4). The internal states of other compo-nents in the agent are responding to these fluctuating signals at all times. The agent is said to be10 - - ε ε ε AS _f
Figure 5: The mean energy for a sensor (S) and actuator (A) as a function of ε. The bias values for the quiescent states of each are indicated by ε₀. When internal and external control functions act, the value of ε is changed to ε_f = −ε₀.

The agent is in a ready state, or quiescent state, if the time average of the signal corresponds to the stationary states in Eq. (2).

We now need to describe how these devices respond to internal (actions) and external (sensations) inputs. We will refer to these inputs as the control functions. First define

γ₊ = γ e^{−ε/2}    (8)
γ₋ = γ e^{ε/2}    (9)

If ε is a constant, the steady-state average energy is

E[E] = E (1 + e^{±ε})^{−1}    (10)

where −ε corresponds to an actuator and +ε corresponds to a sensor. This is shown in Fig. (5). To set the devices to their quiescent states, a particular bias value ε₀ > 0 is chosen for each. The control functions change the bias in time from the quiescent-state value to a final value ε_f. The control functions push the devices away from their stationary state for a short time, i.e. the occupation probabilities are changed, p_x → p′_x. In the case of a sensor the input comes from outside, while in the case of an actuator the input comes from inside the CA. This fundamental asymmetry requires a clear boundary between the agent and the world. Once the control pulses have passed, the sensors and actuators revert to their quiescent steady states.

As a simple example we will assume that the final value ε_f is chosen to be the initial bias value multiplied by minus one. Such a value of ε_f simply swaps the steady-state occupation probabilities of the states.

From the inside view, this means that some control pulses induce a state change and some do not. In other words, the change in the internal energy of the system is a stochastic variable, as is the work done by/on the external world, w.
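Eq. (10) can be tabulated directly. A minimal sketch, where the bias value ε₀ = 3 is an assumed illustrative choice, not taken from the text:

```python
import math

def mean_energy(eps, E=1.0):
    """Steady-state mean energy, Eq. (10): E[E] = E * (1 + exp(eps))**(-1)."""
    return E / (1.0 + math.exp(eps))

eps0 = 3.0                       # assumed quiescent bias value
sensor = mean_energy(+eps0)      # sensor: mostly in the low-energy state
actuator = mean_energy(-eps0)    # actuator: mostly in the high-energy state
print(round(sensor, 3), round(actuator, 3))   # 0.047 0.953
```

The two curves are mirror images of one another about ε = 0, which is the sensor/actuator asymmetry visible in Fig. (5).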
For example, in the case of an actuator, the probability that the device remains in its quiescent state, and does not change state as the control function acts, is simply p_e = (1 + e^{ε})^{−1}. This is the error probability for the actuator. In this case there are two possible values for the change in the energy of the system, 0 and −E, with probability distribution Pr(0) = p_e, Pr(−E) = 1 − p_e. Thus, in the case of an actuator, the work w done by the system on the outside world is one of the two values 0, −E, fluctuating between the two values from trial to trial. From the outside view, with this choice of ε_f, the average entropies of the sensor and actuator do not change, while the sensor experiences an increase in average energy and the actuator experiences a decrease in average energy.

The Jarzynski equality [13] is a relation between the ensemble-averaged values of w over many trials and the change in free energy corresponding to the two distributions p_x, p′_x, for systems in contact with a heat bath. It thus relates the inside (stochastic) description to the thermodynamic outside description. It is

E[e^{−βw}] = e^{−βΔF}    (11)

where β^{−1} = k_B T and ΔF = ΔE − β^{−1}ΔS, with E the average energy and S the average entropy (in natural units) for each of the distributions p_x, p′_x.

In our presentation there is no requirement for the sensors and actuators to be thermal systems: they are maintained in arbitrary non-equilibrium steady states. Nonetheless a Jarzynski-type equality holds (see [41]). We replace k_B T with E as a convenient unit of energy and the equality takes the form

E[e^{−w/E}] = e^{−ΔF/E}    (12)

For the choice of ε_f made here, the change in free energy is simply the change in average internal energy, as the average entropy does not change. For a sensor the free energy increases by E, and for an actuator the free energy decreases by E, when the control functions act.
The relation in Eq. (12) indicates that the world has done work on a sensor, while an actuator does work on the world.

Our treatment is entirely classical but can easily be extended to the quantum case. A model for quantum-enhanced sensors and actuators is discussed in [20]. In that study an example is given for which the learning rate of a quantum-enhanced agent in a noisy environment is significantly better than that of the corresponding classical version.

Returning to the internal view, we note that we are not quite done yet. The agent needs to record a binary digit to label whether a control pulse was present or not in some interval of internal time. In the case of a sensor this will write a record of a sensation to memory. In the case of an actuator this will write a record of an action to memory. There needs to be a physical process, a switch, in the agent that responds to the fluctuating energy states of the sensor/actuator and, on the basis of the time spent in the high/low energy state, records a 1 if the device is not in the expected steady state and otherwise records a 0.

It is easy enough to devise physical systems that integrate the random telegraph signal describing a particular stochastic history of sensors and actuators. The integrated signal can then be used to switch a memory. We will give an example for sensors; the actuator case is similar. For example, we could define a stochastic variable Z(t) that satisfies the stochastic differential equation

dZ(t) = −κZ(t) dt + A dn(t)    (13)

where κ, A are positive constants and dn(t) satisfies Eq. (7), the random telegraph process corresponding to the continuous-time readout of the sensor/actuator state. In this example Z(t) might be a voltage on a resistor driven by a current proportional to dn/dt. It then follows that, for initial conditions pushed into the far past so that all transients have died out,

Z(t) = A ∫_{−∞}^{t} dn(t′) e^{−κ(t−t′)}    (14)

itself a stochastic process.
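The integrator of Eqs. (13)-(14) can be sketched with a simple Euler scheme. Here dn(t) is read as the signed jump increments of the telegraph signal, and all parameter values, including the rate swap that mimics an input pulse arriving at a quiescent sensor, are illustrative assumptions:

```python
import random

def z_at(probe, swap_at, kappa=0.2, A=1.0, dt=0.005, seed=None):
    """Euler scheme for dZ = -kappa*Z dt + A*dn(t) (Eq. (13)), with dn the
    signed jump increments of a telegraph signal x(t). The transition
    rates are swapped at time `swap_at`, mimicking an input pulse arriving
    at a quiescent sensor. Returns Z at time `probe`."""
    rng = random.Random(seed)
    x, Z, t = 0, 0.0, 0.0
    while t < probe:
        g_up, g_down = (0.1, 1.0) if t < swap_at else (1.0, 0.1)
        if rng.random() < (g_up if x == 0 else g_down) * dt:
            Z += A if x == 0 else -A       # dn = +1 (0 -> 1) or -1 (1 -> 0)
            x = 1 - x
        Z -= kappa * Z * dt                # leaky decay of the integrator
        t += dt
    return Z

# Ensemble average of Z just before, and shortly after, the input pulse.
before = sum(z_at(29.0, swap_at=30.0, seed=s) for s in range(200)) / 200
after = sum(z_at(32.0, swap_at=30.0, seed=s) for s in range(200)) / 200
print(before, after)   # ~0 in the quiescent state, clearly positive after the pulse
```

Averaged over runs, Z hovers near zero in the quiescent state and rises sharply just after the pulse, which is what lets it trigger the memory switch.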
The ensemble average of Z(t) is

E[Z(t)] = Z̄ = (A/κ) ṗ₁(t)    (15)

Before an input pulse acts this is small, but shortly after an input pulse it is large. We now let the stochastic signal Z(t) drive a Bonhoeffer-van der Pol (BVP) neuronal oscillator [22]. This is a dynamical system that has a stable fixed point, except for a particular range of values of the driving Z(t), for which the fixed point becomes unstable and a stable limit cycle forms. We thus need to set up the BVP oscillator to switch to the limit cycle when the integrated value of n(t) that determines Z(t) takes values appropriate to the limit cycle with high probability. The appearance of the limit cycle is the record of the sensor/actuator event. Note that the limit cycle only appears after the input pulse to the sensor triggers a response, and it decays away as the sensor returns to its steady state. This response can then be used to trigger further actions in the agent, such as learning or feedback control of the actuator.

In our model the only data a CA has access to is the content of its sensor and actuator records. In order to learn causal concepts we grant the machine some additional systems that can implement learning based on this data. In this section, we consider some sufficient conditions required in order for a machine to learn such relations.

We will assume the validity of the Church-Turing-Deutsch (CTD) principle. Nielsen [23] states this as follows:

Every physical process can be simulated by a universal computing device.

We will take this to be equivalent to the statement:

all sensor data acquired by the CA can be treated as the output of a Turing machine which takes the actuator data as input.

This is similar to Solomonoff induction [24]. The CTD principle is not to be confused with the Church-Turing thesis, which is a meta-mathematical statement about computable functions. The CTD principle is rather to be regarded as a physical principle, much like the laws of thermodynamics.
A good discussion of this distinction can be found in [25]. Of course the CTD principle may simply be wrong. Ultimately experimental evidence will decide the issue.

An important consequence of the CTD principle is that a suitably configured and sufficiently complex subsystem of the universe can simulate the results of any physical experiment. In our case this means that an agent can be any physical device with sufficient complexity to implement a universal computer, possibly a quantum computer. Deutsch refers to this as a kind of universal self-similarity [35].

Many attempts have been made to show that the physical world violates the CTD principle, and they continue unabated [26]. In many cases that have been analysed carefully, one typically finds an implicit assumption about the nature of physical resources that seems unphysical [27]. If the world does not satisfy the CTD principle then presumably bits of it might be organised to compute functions outside the class of Turing-computable functions. Such a possibility is known as hypercomputation [28]. A famous model was introduced by Hava Siegelmann [29] (see the critique by Douglas [30]). There is also the interesting case of Malament-Hogarth spacetimes [31] and Malament-Hogarth machines [33]. A good analysis of the unlikely prospects for hypercomputation can be found in Davis [32].

If the world does not obey the CTD principle, then neither do causal agents as we have defined them. This would open up a huge new class of possibilities for learning machines. Penrose [34] has argued that the human brain is a good example of a physical system that does not obey the CTD principle.
We will simply assume that, for the causal agents we are discussing, the only sensor inputs they receive can be regarded as the output of a Turing machine which takes the actuator data as input.

Broadly speaking, causal relations between sensor and actuator records correspond to classes of learnable functions that relate function inputs (actuator records) to function outputs (sensor records), or symbolically S = f(A). In the presence of noise and error, the function is stochastic and depends on additional uncorrelated noise variables, S = f(A, N). We first consider possible constraints on the kinds of function classes that the CA can learn.

Recall that our CAs are finite. As such, we require that the sample complexity (the number of examples required to learn the causal relation in question) is bounded in some reasonable manner. The framework of probably approximately correct (PAC) learning provides one avenue to evaluating this task. Here one asks how many examples are required for the CA to find, with high probability (1 − δ), a hypothesis (h) that will make no more than ε errors on future unseen examples. For relatively simple functions it is possible to determine directly if the function is tractably learnable for a given number of samples m.

Let us consider a simple example. Imagine the function given by the world is the XOR function. While XOR is an abstract Boolean function, the device that implements it in the world is a machine, constrained by the laws of thermodynamics. A simple example is shown in Fig. (6). This means errors, while possibly rare, will be inevitable. However, the CA knows no
Figure 6: A model of an XOR function built from a triple-well potential and the highly dissipative motion of two particles. Each actuator applies a linear force with positive or negative slope, recording 1 or 0 respectively. If the pair of actuators applies 00 or 11, the net force is zero and the particles remain localised on either side of the origin, providing the temperature is not too high. If it applies either 01 or 10, a particle can leave its metastable localised state through thermal fluctuations and fall down into the central well, which represents the output of the XOR function. The sensor detects the presence or absence of a particle at the origin. Thermal fluctuations can lead to an error in the function evaluation. Note also that correct operation requires that the CA find the system in the correct metastable state to begin with. This is not a state of thermal equilibrium. If it is not in this state initially, the gate will also be in error.

more about this device than what it can learn from actuator and sensor records.

The CA uses a pair of actuators with four possible actions, labeled by the binary numbers 00, 01, 10 and 11. The sensor responds to the external system in two physically distinct ways and stores the result in a single binary variable. If the XOR works perfectly, the sensor records should be perfectly correlated with the actuator records: a 0 for 00 or 11, and a 1 for 01 or 10. Let us assume some reward is attached to the return of the signal 1. Can the CA learn the correct intervention to efficiently bring about the reward?

The PAC learning of Boolean functions is discussed in [36]. To cast this into the PAC framework we define our sample space as the set of four inputs (00, 01, 10, 11), where the CA aims to develop a hypothesis h that maps each example to either 0 or 1. There are 16 possible Boolean functions of 2 variables: 2² = 4 possible inputs, and in each case there are two possible output values (0, 1).
The algebraic normal form for two-bit gates is f(x, y) = a ⊕ b·x ⊕ c·y ⊕ d·x·y, where the parameters a, b, c, d are also binary variables. For example, XOR corresponds to b = c = 1, a = d = 0. These parameters label each of the 16 hypotheses. Although we desire that the hypothesis h should agree with the true function (XOR) perfectly, we acknowledge that errors make perfect certainty unattainable.

Actuators and sensors are physical systems and as such are not immune from noise and error. This will need to be kept low by constraining them to a low-entropy environment. Even if the actuators and sensors work almost perfectly, we need to make some assumptions about the external world. Consider the example in Fig. (6). In order for the XOR function to be learned as the correct description of this system, it is necessary that the initial metastable state be encountered by the CA with high probability. The probability that this occurs depends on the temperature T of the environment and is proportional to e^{−βE} with β = 1/k_B T. If the triple-well is too hot, almost every sensor record will be 1, independent of what the actuator does. This is like an erasure error in communication theory. A single bit-flip error in the actuator or sensor record, or even in the physical device implementing the XOR, is equivalent to a single bit-flip error in the sensor record, as XOR is linear in each of its arguments.

Let the total probability of error be ε. If an error happens on a single trial, the actuator/sensor data will fit one of the 15 wrong functions of two variables. In a sequence of N trials of actuator/sensor triples, the probability that a wrong hypothesis survives, that is, that no trial invalidates it, is bounded by (1 − ε)^N. We want to ensure this probability is less than some small number δ.
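The 16-element hypothesis class can be enumerated directly from the algebraic normal form, and the standard finite-hypothesis-class PAC bound, N ≥ (1/ε)(ln(1/δ) + ln |H|), then gives concrete sample counts. A sketch, with ε = δ = 0.1 as illustrative choices:

```python
import math
from itertools import product

def anf(a, b, c, d):
    """Algebraic normal form of a two-bit gate: f(x,y) = a ^ b.x ^ c.y ^ d.x.y."""
    return lambda x, y: a ^ (b & x) ^ (c & y) ^ (d & x & y)

# The 16 parameter settings enumerate all Boolean functions of two bits.
inputs = list(product((0, 1), repeat=2))
tables = {p: tuple(anf(*p)(x, y) for x, y in inputs)
          for p in product((0, 1), repeat=4)}
print(len(set(tables.values())))   # 16 distinct truth tables
print(tables[(0, 1, 1, 0)])        # XOR: (0, 1, 1, 0)

# Finite-class PAC bound for |H| = 16, with illustrative eps = delta = 0.1:
eps, delta = 0.1, 0.1
N = math.ceil((1 / eps) * (math.log(1 / delta) + math.log(16)))
print(N)                           # 51 samples suffice
```

The parametrisation (a, b, c, d) is a bijection onto the truth tables, which is why the hypothesis class has exactly 16 elements.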
The basic result of PAC learning is that we need to ensure our data set is big enough such that

N ≥ (1/ε) (ln(1/δ) + ln 16)    (16)

Then, if the learning algorithm returns a hypothesis that is consistent with this many trials, with probability at least 1 − δ it has error at most ε.

While the PAC framework is useful in many contexts, there is an assumption that the number of possible hypotheses for a given problem is finite (as in our example). Clearly, if we allow our actuator/sensor records to range over a larger alphabet, perhaps some finite-precision representation of real variables, this places ever greater physical demands on memory and operation, and may even cause one to doubt the role of real numbers in physics [37]. In such cases, PAC learning will no longer give us any guarantees. An alternative approach is the Vapnik-Chervonenkis (VC) dimension [38]. This gives an indication of the capacity of the learned hypothesis to classify future data, via a set of intuitively relevant questions: (i) given a set of m labeled examples, is there a consistent hypothesis? (ii) if we change the labels, is there still a consistent hypothesis? (iii) what is the largest m for which the answer to (ii) is always "yes"? That largest m is the VC dimension.

We now give a simple model for a CA learning a two-bit Boolean function. Let us suppose that the CA has a Boolean function emulator that takes the actuator record in each trial, (x_i, y_i), copies it and sends it to an internal emulator. The actuator settings can be chosen at random at first. The emulator computes the function f(x_i, y_i) = a_i ⊕ b_i·x_i ⊕ c_i·y_i ⊕ d_i·x_i·y_i, where the control parameters in the first trial are chosen at random with equal probability. The CA now compares the output of the emulator to the corresponding sensor record s_i returning from the world, using a comparator function C(x_i, y_i, s_i) = f(x_i, y_i) ⊕ s_i.
If this is equal to zero, the emulated function has correctly reproduced the sensor record, while if it is one there is an error. Thus the output of the comparator can be used as an error signal, fed back to the emulator and used to generate another emulator setting; the next actuator setting can again simply be chosen at random. The emulator continues until the probability of error is reduced to as small a value as possible; see Fig. (7). Once this is done, the CA can be said to 'have' the causal concept implicit in the primary actuator and sensor records when the agent interacts with this particular kind of world.

Figure 7: A schematic of a learning machine based on a physical emulator with feedback. A single round of learning proceeds as follows. The primary actuator record registers what action is taken on the external world, while the primary sensor record registers what sensation is received from the external world. The primary actuator record is copied and sent to an emulation engine to produce an emulated sensor record. This is compared to the primary sensor record by a comparator (C) and the result fed back to the emulation engine, which then updates. A new action is taken and the process repeats until some goal is met for the comparator output. The feedback process and update may be a discrete-time or continuous-time stochastic process. This model of learning is similar to the concept of predictive processing recently developed in the philosophy of neuroscience; see [39] and references therein.

These internal representations may well be opaque to an outside observer. In this sense the causal concepts, like sensor and actuator records, are perspectival.

Note that once the correct function in the external world is identified by the emulator, f(x, y) ⊕ s = s ⊕ s = 0 for all actuator inputs.
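A minimal sketch of this emulator-comparator loop: the walk keeps its current hypothesis while the comparator stays at zero over a batch of random actions, and jumps to a random hypothesis on any mismatch. The batch size and the sensor-noise parameter η are illustrative assumptions, not taken from the paper:

```python
import random
from itertools import product

def anf(a, b, c, d):
    # Emulated two-bit gate: f(x, y) = a ^ b.x ^ c.y ^ d.x.y
    return lambda x, y: a ^ (b & x) ^ (c & y) ^ (d & x & y)

HYPOTHESES = list(product((0, 1), repeat=4))

def random_walk_learn(world, eta=0.0, batch=64, max_steps=10000, seed=0):
    """Walk at random through the 16 hypotheses, keeping the current one
    while the comparator f(x,y) ^ s returns 0 on a batch of random
    actions; jump to a random hypothesis on any mismatch. eta is the
    probability of a bit-flip (noise) in the sensor record."""
    rng = random.Random(seed)
    params = rng.choice(HYPOTHESES)
    for _ in range(max_steps):
        consistent = True
        for _ in range(batch):
            x, y = rng.randint(0, 1), rng.randint(0, 1)   # random action
            s = world(x, y) ^ (rng.random() < eta)        # noisy sensor record
            if anf(*params)(x, y) ^ s:                    # comparator C
                consistent = False
                break
        if consistent:
            return params          # the walk stops at a hypothesis that fits
        params = rng.choice(HYPOTHESES)
    return None

print(random_walk_learn(world=anf(0, 1, 1, 0)))   # finds XOR: (0, 1, 1, 0)
```

With η > 0 the comparator occasionally rejects the true gate, so the walk takes longer to settle, which is the slow-down discussed in the text.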
The function c(x, y, s) = f(x, y) ⊕ s thus acts as a cost function; it is a minimum when the correct function is found, although this language is not very appropriate when we only have binary values. The learning machine simply walks at random through the function space until it finds the correct one, and then it stops walking. In the next section we will look more closely at the idea of minimising a cost function.

In reality, errors compromise this learning protocol. It is no longer the case that, when the correct function has been found, f(x, y) ⊕ s = 0 for all actuator inputs. Errors can arise in the action of the external world, inside the agent, or both. To describe errors in a Boolean function we add an additional binary noise variable n, so that the gate function becomes g(x, y, n) = f(x, y) ⊕ n, where n is a random binary variable with error probability Pr(n = 1) = η. Now it is the case that s = f(x, y) ⊕ n. The agent will occasionally miss the correct gate identification and continue its random walk. Clearly this will slow down the learning.

If the agent has access to a large memory, a different learning strategy can be implemented. Once a large enough record of sensor-actuator pairs has been accumulated, a conventional learning algorithm can be implemented inside the agent, for example a nearest-neighbour algorithm or a neural network algorithm [14]. Errors will slow the learning, and we can give a performance measure in terms of the error probability that the correct gate is found. In Fig. (8) we plot the probability for an external XOR gate to be assigned the right output for the input (0, 1), versus the error probability, using a nearest-neighbour machine learning algorithm. As the error becomes more probable the agent cannot learn, as the external world is becoming a random number generator.

Figure 8: The probability that the agent correctly identifies XOR(0, 1) = 1, plotted against the error probability, for a nearest-neighbour learning algorithm.

We claim that a causal agent must be capable of learning, and that the causal concepts implicit in correlations between sensor and actuator data are stored in the learning machine. A learning machine is not a magic box. All learning algorithms must be implemented on a physical learning machine (e.g. a neural net built from an array of GPUs, or a Boolean function emulator built from electronic circuits) and are thus subject to the laws of thermodynamics. The thermodynamics of learning machines appears to be in its infancy, so we will restrict this discussion to a few general observations. Furthermore, we will consider only classical learning machines (i.e. not those that use quantum coherence).

Following Goldt and Seifert [18] we will base our discussion on a particular kind of learning machine, the perceptron. We will begin by describing a device that can take any number of inputs but produces a single binary variable at output. Our model is based on the two-state birth-death master equation system discussed in section (4) and is thus intrinsically stochastic.

Consider a single system with two states labeled by a single binary number y ∈ {0, 1}. The dynamics is assumed to be stochastic and defined by the master equation Eq. (1) for the occupation probabilities p_y. We now use this model to define a perceptron ([14], section 18.7.2). Let the input data be a binary string of length n, written as a vector x⃗ ∈ {0, 1}^n. We now define a perceptron by specifying how the parameter ε (that biases the transition probabilities) depends on the binary inputs and on a set of weights,

ε(x⃗) = η Σ_k w_k x_k    (17)

where the w_k are the components of a unit vector w⃗ that defines a point on the (n − 1)-dimensional unit sphere, and η > 0 sets the scale of the weights of the model.
This means that |ε| ≤ ηh, where h is the Hamming weight of the vector x⃗. The output of the device is a single (stochastic) binary number y(t) with mean ȳ(t) = p₁(t). Clearly y(t) is a random telegraph process, as discussed in section (1). In the steady state the mean is given by a sigmoidal function of ε(x⃗),

ȳ_ss = 1 / (1 + e^{−ε(x⃗)})    (18)

If we initialise the weights with a random unit vector then, with high probability, ε ≈ 0, the steady-state distribution is uniform and the entropy is a maximum. The learning proceeds by feeding back onto w⃗ until p₁(t) = 1 on correctly labeled training inputs x⃗_T. Clearly correct labelling corresponds to the distribution shifting from uniform to highly non-uniform in response to these applied feedback forces. As we have seen, this corresponds to the entropy decreasing and the free energy increasing. Thus learning requires doing work on the system, by feedback, in such a way that the entropy decreases and the free energy increases.

The stochastic differential equation for the observed output is

dy(t) = (1 − y(t)) dN₁(t) − y(t) dN₀(t)    (19)

where the point processes are defined by

E[dN_y] = γ e^{±ε(t)/2} dt    (20)

with + for y = 1 and − for y = 0, and

ε(t) = η w⃗(t)·x⃗_T    (21)

The feedback is thus defined by giving an equation of motion for w⃗(t).

The cost function to be minimised is the error probability p (the occupation probability of the error state y = 0), i.e. we require dp(t) ≤ 0. Using

dp(t) = −dw⃗(t)·∇_w p(t)    (22)

we see that the equation of motion for w⃗(t) must be chosen in such a way that dw⃗(t) is parallel to ∇_w p(t) at every time t.
Thus we define the learning rate L(t) by

dw⃗(t) = L(t) dt ∇_w p    (23)

As an example, suppose we define the feedback update to ε(t) by

ε(t + dt) = ε(t) + κ dN₀(t)    (24)

where κ is the feedback strength. This protocol increases ε by κ only if the system makes a transition to the error state y = 0, and otherwise does nothing. It assumes that we have a fine-grained observation of the process y(t) and can quickly respond to a transition 1 → 0. Thus ε(t) is a stochastic process that obeys the stochastic differential equation

dε(t) = κ dN₀(t)    (25)

If we average over the noise, we see that the deterministic equation for ε(t) is

dε(t)/dt = κγ e^{−ε(t)/2}    (26)

with the solution

e^{ε(t)/2} = 1 + κγt/2    (27)

for the initial condition ε(0) = 0. The master equation, averaged over the feedback, is then

ṗ₀ = −γ(1 + κγt/2) p₀ + γ(1 + κγt/2)^{−1} p₁    (28)

Figure 9: The error probability versus time for three different values of the feedback strength, κ = 0.0, 0.1, 0.5.

The solution for the error p(t) for three values of κ is shown in Fig. (9). This control/learning process implies that the weights obey the stochastic differential equation

dw⃗(t)·x⃗_T = (κ/η) dN₀(t)    (29)

a stochastic control protocol for the learning rate. Using Eq. (23) we see that

L(t) dt (∇_w p·x⃗_T) = (κ/η) dN₀(t)    (30)

so that

κ = η (∇_w p·x⃗_T)    (31)

and the learning rate is a stochastic point process given by

L(t) dt = dN₀(t)    (32)

The average learning rate is then L̄(t) = γ e^{−ε(t)/2} = γ/(1 + κγt/2).

Let us reprise the question of causal asymmetry in our approach and identify its ultimate origin. Our claim is that causal relations are learned relations between sensor and actuator records inside the CA. This is a complex feedback loop from sensors to actuators. There is a fundamental thermodynamic asymmetry between these devices: actuators do work on the world, and the world does work on sensors.

Of course, for this to make sense there needs to be a boundary between the inside and outside of the CA. This does not necessarily coincide with the spatial boundary of the CA. What is part of a subsystem and what is part of the environment depends on spacetime and energy scales. These are objective facts, as described by the physics of the system involved.
There may well be systems inside the physical CA that are best treated as part of the environment, for example the fluctuating forces that lead to noise in sensors and actuators. Such a boundary leads to a form of perspectivalism: what is inside for one CA is outside for another.

The thermodynamic significance of a boundary has been extensively developed by Kirchhoff et al. [47] using the concept of a 'Markov blanket'. They state that

It is a statistical partitioning of a system into internal states and external states, where the blanket itself consists of the states that separate the two. The states that constitute the Markov blanket can be further partitioned into active and sensory states.

The last partition matches our emphasis on actuators and sensors at the interface between the agent and the world. Like Kirchhoff et al., we stress that it is only the actuators and sensors that enable the internal and external states to communicate.

CAs, however, are a rather special kind of open subsystem: they require access to a low-entropy source of energy (i.e. large free energy) in order for actuators, sensors and learning to work reliably. This necessarily means that CAs are not in thermal equilibrium with their environment. Why do such systems exist at all? Why isn't every subsystem, no matter where you draw the boundary, at thermodynamic equilibrium? The standard answer is the "Past Hypothesis" [48].

From the external perspective the internal records of a CA are inaccessible by definition. Consider an ensemble of CA-XOR-gate pairs in the triple-well example of Fig. (6). From the external perspective there are two kinds of systems interacting through the exchange of work and heat. This interaction is irreversible. Given the inevitable errors in the CA's actuators and sensors, the final state of the XOR triple-well potential will fluctuate across the ensemble; its entropy will increase.
There is a temporal asymmetry in this process.

From the internal perspective of the CA there is also a temporal asymmetry: increasing time means increasing the size of the stored records. This temporal asymmetry is internally localised, but it is not dependent on the particular contingent contents of those accumulating records, and it is necessarily aligned with a temporal arrow in the local environment due to the thermodynamics of causal agents.

We have assumed that all data stored in memory is time-series data. This requires us to address the question of how the data is time-stamped. This does not require an external global clock. As all memory records are internal to the machine, it only needs an internal clock. One solution is to make use of the generic phenomena of self-oscillation and synchronisation, which arise naturally in open systems driven far from equilibrium [49, 50]. Indeed, all periodic clocks are based on self-oscillations in open, dissipative, nonlinear systems [51]; the beating heart is an example in biology. A single internal self-oscillation would suffice to provide an internal data time-stamp.

Our account is explicitly interventionist, due to the presence of actuators in a CA. Sensors alone are not enough to build a CA. Certainly one could easily build a CA with an algorithm to find patterns in its sensor records (a Bayesian network, say), especially with time-series data from multiple sensor types. It is easy to see that correlations could be found between records from different types of sensors. For example, a temperature sensor and a light sensor on a Mars rover would show highly correlated periodic patterns. Would this correlation indicate a causal relation? Such a claim would be open to Hume's objection: patterns or 'regularities' do not ground causal claims. Changes in the temperature sensor record do not cause the changes in the light sensor record.

In our presentation each CA encounters other CAs as simply another part of the world.
Agiven CA can only give a course-grained description of other CAs as their internal records areunknown. (Of course this would change given a communication channel between CAs . . . a lan-guage redraws the boundary of a CA to some extent). This coarse-grained description — the‘external view’— is perfectly objective and consistent with the laws of thermodynamics. Like-wise the internal view of a given CA is entirely objective as the contingent records it stores areobjective physical internal states that likewise obey the laws of thermodynamics. From the inter-nal perspective, causation is asymmetric because sensors and actuators are thermodynamicallyasymmetric: there can be no ambiguity as to which system is which.There is one respect in which our model of a causal agent is deficient . Can an agent assigna probability to its own actions? Phrased in terms of subjective probability, Liu and Price ask;”Can an agent hold a meaningful credence about a contemplated action, as she deliberates?”[52].Consider the example of an agent learning that the world implements an XOR gate (5.1 ). Inthis example the agent can choose actions completely at random and the learning proceedsmechanically to find the function that represents the causal relation between actuators andsensors. The same is true for the abstract model in 6. In both these simple cases the probabilityfor an agent to choose an action, as assigned by a third party observer, is uniformly distributedon the space of actions. It would be hard to claim that in this case actions are under the agent’scontrol. Only the emulator’s response to actions is under the agents control and this might becalled deliberation. Clearly this is not what Ramsey[53] was getting at when he said “. 
In a sense my present action is an ultimate and the only ultimate contingency.” Our model of a causal agent is too simple to ground the concept of agential volition.

Let us contrast our position with that of Price [5] and the somewhat opposing views of Woodward [54]. When discussing causation, Price claims that there are four factors to consider, which we can summarise as:

(i) An action and its intended outcome are held to be related as cause and effect; means and end are cause and effect.

(ii) There is a temporal asymmetry: causes typically precede their effects.

(iii) There is a temporal asymmetry in the application of cause and effect: users typically deliberate about future actions on the basis of information received in the past.

(iv) There are temporal asymmetries in the environment, such as the prevailing thermodynamic asymmetry.

He eventually concludes that these factors are best explained by the claim:

B is an effect of A iff doing A is a means of bringing about B, from an agent's perspective; roughly, if controlling A is a means of controlling B.

In our model, A represents an actuator record while B represents a sensor record, and the relation between them is represented as a metastable state of the subsystem that does the learning. All of these items are internal contingent states of a particular CA, and thus can be considered perspectival. (We thank Huw Price for pointing this out and for bringing Ramsey's work to our attention.)
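The XOR learning example of Section 5.1 can be sketched in a few lines (our illustration only; the random-action loop and the exhaustive truth-table search stand in for whatever learning machinery the paper's model uses): the agent chooses actuator settings at random, records the sensor response, and mechanically eliminates every two-bit Boolean function inconsistent with its records.

```python
import itertools
import random

random.seed(1)

# The external world: a fixed but (to the agent) unknown Boolean function.
def world(a, b):
    return a ^ b  # XOR

# The agent acts at random, storing (actuator, sensor) records in memory,
# until every actuator setting has been tried at least once.
memory = []
while len({act for act, _ in memory}) < 4:
    a, b = random.randint(0, 1), random.randint(0, 1)
    memory.append(((a, b), world(a, b)))

# The learner: exhaustive search over all 16 two-input Boolean functions,
# each represented by a truth table over the four actuator settings.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
consistent = []
for table in itertools.product([0, 1], repeat=4):
    f = dict(zip(inputs, table))
    if all(f[act] == obs for act, obs in memory):
        consistent.append(f)

# Every setting has been sampled, so only the XOR table survives: the
# causal relation between actuator and sensor records is now inscribed
# in the learner's internal state.
print(len(consistent))                     # 1
print([consistent[0][i] for i in inputs])  # [0, 1, 1, 0]
```

As the text notes, nothing in this loop requires the agent to hold a credence about its own next action; the action sequence is uniform noise, and only the learner's response to the records is under the agent's control.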
Acknowledgements.
This work was supported by FQXi FFF Grant number FQXi-RFP-1814 and the Australian Research Council Centre of Excellence for Engineered Quantum Systems (Project number CE170100009). We thank Ken Arthur, Jenann Ismael, Michael Kewming and Huw Price for useful discussions.

References

[1] Russell, B. (1913). On the Notion of Cause, Proceedings of the Aristotelian Society, 13:1–26.
[2] Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge: Cambridge University Press.
[3] Woodward, J. (2003). Making Things Happen: A Theory of Causal Explanation, New York: Oxford University Press.
[4] Ismael, J. (2015). How do causes depend on us? The many faces of perspectivalism, Synthese, DOI 10.1007/s11229-015-0757-6.
[5] Price, H. (2007). Causal perspectivalism. In H. Price and R. Corry (Eds.), Causation, Physics, and the Constitution of Reality: Russell's Republic Revisited, pp. 250–292, Oxford: Oxford University Press.
[6] Cartwright, N. (1979). Causal Laws and Effective Strategies, Noûs, Vol. 13, No. 4, Special Issue on Counterfactuals and Laws, pp. 419–437.
[7] Ismael, J. (2016). How Physics Makes Us Free, Oxford University Press.
[8] Jacobs, K. (2012). Quantum measurement and the first law of thermodynamics: The energy cost of measurement is the work value of the acquired information, Phys. Rev. E, 040106(R).
[9] Friston, K., and Stephan, K. E. (2007). Free-energy and the brain, Synthese, 159(3), 417–458.
[10] Rovelli, C. (2020). Agency in Physics, arXiv:2007.05300.
[11] Bennett, C. H. (1982). The Thermodynamics of Computation, Int. J. Theor. Phys., 905.
[12] Carmichael, H. J. (2008). Statistical Methods in Quantum Optics, Vol. 2, Springer.
[13] Jarzynski, C. (1997). Nonequilibrium equality for free energy differences, Phys. Rev. Lett. 78, 2690.
[14] Russell, S. J. and Norvig, P. (2010). Artificial Intelligence: A Modern Approach, Prentice Hall, New Jersey.
[15] Briegel, H. J., and De Las Cuevas, G. (2012). Projective simulation for classical and quantum learning agents, Sci. Rep. 2, 400.
[16] Floridi, L. (2011). The Philosophy of Information, Oxford University Press.
[17] Bruineberg, J., Kiverstein, J. and Rietveld, E. (2018). The anticipating brain is not a scientist: the free-energy principle from an ecological-enactive perspective, Synthese 195:2417–2444.
[18] Goldt, S. and Seifert, U. (2017). Stochastic Thermodynamics of Learning, Phys. Rev. Lett. 118, 010601.
[19] Milburn, G. J. (2012). Decoherence and the conditions for the classical control of quantum systems, Proc. Roy. Soc. A 370, 4469.
[20] Kewming, M. J., Shrapnel, S., Milburn, G. J. (2020). Quantum enhanced agents, arXiv:2007.04426.
[21] Wiseman, H. M. and Milburn, G. J. (2010). Quantum Measurement and Control, Cambridge University Press.
[22] Nomura, T., Sato, S., Doi, S., Segundo, J. P., Stiber, M. D. (1994). Global bifurcation structure of a Bonhoeffer–van der Pol oscillator driven by periodic pulse trains, Biol. Cybern. 72, 55–67.
[23] Nielsen, M. (2004). Interesting problems: The Church–Turing–Deutsch Principle.
Quantum Information Theory and the Foundations of Quantum Theory, Oxford University Press.
[26] Sprevak, M., Copeland, J. and Shagrir, O. (2018). Zuse's Thesis, Gandy's Thesis, and Penrose's Thesis, in Computational Perspectives on Physics, Physical Perspectives on Computation (M. Cuffaro & S. Fletcher, Eds.), Cambridge: Cambridge University Press, pp. 39–59.
[27] Broersma, H., Stepney, S. and Wendin, G. Computability and Complexity of Unconventional Computing Devices, in Stepney, S., Rasmussen, S., Amos, M. (Eds.), Computational Matter, Natural Computing Series, Springer.
[28] Copeland, B. J. and Proudfoot, D. (1999). Alan Turing's Forgotten Ideas in Computer Science, Scientific American, New York, 253:4, 98–103.
[29] Siegelmann, H. (1995). Computation Beyond the Turing Limit, Science 268, 545.
[30] Douglas, K. Learning to Hypercompute? An Analysis of Siegelmann Networks, AISB/IACAP World Congress 2012, Birmingham, UK, 2–6 July 2012: Natural Computing, Unconventional Computing and its Philosophical Significance, Dodig-Crnkovic, G. and Giovagnoli, R. (Eds.).
[31] Hogarth, M. (1994). Non-Turing computers and non-Turing computability, Philosophy of Science, Supplementary Vol. I, 126–138.
[32] Davis, M. (2004). The Myth of Hypercomputation, in Alan Turing: Life and Legacy of a Great Thinker, C. Teuscher (Ed.), Springer.
[33] Manchak, J. B. (2018). Malament–Hogarth Machines, The British Journal for the Philosophy of Science, axy023, https://doi.org/10.1093/bjps/axy023.
[34] Penrose, R. (2013). Preface to Zenil, H. (Ed.), A Computable Universe, World Scientific.
[35] Deutsch, D. (1997). The Fabric of Reality, Allen Lane The Penguin Press, London.
[36] Anthony, M. (2005). Learning Boolean Functions, Centre for Discrete and Applicable Mathematics, LSE, CDAM-LSE-2005-24.
[37] Landauer, R. (1986). Computation and physics: Wheeler's meaning circuit?, Foundations of Physics, Issue 6, pp. 551–564.
[38] Vapnik, V. N. (2000). The Nature of Statistical Learning Theory, Springer, New York.
[39] Kirchhoff, M. D. (2018). Predictive processing, perceiving and imagining: Is to perceive to imagine, or something close to it?, Philos. Stud. 175:751–767.
[40] Valiant, L. (2013). Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World, Basic Books.
[41] Seifert, U. (2012). Stochastic thermodynamics, fluctuation theorems and molecular machines, Rep. Prog. Phys. 75, 126001.
[42] Gardiner, C. W. (1983). Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, Springer.
[43] Rovelli, C. (2016). Meaning = Information + Evolution, arXiv:1611.02420.
[44] Jeffery, K., Pollack, R., Rovelli, C. (2019). On the statistical mechanics of life: Schrödinger revisited, arXiv:1908.08374.
[45] England, J. L. (2015). Dissipative adaptation in driven self-assembly, Nature Nanotechnology, DOI: 10.1038/NNANO.2015.250.
[46] Betti, A., Gori, M. (2016). The principle of least cognitive action, Theoretical Computer Science 633, 83–99.
[47] Kirchhoff, M., Parr, T., Palacios, E., Friston, K., Kiverstein, J. (2018). The Markov blankets of life: autonomy, active inference and the free energy principle, J. R. Soc. Interface 15: 20170792, http://dx.doi.org/10.1098/rsif.2017.0792.
[48] Albert, D. (2000). Time and Chance, Cambridge, MA: Harvard University Press.
[49] Pikovsky, A., Rosenblum, M. and Kurths, J. (2001). Synchronization: A Universal Concept in Nonlinear Sciences, Cambridge University Press.
[50] Boccaletti, S., and Pisarchik, A. N. (2018). Synchronization: From Coupled Systems to Complex Networks, Cambridge University Press.
[51] Milburn, G. J. (2020). The Thermodynamics of Clocks, Contemporary Physics (in preparation).
[52] Liu, L., Price, H. (2018). Ramsey and Joyce on deliberation and prediction, Synthese, https://doi.org/10.1007/s11229-018-01926-8.
[53] Ramsey, F. P. (1929). General Propositions and Causality, in D. H. Mellor (Ed.), Foundations: Essays in Philosophy, Logic, Mathematics and Economics, London: Routledge and Kegan Paul, 1978, pp. 133–151.
[54] Woodward, J. (2016).