Cyber Kittens, or Some First Steps Towards Categorical Cybernetics
David I. Spivak and Jamie Vicary (Eds.): Applied Category Theory 2020 (ACT2020). EPTCS 333, 2021, pp. 108–124, doi:10.4204/EPTCS.333.8. © T. St. Clere Smithe. This work is licensed under the Creative Commons Attribution-Share Alike License.
Toby St. Clere Smithe
Department of Experimental Psychology, University of Oxford
[email protected]
We define a categorical notion of cybernetic system as a dynamical realisation of a generalized open game, along with a coherence condition. We show that this notion captures a wide class of cybernetic systems in computational neuroscience and statistical machine learning, exposes their compositional structure, and gives an abstract justification for the bidirectional structure empirically observed in cortical circuits. Our construction is built on the observation that Bayesian updates compose optically, a fact which we prove along the way, via a fibred category of state-dependent stochastic channels.
Those systems that we might classify as living, adaptive, or somehow intelligent all display a fundamental property: they resist or avoid perturbations that would render their existence unsustainable. This means that they must somehow be able to sense their current state of affairs (perception) and respond appropriately (action). In particular, an adaptive system should sense the relevant aspects of its current environmental state, and form expectations about the consequences of that state. In general, the interaction with the environment will be stochastic, and the statistically optimal method of 'sensing' and prediction is Bayesian inference.

Typically, however, the system has no direct access to the external state, only to sense data that indirectly have external causes. Moreover, sense data are often very high-dimensional, and predicting their consequences is underdetermined. As a result, it is common to assume that successful organisms are imbued with some kind of generative model of the process by which external causes generate their sense data. They can then use this model to infer which actions will bring (their beliefs about) their current state closer to those expectations: a process called active inference.

Systems such as these are inherently open, and often their internal models and beliefs are supposed to be structured hierarchically—that is, compositionally.
The processes of prediction and action sketched here are naturally bidirectional, and indeed our first contribution in the present work is to show that Bayesian inference is abstractly structured as a category of optics [21, 6], the emerging canonical formalism for (open) bidirectionally structured compositional systems.

The compositional framework of open games [2, 13] builds on categories of optics to describe systems of motivated interacting agents, but it is substantially more general than needed for classical game theory: generalized open games naturally describe any bidirectionally structured open systems that can be associated with a measure of fitness. Consequently, such generalized open games provide a natural home for a compositional theory of interacting cybernetic systems, and using our notion of Bayesian lens, we characterize a number of canonical statistical models as statistical games.

However, mere open games themselves supply no notion of dynamics mediating the interactions. We therefore introduce the concept of dynamical realisation of an open game (Definition 4.7), as well as an associated coherence condition.
Acknowledgements
We thank the organizers of Applied Category Theory 2020 for the opportunity to present this work, and the anonymous reviewers for helpful comments and questions. We also thank Bruno Gavranović, Jules Hedges, and Neil Ghani for stimulating and insightful conversations, and credit Jules Hedges for observing the correct form of the Bayesian update map in discussions at SYCO 6.
We begin by proving that Bayesian updates compose according to the 'lens' pattern [9] that sits at the heart of categories of open games and other 'bidirectional' structures. We first show that Bayesian inversions are 'vertical' maps in a fibred category of state-dependent channels. The Grothendieck construction of this structure gives a category of lenses. Open games are commonly defined using the more general 'optics' pattern [2], and so we also show that, under the Yoneda embedding, our category of lenses is equivalently a category of optics.

Throughout the paper, we work in a general category of stochastic channels; abstractly, this corresponds to a Markov category [12] or copy-delete category [5]. Familiar examples of such categories include Kℓ(D), the Kleisli category of the finitely-supported distribution monad D, and, for 'continuous' probability, Kℓ(G), the Kleisli category of the Giry monad. We will write c†_π to indicate the Bayesian inversion of the channel c with respect to a state π. Then, given some y ∈ Y, c†_π(y) is a new 'posterior' distribution on X. We will call c†_π(y) the Bayesian update of π along c given y.

For a substantially expanded version of this section, including proofs and background exposition with precise definitions of Bayesian inversion, see the author's [25]. We will occasionally here refer to definitions or results in that paper.

Definition 2.1 (State-indexed categories). Let (C, ⊗, I) be a monoidal category enriched in a Cartesian closed category V. Define the C-state-indexed category Stat : C^op → V-Cat as follows.
$$\mathsf{Stat} : \mathcal{C}^{op} \to \mathcal{V}\text{-}\mathsf{Cat} \tag{1}$$

On objects, X ↦ Stat(X), where Stat(X) has the same objects as C and hom-objects Stat(X)(A, B) := V(C(I, X), C(A, B)); the identity on A is the constant map id_A : C(I, X) → C(A, A) : ρ ↦ id_A. On morphisms, f : C(Y, X) ↦ Stat(f) : Stat(X) → Stat(Y), acting as the identity on objects and on hom-objects by

$$\mathcal{V}\big(\mathcal{C}(I,X), \mathcal{C}(A,B)\big) \to \mathcal{V}\big(\mathcal{C}(I,Y), \mathcal{C}(A,B)\big) : \alpha \mapsto f^{\ast}\alpha := \big(\sigma : \mathcal{C}(I,Y)\big) \mapsto \big(\alpha(f \bullet \sigma) : \mathcal{C}(A,B)\big).$$

Composition in each fibre
Stat(X) is given by composition in C; that is, by the left and right actions of the profunctor Stat(X)(−, =) : C^op × C → V. Explicitly, given α : V(C(I, X), C(A, B)) and β : V(C(I, X), C(B, C)), their composite is β ∘ α : V(C(I, X), C(A, C)) := ρ ↦ β(ρ) • α(ρ). Since V is Cartesian, there is a canonical copier Δ : x ↦ (x, x) on each object, so we can alternatively write (β ∘ α) = (β(−) • α(−)) ∘ Δ. Note that we indicate composition in C by • and composition in the fibres Stat(X) by ∘. Example 2.2.
Let V = Meas be a 'convenient' (i.e., Cartesian closed) category of measurable spaces, such as the category of quasi-Borel spaces [14], let P : Meas → Meas be a probability monad defined on this category, and let C = Kℓ(P) be the Kleisli category of this monad. Its objects are the objects of Meas, and its hom-spaces Kℓ(P)(A, B) are the spaces Meas(A, PB) [12]. This C is a monoidal category of stochastic channels, whose monoidal unit I is the space with a single point. Consequently, states of X are just measures (distributions) in PX. That is, Kℓ(P)(I, X) ≅ Meas(1, PX). Instantiating Stat in this setting, we obtain:
$$\mathsf{Stat} : \mathcal{K}\ell(\mathcal{P})^{op} \to \mathbf{Meas}\text{-}\mathsf{Cat} \tag{2}$$

On objects, X ↦ Stat(X), where Stat(X) has the same objects as Meas and hom-spaces Stat(X)(A, B) := Meas(PX, Meas(A, PB)); the identity on A is the constant map id_A : PX → Meas(A, PA) : ρ ↦ η_A, where η is the unit of the monad. On morphisms, c : Kℓ(P)(Y, X) ↦ Stat(c) : Stat(X) → Stat(Y), acting as the identity on objects and taking

$$\Big( d^{\dagger} : \mathcal{P}X \to \mathcal{K}\ell(\mathcal{P})(A,B),\ \pi \mapsto d^{\dagger}_{\pi} \Big) \;\mapsto\; \Big( c^{\ast}d^{\dagger} : \mathcal{P}Y \to \mathcal{K}\ell(\mathcal{P})(A,B),\ \rho \mapsto d^{\dagger}_{c \bullet \rho} \Big).$$

Each Stat(X) is a category of stochastic channels with respect to measures on the space X. We can write morphisms d† : PX → Kℓ(P)(A, B) in Stat(X) as state-dependent channels from A to B, and think of them as generalized Bayesian inversions: given a measure π on X, we obtain a channel d†_π : A →• B with respect to π. Given a channel c : Y →• X in the base category of priors, we can pull d† back along c, to obtain a Y-dependent channel in Stat(Y), c*d† : PY → Kℓ(P)(A, B), which takes ρ : PY to the channel d†_{c•ρ} : A →• B defined by pushing ρ through c and then applying d†. Remark 2.3.
Note that by taking Meas to be Cartesian closed, we have Meas(PX, Meas(A, PB)) ≅ Meas(PX × A, PB) for each X, A and B, and so a morphism c† : PY → Kℓ(P)(X, Y) equivalently has the type PY × X → PY. Paired with a channel c : Y → PX, we have something like a Cartesian lens; and to compose such pairs, we can use the Grothendieck construction [20, 26].

Definition 2.4 (GrLens_Stat). Instantiating the category of Grothendieck F-lenses GrLens_F (see [26]) with F = Stat : C^op → V-Cat, we obtain the category GrLens_Stat whose objects are pairs (X, A) of objects of C and whose morphisms (X, A) ↦ (Y, B) are elements of the set

$$\mathsf{GrLens}_{\mathsf{Stat}}\big((X,A), (Y,B)\big) \cong \mathcal{C}(X,Y) \times \mathcal{V}\big(\mathcal{C}(I,X), \mathcal{C}(B,A)\big). \tag{3}$$

The identity Stat-lens on (Y, A) is (id_Y, id_A), where by abuse of notation id_A : C(I, Y) → C(A, A) is the constant map id_A defined in (1) that takes any state on Y to the identity on A. The sequential composite of (c, c†) : (X, A) ↦ (Y, B) and (d, d†) : (Y, B) ↦ (Z, C) is the Stat-lens ((d • c), (c† ∘ c*d†)) : (X, A) ↦ (Z, C), with (d • c) : C(X, Z), and where (c† ∘ c*d†) : V(C(I, X), C(C, A)) takes a state π : C(I, X) on X to the channel c†_π • d†_{c•π}. If we think of the notation (·)† as denoting the operation of forming the Bayesian inverse of a channel (in the case where A = X, B = Y and C = Z), then the main result of this section is to show that

$$(d \bullet c)^{\dagger}_{\pi} \overset{d \bullet c \bullet \pi}{\sim} c^{\dagger}_{\pi} \bullet d^{\dagger}_{c \bullet \pi},$$

where $\overset{d \bullet c \bullet \pi}{\sim}$ denotes (d • c • π)-almost-equality [25, Definition 2.5].

In order to give an optical form for GrLens_Stat, we need to find two M-actegories with a common category of actions M. Let Ĉ and Č denote the categories Ĉ := V-Cat(C^op, V) and Č := V-Cat(C, V) of presheaves and copresheaves on C, and consider the following natural isomorphisms:

$$\begin{aligned}
\mathsf{GrLens}_{\mathsf{Stat}}\big((X,A), (Y,B)\big) &\cong \mathcal{C}(X,Y) \times \mathcal{V}\big(\mathcal{C}(I,X), \mathcal{C}(B,A)\big) \\
&\cong \int^{M : \mathcal{C}} \mathcal{C}(X,Y) \times \mathcal{C}(X,M) \times \mathcal{V}\big(\mathcal{C}(I,M), \mathcal{C}(B,A)\big) \\
&\cong \int^{\hat{M} : \hat{\mathcal{C}}} \mathcal{C}(X,Y) \times \hat{M}(X) \times \mathcal{V}\big(\hat{M}(I), \mathcal{C}(B,A)\big)
\end{aligned} \tag{4}$$

The second isomorphism follows by Yoneda reduction [17, 23], and the third follows by the Yoneda lemma. We take M to be M := Ĉ, and define an action ⊙ of Ĉ on Č as follows.

Definition 2.5 (⊙). We give only the action on objects; the action on morphisms is analogous.

$$\begin{aligned}
\odot : \hat{\mathcal{C}} &\to \mathcal{V}\text{-}\mathsf{Cat}(\check{\mathcal{C}}, \check{\mathcal{C}}) \\
\hat{M} &\mapsto \Big( \hat{M} \odot - : \check{\mathcal{C}} \to \check{\mathcal{C}}, \quad P \mapsto \mathcal{V}\big(\hat{M}(I), P\big) \Big)
\end{aligned} \tag{5}$$

Functoriality of ⊙ follows from the functoriality of copresheaves.

Proposition 2.6. ⊙ equips Č with a Ĉ-actegory structure: there are unitor isomorphisms λ⊙_F : 1 ⊙ F ≅ F and associator isomorphisms a⊙_{M̂,N̂,F} : (M̂ × N̂) ⊙ F ≅ M̂ ⊙ (N̂ ⊙ F) for each M̂, N̂ in Ĉ, both natural in F : V-Cat(C, V).

We are now in a position to define the category of abstract Bayesian lenses, and show that this category coincides with the category of Stat-lenses.
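Before doing so, the reindexing Stat(c) of Example 2.2 can be made concrete over finitely-supported distributions. In the sketch below, a distribution is a dict of probabilities, a channel is a dict of distributions, and a state-dependent channel is a function from priors to channels; all concrete names and channels are illustrative assumptions, not notation from the text.

```python
# A minimal sketch of the reindexing Stat(c): pulling an X-state-dependent
# channel back along a channel c precomposes with the pushforward of states.

def push(channel, dist):
    """Pushforward of a state along a channel: (c . rho)(x) = sum_y rho(y) c(x|y)."""
    out = {}
    for y, py in dist.items():
        for x, pxy in channel[y].items():
            out[x] = out.get(x, 0.0) + py * pxy
    return out

def reindex(c, d_dagger):
    """Stat(c): send an X-state-dependent channel to a Y-state-dependent one,
    rho |-> d†_{c . rho}."""
    return lambda rho: d_dagger(push(c, rho))

# an X-state-dependent channel, depending on its prior's mass at "x0"
d_dagger = lambda pi: {"a": {"b0": pi["x0"], "b1": 1.0 - pi["x0"]}}
# a channel c from Y to X, and a state rho on Y
c = {"y0": {"x0": 1.0}, "y1": {"x0": 0.25, "x1": 0.75}}
rho = {"y0": 0.5, "y1": 0.5}

pulled = reindex(c, d_dagger)(rho)  # the channel d†_{c . rho}
```

Here c • ρ assigns mass 0.625 to "x0", so the pulled-back channel reports that mass, matching the rule ρ ↦ d†_{c•ρ}.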
Definition 2.7 (Bayesian lenses). Denote by BayesLens the category of optics Optic_{×,⊙} for the action of the Cartesian product on presheaf categories × : Ĉ → V-Cat(Ĉ, Ĉ) and the action ⊙ : Ĉ → V-Cat(Č, Č) defined in (5). Its objects (X̂, Y̌) are pairs of a presheaf and a copresheaf on C, and its morphisms (X̂, Ǎ) ↦ (Ŷ, B̌) are abstract Bayesian lenses—elements of the type

$$\mathsf{Optic}_{\times,\odot}\Big((\hat{X}, \check{A}), (\hat{Y}, \check{B})\Big) = \int^{\hat{M} : \hat{\mathcal{C}}} \hat{\mathcal{C}}(\hat{X}, \hat{M} \times \hat{Y}) \times \check{\mathcal{C}}(\hat{M} \odot \check{B}, \check{A}) \tag{6}$$

Given v : C(X, Y) and u : V(C(I, X), C(B, A)), we denote the corresponding element of this type by ⟨v|u⟩. A Bayesian lens (X̂, X̌) ↦ (Ŷ, Y̌) is called a simple Bayesian lens.
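For intuition, an exact simple Bayesian lens over finite distributions is a forward channel paired with its state-dependent Bayesian inversion, computed by Bayes' rule. The weather-style model below is a hypothetical illustration, not drawn from the text; distributions are dicts and channels are dicts of distributions.

```python
# A minimal sketch of a simple Bayesian lens (X, X) |-> (Y, Y) over finite
# distributions: the forward part is a channel c, the backward part is the
# state-dependent inverse pi |-> c†_pi given by Bayes' rule.

def pushforward(channel, prior):
    """c . pi : marginal on the codomain of pushing a prior through a channel."""
    out = {}
    for x, px in prior.items():
        for y, pyx in channel[x].items():
            out[y] = out.get(y, 0.0) + px * pyx
    return out

def bayes_inverse(channel, prior):
    """The exact inverse c†_pi: for each observation y with positive mass,
    the posterior over inputs via Bayes' rule."""
    marginal = pushforward(channel, prior)
    return {y: {x: prior[x] * channel[x].get(y, 0.0) / py for x in prior}
            for y, py in marginal.items() if py > 0}

# hypothetical model: does it rain, and is the grass wet?
prior = {"rain": 0.2, "dry": 0.8}
c = {"rain": {"wet": 0.9, "not_wet": 0.1},
     "dry":  {"wet": 0.1, "not_wet": 0.9}}

# the Bayesian update of the prior along c, given the observation "wet"
posterior = bayes_inverse(c, prior)["wet"]
```

The pair (c, π ↦ c†_π) is then an exact simple Bayesian lens in the sense of Definition 2.12 below, restricted to priors whose pushforward has full support.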
Proposition 2.8. BayesLens is a category of lenses; a definition is given in [25, §2.2.1].
Proposition 2.9 (Stat-lenses are Bayesian lenses). Let (̂·) : C ↪ V-Cat(C^op, V) denote the Yoneda embedding and (̌·) : C ↪ V-Cat(C, V) the coYoneda embedding. Then

$$\mathsf{Optic}_{\times,\odot}\Big((\hat{X}, \check{A}), (\hat{Y}, \check{B})\Big) \cong \mathsf{GrLens}_{\mathsf{Stat}}\big((X, A), (Y, B)\big) \tag{7}$$

so that GrLens_Stat is equivalent to the full subcategory of Optic_{×,⊙} on representable (co)presheaves.
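The main result of this section, that Bayesian updates compose optically, can be checked numerically for finite distributions: the exact inversion of a composite channel coincides with the lens composite of the factors' inversions. The particular channels below are arbitrary illustrative choices with full support, so almost-equality becomes plain equality.

```python
# Numerical check that (d . c)†_pi = c†_pi . d†_{c . pi} on finite supports.

def push(ch, dist):
    """Pushforward of a finite distribution along a channel."""
    out = {}
    for a, pa in dist.items():
        for b, pba in ch[a].items():
            out[b] = out.get(b, 0.0) + pa * pba
    return out

def compose(d, c):
    """Forward composite d . c of channels (first c, then d)."""
    return {x: push(d, cx) for x, cx in c.items()}

def inverse(ch, prior):
    """Exact Bayesian inversion of a channel with respect to a prior."""
    m = push(ch, prior)
    return {b: {a: prior[a] * ch[a].get(b, 0.0) / m[b] for a in prior}
            for b in m if m[b] > 0}

pi = {"x0": 0.3, "x1": 0.7}
c = {"x0": {"y0": 0.6, "y1": 0.4}, "x1": {"y0": 0.2, "y1": 0.8}}
d = {"y0": {"z0": 0.5, "z1": 0.5}, "y1": {"z0": 0.1, "z1": 0.9}}

lhs = inverse(compose(d, c), pi)        # (d . c)†_pi
c_inv = inverse(c, pi)                  # c†_pi
d_inv = inverse(d, push(c, pi))         # d†_{c . pi}
rhs = {z: push(c_inv, dz) for z, dz in d_inv.items()}  # c†_pi . d†_{c . pi}
```

Both sides agree exactly for every observation z and every input x, as the chain rule of probability predicts.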
Remark 2.10. We will often abuse notation by indicating representable objects in BayesLens by their representations in C. That is, we will write (X, A) instead of (X̂, Ǎ) where this would be unambiguous.

Proposition 2.11. BayesLens is a symmetric monoidal category. The monoidal product ⊗ is inherited from C; the unit object is the pair (I, I), where I is the unit object in C. For more details on the structure, see [21] or [19].

Definition 2.12 (Exact and approximate Bayesian lenses). Let ⟨c|c†⟩ : (X, X) ↦ (Y, Y) be a simple Bayesian lens. We say that ⟨c|c†⟩ is exact if c admits Bayesian inversion and, for each π : I →• X such that c • π has non-empty support, c†_π is the Bayesian inversion of c with respect to π. Simple Bayesian lenses that are not exact are said to be approximate.

Lemma 2.13. Let ⟨c|c†⟩ and ⟨d|d†⟩ be sequentially composable exact Bayesian lenses. Then the contravariant component of the composite lens ⟨d|d†⟩ ∘| ⟨c|c†⟩ ≅ ⟨d • c | c† ∘ c*d†⟩ is, up to (d • c • π)-almost-equality, the Bayesian inversion of d • c with respect to any state π on the domain of c such that c • π has non-empty support. That is to say, Bayesian updates compose optically:

$$(d \bullet c)^{\dagger}_{\pi} \overset{d \bullet c \bullet \pi}{\sim} c^{\dagger}_{\pi} \bullet d^{\dagger}_{c \bullet \pi}.$$

In this section, we supply mild generalizations of the structures underlying open games, building on those in [2]; at first, then, we consider games over arbitrary categories of optics Optic_{R,L}. Subsequently, we use games over Bayesian lenses (in the category of optics BayesLens introduced above) to exemplify a number of canonical statistical concepts, such as maximum likelihood estimation and the variational autoencoder, and clarify their compositional structure using the notion of optimization game (Definition 3.21). Owing to space constraints, we omit most proofs in this section; they will appear in a full paper expanding the present abstract, and can be supplied at the request of the reader.
Observation 3.1.
In the graphical calculus for the compact closed bicategory of profunctors Prof [22], the hom object Optic_{R,L}((X, A), (Y, B)) has a string-diagrammatic depiction [diagram omitted], in which the types on the wires are the 0-cells of Prof, the monoidal actions R and L are depicted as (co)monoids, and the states and effects are (co)representable functors on the objects X, A, Y, B, treated as profunctors.

Definition 3.2 (Generalized context). The context functor C : Optic_{R,L}^op × Optic_{R,L} → Set takes the pair of optical objects ((X, A), (Y, B)) to the type depicted by the corresponding string diagram [diagram omitted], whose open wires are closed off by the monoidal units I in the underlying actegories. The action on morphisms (i.e., optics) is by precomposition on the left and postcomposition on the right. Functoriality follows accordingly.

We can compose a context with an optic to obtain a 'closed' system [diagram omitted].
Conjecture 3.3.
It is easy to show that a context on ((X, A), (Y, B)) is equivalently a state (I, I) ↦ ((X, A), (Y, B)) in the monoidal category of 'double lenses' Lens_{Optic_{R,L}} [2]. Rendering this graphically leads us to the following conjecture: categories of double optics are instances of the doubling or CP construction from categorical quantum mechanics (cf. [8, 7]). Proposition 3.4.
Let C and D be the (monoidal) actegories underlying Optic_{R,L}, and denote their respective monoidal units by I_C and I_D. If these unit objects are terminal in their respective categories, then the contexts C((X, A), (Y, B)) simplify to a depiction [diagram omitted] in which the representable presheaf on I_D is drawn as a discarding effect, indicating that A is just discarded. Consequently, in this case, a context is just an optic (I, B) ↦ (X, Y).

Definition 3.5 (Generalized open game). Let (X, A) and (Y, B) be objects in any symmetric monoidal category of optics Optic_{R,L}. Let Σ be a U-category, for any base of enrichment U such that U-Prof is compact closed. An open game from (X, A) to (Y, B) with strategies in Σ, denoted $G : (X, A) \xrightarrow{\Sigma} (Y, B)$, is given by:

(a) a play function P : Σ → Optic_{R,L}((X, A), (Y, B)); and
(b) a best response function B : C((X, A), (Y, B)) → U-Prof(Σ, Σ).

Given a strategy σ : Σ, we will often write ⟨v|u⟩_σ or similar to denote its image under P. A strategy is an equilibrium in a context ⟨π|k⟩ if it is a fixed point of B(⟨π|k⟩).

Roughly speaking, the 'best responses' to a strategy σ in a context are those strategies τ such that choosing τ would result in performance at the game at least as good as choosing σ; equilibrium strategies are those for which such deviation would not improve performance. Remark 3.6.
Note: whereas classic open games use a best-response relation, we categorify that here to a best-response relator (in the terminology of [17]; i.e., a 'proof-relevant' relation), so that we can describe the trajectories witnessing the computation of equilibria, rather than their mere existence.
Proposition 3.7.
Generalized open games over the symmetric monoidal category of optics Optic_{R,L} with strategies enriched in U form a symmetric monoidal category, denoted Game(U, R, L).

Since our games are only a mild generalization of those of [2], we refer the reader to §3.10 of that paper for an idea of the proof of the foregoing proposition, which goes through analogously. The sequential composition of games is given by the sequential composition of optics, with the best response to the composite being the product of the best responses to the factors. Similarly, parallel composition is given by the monoidal product of optics, and the best response to the composite is again the product of the best responses to the factors.

We now consider some games over BayesLens that supply the building blocks of the archetypal cybernetic systems to be considered in §4. For now, we will take the strategies simply to be discrete categories (i.e., sets), as in the standard formulation of open games. Consequently, we will take the codomain of the best response function to be Set(Σ, Set(Σ, 2)), for each strategy type Σ. We assume the ambient category of stochastic channels is semicartesian, so that the monoidal unit is the terminal object. Remark 3.8.
All the games we will consider henceforth will have play functions whose codomains restrict to the representable subcategory GrLens_Stat of BayesLens; in this work, we do not use the extra generality afforded by BayesLens, except insofar as it grants us the use of string diagrams in Prof, which we find helpful for reasoning intuitively about these systems. The generality of optics is however used in the 'game-theoretic' games of [2], and in future work we hope to relate the cybernetic systems of this paper to the game-theoretic setting of that earlier work. Remark 3.9.
All the statistical games considered in this paper will be 'atomic' in the sense of [2]: in particular, the best response functions we consider will be constant, meaning that, in any context, the set of best strategies does not depend on the 'current' choice of strategy. Permitting such dependence will be important in future work, however, when we consider how cybernetic systems interact, and hence respond to each other.
Example 3.10.
A Bayesian lens of the form (I, I) ↦ (X, X) is fully specified by a state π : I →• X. A context for such a lens is given by a lens ⟨!|k⟩ : (I, X) ↦ (X, X), where ! : I →• I is the unique map and k : X →• X is any endochannel on X. A maximum likelihood game is any game whose play function has codomain in Bayesian lenses of this form (I, I) ↦ (X, X) for any X : C, and whose best response function is isomorphic to

$$\mathbf{B}(\langle\,!\mid k\rangle) = \langle\rho\mid !\rangle_{\sigma} \mapsto \left\{ \langle\pi\mid !\rangle_{\tau} \;\middle|\; \pi \in \operatorname*{arg\,max}_{\pi} \mathbb{E}_{k \bullet \pi}[\pi] \right\}$$

where the maximization ranges over states π : I →• X, E is the canonical expectation operator (i.e., algebra evaluation) associated to states in C, and where we have written ⟨ρ|!⟩_σ and ⟨π|!⟩_τ to denote the images of the strategies σ and τ under the play function. Intuitively, then, the best response is given by the strategy that maximises the likelihood of the state obtained from the context k. Remark 3.11.
In what follows, we assume that the underlying category C of stochastic channels admits density functions. Informally, a density function for a stochastic channel c : X →• Y is a measurable function p_c : Y × X → [0, 1] whose values are the probabilities (or probability densities) p_c(y|x) at each pair (y, x) : Y × X. We say that the value p_c(y|x) is the probability (or probability density) of y given x. In a category such as Kℓ(D_{≤1}), whose objects are sets and whose morphisms X →• Y are functions X → D(Y + 1), a density function for c : X →• Y is a morphism Y ⊗ X →• I; note that in Kℓ(D_{≤1}), I is not terminal. In the finitely-supported case, density functions are effectively equivalent to channels, but this is not the case in the continuous setting, where they are of most use. For more on this, see [25, §2.1.4].

A natural first generalization of maximum likelihood games takes us from states I →• X to channels Z →• X; that is, from 'elements' to 'generalized elements' in the covariant (forwards) part of the lens. Unlike Bayesian lenses (I, I) ↦ (X, X), lenses (Z, Z) ↦ (X, X) admit nontrivial contravariant components. A context ⟨π|k⟩ : (I, X) ↦ (Z, X) for a Bayesian lens (Z, Z) ↦ (X, X) then constitutes a 'prior' state π : I →• Z and a 'continuation' channel k : X →• X, which together witness the closure of the otherwise open system. Example 3.12.
Fix a channel c : Z →• X with associated density function p_c : X × Z → ℝ₊ and a measure of divergence between states on Z, D : C(I, Z) × C(I, Z) → ℝ. A corresponding (generalized) simple Bayesian inference game is any game whose play function has codomain BayesLens((Z, Z), (X, X)) and whose best response function is isomorphic to

$$\begin{aligned}
\mathbf{B}(\langle\pi\mid k\rangle) &= \langle d\mid d'\rangle_{\sigma} \mapsto \left\{ \langle c\mid c'\rangle_{\tau} \;\middle|\; c' \in \operatorname*{arg\,min}_{c' : \mathcal{V}(\mathcal{C}(I,Z),\, \mathcal{C}(X,Z))} \mathbb{E}_{x \sim k \bullet c \bullet \pi}\Big[ \mathbb{E}_{z \sim c'_{\pi}(x)}\big[-\log p_c(x \mid z)\big] + D\big(c'_{\pi}(x), \pi\big) \Big] \right\} \\
&= \langle d\mid d'\rangle_{\sigma} \mapsto \left\{ \langle c\mid c'\rangle_{\tau} \;\middle|\; c' \in \operatorname*{arg\,min}_{c' : \mathcal{V}(\mathcal{C}(I,Z),\, \mathcal{C}(X,Z))} \left( \mathbb{E}_{z \sim c'_{\pi} \bullet k \bullet c \bullet \pi}\Big[ -\!\int_X \log p_c\big(\mathrm{d}(k \bullet c \bullet \pi) \mid z\big) \Big] + D\big(c'_{\pi} \bullet k \bullet c \bullet \pi,\; \pi\big) \right) \right\}
\end{aligned}$$

where π : I →• Z and k : X →• X, and where the notation z ∼ π means "z distributed according to the state π". Note that the second line follows from the first by linearity of expectation.

Proposition 3.13 ([16, Thm. 1]). When D is chosen to be the Kullback-Leibler divergence D_KL, minimizing the objective function defining a simple Bayesian inference game is equivalent to computing an (exact) Bayesian inversion. Corollary 3.14.
Given two Bayesian inference games G : (Z, Z) ↦ (Y, Y) and H : (Y, Y) ↦ (X, X), we can compose them sequentially to obtain a game H ∘| G : (Z, Z) ↦ (X, X), which we will call a hierarchical Bayesian inference game. It is then an immediate consequence of Lemma 2.13 that, in any given context for which the forwards channels admit Bayesian inversion, the best response to the composite game H ∘| G (that is, the optimal inversion of the composite channel) is given simply by (the composition of) the best responses to the factors H and G. Consequently, Bayesian inference games are closed under composition.

Similarly, given a channel c : Z ⊗ Y →• X, we can consider the marginal Bayesian inference game in which the objective is to compute the inversion of the channel onto just one of the factors Z or Y in the domain.

Example 3.15 (Variational autoencoder game). Fix a family F ↪ C(Z, X) of forward channels and a family P ↪ C(X, Z) of backward channels such that each c : F admits a density function p_c : X ⊗ Z → ℝ₊ and each d : P admits a density function q : Z ⊗ X → ℝ₊; think of these families as determining parameterizations of the channels. We take our strategy type to be Σ = F × P. A simple variational autoencoder game $(Z, Z) \xrightarrow{\Sigma} (X, X)$ is any game with play function P : Σ → BayesLens((Z, Z), (X, X)) and whose best response function is isomorphic to

$$\mathbf{B}(\langle\pi\mid k\rangle) = \langle d\mid d'\rangle_{\sigma} \mapsto \left\{ \langle c\mid c'\rangle_{\tau} \;\middle|\; (c, c') \in \operatorname*{arg\,min}_{c \in F,\; c' \in \mathcal{V}(\mathcal{C}(I,Z),\, P)} \mathbb{E}_{x \sim k \bullet c \bullet \pi}\, \mathbb{E}_{z \sim c'_{\pi}(x)}\!\left[ \log \frac{q(z \mid x)}{p_c(x \mid z)\, p_{\pi}(z)} \right] \right\}$$

where π : I →• Z admits a density function p_π : Z → ℝ₊, q : Z ⊗ X → ℝ₊ is a density function associated to c'_π, and k has type X →• X.
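The objective in these games is a variational free energy, and for a finite toy model one can verify the standard identity that it equals the negative log-evidence plus the KL divergence from the exact posterior, so that it is minimized by the exact Bayesian inversion, in line with Proposition 3.13. The generative model below is an illustrative assumption, not drawn from the text.

```python
import math

def kl(q, p):
    """Kullback-Leibler divergence between finite distributions."""
    return sum(qz * math.log(qz / p[z]) for z, qz in q.items() if qz > 0)

# hypothetical generative model: prior pi on Z, and likelihood p(x|z) for
# one fixed observation x
pi = {"z0": 0.4, "z1": 0.6}
lik = {"z0": 0.7, "z1": 0.2}
evidence = sum(pi[z] * lik[z] for z in pi)              # p(x)
posterior = {z: pi[z] * lik[z] / evidence for z in pi}  # exact inversion

def free_energy(q):
    """The inference-game objective with D = D_KL:
    E_{z~q}[-log p(x|z)] + KL(q, pi)."""
    return sum(q[z] * -math.log(lik[z]) for z in q) + kl(q, pi)

# identity: free_energy(q) = -log p(x) + KL(q, posterior), so the argmin
# over approximate posteriors q is the exact Bayesian inversion
q = {"z0": 0.5, "z1": 0.5}
gap = free_energy(q) - (-math.log(evidence) + kl(q, posterior))
```

Since KL is nonnegative and vanishes only at equality, the identity shows the free energy is bounded below by the negative log-evidence, with the bound attained exactly at the posterior.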
Proposition 3.16.
A best response to a variational autoencoder game is a stochastic channel c : F that maximises the likelihood of the state observed through the continuation k, under the assumption that the generative process is in F, along with an inverse channel c'_π : P that best approximates the exact Bayesian inverse c†_π under the constraint of being in P. Proposition 3.17.

Variational autoencoder games generalize inference games for the Kullback-Leibler divergence. More precisely, the objective function defining autoencoder games is of the same form as that defining inference games (3.12) when D = D_KL. This prompts the following generalization:

Example 3.18 (Generalized autoencoder game). Fix two families of channels F, P and a strategy type Σ as in Example 3.15. Then a (generalized) simple autoencoder game $(Z, Z) \xrightarrow{\Sigma} (X, X)$ is any game with play function P : Σ → BayesLens((Z, Z), (X, X)) and whose best response function is isomorphic to

$$\mathbf{B}(\langle\pi\mid k\rangle) = \langle d\mid d'\rangle_{\sigma} \mapsto \left\{ \langle c\mid c'\rangle_{\tau} \;\middle|\; (c, c') \in \operatorname*{arg\,min}_{c \in F,\; c' \in \mathcal{V}(\mathcal{C}(I,Z),\, P)} \left( \mathbb{E}_{z \sim c'_{\pi} \bullet k \bullet c \bullet \pi}\Big[ -\!\int_X \log p_c\big(\mathrm{d}(k \bullet c \bullet \pi) \mid z\big) \Big] + D\big(c'_{\pi} \bullet k \bullet c \bullet \pi,\; \pi\big) \right) \right\}$$

where π and k have respective types I →• Z and X →• X, and D is any measure of divergence between states.

As with Bayesian inference games, we can generalize simple autoencoder games to hierarchical and marginal autoencoder games via the corresponding sequential and parallel compositions.

The foregoing games have been purely statistically formulated, without capturing the motivating feature of an open system as something in interaction with an external environment. Nonetheless, we can model a simple open system of hierarchical active inference that receives stochastic inputs from an environment and emits actions stochastically into the environment, as follows. Example 3.19 (Active inference game).
Let {S_i}_i be a set of spaces of sensory data, indexed by hierarchical levels of abstraction i (for instance, the levels of abstraction might range from representations of whole objects to fine details about their texture); similarly, let {A_i}_i be a set of spaces of possible actions, similarly hierarchically organized. Consider the marginal autoencoder games (S_{i+1} ⊗ A_i, S_{i+1}) ↦ (S_i, S_i) and (A_{i+1} ⊗ S_i, A_{i+1}) ↦ (A_i, A_i), coupled via the symmetric monoidal structure ⊗ of C [diagrams omitted], giving a composite game (S_{i+1} ⊗ A_{i+1}, S_{i+1} × A_{i+1}) ↦ (S_i ⊗ A_i, S_i × A_i). Recall from [2, §§3.7-3.8] that a composite game is given by the (sequential and parallel) composition of optics, with best-response given by the product of the best-responses of the factors. The state at each level factorizes as a product over S_{i+1} and A_{i+1}. This is not merely a diagrammatic convenience, but coincides with a common 'mean field' simplification in the modelling literature [3, 15]. The dashed box is a functorial box [18] depicting the Yoneda embedding; recall that optics in BayesLens were defined over (co)presheaves, and so here we needed to lift the monoidal product on C into a diagram over its presheaf category Cat(C^op, Set).

Next, compose these games along the hierarchy indexed by i, to obtain a game (S_N ⊗ A_N, S_N × A_N) ↦ (S_0 ⊗ A_0, S_0 × A_0) [diagram omitted]. Given a context with a strong prior about expected sensory states and a continuation that responds to an action of type A_0 by feeding back a state on S_0, the best response can be shown to be that which selects actions that, under the current state, maximize the likelihood of obtaining the expected 'goal' state [3, 11]. Remark 3.20.
We have framed each of these statistical procedures as optimization problems not only to suggest a link to the utility-maximising agents of game theory, but also because it suggests the use of iterative methods to compute best responses; note that computational tractability is an important motivation in the proof of Proposition 3.16.

The question of providing such dynamical or, thinking of game composition as an algebra for building complex systems, 'coalgebraic' semantics for (generalized) optimization games is the topic of the next section. We first formalize this notion.
Definition 3.21. An optimization game is any open game whose best response function can be definedby a function of the form Σ × C π −→ M ϕ −→ P , where Σ is a strategy type, C a context type, M any space,and P a poset. We call ϕ the fitness function , and think of π as projecting systems into a space whosepoints can be assigned a fitness. The best response function of an optimization game can then be definedby giving the subset of strategies contextually maximizing fitness, for each context c : C . In this section, we begin to answer the question of precisely how the optimization games of the previoussection may be realized in physical systems, such as brains or computers. More formally, this meanswe seek open dynamical systems whose input and output types correspond to the domain and codomaintypes of the foregoing games, such that there is a correspondence between the behaviours of the abstractgames and their dynamical realisations, and such that the evolutions of the internal states of the dynamicalsystems correspond to strategic improvements in game-playing: by concentrating on optimization games,18
a natural measure of such improvement is encoded in the fitness function underlying the best-response relator. We do not require that there is a correspondence between internal states of the realisations and strategies for the corresponding games, but we do require that the fitness functions extend to the total state spaces of the closure of a realisation induced by the context. When there is a correspondence between internal states and strategies, we can take advantage of Definition 3.5 and interpret trajectories over the state space as trajectories over strategies witnessing the strategic improvement. We begin by sketching categories of dynamical games, and then use these ideas to define preliminary notions of open cybernetic systems and categories thereof. We consider principally single systems whose underlying games are atomic (in the sense of Remark 3.9), and leave the study of the behaviour of interacting cybernetic systems to future work. Once more, we omit proofs in this section; they will appear in a paper to follow.

Definition 4.1 (Discrete-time dynamical system over C; after [24, 6]). A discrete-time dynamical system over C with state space S : C, input type A : C and output type B : C is a lens (S, S) ↦ (B, A) over C, i.e. an element of the following optical hom object:

∫^{M:C} Comon(C)(S, M ⊗ B) × C(M ⊗ A, S) ≅ Comon(C)(S, B) × C(S ⊗ A, S)

where the isomorphism follows by Yoneda reduction. Note that this requires that the 'output' map of the dynamical system is a comonoid homomorphism in C, and hence deterministic in a category of stochastic channels.

Definition 4.2 (Category of discrete-time dynamical systems). We define a category
Dyn_C whose objects are the objects of C and whose morphisms, denoted A --S→ B, are discrete-time dynamical systems; the symbol above the arrow denotes the internal state space. Hom objects are given by

Dyn_C(A, B) = Σ_{S:C} Comon(C)(S, B) × C(S ⊗ A, S).

Identity dynamical systems on each A : C are the 'no-op' dynamical systems A --A→ A given by identity optics id_A : (A, A) ↦ (A, A). Associativity and unitality of composition is inherited from the category of optics underlying Definition 4.1; a symmetric monoidal structure is similarly inherited.

Definition 4.3 (Lenses over dynamical systems; after [21]). The category of (monoidal) lenses over C-dynamical systems has as objects pairs (X, A) of objects in C and as morphisms, dynamical lenses (X, A) ↦ (Y, B), elements of the type

∫^{M:C} Dyn_C(X, M ⊗ Y) × Dyn_C(M ⊗ B, A)
  ≅ Σ_{P,Q:C} ∫^{M:C} C(P ⊗ X, P) × Comon(C)(P, M ⊗ Y) × C(Q ⊗ M ⊗ B, Q) × Comon(C)(Q, A).

[String diagram omitted: two dynamical systems, with state spaces P and Q, coupled along the residual M.]

That is, a dynamical lens is a pair of dynamical systems coupled along some 'residual' type.
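In Set, Definition 4.1 is just a Moore machine: a readout S → B together with an update S × A → S; and composition in Dyn_C takes the product of state spaces, feeding the first system's output to the second's input. A minimal sketch, with illustrative names rather than the paper's notation:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Dyn:
    """A discrete-time dynamical system (S, S) ↦ (B, A) in Set:
    a deterministic readout S -> B and an update S x A -> S."""
    readout: Callable[[Any], Any]      # S -> B
    update: Callable[[Any, Any], Any]  # S x A -> S

def compose(g: Dyn, f: Dyn) -> Dyn:
    """Sequential composition in Dyn_C: the composite state space is the
    product of the factors' state spaces; f's readout feeds g's input."""
    return Dyn(
        readout=lambda s: g.readout(s[1]),
        update=lambda s, a: (f.update(s[0], a),
                             g.update(s[1], f.readout(s[0]))),
    )
```

Composing, say, a running-sum system with a doubling system yields a single system carrying the product state, exactly as the coend formula for Dyn_C prescribes.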
Remark 4.4.
At this point we begin to run into size issues. However, for the purposes of this paper, we will simply assume that a satisfactory resolution of these matters is at hand; for instance, that there is a hierarchy of Grothendieck universes such that the coends over ('large') sums in the preceding definition constitute accessible objects.

We now expand the definition of context in the dynamical setting. We will see that a dynamical context is simply a closure of an open dynamical system: that is, a 'larger' system into which a 'smaller' open dynamical system can plug, such that the composite is a closed (but still uninitialized) system.
Proposition 4.5. If I is terminal in C, a context for a dynamical lens (X, A) ↦ (Y, B) is an element of the following type, denoted C̃((X, A), (Y, B)):

Σ_{P,Q:C} [string diagram omitted: an autonomous system with state P emitting X ⊗ M, coupled along the residual M to a system with state Q, input Y and output B; the A wire is discarded]
Interpreting this diagram, a context for a dynamical lens (X, A) ↦ (Y, B) amounts to an autonomous dynamical system with output type of the form X ⊗ M (for some residual type M), coupled along the residual M to an open dynamical system with input type Y ⊗ M and output type B; and the A type is discarded. This is precisely what we should expect from a dynamical analogue of Proposition 3.4.

Definition 4.6. A dynamical game is just a generalized open game (3.5) over the category of dynamical lenses. We write (X, A) --(Σ̃,S)→ (Y, B) to indicate both the strategy type Σ̃ and state space S. Dynamical games form a symmetric monoidal category in the corresponding way. For notational clarity, we will write G̃ for a dynamical game, P̃ for its play function, and B̃ for its best response function.
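Concretely (and with heavy simplification: deterministic maps in Set, trivial residual, and illustrative names), closing a dynamical lens with a stationary context of the shape just described yields an autonomous system that can simply be iterated:

```python
def close(context_emit, continuation, view, update, s0, steps=5):
    """Close a dynamical lens with a stationary dynamical context
    (cf. Proposition 4.5, much simplified): the context constantly emits
    an observation x, the lens's forward output y is fed to the
    continuation k to produce feedback b, and the lens's backward
    output a is discarded. Names are illustrative, not the paper's."""
    s = s0
    history = []
    for _ in range(steps):
        x = context_emit()   # autonomous system emits x : X
        y = view(s, x)       # lens forward pass produces y : Y
        b = continuation(y)  # continuation k returns feedback b : B
        s = update(s, x, b)  # lens state update (a : A is discarded)
        history.append(y)
    return s, history
```

The pair (`context_emit`, `continuation`) stands for the two coupled systems of the proposition; the composite has no remaining inputs or outputs, so it is a closed (but uninitialized, until `s0` is chosen) system.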
Definition 4.7 (Dynamical realisation of an open game). Let G : (X, A) --Σ→ (Y, B) be an open game with X, A, Y, B all objects of some symmetric monoidal category C. A dynamical realisation of G is a choice of dynamical game G̃ : (X, A) --(Σ̃,S)→ (Y, B) on the same objects, along with a function ⟦·⟧ : C((X, A), (Y, B)) → C̃((X, A), (Y, B)) lifting static contexts to dynamical contexts. Given a context ⟨π|k⟩ : C((X, A), (Y, B)), we choose a representative ⟨⟦π⟧|⟦k⟧⟩ ≅ ⟦⟨π|k⟩⟧ : C̃((X, A), (Y, B)) for its realisation.

A 'dynamical context' is an element of the type given in Proposition 4.5: a context for a dynamical lens. A 'static context' is simply a context for the 'static' game that is being dynamically realized. At this stage, we impose no particular requirements on the context realisation function ⟦·⟧, except to say that in the intended semantics, ⟦⟨π|k⟩⟧ is a (coupled, open) dynamical system that constantly emits the state π and (by some mechanism) realizes the channel k. We call such a context stationary, as neither π nor k varies in time; future work will generalize the results of this section to non-stationary contexts.

Definition 4.8 (Open cybernetic systems). An open cybernetic system is defined by the data:
• an open optimization game (Def. 3.21) G : (X, A) --Σ→ (Y, B) with X, A, Y, B all objects of some symmetric monoidal category C,
• a fitness function φ_G : Σ × C → M --φ→ F, where C = C((X, A), (Y, B)),
• a dynamical realisation (G̃ : (X, A) --(Σ̃,S)→ (Y, B), ⟦·⟧ : C((X, A), (Y, B)) → C̃((X, A), (Y, B))) of G,
satisfying the following condition for each context ⟨π|k⟩ : C((X, A), (Y, B)):
• there exists a dynamical strategy σ̃ : Σ̃, such that,
• writing Z for the total state space of the autonomous dynamical system ⟦⟨π|k⟩⟧ ∘ P̃(σ̃) induced by the context, there exists a function ν : Z → M projecting Z into the 'fitness landscape' M, such that
• there exists a fitness-maximising fixed point ζ* : Z, in the sense that,
• for some equilibrium strategy of the static system σ* : fix B(⟨π|k⟩), we have φ(ν(ζ*)) ≤ φ_G(σ*, ⟨π|k⟩).
A category of open cybernetic systems is a category of (generalized) open games such that each game is an open cybernetic system with dynamics realised in the same category C, and such that the composite of games is a cybernetic system whose fitness-maximising fixed point projects
onto fitness-maximising fixed points of each of the factors in their corresponding local contexts. (See [2, §3.7] for the definition of local context.)

The idea here is that, by using the fitness function of the underlying optimization game, the cybernetic condition forces the behaviour of the dynamical realisation to coincide with the process of iteratively improving the strategies deployed by the system in playing the game. We summarize the condition in the diagram

[diagram omitted: a square relating Σ × C → M --φ→ F to Σ̃ × C̃ → Z, mediated by ⟦·⟧, ν, and fix]

though this is in general ill-defined: we do not require a function ⟦·⟧ : Σ → Σ̃, and nor do we require that the best response to G̃ coincides in any way with the best response to G. Investigating such conditions is the subject of future work; for instance, we may be interested in nested cybernetic systems, such as characterize evolution by natural selection, and how their fitness functions constrain one another. For similar reasons, we are also interested in the case where the fitness function is itself non-stationary.

Remark 4.9.
The codomain category of the cybernetic realisation functor is in general much larger than the domain category of static games, and often it makes sense to consider dynamical games in this codomain category as if they were dynamical realisations of static games, even if in fact there is no static game to which they could correspond. For instance, adaptive systems in physical environments are in general not realisations of static games, because their contexts are irreducibly dynamical and thus not the dynamical realisation of a static context; but over short time intervals, it can be productive to treat such systems as realisations of static games. In continuous time (not treated here), it is even possible to consider dynamical games that are indeed realisations of games that are static when represented in a smoothly varying coordinate system. The free-energy framework of Theorem 4.10 is an example of a category of cybernetic systems with a rich underlying category of dynamical games.

A classic category of open cybernetic systems is found in the computational neuroscience literature, as summarized in the following theorem.
Theorem 4.10.
Consider the subcategory of
BayesLens spanned by finite-dimensional Euclidean spaces, with morphisms generated (under sequential and parallel composition) by the (variational) autoencoder and inference games whose forwards and backwards channels emit Gaussian measures with high precision. The (discrete-time) free-energy framework for action and perception [3] instantiates a category of open cybernetic systems realising games over this subcategory.
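To make the discrete-time scheme concrete, here is a heavily simplified single-level perception step in the style of [3]: beliefs μ descend the gradient of a Laplace-encoded free energy given by precision-weighted prediction errors. The function names, the one-dimensional setting, and the fixed learning rate are illustrative simplifications, not the construction of Theorem 4.10 itself.

```python
def free_energy(mu, s, g, prior_mu, pi_s, pi_mu):
    """Laplace-encoded free energy for one hierarchical level:
    precision-weighted squared prediction errors for the sensory
    datum s (under generative map g) and for the prior belief."""
    eps_s = s - g(mu)        # sensory prediction error
    eps_mu = mu - prior_mu   # prior prediction error
    return 0.5 * (pi_s * eps_s ** 2 + pi_mu * eps_mu ** 2)

def perception_step(mu, s, g, dg, prior_mu, pi_s, pi_mu, lr=0.1):
    """One discrete-time gradient descent step on F with respect to mu;
    dg is the derivative of the generative map g."""
    eps_s = s - g(mu)
    eps_mu = mu - prior_mu
    dF_dmu = -pi_s * eps_s * dg(mu) + pi_mu * eps_mu
    return mu - lr * dF_dmu
```

With a linear generative model g = id, the beliefs converge to the precision-weighted average of prior and data, as expected of a Gaussian Bayesian update.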
Remark 4.11.
Typical presentations of 'active inference' under the free-energy principle are excessively complicated by the lack of attention paid to compositionality. Because the free-energy framework instantiates a category of open cybernetic systems, a radically simplified compositional presentation is possible. Such a presentation forms a companion to the present work.
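The cybernetic condition of Definition 4.8 can also be illustrated in miniature. In the toy sketch below (Python, with illustrative names; the fitness is a negated squared error and the realisation is plain gradient ascent, standing in for ⟦·⟧), iterating the closed system drives the total state to a fitness-maximising fixed point whose fitness agrees with that of the static equilibrium:

```python
def fitness(sigma, context):
    """Fitness phi_G of a toy optimization game: negated squared error
    between the strategy and the context's implied target."""
    pi, k = context
    return -(sigma - k(pi)) ** 2

def realise(context, lr=0.25):
    """A toy dynamical realisation of the context: gradient ascent on
    the fitness, one step per tick of the closed system."""
    pi, k = context
    def step(sigma):
        return sigma + lr * (-2.0) * (sigma - k(pi))
    return step

context = (3.0, lambda x: 2.0 * x)  # a stationary context ⟨π|k⟩
step = realise(context)
zeta = 0.0                          # total state of the closed system
for _ in range(60):
    zeta = step(zeta)               # zeta tends to the fixed point ζ*
```

Here the static equilibrium is σ* = 6, and the trajectory of the closed system converges to it, so the fitness at the dynamical fixed point matches the static optimum, as the cybernetic condition demands.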
Corollary 4.12.
The free-energy framework has been used to supply a computational explanation for the pervasive bidirectionality of cortical circuits in the mammalian brain [1, 10]. A corollary of Theorem 4.10 is that this bidirectionality is furthermore justified by the abstract structure of Bayesian inference and its dynamical realisation: because Bayesian updates compose optically, a cybernetic system realising Bayesian inference compositionally must instantiate this structure. We note also that the parallel interacting bidirectional structure of the active inference game (Example 3.19) is reproduced in the cortex.

The free-energy framework realisation of autoencoder games is not unique; an alternative is found in machine learning.
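In that machine-learning alternative, the fitness function ascended by the realisation is the evidence lower bound (ELBO), the negative of the variational free energy. For a one-dimensional linear-Gaussian model the ELBO is available in closed form, which makes the optimization-game reading explicit (a toy sketch; the parameterization and names are illustrative):

```python
import math

def elbo(x, enc_mean, enc_std, dec_slope, dec_std):
    """Exact ELBO for a toy 1-D linear-Gaussian autoencoder:
    prior z ~ N(0, 1), decoder x|z ~ N(dec_slope * z, dec_std^2),
    encoder q(z|x) = N(enc_mean, enc_std^2)."""
    # E_q[log p(x|z)] in closed form
    expected_sq_err = (x - dec_slope * enc_mean) ** 2 + (dec_slope * enc_std) ** 2
    ell = -0.5 * (math.log(2 * math.pi * dec_std ** 2)
                  + expected_sq_err / dec_std ** 2)
    # KL(q || N(0, 1)) in closed form
    kl = 0.5 * (enc_mean ** 2 + enc_std ** 2 - 1.0) - math.log(enc_std)
    return ell - kl
```

Maximizing over the encoder parameters recovers the exact posterior, at which point the ELBO equals the log marginal likelihood log p(x); away from the posterior it is a strict lower bound, which is what makes it a sensible fitness function for an autoencoder game.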
Theorem 4.13.
Consider the subcategory of
BayesLens spanned by finite-dimensional Euclidean spaces, with morphisms generated (under sequential and parallel composition) by the (variational) autoencoder and inference games whose forwards and backwards channels emit exponential-family measures. The deep (variational) autoencoder framework [15] instantiates a category of open cybernetic systems realising games over this subcategory.

Increasingly, the variational autoencoder framework is used to model complete agents in machine learning, rather than merely dynamically realise static inference or learning problems. Indeed, thinking of the 'free-energy framework' as a collection of cybernetic realisations of autoencoder and active-inference games, the demonstration of the following corollary of Theorem 4.13 is unsurprising:
Corollary 4.14.
The “deep active inference agent” [27] is a cybernetic system realising an active inference game in the variational autoencoder framework.

We have heretofore concentrated on 'variational Bayesian' realisations of the games introduced in §3, as they most strikingly fit the language of optimization used there. But we expect any other family of approximate inference methods to supply a corresponding category of cybernetic systems. We thus make the following conjecture.
Conjecture 4.15.
Consider the subcategory of
BayesLens spanned by finite-dimensional smooth manifolds, with morphisms generated (under sequential and parallel composition) by the generalized autoencoder and inference games. We expect sampling algorithms, such as Markov chain Monte Carlo, to supply a corresponding category of open cybernetic systems of interest.

Finally, we provide further justification for Remark 3.6.
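As a minimal instance of the sampling semantics conjectured above, a Metropolis-Hastings kernel is itself a (stochastic) discrete-time dynamical system whose state is the current sample and whose stationary distribution realises the desired inference. In this sketch (illustrative names) the target density is held fixed, whereas a genuinely cybernetic realisation would let the context vary:

```python
import math
import random

def mh_step(state, log_target, step_size=0.5, rng=random):
    """One Metropolis-Hastings transition: a stochastic update map
    S x A -> S (with the 'input' log_target held fixed) whose
    stationary distribution is the target density."""
    proposal = state + rng.gauss(0.0, step_size)
    log_alpha = log_target(proposal) - log_target(state)
    if math.log(rng.random()) < log_alpha:
        return proposal
    return state
```

Iterating `mh_step` yields a trajectory whose empirical distribution converges to the target, so the 'strategic improvement' of the cybernetic reading is here the mixing of the chain.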
Observation 4.16.
Consider a variational autoencoder, realised as in Theorem 4.13. By choosing the parameterizations F, P of the forwards and backwards channels to coincide with the state spaces of their dynamical realisations, and the (static) play function P to take a parameter vector to the corresponding channel, the dynamical realisation induces a trajectory over the strategy space. Such trajectories organize into a sheaf whose sections are trajectories of arbitrary length [24], spans of which are again just (generalized) dynamical systems; these spans are equivalently profunctors [4]. We can thus define a best-response function valued in profunctors whose elements are trajectories witnessing deviations of strategies to 'better' strategies, and whose dynamical equilibria correspond precisely to the equilibria of the 'static' best response function.

On-going and Future Work
The structures sketched in this paper are merely first steps towards a categorical theory of cybernetics. In particular, since the first draft of this work was written, we have come to believe that the preliminary notions presented here of dynamical realisation, and by extension of open cybernetic system, are substantially less elegant than they could be. On-going work is focusing on this issue. We hope that a consequence of this refinement will be that the treatment of interacting cybernetic systems is simplified. In this new setting, we will also treat non-stationary systems in dynamical contexts and in continuous time, thereby supplying a general compositional treatment of (amongst other things) the 'free-energy' framework.

Finally, with respect to applications, we are interested in using these tools to realise game-theoretic games and to investigate the connections between repeated games and dynamical realisation. There are deep links with reinforcement learning to be explored, and we seek a setting for the study of nested and multi-agent ('ecological') systems.
References

[1] A. M. Bastos, W. M. Usrey, R. A. Adams, G. R. Mangun, P. Fries & K. J. Friston (2012): Canonical microcircuits for predictive coding. Neuron 76(4), pp. 695–711.
[2] Joe Bolt, Jules Hedges & Philipp Zahn (2019): Bayesian open games. Available at http://arxiv.org/abs/1910.03656v1.
[3] Christopher L. Buckley, Chang Sub Kim, Simon McGregor & Anil K. Seth (2017): The free energy principle for action and perception: A mathematical review. Journal of Mathematical Psychology 81, pp. 55–79, doi:10.1016/j.jmp.2017.09.004. Available at http://arxiv.org/abs/1705.09156v1.
[4] Jean Bénabou (2000): Distributors at work. Lecture notes written by Thomas Streicher.
[5] Kenta Cho & Bart Jacobs (2017): Disintegration and Bayesian Inversion via String Diagrams. Mathematical Structures in Computer Science 29 (2019), pp. 938–971, doi:10.1017/S0960129518000488. Available at http://arxiv.org/abs/1709.00322v3.
[6] Bryce Clarke, Derek Elkins, Jeremy Gibbons, Fosco Loregian, Bartosz Milewski, Emily Pillmore & Mario Román (2020): Profunctor optics, a categorical update. Available at http://arxiv.org/abs/2001.07488v1.
[7] Bob Coecke & Aleks Kissinger (2016): Categorical Quantum Mechanics II: Classical-Quantum Interaction. doi:10.1142/S0219749910006502. Available at http://arxiv.org/abs/1605.08617v1.
[8] Bob Coecke & Aleks Kissinger (2017): Categorical Quantum Mechanics I: Causal Quantum Processes. In Elaine Landry, editor: Categories for the Working Philosopher, chapter 12, Oxford University Press, pp. 286–328. Available at https://arxiv.org/abs/1510.05468v3.
[9] J. Nathan Foster, Michael B. Greenwald, Jonathan T. Moore, Benjamin C. Pierce & Alan Schmitt (2007): Combinators for bidirectional tree transformations. ACM Transactions on Programming Languages and Systems 29(3).
[10] Karl Friston (2010): The free-energy principle: a unified brain theory? Nature Reviews Neuroscience 11, pp. 127–138, doi:10.1038/nrn2787.
[11] Karl Friston, Francesco Rigoli, Dimitri Ognibene, Christoph Mathys, Thomas FitzGerald & Giovanni Pezzulo (2015): Active inference and epistemic value. Cognitive Neuroscience 6(4), pp. 187–214.
[12] Tobias Fritz (2019): A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Available at http://arxiv.org/abs/1908.07021v8.
[13] Neil Ghani, Jules Hedges, Viktor Winschel & Philipp Zahn (2016): Compositional game theory. Proceedings of Logic in Computer Science (LiCS) 2018, doi:10.1145/3209108.3209165. Available at http://arxiv.org/abs/1603.04641v3.
[14] Chris Heunen, Ohad Kammar, Sam Staton & Hongseok Yang (2017): A Convenient Category for Higher-Order Probability Theory. doi:10.1109/lics.2017.8005137.
[15] Diederik P. Kingma (2017): Variational Inference & Deep Learning. Available at https://hdl.handle.net/11245.1/8e55e07f-e4be-458f-a929-2f9bc2d169e8.
[16] Jeremias Knoblauch, Jack Jewson & Theodoros Damoulas (2019): Generalized Variational Inference. Available at http://arxiv.org/abs/1904.02063v4.
[17] Fosco Loregian (2015): This is the (co)end, my only (co)friend. Available at http://arxiv.org/abs/1501.02503v4.
[18] Paul-André Melliès (2006): Functorial Boxes in String Diagrams. In: Computer Science Logic, Springer Berlin Heidelberg, pp. 1–30, doi:10.1007/11874683_1.
[19] Joe Moeller & Christina Vasilakopoulou (2018): Monoidal Grothendieck Construction. Available at http://arxiv.org/abs/1809.00727v2.
[20] nLab authors (2020): Grothendieck construction. Available at http://ncatlab.org/nlab/show/Grothendieck+construction. Revision 62.
[21] Mitchell Riley (2018): Categories of Optics. Available at http://arxiv.org/abs/1809.00738v2.
[22] Mario Román (2020): Open Diagrams via Coend Calculus. Available at http://arxiv.org/abs/2004.04526v2.
[23] Mario Román (2020): Profunctor optics and traversals. Available at http://arxiv.org/abs/2001.08045v1.
[24] Patrick Schultz, David I. Spivak & Christina Vasilakopoulou (2019): Dynamical Systems and Sheaves. Applied Categorical Structures, pp. 1–57, doi:10.1007/s10485-019-09565-x. Available at http://arxiv.org/abs/1609.08086v4.
[25] Toby St. Clere Smithe (2020): Bayesian Updates Compose Optically. Available at http://arxiv.org/pdf/2006.01631v1.
[26] David I. Spivak (2019): Generalized Lens Categories via functors C^op → Cat. Available at http://arxiv.org/abs/1908.02202v2.
[27] Kai Ueltzhöffer (2018): Deep Active Inference. Biological Cybernetics. Available at http://arxiv.org/abs/1709.02341v5.