Geometry of Friston's active inference
Martin Biehl
[email protected]
Araya Inc., Tokyo

November 21, 2018
Abstract
We reconstruct Karl Friston's active inference and give a geometrical interpretation of it.
In Biehl et al. (2018) we have reconstructed the active inference approach as used in Friston et al. (2015). Here we present a radically shortened and more general account of active inference. We also present, for the first time, a geometrical interpretation of active inference. We use the same notation and model as in Biehl et al. (2018); there readers can also find a translation table to the notations used in Friston et al. (2015, 2016).

[Figure 1: Bayesian networks of the PA-loop. Figure not reproduced; it shows the chains of nodes $E_t$, $S_t$, $A_t$, $M_t$.]

The active inference agent we are describing in the following interacts with an environment according to the perception-action loop defined by the Bayesian network in Figure 1. There, we write $E_t$ for the environment state, $S_t$ for the sensor value, $A_t$ for the action, and $M_t$ for the memory state of the agent at time step $t$. We assume that for all time steps $t$ the transition dynamics of the environment $p(e_{t+1} \mid a_{t+1}, e_t)$ and the dynamics of the sensors $p(s_t \mid e_t)$ are time-homogeneous and fixed. We further assume that the agent has perfect memory, i.e. at all times $m_t := sa_{\prec t}$.

The difference to the setup of reinforcement learning for partially observable Markov decision problems is that there is no explicit reward signal. Instead, the agent uses a motivation functional $\mathcal{M}$ which evaluates the agent's current beliefs about the consequences of actions (see the definition below). A standard reward signal $R_t$ can easily be added as an additional sensor value by letting $S_t \to (S_t, R_t)$.

With the environment, sensor, and memory dynamics fixed, it remains to specify the action generation mechanism $p(a_t \mid m_t)$. We first describe this in two separate steps, inference and action selection, that can be performed one after the other. Then we show how active inference combines the two steps in one optimization procedure. On the way we also define motivation functionals.

Action generation in more detail: the agent employs a parameterized model in order to predict the consequences of its actions. At each time step it receives a new sensor value in response to an action and updates its model by conditioning on the new memory state. Additionally conditioning on the various possible future actions (or policies) results in a conditional probability distribution which we call the active posterior. The active posterior represents the agent's beliefs about the consequences (for future sensors, latent variables, and internal parameters) of its actions. Obtaining the active posteriors is referred to as the inference step. Subsequently, the agent constructs (using a softmax) a probability distribution over future actions by assigning high probability to those actions whose entries in the active posterior achieve high values when plugged into the motivation functional (e.g. expected future reward if there is an explicit one). This results in what we call the induced policy. Obtaining the latter is referred to as the action selection step. Afterwards the agent can simply sample from this induced policy to generate actions.
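To make the loop concrete, here is a minimal simulation sketch in Python. All state spaces, probability tables, and the random placeholder policy are illustrative assumptions, not taken from the paper; only the structure (time-homogeneous $p(e_{t+1} \mid a_{t+1}, e_t)$ and $p(s_t \mid e_t)$, perfect memory $m_t = sa_{\prec t}$, actions drawn from $p(a_t \mid m_t)$) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumption): 2 environment states, 2 sensor values, 2 actions.
N_E, N_S, N_A = 2, 2, 2

# Time-homogeneous environment dynamics p(e_{t+1} | a_{t+1}, e_t),
# indexed as P_E[a, e, e']; each row sums to one.
P_E = np.array([[[0.9, 0.1],
                 [0.2, 0.8]],
                [[0.5, 0.5],
                 [0.7, 0.3]]])

# Sensor dynamics p(s_t | e_t), indexed as P_S[e, s].
P_S = np.array([[0.8, 0.2],
                [0.1, 0.9]])

def policy(sensors, actions):
    """Placeholder for p(a_t | m_t) with m_t = sa_{<t}; the rest of the
    paper is about replacing this with inference + action selection."""
    return rng.integers(N_A)

e = rng.integers(N_E)      # environment state
sensors, actions = [], []  # perfect memory: all past sensor values and actions
for t in range(10):
    a = policy(sensors, actions)      # a_t ~ p(a_t | m_t)
    e = rng.choice(N_E, p=P_E[a, e])  # e_t ~ p(e_t | a_t, e_{t-1})
    s = rng.choice(N_S, p=P_S[e])     # s_t ~ p(s_t | e_t)
    actions.append(a)                 # memory update: m_{t+1} = sa_{<t+1}
    sensors.append(s)

print(list(zip(sensors, actions)))
```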
In active inference, both inference and action selection are turned into optimization problems and then combined in a multiobjective optimization. The inference step can be turned into an optimization using variational inference (see e.g. Bishop, 2011; Blei et al., 2017). Variational inference introduces a variational version of the active posterior. Since the variational active posterior generally differs from the original/true active posterior, it leads to a, generally different, variational induced policy. Action selection is then formulated as an optimization by introducing an additional (third) policy whose divergence from the variational induced policy is to be minimized. Active inference then optimizes a sum of the respective objective functions and afterwards the agent can sample from the third policy. With a bit of notational trickery this can be written in the form Friston uses, which looks similar to a variational free energy (or evidence lower bound).

Generative model

[Figure 2: Bayesian network of the generative model. Figure not reproduced; it shows hatted nodes $\hat{E}$, $\hat{S}$, $\hat{A}$ and three parameter nodes $\Theta$.]

The agent performs inference on the generative model given by the Bayesian network in Figure 2. The variables that model variables occurring outside of $p(a \mid m)$ in the perception-action loop (Figure 1) are denoted as hatted versions of their counterparts. To clearly distinguish the probabilities defined by the generative model from the true dynamics, we use the symbol $q$ instead of $p$. Here, $\theta_1, \theta_2, \theta_3$ are the parameters of the model, one for each of the three $\Theta$ nodes in Figure 2. To save space, write $\theta := (\theta_1, \theta_2, \theta_3)$.

The last modelled time step $\hat{T}$ can be chosen as $\hat{T} = T$ ($T$ is the final step of the PA-loop), but it is also possible to always set it to $\hat{T} = t + n$, in which case $n$ specifies a future time horizon from the current time step $t$.
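For concreteness, one factorization consistent with such a network — a sketch under our own assumption that $\theta_1$, $\theta_2$, $\theta_3$ parameterize the initial environment state, the environment transitions, and the sensor dynamics respectively, which the text does not spell out — would be:

$$q(\hat{s}_{0:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta \mid \hat{a}_{0:\hat{T}}) = q(\theta_1)\, q(\theta_2)\, q(\theta_3)\; q(\hat{e}_0 \mid \theta_1) \prod_{\tau=0}^{\hat{T}-1} q(\hat{e}_{\tau+1} \mid \hat{a}_{\tau+1}, \hat{e}_\tau, \theta_2) \prod_{\tau=0}^{\hat{T}} q(\hat{s}_\tau \mid \hat{e}_\tau, \theta_3).$$

Clamping the past variables to the observed experience and conditioning then yields the active posterior introduced next.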
Active posterior

At time $t$ the agent plugs its experience $sa_{\prec t}$ into its generative model by setting $\hat{A}_\tau = a_\tau$ and $\hat{S}_\tau = s_\tau$ for $\tau < t$, and conditioning on these. The consequences of future actions can be obtained by additionally conditioning on each possible future action sequence $\hat{a}_{t:\hat{T}}$. This leads to the conditional probability distribution that we call the active posterior (the experience $sa_{\prec t}$ is considered fixed):

$$q(\hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta \mid \hat{a}_{t:\hat{T}}, sa_{\prec t}) \quad (1)$$
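For a tiny discrete instance the active posterior (1) can be computed exactly by enumerating the joint and conditioning. The sketch below uses one observed past step and one future step, holds $\theta$ fixed (so the integral over parameters drops out), and uses made-up probability tables; it only illustrates the conditioning pattern behind (1).

```python
import numpy as np
import itertools

# Made-up two-state model with theta held fixed.
q_e0 = np.array([0.6, 0.4])            # q(e0): prior over initial env state
q_s_given_e = np.array([[0.8, 0.2],    # q(s | e), indexed [e, s]
                        [0.1, 0.9]])
q_e_given_ae = np.array([[[0.9, 0.1],  # q(e' | a, e), indexed [a, e, e']
                          [0.2, 0.8]],
                         [[0.5, 0.5],
                          [0.7, 0.3]]])

s0 = 0   # observed past sensor value, i.e. the experience sa_{<t} with t = 1
a1 = 1   # a candidate future action \hat{a}_1 we condition on

# Enumerate the joint q(s0, s1, e0, e1 | a1), then condition on s0.
joint = {}
for s1, e0, e1 in itertools.product(range(2), range(2), range(2)):
    joint[(s1, e0, e1)] = (q_e0[e0] * q_s_given_e[e0, s0]  # past (s0 observed)
                           * q_e_given_ae[a1, e0, e1]      # future transition
                           * q_s_given_e[e1, s1])          # future sensor
evidence = sum(joint.values())                             # q(s0 | a1)

# Active posterior q(s1, e0, e1 | a1, sa_{<t}) as in (1).
active_posterior = {k: v / evidence for k, v in joint.items()}
print(active_posterior)
```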
Variational active posterior

In active inference the active posterior is obtained via variational inference (see e.g. Blei et al., 2017). We write $r$ (instead of $p$ or $q$) to indicate variational probability distributions and $\phi$ for the entire set of variational parameters. Note that $\phi$ contains parameters for each of the future action sequences $\hat{a}_{t:\hat{T}}$, i.e. $\phi = (\phi_{\hat{a}_{t:\hat{T}}})_{\hat{a}_{t:\hat{T}} \in \mathcal{A}^{\hat{T}-t+1}}$. The variational active posterior is of the form $r(\hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta \mid \hat{a}_{t:\hat{T}}, \phi)$. To construct the variational active posterior we cycle through the possible future action sequences $\hat{a}_{t:\hat{T}}$ and compute each entry. For a fixed $\hat{a}_{t:\hat{T}}$ the variational free energy, also known as the (negative) evidence lower bound (ELBO) in the variational inference literature, is defined as:

$$F[\hat{a}_{t:\hat{T}}, \phi, sa_{\prec t}] := \sum_{\hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}} \int r(\hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta \mid \hat{a}_{t:\hat{T}}, \phi) \log \frac{r(\hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta \mid \hat{a}_{t:\hat{T}}, \phi)}{q(s_{\prec t}, \hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta \mid \hat{a}_{t:\hat{T}}, a_{\prec t})}\, d\theta \quad (2)$$
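The defining property of (2) is that $F[r] = \mathrm{KL}(r \,\|\, \text{active posterior}) - \log q(s_{\prec t} \mid \hat{a}_{t:\hat{T}}, a_{\prec t})$, so $F$ is minimized exactly when $r$ equals the true active posterior. The self-contained sketch below (same made-up discrete model as before, $\theta$ again fixed) checks this numerically.

```python
import numpy as np
import itertools

# Same made-up model as in the active-posterior snippet.
q_e0 = np.array([0.6, 0.4])
q_s_given_e = np.array([[0.8, 0.2], [0.1, 0.9]])
q_e_given_ae = np.array([[[0.9, 0.1], [0.2, 0.8]],
                         [[0.5, 0.5], [0.7, 0.3]]])
s0, a1 = 0, 1

# Joint q(s0, s1, e0, e1 | a1) as a flat table over (s1, e0, e1).
states = list(itertools.product(range(2), range(2), range(2)))
joint = np.array([q_e0[e0] * q_s_given_e[e0, s0]
                  * q_e_given_ae[a1, e0, e1] * q_s_given_e[e1, s1]
                  for s1, e0, e1 in states])
evidence = joint.sum()            # q(s0 | a1)
posterior = joint / evidence      # true active posterior

def free_energy(r):
    """Variational free energy (2) for a variational table r over
    (s1, e0, e1); the theta-integral is absent since theta is fixed."""
    return float(np.sum(r * np.log(r / joint)))

# At the true posterior, F equals -log evidence ...
print(free_energy(posterior), -np.log(evidence))
# ... and any other r, e.g. the uniform one, gives a larger value,
# which is what the minimization (3) below exploits.
print(free_energy(np.full(len(states), 1.0 / len(states))))
```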
Then variational inference amounts to solving, for each $\hat{a}_{t:\hat{T}}$, the optimization problem:

$$\phi^*_{\hat{a}_{t:\hat{T}}, sa_{\prec t}} := \arg\min_\phi F[\hat{a}_{t:\hat{T}}, \phi, sa_{\prec t}]. \quad (3)$$

Motivation functionals and induced policies

Let $\Delta_{AP}$ be the space of active posteriors. Then a motivation functional is a map $\mathcal{M} : \Delta_{AP} \times \mathcal{A}^{\hat{T}-t+1} \to \mathbb{R}$ taking an active posterior $d(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,) \in \Delta_{AP}$ and a sequence of future actions $\hat{a}_{t:\hat{T}}$ to a real value $\mathcal{M}(d(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,), \hat{a}_{t:\hat{T}}) \in \mathbb{R}$. An example of a motivation functional is the expected value of the sum over future rewards (if one of the sensor values is defined as the reward). Other possibilities can be found in Biehl et al. (2018). Now define, for some $\gamma \in \mathbb{R}^+_0$ and some motivation functional $\mathcal{M}$, a softmax operator $\sigma^{\mathcal{M}}_\gamma$ mapping active posteriors $d(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,)$ to probability distributions over future action sequences:

$$\sigma^{\mathcal{M}}_\gamma[d(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,)](\hat{a}_{t:\hat{T}}) := \frac{1}{Z(\gamma)}\, e^{\gamma\, \mathcal{M}(d(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,),\, \hat{a}_{t:\hat{T}})}. \quad (4)$$

Then we call

$$q(\hat{a}_{t:\hat{T}} \mid sa_{\prec t}) := \sigma^{\mathcal{M}}_\gamma[q(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,, sa_{\prec t})](\hat{a}_{t:\hat{T}}) \quad (5)$$

the induced policy of active posterior $q(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,, sa_{\prec t})$ and $\mathcal{M}$, and we call

$$r(\hat{a}_{t:\hat{T}} \mid \phi) := \sigma^{\mathcal{M}}_\gamma[r(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,, \phi)](\hat{a}_{t:\hat{T}}) \quad (6)$$

the induced variational policy of variational active posterior $r(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,, \phi)$ and $\mathcal{M}$.

Note that if $\phi$ is not the optimized parameter $\phi^*_{sa_{\prec t}}$ then the induced variational policy cannot be expected to lead to actions that actually reflect the preferences encoded in $\mathcal{M}$. On the other hand, if the true active posterior or the optimized variational active posterior is used and $\gamma \to \infty$, the induced policy should correspond to the best guess for an agent with the given generative model, variational distributions, and motivation $\mathcal{M}$.

Active inference

Now introduce an additional variational policy $s(\hat{a}_{t:\hat{T}} \mid \rho)$, parameterized by $\rho$, in order to approximate the induced variational policy $r(\hat{a}_{t:\hat{T}} \mid \phi)$ of the variational active posterior (action selection). To make this happen we can minimize the Kullback-Leibler divergence between the two.

[Figure 3: On top the space of probability distributions over future action sequences $\Delta_{\mathcal{A}^{\hat{T}-t+1}}$, containing $s(\hat{a}_{t:\hat{T}} \mid \rho)$, $r(\hat{a}_{t:\hat{T}} \mid \phi)$, and $q(\hat{a}_{t:\hat{T}} \mid sa_{\prec t})$; on the bottom the space of active posteriors $\Delta_{AP}$, containing $q(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,, sa_{\prec t})$ and $r(\,\cdot\,,\cdot\,,\cdot \mid \cdot\,, \phi)$, mapped to the former by the softmax operators $\sigma^{\mathcal{M}}_\gamma$. Figure not reproduced.]

To do inference and action selection at once we can then minimize the sum of the variational free energy (Equation (2)) and the Kullback-Leibler divergence w.r.t. $\rho, \phi$ (for an illustration of the situation see Figure 3):

$$\sum_{\hat{a}_{t:\hat{T}}} s(\hat{a}_{t:\hat{T}} \mid \rho)\, F[\hat{a}_{t:\hat{T}}, \phi, sa_{\prec t}] \;+\; \mathrm{KL}[s(\hat{A}_{t:\hat{T}} \mid \rho) \,\|\, r(\hat{A}_{t:\hat{T}} \mid \phi)] \quad (7)$$

If $\phi$ and $\rho$ are optimized, the agent can then sample actions from $s(\hat{a}_{t:\hat{T}} \mid \rho)$. Now if we change notation and let $s(\hat{a}_{t:\hat{T}} \mid \rho) \to r(\hat{a}_{t:\hat{T}} \mid \rho)$ and $r(\hat{a}_{t:\hat{T}} \mid \phi) \to q(\hat{a}_{t:\hat{T}} \mid \phi)$, then the above can be rewritten as:

$$\sum_{\hat{a}_{t:\hat{T}}, \hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}} \int r(\hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta, \hat{a}_{t:\hat{T}} \mid \rho, \phi) \log \frac{r(\hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta, \hat{a}_{t:\hat{T}} \mid \rho, \phi)}{q(s_{\prec t}, \hat{s}_{t:\hat{T}}, \hat{e}_{0:\hat{T}}, \theta, \hat{a}_{t:\hat{T}} \mid \phi, a_{\prec t})}\, d\theta. \quad (8)$$

This looks similar to a variational free energy or evidence lower bound and is the form of the free energy found in Friston et al. (2016). What distinguishes it from a true variational free energy is the occurrence of the parameter $\phi$ in both numerator and denominator.
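A small numerical illustration of the action-selection side: given made-up free energies $F[\hat{a}, \phi, sa_{\prec t}]$ and motivation values $\mathcal{M}(r, \hat{a})$ for three action sequences, the sketch below forms the induced variational policy (6) via the softmax (4) and evaluates the combined objective (7). For fixed $\phi$, the minimizer over $s$ has the closed form $s(\hat{a}) \propto r(\hat{a} \mid \phi)\, e^{-F[\hat{a}, \phi, sa_{\prec t}]}$, which the sketch checks against a uniform $s$.

```python
import numpy as np

# Made-up values for three future action sequences.
F_vals = np.array([2.3, 1.1, 4.0])   # free energies F[a_hat, phi, sa_<t]
M_vals = np.array([0.5, 1.5, -0.2])  # motivation values M(r(.|phi), a_hat)
gamma = 2.0

def softmax(x):
    z = np.exp(x - x.max())          # subtract max for numerical stability
    return z / z.sum()

# Induced variational policy (6): softmax (4) of gamma times the
# motivation values of the variational active posterior.
r_induced = softmax(gamma * M_vals)

def combined_objective(s):
    """Objective (7): expected free energy under s plus KL(s || r_induced)."""
    return float(s @ F_vals + s @ np.log(s / r_induced))

# For fixed phi the optimal s is proportional to r_induced * exp(-F).
s_opt = r_induced * np.exp(-F_vals)
s_opt /= s_opt.sum()

s_uniform = np.full(3, 1.0 / 3.0)
print(combined_objective(s_opt))      # smaller ...
print(combined_objective(s_uniform))  # ... than for any other s, e.g. uniform
```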
Acknowledgement

I want to thank Manuel Baltieri and Thomas Parr for helpful discussions as well as Christian Guckelsberger, Daniel Polani, Christoph Salge, and Simón C. Smith for feedback on this article.
References
Biehl, M., Guckelsberger, C., Salge, C., Smith, S. C., and Polani, D. (2018). Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop. Frontiers in Neurorobotics, 12.

Bishop, C. M. (2011). Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York.

Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518):859–877.

Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., and Pezzulo, G. (2016). Active Inference: A Process Theory. Neural Computation, 29(1):1–49.

Friston, K., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T., and Pezzulo, G. (2015). Active Inference and Epistemic Value. Cognitive Neuroscience, 6(4):187–214.