A Bayesian perspective on classical control
Manuel Baltieri
Laboratory for Neural Computation and Adaptation, RIKEN Centre for Brain Science, Wako, Saitama, [email protected]
MB is a JSPS International Research Fellow supported by a JSPS Grant-in-Aid for Scientific Research (No. 19F19809).
Abstract—The connections between optimal control and Bayesian inference have long been recognised, with the field of stochastic (optimal) control combining these frameworks for the solution of partially observable control problems. In particular, for the linear case with quadratic functions and Gaussian noise, stochastic control has shown remarkable results in different fields, including robotics, reinforcement learning and neuroscience, especially thanks to the established duality of estimation and control processes. Following this idea, we recently introduced a formulation of PID control, one of the most popular methods from classical control, based on active inference, a theory with roots in variational Bayesian methods and applications in the biological and neural sciences. In this work, we highlight the advantages of our previous formulation and introduce new and more general ways to tackle some existing problems in current controller design procedures. In particular, we consider 1) a gradient-based tuning rule for the parameters (or gains) of a PID controller, 2) an implementation of multiple degrees of freedom for independent responses to different types of signals (e.g., two-degree-of-freedom PID), and 3) a novel time-domain formalisation of the performance-robustness trade-off in terms of tunable constraints (i.e., priors in a Bayesian model) of a single cost functional, variational free energy.
Index Terms—PID control, active inference, Bayesian inference, optimal control, optimal tuning, performance-robustness trade-off
I. INTRODUCTION
In the last few decades, the importance of probabilistic approaches to optimal control theory has been highlighted by different applications of Bayesian methods to problems of control. In his pioneering work, Bellman introduced Markov decision processes [1] as part of what is now known as stochastic optimal control [2]–[6]. This formulation captured the intrinsic probabilistic nature of problems of optimal control and decision making, with state transitions, outcomes and actions/decisions that cannot always be easily described in purely deterministic terms. Bellman's approach extended his own work on the dynamic programming method for (deterministic) optimal control, defining the Bellman equation, its uses and limitations, including the idea of the curse of dimensionality [7]. Shortly after, Kalman introduced the notions of observability and controllability of a system [8], with the former expressing the degree to which states can be estimated from noisy observations, and the latter representing the degree of control over a system when different manipulations are applied. Kalman also noticed that his filter was dual to the linear quadratic regulator (LQR), a now well known method in (optimal) control theory that he also established [9], showing how both solutions require solving Riccati equations, forward in time for filtering (error covariance matrix) and backward in time for control (Hessian of the cost-to-go function). In the following years, several results improved the treatment of stochastic optimal control problems, including for instance the separation principle [4], [10] (also known in econometrics as the certainty equivalence property [11], [12], but see [4] for a possible distinction) and its applications to the treatment of regulation in the presence of uncertainty, i.e., for (linear) partially observable control problems. Due to its analytical tractability and its combination of estimation and control algorithms, the linear quadratic framework (i.e., stochastic optimal control for linear state-space models with Gaussian white noise and quadratic cost functions) has since then become a standard approach in different fields, including not only control theory and engineering [3], [4], but also robotics [13] and neuroscience [14], [15].

In recent years, the results based on the notion of duality in the linear case have been extended to (some classes of) nonlinear systems [16]–[18], highlighting further connections between control and estimation. Notably, these extended dualities often rely on more efficient variational approximations commonly used in problems of inference. For instance, relevant advances in stochastic optimal control and reinforcement learning have been driven by the use of methods commonly adopted to approximate intractable problems of Bayesian inference, e.g., variational Bayes. These methods have been shown to outperform standard dynamic programming and reinforcement learning algorithms for the control of different classes of problems [6], [16], [19], [20]. Building on these ideas, a similar approach has been proposed and adopted in neuroscience in an attempt to characterise brain function and sensorimotor control under a unifying probabilistic framework: active inference. While a full treatment of this framework is beyond the scope of the present work (for some technical reviews see, for instance, [21], [22]), we highlight how active inference combines methods from machine learning (variational Bayes), control theory (stochastic control) and statistical inference (hierarchical and empirical Bayes) to form a theory that includes several existing results from different fields as special cases, from predictive coding, to the infomax principle, to statistical models of learning, to risk-sensitive and KL-control [22]–[27].

Most of these results rely, at the moment, on the application of (approximate) Bayesian approaches to optimal control, with almost no mention of classical control methods. While classical methods can be seen as a special case of optimal control, the possible advantages specific to Bayesian formulations of classical algorithms such as Proportional-Integral-Derivative (PID) control remain largely unexplored. In this work we look at classical controllers from the perspective of approximate Bayesian inference, discussing the implications of variational Bayesian methods for the future of, in particular, PID control [28]. This perspective has previously been adopted in, for instance, [29], where a new gradient-based gain tuning rule was derived in closed form for optimal regulation near the set-point/reference goal.
In the next sections we present three cases in support of a new (Bayesian) framework to design and study classical control methods, one that ought to be seen as complementary to existing ones, e.g., optimal control and frequency-domain analysis. We will first, briefly, 1) recapitulate the previously derived results for gain tuning, introducing new connections to path integral control and estimation in the presence of biases, then 2) present a more formal and in-depth treatment of the connections between PID controllers with two degrees of freedom and Bayesian inference schemes, and finally 3) focus on the open challenge of framing different competing constraints of the performance-robustness trade-off in PID control, here defined in terms of priors and hyperpriors on a probabilistic generative model.

II. CASE 1: A BAYESIAN DERIVATION OF PID GAINS AND THEIR OPTIMISATION
PID controllers are the most popular choice for the regulation of SISO systems in different areas of engineering [30]. Their popularity is mainly due to their simplicity and relatively low number of tunable parameters. However, despite only including a few key parameters, or gains, their tuning (or optimisation) remains largely an open challenge [28], [31]. Existing tuning methods are often limited to specific cases or applications, relying on (ad-hoc) analytical rules, simple heuristics, frequency domain analysis, optimisation (including the use of artificial neural networks) or a combination of the above (for a survey, see [31]), and hardly generalise across different classes of problems. Here we report a more general method that can be explicitly derived by taking a different, Bayesian perspective on control problems.

Previous work relating classical control to optimal observers, and thus indirectly to Bayesian methods, showed that the integral component of PID controllers corresponds to a process of estimation of unknown (but linear/constant or step) perturbations, equivalent to a Kalman-Bucy filter with augmented state for the inference of unknown inputs (or biases) [32], [33]. Using the same approach, this connection was then generalised to higher order polynomial disturbances, equivalent to controllers including further integration terms [34], i.e., corresponding to PIID, PIIID, etc. controllers. In [29], we derived a fully probabilistic version of PID control, highlighting in particular some of the relationships between integral control and an emerging framework in computational and cognitive neuroscience, active inference [25], [26]. Using active inference, we thus defined a more explicit generative model to describe an underlying stochastic process producing PID control as a gradient descent on a cost functional, variational free energy. While these two approaches, [34] and [29], share a number of features, they also present some core technical differences. Our proposal in fact includes:
• a more direct interpretation of the control matrix R, commonly used as a weight for the cost of control in the value (or cost-to-go) function [3], [4],
• a gradient-based algorithm to optimise R, and
• a generalisation to (some classes of) nonlinear problems.
As shown in [32], [34], the control matrix R is particularly relevant for the computation of the gains of PID controllers, here treated as part of the feedback matrix of a linear quadratic controller. In active inference, such gains correspond to specific hyperparametrisations of the linear state space (generative) model used to approximate the dynamics of the system to control, i.e., the (expected) precision, or inverse covariance, of the observation noise [29]. This result is closely related to Kalman's duality of inference and control [8], [18], [35], highlighting the mathematical correspondence between the processes of stochastic estimation and deterministic control. At the same time, the active inference formulation extends this duality beyond simply noting mathematical similarities, in order to include an account of the dual role of action in the context of exploration/exploitation problems [36], [37]. Furthermore, given the role of R as (expected) precision in the generative model, the gain parameters of PID controllers can be optimised via a gradient descent on the same cost functional, i.e., variational free energy [29], following a second order scheme introduced in [38] that, under some assumptions, also holds for some classes of nonlinear problems [24], [38].
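As an illustration of this idea, the following Python sketch treats a single gain as an expected precision and updates it by gradient descent on a simplified, Laplace-style free energy term. The scalar model, learning rate and noise statistics are hypothetical choices made only for this example, not the scheme of [29] or [38].

```python
import numpy as np

# Toy sketch: a gain is read as an expected precision pi and tuned by gradient
# descent on a single Laplace-style term F(pi) = 0.5 * (pi * eps**2 - log(pi)),
# whose gradient balances observed squared prediction errors against the
# current uncertainty estimate.

def update_precision(pi, eps, lr=0.01):
    """One gradient step on a precision-weighted squared prediction error."""
    dF_dpi = 0.5 * (eps ** 2 - 1.0 / pi)
    return max(pi - lr * dF_dpi, 1e-6)  # keep the precision positive

# Usage: the expected precision (e.g., an integral gain k_i) adapts towards the
# inverse variance of the prediction errors it is exposed to.
rng = np.random.default_rng(0)
pi = 1.0                                    # initial expected precision
true_std = 0.5                              # "true" noise level (precision = 4)
for _ in range(5000):
    eps = true_std * rng.standard_normal()  # stand-in prediction error sample
    pi = update_precision(pi, eps)
print(f"estimated precision ~ {pi:.2f} (true precision = {1 / true_std**2:.2f})")
```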
III. CASE 2: PID CONTROL WITH TWO DEGREES OF FREEDOM IN ACTIVE INFERENCE
In many applications of PID control, it is often desirable to build regulators that can respond to external disturbances while avoiding large fluctuations (e.g., overshooting) due to changes of the target of the regulation process. In standard PID control, these requirements are shown to be conflicting [39], [40], thus leading, in the most general case, to a multi-objective optimisation problem whose solutions lie on a Pareto front defined by
• changes in the control target (i.e., set-point response), and
• changes in the amplitude of a step disturbance (i.e., disturbance response).
To overcome the limitations induced by this trade-off, previous work (see [28], [40] and references therein) introduced the idea of controllers with two degrees of freedom, or 2DOF PID. Multiple degrees of freedom, obtained by augmenting controllers with multiple internal loops of PI or PID control (see also equivalent examples such as PI-PD control [28]), then ensure that different constraints can be treated independently, using parameters from different sub-loops to encode separate desired behaviours [28], [40].

Our probabilistic derivation of PID control via a variational approximation of Bayesian inference showcases a clear and direct interpretation of the presence of two degrees of freedom, here derived using rather general arguments. Unlike previous proposals, one need not augment a controller with an extra feedforward component that can separate the effects of a compensator for disturbances or set-point changes [40]. In active inference, the existence of two degrees of freedom is a simple consequence of the probabilistic (Bayesian) description of the generative model used to derive a controller [29]. This becomes more obvious after looking at the variational free energy (see equation (13) in [29]), here reported as

F ≈ [ μ_π̃z ( ỹ − g(μ̃x, μ̃v) ) + μ_π̃w ( μ̃′x − f(μ̃x, μ̃v) ) ]    (1)

where y, μ_x, μ_v are observations (or measurements), expected hidden states (the estimate of the state of the system to regulate) and inputs (the set-point), respectively. Hyperparameters μ_π̃z, μ_π̃w are the expected precisions on observation and system noise, and f(·), g(·) are state transition and observation functions. The tilde simply highlights a notation used to group different derivatives, or rather embedding orders, of a variable, e.g., ỹ = {y, y′, y′′}; see [29] for more details. Some terms in the free energy functional are hereby dropped for clarity: for the treatment of an extra set of terms important for the optimisation of PID gains, see [29]; for a more complete discussion of other terms which are constant during the optimisation phase, see [21], [23], [38]. The simplified (i.e., under Gaussian assumptions) variational free energy functional in (1) contains two sets of prediction errors, essentially instantiating two degrees of freedom for the controller. Notice that unlike equation (13) in [29], here we explicitly replaced π̃z, π̃w with μ_π̃z, μ_π̃w from the beginning, to highlight the fact that the hyperparameters μ_π̃z, μ_π̃w are only estimates of some "true" hyperparameters π̃z, π̃w. This follows from a full Bayesian treatment of the control problem, considering all variables to be random variables [41] (cf. traditional point estimates in frequentist frameworks for statistical learning).
In our case, to simplify the mathematical treatment, we treat them as Gaussian random variables with means μ_π̃z, μ_π̃w (and covariances to be discussed in the next section). Importantly, these expectations are updated on a slower time scale [29], following schemes found in [23] and in particular [38], emphasising how parameters and hyperparameters of a generative model ought to be considered as fixed quantities over a certain (i.e., long) time scale.

The two sets of prediction errors, μ_π̃z ( ỹ − g(μ̃x, μ̃v) ) and μ_π̃w ( μ̃′x − f(μ̃x, μ̃v) ), weighted by hyperparameters μ_π̃z and μ_π̃w, represent likelihood and prior of a Bayesian update scheme formulated using generative models under Gaussian assumptions, the Laplace [42] and the variational Gaussian [43] approximations (to clarify their role see the discussion in Chapter 3 of [27]). The update equations minimising these prediction errors [29] (also called recognition dynamics [44]) are similar to the update and prediction steps of standard algorithms from estimation theory such as Kalman-Bucy filters [45], and equivalent to feedback and feedforward loops in 2DOF PID controllers [40]. In this set up, PID control with a single degree of freedom can be derived as the limit case for fully observable states, i.e., ỹ = x̃ (cf. state feedback methods in [30]). The independence of set-point and disturbance responses crucial for 2DOF PID controllers then corresponds, in this framework, to a generative model having system and measurement noise independent of one another, a standard assumption for linear state space models.
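To make the two degrees of freedom concrete, the sketch below evaluates the two precision-weighted prediction errors of (1) for a hypothetical scalar linear model (written with squared errors, as is usual under Gaussian/Laplace assumptions) and performs a single gradient step of the ensuing recognition dynamics. The forms of f and g, the step size and all numerical values are illustrative assumptions, not the scheme of [29].

```python
# Illustrative scalar generative model (an assumption made only for this sketch):
#   observation:  y  = x + z,            g(x, v) = x
#   dynamics:     x' = -a*(x - v) + w,   f(x, v) = -a*(x - v)

def prediction_errors(y, mu_x, dmu_x, mu_v, a=1.0):
    eps_z = y - mu_x                   # sensory error,  ~ (y - g(mu_x, mu_v))
    eps_w = dmu_x + a * (mu_x - mu_v)  # dynamics error, ~ (mu_x' - f(mu_x, mu_v))
    return eps_z, eps_w

def free_energy(y, mu_x, dmu_x, mu_v, mu_pi_z, mu_pi_w, a=1.0):
    """Two independently weighted (squared) prediction errors, cf. (1)."""
    eps_z, eps_w = prediction_errors(y, mu_x, dmu_x, mu_v, a)
    return mu_pi_z * eps_z ** 2 + mu_pi_w * eps_w ** 2

def recognition_step(y, mu_x, dmu_x, mu_v, mu_pi_z, mu_pi_w, a=1.0, dt=0.01):
    """One gradient step on the state estimate; each error has its own weight."""
    eps_z, eps_w = prediction_errors(y, mu_x, dmu_x, mu_v, a)
    dF_dmu_x = -2.0 * mu_pi_z * eps_z + 2.0 * a * mu_pi_w * eps_w
    return mu_x + dt * (dmu_x - dF_dmu_x)  # descent in a moving frame (cf. [23], [38])

# Usage: the same data are weighed differently depending on the two precisions.
print(free_energy(y=1.2, mu_x=1.0, dmu_x=0.0, mu_v=0.0, mu_pi_z=4.0, mu_pi_w=0.5))
print(recognition_step(y=1.2, mu_x=1.0, dmu_x=0.0, mu_v=0.0, mu_pi_z=4.0, mu_pi_w=0.5))
```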
IV. CASE 3: THE PERFORMANCE-ROBUSTNESS TRADE-OFF FOR PID CONTROLLERS UNDER ACTIVE INFERENCE
The presence of conflicting criteria for the design of PID controllers is a well known issue in the control theory literature [46], as partially highlighted in the previous section. This conflict is often referred to as the performance-robustness trade-off [28], [47], [48]. Controllers are usually designed to optimise some given performance criteria while, at the same time, attempting to maintain a certain level of robustness in the face of uncertainty and unexpected conditions during the regulation process. The performance of a controller is normally assessed using one or more of the following criteria [28], [47]:
• load disturbance response, or how a controller reacts to changes in external inputs, e.g., a step input,
• set-point response, or how a controller responds to different set-points over time,
• measurement noise response, or how noise on observations impacts the regulation process,
while robustness is mainly evaluated based on:
• robustness to model uncertainty, or how uncertainty on plant/environment dynamics affects the controller.
The goal of a general methodology for the design and tuning of PID controllers is to bring together these (and possibly more) criteria into a formal, unified and tractable framework that can be applied to a large class of compensation problems. An example in this direction is presented in [49] (see also [50]–[52] for other partial attempts). This methodology is based on the maximisation of the integral gain (equivalent, near the reference point, to the minimisation of the integral of the error from the set-point [39]), subject to constraints derived from a frequency domain analysis related to the Nyquist stability criterion applied to the controlled system [49]. Here, we propose our Bayesian formulation as an alternative (and in many cases complementary) framework for the design of PID controllers, one that leverages the straightforward interpretation of the performance-robustness trade-off in terms of uncertainty parameters (i.e., hyperparameters, precisions or inverse covariances) in the variational free energy [29]. To highlight its potential, we discuss the four standard criteria listed above as part of the performance-robustness trade-off, to address what can be gained using a Bayesian perspective.

A. Load disturbance response
A classic design principle for PID controllers is based on the response of a controller to perturbations that drive a process away from the target value [39]. Random, zero-mean disturbances are commonly modelled as white Gaussian variables, and the parameters of the controller are simply tuned to reject such noise. Integral control then guarantees an appropriate response to step disturbances, equivalent to non-zero-mean noise (or to a bias term [33]), by accumulating and compensating for the ensuing steady-state error [32], [39], [53], [54]. The load disturbance response is usually expressed in terms of a minimisation of the Integral Absolute Error (IAE) between the state of the system to regulate and its target:
IAE = ∫₀^∞ |e(t)| dt    (2)

or approximated by the Integral Error (IE) for non-oscillating, or oscillating but well-damped, systems [39]:

IE = ∫₀^∞ e(t) dt    (3)

The IE criterion is especially relevant because it gives a straightforward intuition of the role of the integral gain since, under a few simplifying assumptions (including a system's initial state close to the target value), the IE is equal to the inverse of k_i as t → ∞ [39]. This implies that for large (theoretically, infinite) integral gains, the IE is minimised. While useful for its straightforward interpretation of this free parameter, practical and physical limitations often restrict the maximisation of the integral gain.
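As a concrete illustration, the IAE and IE of a sampled error trajectory can be approximated numerically as follows; the error signal and the discretisation below are made up purely to show the two criteria side by side.

```python
import numpy as np

# Discrete-time approximations of the IAE (2) and IE (3) criteria for a sampled
# error signal e(t); the signal below is a hypothetical, well-damped response.

def iae(e, dt):
    """Integral Absolute Error: integral of |e(t)|, approximated by a Riemann sum."""
    return np.sum(np.abs(e)) * dt

def ie(e, dt):
    """Integral Error: integral of e(t); a good proxy for the IAE only when the
    response is non-oscillating or well damped (no sign changes cancelling out)."""
    return np.sum(e) * dt

dt = 0.01
t = np.arange(0.0, 10.0, dt)
e = np.exp(-t)                 # hypothetical error decaying after a step disturbance
print(iae(e, dt), ie(e, dt))   # both ~ 1.0 for this monotone error signal
```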
Our formulation builds on previous work showing how the use of integral control is optimal for unknown step perturbations applied to a system [32], [54]. In statistical terms, the presence of such disturbances can be formally seen as a bias term in an estimation process [55], showing how rejecting (step) perturbations is equivalent to estimating biases [33]. In active inference we can extend this (exact) result for linear systems and disturbances to nonlinear cases (not limited to polynomial perturbations as in [34]), where a more general (but often only approximate, or limited to special classes of nonlinearities) duality of estimation and control is obtained using variational and path integral formulations [16], [18], or via probability integral transforms in the form of hierarchical generative models [24].

Furthermore, in our (Bayesian) formulation we gain a second and arguably deeper intuition on the role of the integral gain, which is now explicitly represented as one of the expected precisions (or inverse covariances) of observations ỹ, i.e., μ_πz, see [29]. This prescribes a simple and alternative way of understanding why the maximisation of k_i is usually a good heuristic for regulation problems where PID control is used [28]: maximising k_i is in fact equivalent to minimising uncertainty on measurements y, by maximising (minimising) the expected precision μ_πz (variance μ_σz) of the measurements of the system to regulate. At the same time, this can also explain some of the limitations of this heuristic, discussed in the frequency domain for instance in [49]. The maximisation of k_i, without any constraints, corresponds to the minimisation of the expected measurement variance μ_σz, such that

μ_σz → 0 as t → ∞.    (4)

In practice, however, one should always consider a certain level of intrinsic, i.e., aleatoric, uncertainty whose variance is fundamentally irreducible. Even an optimal controller cannot overcome the limited sensitivity of a sensor (here represented by the "real" σ_z, as opposed to its estimate μ_σz): bringing μ_σz down to 0 is thus not possible if σ_z > 0. In other proposals [49], the same aleatoric uncertainty σ_z is effectively approximated with a measure that captures the levels of controllability of a system through the definition of appropriate sensitivity functions in the frequency domain.

Our Bayesian implementation also extends the intuition behind the integral gain as a precision of observations to the other two gains, k_p and k_d. In our formulation, these gains become in fact estimated precisions of higher embedding orders of the observations, y′, y′′, often also called generalised coordinates of motion [23], [38]. These embedding orders essentially represent a Taylor expansion (in time) of continuous random variables defined according to a Stratonovich (rather than Itô) interpretation, equivalent to non-Markovian (semi-Markovian, or Markovian of order n) stochastic processes [24], [29], [45]. In practice, for measurements taken at a high enough frequency, and with controllers having a short enough intrinsic time scale to regulate such high frequency measurements (i.e., a time scale approaching the underlying continuous models of the systems to regulate), observation noise should be treated as coloured, rather than white as in standard delta-autocorrelated noise. Under these assumptions, the implementation of PID control and its extensions (e.g., multiple I and D terms) becomes simply a linear approximation of a measured non-Markovian trajectory. Perhaps more intuitively, we can see the expected precisions μ_π̃z = {μ_πz, μ_πz′, μ_πz′′} as simultaneously 1) representing the precision of a trajectory in the state space (rather than the precision of a point) and 2) regulating the convergence rate of measurements to a set-trajectory (rather than a set-point), specifying how quickly a controller ought to respond to a sudden change in a set of observations and their higher orders of motion.

B. Set-point response
Following the load disturbance rejection property, a second performance criterion used for the design of PID controllers is their set-point response, i.e., how controllers respond to variations in the set value used as a target to regulate a system. Naively, this could be seen as closely related to load disturbances: rather than changes in the measurement, we now have changes in the target value, both of them used to define some error term e. In practice, however, it is desirable to decouple these two problems, creating a controller with different sensitivities to load disturbances or set-point updates whenever necessary [39]. This requires a controller with two degrees of freedom, as discussed in more detail in section III, which is an inherent feature of the active inference formulation: here expectations of hidden states μ̃x are updated using a (Bayesian) scheme that balances (via a set of independent expected precisions, or weights) prediction errors on both
• observations, μ_π̃z ( ỹ − g(μ̃x, μ̃v) ), where load disturbances can appear as part of the measurements ỹ, and
• system dynamics, μ_π̃w ( μ̃′x − f(μ̃x, μ̃v) ), where set-trajectories can be updated as inputs/priors μ̃v.
Mirroring the role of μ_π̃z for load disturbances, expected precisions μ_π̃w on dynamic prediction errors effectively implement a response mechanism to set-trajectory updates, with high expected precisions implying a fast response, and low precisions entailing a slow one. Equivalently, from a probabilistic perspective this can be explained with the idea that the former describes a model with low uncertainty on dynamics (high precision = low covariance), meaning that any variation from such dynamics should quickly be dealt with; the latter encodes, on the other hand, the fact that high expected covariance allows for changes in set-trajectories, i.e., sudden updates are not surprising, therefore changes can be slow (and, in the limit of very large covariances, almost absent).
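A static simplification makes this balance explicit: for a single hidden state with Gaussian terms, the estimate minimising a (squared-error) functional of this form is a precision-weighted average of data and prior/set-point. The code below is a hypothetical illustration of that balance only; in the full dynamic scheme the same ratio of precisions also governs how quickly the estimate tracks set-trajectory updates.

```python
# Static sketch (an illustrative assumption, not the full scheme of [29]): with
# F = mu_pi_z * (y - mu)**2 + mu_pi_w * (mu - mu_v)**2, the minimiser is a
# precision-weighted average, showing how two independent precisions arbitrate
# between following the measurements and following the prior/set-point.

def posterior_estimate(y, mu_v, mu_pi_z, mu_pi_w):
    return (mu_pi_z * y + mu_pi_w * mu_v) / (mu_pi_z + mu_pi_w)

y, mu_v = 0.0, 1.0   # current measurement vs. newly updated set-point
print(posterior_estimate(y, mu_v, mu_pi_z=1.0, mu_pi_w=10.0))  # ~0.91: tracks the set-point
print(posterior_estimate(y, mu_v, mu_pi_z=10.0, mu_pi_w=1.0))  # ~0.09: dominated by the data
```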
C. Measurement noise response

A third common requirement for PID controllers is related to their performance in the face of noisy or uncertain measurements. These may be due, for instance, to physical constraints/sensitivities of the sensors. In the literature, high frequency measurement noise [30] is usually tackled via a careful and ad-hoc controller design, including for example pre-filtering of the observed data [47]. In the Bayesian formulation of PID controllers that we introduced, we have a direct measure of (the best estimate of) the measurement noise: the expected precision, or inverse covariance, μ_π̃z of the random variable z and its higher orders of motion. Measurement noise is thus related to the same set of hyperparameters used to explain load disturbance rejection which, on the other hand, can be seen as low frequency noise. This shows another trade-off between design criteria, in this case related to the high frequency properties of measurement noise and the (usually) low frequency of external disturbances.

The previously identified maximisation of expected precision μ_πz (integral gain k_i) implies an increased cutoff frequency of the low-pass filter implemented by linear generative models of the kind we introduced to approximate PID control [56]. This suggests that, while low-frequency disturbances can be suppressed more quickly (even if at the cost of possibly overshooting), this comes at the expense of a "hypersensitivity" to high-frequency noise, i.e., not rejecting as much noise as otherwise possible with slower load disturbance responses (as shown in a simple model, for instance, in [56]). In our framework, this can easily be noticed by looking at the role played by expected precision μ_π̃z in the time domain, encoding expected variability in observed data without a clear distinction between rare perturbations and persistent noise.

At the same time, however, the active inference formulation can be used to treat this problem in a more principled way, introducing informative priors on expected precisions μ_π̃z, i.e., hyperpriors η_π̃z (or more complicated functions h(η_π̃z)); see [24] for a formal treatment. The variational free energy functional then includes another set of prediction errors (cf. (1)),

F ≈ [ μ_π̃z ( ỹ − g(μ̃x, μ̃v) ) + μ_π̃w ( μ̃′x − f(μ̃x, μ̃v) ) + μ_p̃z ( μ_π̃z − h(η_π̃z) ) ]    (5)

with μ_p̃z ( μ_π̃z − h(η_π̃z) ) playing the role of an L2 (or Tikhonov) regularisation term for the ensuing recognition dynamics derived as a gradient descent on (5).
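The sketch below extends the earlier precision update with such a regularisation term: a hyperprior η (entering through a quadratic penalty weighted by μ_p) keeps the expected precision from chasing every burst of high-frequency noise. The penalty form, weights and values follow the structure of (5) but are otherwise hypothetical.

```python
# Gradient step on one expected precision, now L2-regularised towards a
# hyperprior eta as in the last term of (5); h() is taken to be the identity
# and all weights/values are illustrative.

def update_precision_regularised(pi, eps, eta, mu_p, lr=0.02):
    dF_dpi = 0.5 * (eps ** 2 - 1.0 / pi)  # data-driven term (as in Case 1)
    dF_dpi += 2.0 * mu_p * (pi - eta)     # hyperprior: penalise drifting away from eta
    return max(pi - lr * dF_dpi, 1e-6)

def run(mu_p, steps=200, eta=4.0):
    """Expose the precision estimate to a sustained burst of large errors (eps = 3)."""
    pi = 4.0
    for _ in range(steps):
        pi = update_precision_regularised(pi, eps=3.0, eta=eta, mu_p=mu_p)
    return pi

print(run(mu_p=10.0))  # ~3.8: a strong hyperprior keeps the precision close to eta
print(run(mu_p=0.0))   # ~0.11: without it, the precision collapses to 1/eps**2
```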
Using these prediction errors one can effectively encode, for instance, constraints that reject strong high frequency noise by specifically targeting frequent large instantaneous fluctuations of the expected precision μ_π̃z, penalising them with a Gaussian hyperprior (an L2 regularisation term that affects "outliers") centred at η_π̃z. While such a hyperprior would certainly also influence the response to sparse step changes, expected precisions μ_π̃z could be updated by slowly shifting hyperpriors η_π̃z to reflect biases in measurements ỹ that persist over a long period of time. Importantly, while the cost functional presents in this case some new terms, the underlying minimisation scheme remains the same: the recognition dynamics will simply include extra regularisation terms while still following a gradient descent on the (augmented) variational free energy.

In the same way expected precisions μ_π̃z regulate the response to changes in the observations due to load disturbances, expected precisions on higher order stochastic properties (e.g., expected precisions on expected precisions, μ_p̃z) can then be seen as regulating how a controller adapts to varying levels of measurement noise covariance, given some (informative) priors h(η_π̃z). For example, in cases where the variance of measurement noise changes over time, e.g., due to the natural degradation of sensors, our formulation can include mechanisms that take into account existing prior knowledge and that can be used by a controller to dynamically adapt to new levels of noise. More generally, in the presence of stochastic volatility (i.e., models where the covariances of different random variables are themselves random variables [41]), one can easily encode prior knowledge of higher order properties of random variables by including extra hierarchical layers in the generative model we introduced for PID control.

D. Robustness to model uncertainty
PID controllers are usually designed to withstand some level of model uncertainty, inherent in any system we observe, interact with and try to regulate. In control theory, this problem affects compensators attempting to regulate a system while having access only to a limited amount of information regarding the dynamics of the system itself. PID controllers are especially popular as a "model free" strategy, or rather, for the small number of tunable parameters that are necessary to afford robust, although often suboptimal, control [46]. In control problems, this robustness is sometimes captured by sensitivity functions [47], [49], providing a proxy for, among other things, the sensitivity of a feedback system to variations in models of process dynamics. In our derivation of PID control as a process of Bayesian (active) inference, the uncertainty of the dynamics is represented by the expected precisions of system dynamics, μ_π̃w, in the linear generative model defined in [29]. For instance, low expected precisions μ_π̃w, expressing high uncertainty/covariance, encode the (Bayesian) belief that large fluctuations in the dynamics can be expected, while high expected precisions express the fact that dynamics should show only small fluctuations. Moreover, using our formulation we can describe the behaviour of a PID controller such that, under controllability assumptions [4], [8], it effectively "imposes" its own (linear) dynamics/priors on a system through larger weighted prediction errors μ_π̃w ( μ̃′x − f(μ̃x, μ̃v) ), by forcing it into an attractor encoded in the set-trajectory represented by the controller's priors. The state of affairs of the world is only partially relevant to a PID controller, since as long as conditions of reachability and controllability [4] are met, all it does is try to drive a (controllable) system towards the desired equilibrium encoded by its priors on a set-trajectory.

As in the case of measurement noise, our formulation allows for the construction of an extra layer of hyperpriors to handle model uncertainty: in the active inference formulation we can in fact include priors on expected precisions μ_π̃w to represent existing information on the expected/desired dynamics of a system to regulate,

F ≈ [ μ_π̃z ( ỹ − g(μ̃x, μ̃v) ) + μ_π̃w ( μ̃′x − f(μ̃x, μ̃v) ) + μ_p̃w ( μ_π̃w − k(η_π̃w) ) ]    (6)

where the function k(·) is named so as to maintain a notation similar to the one used in [24], [29]. For instance, it is not hard to imagine that, following standard hierarchical or empirical Bayes methods in statistical inference [41], information on existing control problems could be used to define classes of systems whose shared statistical properties form generic priors η_π̃w. These priors could then be used to initialise our model in a suitable part of the state space to ensure a desired level of robustness (and, if a similar approach were to be adopted for η_π̃z, to guarantee some desired performance). In such settings, expected precisions μ_π̃w can still be optimised via a simple gradient descent, now L2-regularised with the newly introduced priors entering the variational free energy equation in the form of weighted prediction errors, μ_p̃w ( μ_π̃w − k(η_π̃w) ).
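As a sketch of this empirical-Bayes idea, a generic hyperprior for a class of similar plants could be formed by pooling precisions fitted on previously solved problems, and then used to initialise and regularise a new controller. The data and pooling rule below are hypothetical and only meant to show the shape of the procedure.

```python
import numpy as np

# Hypothetical empirical-Bayes step: pool dynamics precisions fitted on a set of
# previously tuned, similar regulation problems to obtain a generic hyperprior
# eta_pi_w, then use it to initialise (and regularise) a new generative model.

previously_fitted_pi_w = np.array([2.1, 1.8, 2.5, 2.0, 2.2])  # made-up values

eta_pi_w = previously_fitted_pi_w.mean()     # shared prior for this class of systems
mu_p_w = 1.0 / previously_fitted_pi_w.var()  # weight it more if the class is homogeneous
mu_pi_w = eta_pi_w                           # initialise the new controller at the prior

print(eta_pi_w, mu_p_w, mu_pi_w)
```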
This approach, especially when employing empirical Bayes, is similar in spirit to the clever initialisation achieved in deep learning approaches via "pre-training", where introducing an unsupervised learning phase before supervised training showed substantial improvements in the performance and generalisation properties of neural networks [57], [58].

V. DISCUSSION
The duality of estimation and control has long been recognised and exploited in problems of regulation under constraints of partial observability, i.e., stochastic control [3], [4], [10], [16], [18], [35]. This property relies on the mathematical equivalence of some classes of estimation and regulation problems, formulated as Bayesian inference and optimal control respectively. The applications of this duality have led to a series of significant new results in different areas, such as reinforcement learning [20], robotics [59] and neuroscience [14], where methods of approximate
Bayesian inference are now often employed to improve existing solutions. In this work we built on some of these previous ideas, discussing possible applications of Bayesian inference theories and related approximations to methods from classical control. In particular, we focused on PID control and on our previous implementation of this method in terms of Bayesian active inference [29], proposing this as a general unifying framework for the design of PID controllers, still largely missing to date [28], [31].

In [29] we recently introduced a gradient-based procedure for gain tuning, using an interpretation of these parameters as stochastic properties (i.e., expected precisions, or inverse variances) of the system to regulate. Here we expanded on this formulation by providing direct links to Kalman's duality [8], [18], [35] and Bayesian estimation in the presence of bias terms, i.e., unknown step inputs [33]. We then discussed standard problems such as the necessity of two degrees of freedom in order to afford independent responses to load and set-point changes [40]. Using the probabilistic interpretation given in [29], we then drew a comparison between a pragmatic introduction of two degrees of freedom [40], represented by feedback and feedforward sub-loops in standard 2DOF PID control, and the more principled formulation of active inference, aligned with update and prediction equations of filtering algorithms (e.g., Kalman filters [9]) and the use of prior and likelihood densities in recursive Bayesian update schemes [45].

Crucially, we then proposed to frame one of the major open challenges for methods like PID control, the general performance-robustness trade-off due to the presence of conflicting design criteria [28], [31], in terms of variational free energy minimisation [17], [23], [29], [38], [60]. Combining (5) and (6),

F ≈ [ μ_π̃z ( ỹ − g(μ̃x, μ̃v) ) + μ_π̃w ( μ̃′x − f(μ̃x, μ̃v) ) + μ_p̃z ( μ_π̃z − h(η_π̃z) ) + μ_p̃w ( μ_π̃w − k(η_π̃w) ) ]    (7)

In this formulation, simple constraints (load disturbance response and set-point change response) can easily be mapped to first order weighting parameters on the mean estimates of the state of the system to regulate. More complex ones, on the other hand (measurement noise response and robustness to model uncertainty), can be introduced in terms of stochastic volatility [41], i.e., by treating second moments (expected precisions, or hyperparameters) as random variables with appropriate hyperpriors encoded in the generative model. This mapping provides an immediate understanding of different desired statistical properties of the system to govern, summarised in Table I.

TABLE I: Active inference as a general framework for PID controller design (adapted from [29] and here extended).
Criterion | Mapped to | Interpretation in active inference
Load disturbance response | μ_π̃z | Expected inverse covariance of the observations (i.e., precision), with low covariance implying a fast response, and vice versa
Set-point change response | μ_π̃w | Two degrees of freedom derived from the presence of two sets of prediction errors, sensory and dynamics, mapping to likelihood and priors of a Bayesian inference process
Measurement noise response | (priors on) μ_π̃z | Direct mapping of measurement noise to the inverse covariance of the observations (i.e., precision), with hyperpriors (priors on expected precisions) introduced to differentiate high frequency noise from low frequency disturbances
Robustness to model uncertainty | (priors on) μ_π̃w | Direct mapping of model uncertainty to expected covariances of system fluctuations, representing the hidden dynamics of the system to control, with hyperpriors that can describe initial knowledge of, for example, a class of similar regulation problems to facilitate the optimisation of states/parameters (similar to the role of unsupervised "pre-training" in deep learning [58])
VI. CONCLUSION AND FUTURE WORK
In an influential paper, Åström and Hägglund asked whether PID control can play a role in the future of control theory and engineering [28]. Despite PID being the most used controller in industry, the emergence of more specialised and better performing methods over the years, such as model predictive control, has cast doubts on its long term applications and uses. Åström and Hägglund however argued that, due to its combined effectiveness and simplicity, PID is likely to remain relevant for the foreseeable future, perhaps in conjunction with other methods. At the same time, they highlighted a series of existing problems and open challenges faced by PID, including a relatively limited number of theoretical results in areas such as gain tuning and general (PID) controller design. In this work we built on our previous formulation of PID control in terms of active inference, a modern theory combining stochastic control and probabilistic Bayesian inference under the umbrella of variational free energy minimisation, to propose new applications of Bayesian methods to PID controllers in order to establish a more general design framework. After introducing a new practical implementation of optimal gain tuning in [29], here we extended our proposal highlighting the connections between different design principles for PID, from the importance of multiple degrees of freedom to optimal tuning with conflicting performance-robustness criteria. This framework gives an interpretation of a series of different constraints as first and second order properties of a generative model that generates a PID controller as a gradient-based minimisation of a single cost functional, variational free energy. In the future, we will focus on simulations testing the current proposal using standard control benchmarks and following a vast literature on Bayesian models (see [24], [41] and references therein). We will then also draw more direct connections to modern machine and reinforcement learning, combining the present work with methods from [61], where preliminary results based on these and other ideas are utilised in the field of deep reinforcement learning with large neural networks performing amortised inference.
REFERENCES
[1] R. E. Bellman, "A Markovian decision process," Journal of Mathematics and Mechanics, pp. 679–684, 1957.
[2] K. J. Åström, Introduction to stochastic control theory. Academic Press, 1970.
[3] B. Anderson and J. B. Moore, Optimal control: linear quadratic methods. Prentice-Hall, Inc., 1990.
[4] R. F. Stengel, Optimal control and estimation. Courier Corporation, 1994.
[5] E. Todorov, "Optimal control theory," Bayesian brain: probabilistic approaches to neural coding, pp. 269–298, 2006.
[6] H. J. Kappen, Optimal control theory and the linear Bellman equation. Cambridge University Press, 2011, pp. 363–387.
[7] R. E. Bellman, Dynamic Programming. Courier Dover Publications, 1957.
[8] R. E. Kalman, "Contributions to the theory of optimal control," Bol. Soc. Mat. Mexicana, vol. 5, no. 2, pp. 102–119, 1960.
[9] ——, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[10] W. M. Wonham, "On the separation theorem of stochastic control," SIAM Journal on Control, vol. 6, no. 2, pp. 312–326, 1968.
[11] H. A. Simon, "Dynamic programming under uncertainty with a quadratic criterion function," Econometrica, Journal of the Econometric Society, pp. 74–81, 1956.
[12] H. Theil, "A note on certainty equivalence in dynamic planning," Econometrica: Journal of the Econometric Society, pp. 346–349, 1957.
[13] F. L. Lewis, D. M. Dawson, and C. T. Abdallah, Robot manipulator control: theory and practice. CRC Press, 2003.
[14] E. Todorov and M. I. Jordan, "Optimal feedback control as a theory of motor coordination," Nature Neuroscience, vol. 5, no. 11, p. 1226, 2002.
[15] E. Todorov, "Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system," Neural Computation, vol. 17, no. 5, pp. 1084–1108, 2005.
[16] S. K. Mitter and N. J. Newton, "A variational approach to nonlinear estimation," SIAM Journal on Control and Optimization, vol. 42, no. 5, pp. 1813–1833, 2003.
[17] H. J. Kappen, "Linear theory for control of nonlinear stochastic systems," Physical Review Letters, vol. 95, no. 20, p. 200201, 2005.
[18] E. Todorov, "General duality between optimal control and estimation," in Decision and Control, 2008. CDC 2008. 47th IEEE Conference on. IEEE, 2008, pp. 4286–4292.
[19] H. Attias, "Planning by probabilistic inference," in AISTATS. Citeseer, 2003.
[20] E. Todorov, "Efficient computation of optimal actions," Proceedings of the National Academy of Sciences, vol. 106, no. 28, pp. 11478–11483, 2009.
[21] C. L. Buckley, C. S. Kim, S. McGregor, and A. K. Seth, "The free energy principle for action and perception: A mathematical review," Journal of Mathematical Psychology, vol. 14, pp. 55–79, 2017.
[22] L. Da Costa, T. Parr, N. Sajid, S. Veselic, V. Neacsu, and K. J. Friston, "Active inference on discrete state-spaces: a synthesis," arXiv preprint arXiv:2001.07203, 2020.
[23] K. J. Friston, N. Trujillo-Barreto, and J. Daunizeau, "DEM: A variational treatment of dynamic systems," NeuroImage, vol. 41, no. 3, pp. 849–885, 2008.
[24] K. J. Friston, "Hierarchical models in the brain," PLoS Computational Biology, vol. 4, no. 11, 2008.
[25] ——, "The free-energy principle: a unified brain theory?" Nature Reviews Neuroscience, vol. 11, no. 2, pp. 127–138, 2010.
[26] K. J. Friston, T. FitzGerald, F. Rigoli, P. Schwartenbeck, and G. Pezzulo, "Active inference: a process theory," Neural Computation, vol. 29, no. 1, pp. 1–49, 2017.
[27] M. Baltieri, "Active inference: building a new bridge between control theory and embodied cognitive science," Ph.D. dissertation, University of Sussex, 2019.
[28] K. J. Åström and T. Hägglund, "The future of PID control," Control Engineering Practice, vol. 9, no. 11, pp. 1163–1175, 2001.
[29] M. Baltieri and C. L. Buckley, "PID control as a process of active inference with linear generative models," Entropy, vol. 21, no. 3, p. 257, 2019.
[30] K. J. Åström and R. M. Murray, Feedback systems: an introduction for scientists and engineers. Princeton University Press, 2010.
[31] K. H. Ang, G. Chong, and Y. Li, "PID control system analysis, design, and technology," IEEE Transactions on Control Systems Technology, vol. 13, no. 4, pp. 559–576, 2005.
[32] C. D. Johnson, "Optimal control of the linear regulator with constant disturbances," IEEE Transactions on Automatic Control, vol. 13, no. 4, pp. 416–421, 1968.
[33] ——, "On observers for systems with unknown and inaccessible inputs," International Journal of Control, vol. 21, no. 5, pp. 825–831, 1975.
[34] ——, "Further study of the linear regulator with disturbances – The case of vector disturbances satisfying a linear differential equation," IEEE Transactions on Automatic Control, vol. 15, no. 2, pp. 222–228, 1970.
[35] R. E. Kalman, "On the general theory of control systems," in Proceedings First International Conference on Automatic Control, Moscow, USSR, 1960.
[36] Y. Bar-Shalom and E. Tse, "Dual effect, certainty equivalence, and separation in stochastic control," IEEE Transactions on Automatic Control, vol. 19, no. 5, pp. 494–500, 1974.
[37] M. Baltieri and C. L. Buckley, "On Kalman-Bucy filters, linear quadratic control and active inference," arXiv preprint arXiv:2005.06269, 2020.
[38] K. J. Friston, K. Stephan, B. Li, and J. Daunizeau, "Generalised filtering," Mathematical Problems in Engineering, vol. 2010, 2010.
[39] K. J. Åström, PID controllers: theory, design and tuning. Research Triangle Park, 1995.
[40] M. Araki and H. Taguchi, "Two-degree-of-freedom PID controllers," International Journal of Control, Automation, and Systems, vol. 1, no. 4, pp. 401–411, 2003.
[41] C. Robert, The Bayesian choice: from decision-theoretic foundations to computational implementation. Springer Science & Business Media, 2007.
[42] D. J. MacKay, Information theory, inference and learning algorithms. Cambridge University Press, 2003.
[43] M. Opper and C. Archambeau, "The variational Gaussian approximation revisited," Neural Computation, vol. 21, no. 3, pp. 786–792, 2009.
[44] C. S. Kim, "Recognition dynamics in the brain under the free-energy principle," Neural Computation, vol. 30, no. 10, pp. 2616–2659, 2018.
[45] A. H. Jazwinski, Stochastic Processes and Filtering Theory. Academic Press, 1970, vol. 64.
[46] D. E. Rivera, M. Morari, and S. Skogestad, "Internal model control: PID controller design," Industrial & Engineering Chemistry Process Design and Development, vol. 25, no. 1, pp. 252–265, 1986.
[47] K. J. Åström and T. Hägglund, Advanced PID control. ISA – The Instrumentation, Systems, and Automation Society, Research Triangle Park, NC, 2006.
[48] O. Garpinger, T. Hägglund, and K. J. Åström, "Performance and robustness trade-offs in PID control," Journal of Process Control, vol. 24, no. 5, pp. 568–577, 2014.
[49] K. J. Åström, H. Panagopoulos, and T. Hägglund, "Design of PI controllers based on non-convex optimization," Automatica, vol. 34, no. 5, pp. 585–601, 1998.
[50] M. Zhuang and D. Atherton, "Automatic tuning of optimum PID controllers," in IEE Proceedings D (Control Theory and Applications), vol. 140, no. 3. IET, 1993, pp. 216–224.
[51] M. Grimble and M. Johnson, "Algorithm for PID controller tuning using LQG cost minimization," in Proceedings of the 1999 American Control Conference, vol. 6. IEEE, 1999, pp. 4368–4372.
[52] R. T. O'Brien and J. M. Howe, "Optimal PID controller design using standard optimal control techniques," in Proceedings of the 2008 American Control Conference. IEEE, 2008, pp. 4733–4738.
[53] B. A. Francis and W. M. Wonham, "The internal model principle of control theory," Automatica, vol. 12, no. 5, pp. 457–465, 1976.
[54] E. D. Sontag, "Adaptation and regulation with signal detection implies internal model," Systems & Control Letters, vol. 50, no. 2, pp. 119–126, 2003.
[55] B. Friedland, "Treatment of bias in recursive filtering," IEEE Transactions on Automatic Control, vol. 14, no. 4, pp. 359–367, 1969.
[56] B. W. Andrews, T.-M. Yi, and P. A. Iglesias, "Optimal noise filtering in the chemotactic response of Escherichia coli," PLoS Comput Biol, vol. 2, no. 11, p. e154, 2006.
[57] I. Harvey and J. V. Stone, "Unicycling helps your French: Spontaneous recovery of associations by learning unrelated tasks," Neural Computation, vol. 8, no. 4, pp. 697–704, 1996.
[58] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[59] E. A. Theodorou, J. Buchli, and S. Schaal, "A generalized path integral control approach to reinforcement learning," Journal of Machine Learning Research, vol. 11, no. Nov, pp. 3137–3181, 2010.
[60] E. A. Theodorou and E. Todorov, "Relative entropy and free energy dualities: Connections to path integral and KL control," IEEE, 2012, pp. 1466–1473.
[61] A. Tschantz, M. Baltieri, A. Seth, C. L. Buckley et al., "Scaling active inference," arXiv preprint arXiv:1911.10601.