On the relationship between a Gamma distributed precision parameter and the associated standard deviation in the context of Bayesian parameter inference
OOn the relationship between a Gamma distributed precisionparameter and the associated standard deviation in the context ofBayesian parameter inference
Manuel M. Eichenlaub ∗ School of Engineering, University of Warwick, United Kingdom
Abstract
In Bayesian inference, an unknown measurement uncertainty is often quantified in terms ofa Gamma distributed precision parameter, which is impractical when prior information on thestandard deviation of the measurement uncertainty shall be utilised during inference. This paperthus introduces a method for transforming between a gamma distributed precision parameterand the distribution of the associated standard deviation. The proposed method is based onnumerical optimisation and shows adequate results for a wide range of scenarios.
Keywords: Gamma distribution, measurement uncertainty, Bayesian inference
In the context of Bayesian parameter inference, it is common to model the error associated withthe observed data as follows [1]: y ( t ) = g ( · ) + ε with ε ∼ N (0 , p − ) , (1)where y ( t ) is the observed data, g ( · ) the observation function and ε the Gaussian distributed mea-surement error with zero mean and precision p . This precision is commonly described as beingGamma distributed with shape and rate parameters a and b , respectively p ∼ G a ( a, b ) for p, a, b > (2)The values of a and b define the probability density function (PDF) over p and are updated duringparameter inference from a prior distribution of p , defined by a and b . The use of a Gammadistribution over p is justified by the fact that like the precision, the Gamma distribution is de-fined over positive values only. It furthermore forms a conjugate prior to the Gaussian distributedlikelihood, therefore leading to analytically tractable posterior distributions and update rules [1, 2].An example of this approach can be found in a variational Bayesian method for the identification ∗ email: [email protected] a r X i v : . [ s t a t . M E ] J a n tochastic nonlinear models [1, 3].The prior for p is often chosen to be weak and uninformative [4]. However, in a number of practicalapplications, the collection of data is a known process and information on the measurement error ε can be found in the literature. Here, the measurement error is commonly quantified in the formof the standard deviation s . An example for this would be the coefficient of variation of a certainimmunoassay, e.g. of insulin. This information on s can therefore be used to specify the prior PDFover p . Additionally, it is useful to allow the interpretation of the posterior distribution over p interms of s . This paper therefore introduces a method for the forwards and backwards transformationbetween the PDFs over p and s . p to s The Gamma distribution over p is defined by the following PDF of shape and rate parameters a and b , respectively [5], f p ( p | a, b ) = b a Γ( a ) p a − exp( − pb ) for p, a, b > , (3)where Γ( · ) is the Gamma function. The mean and variance of this Gamma PDF are given by [5] E [ p ] f p = ab and Var [ p ] f p = ab . (4)The standard deviation s of the measurement error ε and its precision p are related as follows: s = 1 √ p (5)In order to determine the PDF of s in terms of a and b the following theorem is used. If f x is aPDF over the random variable x and the mapping y = h ( x ) is introduced, then the PDF over therandom variable y is given by [5]: f y ( y ) = f x ( h − ( y )) (cid:12)(cid:12)(cid:12)(cid:12) d h − ( y ) d y (cid:12)(cid:12)(cid:12)(cid:12) (6)Defining s = h ( p ) = 1 / √ p from expression (5) and therefore h − ( s ) = 1 /s , expression (6) can beused to determine the PDF f s over s as follows: f s ( s | a, b ) = f p ( 1 s | a, b ) (cid:12)(cid:12)(cid:12)(cid:12) dd s s (cid:12)(cid:12)(cid:12)(cid:12) = b a Γ( a ) (cid:18) s (cid:19) a − exp (cid:18) − bs (cid:19) s = 2 b a Γ( a ) s − a − exp (cid:18) − bs (cid:19) (7)2sing symbolic computation, this new probability distribution can be characterized by the followingexpression for the mean µ s and the standard deviation σ s , valid for a > : µ s = E [ s ] f s = √ b Γ( a − )Γ( a ) ,σ s = Var [ s ] f s = b (cid:34) a − − Γ( a − ) Γ( a ) (cid:35) . (8)To facilitate the numerical calculation, the logarithm of the Gamma function log Γ( · ) is used insteadof the fast growing Gamma function itself. This modifies expressions (8) to give µ s = √ b exp (cid:20) log Γ( a −
12 ) − log Γ( a ) (cid:21) σ s = b (cid:20) a − − exp (cid:20) log Γ( a −
12 ) − log Γ( a ) (cid:21)(cid:21) (9)An example of the PDFs over p and s is provided in Figure 1, where it is demonstrated that themean of f p does not simply transform into the mean of f s by applying the mapping s = 1 / √ p .Figure 1: Examples of the two PDFs of f p and f s for a = b = 2 . The dashed vertical lines displaythe values of the respective means. s to p Expressions (9) allows the interpretation of the posterior PDF of p , specified by a and b usingthe corresponding distribution over s . To specify the prior distribution over p based on a chosenprior PDF over s , which in turn can be based on existing information, the following procedure isintroduced. Defining µ and σ characterising the the prior PDF over s , the goal is to calculate theassociated values for a and b , characterising the prior PDF over p . First, the following substitutionis defined: S ( a ) = Γ( a − ) Γ( a ) = exp (cid:20) log Γ( a −
12 ) − log Γ( a ) (cid:21) . (10)3his is followed by the combination and reformulation of the expressions (9) into D ( a ) = µ S ( a ) − σ a − − S ( a ) = 0 . (11)This eliminates b and makes it possible to find a by solving the equation D ( a ) = 0 and subse-quently calculating b using: b = µ S ( a ) . (12)To find a , expression (11) is reformulated into a constrained numerical minimisation task: a = arg min a (cid:0) log (cid:2) D ( a ) + 1 (cid:3)(cid:1) for a > . (13)The square operation and addition of one within the logarithm ensures that the objective function isalways positive except for log[ D ( a ) +1] , where the expression is zero. The logarithm facilitates thenumerical calculations as the values of only D ( a ) would grow rapidly as a increases. An exampleof the objective function for different values of µ and σ is given in Figure 2, demonstrating a clearminimum of the objective function at a .Figure 2: Examples of the objective function log[ D ( a ) + 1] for differing values of of µ and σ .The minima correspond to the values of a , where D ( a ) = 0 . The dashed lines give the respectivevalues of the upper bound estimation of ˆ a .This minimisation can be numerically accomplished using an appropriate constrained implementa-tion of an optimisation technique, e.g. fminbnd in MATLAB or minimize_scalar in Scipy. Sincethe expression (9) is only valid for a > the lower bound is set to 1. To specify the upper boundbased on the given values of µ and σ , the function S ( a ) from expression (10) is approximatedwith the first two terms of its series expansion for a → ∞ , giving: ˆ S ( a ) = 1 a + 34 a . (14)4ith this approximation, the minimum ˆ a of D ( a ) can be found analytically ˆ a = 18 (cid:115)
49 + µ σ + 50 µ σ + µ σ , (15)which is used as the upper bound for the constrained minimisation (see Figure 2). Expressions (14)- (15) were found using symbolic calculation. This procedure now allows the calculation of a andsubsequently b with expression (12) based on µ and σ , thereby specifying the prior distributionover p . All symbolic calculations were done using Mathematica and the respective code is availableonline (https://github.com/manueich/Noise_Gamma). In order to assess the accuracy of the numerical calculations when transforming from s to p , themethod was implemented in MATLAB 2020a with the function fminbnd and Python with the func-tion minimize_scalar using identical optimisation settings. The respective code is available online(https://github.com/manueich/Noise_Gamma).As a test, a wide range of possible values for µ and σ are transformed into values for a and b and subsequently back-transformed into µ and σ using expressions (9). These results can then becompared to the starting values of µ and σ . A test is considered as passed if the original values for µ and σ can be recovered with a relative error smaller than 1 % on both parameters. For µ , a totalof 1000 values logarithmically scaled on a range between − and are chosen. Values outsidethis range should not be encountered in practice. Subsequently, for each value of µ , a total of 1000values for σ are logarithmically scaled on a range between − · µ and · µ are tested. The results of the validation procedure for both MATLAB and Python are displayed in Figure 3and are very similar. Based on this the following cut-off values are proposed, which should ensurea robust calculation of a and b based on µ and σ .• · − < µ < • · − < σµ < These ranges should cover a large number of possible values that could be encountered in practice.5igure 3: Results of the validation procedure for (a) MATLAB and (b) Python. Red indicates afailed and green indicates a passed test. The solid black lines give the suggested cut-off values.
References [1] J. Daunizeau, K. J. Friston, and S. J. Kiebel. “Variational Bayesian identification and predictionof stochastic nonlinear dynamic causal models”. In:
Physica D: Nonlinear Phenomena
Pattern Recognition and Machine Learning (Information Science andStatistics) . Springer-Verlag, 2006.[3] Jean Daunizeau, Vincent Adam, and Lionel Rigoux. “VBA: A Probabilistic Treatment of Non-linear Models for Neurobiological and Behavioural Data”. In:
PLoS Comput Biol
BayesianAnalysis