Extended Dynamic Generalized Linear Models: the two-parameter exponential family
Mariana A. O. Souza∗ (Universidade Federal Fluminense) and Helio S. Migon† (Universidade Federal do Rio de Janeiro)

September 26, 2018
Abstract
We develop a Bayesian framework for estimation and prediction in dynamic models for observations from the two-parameter exponential family. Different link functions are introduced to model both the mean and the precision in the exponential family, allowing the introduction of covariates and time series components. We explore conjugacy and analytical approximations under the class of partially specified models to keep the computation fast. The algorithm of West et al. (1985) is extended to cope with two-parameter exponential family models. The methodological novelties are illustrated with two applications to real data: the first considers unemployment rates in Brazil and the second some macroeconomic variables for the United Kingdom.
Generalized linear models (GLMs) have become a standard class of models in the data analyst's toolbox. Proposed by Nelder and Wedderburn (1972), GLMs are widely used in many areas of knowledge. They allow modelling data of many different natures via the probabilistic description as an element of the exponential family and relating the response mean and the linear predictor in a non-linear form. The GLM class is a useful alternative for data analysis since it accommodates skewness and heteroskedasticity, besides allowing analysis using the data in their original scale. The evolution of these models, as well as details regarding inference, fitting, model checking, etc., is documented in the seminal book of McCullagh and Nelder (1989) and in many others in the recent literature.

The main criticism of the use of the one-parameter exponential family in certain applications is that samples are often found to be too heterogeneous to be explained by a one-parameter family of models, in the sense that the implicit mean-variance relationship in such a family is not supported by the data. To overcome this limitation, Gelfand and Dalal (1990) and Dey et al. (1997) introduced the class of two-parameter exponential family models, which includes the ones presented by Efron (1986) and Lindsay (1986) as special cases. They argue that the introduction of a second parameter allows taking into account the over-dispersion usually present in the data, an issue that has been recognized by data analysts for many years.

During the 1990s, special attention was devoted to modelling the mean and the variance simultaneously. Taguchi-type methods led to some efforts to jointly model the mean and the dispersion from designed experiments, avoiding the data transformation that is usually necessary to satisfy the assumptions of traditional linear models (Nelder and Lee, 2001). The process of quality improvement aims to minimize the product variation caused by different types of noise. Quality improvement must be implemented in the design stage via experiments to assess the sensitivity of the different control factors that affect the variability and mean of the process. Nelder and Lee (2001) discussed how the main ideas of a GLM can be extended to analyse Taguchi's experiments.

∗ Address for correspondence: Mariana Albi O. Souza, Departamento de Estatística, Instituto de Matemática e Estatística, Universidade Federal Fluminense, Rua Mario Santos Braga s/n, 7o. andar, Centro, Niterói, RJ, Brazil. CEP 24020-140. E-mail: [email protected]ff.br. † E-mail: [email protected].
From a static point of view, the Bayesian inference for this class of models is fully discussed in the papers previously cited, while some alternative aspects of MCMC are discussed in Cepeda and Gamerman (2005) and Cepeda et al. (2011).

Our aim in this article is to extend the class of models introduced by Gelfand and Dalal (1990) and Dey et al. (1997) to deal with time series data and to propose a fast algorithm for estimation and prediction in this class of models. To reach this objective we propose an algorithm based on analytical approximations, for example, Laplace approximations. In this way we extend the conjugate updating method proposed in West et al. (1985).

The remainder of the manuscript is organized as follows. Section 2 introduces the class of models we focus on. In Section 3 the conjugate updating of West et al. (1985) is extended to the two-parameter exponential family. Section 4 illustrates the proposed method with two case studies: the first models unemployment rates in Brazil and the second models some data on the UK economy as beta distributed data. Section 5 concludes with a discussion and possible future research directions.
Extended Dynamic Generalized Linear Models
In this section we introduce the class of extended dynamic generalized linear models (EDGLM). First we briefly review the two-parameter exponential family and the dynamic generalized linear models, mainly aiming to fix the notation to be used in this paper. A special parametrization of the two-parameter exponential family is presented in this section. It is very useful for data analysis when heterogeneity in the sample is greater than that explained by the variance function in the one-parameter exponential family. The distributions in this family are often used in many applications in the current literature, and not only to deal with the topic of extra variability.

The two-parameter exponential family has the form

p(y | θ, φ) = a(y) exp{φ[θ d_1(y) + d_2(y)] − ρ(θ, φ)},  y ∈ Υ ⊂ IR,  (1)

where a(·) is a non-negative function, d_1(·) and d_2(·) are known real functions, (θ, φ) ∈ Θ × Φ ⊆ IR × IR⁺ and exp{−ρ(θ, φ)} = ∫ a(y) exp{φ[θ d_1(y) + d_2(y)]} dy < ∞. This is a suitable reparameterization of the general two-parameter exponential family as defined in Bernardo and Smith (1994).

This class includes many continuous distributions, such as the normal with unknown mean and variance, the inverse Gaussian and the beta distributions, parameterized by their mean and precision factor. The expressions for the variances, as we will see in Section 3.3, make clear the relevance of the precision parameter, φ, in controlling the model variance. Large values of φ correspond to more precise data or, equivalently, to smaller variance.
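As a quick sanity check of form (1), the normal distribution with mean θ and precision φ fits it with a(y) = (2π)^{-1/2}, d_1(y) = y, d_2(y) = −y²/2 and ρ(θ, φ) = (θ²φ − log φ)/2, the quantities used in the normal example of Section 3. A minimal Python sketch of this identity (the paper's own implementation is in R):

```python
import math

# Two-parameter exponential family density (1):
# p(y | theta, phi) = a(y) * exp(phi*(theta*d1(y) + d2(y)) - rho(theta, phi))
def ef_density(y, theta, phi, a, d1, d2, rho):
    return a(y) * math.exp(phi * (theta * d1(y) + d2(y)) - rho(theta, phi))

# Normal model quantities: mean theta, precision phi
a   = lambda y: (2 * math.pi) ** -0.5
d1  = lambda y: y
d2  = lambda y: -y ** 2 / 2
rho = lambda theta, phi: 0.5 * (theta ** 2 * phi - math.log(phi))

def normal_pdf(y, mu, prec):
    sd = prec ** -0.5
    return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# The two expressions agree for any (y, theta, phi)
y, theta, phi = 1.3, 0.4, 2.5
p1 = ef_density(y, theta, phi, a, d1, d2, rho)
p2 = normal_pdf(y, theta, phi)
```

The same template accommodates the other members of the family by swapping a, d_1, d_2 and ρ.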
Some discrete distributions are also included in this class, such as the binomial (with the sample size known) and Poisson distributions, taking the scale parameter as fixed and equal to one.

Among other interesting features of this class of distributions, we stress the existence of a joint prior distribution for the parameters (θ, φ) of the form

p(θ, φ | τ) = κ(τ) exp{φ[θτ_1 + τ_2] − τ_0 ρ(θ, φ)},

where τ = (τ_0, τ_1, τ_2)′ and κ(τ)⁻¹ = ∫∫ exp{φ[θτ_1 + τ_2] − τ_0 ρ(θ, φ)} dθ dφ. Let ψ = (θ, φ) ∈ Ψ = Θ × Φ, to make the notation easier. The prior mode and observed curvature matrix can be straightforwardly obtained by differentiating the expression above with respect to the parameter vector ψ. More specifically, the mode and curvature matrix satisfy

ψ̃ = arg max_ψ log p(ψ | τ)  and  J(ψ) = −∂² log p(ψ | τ) / ∂ψ ∂ψ′.

Then it follows, after some algebra, that the mode solves

φ̃ τ_1 − τ_0 ρ_θ(ψ̃) = 0  and  θ̃ τ_1 + τ_2 − τ_0 ρ_φ(ψ̃) = 0,

and that

J(ψ) = [ τ_0 ρ_θθ(ψ)          τ_0 ρ_θφ(ψ) − τ_1 ]
       [ τ_0 ρ_θφ(ψ) − τ_1    τ_0 ρ_φφ(ψ)       ],

where subscripts on ρ denote partial derivatives. Moreover, conjugacy yields the prior predictive distribution

p(y | τ) = a(y) κ(τ) / κ(τ*),  y ∈ Υ,  where τ* = (τ_0 + 1, τ_1 + d_1(y), τ_2 + d_2(y))′.  (2)

Now that the basic notation is clearly stated, we can progress to the dynamic version of the extended generalized linear model. Let y_1, ..., y_T be conditionally independent observations from the two-parameter exponential family and, for each t ∈ {1, ..., T}, denote E[y_t | ψ_t] = µ_t. Let us suppose that both the mean µ_t and the precision φ_t can be described by explanatory variables through possibly different non-linear link functions, denoted by g_1 and g_2.

Therefore, given the prior moments of the latent states β_t, the class of models to be considered in this paper is described by three components.
The first is a conditionally conjugate model describing observations in the two-parameter exponential family with its prior distribution:

y_t | ψ_t ∼ Ef(y_t | ψ_t)  and  ψ_t | D_{t−1} ∼ CEf(τ_t),  ∀ t = 1, ..., T,  (3)

where Ef(y_t | ψ_t) denotes a distribution in the two-parameter exponential family (1), CEf(τ_t) represents its conjugate prior distribution and D_{t−1} denotes all the information available up to time t − 1. The second and third components link ψ_t to the latent states and describe the states' evolution:

η_t = g(ψ_t) = F′_t β_t  and  β_t = G_t β_{t−1} + ω_t,  ω_t ∼ [0, W_t],  (4)

with g: IR × IR⁺ → IR², g(ψ_t) = (g_1(µ_t), g_2(φ_t))′, and F_t a p × 2 matrix, p = p_1 + p_2, with p_i = dim(β_{ti}) and β_{ti} = (β_{ti1}, ..., β_{tip_i})′, i = 1, 2, the latent variable vectors related to µ_t and φ_t. Depending on the specification of F_t a broad class of models can be entertained. If F_t = diag(F_{1t}, F_{2t}), different time series components and covariates are used to describe the time evolution of µ_t and φ_t through the link functions. Of course, they can also share some common regressors. The state parameters' evolution is described by a partially specified distribution, with ω_t ∼ [0, W_t], where [a, b] denotes a distribution specified just by its first and second moments. The state parameters' initial information, β_0 | D_0 ∼ [m_0, C_0], is also partially specified, with prior moments m_0 and C_0.

Therefore equations (3) and (4), together with the state parameters' initial information, define a class of partially specified models, where only the first and second prior moments are defined for the vector of latent components.

Inference in EDGLM
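The block structure F_t = diag(F_{1t}, F_{2t}), G_t = diag(G_{1t}, G_{2t}) can be sketched as follows; the particular blocks (a level-plus-trend mean block and a local-level precision block) are illustrative choices, not the only ones the class allows:

```python
import numpy as np

# Illustrative blocks: mean states follow a linear growth model,
# the precision state follows a random walk.
F1 = np.array([[1.0], [0.0]])            # p1 x 1 design for g1(mu_t)
F2 = np.array([[1.0]])                   # p2 x 1 design for g2(phi_t)
G1 = np.array([[1.0, 1.0], [0.0, 1.0]])  # linear growth evolution
G2 = np.array([[1.0]])                   # random walk

p1, p2 = F1.shape[0], F2.shape[0]
# F_t = diag(F1, F2): a (p1+p2) x 2 matrix, so eta_t = F_t' beta_t has
# first component g1(mu_t) and second component g2(phi_t).
F = np.zeros((p1 + p2, 2))
F[:p1, 0:1] = F1
F[p1:, 1:2] = F2
# G_t = blockdiag(G1, G2) drives beta_t = G_t beta_{t-1} + omega_t.
G = np.zeros((p1 + p2, p1 + p2))
G[:p1, :p1] = G1
G[p1:, p1:] = G2
```

With beta_t = (level, trend, precision level)′, F′_t beta_t returns the pair of linear predictors (η_{1t}, η_{2t}).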
The class of models described by (3) and (4) extends the models treated in West et al. (1985), not only allowing the scale parameter to vary in time, but also modelling it through an additional link function. This extension implies that the original algorithm is not immediately applicable.

The conjugate updating algorithm of West et al. (1985) is extended, making estimation in this class of models feasible. The estimation is still based on the conjugate distribution and linear Bayes estimation, updating sequentially the state vector distributions at each time t, as in the original algorithm. At the end of this process, we obtain both the first and second posterior moments of the latent state vectors and the posterior distribution of (ψ_t | D_t) for each instant t.

In the next subsections, we review the main steps involved in the conjugate updating algorithm, mainly to set up the notation, and propose a strategy to reduce the system dimension. We also discuss the forecasting distribution and conclude with some examples.

The conjugate updating algorithm is based on the steps: evolution, moment equating and updating. The evolution step involves obtaining the first and second moments of the state vector's prior distribution, a_t := E[β_t | D_{t−1}] = G_t m_{t−1} and R_t := Var(β_t | D_{t−1}) = G_t C_{t−1} G′_t + W_t, given the posterior mean and variance at time t − 1, m_{t−1} and C_{t−1}, and the state evolution variance W_t. The prior moments for the linear predictors follow immediately as f_t := E[η_t | D_{t−1}] = F′_t a_t and Q_t := Var(η_t | D_{t−1}) = F′_t R_t F_t. Alternatively, the prior moments of the linear predictor, η_t = g(ψ_t), can be obtained as functions of the parameters defining the conjugate prior.
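The evolution step above is a direct matrix computation; a minimal numpy transcription of the stated formulas, with toy values for illustration:

```python
import numpy as np

def evolution_step(m_prev, C_prev, G, W, F):
    """Evolution step: prior moments of the states and of the linear predictors."""
    a = G @ m_prev            # a_t = G_t m_{t-1}
    R = G @ C_prev @ G.T + W  # R_t = G_t C_{t-1} G_t' + W_t
    f = F.T @ a               # f_t = F_t' a_t
    Q = F.T @ R @ F           # Q_t = F_t' R_t F_t
    return a, R, f, Q

# Toy illustration: two states, random-walk evolution, one linear predictor.
m_prev = np.array([0.5, 0.1])
C_prev = 0.2 * np.eye(2)
G, W = np.eye(2), 0.01 * np.eye(2)
F = np.array([[1.0], [0.0]])
a, R, f, Q = evolution_step(m_prev, C_prev, G, W, F)  # f = [0.5], Q = [[0.21]]
```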
Denote these prior moments as E[η_t | D_{t−1}] = h(τ_t) and Var(η_t | D_{t−1}) = H(τ_t), where h: IR³ → IR², H: IR³ → M, M is the set of symmetric positive definite 2 × 2 matrices and τ_t = (τ_{0t}, τ_{1t}, τ_{2t})′ is the parameter vector of the conjugate prior.

We are facing a problem similar to the one posed by Poole and Raftery (2000) in the context of computer simulation models: there are two priors on the same quantity, based on different sources of information. This also occurs in the context of reaching consensus in the presence of multiple expert opinions. The analytic expressions of the above moments need to be equated to the linear predictors' numerical moments, previously obtained as functions of the prior moments of the states, providing the non-linear system of equations

h(τ_t) = f_t  and  vec(H(τ_t)) = vec(Q_t),  (5)

where vec(M) denotes the vectorization of the upper triangular part of a symmetric matrix M. Note that the dimension of the vectors and matrices involved leads to a non-linear system with more equations than unknown quantities, so system (5) does not provide a unique solution for the parameter vector τ_t. Therefore it is necessary to introduce some criterion to reduce this large set of solutions to one compromise solution. A proposal to deal with this sort of dimension incompatibility in system (5) is treated in Section 3.2. It aims to answer the following query: what is the "best conjugate prior distribution" corresponding to the partially specified predictive distribution with mean f_t and variance Q_t?

After observing a new datum, the prior parameters are straightforwardly updated.
It follows from conjugacy that τ_t can be updated according to the expressions in (2), giving a new parameter vector τ*_t. The linear predictors' posterior moments can be obtained analogously to the system equations (5), giving f*_t = h(τ*_t) and Q*_t = H(τ*_t) or, equivalently, vec(Q*_t) = vec(H(τ*_t)).

The observed information is propagated to the state vector using linear Bayes estimation (West and Harrison (1997), Chapter 4), since its distribution is only partially specified. Then we obtain the posterior moments of β_t,

m_t = E[β_t | D_t] = a_t + R_t F_t Q_t⁻¹ (f*_t − f_t)  and  C_t = Var(β_t | D_t) = R_t + R_t F_t Q_t⁻¹ [Q*_t − Q_t] Q_t⁻¹ F′_t R_t.

The smoothed posterior moments of the latent states can be obtained in the same way as m_t and C_t, using linear Bayes estimation, as detailed in Souza (2013), resulting in the expressions

m^s_t = E[β_t | D_T] = m_t + C_t G′_{t+1} R⁻¹_{t+1} (m^s_{t+1} − a_{t+1})  and  C^s_t = Var(β_t | D_T) = C_t + C_t G′_{t+1} R⁻¹_{t+1} (C^s_{t+1} − R_{t+1}) R⁻¹_{t+1} G_{t+1} C_t,

where m^s_T := m_T and C^s_T := C_T.

To ensure the uniqueness of the vector τ_t at each time considered in the algorithm, we need to reduce the dimensionality of system (5). Several possibilities can be explored for this reduction, including arbitrary solutions such as ignoring some equations of system (5). To avoid such arbitrariness we propose an alternative inspired by the generalized method of moments (Yin (2009)). Our main objective is to match the linear predictors' moments and the conjugate prior moments, preserving as much of the information provided by the system as possible. An optimal solution is obtained by minimizing the quadratic distance between the vector collecting the differences between the numerical moments and the moment conditions described by the parameter vector, weighted by a weight matrix Ω_k (where k is the dimension of the system), and zero.
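The filtering update and the backward smoothing recursion above can be sketched as follows; a minimal numpy transcription of the stated formulas, with a scalar (1 × 1) usage example:

```python
import numpy as np

def linear_bayes_update(a, R, F, f, Q, f_star, Q_star):
    """Posterior state moments via linear Bayes estimation."""
    Qinv = np.linalg.inv(Q)
    m = a + R @ F @ Qinv @ (f_star - f)
    C = R + R @ F @ Qinv @ (Q_star - Q) @ Qinv @ F.T @ R
    return m, C

def smooth_step(m, C, a_next, R_next, G_next, ms_next, Cs_next):
    """One backward smoothing step; initialize with m^s_T = m_T, C^s_T = C_T."""
    B = C @ G_next.T @ np.linalg.inv(R_next)
    ms = m + B @ (ms_next - a_next)
    Cs = C + B @ (Cs_next - R_next) @ B.T
    return ms, Cs

# Scalar illustration: observing pulls the mean toward f* and shrinks C.
m, C = linear_bayes_update(np.array([0.0]), np.eye(1), np.eye(1),
                           np.array([0.0]), np.eye(1),
                           np.array([1.0]), 0.5 * np.eye(1))
```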
So, an optimal choice for the parameter vector τ_t is the one that minimizes the function

Δ_k(τ_t; f_t, Q_t)′ Ω_k Δ_k(τ_t; f_t, Q_t),  (6)

where Δ_k(τ_t; f_t, Q_t) = (f_t − h(τ_t), vec(Q_t) − vec(H(τ_t)))′ is a vector function and Ω_k a positive definite weight matrix that specifies the importance of each equation condition in the estimation process. Since the weight matrix Ω_k determines how each condition is weighted in the system solution, a simple choice is to take Ω_k = I_k (the identity matrix of dimension k), which corresponds to considering all the equations in system (5) on an equal footing. Of course, other choices for the matrix Ω_k can be considered. Intuitively, the more accurate equations should be weighted more than the less accurate ones. A two-stage iterative procedure, described in Yin (2009), can be implemented to determine the "optimal" Ω_k taking into account the observed data.

In summary, the proposed procedure can be implemented following the algorithm below.

Extended Conjugate Updating Algorithm:
At each time t:

Step 1 (evolution): given m_{t−1} and C_{t−1}, compute a_t = G_t m_{t−1} and R_t = G_t C_{t−1} G′_t + W_t, and then f_t = F′_t a_t and Q_t = F′_t R_t F_t.

Step 2 (prior moment equating): obtain the prior parameter vector τ_t as the solution of arg min_{τ_t} {Δ_k(τ_t; f_t, Q_t)′ Ω_k Δ_k(τ_t; f_t, Q_t)}.

Step 3 (posterior moment updating and equating): obtain τ*_t using equation (2) and calculate f*_t and Q*_t using τ*_t in equations (5).

Step 4 (state updating): obtain (m_t, C_t) via linear Bayes estimation, taking m_t = a_t + R_t F_t Q_t⁻¹ (f*_t − f_t) and C_t = R_t + R_t F_t Q_t⁻¹ [Q*_t − Q_t] Q_t⁻¹ F′_t R_t.

In this section we present examples involving the normal, the inverse Gaussian and the gamma distributions, leaving the discussion of the beta model to the next section. Our aim is to show the main functions involved in the Ef definition and their constraints.

Consider model (3), where p(y_t | µ_t, φ_t) represents the density function of the normal distribution with mean µ_t = θ_t and variance φ_t⁻¹. In this case, d_1(y_t) = y_t, d_2(y_t) = −y_t²/2 and ρ(θ_t, φ_t) = (µ_t²φ_t − log(φ_t))/2, and the conjugate prior distribution takes the form

p(µ_t, φ_t | D_{t−1}) ∝ exp{φ_t[µ_tτ_{1t} + τ_{2t}] − τ_{0t} ρ(µ_t, φ_t)},  µ_t ∈ IR, φ_t ∈ IR⁺,

under which µ_t given φ_t is normal with mean τ_{1t}/τ_{0t} and precision τ_{0t}φ_t, and φ_t is gamma with shape τ_{0t}/2 + 1 and rate −τ_{2t} − τ_{1t}²/(2τ_{0t}).

Using the natural link functions η_{1t} = g_1(µ_t) = µ_t and η_{2t} = g_2(φ_t) = log(φ_t), and the crude approximation of the digamma function, ψ(x) = log(x) + O(1/x), x > 0, to evaluate the moments of the linear predictors, the moment conditions are represented by the functional form

Δ_k(τ_t; f_t, Q_t) = ( f_{1t} − τ_{1t}/τ_{0t},
  f_{2t} − log{ (τ_{0t}/2 + 1) [−τ_{2t} − τ_{1t}²/(2τ_{0t})]⁻¹ },
  q_{11,t} − [−τ_{2t} − τ_{1t}²/(2τ_{0t})] τ_{0t}⁻¹ (τ_{0t}/2)⁻¹,
  q_{22,t} − (τ_{0t}/2 + 1)⁻¹ )′.  (7)

Therefore τ_t is obtained as the solution that minimizes the associated quadratic form. Note that in this example the prior covariance of the linear predictors (η_{1t}, η_{2t}) is zero at each time t, which indicates that Q_t is a diagonal matrix. In fact, it means that η_1 is orthogonal to η_2 given D_{t−1}, so the system reduces to four equations. Nevertheless, solving system (7) is not a trivial minimization problem, since we need to ensure that all the involved moments are well defined, in the sense that at each iteration of the algorithm τ_{0t}, τ_{1t} and τ_{2t} generate non-negative variances. In this particular example, the minimization with respect to the vector τ_t must satisfy the restrictions τ_{0t} > 0 and τ_{2t} < −τ_{1t}²/(2τ_{0t}), assuming that the first and second moments in expression (7) are well defined.

Suppose now that p(y_t | µ_t, φ_t) represents the density function of the inverse Gaussian distribution with mean µ_t and variance µ_t³/φ_t in model (3). It is very easy to show that this model is a member of the exponential family, taking θ_t = 1/µ_t², d_1(y_t) = −y_t/2, d_2(y_t) = −1/(2y_t), ρ(µ_t, φ_t) = −[φ_t/µ_t + log(φ_t)/2] and a(y_t) = (2πy_t³)⁻¹ᐟ². In this case, the conjugate prior distribution for the observational model is

p(µ_t, φ_t | D_{t−1}) ∝ exp{ −(φ_t/2)[τ_{1t}/µ_t² + τ_{2t}] − τ_{0t} ρ(µ_t, φ_t) },  µ_t > 0, φ_t > 0.  (8)

As explained in Banerjee and Bhattacharyya (1979), conditional on φ_t, µ_t⁻¹ follows a normal distribution truncated at zero and, conditional on µ_t, φ_t follows a gamma distribution. On the other hand, p(µ_t, φ_t | D_{t−1}) does not have an analytically known form, as far as we know, so we approximate its mean and variance by the mode (µ̃_t, φ̃_t)′ and the inverse curvature matrix Ṽ_t of the conjugate prior distribution (8) evaluated at the mode, respectively, getting

µ̃_t = τ_{1t}/τ_{0t},  φ̃_t = τ_{0t}[τ_{2t} − τ_{0t}²/τ_{1t}]⁻¹  and  Ṽ_t = diag( µ̃_t³/(τ_{0t}φ̃_t), 2φ̃_t²/τ_{0t} ).

Using the logarithmic link functions g_1(µ_t) = log(µ_t) and g_2(φ_t) = log(φ_t), and taking first-order Taylor approximations of these functions around (µ̃_t, φ̃_t)′, we obtain the mode and curvature of the linear predictors. Then, to equate the numerical moments of the linear predictors with those obtained using their conjugate prior, we must solve the system of equations

Δ_k(τ_t; f_t, Q_t) = ( f_{1t} − log(τ_{1t}) + log(τ_{0t}),
  f_{2t} − log( τ_{0t}τ_{1t} / (τ_{1t}τ_{2t} − τ_{0t}²) ),
  q_{11,t} − (τ_{1t}τ_{2t} − τ_{0t}²)/τ_{0t}³,
  q_{22,t} − 2/τ_{0t} )′.  (9)

The optimization problem (6), based on Δ_k as in (9), must satisfy the constraints τ_{1t}τ_{2t} > τ_{0t}², with τ_{0t}, τ_{1t} > 0.

Now let p(y_t | µ_t, φ_t) denote the density function of the gamma distribution with mean µ_t and variance µ_t²/φ_t. The quantities defining this member of the two-parameter exponential family are θ_t = 1/µ_t, d_1(y_t) = −y_t, d_2(y_t) = log(y_t) and ρ(θ_t, φ_t) = log(Γ(φ_t)) − φ_t log(φ_t/µ_t) and, therefore, its conjugate prior distribution is given by

p(µ_t, φ_t | D_{t−1}) ∝ exp{ φ_t[−τ_{1t}/µ_t + τ_{2t}] − τ_{0t} ρ(θ_t, φ_t) },  µ_t > 0, φ_t > 0.  (10)

Since the prior distribution does not correspond to a known distribution, as far as we know, we opt to use the mode and the inverse curvature matrix of the conjugate prior distribution (10) in place of its mean and variance. Using the logarithmic link functions for both parameters, we get

E(η_{1t} | D_{t−1}) ≈ log(τ_{1t}) − log(τ_{0t}),  E(η_{2t} | D_{t−1}) ≈ log( τ_{0t} / {2[τ_{0t} log(τ_{1t}/τ_{0t}) − τ_{2t}]} ),
Var(η_{1t} | D_{t−1}) ≈ (2/τ_{0t}²)[τ_{0t} log(τ_{1t}/τ_{0t}) − τ_{2t}]  and  Var(η_{2t} | D_{t−1}) ≈ 2/τ_{0t}.  (11)

Moreover, taking a first-order Taylor approximation of the function g(µ_t, φ_t) = (log(µ_t), log(φ_t))′ around the mode of (10), we obtain the covariance of the linear predictors as Cov(η_{1t}, η_{2t} | D_{t−1}) ≈ 0, since the curvature matrix is diagonal at the mode.
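Step 2, the moment equating, can be sketched generically: given f_t, Q_t and the mappings h and H, minimize the quadratic form (6). The h and H below are toy stand-ins chosen so that an exact match exists; they are not the normal, inverse Gaussian or gamma expressions above:

```python
import numpy as np
from scipy.optimize import minimize

def match_moments(f, Q, h, H, tau0, Omega=None):
    """Find tau minimizing Delta' Omega Delta, with
    Delta(tau) = (f - h(tau), vec(Q) - vec(H(tau)))."""
    iu = np.triu_indices(Q.shape[0])
    vec = lambda M: M[iu]                    # upper-triangle vectorization
    def delta(tau):
        return np.concatenate([f - h(tau), vec(Q) - vec(H(tau))])
    k = delta(tau0).size
    W = np.eye(k) if Omega is None else Omega
    obj = lambda tau: delta(tau) @ W @ delta(tau)
    res = minimize(obj, tau0, method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 5000})
    return res.x, res.fun

# Toy illustration: mean map is the first two components of tau,
# variance map is diag(exp(tau[2]), 1), so the exact match is (0.3, -0.1, log 2).
h = lambda tau: np.array([tau[0], tau[1]])
H = lambda tau: np.diag([np.exp(tau[2]), 1.0])
f = np.array([0.3, -0.1])
Q = np.diag([2.0, 1.0])
tau_hat, val = match_moments(f, Q, h, H, tau0=np.zeros(3))
```

In the paper's examples the minimization is additionally carried out under the positivity constraints stated above, e.g. by optimizing over a reparameterized (unconstrained) version of τ_t.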
Comparing the numerical moments obtained for the linear predictors through the dynamic model with those obtained by conjugacy (expressions (11)), we obtain the functional form

Δ_k(τ_t; f_t, Q_t) = ( f_{1t} − log(τ_{1t}/τ_{0t}),
  f_{2t} − log( τ_{0t} / {2[τ_{0t} log(τ_{1t}/τ_{0t}) − τ_{2t}]} ),
  q_{11,t} − 1/(τ_{0t}φ̃_t),
  q_{22,t} − 2/τ_{0t} )′,  (12)

whose quadratic distance with respect to zero (possibly weighted by a weight matrix Ω_k) can be minimized by imposing the constraints τ_{0t} > 0, τ_{1t} > 0 and τ_{0t} log(τ_{1t}/τ_{0t}) > τ_{2t}, which ensure that the moments up to second order associated with the conjugate prior distribution (10) are well defined.

Assume now that our interest is to forecast some future observation, say at instant t + h (for some integer h > 0), based on all observations up to instant t. Making use of the exponential family's properties, it follows from conjugacy that

p(y_{t+h} | D_t) = a(y_{t+h}) κ(τ_{t+h}) / κ(τ*_{t+h}),  (13)

where κ(τ_{t+h}) and κ(τ*_{t+h}) are the normalization constants involved in the definition of the prior and the posterior distribution of the vector (θ_{t+h}, φ_{t+h}), respectively. Here, the parameter vector τ_{t+h} = (τ_{0,t+h}, τ_{1,t+h}, τ_{2,t+h})′ can be obtained analogously to the discussion in Section 3.2, by solving the optimization problem

arg min_{τ_{t+h}} { Δ_k(τ_{t+h}; f_t(h), Q_t(h))′ Ω_k Δ_k(τ_{t+h}; f_t(h), Q_t(h)) },  (14)

given the recursive relations between the linear predictor moments, η_{t+h} | D_t ∼ [f_t(h), Q_t(h)]:

f_t(h) = F′_{t+h} a_t(h),  Q_t(h) = F′_{t+h} R_t(h) F_{t+h},

with a_t(h) = G_{t+h} a_t(h−1), a_t(0) = m_t, R_t(h) = G_{t+h} R_t(h−1) G′_{t+h} + W_{t+h} and R_t(0) = C_t.
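The h-step-ahead recursion for the linear predictor moments can be sketched as:

```python
import numpy as np

def predict_linear_predictor(m, C, GWF_seq):
    """h-step-ahead moments of eta_{t+h} given D_t:
    a_t(0) = m_t, R_t(0) = C_t,
    a_t(j) = G_{t+j} a_t(j-1), R_t(j) = G_{t+j} R_t(j-1) G_{t+j}' + W_{t+j},
    then f_t(h) = F_{t+h}' a_t(h) and Q_t(h) = F_{t+h}' R_t(h) F_{t+h}.
    GWF_seq is the list of (G_{t+j}, W_{t+j}, F_{t+j}) for j = 1, ..., h."""
    a, R = m, C
    for G, W, F in GWF_seq:
        a = G @ a
        R = G @ R @ G.T + W
    f_h = F.T @ a              # uses F_{t+h}, the last design matrix
    Q_h = F.T @ R @ F
    return f_h, Q_h

# Two steps ahead for a scalar random-walk state:
G, W, F = np.eye(1), 0.1 * np.eye(1), np.eye(1)
f2, Q2 = predict_linear_predictor(np.array([1.0]), np.eye(1),
                                  [(G, W, F), (G, W, F)])  # f2 = [1.0], Q2 = [[1.2]]
```

The pair (f_t(h), Q_t(h)) then feeds the moment-equating problem (14) to produce τ_{t+h}.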
Note that the vector τ*_{t+h} = (τ*_{0,t+h}, τ*_{1,t+h}, τ*_{2,t+h})′ is obtained directly from the relations represented in (2).

In cases in which the constants κ(·) do not have a known analytical form, we must use some numerical integration method to approximate them. In this work, Laplace approximations are used to solve such integrals. All methods are implemented with the aid of routines available in the free software R (R Core Team (2011)), such as the optimization function nlminb and the function fdHess, which numerically approximate gradient and Hessian functions. Furthermore, to improve the quality of the approximations, we use a new parameterization of the involved prior distributions in terms of their linear predictors η_{1t} and η_{2t}, integrating the new parameters along the real line. See the next section for an example.

Case Studies
In this section, two applications are presented to illustrate the performance of the proposed method. In both cases we suppose that the observations follow a beta distribution. The first application models unemployment rates in Brazil, using a data set that presents a trend component and a stable seasonal pattern. Our main interest in this first application is to illustrate the importance of dynamically modelling the precision parameter. The second considers some macroeconomic variables of the United Kingdom, viewed as compositional data. In this example our aim is to show the importance of modelling the data in their original scale. To start this section, we show the main developments concerning the beta model used in both applications. The implementations are carried out through the R software and more details are discussed below.
Consider now p(y_t | µ_t, φ_t) as the density function of a beta distribution in model (3), parameterized in terms of its mean µ_t and its precision φ_t, with variance µ_t(1 − µ_t)/(1 + φ_t). In this case, using conjugacy in the exponential family,

p(µ_t, φ_t | D_{t−1}) ∝ exp{φ_t[µ_tτ_{1t} + τ_{2t}] − τ_{0t} ρ(µ_t, φ_t)},  0 < µ_t < 1, φ_t > 0,  (15)

where ρ(µ_t, φ_t) = −log( Γ(φ_t) / [Γ(φ_tµ_t)Γ(φ_t(1 − µ_t))] ). Taking g_1(µ_t) = logit(µ_t) and g_2(φ_t) = log(φ_t) and approximating the first and second moments of (15), respectively, by the mode (µ̃_t, φ̃_t)′ and the inverse curvature matrix evaluated at the mode, we get

E(η_{1t} | D_{t−1}) ≈ logit(µ̃_t) = τ_{1t}/τ_{0t}  and  Var(η_{1t} | D_{t−1}) ≈ [τ_{0t} µ̃_t(1 − µ̃_t) φ̃_t]⁻¹,
E[η_{2t} | D_{t−1}] ≈ log(φ̃_t) = log( τ_{0t} / {2[τ_{0t} log(1 − µ̃_t) − τ_{2t}]} )  and  Var(η_{2t} | D_{t−1}) ≈ 2/τ_{0t}.  (16)

The functional form (6) to be minimized depends on the vector function

Δ_k(τ_t; f_t, Q_t) = ( f_{1t} − τ_{1t}/τ_{0t},
  f_{2t} − log( τ_{0t} / {2[τ_{0t} log(1 − µ̃_t) − τ_{2t}]} ),
  q_{11,t} − [τ_{0t} µ̃_t(1 − µ̃_t) φ̃_t]⁻¹,
  q_{22,t} − 2/τ_{0t} )′,  (17)

whose minimum must be obtained by imposing the restrictions τ_{0t} > 0 and τ_{2t} < −τ_{0t} log(1 + exp{f_{1t}}), since we are imposing the condition that Cov(η_{1t}, η_{2t} | D_{t−1}) = 0, as in the gamma case.

It is worth noting that although the beta distribution has a conjugate prior, represented in equation (15), it does not have a known analytical form, as far as we know. So, to find its normalization constant we need to approximate the integral

κ(τ_{0t}, τ_{1t}, τ_{2t})⁻¹ = ∫₀^∞ ∫₀¹ exp{φ_t[µ_tτ_{1t} + τ_{2t}] − τ_{0t} ρ(µ_t, φ_t)} dµ_t dφ_t,  (18)

by using a Laplace approximation for its expression. In fact, by changing the variables of the integral in (18) to η_{1t} and η_{2t}, we can approximate it as

κ(τ_{0t}, τ_{1t}, τ_{2t})⁻¹ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp{ L_t(η_{1t}, η_{2t}) } dη_{1t} dη_{2t} ≈ 2π |Ṽ_t|¹ᐟ² exp{L_t(η̃_{1t}, η̃_{2t})},

where

ρ′(η_{1t}, η_{2t}) = log(Γ(e^{η_{2t}})) − log( Γ( e^{η_{2t}} e^{η_{1t}}/(1 + e^{η_{1t}}) ) Γ( e^{η_{2t}}/(1 + e^{η_{1t}}) ) ),
L_t(η_{1t}, η_{2t}) = e^{η_{2t}} [ (e^{η_{1t}}/(1 + e^{η_{1t}})) τ_{1t} + τ_{2t} ] + τ_{0t} ρ′(η_{1t}, η_{2t}),

and Ṽ_t = −[∇² L_t(η_{1t}, η_{2t})]⁻¹, evaluated at the mode (η̃_{1t}, η̃_{2t}), is the inverse Hessian matrix of L_t. Using the R software, the mode (η̃_{1t}, η̃_{2t}) and the Hessian matrix can be easily obtained using, respectively, the functions nlminb and fdHess, with the expression L_t(η_{1t}, η_{2t}) as the argument.

Figure 1: Unemployment rates of working-age people in the major metropolitan regions of Brazil from March 2002 to December 2011.

It is well known that the yearly seasonal behaviour in this time series is mainly due to temporary jobs created by holiday seasons and school vacations, as mentioned by da Silva et al. (2011). Considering these factors, we analysed the data set through a dynamic beta model, where the observational mean evolves as a second-order polynomial model with a seasonal effect. Unlike da Silva et al. (2011), we assume a more parsimonious model, where seasonality is represented by a one-harmonic model, and we assume that the precision can evolve dynamically in time. Additionally, we assume that the latent variables associated with the means and precisions evolve independently in time, taking the matrices F′_t, G_t, W_t and C_0 as block diagonal matrices of the form F′_t = diag(F′_{1t}, F′_{2t}), G_t = diag(G_{1t}, G_{2t}), W_t = diag(W_{1t}, W_{2t}) and C_0 = diag(C_{10}, C_{20}), where the matrices related to the dynamics of the observational means are given by F_{1t} = (1, 0, 1, 0)′ and G_{1t} = blockdiag(J_2(1), J_2(1, ω)), ∀t, where

J_2(1) = [ 1 1 ]   and   J_2(1, ω) = [  cos(ω)  sin(ω) ],   ω = 2π/12.
         [ 0 1 ]                     [ −sin(ω)  cos(ω) ]
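The Laplace approximation of κ(·)⁻¹ described above can be sketched generically for a two-dimensional log-integrand; the finite-difference gradient and Hessian below are crude stand-ins for R's nlminb and fdHess, and the sanity check uses a Gaussian integrand whose value is known exactly:

```python
import numpy as np

def laplace_2d(log_f, x0, step=1e-5, iters=200, lr=0.1):
    """Laplace approximation of the integral of exp(log_f) over R^2:
    integral ~ 2*pi*|V|^{1/2} * exp(log_f(mode)), V = -(Hessian at mode)^{-1}.
    The mode is found by gradient ascent with finite-difference gradients."""
    x = np.asarray(x0, float)
    def grad(x):
        g = np.zeros(2)
        for i in range(2):
            e = np.zeros(2); e[i] = step
            g[i] = (log_f(x + e) - log_f(x - e)) / (2 * step)
        return g
    for _ in range(iters):
        x = x + lr * grad(x)
    H = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei = np.zeros(2); ei[i] = step
            ej = np.zeros(2); ej[j] = step
            H[i, j] = (log_f(x + ei + ej) - log_f(x + ei - ej)
                       - log_f(x - ei + ej) + log_f(x - ei - ej)) / (4 * step ** 2)
    V = np.linalg.inv(-H)
    return 2 * np.pi * np.sqrt(np.linalg.det(V)) * np.exp(log_f(x))

# Sanity check on a Gaussian log-integrand, where the exact answer is 2*pi/sqrt(2):
approx = laplace_2d(lambda z: -0.5 * (z[0] ** 2 + 2 * z[1] ** 2), [0.5, 0.5])
```

For the beta model, log_f would be L_t(η_{1t}, η_{2t}) as defined above, with τ_t held fixed.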
To model the dispersions, we assume a first-order dynamic model, taking F_{2t} = G_{2t} = 1, ∀t, in order to allow the precision parameter φ_t to vary in time through the introduction of a random error.

We chose to specify the error evolution covariance matrices W_t, t ∈ {1, ..., T}, through the use of multiple discount factors, assuming W_t to be a block diagonal matrix whose blocks are associated with the mean's level-and-trend and seasonal components and with a precision level component, taking D = blockdiag{δ_{µ,lt}⁻¹ᐟ² I_2, δ_{µ,s}⁻¹ᐟ² I_2, δ_{φ,l}⁻¹ᐟ²}, where δ_{µ,lt}, δ_{µ,s} and δ_{φ,l} are the discount factors associated with the respective blocks of components, and replacing the expression for R_t in the evolution step of the algorithm with R_t = D G_t C_{t−1} G′_t D.

Different combinations of discount factors were tried and we selected the one that provided the best performance according to some alternative model selection criteria, such as the mean squared error (MSE) based on one-step-ahead forecasting, the joint log-likelihood (LL) and the log observed predictive density (LPD), excluding the first 18 observations, taken as a learning period. Using the selected discount factors, namely δ_{µ,lt} = 0. , δ_{µ,s} = 0.95 and δ_{φ,l} = 0.90, we obtained the model parameter estimates and the one-step-ahead predictive distributions for the unemployment rates during the period from September 2003 to December 2011, using expression (13), as discussed in the previous subsection, with the aid of the R routines nlminb and fdHess.

In Figure 2 it is possible to observe the filtered (E[β_t | D_t]) and the smoothed (E[β_t | D_T]) estimated state variable means related to the observational mean components, describing level, trend and seasonality, respectively, and the state variable associated with the observational precision. In fact, there is a clearly decreasing trend in the data, as well as a seasonal behaviour, as observed in Figure 1. Regarding the precision structure, the small growth of the precision state variable over time can indicate that, as new information is incorporated in the estimation process, the accuracy of the model increases.

Figure 2: Filtered (solid line) and smoothed (dashed line) latent state means for the application using unemployment data: (a) level, (b) trend, (c) seasonality, (d) precision.

It can be seen in Figures 3 and 4 that the method generated satisfactory results, since both the filtered estimated means (E[µ_t | D_t]) and the one-step-ahead predictive means (E[y_t | D_{t−1}]) follow the behaviour of the real data series. Also note that the estimated 95% HPD credibility intervals for the one-step-ahead predictive distributions, represented by the dashed red lines in Figure 4, are well concentrated and contain the true value of the observations in all considered instances.
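The discount-factor construction R_t = D G_t C_{t−1} G′_t D can be sketched as follows; the block sizes match the application (level and trend, one harmonic, precision level), while the numeric discount values below are purely illustrative, not the ones selected in the paper:

```python
import numpy as np

def discounted_R(C_prev, G, deltas, block_sizes):
    """Evolution variance via multiple discount factors:
    R_t = D G C_{t-1} G' D, with D = blockdiag(delta_i^{-1/2} I_{p_i})."""
    d = np.concatenate([np.full(p, delta ** -0.5)
                        for delta, p in zip(deltas, block_sizes)])
    D = np.diag(d)
    return D @ G @ C_prev @ G.T @ D

# Blocks: level-and-trend (2), one harmonic (2), precision level (1).
G = np.eye(5)
C_prev = np.eye(5)
R = discounted_R(C_prev, G, deltas=[0.90, 0.95, 0.90], block_sizes=[2, 2, 1])
```

Each discount factor δ < 1 inflates the corresponding block of G C G′ by 1/δ, which implicitly defines W_t = R_t − G_t C_{t−1} G′_t without specifying it directly.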
The point and interval estimates for the predictive distributions at the last six instants can be seen in Table 1.

To illustrate the importance of dynamic modelling of the precision parameter, we completed this application by comparing its results with those obtained using a similar model in which we assumed φ_t = φ, ∀ t, taking null precision evolution errors in the matrix W_t. Figure 4 compares the interval estimates for the one-step-ahead predictive distribution obtained under both models.

Figure 3: Filtered estimated observational means (solid line) based on the latent state posterior means obtained for the application using unemployment data. The gray points represent the true observations.
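The 95% HPD credibility intervals reported here can be extracted numerically from any unimodal predictive density evaluated on a grid. A minimal sketch, assuming a generic grid-evaluated density rather than the paper's actual predictive distribution (the standard normal below is only a check case):

```python
import math

def hpd_interval(xs, dens, prob=0.95):
    """Grid-based HPD interval: keep the highest-density grid points until
    the requested probability mass is reached, then return their range.
    Assumes a unimodal density on an equally spaced grid xs."""
    dx = xs[1] - xs[0]
    order = sorted(range(len(xs)), key=lambda i: -dens[i])  # densest first
    mass, kept = 0.0, []
    for i in order:
        kept.append(xs[i])
        mass += dens[i] * dx
        if mass >= prob:
            break
    return min(kept), max(kept)

# Check case: for a standard normal density the 95% HPD interval is
# approximately (-1.96, 1.96).
xs = [i * 0.001 - 5.0 for i in range(10001)]
dens = [math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi) for x in xs]
lo, hi = hpd_interval(xs, dens, 0.95)
```

For a symmetric unimodal density the HPD interval coincides with the equal-tailed one; for skewed predictive densities, such as those arising from beta observational models, the two generally differ, which is why the HPD version is the one reported.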
Table 1: Point and interval estimates for the last six instants, based on the predictive distributions p(y_t | D_{t-1}).

Month      y_t      Mean     Mode     IC
Jul/2011   0.060    0.062    0.061    [0.···, 0.···]
Aug/2011   0.060    0.059    0.058    [0.···, 0.···]
Sep/2011   0.060    0.056    0.056    [0.···, 0.···]
Oct/2011   0.058    0.055    0.055    [0.···, 0.···]
Nov/2011   0.052    0.054    0.055    [0.···, 0.···]
Dec/2011   0.047    0.054    0.054    [0.···, 0.···]

The intervals based on the model with φ fixed in time (represented by the shaded area in the graph) are less concentrated, indicating that there was a gain with respect to the accuracy of the predictive distributions in this case, in which we considered the dynamic modelling of the precision structure.

Figure 4: One-step-ahead predictive intervals from September 2003 to December 2011, based on the predictive distributions p(y_t | D_{t-1}) of the alternative models: the shaded region represents the 95% HPD predictive credibility intervals based on the model with φ fixed in time, whereas the solid lines represent the predictive credibility intervals based on the model with a dynamic precision parameter structure. The gray points are the observed unemployment rates.

The second application considers the quarterly consumption (c), investment (i), government expenditure (g) and export (e) shares of U.K. gross final expenditure.

Despite the compositional nature of the data, in order to use the class of models discussed in this article, which includes only univariate observational distributions in the exponential family, we analysed each of the rate series separately through a generalized dynamic model whose observations follow the beta distribution, and for which we assumed different mean and precision structures. We denote the proposed models by the mnemonics Var(l) and Pol(l), meaning a vector autoregressive component and a polynomial trend, respectively, where l is the order of the corresponding model. These models were combined to model the transformed observational mean and the transformed observational precision in different forms.
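Since each series is modelled by a beta distribution with separate mean and precision structures, a convenient way to sketch the observational model is the standard mean-precision parameterization of the beta density, with a = µφ and b = (1 - µ)φ. The logit and log links below are illustrative choices, and this particular parameterization is an assumption for the sketch, not necessarily the one used in the paper.

```python
import math

def inv_logit(eta):
    """Illustrative mean link: inverse logit, mapping the linear
    predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_logpdf_mean_precision(y, mu, phi):
    """Log-density of Beta(a, b) with a = mu*phi and b = (1 - mu)*phi,
    so that E[y] = mu and Var[y] = mu*(1 - mu)/(1 + phi).
    This mean-precision parameterization is an assumption here."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)
```

Setting mu = inv_logit of the mean linear predictor and phi = exp of the precision linear predictor lets covariates and time series components enter both parameters, which is precisely the role of the two link functions in the class of models above.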
For each case, as in the previous application, we assumed that the latent variables associated with the mean and precision models evolve in time independently, taking the matrices F_t', G_t, W_t and C_t as block diagonal. Under this hypothesis, three different structures were considered for the class of models represented by (3) and (4):

• Var(2)Pol(0) - Second-order VAR model for the transformed observational mean and constant transformed observational precision: for the mean structure, we assumed that each series can be explained by all the other series, taking two lags in time, that is, a second-order VAR model. For the precision structure, we assumed that each series has a constant accuracy in time. This corresponds to taking, for the mean, F_t = (1, x_{c,t-1}, x_{g,t-1}, x_{i,t-1}, x_{e,t-1}, x_{c,t-2}, x_{g,t-2}, x_{i,t-2}, x_{e,t-2})', G_t = I and W_t = 0 and, for the precision, F_t = G_t = 1 and W_t = 0, ∀ t ∈ {1, ..., T}, where x_{c,·}, x_{g,·}, x_{i,·} and x_{e,·} represent, respectively, the rates of consumption, government expenditure, investment and exports at previous instants.

• Var(2)Pol(1) - Transformed observational mean modelled by a second-order VAR and precision with a first-order dynamic structure: as in the previous case, we assumed means explained by a second-order VAR model, but here we allowed the precisions to vary in time according to a first-order polynomial model, taking, for the mean, F_t = (1, x_{c,t-1}, x_{g,t-1}, x_{i,t-1}, x_{e,t-1}, x_{c,t-2}, x_{g,t-2}, x_{i,t-2}, x_{e,t-2})', G_t = I and W_t = 0 and, for the precision, F_t = G_t = 1 and W_t ≠ 0, ∀ t ∈ {1, ..., T}, where, again, x_{c,·}, x_{g,·}, x_{i,·} and x_{e,·} represent the rates of consumption, government expenditure, investment and exports at previous instants.
• Pol(2)Pol(1) - Polynomial models for both the mean and precision structures: for the means we assumed a second-order model in which we considered level and trend for each of the series, and a first-order structure for the precisions, taking, for the mean, F_t = (1, 0)', G_t = J_2(1) = ((1, 1), (0, 1)) and W_t ≠ 0 and, for the precision, F_t = G_t = 1 and W_t ≠ 0, ∀ t ∈ {1, ..., T}.

As in the previous application, we chose to specify the covariance matrices through the use of multiple discount factors, assuming block diagonal matrices whose blocks are associated with the respective components of the polynomial models (level and trend in the case of the second-order model, and level in the first-order model). More specifically, considering, for example, the Pol(2)Pol(1) structure, we used a block diagonal discount matrix of the form D = blockdiag{δ_{µ,lt}^{-1/2} I, δ_{φ,l}^{-1/2}}, where δ_{µ,lt} is the discount factor associated with the mean level and trend components and δ_{φ,l} is the discount factor associated with the precision level component, replacing the expression for R_t in the evolution step of the algorithm with the form R_t = D G_t C_{t-1} G_t' D, as discussed in Chapter 6 of West and Harrison (1997).

For each of the rate series and for each of the dynamic structures assumed, different combinations of discount factor values were tried, and we selected the one that provided the best data fit according to the mean squared error (MSE) based on one-step-ahead forecasting, the joint log-likelihood (LL) and the log observed predictive density (LPD) of each series, excluding the first 31 observations (taken as the learning sample). For this application, combinations of the values 0.90, 0.95 and 0.98 were tried for the discount factors and, for all the dynamic structures assumed, the models with the smaller values, namely δ_{µ,lt} = δ_{φ,l} = 0.90, performed best. Table 2 reports fit measures for the different dynamic models. It can be seen that, according to the criteria used, the model that assumes a second-order Var structure for the mean and a first-order structure for the precision performs better, with lower MSE and higher LL and LPD values. This makes sense, since the Var structure captures the relationship between the different rate series while allowing the precision structure to vary in time, giving greater flexibility to the model.
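The selection criteria used above, the MSE of the one-step-ahead forecasts and the accumulated log predictive density with the learning sample discarded, can be sketched as follows. The function is schematic, with hypothetical inputs; the joint log-likelihood (LL) criterion would be accumulated analogously from the fitted observational densities.

```python
def one_step_criteria(y, pred_means, pred_logdens, learn=31):
    """One-step-ahead mean squared error (MSE) and accumulated log
    predictive density (LPD), discarding the first `learn` observations
    as the learning sample. pred_logdens[t] stands for log p(y_t | D_{t-1});
    all inputs are hypothetical placeholders for the model's forecasts."""
    errors = [(y[t] - pred_means[t]) ** 2 for t in range(learn, len(y))]
    mse = sum(errors) / len(errors)
    lpd = sum(pred_logdens[t] for t in range(learn, len(y)))
    return mse, lpd
```

Discarding the learning period matters because early forecasts reflect the (possibly vague) prior more than the model, and would otherwise penalize all candidate structures in roughly the same arbitrary way.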
Table 2: Fit measures (one-step-ahead MSE, joint log-likelihood LL and log predictive density LPD) for the alternative dynamic structures, by series.

                 consumption                 investment
Model            MSE    LL       LPD         MSE    LL       LPD
Var(2)Pol(0)     ·      −·.507   724.·       ·      −·.757   694.·
Var(2)Pol(1)     ·      −·.776   725.·       ·      −·.774   698.·
Pol(2)Pol(1)     ·      −·.711   653.·       ·      −·.014   636.·

                 government expenditure      export
Model            MSE    LL       LPD         MSE    LL       LPD
Var(2)Pol(0)     ·      −·.688   857.·       ·      −·.010   702.·
Var(2)Pol(1)     ·      −·.035   862.·       ·      −·.200   710.·
Pol(2)Pol(1)     ·      −·.649   707.·       ·      −·.193   621.·

Figure 5 shows the filtered estimated observational means E[µ_t | D_t]. Note that, for all the analysed series, the estimated means closely parallel the behaviour of the data series. Similar behaviour can also be observed for the estimated predictive means (E[y_t | D_{t-1}]), shown in Figure 6. It can also be seen that the estimated 95% HPD credibility intervals for the one-step-ahead predictive distributions are well concentrated, containing the true observation values in most cases. Point and interval predictive estimates for the investment rates at some of the considered instants can be seen in Table 3.

Figure 5: Filtered estimated observational means (solid line) based on the latent state posterior means for the quarterly shares of U.K. gross final expenditure for the period from 1955.1 to 2012.3. Panels: (a) consumption (c), (b) investment (i), (c) government expenditure (g), (d) export (e).
The points represent the true data set.

The smoothed posterior mean estimates (E[µ_t | D_T]) for all the data series are represented in Figure 7. Although we treated each time series separately, the estimates obtained are consistent, in the sense that, at each instant, the sum of the estimated means is approximately one. This behaviour indicates that, despite the simplicity of the model used in this application, the behaviour of the series is well captured by the proposed model.

Figure 6: One-step-ahead prediction for the quarterly shares of U.K. gross final expenditure for the period from 1963.2 to 2012.3. The solid lines represent the predictive distribution means (E[y_t | D_{t-1}]) and the shaded regions the 95% HPD predictive credibility intervals. The points represent the true data set in each case. Panels: (a) consumption (c), (b) investment (i), (c) government expenditure (g), (d) export (e).

A subset of the data set used in this application has already been analysed by Mills (2010). From a classical point of view, Mills (2010) estimated an order-2 VAR model, using a multivariate normal distribution to model a transformation of the original data as

log(c/e), log(i/e) and log(g/e),    (18)

where c, i, g and e represent the consumption, investment, government expenditure and export rates, respectively.

In order to ascertain whether there is any advantage in analysing the data in their original scale, we reanalysed this data set transforming it as proposed by Mills (2010) (according to equations (18)), replacing the observational beta distributions with univariate normal distributions for each series.
Table 3: Point and interval estimates for the investment (i) shares of U.K. gross final expenditure, based on the predictive distributions p(y_t | D_{t-1}).

quarter (t)   y_t      mean     mode     IC
    ·         0.106    0.117    0.117    [0.···, 0.···]
    ·         0.112    0.112    0.111    [0.···, 0.···]
    ·         0.116    0.113    0.113    [0.···, 0.···]
    ·         0.110    0.116    0.116    [0.···, 0.···]
    ·         0.107    0.114    0.114    [0.···, 0.···]
    ·         0.110    0.110    0.110    [0.···, 0.···]
    ·         0.109    0.112    0.111    [0.···, 0.···]

Figure 7: Smoothed posterior estimates of the observational means for the quarterly consumption (c), investment (i), government expenditure (g) and export (e) shares concerning expenditure in the U.K. economy over the period from 1955.1 to 2012.3. Gray continuous lines represent the true observations.

Again, we chose to model each series separately, using structures analogous to those adopted in the beta case and assuming different discount factors for the cases that include dynamics for the latent variables. According to the model comparison criteria used in this article, the best fitted normal model was the one in which we assumed a second-order VAR model for the observational means and a first-order model for the precisions,
using a discount factor of 0.90 to specify the error evolution covariance matrices of the latent variables associated with the precision structure.

To compare the performance of the best beta model with the corresponding normal one (both with the Var(2)Pol(1) structure), we recalculated the normal model fit measures, correcting each measure through the Jacobian of the transformation, in order to obtain fit measures on the same scale. The results for the different models can be seen in Table 4. It is possible to see that all the criteria that take into account the one-step-ahead predictive distribution estimates of each of the series indicate a better performance of the beta model. Indeed, for the three considered series, the beta model had lower MSE and higher LL and LPD in all cases, giving evidence that modelling the data in their original scale has advantages regarding the predictive ability of the model.

Table 4: Fit measures for the best fitted beta and normal models (both Var(2)Pol(1)); the normal model measures are corrected to the original scale through the Jacobian of the transformation.

                   Var(2)Pol(1) beta model     Var(2)Pol(1) normal model
                   MSE    LL       LPD         MSE      LL        LPD
consumption        ·      −·.776   725.·       ·.061    472.727   474.·
investment         ·      −·.774   698.·       ·.112    591.338   374.·
gov. expenditure   ·      −·.035   862.·       ·.101    591.906   440.·
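The Jacobian correction mentioned above follows the change-of-variables identity log p_Y(y) = log p_Z(z(y)) + log |dz/dy|. For a single log-ratio z = log(y/e), with e treated as known, |dz/dy| = 1/y. A sketch under these assumptions (the functions and arguments are illustrative, not the paper's implementation):

```python
import math

def normal_logpdf(z, m, s):
    """Log-density of N(m, s^2) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi * s * s) - (z - m) ** 2 / (2.0 * s * s)

def logdens_original_scale(y, e, m, s):
    """Log density of y on the original scale when the model was fitted to
    the transformed variable z = log(y / e), assumed N(m, s^2), with e known.
    Change of variables: p_Y(y) = p_Z(log(y / e)) * |dz/dy| = p_Z(.) / y."""
    z = math.log(y / e)
    return normal_logpdf(z, m, s) - math.log(y)
```

Summing the -log(y) term over the forecast instants gives the additive log-Jacobian correction applied to the normal model's LL and LPD, while the MSE comparison presumably maps the forecasts themselves back to the original scale.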
In this paper we propose a method for estimation and prediction in dynamic models whose observations follow distributions in the two-parameter exponential family. Estimation in the proposed partially specified model class, represented by equations (3) and (4), is based on an extension of the conjugate updating algorithm of West et al. (1985). The main idea of this new method is to explore properties of conjugacy in the exponential family and linear Bayes estimation, allowing quick updating of both the mean and precision model parameters through analytical strategies, avoiding computationally intensive methods such as those based on Monte Carlo estimation.

Our algorithm stands out mainly for two reasons. First, it treats a very general class of models with observations in the exponential family, which allows modelling data in their original scale, as in the GLMs of McCullagh and Nelder (1989). Second, the introduction of a second link function allows the treatment of overdispersion and heteroscedasticity in the data, and allows the precision structure of the model to be treated dynamically, efficiently capturing the data behaviour even through the use of partially specified models.

Simulation studies presented by Souza (2013), assuming different observational models in the two-parameter exponential family, show that the proposed method generates satisfactory results, both in obtaining point and interval estimates for the parameters and in steps-ahead forecasting. The applications to real data presented in Section 4 of this paper also illustrate the good performance of the proposed algorithm and demonstrate the relevance of modelling data in their original scale.

Although the use of the generalized method of moments has been shown to be a good alternative to reduce the dimensionality of the system treated in Section 3.2, we intend to study other alternatives for reducing the system (5).
Also with respect to the generalized method of moments, we intend to study the choice of the weight matrix Ω_k, with the aim of checking whether there is any gain in the quality of the estimates from an iterative choice of Ω_k, as discussed in Newey (1993) and Hamilton (1994). As the main extension of this work, we intend to generalize the conjugate updating algorithm to treat classes of multi-parameter and multivariate models, such as models whose observations follow Dirichlet or multinomial distributions, whose parameters can be explained by different link functions.

References
Banerjee, A. K. and Bhattacharyya, G. K. (1979) Bayesian results for the inverse Gaussian distribution with application. Technometrics, 247-251.

Bernardo, J. M. and Smith, A. F. M. (1994) Bayesian Theory. John Wiley & Sons.

Cepeda, E. C. and Gamerman, D. (2005) Bayesian methodology for modeling parameters in the two-parameter exponential family. Revista Estadística, 93-105.

Cepeda, E. C., Migon, H. S., Achcar, J. A. and Garrido, L. (2011) Generalized linear models with random effects in the two-parameter exponential family. Tech. rep., Universidade Federal do Rio de Janeiro.

Dey, D. K., Gelfand, A. E. and Peng, F. (1997) Overdispersed generalized linear models. Journal of Statistical Planning and Inference, 93-107.

Efron, B. (1986) Double exponential families and their use in generalized linear regression. Journal of the American Statistical Association, 709-721.

Gelfand, A. and Dalal, S. (1990) A note on overdispersed exponential families. Biometrika, 55-64.

Hamilton, J. (1994) Time Series Analysis. Princeton University Press.

Lindsay, B. G. (1986) Exponential family mixture models. The Annals of Statistics, 124-137.

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. Chapman & Hall.

Mills, T. (2010) Forecasting compositional time series. Quality & Quantity, 673-690.

Nelder, J. A. and Lee, Y. (2001) Generalized linear models for the analysis of Taguchi-type experiments. J. Probab. Stat., 207-221.

Nelder, J. A. and Wedderburn, R. W. M. (1972) Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370-384.

Newey, W. (1993) Efficient estimation of models with conditional moment restrictions. Handbook of Statistics, 419-454.

Poole, D. and Raftery, A. E. (2000) Inference for deterministic simulation models: the Bayesian melding approach. Journal of the American Statistical Association, 1244-1255.

da Silva, C. Q., Migon, H. S. and Correia, L. T. (2011) Dynamic Bayesian beta models. Computational Statistics & Data Analysis, 2074-2089.

Souza, M. A. O. (2013) Aproximações analíticas e inferência em modelos na família exponencial biparamétrica. Ph.D. thesis, Universidade Federal do Rio de Janeiro.

R Development Core Team (2011) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.

West, M. and Harrison, J. (1997) Bayesian Forecasting and Dynamic Models. Springer-Verlag.

West, M., Harrison, P. and Migon, H. (1985) Dynamic generalized linear models and Bayesian forecasting (with discussion). Journal of the American Statistical Association, 73-97.

Yin, G. (2009) Bayesian generalized method of moments. Bayesian Analysis, 4.