Bayesian inference for generalized extreme value distribution with Gaussian copula dependence
Bo Ning
Department of Statistics, North Carolina State University
and
Peter Bloomfield
Department of Statistics, North Carolina State University

March 6, 2017
Abstract
Dependent generalized extreme value (dGEV) models have attracted much attention due to the dependency structure that often appears in real datasets. To construct a dGEV model, a natural approach is to assume that some parameters in the model are time-varying. A previous study has shown that a dependent Gumbel process can be naturally incorporated into a GEV model. The model is a nonlinear state space model with a hidden state that follows a Markov process, with its innovation following a Gumbel distribution. Inference may be made for the model using Bayesian methods, sampling the hidden process from a mixture normal distribution used to approximate the Gumbel distribution. Thus the response follows an approximate GEV model. We propose a new model in which each marginal distribution is an exact GEV distribution. We use a variable transformation to combine the marginal CDF of a Gumbel distribution with the standard normal copula. Then our model is a nonlinear state space model in which the hidden state equation is Gaussian. We analyze this model using Bayesian methods, and sample the elements of the state vector using particle Gibbs with ancestor sampling (PGAS). The PGAS algorithm turns out to be very efficient in solving nonlinear state space models. We also show that our model is flexible enough to incorporate seasonality.
Keywords:
Generalized extreme value; nonlinear state space model; particle Gibbs sampler; time series analysis.
Introduction

Generalized extreme value (GEV) models have been used extensively to estimate and predict extreme events in many applications, for instance in the environment, engineering, economics and finance. The theoretical properties of this model have been studied in the literature (Pickands, 1975; Castillo et al., 2004). Both frequentist and Bayesian approaches have been developed to estimate those models (Coles, 2001; Coles and Powell, 1996). Recently, dependent GEV models (for short, dGEV models) have attracted much attention. Those models are useful in applications where extreme events are time-correlated. For example, a higher temperature on one day may be followed by a higher temperature on the next day.

The GEV distribution is

    P(Y ≤ y) = F(y) = exp{ −[1 + ξ(y − µ)/ψ]_+^{−1/ξ} },   (1.1)

where µ, ψ and ξ are the location, scale and shape parameters respectively. To extend the GEV model to a dGEV model, it is common to introduce dependence through some of its parameters. Previous studies have focused on letting µ follow an AR(1) process with the other parameters time-invariant (Huerta and Sansó, 2007), and on letting µ, ψ and ξ be time-dependent and follow ARMA-type processes (Wei and Huerta, 2016). However, since we consider the dependency of extreme events, a more general way to introduce the time-dependent structure is to insert a copula; this idea was originally proposed by Nakajima et al. (2011, 2015). To be explicit, their model is constructed as follows. If we define

    α_t ≡ (1/ξ) log{ [1 + ξ(Y_t − µ)/ψ]_+ },   (1.2)

then α_t follows the Gumbel distribution. Solving (1.2) for Y_t, we find the following expression,

    Y_t = µ + ψ (exp(ξ α_t) − 1)/ξ,   (1.3)

and if α_t is a hidden AR(1) process,

    α_{t+1} = φ α_t + η_t,   η_t ∼ Gumbel,   (1.4)

then the model becomes a nonlinear non-Gaussian state space model.
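The transformation (1.2)-(1.3) can be checked numerically: if α follows a standard Gumbel distribution, then Y = µ + ψ(exp(ξα) − 1)/ξ follows a GEV(µ, ψ, ξ) distribution. The following sketch uses illustrative parameter values, not values from the paper; note that scipy's `genextreme` parameterizes the shape as c = −ξ relative to the convention in (1.1).

```python
# Numerical check of (1.2)-(1.3): a Gumbel sample pushed through (1.3) is GEV.
# Parameter values are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, psi, xi = 0.5, 0.5, 0.2                  # illustrative location, scale, shape

alpha = rng.gumbel(loc=0.0, scale=1.0, size=100_000)   # alpha_t ~ Gumbel
y = mu + psi * (np.exp(xi * alpha) - 1.0) / xi          # equation (1.3)

# scipy's genextreme uses c = -xi relative to the sign convention in (1.1)
ks = stats.kstest(y, stats.genextreme(c=-xi, loc=mu, scale=psi).cdf)
# a small KS statistic indicates the transformed sample matches the GEV law
```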
To estimate the parameters in the model, they first show that the model can be approximated by a linear Gaussian state space model, by finding a linear approximation for the nonlinear observation equations and using mixed normal distributions to approximate the Gumbel errors in the state equation. However, from (1.4) the marginal distribution of α_t does not follow a Gumbel distribution, because the sum of two Gumbel-distributed random variables in general does not follow a Gumbel distribution. We propose a method that allows α_t to follow exactly the Gumbel distribution. The model has Gumbel marginal distributions linked together by a Gaussian copula with AR(1) dependence. By this construction, the marginal distribution of each α_t is exactly Gumbel. The details are shown in Section 2.

Because the number of variables in the dGEV model is large, we use Bayesian analysis to sample draws from the joint distribution. Markov chain Monte Carlo (MCMC) allows us to sample from the conditional distribution of each individual parameter, so we only need to deal with one conditional distribution at a time instead of the joint distribution of all parameters. A drawback of the MCMC algorithm is that mixing can be slow due to the high-dimensional state vector in the model. As a result, many iterations are needed, and methods to reduce the correlation between chains such as marginalization, permutation and trimming (Dyk and Park, 2008; Liu et al., 1994; Liu, 1994) must be considered.

In Bayesian analysis, a common technique is to use Kalman filtering and backward smoothing (KFBS) (West and Harrison, 1997) to sample draws from linear Gaussian state space models. For nonlinear non-Gaussian state space models, KFBS cannot be applied directly.
One way of sampling such models is to find a linear approximation for the nonlinear equations (Shephard and Pitt, 1997); common techniques include using a first-order Taylor series expansion of the observation equations, or a second-order Taylor expansion of the log-likelihood function, and finding a mixed normal approximation for the non-Gaussian errors. The sampling approach is to apply the KFBS algorithm to the approximating linear Gaussian state space model, and use Metropolis-Hastings (M-H) to accept or reject the draws. However, when the model is highly nonlinear and non-Gaussian these approximation techniques may perform very poorly; as a result, the M-H step will reject most of the draws, increasing the computation time and reducing the mixing rate of the chain. There are many extensions of the KFBS for adapting to a nonlinear state space model, including the extended Kalman filter, the robust extended Kalman filter (Einicke and White, 1999) and the use of mixed normal densities to approximate non-Gaussian densities (Niemi and West, 2010).

An alternative way to make inferences in a nonlinear state space model is sequential Monte Carlo (SMC). This approach first generates particles from a chosen proposal density, and then calculates the weights between the proposal density and the true density. The generated particles and their weights give a discrete-distribution approximation to the true density. However, one issue with this method is degeneration of the weights, so a resampling step is needed when calculating the weights. This method is also known as importance sampling. Recently, much effort has been put into incorporating SMC within MCMC; the methods include particle Metropolis-Hastings and the particle Gibbs sampler (Andrieu et al., 2010). The MCMC approach is more convenient when dealing with a model with many variables.
The particle Gibbs sampler is the method we shall use in this paper; it uses a collapsed Gibbs sampler (Liu, 1994), which keeps the joint distribution invariant in MCMC sampling. The essential idea is to combine a conditional SMC (cSMC) step (Liu, 2001) with the Gibbs sampler. The cSMC method keeps one path, known as the ancestor lineage, that survives each resampling step. However, this algorithm has two weaknesses: a poor mixing rate and particle degeneration (Doucet and Johansen, 2011; Chopin and Singh, 2015). An extension to address these weaknesses is to add backward smoothing, known as forward filtering backward smoothing (FFBS) (Whiteley, 2010; Lindsten and Schön, 2013; Kantas et al., 2015). A drawback of the FFBS algorithm is that the backward step increases the computation time. A more recent method, known as particle Gibbs with ancestor sampling (PGAS) (Lindsten et al., 2014), was proposed to address the computation issues of FFBS. The PGAS approach is as follows: in each forward sampling step, instead of fixing the ancestor lineage, the algorithm resamples the ancestor lineage. In this way, some particles with small weights in the ancestor lineage drop out.

In the rest of the paper, we introduce our novel dGEV model, which incorporates the dependence between extreme events through a dependent Gumbel distribution; the dependent Gumbel distribution is constructed with a normal copula, and in the final model the observed variable is sampled from an exact GEV distribution. Our model can be expressed as a nonlinear Gaussian state space model; we use PGAS to sample the hidden process in the nonlinear state space model.
We also show that our model may be extended to incorporate seasonal components, which are needed for some datasets; we name the result a seasonal dGEV model.

In the simulation study, we show that our estimated posterior medians for the parameters are close to the true values, and the true values are contained in the 95% credible intervals. Because the latent variables marginally follow a standard normal distribution, we can use standard normal tails to characterize the extremeness of the values. We fit the model to two real datasets: one is an annual maximum water flow dataset and the other is a weekly minimum return for the S&P 500 index.

The remaining sections are arranged as follows: Section 2 introduces our dGEV model; Section 3 describes the Bayesian computation steps; Section 4 adds seasonality to the dGEV model; Section 5 illustrates the dGEV and seasonal dGEV models using simulated data; Section 6 conducts a real data study for a water flow dataset and a financial dataset; Section 7 concludes.

The dGEV model

Our dGEV model is constructed as follows: suppose that β_t follows the standard normal distribution, so that Φ(β_t) follows the uniform(0, 1) distribution. Let G(α) be the CDF of the Gumbel distribution: G(α) = exp(−exp(−α)). If

    α_t = G^{−1}(Φ(β_t)) = −log(−log Φ(β_t)),

then α_t follows the Gumbel distribution. Now suppose that β_t follows an AR(1) process:

    β_{t+1} = φ β_t + η_t,   where η_t ∼ N(0, 1 − φ²),   (2.1)

for t = 1, …, T − 1, with β_1 ∼ N(0, 1), and let

    Y_t = µ + ψ (exp(ξ α_t) − 1)/ξ + ε_t = µ + ψ ((−log Φ(β_t))^{−ξ} − 1)/ξ + ε_t,   ε_t ∼ N(0, σ²),   (2.2)

for t = 1, …, T. The normal error ε_t in the observation equation is an added term that gives the model a nonlinear state space representation; the error should have a small variance so that it is negligible.

The parameters in our model are µ, ψ, ξ, φ, σ² and the latent variables β_{1:T}.
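Simulating from (2.1)-(2.2) makes the construction concrete: an AR(1) process with unit marginal variance drives a Gumbel process through the copula transform, and each Y_t is marginally GEV up to the small normal error. The parameter values below are illustrative, not the ones used in the paper's simulation study.

```python
# Simulating from the dGEV model (2.1)-(2.2); parameter values are illustrative.
import numpy as np
from scipy.stats import norm

def simulate_dgev(T, mu, psi, xi, phi, sigma, seed=0):
    """Generate latent AR(1) states beta_{1:T} and observations Y_{1:T}."""
    rng = np.random.default_rng(seed)
    beta = np.empty(T)
    beta[0] = rng.normal()                      # beta_1 ~ N(0, 1)
    for t in range(1, T):
        # innovation variance 1 - phi^2 keeps the marginal of beta_t N(0, 1)
        beta[t] = phi * beta[t - 1] + rng.normal(scale=np.sqrt(1 - phi**2))
    # alpha_t = -log(-log Phi(beta_t)) is marginally Gumbel, so each Y_t is
    # (up to the small normal error eps_t) marginally GEV(mu, psi, xi)
    y = (mu + psi * ((-np.log(norm.cdf(beta)))**(-xi) - 1.0) / xi
         + rng.normal(scale=sigma, size=T))
    return beta, y

beta, y = simulate_dgev(T=1000, mu=0.5, psi=0.5, xi=0.2, phi=0.5, sigma=0.05)
```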
In the Bayesian analysis, we adopt priors for these parameters similar to those suggested by Coles and Tawn (1996) and Chavez-Demoulin and Davison (2012):

    µ ∼ N(ν_µ, σ²_µ),   ψ ∼ Gamma(a_ψ, b_ψ),   ξ ∼ N(ν_ξ, σ²_ξ).

The prior for φ is U(−1, 1), to make sure that the AR(1) process in (2.1) is stationary. Last, the prior for σ² is InvGamma(a, b), which we later abbreviate IG(a, b).

Bayesian computation

In this section, we describe an MCMC algorithm for sampling draws of the parameters in the dGEV model. Because of the many parameters, a slow mixing rate will cause the chain to converge very slowly. To ameliorate this problem, we use trimming and blocking techniques; Dyk and Park (2008) have justified these techniques as useful.

In the following, to simplify the notation, let θ be any parameter(s) in the model and −θ be the parameters in the model except θ. The outline of the MCMC algorithm is as follows: we sample ϑ = (µ, ψ, ξ) from its conditional posterior π_{−ϑ}(ϑ | Y), grouping µ, ψ and ξ together and treating them as a block; we then sample σ² and φ from their conditional posteriors π_{−σ²}(σ² | Y) and π_{−φ}(φ | Y) using Gibbs and Metropolis-Hastings steps; last, we sample π_{−β}(β | Y) using particle Gibbs with ancestor sampling (PGAS), which will be introduced later.

Sampling µ, ψ, ξ

It has been suggested to group the parameters µ, ψ, ξ together and sample from their joint distribution to reduce the correlation between MCMC chains (Nakajima et al., 2011). The joint conditional posterior distribution for ϑ is

    π_{−ϑ}(ϑ | Y) ∝ ∏_{t=1}^T f(Y_t | β_t, ϑ, σ²) π(µ) π(ψ) π(ξ).   (3.1)

Although the priors for µ and ψ are conjugate with their likelihoods, the prior for ξ is not. When we group them as a trivariate parameter, the joint likelihood is not conjugate with the priors, so we use a Metropolis-Hastings algorithm and find a proposal density using a second-order Taylor expansion of log π_{−ϑ}(ϑ | Y), as follows.
Let ϑ* ∼ MVN(ν*, Σ*), where

    Σ*^{−1} = −∂² log π_{−ϑ}(ϑ | Y) / ∂ϑ ∂ϑᵀ |_{ϑ = ϑ̂},   ν* = ϑ̂ + Σ* · ∂ log π_{−ϑ}(ϑ | Y) / ∂ϑ |_{ϑ = ϑ̂},

and ϑ̂ is a value that maximizes the posterior π_{−ϑ}(ϑ | Y); this value can be calculated using optim in R. After sampling a draw ϑ* from this proposal density, we use Metropolis-Hastings to accept or reject ϑ* with probability

    α(ϑ, ϑ*) = min{ 1, [π_{−ϑ}(ϑ* | Y) f(ϑ | ν*, Σ*)] / [π_{−ϑ}(ϑ | Y) f(ϑ* | ν*, Σ*)] }.

Sampling σ²

To sample σ², we choose the conjugate inverse gamma prior σ² ∼ IG(a, b). Since ε_t should be small, we choose a to be large and b to be small so that the prior has a small variance. The posterior of σ² is

    π_{−σ²}(σ² | Y) = IG( a + T/2,  b + (1/2) Σ_{t=1}^T ( y_t − µ − (ψ/ξ)((−log Φ(β_t))^{−ξ} − 1) )² ).   (3.2)

Sampling φ

The conditional posterior of φ can be expressed as

    π_{−φ}(φ | Y) ∝ (1 − φ²)^{−(T−1)/2} exp{ −Σ_{t=2}^T (β_t − φ β_{t−1})² / (2(1 − φ²)) } · π(φ),   (3.3)

where π(φ) is the prior of φ. Because this posterior density is not a well-known distribution, similarly to the parameter ϑ we find a proposal density q_{−φ}(φ | Y) = N(ν_φ, σ²_φ) with

    σ_φ^{−2} = −∂² log π_{−φ}(φ | Y) / ∂φ² |_{φ = φ̂},   ν_φ = φ̂ + σ²_φ · ∂ log π_{−φ}(φ | Y) / ∂φ |_{φ = φ̂}.

After we draw φ* ∼ N(ν_φ, σ²_φ), φ* is accepted with probability

    α(φ, φ*) = min{ 1, [π_{−φ}(φ* | Y) q_{−φ}(φ | Y)] / [π_{−φ}(φ | Y) q_{−φ}(φ* | Y)] }.
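The Laplace-type M-H steps above (mode, inverse negative Hessian, accept/reject) can be sketched generically. The `log_post` target below is a smooth stand-in, not the dGEV conditional posterior, and `laplace_proposal` / `mh_step` are hypothetical helper names; the sketch centers the proposal at the mode, where the gradient term in ν* vanishes.

```python
# Sketch of an independence M-H step with a Laplace (mode + Hessian) proposal.
# The target log_post is a smooth placeholder, not the paper's posterior.
import numpy as np
from scipy import optimize, stats

def log_post(theta):
    # placeholder target: a smooth, unimodal log density with mode at 0
    return -0.5 * np.sum(theta**2) - 0.1 * np.sum(theta**4)

def laplace_proposal(log_post, theta0):
    """Gaussian proposal centred at the posterior mode (cf. optim in R)."""
    res = optimize.minimize(lambda th: -log_post(th), theta0)  # BFGS by default
    return res.x, np.atleast_2d(res.hess_inv)                  # mode, approx. Sigma*

def mh_step(theta, log_post, mean, cov, rng):
    """One independence M-H step with proposal N(mean, cov)."""
    prop = stats.multivariate_normal(mean, cov)
    theta_star = prop.rvs(random_state=rng)
    log_ratio = (log_post(theta_star) + prop.logpdf(theta)
                 - log_post(theta) - prop.logpdf(theta_star))
    if np.log(rng.uniform()) < log_ratio:
        return theta_star, True
    return theta, False

rng = np.random.default_rng(1)
theta = np.array([2.0, -1.0])
mean, cov = laplace_proposal(log_post, theta)
theta, accepted = mh_step(theta, log_post, mean, cov, rng)
```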
Sampling β_{1:T}

The posterior distribution of β_{1:T} can be expressed as

    π_{−β}(β | Y) ∝ ∏_{t=1}^T f(Y_t | β_t, ϑ, σ²) ∏_{t=2}^T f(β_t | β_{t−1}, φ) π(β_1),

with

    ∏_{t=1}^T f(Y_t | β_t, ϑ, σ²) = (2πσ²)^{−T/2} exp{ −Σ_{t=1}^T (Y_t − µ − (ψ/ξ)((−log Φ(β_t))^{−ξ} − 1))² / (2σ²) },
    ∏_{t=2}^T f(β_t | β_{t−1}, φ) = (2π(1 − φ²))^{−(T−1)/2} exp{ −Σ_{t=2}^T (β_t − φ β_{t−1})² / (2(1 − φ²)) },
    π(β_1) = (2π)^{−1/2} exp{ −β_1²/2 }.

Due to the highly nonlinear form of the likelihood f(Y_t | β_t, ϑ, σ²), the Kalman filter and backward smoothing algorithm developed for linear state space models cannot be applied directly: the linear approximation from a Taylor expansion performs very poorly, and as a result the acceptance probability is almost 0 for our model.

The particle Gibbs sampler proposed by Andrieu et al. (2010) provides an alternative for sampling the nonlinear state space model. It is based on the conditional sequential Monte Carlo (cSMC) algorithm (Liu, 2001), which is similar to the SMC algorithm except that it assigns an ancestor lineage that is kept out of the resampling stage. The particle Gibbs sampler treats the particles and weights generated by cSMC as a discrete distribution approximating the true density; a draw is randomly sampled from these particles according to their weights. Since the draws are obtained from an approximate distribution and not the real one, a pre-specified ancestor lineage, or trajectory path, is used to guide the draws toward the invariant unconditional distribution. However, this method is known to have degeneracy issues and a poor mixing rate. A finer algorithm is to allow the pre-defined ancestor lineage to update during the forward sampling process, dropping some lineages with degenerate weights. This method is known as particle Gibbs with ancestor sampling (PGAS), proposed by Lindsten et al. (2014).
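To fix ideas before the formal statement below, here is a compact sketch of a single PGAS sweep for a toy linear-Gaussian AR(1) model with a bootstrap proposal q = f(x_t | x_{t−1}); the toy observation density stands in for the dGEV likelihood, and with a bootstrap proposal the weights reduce to the observation density.

```python
# Minimal PGAS sweep (Lindsten et al., 2014) for a toy AR(1)-plus-noise model.
import numpy as np

def pgas_sweep(y, x_ref, phi, sig_x, sig_y, N, rng):
    """One PGAS sweep: returns a new trajectory given the reference x_ref."""
    T = len(y)
    X = np.empty((N, T))
    anc = np.empty((N, T), dtype=int)
    # t = 0: propose N-1 particles; slot N holds the reference trajectory
    X[:-1, 0] = rng.normal(0.0, sig_x / np.sqrt(1 - phi**2), N - 1)
    X[-1, 0] = x_ref[0]
    logw = -0.5 * (y[0] - X[:, 0])**2 / sig_y**2
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        # resample ancestors for particles 1..N-1
        anc[:-1, t] = rng.choice(N, size=N - 1, p=w)
        # ancestor sampling for the reference: P(a^N = l) ∝ w^l f(x'_t | x^l_{t-1})
        logv = np.log(w) - 0.5 * (x_ref[t] - phi * X[:, t - 1])**2 / sig_x**2
        v = np.exp(logv - logv.max()); v /= v.sum()
        anc[-1, t] = rng.choice(N, p=v)
        # propagate via the bootstrap proposal; reference stays in slot N
        X[:-1, t] = phi * X[anc[:-1, t], t - 1] + rng.normal(0.0, sig_x, N - 1)
        X[-1, t] = x_ref[t]
        # reconnect trajectories and reweight by the observation density
        X[:, :t] = X[anc[:, t], :t]
        logw = -0.5 * (y[t] - X[:, t])**2 / sig_y**2
    w = np.exp(logw - logw.max()); w /= w.sum()
    return X[rng.choice(N, p=w)]

rng = np.random.default_rng(2)
T, phi, sig_x, sig_y, N = 50, 0.7, 1.0, 0.5, 100
y = rng.normal(size=T)          # toy observations
x = np.zeros(T)                 # initial reference trajectory
for _ in range(10):             # a few Gibbs iterations
    x = pgas_sweep(y, x, phi, sig_x, sig_y, N, rng)
```

The ancestor-sampling line is what distinguishes PGAS from plain particle Gibbs: the reference particle's history is redrawn rather than fixed, which is what improves mixing.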
The method uses a trimming technique (Dyk and Park, 2008) to improve the mixing rate of a Gibbs sampler; it also ameliorates the degenerate-weight issue.

In the remainder of this section, we introduce this method. For convenience, we summarize the notation to be used later: let q_{−β}(β_t | Y_t) be a proposal density and β_t^1, …, β_t^N be N particles drawn from the proposal density; k is an index in {1, …, N} and k = (1, …, N); let β^k = (β_1^k, …, β_T^k) stand for the samples generated from the same trajectory path k; let β^k_{−t} = (β_1^k, …, β_{t−1}^k, β_{t+1}^k, …, β_T^k), and −k = (1, …, k−1, k+1, …, N); let β'_t be the previous draw, or the starting value, for β_t and β*_t be the current draw; let w_t(β_t^k) be the unnormalized weight for the kth particle of β_t sampled from q_{−β}(β_t | Y_t, β_{t−1}), and W_t(β_t^k) be its normalized weight.

We consider two methods for choosing a proposal density. The first method is to use a Taylor expansion of the mean of the observation equation. For simplicity we write (2.2) as

    Y_t = f_ϑ(β_t) + ε_t,
    β_t = g_φ(β_{t−1}) + η_t,

with f_ϑ(β_t) = µ + ψ((−log Φ(β_t))^{−ξ} − 1)/ξ and g_φ(β_{t−1}) = φ β_{t−1}; then we approximate f_ϑ(β_t) by

    f_ϑ(β_t) ≈ f_ϑ(g_φ(β^k_{t−1})) + f'_ϑ(g_φ(β^k_{t−1})) (β_t − g_φ(β^k_{t−1})) / c.

Because in a nonlinear equation the second term of the Taylor expansion can be relatively large, the constant c is used to control the approximating density; we adjust the value of c to keep the weights from degenerating.
The proposal density can then be taken to be Gaussian, q_{−β}(β_t | Y) = N(ν_{β_t}, σ²_{β_t}), with

    σ_{β_t}^{−2} = 1/(1 − φ²) + [f'_ϑ(g_φ(β^k_{t−1}))]² / σ²,
    ν_{β_t} = σ²_{β_t} [ g_φ(β^k_{t−1}) / (1 − φ²) + f'_ϑ(g_φ(β^k_{t−1})) / σ² · ( Y_t − f_ϑ(g_φ(β^k_{t−1})) + f'_ϑ(g_φ(β^k_{t−1})) g_φ(β^k_{t−1}) ) ].

The second method is to ignore ε_t in (2.2) and write β_t as a function of Y_t:

    β_t = Φ^{−1}( exp( −[1 + ξ(Y_t − µ)/ψ]_+^{−1/ξ} ) ),

where Φ^{−1} denotes the inverse standard normal distribution function. We then choose the proposal density to be a t-distribution with mean β_t and a small number of degrees of freedom, namely 5. This method does not apply to general state space models, but can perform very well in ours, because ε_t in our model is small.

After specifying the proposal density, the PGAS algorithm is as follows.

• At the current iteration i, we are given the reference trajectory β'_{1:T}.
• For t = 1:
  – Draw β_1^{1:N−1} ∼ q_{−β}(β_1 | Y_1).
  – Set β_1^N = β'_1.
  – Calculate the weights for l = 1, …, N as
      w_1^l := f(Y_1 | β_1^l, ϑ, σ²) π(β_1^l) / q_{−β}(β_1^l | Y_1),   W_1^l = w_1^l / Σ_{j=1}^N w_1^j.
• For t = 2, …, T:
  – Resample the trajectories β^k_{1:t−1} according to the weights W^k_{t−1}, for k = 1, …, N−1; denote the resampled trajectories by β̃^k_{1:t−1}.
  – Sample β̃^N_{1:t−1} from the trajectories β^{1:N}_{1:t−1} with the associated probabilities
      w^l_{t−1} f(β'_t | β^l_{t−1}, φ) / Σ_{j=1}^N w^j_{t−1} f(β'_t | β^j_{t−1}, φ).
  – Draw β_t^{1:N−1} ∼ q_{−β}(β_t | Y_t).
  – Set β_t^N = β'_t.
  – Set β^l_{1:t} := (β̃^l_{1:t−1}, β_t^l) for l = 1, …, N.
  – Calculate the weights for l = 1, …, N as
      w_t^l = f(Y_t | β_t^l, ϑ, σ²) f(β_t^l | β̃^l_{t−1}, φ) / q_{−β}(β_t^l | Y_t),   W_t^l = w_t^l / Σ_{j=1}^N w_t^j.
• Draw k from {1, …, N} with probabilities W_T^1, …, W_T^N.
• Set β_{1:T} = β^k_{1:T}.
• Set β'_{1:T} = β_{1:T} and go to the next iteration.

From this algorithm we obtain draws for β_{1:T}. In summary, the MCMC algorithm is conducted as follows:
1. Initialize all the parameter values ϑ, β, φ, σ².
2. Sample ϑ = (µ, ψ, ξ) from (3.1), using the M-H algorithm.
3. Sample σ² from (3.2).
4. Sample φ from (3.3), using the M-H algorithm.
5. Sample β_{1:T} using PGAS.
6. Repeat steps 2-5 until sufficient draws have been obtained.

Seasonal dGEV model
Seasonality can easily be incorporated into our model by adding sinusoids to any of the location, scale or shape parameters. This method is convenient since it is flexible in the choice of the number of sinusoids and it does not greatly increase the difficulty of making inferences about the model.

We start by adding two sinusoids to the location parameter; the model is

    Y_t = µ + a_1 cos(ωt) + a_2 sin(ωt) + ψ ((−log Φ(β_t))^{−ξ} − 1)/ξ + ε_t,   ε_t ∼ N(0, σ²),
    β_{t+1} = φ β_t + η_t,   η_t ∼ N(0, 1 − φ²),   (4.1)

where ω = 2πf is the angular frequency and f is the number of cycles that occur per unit of time. For a given dataset, f is typically known; for example, it has value 1/365.25 for annual variability in a daily dataset, or 1/4 for annual variability in a quarterly dataset. In equation (4.1), t = 1, …, T, a_1 and a_2 are the coefficients of the two components of the sinusoid, and we let a = (a_1, a_2).

To estimate the parameters in (4.1), we use priors similar to those described in Section 3. For the added parameters in the seasonal component, we treat ω as a known parameter. Since a has a multivariate normal likelihood in the equation, we choose the prior for each element a_i to be N(0, σ²_{a_i}). The posterior for a is then

    π_{−a}(a | Y) = MVN(ν_a, Σ_a)   (4.2)

with

    Σ_a^{−1} = σ^{−2} ( Σ_{t=1}^T p_t p_tᵀ + σ² Ω^{−1} ),   ν_a = Σ_a ( Σ_{t=1}^T p_t Ỹ_t ) / σ²,

where p_t = (cos(ωt), sin(ωt))ᵀ, Ω = diag(σ²_{a_1}, σ²_{a_2}), and Ỹ_t = Y_t − µ − ψ((−log Φ(β_t))^{−ξ} − 1)/ξ. To conduct an MCMC algorithm for this model, we can adapt the algorithm described in Section 3.5, adding an extra step to sample a.

Sometimes a dataset may show seasonality in the scale parameter; for example, in a weather dataset the variation in winter may be larger than in summer.
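As a concrete sketch of the conjugate Gibbs update (4.2): given the current µ, ψ, ξ and β_{1:T}, the partial residuals Ỹ_t are linear in a with Gaussian noise, so the full conditional is multivariate normal. The function name, inputs and parameter values below are illustrative assumptions, not the paper's code.

```python
# Sketch of the conjugate update (4.2) for the seasonal coefficients a = (a1, a2).
# All inputs and values below are illustrative.
import numpy as np
from scipy.stats import norm

def sample_seasonal_coefs(y, beta, mu, psi, xi, sigma2, omega, prior_var, rng):
    T = len(y)
    t = np.arange(1, T + 1)
    P = np.column_stack([np.cos(omega * t), np.sin(omega * t)])   # rows are p_t
    # partial residual Y~_t with the GEV mean term removed
    y_tilde = y - mu - psi * ((-np.log(norm.cdf(beta)))**(-xi) - 1.0) / xi
    prec = P.T @ P / sigma2 + np.diag(1.0 / prior_var)            # Sigma_a^{-1}
    cov = np.linalg.inv(prec)
    mean = cov @ (P.T @ y_tilde) / sigma2                         # nu_a
    return rng.multivariate_normal(mean, cov)

# synthetic check: recover known coefficients from simulated data
rng = np.random.default_rng(3)
T, omega = 500, 2 * np.pi / 50
tgrid = np.arange(1, T + 1)
beta = rng.normal(size=T)
mu, psi, xi, sigma = 0.5, 0.5, 0.2, 0.05
a_true = np.array([1.0, 2.0])
y = (mu + psi * ((-np.log(norm.cdf(beta)))**(-xi) - 1.0) / xi
     + a_true[0] * np.cos(omega * tgrid) + a_true[1] * np.sin(omega * tgrid)
     + rng.normal(0.0, sigma, T))
draw = sample_seasonal_coefs(y, beta, mu, psi, xi, sigma**2, omega,
                             np.array([10.0, 10.0]), rng)
```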
Our model has the flexibility to deal with such a dataset by adding two similar sinusoids to ψ, i.e. ψ + a_1 cos(ωt) + a_2 sin(ωt). It is possible to add sinusoids to the ξ parameter, or to add more sinusoidal components to each parameter; however, more parameters in the model will reduce the mixing rate of the MCMC chains; as we will show in the simulation study, the dGEV model already exhibits large autocorrelation between samples in the chain. For simplicity, we add seasonality only to the location parameter. In what follows, we will refer to this model as the seasonal dGEV model; however, it should be kept in mind that seasonality could be added to any of the parameters.

Simulation study

In this section, we illustrate the dGEV model and the seasonal dGEV model (4.1) using simulated data. To generate datasets, we start by generating the time-varying parameters β_{1:T}, and then use β_{1:T} to generate the observations Y_{1:T}.

To be more specific, for the dGEV model we generate 1000 observations. The choice of the sample size is arbitrary; however, more observations are required for a model with more parameters, for example when we add seasonality to both the location and scale parameters. In the simulations, we choose φ = 0. , µ = 0. , ψ = 0. , ξ = 0. , and σ = 0. . We first simulate β_1 ∼ N(0, 1); then β_2, …, β_T can be generated according to the AR(1) process in (2.1). Based on the generated values of β_{1:T}, we then generate the dataset Y_{1:T} from (2.2). For the seasonal dGEV model (4.1), we choose a_1 = 1, a_2 = 2, and supposing an annual cycle, f = 1/365.25 and ω = 2π/365.25.

After generating the observations, we need to choose priors to conduct the Bayesian analysis. The priors for the parameters are: µ ∼ N(0, ), ψ ∼ Gamma(2, ), ξ ∼ N(0, ), φ ∼ uniform(−1, 1), and σ² ∼ IG(1, ). We run the MCMC algorithm for a total of 20,000 iterations, with the first 5,000 as burn-in.
In the PGAS step, we need to choose the number of particles; here we choose 1000. Lindsten et al. (2014) show that with this choice the PGAS algorithm has an update frequency very close to 0.999, the ideal rate.

Figure 5.1: Plot of MCMC draws (1st row), autocorrelation function (ACF) (2nd row) and histogram with density (3rd row) for µ, ψ, ξ, φ, σ². The red lines indicate the true values.

We then fit the model to the simulated datasets. The first row of Figure 5.1 plots the MCMC draws for all parameters except the β's for the dGEV model. The draws converge quickly for each parameter; the red lines that indicate the true values are all contained in the draws. Table 5.1 gives summary statistics for the posterior medians and credible intervals. The results show that the true values lie within the 95% credible intervals, and the posterior medians are close to their true values, except for the parameter ξ. The issue with ξ, the shape parameter of the GEV distribution, is that the proposal distribution is a less accurate approximation to the true posterior distribution than in the case of the location and scale parameters; thus its posterior median has a larger bias compared with the posterior medians of µ and ψ.

Table 5.1 also provides the inefficiency factor for each parameter; this statistic is a diagnostic of how well the MCMC chains mixed. It is calculated as 1 + 2 Σ_{s=1}^M (1 − s/M) ρ_s, as given by Chib (2001), where ρ_s is the estimated autocorrelation at lag s and M is the batch size, which we take as 500. An inefficiency factor of m means that the effective number of draws is the total number of iterations, after deleting the burn-in, divided by m. The results in the table show that these inefficiency factors are large, which corresponds to the slow decay in the ACF plots in Figure 5.1.

Table 5.1: Posterior medians and 95% credible intervals for parameters µ, ψ, ξ, φ, σ².
Parameter   True value   Posterior median   95% Credible interval   Inefficiency factor
µ           …            …                  …                       …
ψ           …            …                  …                       …
ξ           …            …                  …                       …
φ           …            …                  …                       …
σ²          …            …                  …                       …

Figure 5.2: Plot of the posterior medians (blue line) and 95% credible intervals (grey dashed lines) for β_1, …, β_T; the yellow dashed line indicates the simulated values of the β's; the four red dashed lines, from top to bottom, represent the 0.995, 0.975, 0.025 and 0.005 quantiles of the standard normal distribution.

Figure 5.2 plots the posterior medians and 95% credible intervals for β_1, …, β_T. Recall that the time-varying parameters are in a standard normal copula; thus each β_t has a marginal standard normal distribution. Examining the posterior medians of the β's shows that the majority of them fall in the 95% standard normal interval. The credible intervals are very tight around each posterior median, as a result of the small variance of the measurement error ε_t.

Figure 5.3 shows the simulation results for selected parameter draws from the seasonal dGEV model.

Figure 5.3: Plot of MCMC draws (1st row), autocorrelation function (ACF) (2nd row) and histogram with density (3rd row) for µ, ψ, ξ, φ, a_1, a_2. The red lines indicate the true values.

The estimated posterior medians and 95% credible intervals of these parameters are summarized in Table 5.2. The 95% credible intervals all cover the true values, and the posterior medians are close to their true values. As in the non-seasonal model, ξ has a larger bias compared with µ and ψ. We also found the inefficiency factors for the parameters in the seasonal dGEV model to be on average higher than in the non-seasonal model; this is because having more parameters in the model increases the correlations between chains.
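The inefficiency factor defined above (a Bartlett-tapered sum of autocorrelations) is straightforward to compute from a chain. The sketch below applies it to a synthetic AR(1) trace standing in for an MCMC chain; for an AR(1) process with coefficient φ the theoretical value is approximately (1 + φ)/(1 − φ).

```python
# Inefficiency factor 1 + 2 * sum_{s=1}^M (1 - s/M) * rho_s (Chib, 2001),
# where rho_s is the sample autocorrelation at lag s and M is the batch size.
import numpy as np

def inefficiency_factor(chain, M=500):
    x = chain - chain.mean()
    var = np.mean(x**2)
    rho = np.array([np.mean(x[:-s] * x[s:]) / var for s in range(1, M + 1)])
    weights = 1.0 - np.arange(1, M + 1) / M        # Bartlett taper
    return 1.0 + 2.0 * np.sum(weights * rho)

# synthetic AR(1) "chain" with phi = 0.9; theory gives roughly (1+0.9)/(1-0.9) = 19
rng = np.random.default_rng(4)
phi = 0.9
chain = np.empty(20_000)
chain[0] = 0.0
for i in range(1, len(chain)):
    chain[i] = phi * chain[i - 1] + rng.normal()
ineff = inefficiency_factor(chain)
```

The effective sample size is then the number of post-burn-in draws divided by `ineff`, matching the interpretation given in the text.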
Figure 5.4 gives a plot of posterior medians andtheir credible intervals for β , . . . , β . The figures show that the majority of draws fall in the 95%region of the standard normal distribution. Also we plot the simulated true data of β , . . . , β in Figure 5.2; all the true values are covered by their estimated 95% credible intervals.16 able 5.2: Posterior median and 95% credit intervals for parameters µ , ψ , ξ , φ , a , a , σ . Parameter True value Posterior median 95% Credible interval Inefficiency factor µ ψ ξ − . , 0.08136] 92.22 φ a a σ Figure 5.4:
Plot of the posterior medians (blue line) and 95% credible intervals (grey dashed lines) for β_1, …, β_T; the yellow dashed line indicates the simulated values of the β's; the four red dashed lines, from top to bottom, represent the 0.995, 0.975, 0.025 and 0.005 quantiles of the standard normal distribution.

Real data study
We conduct a study of two real datasets in this section. The first dataset is an annual maximum water flow dataset and the second is a minimum log-return dataset for the S&P 500 stock index. We fit the dGEV model to both datasets, and we fit the seasonal dGEV model to the S&P 500 data.
Water flow data

The water flow dataset was collected from the French Broad River at Asheville, North Carolina. The dataset contains annual maximum water flow levels from 1941 to 2009. A plot of this dataset is shown in Figure 6.1; the plot shows two spikes, in the years 1964 and 2004, which may be considered unusual years. Our goal is to investigate how extreme those two values are by fitting the dGEV model.

We run the chain for 20,000 draws and treat the first 5,000 as burn-in; the number of particles is chosen to be 1000. Before running the analysis, we standardize the dataset to avoid computational problems. The plots of the draws and the summary statistics are presented in Figure 6.2 and Table 6.1. The ACF plots and inefficiency factors for each parameter show that the chain mixed well.

From Table 6.1, the dependence parameter φ has posterior median 0.021, and its 95% credible interval covers 0; this result suggests that the dependency between extreme values is low or possibly nonexistent.

In Figure 6.3, we plot the posterior medians of the β_t's. Since each β_t has a standard normal marginal distribution, it is convenient to compare these values with standard normal quantiles. If a value is larger than 2.576 in absolute value, then it is beyond the 99% interval of the normal distribution, which may seem unusual. From the plot, both of the two events we are interested in, in 1964 and 2004, lie on the boundary of the 95% interval, and their credible intervals are below
Table 6.1: Posterior medians, 95% credible intervals and inefficiency factors for parameters µ, ψ, ξ, φ, σ².
Parameter   Posterior median   95% Credible interval   Inefficiency factor
µ           …                  …                       …
ψ           …                  …                       …
ξ           …                  [−0. , 0.664]           223.22
φ           0.021              [−0. , 0.523]           74.16
σ²          …                  …                       …

Figure 6.1: Annual maximum water flow of the French Broad River at Asheville, North Carolina.
Figure 6.2:
Plot of MCMC draws (1st row), autocorrelation functions (ACF) (2nd row) and histograms along with densities (yellow lines) (3rd row) for parameters µ, ψ, ξ, φ, σ² in the dGEV model for the water flow dataset.

Figure 6.3: Plot of the posterior medians (blue line) and their 95% credible intervals (grey dashed lines) for the β_t's from fitting the dGEV model to the water flow dataset. The four red dashed lines, from top to bottom, represent the 0.995, 0.975, 0.025 and 0.005 quantiles of the standard normal distribution.
the 99% quantile, which suggests that the two events are not as extreme as they appear from the data plot.
S&P 500 data

The second dataset consists of S&P 500 minimum weekly log-returns, collected from September 25, 2006 to September 12, 2016. A plot of the data is shown in Figure 6.4. The data are plotted on a negated scale, so the largest value in the plot corresponds to the minimum return. Our interest is in finding the most unusual events in the dataset, especially in the period from 2008 to 2010. A time series can also show seasonality; although seasonality is not obviously present in these data, we are interested in fitting the seasonal model to find out whether seasonal effects exist. We fit the dataset with both the dGEV model and the seasonal dGEV model; the plots of the draws are shown in Figures 6.5 and 6.6. The posterior medians with their 95% credible intervals for the parameters in the models are summarized in Table 6.2; the inefficiency factors for those parameters are also provided. In the MCMC algorithm, we run 20,000 iterations and treat the first 5,000 draws as burn-in; we choose the number of particles to be 1000 for both models. For the seasonal dGEV model, we choose f = 7/365.25.

Figure 6.4: Weekly minimum S&P 500 log-return dataset; the data are negated, so the largest values correspond to the minimum returns.

From the results shown in Table 6.2, the estimates of the location, scale and shape parameters for the dGEV and seasonal dGEV models are close; the inefficiency factors for the seasonal dGEV model are larger than for the dGEV model. The estimates of the seasonality parameters a_1, a_2 are very close to 0, and their 95% credible intervals contain 0. This suggests that seasonality may not exist for this dataset.

Figure 6.7 gives plots of the estimated posterior medians of the β_t's and their 95% credible intervals from fitting both models. The two models give very similar estimates for the β_t's. Both plots indicate that the period during the 2008 crisis is much more unusual than other periods.
The highest point has its posterior median close to the 99% quantile of the standard normal distribution, and its credible interval covers the region exceeding the 99% quantile.

Figure 6.5: Plot of MCMC draws (1st row), autocorrelation functions (ACF) (2nd row), and histograms along with estimated densities (yellow lines) (3rd row) for the parameters µ, ψ, ξ, φ, σ of the dGEV model fitted to the S&P 500 dataset.

Figure 6.6: Plot of MCMC draws (1st row), autocorrelation functions (ACF) (2nd row), and histograms along with estimated densities (yellow lines) (3rd row) for the parameters µ, ψ, ξ, φ, a_1, a_2 of the seasonal dGEV model fitted to the S&P 500 dataset.

Figure 6.7: (a) Posterior medians (blue line) and their 95% credible intervals (grey dashed lines) for the β_t's from fitting the dGEV model to the S&P 500 dataset. (b) The same quantities from fitting the seasonal dGEV model to the S&P 500 dataset. The four red dashed lines, from top to bottom, represent the 0.995, 0.975, 0.025, and 0.005 quantiles of the standard normal distribution.

Table 6.2: Posterior medians with their 95% credible intervals and inefficiency factors for the parameters µ, ψ, ξ, φ, σ of the dGEV model and the parameters µ, ψ, ξ, φ, σ, a_1, a_2 of the seasonal dGEV model.

Parameter estimates for the dGEV model
Parameter   Posterior median   95% Credible interval   Inefficiency factor
µ
ψ
ξ
φ
σ

Parameter estimates for the seasonal dGEV model
Parameter   Posterior median   95% Credible interval   Inefficiency factor
µ
ψ
ξ
φ
a_1   −.   [−., 0.1027]   7.75
a_2   −.   [−., 0.0890]   7.76
σ

In this paper, we propose a novel dependent GEV model. The model can be expressed as a nonlinear Gaussian state space model. Because the observation equation is highly nonlinear, we use PGAS to sample the time-varying parameters and incorporate it into an MCMC algorithm. A simulation study based on this MCMC algorithm shows that the sampled parameter distributions cover the true values and that the posterior medians are close to the truth. The shape parameter ξ turns out to be harder to estimate than the location and scale parameters. We also show that our dGEV model can easily incorporate seasonal components; seasonality can be added to the location, scale, and shape parameters.

We carried out two case studies: one on the annual maximum water flow dataset and another on the weekly minimum S&P 500 returns. In the water flow dataset, the credible interval of the estimated dependence parameter contains 0, which suggests that the correlation between extreme values is low. The two unusual events in 1964 and 2004 do not turn out to be very unusual once the corresponding estimated values of β are plotted. In the S&P 500 dataset, the correlation is very high. Fitting this dataset with the seasonal dGEV model suggests that no seasonality effect exists. The draws of the β_t's show that the extreme values during the 2008 economic crisis are very unusual compared to other values.

Our model turns out to require a large computation time when the sample size is large. However, the sample size can easily reach 10,000 when analyzing daily extreme values of temperature. To analyze such datasets, an improvement of the current algorithm is needed. On the other hand, for datasets with seasonality in the scale or shape parameters, the number of parameters increases, which makes the inefficiency factors larger. Thus a method to redesign the MCMC algorithm is needed as well.
Both of these issues need to be addressed in future work but are beyond the scope of this paper.

References
Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Statist. Soc. B, 72:269–342.

Castillo, E., Hadi, A. S., Balakrishnan, N., and Sarabia, J. M. (2004). Extreme Value and Related Models with Applications in Engineering and Science. John Wiley & Sons.

Chavez-Demoulin, V. and Davison, A. (2012). Modeling time series extremes. REVSTAT – Statistical Journal, 10(1):109–133.

Chib, S. (2001). Markov chain Monte Carlo methods: computation and inference. In Handbook of Econometrics, vol. 5. Elsevier, North-Holland, Amsterdam.

Chopin, N. and Singh, S. S. (2015). On particle Gibbs sampling. Bernoulli, 21(3):1855–1883.

Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London, UK.

Coles, S. G. and Powell, E. A. (1996). Bayesian methods in extreme value modelling: a review and new developments. International Statistical Review / Revue Internationale de Statistique, 64(1):119–136.

Coles, S. G. and Tawn, J. A. (1996). A Bayesian analysis of extreme rainfall data. Journal of the Royal Statistical Society, Series C, 45:329–347.

Doucet, A. and Johansen, A. M. (2011). A tutorial on particle filtering and smoothing: fifteen years later.

van Dyk, D. A. and Park, T. (2008). Partially collapsed Gibbs samplers: theory and methods. Journal of the American Statistical Association, 103(482):790–796.

Einicke, G. A. and White, L. B. (1999). Robust extended Kalman filtering. IEEE Transactions on Signal Processing, 47.

Huerta, G. and Sansó, B. (2007). Time-varying models for extreme values. Environmental and Ecological Statistics, 14:285–299.

Kantas, N., Doucet, A., Singh, S. S., Maciejowski, J., and Chopin, N. (2015). On particle methods for parameter estimation in state-space models. Statistical Science, 30(3):328–351.

Lindsten, F., Jordan, M. I., and Schön, T. B. (2014). Particle Gibbs with ancestor sampling. Journal of Machine Learning Research, 15:2145–2184.

Lindsten, F. and Schön, T. B. (2013). Backward simulation methods for Monte Carlo statistical inference. Foundations and Trends in Machine Learning, 6(1):1–143.

Liu, J. S. (1994). The collapsed Gibbs sampler in Bayesian computation with applications to a gene regulation problem. Journal of the American Statistical Association, 89(427):958–996.

Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York, NY, USA.

Liu, J. S., Wong, W. H., and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika, 81(1):27–40.

Nakajima, J., Kunihama, T., and Omori, Y. (2015). Bayesian modeling of dynamic extreme values: extension of generalized extreme value distributions with latent stochastic processes. Journal of Applied Statistics, 0(0).

Nakajima, J., Kunihama, T., Omori, Y., and Frühwirth-Schnatter, S. (2011). Generalized extreme value distribution with time-dependence using the AR and MA models in state space form. Computational Statistics and Data Analysis, 56(11):3241–3259.

Niemi, J. and West, M. (2010). Adaptive mixture modeling Metropolis methods for Bayesian analysis of nonlinear state-space models. Journal of Computational and Graphical Statistics, 19(2):260–280.

Pickands, III, J. (1975). Statistical inference using extreme order statistics. The Annals of Statistics, 3(1):119–131.

Shephard, N. and Pitt, M. K. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84(3):653–667.

Wei, Y. and Huerta, G. (2016). Dynamic generalized extreme value modeling via particle filters. Communications in Statistics.

West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models (2nd ed.). Springer-Verlag, New York, NY, USA.

Whiteley, N. (2010). Discussion on the paper "Particle Markov chain Monte Carlo methods".