Non-Gaussian Autoregressive Processes with Tukey g-and-h Transformations
Yuan Yan and Marc G. Genton September 3, 2018
Abstract
When performing a time series analysis of continuous data, for example from climate or environmental problems, the assumption that the process is Gaussian is often violated. Therefore, we introduce two non-Gaussian autoregressive time series models that are able to fit skewed and heavy-tailed time series data. Our two models are based on the Tukey g-and-h transformation. We discuss parameter estimation, order selection, and forecasting procedures for our models and examine their performances in a simulation study. We demonstrate the usefulness of our models by applying them to two sets of wind speed data.

Some key words: Autoregressive; Heavy tails; Non-Gaussian; Skewness; Time series; Tukey g-and-h transformation.

Short title: Tukey g-and-h Autoregressive Processes

Statistics Program, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia. E-mail: [email protected], [email protected]. This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No: OSR-2015-CRG4-2640.

Introduction
To study climate change, it is essential to understand the temporal properties of the data of interest. Climate and environmental data, such as precipitation, temperature, thickness of glacial varves, wind speed, and concentration of air pollutants, are continuous in nature. In a time series analysis of continuous random variables, the autoregressive moving-average (ARMA) model, especially the Gaussian one, is popularly adopted because of its simplicity and interpretability. However, data from climate or environmental science are often asymmetric with heavy tails. Therefore, a non-Gaussian time series analysis is needed. Since most climate or environmental data we come across are continuous, in this paper we do not consider time series analysis of categorical or binary data, which are intrinsically non-Gaussian.

In linear models, three approaches are mainly used to deal with non-Gaussian data: transformation of the dependent variable, regression with non-Gaussian error, and generalized linear models (GLM). Similarly, we can also classify many of the existing non-Gaussian time series methods and models into three categories: 1) transformations to ‘Gaussianize’ the data, 2) ARMA models with non-Gaussian noise, and 3) extensions of the GLM to the time series context.

The transformation approach is widely used among practitioners to first Gaussianize data and then examine the latent Gaussian process. Equivalently, it assumes that the observed process Y_t is obtained after a transformation η of the latent Gaussian process Z_t:

$$Y_t = \eta(Z_t). \qquad (1)$$

For example, precipitation data can be approximated by the gamma distribution, and then a square or cube root transformation is usually applied to normalize the data; similarly, wind speed data can be assumed to follow a Weibull distribution, and taking a logarithm alleviates the departure from normality (Haslett and Raftery, 1989).
The logarithm, square and cube root transformations all belong to the Box-Cox power transformation (Box and Cox, 1964), which is a family of transformations with one parameter (λ) that can be applied only to positive values. A parametric form of transformation allows users to find the optimal transformation η from among the members of the Box-Cox transformation family by estimating λ from the data, instead of attempting many different transformations. A large literature exists on the Box-Cox power transformation, as well as its modifications and its applications in linear models and time series analysis. Nelson and Granger (1979) extensively explored the use of Box-Cox transformations in 21 time series from economics. Non-parametric transformations are an appealing alternative to assuming a parametric form of transformation due to their flexibility. Johns et al. (2003) explored non-parametric transformations of both the dependent variable and the predictor variables in regression. Block et al. (1990) used the empirical distribution and the probability integral transform (PIT) to form ARMA models with arbitrary continuous marginal distributions. However, there is a trade-off between flexibility and parsimony.

Instead of using transformations to Gaussianize the data, an ARMA model with non-Gaussian noise may be specified, whether through the marginal distribution of the process or the distribution of the error term. Given the marginal distribution, the distribution of the error term can be found by linking their moment generating functions, namely, the Laplace transforms of the probability density functions. Many models were previously proposed following this direction, for example, ARMA models with exponential marginals (Lawrance and Lewis, 1980) and AR models with marginal gamma distributions (Gaver and Lewis, 1980).
On the other hand, specifying the distribution of the non-Gaussian error term is easier for data simulation and maximum likelihood inference since the conditional likelihood can then be written down in a closed form. ARMA models that are driven by non-Gaussian innovations and their properties were discussed in Davies et al. (1980) and Li and McLeod (1988).

For the third class of existing non-Gaussian time series methods and models, Benjamin et al. (2003) introduced the generalized ARMA model, extending the GLM to a time series setting. Earlier explorations of the GLM in time series can be found in Benjamin et al. (2003) and references therein. Cordeiro and de Andrade (2009) proposed a model combining the GLM idea with transformations, and considered its use in time series.

Apart from the three categories summarized above, there exist many other specialized models for non-Gaussian, non-stationary time series. The most famous ones are the autoregressive conditional heteroskedasticity (ARCH) (Engle, 1982) and the generalized autoregressive conditional heteroskedasticity (GARCH) (Bollerslev, 1986) models. These models have been used canonically for analyzing financial time series that exhibit changes in volatility. Other models include the zeroth-order self-exciting threshold autoregressive (SETAR) model (Tong, 1990), the multipredictor autoregressive time series (MATS) models (Martin, 1992), the Gaussian mixture transition distribution (GMTD) models (Le et al., 1996), the mixture autoregressive models (Wong and Li, 2000), and their extension to the Student t-mixture autoregressive model (Wong et al., 2009).

Considering the trade-offs between model flexibility and parsimony, in this paper we use an alternative parametric family of transformations, the Tukey g-and-h (TGH) transformation, which is more flexible and interpretable than the Box-Cox family and at the same time keeps the model fairly simple compared to the non-parametric approaches.
We build two autoregressive models for non-Gaussian time series via the TGH transformation, following both the data-transformation and the non-Gaussian-error approaches. Our first model uses the TGH transformation for η in (1). Our second model assumes the non-Gaussian error term in the AR process to be a TGH transformation of Gaussian white noise.

The remainder of this paper is organized as follows. In Section 2, we review the TGH transformation, discuss properties of the univariate TGH distribution and its extensions, and introduce our two AR models based on the TGH transformation. In Section 3, we describe the parameter estimation and forecasting algorithms for our models. In Section 4, we demonstrate the estimation and prediction performances of our models through a simulation study. In Section 5, we illustrate the usefulness of our models by applying them to two wind speed datasets. We summarize our findings and prospects for future research in Section 6.

Two Tukey g-and-h Autoregressive Models
The TGH transformation (Tukey, 1977) is defined as a strictly monotonically increasing function of z ∈ R for h ≥ 0 and g ∈ R:

$$\tau_{g,h}(z) = \begin{cases} g^{-1}\{\exp(gz)-1\}\exp(hz^2/2), & g \neq 0,\\ z\exp(hz^2/2), & g = 0. \end{cases} \qquad (2)$$

When the TGH transformation is applied to a univariate standard normal random variable Z ∼ N(0, 1), T = τ_{g,h}(Z) is said to follow the TGH distribution. The support of T is the real line when h ≠ 0 or h = g = 0; for h = 0 and g ≠ 0, it has a lower (upper) bound of −1/g when g > 0 (g < 0). The two parameters g and h respectively control the skewness and the tail heaviness of T. The tails become heavier as h increases, and the q-th moment of T exists only for h < 1/q. The level of skewness increases as |g| increases (the distribution becomes right-skewed when g > 0). The case h < 0 has also been considered, for which the TGH transformation is no longer monotonic and the resulting TGH distribution has lighter tails than the Gaussian one. However, that approach is unconventional, so we only consider the case of h ≥ 0. Also, from a practical point of view, real data are more often heavy-tailed rather than light-tailed. Special cases of the TGH distribution include the shifted log-normal random variable for h = 0 and g > 0, and a Pareto-like distribution for g = 0 and h > 0, which was discussed in detail by Goerg (2015). The q-th moments of the TGH distribution were derived by Martinez and Iglewicz (1984), for h < 1/q:

$$\mathrm{E}(T^q) = \begin{cases} \dfrac{1}{g^q\sqrt{1-qh}} \displaystyle\sum_{i=0}^{q} (-1)^i \binom{q}{i} \exp\left\{\dfrac{(q-i)^2 g^2}{2(1-qh)}\right\}, & g \neq 0,\\[2mm] \dfrac{q!}{2^{q/2}(q/2)!}\,(1-qh)^{-\frac{q+1}{2}}, & g = 0,\ q\ \text{even},\\[2mm] 0, & g = 0,\ q\ \text{odd}. \end{cases} \qquad (3)$$

We illustrate various aspects of the Tukey g-and-h distributions by a visuanimation (Genton et al., 2015) in the supplementary material.

Extensions of the univariate TGH distribution to the multivariate case can be found in Field and Genton (2006) and He and Raghunathan (2012). Xu and Genton (2016) used the TGH transformation to build a new max-stable process for the modeling of spatial extremes. Recently, Xu and Genton (2017) further extended the TGH distribution to the spatial case and defined TGH random fields. Those models were found to be useful in practice and have been applied to air pollution, wind speed, rainfall and economic data. In this paper, we apply the TGH transformation in the time series context and build two models that take advantage of the AR structure.

Our first model transforms a latent Gaussian process similarly to Xu and Genton (2017). Let Z_t ∼ AR(p), t = 1, 2, . . . , be a stationary Gaussian AR process of order p with zero mean and unit variance, and let T_t = τ_{g,h}(Z_t) by applying the standard TGH transformation to Z_t. Our model 1, the Tukey g-and-h transformed autoregressive (TGH-AR(p)-t) process Y_t, has the same form as (1), with η being a generalized version of the TGH transformation:

$$Y_t = X_t^{\mathrm{T}}\beta + \xi + \omega\,\tau_{g,h}(Z_t), \qquad (4)$$

where X_t and β are the covariates and the corresponding coefficients, respectively, and ξ and ω are the location and scale parameters, respectively. In a time series analysis, the covariates are usually functions of t to model the trend or the seasonality, which may also be incorporated through a seasonal integrated AR model.
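To make the transformation and the moment formula (3) concrete, the following sketch (ours, not the authors' code; Python with NumPy assumed) implements τ_{g,h} and checks (3) against Gauss-Hermite quadrature of E{τ_{g,h}(Z)^q} with Z ∼ N(0, 1):

```python
import numpy as np
from math import comb, factorial, sqrt, exp

def tukey_gh(z, g, h):
    """Tukey g-and-h transformation tau_{g,h}(z), assuming h >= 0."""
    z = np.asarray(z, dtype=float)
    if g == 0.0:
        return z * np.exp(h * z**2 / 2.0)
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def tgh_moment(q, g, h):
    """q-th moment of the standard TGH distribution, equation (3); needs h < 1/q."""
    if h >= 1.0 / q:
        raise ValueError("E(T^q) exists only for h < 1/q")
    if g == 0.0:
        if q % 2 == 1:
            return 0.0
        return factorial(q) / (2 ** (q // 2) * factorial(q // 2)) * (1 - q * h) ** (-(q + 1) / 2)
    s = sum((-1) ** i * comb(q, i) * exp((q - i) ** 2 * g**2 / (2 * (1 - q * h)))
            for i in range(q + 1))
    return s / (g**q * sqrt(1 - q * h))

# Probabilists' Gauss-Hermite nodes integrate against exp(-z^2/2);
# dividing by sqrt(2*pi) turns the quadrature into an expectation under N(0,1).
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
checks = []
for g, h, q in [(0.3, 0.1, 1), (0.5, 0.0, 2), (0.0, 0.2, 2)]:
    quad = float(np.sum(weights * tukey_gh(nodes, g, h) ** q) / sqrt(2 * np.pi))
    checks.append((tgh_moment(q, g, h), quad))
```

The quadrature values agree with the closed form (3) to high accuracy, and odd moments vanish when g = 0, reflecting the symmetry of the g = 0 (Pareto-like) case.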
In our first model, unlike the commonly adopted Box-Cox approach, the deterministic part that consists of the location parameter, covariates and their coefficients is outside the transformation. In this way, the trend (also seasonality or periodicity) is attributed directly to the process Y_t rather than to the underlying Gaussian process Z_t. We control the skewness and tail behavior of the stochastic part of the process Y_t by applying the TGH transformation to Z_t with different values of g and h. The moments of Y_t can be computed as the moments of T_t in (3) with modification for the additional location and scale parameters. The autocovariance function C_T(τ) of T_t can be expressed in terms of the autocorrelation function (ACF) ρ_Z(τ) of the underlying Gaussian process Z_t, as derived in Xu and Genton (2017):

$$C_T(\tau) = \frac{\exp\left[\dfrac{g^2\{1+\rho_Z(\tau)\}}{1-h\{1+\rho_Z(\tau)\}}\right] - 2\exp\left[\dfrac{g^2[1-h\{1-\rho_Z(\tau)^2\}]}{2\{(1-h)^2-h^2\rho_Z(\tau)^2\}}\right] + 1}{g^2\sqrt{(1-h)^2-h^2\rho_Z(\tau)^2}} - \{\mathrm{E}(T_t)\}^2.$$

The autocorrelation of Y_t is always weaker than the autocorrelation of Z_t, and in the supplementary material we illustrate this fact via Visuanimation S1.

When p = 1, the TGH-AR(1)-t process is of the form (4), with Z_t = φZ_{t−1} + ε_t, where ε_t is a Gaussian white-noise process with zero mean and constant variance σ_ε² = 1 − φ². There is a constraint that σ_ε² is a function of φ so that Z_t has unit variance. This dependency may seem strange at first, but it is effectively equivalent to standardizing Z_t to a process with unit variance by multiplication by a scaling constant. For example, the usual AR(1) process with a Gaussian error term ε_t of variance σ_ω² will result in a process Z′_t of variance σ_ω²/(1 − φ²), and Z′_t can be standardized to have unit variance by multiplying by the rescaling factor √(1 − φ²)/σ_ω.
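Both points above, the σ_ε² = 1 − φ² standardization and the attenuation of the autocorrelation under the transformation, can be checked numerically. The sketch below (ours, not the authors' code; Python with NumPy assumed) simulates a unit-variance Gaussian AR(1), applies τ_{g,h} pointwise as in model 1, and compares lag-1 autocorrelations:

```python
import numpy as np

def tukey_gh(z, g, h):
    """Tukey g-and-h transformation for g != 0, h >= 0."""
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def simulate_tgh_ar1_t(n, phi, g, h, xi=0.0, omega=1.0, rng=None):
    """Model 1: Y_t = xi + omega * tau_{g,h}(Z_t), Z_t a unit-variance Gaussian AR(1)."""
    rng = np.random.default_rng(rng)
    sigma_eps = np.sqrt(1.0 - phi**2)      # keeps Var(Z_t) = 1
    z = np.empty(n)
    z[0] = rng.standard_normal()           # stationary start: Z_0 ~ N(0, 1)
    for t in range(1, n):
        z[t] = phi * z[t - 1] + sigma_eps * rng.standard_normal()
    return z, xi + omega * tukey_gh(z, g, h)

def lag1_acf(x):
    x = x - x.mean()
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))

z, y = simulate_tgh_ar1_t(100_000, phi=0.8, g=0.3, h=0.1, rng=1)
acf_z, acf_y = lag1_acf(z), lag1_acf(y)
skew_y = float(np.mean(((y - y.mean()) / y.std()) ** 3))
```

With these (hypothetical) parameter values, the latent series has lag-1 autocorrelation near φ = 0.8 while the transformed series is noticeably less correlated, and g > 0 induces positive sample skewness.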
This is equivalent to an AR(1) process with an error term of variance 1 − φ².

To simulate samples from the TGH-AR(p)-t model, we first simulate data from the underlying Gaussian AR(p) process Z_t for t = 1, 2, . . . , n. Then we transform the process by applying the generalized version of the TGH transformation to Z_t at each time point. Figure 1 shows the functional boxplots (Sun and Genton, 2011) of 1000 realizations from the TGH-AR(1)-t model of sample size n = 100 without covariates, with ξ = 0, ω = 1, φ = 0.8 (σ_ε² = 0.36), and six pairs of values for g and h. The six pairs of values for g and h include the Gaussian AR case when g = h = 0. The functional boxplots are labeled with values of the marginal mean (μ, green line), standard deviation (σ), skewness (γ₁) and excess kurtosis (γ₂) for the transformed process Y_t.

Figure 1: Functional boxplots of 1000 realizations of sample size n = 100 from model 1, i.e., TGH-AR(1)-t, without covariates, with ξ = 0, ω = 1, an AR coefficient of φ = 0.8 and six pairs of values for g and h, labeled with values of the mean (μ, green line), standard deviation (σ), skewness (γ₁) and excess kurtosis (γ₂) of the process. [Panel labels: g=0, h=0: μ=0, σ=1, γ₁=0, γ₂=0; g=0.3, h=0: μ=0.153, σ=1.07, γ₁=0.95, γ₂=1.645; g=0.5, h=0: μ=0.266, σ=1.208, γ₁=1.75, γ₂=5.898; g=0, h=0.1: μ=0, σ=1.182, γ₁=0, γ₂=2.508; g=0, h=0.2: μ=0, σ=1.467, γ₁=0, γ₂=33.224; g=0.3, h=0.1: μ=0.18, σ=1.29, γ₁=1.701, γ₂=10.407.]

From the functional boxplots of the sample paths of Y_t, we see clearly that when h = 0, as g increases from 0.3 to 0.5, the process becomes more right-skewed; when g = 0, as h increases from 0.1 to 0.2, extreme values occur more often in the time series; and with g = 0.3 and h = 0.1, we see the combined effects of g and h on the process.

Instead of applying the transformation to the time series itself, in our second model we transform the error term ε_t i.i.d. ∼ N(0,
1) in an AR process to a non-Gaussian error term T_t = τ_{g,h}(ε_t) that follows a standard TGH distribution. Thus, model 2 (the TGH-AR(p)-e model) is defined as:

$$Y_t = X_t^{\mathrm{T}}\beta + \xi + \phi_1\tilde{Y}_{t-1} + \cdots + \phi_p\tilde{Y}_{t-p} + \omega\,\tau_{g,h}(\epsilon_t), \qquad (5)$$

where Ỹ_t = Y_t − X_t^Tβ − ξ is a process with median 0 (however, not a zero-mean process when g ≠ 0); X_t and β are the covariates and corresponding coefficients, respectively; ξ and ω are the location and scale parameters, respectively; and φ = (φ_1, . . . , φ_p)^T are the AR coefficients under constraints such that the AR process is weakly stationary. We express (5) in a moving-average form: Ỹ_t = ψ(B)(ωT_t), where ψ(B) = Σ_{j=0}^∞ ψ_j B^j is a power series in the backshift operator B. Although the marginal distribution of Y_t in the TGH-AR(p)-e model cannot be written down in a closed form, the moments of Ỹ_t can be derived from the moments of the standardized error term T_t:

$$\mathrm{E}(\tilde{Y}_t) = \omega\sum_{j=0}^{\infty}\psi_j\,\mathrm{E}(T_t) = \frac{\omega\,\mathrm{E}(T_t)}{1-\sum_{j=1}^{p}\phi_j}, \qquad \mathrm{V}(\tilde{Y}_t) = \omega^2\sum_{j=0}^{\infty}\psi_j^2\,\mathrm{V}(T_t).$$

Furthermore,

$$\gamma_1(\tilde{Y}_t) = \frac{\sum_{j=0}^{\infty}\psi_j^3}{\big(\sum_{j=0}^{\infty}\psi_j^2\big)^{3/2}}\,\gamma_1(T_t), \qquad \gamma_2(\tilde{Y}_t) = \frac{\sum_{j=0}^{\infty}\psi_j^4}{\big(\sum_{j=0}^{\infty}\psi_j^2\big)^{2}}\,\gamma_2(T_t),$$

as long as the summations converge. These relationships conform with the algebraic relations between the skewness and kurtosis of the process and the error term in ARMA processes derived by Davies et al. (1980). Figure 2 shows the functional boxplots of 1000 realizations from the TGH-AR(1)-e model with a sample size of 100 and the same parameters as in Figure 1. Again, we see that the skewness and tail behavior of the process can be controlled by g and h.
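The TGH-AR(1)-e data-generating mechanism and the mean formula E(Ỹ_t) = ωE(T_t)/(1 − φ) can be illustrated with a short Monte Carlo sketch (ours, not the authors' code; parameter values are hypothetical):

```python
import numpy as np

def tukey_gh(z, g, h):
    """Tukey g-and-h transformation for g != 0, h >= 0."""
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def simulate_tgh_ar1_e(n, phi, g, h, omega=1.0, burn=500, rng=None):
    """Model 2 with p = 1, no covariates: Y~_t = phi*Y~_{t-1} + omega*tau_{g,h}(eps_t)."""
    rng = np.random.default_rng(rng)
    t_noise = tukey_gh(rng.standard_normal(n + burn), g, h)  # TGH-distributed errors
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = phi * y[t - 1] + omega * t_noise[t]
    return y[burn:]                                          # drop burn-in

g, h, phi, omega = 0.3, 0.1, 0.5, 1.0
y = simulate_tgh_ar1_e(400_000, phi, g, h, omega, rng=2)
# E(T) from equation (3) with q = 1
mean_T = (np.exp(g**2 / (2 * (1 - h))) - 1.0) / (g * np.sqrt(1 - h))
mean_Y_theory = omega * mean_T / (1 - phi)
skew_y = float(np.mean(((y - y.mean()) / y.std()) ** 3))
```

The long-run sample mean matches the theoretical value, and the skewed noise (g > 0) carries over to positive skewness of the process, damped by the ψ-weight ratio as in the display above.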
Figure 2: Functional boxplots of 1000 realizations of sample size n = 100 from model 2, i.e., TGH-AR(1)-e, without covariates, with ξ = 0, ω = 1, an AR coefficient of φ = 0.8, and six pairs of values for g and h, labeled with values of the mean (μ, green line), standard deviation (σ), skewness (γ₁) and excess kurtosis (γ₂) of the process. [Panel labels: g=0, h=0: μ=0, σ=1.667, γ₁=0, γ₂=0; g=0.3, h=0: μ=0.767, σ=1.783, γ₁=0.42, γ₂=0.361; g=0.5, h=0: μ=1.331, σ=2.013, γ₁=0.775, γ₂=1.295; g=0, h=0.1: μ=0, σ=1.97, γ₁=0, γ₂=0.551; g=0, h=0.2: μ=0, σ=2.445, γ₁=0, γ₂=7.293; g=0.3, h=0.1: μ=0.901, σ=2.15, γ₁=0.753, γ₂=2.284.]

Comparison of the Two Models

Comparing Figures 1 and 2, we notice that, given the same values for g and h, the skewness and kurtosis of the TGH-AR(1)-t process are larger than those of the TGH-AR(1)-e process, while the variance is smaller. This agrees with the findings in Davies et al. (1980), where the ARMA process has skewness and kurtosis less than those of the error process.

We interpret the difference between the two models from the viewpoint of the data-generating mechanism. In model 1, TGH-AR-t, we assume that there is an underlying latent Gaussian process Z_t and that the observed process Y_t is a realization of a non-linear transformation of Z_t. Goerg (2011) justified this idea using financial data: overreactions to changes in the stock market skew the underlying symmetric process of financial news and result in skewed log-returns of the stock market; see also De Luca et al. (2006). For model 2, the TGH-AR-e model, we do not assume such a latent process; rather, Y_t itself is an AR model with non-Gaussian noise that follows the TGH distribution. The difference between the two models can be seen most obviously by plotting the relationship between Y_t − X_t^Tβ and Y_{t−1} − X_{t−1}^Tβ (Figure 3). This relationship is linear for model 2 and non-linear for model 1.
The exploratory plots shown in Figure 3 for a sample size of n = 500 and different values of g, h and φ should be consulted as a reference for deciding whether model 1 or 2 is more suitable for the time series data to be analyzed. In practice, one should take into account both a reasonable data-generating mechanism for the process and the empirical behavior of Y_t − X_t^Tβ versus Y_{t−1} − X_{t−1}^Tβ to decide which model to use for a specific dataset.

We want to emphasize that our time series models are not simply variations of the spatial case (Xu and Genton, 2017). We utilize the unique properties of the AR structure for discrete-time processes, which the random-field approach does not possess. One main difference between AR time series and random fields is that the correlation structure of the AR process is induced by the AR coefficients, whereas in random fields it is modeled directly. Moreover, the property of conditional independence of an AR process allows us to write down the likelihood explicitly instead of in a matrix form as for random fields. The AR structure also makes it possible for us to build the TGH-AR-e model, which is not possible in the random-field setting.

Figure 3: Illustration of the non-linear relationship between Y_t and Y_{t−1} for the TGH-AR(1)-t model, overlapped with theoretical contours of the joint density (left two columns); illustration of the linear relationship between Y_t and Y_{t−1} for the TGH-AR(1)-e model with skewed and/or heavy-tailed noise (right two columns). Each panel is plotted with one realization of sample size n = 500 from our models without covariates, with ξ = 0, ω = 1 and different values for g, h and φ. [Panels, for each model: g=0.5, h=0 with φ = −0.8 and 0.8; g=0, h=0.2 with φ = −0.5 and 0.8; g=0.3, h=0.1 with φ = −0.8 and 0.5.]

A drawback of the TGH transformation is that its inverse transformation does not have an explicit form (except when either g or h is equal to 0) and, thus, likelihood inference is difficult. Earlier estimation procedures rely on matching sample quantiles or sample moments with their population counterparts to avoid this difficulty. For example, Hoaglin (1985) developed an estimation procedure by matching a sequence of sample quantiles and theoretical quantiles, which we refer to as the letter-value-based method. Recently, Xu and Genton (2015) proposed an efficient parameter estimation algorithm for the independent TGH distribution by using an approximated likelihood. Their algorithm greatly improved the parameter estimation performance compared to the moment- or quantile-based methods without compromising the computational speed.

The parameters involved in our two TGH-AR models are ξ, ω, g, h, β, and φ. Since the maximum likelihood estimator (MLE) is well known to possess many good asymptotic properties, we prefer the likelihood-based estimator over the moment- or quantile-based estimators. The vector of the parameters related to the TGH transformation is denoted by θ₁ = (ξ, ω, g, h, β^T)^T, the autoregressive parameters by φ, and we set θ = (θ₁^T, φ^T)^T.

For model 1, the log-likelihood function given n observations (y_1, x_1), . . . , (y_n, x_n) can be written as:

$$l_n(\theta) = f_{\phi}(z_{1,\theta}) + f_{\phi}(z_{2,\theta}\mid z_{1,\theta}) + \cdots + f_{\phi}(z_{n,\theta}\mid z_{n-1,\theta},\ldots,z_{n-p,\theta}) - n\log\omega - \frac{h}{2}\sum_{t=1}^{n} z_{t,\theta}^2 - \sum_{t=1}^{n}\log\left[\exp(g z_{t,\theta}) + \frac{h}{g}\{\exp(g z_{t,\theta})-1\} z_{t,\theta}\right], \qquad (6)$$

where z_{t,θ} = τ_{g,h}^{−1}{(y_t − ξ − x_t^Tβ)/ω} and f_φ(·|·) is the conditional log-likelihood for the underlying Gaussian AR(p) process Z_t, which is also Gaussian.

Since an explicit form of the likelihood of Y_t is intractable for model 2, we consider the conditional likelihood given the first k ≥ p observations. The conditional log-likelihood of model 2 can be written as:

$$l_n(\theta\mid y_1,\ldots,y_k) = -\frac{h+1}{2}\sum_{t=k+1}^{n}\epsilon_{t,\theta}^2 - \sum_{t=k+1}^{n}\log\left[\exp(g\epsilon_{t,\theta}) + \frac{h}{g}\{\exp(g\epsilon_{t,\theta})-1\}\epsilon_{t,\theta}\right] - (n-k)\log\omega - \frac{n-k}{2}\log(2\pi), \qquad (7)$$

where ε_{t,θ} = τ_{g,h}^{−1}{(ỹ_t − φ_1ỹ_{t−1} − ⋯ − φ_pỹ_{t−p})/ω} and ỹ_t = y_t − ξ − x_t^Tβ.

Although we can in principle find the MLE for θ in our two models by maximizing (6) or (7) with respect to θ, it is computationally expensive to find the inverse τ_{g,h}^{−1}(·) numerically for each data point and each iteration of the optimization since τ_{g,h}^{−1}(·) does not have a closed form. To bypass this computational challenge, we borrow the idea of the approximated likelihood from Xu and Genton (2015) for the univariate TGH distribution. Xu and Genton (2017) extended this idea to the random-field case and proposed an algorithm for iteratively estimating both the parameters related to the TGH transformation and the parameters in the model of the covariance function. In essence, they linearized the inverse transformation function piecewise and maximized the approximated likelihood function with the piecewise-linearized inverse function instead of the exact one.

Equipped with this linearization procedure, we numerically maximize the approximated log-likelihood by using the approximated inverse transformation function τ̃_{g,h}^{−1} instead of τ_{g,h}^{−1} in either function (6) or (7).
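Any evaluation of (6) or (7) requires inverting τ_{g,h} numerically. The sketch below (ours; it uses plain bisection on the monotone transformation rather than the authors' piecewise-linear approximation) inverts τ_{g,h} and evaluates the conditional log-likelihood (7) for an AR(1) version of model 2 without covariates:

```python
import numpy as np

def tukey_gh(z, g, h):
    """Tukey g-and-h transformation for g != 0, h >= 0."""
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def tukey_gh_inv(y, g, h, lo=-30.0, hi=30.0, iters=80):
    """Numerical inverse of the (strictly increasing) TGH transformation by bisection."""
    y = np.asarray(y, dtype=float)
    lo = np.full_like(y, lo)
    hi = np.full_like(y, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_low = tukey_gh(mid, g, h) < y
        lo = np.where(too_low, mid, lo)
        hi = np.where(too_low, hi, mid)
    return 0.5 * (lo + hi)

def loglik_model2_ar1(y, phi, xi, omega, g, h):
    """Conditional log-likelihood (7) with p = k = 1 and no covariates."""
    yt = y - xi                                   # y-tilde
    eps = tukey_gh_inv((yt[1:] - phi * yt[:-1]) / omega, g, h)
    jac = np.log(np.exp(g * eps) + (h / g) * (np.exp(g * eps) - 1.0) * eps)
    n_k = eps.size
    return float(-(h + 1) / 2 * np.sum(eps**2) - np.sum(jac)
                 - n_k * np.log(omega) - n_k / 2 * np.log(2 * np.pi))

# round-trip accuracy of the numerical inverse
zgrid = np.linspace(-3.0, 3.0, 13)
roundtrip_err = float(np.max(np.abs(tukey_gh_inv(tukey_gh(zgrid, 0.3, 0.1), 0.3, 0.1) - zgrid)))
rng = np.random.default_rng(3)
y_demo = np.cumsum(rng.standard_normal(50)) * 0.1  # arbitrary demo series
ll = loglik_model2_ar1(y_demo, 0.5, 0.0, 1.0, 0.3, 0.1)
```

This brute-force inverse is exactly the per-point cost the approximated (piecewise-linearized) likelihood is designed to avoid inside an optimizer; it is nevertheless a convenient reference implementation for checking the approximation.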
Thus, we obtain the maximum approximated likelihood estimator (MALE) of the parameters in our models much faster. We also found that iteratively optimizing $\theta_1$ and $\phi$ is better than optimizing $\theta$ directly; however, it is about 20 times slower on a Lenovo computer with 16 2.50 GHz Intel Xeon(R) CPUs. Hence, we optimize all of the parameters directly in our algorithm.

The order selection for an AR model is usually performed via the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). In a simulation study (all simulations are done on the same computer as mentioned above), we found that the algorithm based on BIC performs much better than the one based on AIC for our models. Therefore, we use the order selection procedure based on BIC with our estimation algorithm. For model 1, $\mathrm{BIC} = -2\, l_n(\hat\theta) + (p+4)\log n$; for model 2, $\mathrm{BIC} = -2\, l_n(\hat\theta\mid y_1,\ldots,y_k) + (p+4)\log(n-k)$, and again we use the approximated likelihood instead of the exact one.

We evaluate the empirical correct order selection rate by BIC with the approximated likelihood for our two models through a simulation study. We generate data from both models with ξ = 0, ω = 1, the values of g and h used in the functional boxplots, sample sizes n = 100 and n = 500, and a true AR order of p = 0, 1, or
2. For p = 1 and p = 2, we use multiple values for the AR coefficient(s), which exhibit different behaviors of the ACF and different intensities of the spectral density. We compute the correct order selection rate from 1000 simulations. Table 1 summarizes the correct order selection rate by BIC for our two models with different sample sizes n and orders p for different values of $\phi$ ($\phi_1$). For p = 0, the rates are 0.959 (n = 100) and 0.988 (n = 500) for the TGH-AR(p)-t model and 0.938 and 0.990 for the TGH-AR(p)-e model.

Table 1: Correct order selection rate for both the TGH-AR(p)-t and the TGH-AR(p)-e models without covariates, with ξ = 0, ω = 1 and the same values of g and h as used in the functional boxplots, for n = 100 and n = 500.

We conclude that the overall correct order selection rate is satisfactory for our order selection procedure based on BIC with the approximated likelihood.

In this section, we derive one-step-ahead point and probabilistic forecasts for our two models. For model 1, $Y_t = X_t^{\mathrm T}\beta + \xi + \omega T_t$; given $p$ observations $y_t, \ldots, y_{t-p+1}$, we derive the conditional distribution
\[
T_{t+1}\mid y_t,\ldots,y_{t-p+1} \sim \tau_{g,h}(\tilde\mu + \tilde\sigma Z), \qquad Z \sim N(0,1),
\]
where $\tilde\mu$ and $\tilde\sigma^2$ are the conditional mean and variance of $Z_{t+1}\mid Z_t,\ldots,Z_{t-p+1}$ for the underlying Gaussian AR($p$) process $Z_t$. Here, $\tilde\mu$ and $\tilde\sigma^2$ are determined by $\phi$ and can be found efficiently by the Durbin-Levinson algorithm. With this conditional distribution, the point predictors minimizing the absolute prediction error (conditional median) and the squared prediction error (conditional mean) are
\[
\hat Y_{t+1} = \xi + X_{t+1}^{\mathrm T}\beta + \omega\,\tau_{g,h}(\tilde\mu)
\]
and
\[
\hat Y_{t+1} = \xi + X_{t+1}^{\mathrm T}\beta + \frac{\omega}{g\sqrt{1-h\tilde\sigma^2}}
\exp\left\{\frac{h\tilde\mu^2}{2(1-h\tilde\sigma^2)}\right\}
\left[\exp\left\{\frac{g^2\tilde\sigma^2 + 2g\tilde\mu}{2(1-h\tilde\sigma^2)}\right\} - 1\right],
\]
respectively.

For model 2, the conditional distribution is
\[
Y_{t+1}\mid y_t,\ldots,y_{t-p+1} \sim X_{t+1}^{\mathrm T}\beta + \xi + \phi_1\tilde y_t + \cdots + \phi_p\tilde y_{t-p+1} + \omega\,\tau_{g,h}(Z), \qquad Z \sim N(0,1).
\]
The point predictors minimizing the absolute loss (conditional median) and the squared loss (conditional mean) are
\[
\hat Y_{t+1} = \xi + X_{t+1}^{\mathrm T}\beta + \phi_1\tilde y_t + \cdots + \phi_p\tilde y_{t-p+1}
\]
and
\[
\hat Y_{t+1} = \xi + X_{t+1}^{\mathrm T}\beta + \phi_1\tilde y_t + \cdots + \phi_p\tilde y_{t-p+1}
+ \frac{\omega}{g\sqrt{1-h}}\left[\exp\left\{\frac{g^2}{2(1-h)}\right\} - 1\right],
\]
respectively. We note that the conditional median has the same form as that of an AR model with Gaussian error. In practice, we need to estimate the parameters first and then make forecasts of future values of the time series based on those estimated parameters. Thus, the difference between the point predictions based on the conditional medians of our model and the Gaussian AR model comes from the difference in their respective estimates of $\beta$, $\xi$ and $\phi$.

Prediction confidence intervals (CI) can be found easily from the conditional distributions of our two models. There are different ways to form a two-sided CI for a given distribution. One popular choice for the $(1-\alpha)100\%$ CI is to exclude probability $\alpha/2$ in each tail, i.e., to take the central $1-\alpha$ probability interval of the distribution, which we refer to as the symmetric-weight CI. Another choice is to find the shortest $1-\alpha$ probability interval, which we refer to as the minimum-length CI. The minimum-length CI coincides with the symmetric-weight CI for symmetric distributions. For model 1, the lower and upper bounds of the 95% prediction CI can be found by transforming the lower and upper bounds of a 95% probability interval of the underlying normal distribution with mean $\tilde\mu$ and variance $\tilde\sigma^2$. The symmetric-weight prediction interval is $[X_{t+1}^{\mathrm T}\beta + \xi + \omega\tau_{g,h}(\tilde\mu - z_{1-\alpha/2}\tilde\sigma),\; X_{t+1}^{\mathrm T}\beta + \xi + \omega\tau_{g,h}(\tilde\mu + z_{1-\alpha/2}\tilde\sigma)]$, and the minimum-length prediction interval is $[X_{t+1}^{\mathrm T}\beta + \xi + \omega\tau_{g,h}(\tilde\mu - z_{1-\gamma_{\mathrm{opt}}}\tilde\sigma),\; X_{t+1}^{\mathrm T}\beta + \xi + \omega\tau_{g,h}(\tilde\mu + z_{1-\alpha+\gamma_{\mathrm{opt}}}\tilde\sigma)]$, where $\gamma_{\mathrm{opt}}$ needs to be optimized numerically for different values of $g$ and $h$, given $\tilde\mu$ and $\tilde\sigma$.
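The one-step-ahead quantities above can be sketched numerically as follows. This is our illustration, not the authors' code; it assumes $g > 0$, $h \geq 0$ with $h\tilde\sigma^2 < 1$, and omits the location terms $\xi + X_{t+1}^{\mathrm T}\beta$. The closed-form conditional mean is checked against a Monte Carlo average, and $\gamma_{\mathrm{opt}}$ for the minimum-length interval is found by a bounded one-dimensional search.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def tau_gh(z, g, h):
    """Tukey g-and-h transform, tau(z) = g^{-1}{exp(g z) - 1} exp(h z^2 / 2)."""
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def cond_mean(mu, sigma, g, h):
    """Closed form for E[tau_gh(mu + sigma*Z)], Z ~ N(0,1); needs h*sigma^2 < 1."""
    c = 1.0 - h * sigma**2
    return (np.exp(h * mu**2 / (2.0 * c))
            * (np.exp((g**2 * sigma**2 + 2.0 * g * mu) / (2.0 * c)) - 1.0)
            / (g * np.sqrt(c)))

def min_length_interval(mu, sigma, g, h, alpha=0.05):
    """Shortest 1 - alpha interval [tau(mu - z_{1-gamma} sigma),
    tau(mu + z_{1-alpha+gamma} sigma)]; gamma_opt is found numerically."""
    def length(gamma):
        return (tau_gh(mu + norm.ppf(1.0 - alpha + gamma) * sigma, g, h)
                - tau_gh(mu - norm.ppf(1.0 - gamma) * sigma, g, h))
    res = minimize_scalar(length, bounds=(1e-6, alpha - 1e-6), method="bounded")
    gam = res.x
    return (tau_gh(mu - norm.ppf(1.0 - gam) * sigma, g, h),
            tau_gh(mu + norm.ppf(1.0 - alpha + gam) * sigma, g, h))

mu, sigma, g, h = 0.4, 0.8, 0.3, 0.1     # illustrative values
rng = np.random.default_rng(1)
mc_mean = tau_gh(mu + sigma * rng.standard_normal(2_000_000), g, h).mean()
cm = cond_mean(mu, sigma, g, h)
lo, hi = min_length_interval(mu, sigma, g, h)
# Symmetric-weight interval for comparison (gamma = alpha/2):
z_sym = norm.ppf(0.975)
sym_len = tau_gh(mu + z_sym * sigma, g, h) - tau_gh(mu - z_sym * sigma, g, h)
```

For a right-skewed transform ($g > 0$), the optimal $\gamma$ shifts the interval toward the short left tail, so the minimum-length interval is never longer than the symmetric-weight one.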
For model 2, the symmetric-weight prediction interval is $[\hat Y_{t+1} + \omega\tau_{g,h}(-z_{1-\alpha/2}),\; \hat Y_{t+1} + \omega\tau_{g,h}(z_{1-\alpha/2})]$ and the minimum-length prediction interval is $[\hat Y_{t+1} + \omega\tau_{g,h}(-z_{1-\gamma_{\mathrm{opt}}}),\; \hat Y_{t+1} + \omega\tau_{g,h}(z_{1-\alpha+\gamma_{\mathrm{opt}}})]$, where $\hat Y_{t+1}$ is the conditional median of $Y_{t+1}\mid y_t,\ldots,y_{t-p+1}$.

The prediction intervals given above are derived with respect to the true parameters of the two models. In practice, however, those parameters need to be estimated, and the prediction interval with estimated parameters should be inflated to take the uncertainty of parameter estimation into account. For research on the mean squared prediction error (MSPE) with estimated parameters, we refer to the vast literature in the context of linear models, time series models and spatial models (Zimmerman and Cressie, 1992). In particular, for a Gaussian AR($p$) model, the MSPE of a one-step-ahead forecast with estimated autoregressive coefficients is inflated by a factor of $1 + p/n$, where $n$ is the sample size (Yamamoto, 1976). Many authors (e.g., Toyooka, 1982) have concluded that the bias of the estimated MSPE is asymptotically negligible, and we see later in the simulation study that the empirical coverage of the prediction CIs that ignore the uncertainty of parameter estimation is close to the nominal level.

We perform a Monte Carlo simulation study to assess the performance of our models from different aspects. We consider six pairs of values for $g$ and $h$ (the same values as in the functional boxplots) and sample sizes n = 100, 250 and 500. For each $n$ and each pair of values for $g$ and $h$, we first generate one realization from the model with ξ = −, ω = 1., β = (3, −)ᵀ, harmonic covariates $X_t = \{\cos(2\pi t/\cdot), \sin(2\pi t/\cdot)\}^{\mathrm T}$ and φ = 0.8.
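A realization from the TGH-AR(1)-t model (model 1) is obtained by transforming a latent Gaussian AR(1) path as a whole. The sketch below is ours: covariates are omitted, and the latent process is scaled to unit marginal variance (an assumption we make so that $g$ and $h$ keep their usual marginal interpretation).

```python
import numpy as np

def tau_gh(z, g, h):
    """Tukey g-and-h transform (g != 0 case)."""
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def simulate_tgh_ar1_t(n, phi, g, h, xi=0.0, omega=1.0, seed=None, burn=200):
    """Simulate the TGH-AR(1)-t model: Y_t = xi + omega * tau_gh(Z_t), where
    Z_t is a latent Gaussian AR(1) process transformed as a whole.

    Innovation variance 1 - phi^2 gives Z_t unit marginal variance; a burn-in
    removes the effect of the initial condition.
    """
    rng = np.random.default_rng(seed)
    e = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n + burn)
    z = np.empty(n + burn)
    z[0] = rng.standard_normal()
    for t in range(1, n + burn):
        z[t] = phi * z[t - 1] + e[t]
    return xi + omega * tau_gh(z[burn:], g, h)

y = simulate_tgh_ar1_t(n=5000, phi=0.8, g=0.3, h=0.1, seed=42)
yc = y - y.mean()
acf1 = np.dot(yc[1:], yc[:-1]) / np.dot(yc, yc)       # lag-1 sample ACF
skew = np.mean(yc**3) / np.mean(yc**2) ** 1.5          # sample skewness
```

With $g = 0.3$ the marginal distribution is right-skewed, and the monotone transformation preserves the positive serial dependence of the latent AR(1) process.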
Using the TGH-AR-t model, we first check the improvement in estimation performance obtained by using the MALE to estimate $\theta_1$ and $\phi$ simultaneously, rather than sequentially estimating $\theta_1$ from the $Y_t$, treating them as independent realizations, and then estimating $\phi$ from the transformed process $\tilde Z_t = \tau_{\hat g,\hat h}^{-1}\{(Y_t - \hat\xi - X_t^{\mathrm T}\hat\beta)/\hat\omega\}$. The latter approach is often adopted by practitioners, who first find the optimal transformation and then estimate the temporal structure from the 'Gaussianized' process. For this comparison, we use two additional estimation methods in the simulation procedure described before: the letter-value-based method (Hoaglin, 1985) and the MALE for independent TGH distributions (Xu and Genton, 2015), each used to estimate $\theta_1$ first, with $\phi$ then estimated from the transformed process $\tilde Z_t$.

Figure 4 presents boxplots of the parameters estimated by the three different estimators for 1000 replications with g = 0.3 and h = 0.1, in particular for $h$ and $\phi$, which indicate that estimating the optimal transformation while ignoring the dependence structure of the underlying process deteriorates the estimation. Hence, the comparison results justify the need to develop such a tailored estimation procedure for the TGH distribution in the time series context.

[Boxplot panels for ξ, ω, g, h, β₁, β₂ and φ under the three estimators, with φ = 0.8, g = 0.3, h = 0.1 and n = 100, 250 and 500.]

Figure 4: Comparison of three estimators: the letter-value-based method (LVII), the MALE based on the independent TGH distribution (Ind) and the MALE for the TGH-AR-t model (TGH-AR-t), with boxplots of each estimator for each parameter from 1000 replications; data are generated from the TGH-AR(1)-t model with true parameter values indicated by a green line.
Next, we check the finite-sample behavior of the MALE for both models. Figure 5 shows the estimation results for our two TGH-AR(1) models with φ = 0.8 and n = 100 for six pairs of values of g and h. For g = h = 0, which corresponds to a Gaussian AR process, we also include boxplots of the MLE based on the Gaussian AR model for ξ, ω, β₁, β₂ and φ. Visuanimations of boxplots in the same manner as the sample size grows (n = 100, 250, 500) with φ = 0.8 and φ = −0.25 are also included in the supplementary material (Visuanimations S4 and S5).

From Figure 5, we can see that with sample size n = 100, the MALE for ω, h and φ is a bit biased. Nevertheless, as shown by Visuanimations S2 and S3 in the supplementary material, the estimation improves greatly as the sample size increases to n = 250. In addition, for g = h = 0, when the true distribution is Gaussian, the estimators based on the approximated TGH-AR likelihood, which have two more parameters, g and h, to estimate, do not deteriorate much relative to the Gaussian MLE. The visuanimations also give us an empirical guideline about how the sample size affects the estimation performance.

[Boxplot panels of the MALE for the TGH-AR(1)-t and TGH-AR(1)-e models with φ = 0.8 and n = 100, over the six pairs (g, h) = (0, 0), (0.3, 0), (0.5, 0), (0, 0.1), (0, 0.2), (0.3, 0.1).]

Figure 5: Boxplots of the MALE in 1000 simulations for the two TGH-AR(1) models with n = 100 for six pairs of values of g and h for each parameter, with true values indicated by a green line. For g = h = 0, boxplots of the MLE based on the Gaussian AR model (indicated by 'Gau') for ξ, ω, β₁, β₂, φ are also included.
To obtain an overall satisfactory estimation performance from the MALE, we suggest a sample size of no less than n = 250.

Finally, we evaluate the forecasting performance of our two models with parameters estimated by the corresponding MALE, without order selection. Table 2 summarizes the performance of the point forecasts from each of the three methods, based on the TGH-AR-t, TGH-AR-e and Gaussian AR models, for data generated from each of our two TGH-AR(1) models with φ = 0.8, n = 500, the same ξ, ω, β and $X_t$ as above, and one pair of values of g and h. The mean absolute error (MAE) is calculated with the conditional median, and the root mean squared error (RMSE) with the conditional mean, as point predictor. Table 2 also shows the empirical coverage and average length of the 95% minimum-length CI for each method.

Table 2: Summary of the point forecast performance of the three methods based on the TGH-AR-t, TGH-AR-e and Gaussian AR models, for data generated from each of our two TGH-AR(1) models with φ = 0.8 and n = 500.

[PIT histograms and reliability plots when the true model is TGH-AR(1)-t with φ = 0.8 and n = 500; mean CRPS: 0.613 (TGH-AR(1)-t), 0.627 (TGH-AR(1)-e), 0.633 (Gau-AR(1)).]

Figure 6: Comparison of probabilistic forecast performances via PIT histograms and reliability plots when data are generated from the TGH-AR(1)-t model with φ = 0.8 and n = 500 and fitted by the three methods: TGH-AR(1)-t, TGH-AR(1)-e and Gau-AR(1). The mean CRPS value is also labeled.

[PIT histograms and reliability plots when the true model is TGH-AR(1)-e with φ = 0.8 and n = 500; mean CRPS: 1.003 (TGH-AR(1)-t), 0.987 (TGH-AR(1)-e), 1.002 (Gau-AR(1)).]

Figure 7: Comparison of probabilistic forecast performances via PIT histograms and reliability plots when data are generated from the TGH-AR(1)-e model with φ = 0.8 and n = 500 and fitted by the three methods: TGH-AR(1)-t, TGH-AR(1)-e and Gau-AR(1). The mean CRPS value is also labeled.

The best forecast is achieved by using the same model from which the data were generated: that forecast has the smallest MAE and RMSE, a flatter PIT histogram and a smaller mean CRPS; its reliability plot is close to a 45° straight line; and the empirical coverage of its 95% CI is close to the nominal level while the interval is shorter than that of the Gaussian AR predictor. Note that, although we used the prediction CIs derived in Section 3.3 by treating the estimated parameters as the true parameters, Table 2 shows that the empirical coverage of the 95% CIs is close to the nominal level. We also notice that the TGH-AR-t and TGH-AR-e models produce quite different forecasts, as seen from their distinctive histograms of the PIT values, which means that the two models are not interchangeable.

The analysis of wind data is a crucial step in simulations for climate science. Accurately forecasting wind speeds and quantifying the forecast uncertainty are also important for exploiting wind as a clean energy source. In this section, we illustrate the usefulness of our two non-Gaussian time series models in both wind speed simulation and wind speed forecasting scenarios. In Section 5.1, we fit daily wind speed data with the TGH-AR-t model to obtain parameter estimates for the purpose of fast wind speed simulation. In Section 5.2, we apply the TGH-AR-e model in order to make better wind speed forecasts.
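The mean CRPS reported in Figures 6, 7 and 10 can be approximated from predictive samples via the kernel form $\mathrm{CRPS}(F, y) = \mathrm{E}|X - y| - \tfrac{1}{2}\mathrm{E}|X - X'|$, with $X, X'$ independent draws from the forecast distribution. The sketch below is a generic estimator (ours, not the authors' code), applicable to our models by drawing from the one-step predictive distribution $\omega\tau_{g,h}(Z)$; it is illustrated with a standard normal predictive, for which the CRPS has a known closed form.

```python
import numpy as np
from scipy.stats import norm

def sample_crps(draws, y_obs):
    """Monte Carlo CRPS estimate: E|X - y| - (1/2) E|X - X'|.

    The second expectation uses independent pairs formed by splitting
    the draws in half, which keeps the estimator unbiased.
    """
    x = np.asarray(draws, dtype=float)
    half = x.size // 2
    term1 = np.mean(np.abs(x - y_obs))
    term2 = 0.5 * np.mean(np.abs(x[:half] - x[half:2 * half]))
    return term1 - term2

# Check against the closed form for a N(0, 1) predictive and y = 0:
# CRPS(N(0,1), y) = y*(2*Phi(y) - 1) + 2*phi(y) - 1/sqrt(pi).
rng = np.random.default_rng(5)
draws = rng.standard_normal(1_000_000)
crps_mc = sample_crps(draws, 0.0)
crps_exact = 2.0 * norm.pdf(0.0) - 1.0 / np.sqrt(np.pi)
```

Because the CRPS is a proper scoring rule, averaging it over one-step forecasts rewards the model whose predictive distribution matches the data-generating one, which is exactly the behavior seen in Figures 6 and 7.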
For the two datasets in the two subsections, the choice of model is based on plots of $Y_t - X_t^{\mathrm T}\beta$ versus $Y_{t-1} - X_{t-1}^{\mathrm T}\beta$, with reference to Figure 3, to see whether the relationship is linear or not.

Climate models can produce multiple runs of spatio-temporal wind speed data over the globe. Statistical models have been developed to reproduce the output of climate models for the sake of fast simulation, instead of running the computationally expensive physical models. In order to fit a space-time statistical model for the wind field over the entire world, a 4-step multi-resolution method was proposed by Castruccio and Genton (2016), in which the first step is to model the time series of wind speed at each location individually; see Castruccio and Genton (2018) for general principles for analyzing big spatio-temporal data from climate models. In the 4-step multi-resolution method, our TGH-AR time series models can be used as a modification of the first step when the time series data show non-Gaussian features.

To illustrate the usefulness of our TGH-AR-t model, we consider the publicly available Large Ensemble Project (LENS) dataset, which consists of 30 ensemble members of daily wind speed over the globe with a spatial resolution of around 1° longitude and 1° latitude from the year 1920 to 2100 (Kay et al., 2015). We select one ensemble member from the complete dataset and use the 86 historical years (1920-2005, n = 31390) at 22 × 22 gridded locations over Saudi Arabia (bounded roughly by 13-°N and 32-°E). For each day of the year at each location, we estimate the seasonality by taking the average of the wind speed across the 86 years at that location. For each location, we remove the seasonal effect from each time point and analyze the residual wind speed data, which show a clear AR pattern in the ACF and the partial autocorrelation function (PACF). We use the TGH-AR-t model because the residual wind speed time series clearly show features of skewness and heavy tails, and a plot of $Y_t$ versus $Y_{t-1}$ for the residual wind speed shows a nonlinear relationship. We obtain parameter estimates with order selection by fitting the TGH-AR-t model at each location. With the TGH-AR-t model and the estimated parameters at each location, wind speed data can be generated rapidly without using the physical model. The estimated parameters also give insights into the pattern of the distributional properties of the wind speed residuals.

[Maps of the estimates of ξ, ω, g, h, φ₁ and φ₂ over longitude and latitude.]

Figure 8: Maps of the estimated parameters from fitting the TGH-AR-t model with order selection to the daily wind speed residuals.

Figure 8 shows maps of the estimated values of the four parameters related to the Tukey g-and-h transformation as well as the first two autoregressive coefficients. Since the selected autoregressive order is 2 at the majority of the locations, maps of the estimates of higher-order autoregressive coefficients are omitted. We notice from these maps that $\hat\xi$, $\hat\omega$, $\hat\phi_1$ and $\hat\phi_2$ (white areas indicate that the selected order is only 1) are distinctively different between land and ocean. We also observe that the $\hat g$ and $\hat h$ estimates show interesting patterns that are closely related to the elevation and geographical features of a location. For the full 4-step multi-resolution analysis of daily wind speed over the globe, where the TGH-AR-t model is used in the first step, see Jeong et al. (2017).
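The per-grid-cell seasonality removal described above can be sketched as follows: the seasonal cycle is estimated as the per-calendar-day mean across years and subtracted. This is our minimal sketch (leap days and the actual LENS calendar are ignored), verified on a synthetic series with a known cycle.

```python
import numpy as np

def remove_seasonality(x, period=365):
    """Estimate the seasonal cycle as the per-day-of-year mean across years
    and return the residual series together with the estimated climatology."""
    n = len(x)
    doy = np.arange(n) % period
    clim = np.array([x[doy == d].mean() for d in range(period)])
    return x - clim[doy], clim

# Synthetic check: a sinusoidal annual cycle plus noise should leave
# near-zero per-day residual means after the climatology is removed.
rng = np.random.default_rng(7)
years, period = 20, 365
t = np.arange(years * period)
x = 5.0 + 2.0 * np.sin(2 * np.pi * t / period) + rng.normal(scale=0.5, size=t.size)
resid, clim = remove_seasonality(x, period)
doy = t % period
per_day_means = np.array([resid[doy == d].mean() for d in range(period)])
```

Because each calendar day's own mean is subtracted, the residual series has exactly zero mean for every day of the year, leaving only the anomalies that the TGH-AR-t model is then fitted to.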
We consider hourly wind speed data observed at a meteorological tower in Sunnyside, Oregon, from 1 December 2013 to 31 December 2014 (Kazor and Hering, 2015). First, we use the hourly wind speed observed in June 2014 (n = 720) from this dataset to demonstrate the suitability of the TGH-AR-e model. We notice that there is a diurnal pattern in the hourly wind speed, so we include harmonics with periods of 12 and 24 hours as covariates in the model. We find the MALE with order selection by fitting the wind speed time series of that month with the TGH-AR-e model. The estimated parameters are ξ̂ = 7., ω̂ = 1., ĝ = 0., ĥ = 0., φ̂₁ = 1., φ̂₂ = −0.15, and β̂ = (−., ., −., .)ᵀ with $X_t = \{\cos(2\pi t/12), \sin(2\pi t/12), \cos(2\pi t/24), \sin(2\pi t/24)\}^{\mathrm T}$.

[Diagnostic panels: the hourly series with its fitted diurnal pattern; a histogram of the residual wind speed $\tilde Y_t$ with sample skewness 0.273 and kurtosis 2.684; a normal Q-Q plot; a plot of $\tilde Y_t$ against $\tilde Y_{t-1}$; the ACF and PACF of $\tilde Y_t$; and a histogram, normal Q-Q plot and PACF of the fitted noise $\epsilon_t$.]

Figure 9: Various diagnostic plots from applying the TGH-AR-e model to the hourly wind speed data of June 2014.

Figure 9 shows the original hourly wind speed time series overlaid with the diurnal pattern (green line) estimated using the TGH-AR-e model. The histogram and normal Q-Q plot of the residual wind speeds after removing the periodicity deviate obviously from Gaussianity. The skewness and kurtosis values presented with the histogram further support using our TGH-AR-e model to adapt to right-skewed and heavy-tailed data. The residual wind speed at time $t$ is plotted against that at time $t-1$, and we examine the fitted noise $\hat\epsilon_{t,\hat\theta} = \tau_{\hat g,\hat h}^{-1}\{(\hat{\tilde y}_t - \hat\phi_1\hat{\tilde y}_{t-1} - \hat\phi_2\hat{\tilde y}_{t-2})/\hat\omega\}$, where $\hat{\tilde y}_t = y_t - \hat\xi - x_t^{\mathrm T}\hat\beta$. These plots confirm the validity of a TGH-AR(2)-e model in which the $\epsilon_t$ are i.i.d. $N(0,1)$. For the forecasts over the whole year of 2014, we use a moving window of the previous month (n = 720) for parameter estimation to account for the non-stationarity caused by the seasonal effect. To estimate the parameters of the Gaussian AR model, we first remove the diurnal pattern using linear regression with the same harmonics as in the TGH-AR-e model. Then, we find the MLE of the autoregressive parameters from the residuals.

The MAE of the conditional median from the TGH-AR-e model is 4.05; the Gaussian AR forecast has an MAE of 4.28. The RMSEs for the conditional means of the TGH-AR-e model and the Gaussian AR model are 1.47 and 1.50, respectively. The empirical coverage of the 95% minimum-length CI is 94.2% for the TGH-AR-e model and 93.3% for the Gaussian model. We conclude that the point forecasts and CIs based on the TGH-AR-e model are better than those based on the Gaussian AR model for these hourly wind speed data. However, it is difficult to see the differences between the forecasts from these numbers alone.
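The PIT values behind histograms such as those in Figure 10 can be computed as follows. The sketch is ours and uses a TGH-AR(1)-e model with known parameters for clarity (the fitted model in the application is an AR(2) with estimated parameters and covariates): under model 2, $u_{t+1} = \Phi\!\left[\tau_{g,h}^{-1}\{(y_{t+1} - \phi_1 y_t)/\omega\}\right]$, which is exactly uniform when the forecast model matches the data-generating one.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def tau_gh(z, g, h):
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def tau_gh_inv(y, g, h, lo=-10.0, hi=10.0):
    """Numerical inverse of the strictly increasing (h >= 0) transform."""
    return brentq(lambda z: tau_gh(z, g, h) - y, lo, hi)

def pit_values(y, phi, g, h, omega=1.0):
    """PIT of one-step-ahead forecasts under a TGH-AR(1)-e model with
    known parameters (no covariates, xi = 0)."""
    u = []
    for t in range(len(y) - 1):
        eps = tau_gh_inv((y[t + 1] - phi * y[t]) / omega, g, h)
        u.append(norm.cdf(eps))
    return np.array(u)

# Data from the same TGH-AR(1)-e model should give a flat PIT histogram.
rng = np.random.default_rng(3)
phi, g, h, n = 0.8, 0.3, 0.1, 2000
z = rng.standard_normal(n)
y = np.empty(n)
y[0] = tau_gh(z[0], g, h)
for t in range(1, n):
    y[t] = phi * y[t - 1] + tau_gh(z[t], g, h)
u = pit_values(y, phi, g, h)
```

Forecasting the same data with a Gaussian error model instead would skew this histogram, which is the visual contrast Figure 10 relies on.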
Figure 10 shows histograms of the PIT values from forecasts based on the TGH-AR-e and Gaussian AR models. Comparing these histograms, the superiority of fitting the wind speed with a non-Gaussian TGH error in the AR model, rather than a Gaussian error, is evident.

[PIT histograms: TGH-AR-e with mean CRPS 1.082; Gaussian AR with mean CRPS 1.101.]

Figure 10: Histograms of the PIT values of probabilistic forecasts by the TGH-AR-e and Gaussian AR models for one-hour-ahead forecasts over the whole year of 2014.

In this paper, we applied the TGH transformation in a time series context and built two flexible non-Gaussian autoregressive models. The TGH-AR-t model assumes a latent Gaussian process, which is transformed as a whole, while the TGH-AR-e model transforms the white-noise error term in an autoregressive regression model. The intrinsic difference between the two models can be explained by their different data-generating mechanisms. For a given time series, a plot like Figure 3 can best help in deciding which model is more suitable. We described an efficient parameter estimation procedure for our two models that approximates the maximum likelihood estimator, and an order selection procedure based on the approximated likelihood. We derived formulas for point and probabilistic forecasts with the two models. We illustrated the empirical performance of estimation, order selection and forecasting for our models through a simulation study. We found that estimating all parameters at once with our estimation procedure drastically improved performance compared to a sequential estimation that ignores the temporal dependence. Another finding was that the two models yielded different forecasts when calibrated on the same sample, showing that the two models are not interchangeable. Simulations also suggested a sample size of no less than 250 for a satisfactory performance of our models. Finally, we demonstrated the usefulness of our models by applying them to two wind speed datasets at different temporal resolutions. Our TGH-AR models provided a fast simulation method that can emulate the wind speed output of a gridded climate model, and they produced competitive forecast results for an observational hourly wind speed dataset from a meteorological station.

The AR models considered in this paper cannot incorporate measurement errors. Extensions of the TGH-AR models to ARMA or state-space models need further research.
Also, we feel that an exhaustive comparison of the many existing transformations, including the Box-Cox, TGH and sinh-arcsinh transformations (Jones and Pewsey, 2009), would be welcome. In a future study, the parameters g and h should be allowed to change smoothly across time, either by imposing a parametric function of t for g and h or by a smoothness penalty, instead of the moving window scheme we used here. Extension of the TGH framework to a spatio-temporal setting is also promising.

Additional information and supporting material for this article are available online at the journal's website.

References
Benjamin, M. A., Rigby, R. A., and Stasinopoulos, D. M. (2003). Generalized autoregressive moving average models. Journal of the American Statistical Association, 98(461):214-223.

Block, H. W., Langberg, N. A., and Stoffer, D. S. (1990). Time series models for non-Gaussian processes. Lecture Notes-Monograph Series, 16:69-83.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307-327.

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26(2):211-252.

Castruccio, S. and Genton, M. G. (2016). Compressing an ensemble with statistical models: An algorithm for global 3D spatio-temporal temperature. Technometrics, 58(3):319-328.

Castruccio, S. and Genton, M. G. (2018). Principles for statistical inference on big spatio-temporal data from climate models. Statistics and Probability Letters, to appear.

Cordeiro, G. M. and de Andrade, M. G. (2009). Transformed generalized linear models. Journal of Statistical Planning and Inference, 139(9):2970-2987.

Davies, N., Spedding, T., and Watson, W. (1980). Autoregressive moving average processes with non-normal residuals. Journal of Time Series Analysis, 1(2):103-109.

De Luca, G., Genton, M. G., and Loperfido, N. (2006). A multivariate skew-GARCH model. In Advances in Econometrics: Econometric Analysis of Economic and Financial Time Series, Part A (Special volume in honor of Robert Engle and Clive Granger, the 2003 winners of the Nobel Prize in Economics), chapter 20, pages 33-57. Elsevier.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4):987-1007.

Field, C. and Genton, M. G. (2006). The multivariate g-and-h distribution. Technometrics, 48(1):104-111.

Gaver, D. P. and Lewis, P. A. W. (1980). First-order autoregressive gamma sequences and point processes. Advances in Applied Probability, 12(3):727-745.

Genton, M. G., Castruccio, S., Crippa, P., Dutta, S., Huser, R., Sun, Y., and Vettori, S. (2015). Visuanimation in statistics. Stat, 4(1):81-96.

Gneiting, T. and Katzfuss, M. (2014). Probabilistic forecasting. Annual Review of Statistics and Its Application, 1(1):125-151.

Gneiting, T., Larson, K., Westrick, K., Genton, M. G., and Aldrich, E. (2006). Calibrated probabilistic forecasting at the Stateline wind energy center. Journal of the American Statistical Association, 101(475):968-979.

Goerg, G. M. (2011). Lambert W random variables - a new family of generalized skewed distributions with applications to risk estimation. The Annals of Applied Statistics, 5(3):2197-2230.

Goerg, G. M. (2015). The Lambert way to Gaussianize heavy-tailed data with the inverse of Tukey's h transformation as a special case. The Scientific World Journal, 2015:1-16. doi:10.1155/2015/909231.

Haslett, J. and Raftery, A. E. (1989). Space-time modelling with long-memory dependence: Assessing Ireland's wind power resource. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(1):1-50.

He, Y. and Raghunathan, T. E. (2012). Multiple imputation using multivariate gh transformations. Journal of Applied Statistics, 39(10):2177-2198.

Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distributions. In Data Analysis for Tables, Trends and Shapes: Robust and Exploratory Techniques.

Jeong, J., Yan, Y., Castruccio, S., and Genton, M. G. (2017). A stochastic generator of global monthly wind energy with Tukey g-and-h autoregressive processes. http://arxiv.org/abs/1711.03930.

Johns, C. J., Nychka, D., Kittel, T. G. F., and Daly, C. (2003). Infilling sparse records of spatial fields. Journal of the American Statistical Association, 98(464):796-806.

Jones, M. C. and Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96(4):761-780.

Kay, J. E., Deser, C., Phillips, A., Mai, A., Hannay, C., Strand, G., Arblaster, J. M., Bates, S. C., Danabasoglu, G., Edwards, J., Holland, M., Kushner, P., Lamarque, J.-F., Lawrence, D., Lindsay, K., Middleton, A., Munoz, E., Neale, R., Oleson, K., Polvani, L., and Vertenstein, M. (2015). The Community Earth System Model (CESM) Large Ensemble Project: A community resource for studying climate change in the presence of internal climate variability. Bulletin of the American Meteorological Society, 96(8):1333-1349.

Kazor, K. and Hering, A. S. (2015). Assessing the performance of model-based clustering methods in multivariate time series with application to identifying regional wind regimes. Journal of Agricultural, Biological, and Environmental Statistics, 20(2):192-217.

Lawrance, A. J. and Lewis, P. A. W. (1980). The exponential autoregressive-moving average EARMA(p,q) process. Journal of the Royal Statistical Society. Series B (Methodological), 42(2):150-161.

Le, N. D., Martin, R. D., and Raftery, A. E. (1996). Modeling flat stretches, bursts, and outliers in time series using mixture transition distribution models. Journal of the American Statistical Association, 91(436):1504-1515.

Li, W. K. and McLeod, A. I. (1988). ARMA modelling with non-Gaussian innovations. Journal of Time Series Analysis, 9(2):155-168.

Martin, V. L. (1992). Threshold time series models as multimodal distribution jump processes. Journal of Time Series Analysis, 13(1):79-94.

Martinez, J. and Iglewicz, B. (1984). Some properties of the Tukey g and h family of distributions. Communications in Statistics - Theory and Methods, 13(3):353-369.

Nelson, H. L. and Granger, C. (1979). Experience with using the Box-Cox transformation when forecasting economic time series. Journal of Econometrics, 10(1):57-69.

Sun, Y. and Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics, 20(2):316-334.

Tong, H. (1990). Non-linear Time Series. Oxford University Press, Oxford.

Toyooka, Y. (1982). Prediction error in a linear model with estimated parameters. Biometrika, 69(2):453-459.

Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.

Wong, C. S., Chan, W. S., and Kam, P. L. (2009). A Student t-mixture autoregressive model with applications to heavy-tailed financial data. Biometrika, 96(3):751-760.

Wong, C. S. and Li, W. K. (2000). On a mixture autoregressive model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1):95-115.

Xu, G. and Genton, M. G. (2015). Efficient maximum approximated likelihood inference for Tukey's g-and-h distribution. Computational Statistics & Data Analysis, 91:78-91.

Xu, G. and Genton, M. G. (2016). Tukey max-stable processes for spatial extremes.
SpatialStatistics , 18(Part B):431–443.Xu, G. and Genton, M. G. (2017). Tukey g -and- h random fields. Journal of the AmericanStatistical Association , 112:1236–1249.Yamamoto, T. (1976). Asymptotic mean square prediction error for an autoregressive model withestimated coefficients.
Journal of the Royal Statistical Society. Series C (Applied Statistics) ,25(2):123–127.Zimmerman, D. L. and Cressie, N. (1992). Mean squared prediction error in the spatial linearmodel with estimated covariance parameters.