Calibration methods for spatial Data


arXiv [stat.AP], Sep

M A Amaral Turkman, K F Turkman, P de Zea Bermudez, S Pereira, P Pereira and M Carvalho
Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa
Instituto Politécnico Setúbal and Centro de Estatística e Aplicações, Universidade de Lisboa
University of Edinburgh and Centro de Estatística e Aplicações, Universidade de Lisboa
E-mail: [email protected]

Abstract.

In an environmental framework, extreme values of certain spatio-temporal processes, for example wind speeds, are the main cause of severe damage to property, such as electrical networks, transport and agricultural infrastructures. Therefore, availability of accurate data on such processes is highly important in risk analysis, and in particular in producing probability maps showing the spatial distribution of damage risks. Typically, as is the case of wind speeds, data are available at few stations with many missing observations, and consequently simulated data are often used to augment information, since simulated environmental data are available at high spatial and temporal resolutions. However, simulated data often mismatch observed data, particularly at the tails, so calibrating them and bringing them in line with observed data may offer practitioners more reliable and richer data sources. Although the calibration methods that we describe in this manuscript may equally apply to other environmental variables, we describe the methods specifically with reference to wind data and its consequences. Since most damage is caused by extreme winds, it is particularly important to calibrate the right tail of simulated data based on observations. Response relationships between the extremes of simulated and observed data are by nature highly non-linear and non-Gaussian, therefore data fusion techniques available for spatial data may not be adequate for this purpose. Although our ultimate goal is the development of statistical methods for data fusion and calibration that can extrapolate beyond the range of observed data, into the tails of a distribution, in this work we will concentrate on calibration methods for the whole range of data. After giving a brief description of standard calibration and data fusion methods to update simulated data based on the observed data, we propose and describe in detail a specific conditional quantile matching calibration method and show how our wind speed data can be calibrated using this method. We also briefly explain how calibration can be extended specifically to data coming from the tails of simulated and observed data, using asymptotic models and methods suggested by extreme value theory.

Keywords: data fusion, Bayesian hierarchical models, spatial extremes

1. Introduction

Extreme values of certain spatio-temporal processes, such as wind speeds, are the main cause of severe damage to property, from electricity distribution grids to transport and agricultural infrastructures. Accurate assessment of causal relationships between environmental processes and their effects on risk indicators is highly important in risk analysis, which in turn depends on sound inferential methods as well as on good quality informative data. Often, information on the relevant environmental processes comes from monitoring networks, as well as from numerical-physical models (simulators) that typically solve a large set of partial differential equations, capturing the essence of the physical process under study (Skamarock et al. [...] et al., 2019), using an extended Generalized Pareto distribution (Naveau et al., 2016) for the simulated and observed data, adequate for calibrating simultaneously the bulk and the tails of the distribution. Finally, in section 5, this method will be exemplified using wind speed data. Further discussion and conclusions are in section 6.

2. Statistical Calibration methods; an overview

Calibration plays a crucial role in almost all areas of experimental sciences and can be defined in a nutshell as "the comparison between measurements - one of known magnitude or correctness made or set with one device and another measurement made in as similar a way as possible with a second device" (Wikipedia). When measurements are obtained under random environments, statistical methods and models have to be employed to compare such sets of measurements. In the simplest case of random experiments involving linear calibration, linear regression models are used. The univariate calibration is then defined as an inverse regression problem (Lavagnini and Magno, 2007). These models and the consequent calibration techniques are, inevitably, restricted to uncorrelated repeated observations or dependent Gaussian structures, which simplifies the problem immensely (Aitchison and Dunsmore, 1975). However, more often than not, even in designed experiments, the response relationships are mostly nonlinear and therefore Gaussian structures are hardly justifiable as models. Nonlinear calibration then typically has to be formulated by conditional specification of distributions, and consequently a substantial amount of numerical integration and approximation is needed.

Little is known about calibration of environmental data sets displaying nonlinear, non-Gaussian structures in a spatio-temporal setting. In these cases, defining calibration through spatial linear models as inverse regression problems will oversimplify the structures and will not be adequate.
Often, calibrating simulated data based on observed data is done by calibrating the simulator, namely the numerical-physical model, using Bayesian methods (Kennedy and O'Hagan, 2001, and Wilkinson, 2010). These generic methods are based on the following general ideas: the simulator is first approximated by a linear parametric emulator and, using Bayesian arguments, data are used to convert prior knowledge on these model parameters into posterior distributions. The newly generated data from this approximate emulator are then treated as the calibrated data.

There are many statistical calibration methods for different purposes and based on different paradigms. We can classify these methods into
(i) Quantile matching-based approaches,
(ii) Inverse regression,
(iii) Simulator–emulator-based approaches,
(iv) Data assimilation/data fusion.

Before describing these methods, we give here some basic notation. We denote by $Y(s,t)$ and $X(s,t)$, respectively, the observed and simulated wind speeds at location $s \in \mathbb{R}^2$ and time $t$. To simplify notation, we will often use $Y$ and $X$ for observed and simulated wind speeds when data are used without any space-time reference. Typically $X$ are simulated over a regular grid, say $B$, often represented by points $s_B$ which correspond to the centroids of the grid cells, whereas $Y$ are observed at stations located at different spatial points $s$.

For the time being, if we totally ignore space-time variations and dependence structures, calibration can be seen as a simple scaling making use of marginal distributions fitted respectively to $X$ and $Y$ (CDF transform method, Michelangeli et al., 2009). Suppose we have a set of $n$ observed $y_i$ and simulated $x_i$, $i = 1, \ldots, n$, data. Let $F_Y$ and $F_X$ be, respectively, the distribution functions of $Y$ and $X$. Then the new calibrated (scaled) data $x_i^*$ is defined as

$$x_i^* = F_Y^{-1}(F_X(x_i)), \quad i = 1, \ldots, n. \tag{1}$$

Since $P(X^* \le z) = F_Y(z)$, calibrated data has the same distribution as the observed data. Note that if $F_X = F_Y$ then $x_i^* = x_i$. Figure 1 depicts the result of applying this calibration method when $Y$ follows a Student distribution with 3 degrees of freedom, and $X$ follows a standard normal distribution.

Figure 1. Illustration of the quantile matching approach (densities $f_y$ and $f_x$; a simulated value $x$ and its calibrated value $x^*$).

This calibration method depends only on the marginal distributions of the random variables $Y$ and $X$, and hence it does not make use of the expected strong dependence between the two sets of data.

An ideal calibration should involve the joint distribution of $Y$ and $X$ defined in some way. A possibility is the use of a conditional quantile matching approach, which will be described in section 3. Further, in the same section, we also introduce an extension to cover space-time non-homogeneity by scaling (calibrating) the data from

$$x^*(s,t) = F_{Y(s,t)}^{-1}\big(F_{X(s,t)}(x(s,t))\big), \tag{2}$$

where $F_{Y(s,t)}$ and $F_{X(s,t)}$ are the distribution functions of $Y(s,t)$ and $X(s,t)$ for every $s$ and $t$. These distributions will be estimated by fitting them and considering the parameters as smooth functions of spatially and temporally varying covariates and space-time latent processes as in section 5.
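The marginal quantile matching in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the distribution functions are replaced by empirical versions based on ranks, and the function names (`empirical_cdf`, `empirical_quantile`, `quantile_match`, `student_t3`), the sample sizes and the seed are hypothetical choices. The demonstration mimics the setting of Figure 1, with simulated values drawn from a standard normal and observed values from a Student distribution with 3 degrees of freedom.

```python
import bisect
import math
import random


def empirical_cdf(sample):
    """Return u = F_n(x) based on ranks, scaled by n + 1 so that 0 < u < 1."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        rank = bisect.bisect_right(ordered, x)  # number of sample points <= x
        rank = min(max(rank, 1), n)             # keep u away from 0 and 1
        return rank / (n + 1)

    return cdf


def empirical_quantile(sample):
    """Return x = F_n^{-1}(u), interpolating linearly between order statistics."""
    ordered = sorted(sample)
    n = len(ordered)

    def quantile(u):
        # The i-th order statistic (0-based) sits at u = (i + 1) / (n + 1).
        pos = min(max(u * (n + 1) - 1.0, 0.0), n - 1.0)
        lower = int(pos)
        upper = min(lower + 1, n - 1)
        return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

    return quantile


def quantile_match(simulated, observed):
    """Calibrate simulated values as x* = F_Y^{-1}(F_X(x)), as in equation (1)."""
    f_x = empirical_cdf(simulated)
    f_y_inverse = empirical_quantile(observed)
    return [f_y_inverse(f_x(x)) for x in simulated]


def student_t3(rng):
    """Draw from a Student t distribution with 3 degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / math.sqrt(chi2 / 3.0)


if __name__ == "__main__":
    rng = random.Random(2013)                                # arbitrary seed
    simulated = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # X ~ N(0, 1)
    observed = [student_t3(rng) for _ in range(1000)]        # Y ~ t_3
    calibrated = quantile_match(simulated, observed)
    # The calibrated maximum follows the heavier observed tail.
    print(max(simulated), max(observed), max(calibrated))
```

With equal sample sizes and no ties, the calibrated values are simply the observed order statistics rearranged in the rank order of the simulated values, which makes explicit that this version uses the two marginal distributions only.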
Calibration is usually seen as a method of adjusting the scale of a measurement instrument on the basis of an informative experiment. As such, it is seen as an inverse regression problem. However, there are several problems associated with this approach (see, e.g., Kang et al., 2017). Aitchison and Dunsmore (1975) approach the problem from a Bayesian perspective by defining the calibrative distribution. See also, e.g., Racine-Poon (1988), Osborne (1991) and Muehleisen and Bergerson (2016).

According to Aitchison and Dunsmore's proposal, a parametric model is fitted to a random vector $(X, Y)$, where, e.g., $X$ is the random variable representing a measurement obtained in a laboratory and $Y$ the random variable representing the measurement obtained in a field experiment. This parametric model is defined through a conditional specification such that

$$f_{(X,Y)}(x, y \mid \psi, \theta) = f_{X|Y}(x \mid y, \psi)\, f_Y(y \mid \theta).$$

There are two parameters involved, namely the arrival parameter $\psi$ and the structural parameter $\theta$. A further assumption is that the initial sources of information are stochastically independent, so that

$$p(\psi, \theta) = p(\psi)\, p(\theta).$$

Now suppose that one has a future laboratory experiment resulting in a value $x_f$ and that further experiments follow the same pattern of arrival as the original trials $(\mathbf{x}, \mathbf{y}) = \{(x_i, y_i),\ i = 1, \ldots, n\}$. The data available are now $(x_i, y_i, x_f)$, $i = 1, \ldots, n$, and the unknowns are $(y_f, \psi, \theta)$.
The objective is to obtain the predictive distribution (called in this case the calibrative distribution) for the corresponding field experiment $Y_f$, which is simply obtained by integrating out $\psi$ and $\theta$,

$$p(y_f \mid \mathbf{x}, \mathbf{y}, x_f) = \int f_X(x_f \mid y_f, \psi)\, p(y_f, \psi, \theta \mid x_f, \mathbf{x}, \mathbf{y})\, d\psi\, d\theta,$$

where

$$p(y_f, \psi, \theta \mid x_f, \mathbf{x}, \mathbf{y}) \propto f_X(x_f \mid y_f, \psi)\, f_Y(y_f \mid \theta) \prod_{i=1}^{n} f_X(x_i \mid y_i, \psi)\, f_Y(y_i \mid \theta)\, p(\psi)\, p(\theta),$$

assuming that the trial records are independent.

Generalizing to the situation under study, without considering space and time dependence, for a model $[X \mid Y][Y]$, the objective is to obtain the distribution

$$Y(s) \mid x(s), x(s^*), y(s^*),$$

that is, the distribution of $Y(s)$ based on the observed and simulated data $(x(s^*), y(s^*))$ on $N$ stations and the simulated value $x(s)$.

Kennedy and O'Hagan (2001) describe calibration as statistical postprocessing of the deterministic forecast of a simulator. They assume a computer model describing some physical system, which is a function of variable inputs $x$ that can be measured and calibration inputs $\nu$ needed to run the model, but whose values are not known in the experiment. The output of the computer model is then assumed to be some function of the inputs, say $\eta(x, \nu)$. Observations from the field experiment are assumed to have been observed at $\nu = \theta$ and, possibly, at different values of $x$. The model for observations for known input variables $x$ is then assumed to be a function of the computer model output $\eta(x, \nu)$, of a true underlying process $\xi(x)$ and of some model inadequacy described by $\delta(x)$. The objective is to estimate the calibration settings $\theta$ consistent with the field experiments and the computer model.

Sigrist et al. (2015) give a detailed description of stochastic versions of space-time advection-diffusion PDEs and their solutions as models for emulators and describe a method of postprocessing simulated data.

However, simulator–emulator-based approaches assume detailed information on how emulators work in terms of a set of parameters, which is not the case in most situations related to climate models.

Data assimilation or data fusion methods are used to combine different sources of information in order to obtain more accurate results. A recent review on data assimilation is given in Berrocal (2019) together with many references.

Among these methods, the interest lies in statistical approaches to data assimilation/data fusion and in particular in hierarchical Bayesian models (HBM) for combining monitoring data and computer model output.

There are basically two different approaches regarding these methods. The Bayesian melding proposed by Fuentes and Raftery (2005) assumes that there is a true latent point-level process $Z(s)$ (a GP with spatially varying mean and non-stationary covariance function) to which both the observed $Y(s)$ and simulated $X(s)$ processes are linked. The observed values are governed by this latent process with a random error, and simulated values are expressed as a linearly calibrated integral over a grid cell (scaled by the area of the cell) of the latent point-level process,

$$Y(s) = Z(s) + \epsilon(s),$$
$$X(s) = a(s) + b(s) Z(s) + \delta(s),$$

where $\delta(s)$ explains the random deviation at location $s$ with respect to the underlying true process $Z(s)$. The aim is to obtain the posterior predictive distribution of the truth $Z$ at a new site $s^*$. However, the misalignment of the two processes involved [...] et al. (2010) propose a spatio-temporal extension of this Bayesian melding approach.

The other approach, suggested by Berrocal et al. (2010), is a Bayesian hierarchical downscaler model.
They consider a spatial linear model relating the monitoring station data and the computer model output, with spatially varying coefficients which are in turn modeled as Gaussian processes (GPs). These models offer the advantage of local calibration of the numerical model output without incurring problems due to the dimensionality of the computer model output, since they are only fitted at the grid cells where the monitoring stations reside. An extension to this downscaler model, borrowing information from neighboring grid cells, was introduced by Berrocal et al. (2012). The proposed downscaler model for the observations and the simulated data from the computer model is

$$Y(s) = \beta_0 + \beta_0(s) + \beta_1 + \beta_1(s)\, x(B) + e(s), \quad e(s) \overset{\text{i.i.d.}}{\sim} N(0, \tau),$$

with $B$ the grid cell containing $s$, and

$$X(B) = \mu + V(B) + \eta(B), \quad \eta(B) \overset{\text{i.i.d.}}{\sim} N(0, \sigma),$$

where $V(B)$ is a GP model with an ICAR structure (Rue and Held, 2005), and $\beta_0(s)$ is modeled as a mean-zero GP with exponential covariance structure.

A smoothed version is possible considering

$$Y(s) = \beta_0 + \beta_0(s) + \beta_1 (\mu + V(B)) + e(s), \quad e(s) \overset{\text{i.i.d.}}{\sim} N(0, \tau),$$

with $B$ the grid cell containing $s$.

The aim is to obtain the predictive distribution of $Y$ and its expected value at grid cell level.

They also considered a smoothed downscaler using spatially varying random weights and a space-time extension.
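To fix ideas about what a downscaler does, the following Python sketch fits a deliberately simplified, non-spatial version of the regression above by ordinary least squares. It assumes a constant intercept and slope, $Y(s) = \beta_0 + \beta_1 x(B) + e(s)$, and drops the Gaussian process terms and the ICAR smoothing entirely, so it is not the hierarchical model of Berrocal et al. The function names and the synthetic station data are hypothetical.

```python
import random


def fit_constant_downscaler(x_cell, y_station):
    """Least-squares fit of y = beta0 + beta1 * x (no spatially varying terms)."""
    n = len(x_cell)
    mean_x = sum(x_cell) / n
    mean_y = sum(y_station) / n
    sxx = sum((x - mean_x) ** 2 for x in x_cell)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_cell, y_station))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def downscale(x_cell, beta0, beta1):
    """Predicted station-level values from grid-cell simulator output."""
    return [beta0 + beta1 * x for x in x_cell]


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed
    # Synthetic grid-cell output at 51 stations and noisy station observations.
    x_cell = [rng.uniform(2.0, 20.0) for _ in range(51)]
    y_station = [1.5 + 0.8 * x + rng.gauss(0.0, 1.0) for x in x_cell]
    beta0, beta1 = fit_constant_downscaler(x_cell, y_station)
    print(beta0, beta1)
    print(downscale([10.0, 15.0], beta0, beta1))
```

The hierarchical versions replace the two constants by spatially varying quantities with Gaussian process priors, which is what allows the calibration to adapt locally.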

3. Calibration methods for bulk and tails

Pereira et al. (2019) develop a covariate-adjusted version of the quantile matching-based approach as in (1), where the distributions of simulated and real data change along a covariate. At the same time they suggest a regression method that simultaneously models the bulk and the (right) tail of the distributions involved using the extended Generalized Pareto distribution (EGPD) (Naveau et al., 2016) as a model for both the simulated and observed data.

Under fairly general conditions, according to the asymptotic theory of extremes, the generalized Pareto distribution (GPD) appears as a natural model for the right tail of a distribution, by focusing on the excesses over a high but fixed threshold. Here, the choice of this threshold plays a very important role in inference, ignoring the part of the [...] et al. (2004). The EGPD modelling strategy suggested by Naveau et al. (2016) avoids this selection problem, as we will see in the next section.

In what follows we propose an extension of this conditional quantile matching calibration for the bulk and tails to spatio-temporal data.

Naveau et al. (2016) EGPD models

Naveau et al. (2016) suggest an extension of the Generalized Pareto model tailored for both the bulk and the tails which, contrary to most methods for extremes, does not require a threshold to be selected. The objective of this extension is to generate a new class of distributions with GPD type tails consistent with extreme value theory, but also flexible enough to model efficiently the main bulk of the observed data without complicated threshold selection procedures.

Let $Y$ be a positive random variable with cumulative distribution function defined as

$$F_Y(y \mid \theta) = G\big(H(y \mid \xi, \sigma)\big),$$

where $G$ is a function obeying some general assumptions (see Naveau et al., 2016) and $H$ is the cumulative distribution function of a Generalized Pareto distribution (GPD), that is

$$H(y \mid \xi, \sigma) = \begin{cases} 1 - \left(1 + \dfrac{\xi}{\sigma}\, y\right)_+^{-1/\xi}, & \xi \neq 0, \\[2mm] 1 - \exp\left(-\dfrac{y}{\sigma}\right), & \xi = 0, \end{cases}$$

with $\sigma > 0$, and $y > 0$ if $\xi \geq 0$ and $y < -\sigma/\xi$ if $\xi < 0$. The parameter $\sigma$ is a dispersion parameter while $\xi$ is a shape parameter controlling the rate of decay of the right tail of a distribution (de Zea Bermudez and Kotz, 2010).

Naveau et al. (2016) consider four forms of $G(u)$ resulting in four different classes of distributions. Here we use one of the forms, namely $G(u) = u^{\kappa}$, where $\kappa$ is a parameter controlling the shape of the lower tail, although the theory can be easily extended to any of the other forms of the $G$ function.

Let us assume that both random variables $X$ and $Y$ are space-time dependent and we want to calibrate $X$ based on $Y$. As in (2), the calibrated data is given as

$$x^*(s,t) = F_{Y(s,t)}^{-1}\big(F_{X(s,t)}(x(s,t))\big).$$

Now assume further that both random variables are distributed as an EGPD with different parameters. In order to better accommodate the situation $\xi < 0$ we make the transformation $\delta = -\sigma/\xi$. Hence, for $\xi_x \neq 0$,

$$F_{X(s,t)}\big(x(s,t) \mid \delta_x(s,t), \xi_x, \kappa_x\big) = \left(1 - \left(1 - \frac{x(s,t)}{\delta_x(s,t)}\right)_+^{-1/\xi_x}\right)^{\kappa_x}, \tag{3}$$

and, for $\xi_y \neq 0$,

$$F_{Y(s,t)}\big(y(s,t) \mid \delta_y(s,t), \xi_y, \kappa_y\big) = \left(1 - \left(1 - \frac{y(s,t)}{\delta_y(s,t)}\right)_+^{-1/\xi_y}\right)^{\kappa_y}. \tag{4}$$

Although it is assumed that these random variables are conditionally independent, a dependence structure is introduced through the transformed space-time dependent parameters $\delta_x$, $\delta_y$ by modelling them as a function of a common random spatio-temporal process, in a Bayesian hierarchical modelling framework.

As an exemplification of this modelling strategy, in the next section we will build a Bayesian hierarchical model for the wind speed data.
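For a single site and time, the EGPD distribution function in (3) and (4) and its inverse have closed forms, so the calibration map can be written directly. The Python sketch below assumes $\xi < 0$ (a finite upper end-point $\delta$) and the form $G(u) = u^{\kappa}$; the function names and every numerical parameter value are hypothetical and chosen only for illustration. The inverse follows from solving $p = \big(1 - (1 - y/\delta)^{-1/\xi}\big)^{\kappa}$ for $y$, which gives $y = \delta\,\big(1 - (1 - p^{1/\kappa})^{-\xi}\big)$.

```python
def _check(delta, xi, kappa):
    """This sketch only covers the bounded-tail case used for the wind data."""
    if not (xi < 0.0 and delta > 0.0 and kappa > 0.0):
        raise ValueError("this sketch assumes xi < 0, delta > 0 and kappa > 0")


def egpd_cdf(y, delta, xi, kappa):
    """EGPD distribution function with G(u) = u**kappa and end-point delta."""
    _check(delta, xi, kappa)
    if y <= 0.0:
        return 0.0
    base = max(1.0 - y / delta, 0.0)   # the positive part (.)_+
    gpd = 1.0 - base ** (-1.0 / xi)    # GPD cdf H with sigma = -delta * xi
    return gpd ** kappa


def egpd_quantile(p, delta, xi, kappa):
    """Inverse of egpd_cdf for 0 <= p <= 1."""
    _check(delta, xi, kappa)
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    u = p ** (1.0 / kappa)                       # invert G
    return delta * (1.0 - (1.0 - u) ** (-xi))    # invert H


def calibrate(x, params_x, params_y):
    """x* = F_Y^{-1}(F_X(x)), with params = (delta, xi, kappa)."""
    return egpd_quantile(egpd_cdf(x, *params_x), *params_y)


if __name__ == "__main__":
    params_x = (30.0, -0.081, 8.0)   # hypothetical simulated-data parameters
    params_y = (35.0, -0.070, 4.0)   # hypothetical observed-data parameters
    for x in (5.0, 12.0, 20.0, 29.0):
        print(x, calibrate(x, params_x, params_y))
```

A simulated value at the end-point $\delta_x$ is mapped exactly to $\delta_y$, and identical parameters for $X$ and $Y$ return the input unchanged, in agreement with the remark after (1).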

4. Bayesian hierarchical model for the wind speed data

A preliminary data analysis of the wind speed data used in this study shows that observed and simulated data are consistent with the case $\xi < 0$ and hence the distributions for $X$ and $Y$ will have an end-point characterized by the respective parameter $\delta$.

We assume that the observed data $\{Y(s_i, t_j),\ i = 1, \ldots, N;\ j = 1, \ldots, T\}$, with $N$ the number of stations with observed data in the study period and $T$ the length of the time period, follow a distribution as in (4), where

$$\delta_y(i,j) \sim \mathrm{Exp}(\lambda_y(i,j)), \quad \delta_y(i,j) > \max(y),$$

i.e., $\delta_y(i,j)$ follows a shifted exponential distribution, with

$$\log(\lambda_y(i,j)) = \beta_y + W(s_i) + Z(t_j),$$

and $W$ follows a multivariate Gaussian process defined on the space, $W \sim MVN(0, \tau_W \Sigma_W)$. The matrix $\Sigma_W$ has diagonal elements equal to 1 and off-diagonal elements $\Sigma_{i\ell} = f(d_{i\ell}; \alpha)$, where $f(\cdot\,; \cdot)$ is a function of $d_{i\ell}$, the centroids' distance between every two stations $s_i$ and $s_\ell$, and $\alpha$ is a parameter representing the radius of the 'disc' centred at each $s$. The parameter $\tau_W$ is a precision parameter. For the temporal random process we assume a random walk process of order 1, $Z \sim MVN(0, \tau_Z \Sigma_Z)$, where $\tau_Z$ is a precision parameter and $\Sigma_Z$ is a matrix with a structure reflecting the fact that any two increments $z_i - z_{i-1}$ are independent (Rue and Held, 2005).

We assume, as well, that the simulated data $\{X(s_i, t_j),\ i = 1, \ldots, N_s;\ j = 1, \ldots, T\}$ follow a distribution as in (3), with $N_s$ the total number of stations, where the model for $\delta$ shares the same latent processes $W$ and $Z$ with the model for the observed data, such that

$$\delta_x(i,j) \sim \mathrm{Exp}(\lambda_x(i,j)), \quad \delta_x(i,j) > \max(x),$$

with

$$\log(\lambda_x(i,j)) = \beta_x + W(s_i) + Z(t_j).$$

To complete the Bayesian hierarchical model we consider the following prior specification for the parameters and hyperparameters of the models:

$$\beta_y, \beta_x \overset{\text{i.i.d.}}{\sim} N(0, [\ldots]), \quad \kappa_y, \kappa_x \overset{\text{i.i.d.}}{\sim} \mathrm{Ga}(0.[\ldots], [\ldots]), \quad \xi_y, \xi_x \overset{\text{i.i.d.}}{\sim} U(-[\ldots], [\ldots]),$$
$$\tau_W, \tau_Z \overset{\text{i.i.d.}}{\sim} \mathrm{Ga}(1, [\ldots]), \quad \alpha \sim U(0.[\ldots], [\ldots]).$$

Finally, the calibrated values are obtained as the mean of the predictive distribution of $F_Y^{-1}(F_X(x(s_i, t_j)))$ at $s_i$, $i = 1, \ldots, N_s$, and time $t_j$, $j = 1, \ldots, T$.
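The last step, taking the mean of the predictive distribution of $F_Y^{-1}(F_X(x))$, amounts to averaging the calibration map over posterior draws of the parameters. The Python sketch below illustrates this for one station and day, under several assumptions that are not taken from the fitted model: $\mathrm{Exp}(\lambda)$ is read as an exponential with rate $\lambda$, the "posterior draws" are generated artificially rather than by MCMC, the latent effects $W$ and $Z$ are drawn as independent normals, and the values of $\kappa$, the lower bounds for $\delta$ and the function names are hypothetical. Only the values used for $\beta$ and $\xi$ are the posterior means reported in Table 1.

```python
import math
import random


def egpd_cdf(y, delta, xi, kappa):
    """EGPD cdf with G(u) = u**kappa, assuming xi < 0 and end-point delta > 0."""
    if y <= 0.0:
        return 0.0
    base = max(1.0 - y / delta, 0.0)
    return (1.0 - base ** (-1.0 / xi)) ** kappa


def egpd_quantile(p, delta, xi, kappa):
    """Inverse of egpd_cdf for 0 <= p <= 1."""
    return delta * (1.0 - (1.0 - p ** (1.0 / kappa)) ** (-xi))


def shifted_exponential(rate, lower, rng):
    """Draw delta = lower + Exp(rate), so that delta is not below `lower`."""
    return lower + rng.expovariate(rate)


def predictive_mean_calibrated(x, draws):
    """Average F_Y^{-1}(F_X(x)) over a list of parameter draws (dicts)."""
    values = [
        egpd_quantile(
            egpd_cdf(x, d["delta_x"], d["xi_x"], d["kappa_x"]),
            d["delta_y"], d["xi_y"], d["kappa_y"],
        )
        for d in draws
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(59)  # arbitrary seed
    draws = []
    for _ in range(500):
        # Shared latent effects induce dependence between delta_x and delta_y.
        w = rng.gauss(0.0, 0.3)   # hypothetical spatial effect W(s_i)
        z = rng.gauss(0.0, 0.3)   # hypothetical temporal effect Z(t_j)
        draws.append({
            "delta_x": shifted_exponential(math.exp(-0.854 + w + z), 30.0, rng),
            "delta_y": shifted_exponential(math.exp(-1.094 + w + z), 30.0, rng),
            "xi_x": -0.081, "xi_y": -0.070,
            "kappa_x": 8.0, "kappa_y": 4.0,   # hypothetical values
        })
    print(predictive_mean_calibrated(20.0, draws))
```

In an actual analysis the list `draws` would hold the MCMC samples of $(\delta_x, \xi_x, \kappa_x, \delta_y, \xi_y, \kappa_y)$ for the station and day in question, and the constraint $\delta_x > \max(x)$ guarantees that every simulated value lies below the end-point in each draw.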

5. Application to wind speed data

We used observed and simulated wind speed data from the period 01/01/2013 to 28/02/2013, so $T = 59$. There are $N = 51$ stations where we have both observed and simulated daily maximum wind speeds. Additionally, we have 66 extra stations with simulated values for the maximum daily wind speeds, so that $N_s = 117$. In Figure 2 we depict the median of observed and simulated wind speeds for the 51 stations together with the 2.5% and 97.5% empirical quantiles (95% IQR).

Figure 2. Median of observed and simulated wind speeds for the 51 stations, and the 95% IQR wind speeds by station (dashed lines). Axes: stations, wind speed.

The model was implemented in R2OpenBUGS (Sturtz et al., 2005). In Table 1 we show the summary statistics for the marginal posterior distributions of the parameters of the model.

Table 1. Summary statistics for the marginal posterior distributions

| Parameter | mean | standard deviation | 2.5% quantile | median | 97.5% quantile | min | max |
|---|---|---|---|---|---|---|---|
| α | … | … | … | … | … | … | … |
| β_y | -1.094 | 0.149 | -1.376 | -1.093 | -0.806 | -1.541 | -0.595 |
| β_x | -0.854 | 0.134 | -1.105 | -0.849 | -0.598 | -1.243 | -0.365 |
| κ_y | … | … | … | … | … | … | … |
| κ_x | … | … | … | … | … | … | … |
| τ_W | … | … | … | … | … | … | … |
| τ_Z | … | … | … | … | … | … | … |
| ξ_y | -0.070 | 0.002 | -0.074 | -0.070 | -0.067 | -0.077 | -0.065 |
| ξ_x | -0.081 | 0.001 | -0.084 | -0.081 | -0.078 | -0.085 | -0.076 |

We observe that the posterior mean of $\kappa_y$ is much smaller than the posterior mean of $\kappa_x$, which is consistent with the fact that, in general, simulated data are shifted to the right in relation to the observed data, indicating the possible existence of some bias in the simulated data. The posterior means of the precision (inverse of the variance) parameters for the space model ($\tau_W$) and for the temporal model ($\tau_Z$) suggest that time dependence is stronger than space dependence. The posterior mean for $\beta_y$ is slightly smaller than the posterior mean for $\beta_x$. This naturally contributes to higher values for $\sigma_y(i,j)$ relative to $\sigma_x(i,j)$, with greater dispersion, as can be seen in Figure 3, where we show daily boxplots of the posterior means of the parameters $\sigma(i,j)$, $\forall j$, for both models. Two dates are marked in that figure: 19 January, a day on which a storm with heavy winds was observed (storm GONG, maximum observed wind 29.6 m/s), particularly in regions close to the coast, and 14 February, a very mild day all over the country (Valentine's day; maximum observed wind 8.20 m/s). The variation observed along the days is consistent with the fact that on windy days the maximum wind speed varies much more across the stations than on mild days. The temporal dependence is also clear in these pictures.

Figure 3. Boxplot of the posterior means of $\sigma_y(i,j)$ (left) and $\sigma_x(i,j)$ (right). Axes: days, σ.

These two days were studied, in particular, to exemplify the conditional quantile calibration method proposed. To illustrate the results, Figures 4 and 5 show, on the left, a kernel density estimate (considering all the stations) for the observed and simulated maximum wind speed on that day, together with the mean of the predictive distribution of the calibrated data as defined in (2). On the right side we represent the observed and simulated maximum wind speed on that day for each station, together with the mean of the predictive distribution for the calibrated data.

We observe that, on a storm day (Figure 4), the observed winds have longer tails than the simulated winds. The calibration method was able to capture both tails of the distribution for the observed data, although it shifted the bulk of the distribution to the left. On a mild day (Figure 5), the distribution of the simulated data is shifted to the right relative to the distribution of the observed data, with longer tails, as was observed in a preliminary study. This bias is corrected by the calibration method.

In Figures 6 and 7 there is a spatial representation of the observed, simulated and calibrated values for each of these two days.

Figure 4. Kernel density estimation (left), observed and simulated maximum wind speed for each station, together with the mean of the predictive distribution for the calibrated data, for a storm day (storm GONG).

Figure 5. Kernel density estimation (left), observed and simulated maximum wind speed for each station, together with the mean of the predictive distribution for the calibrated data, for a mild day.

Figure 6. Storm day: observed, simulated and calibrated maximum wind speeds.

Figure 7. Mild day: observed, simulated and calibrated maximum wind speeds.

6. Discussion and further extensions

In this article we discussed several possible ways of calibrating data obtained from a simulator based on observations at stations. We also proposed and implemented a conditional quantile matching calibration (CQCM) using a space-time extended generalized Pareto distribution.

The performance of the CQCM method was exemplified with two specific days, a storm day and a mild day. In both cases the calibrated data matched the observed data well in the tails, although on the storm day it did not capture the bulk of the distribution well. Ideally this method should be extended to the grid level, since the simulator produces data at a fine grid level and this is much more interesting if the objective is the construction of a risk map. However, this extension is not trivial and some assumptions regarding the model structure have to be made.

Damage to the electricity grid is basically governed by extreme winds, and simulated and observed data differ primarily in the right tail. Hence adequate calibration methods must be specifically adapted to extreme observations coming from the right tails, and the methods and models to be used in calibration should ideally be compatible with extreme value theory. A range of approaches for characterising the extremal behaviour of spatial processes have been suggested, and a brief comparison of these methods can be found in Tawn et al. (2018). The downscaling method described by Towe et al. (2017), based on the conditional extremes process, is more suitable, with adequate modifications, to calibrate extreme simulated data based on observed wind speeds. Work on this approach is in progress.

Acknowledgments

Research partially financed by national funds through FCT - Fundação para a Ciência e a Tecnologia, Portugal, under the projects PTDC/MAT-STA/28649/2017 and UIDB/00006/2020.

References

Aitchison, J. and Dunsmore, I. R. (1975). Statistical Prediction Analysis. Cambridge: Cambridge University Press.

Beirlant, J., Goegebeur, Y., Segers, J. and Teugels, J. (2004). Statistics of Extremes: Theory and Applications. J. Wiley, Chichester.

Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: Chapman and Hall.

Berrocal, V. J. (2019). Data assimilation. In Gelfand, A. E., Fuentes, M., Hoeting, J. A. and Smith, R. L., Handbook of Environmental and Ecological Statistics, 133 – 151, Chapman and Hall/CRC.

Berrocal, V. J., Gelfand, A. E. and Holland, D. M. (2012). Space-time data fusion under error in computer model output: An application to modeling air quality. Biometrics, 837 – 848.

Berrocal, V. J., Gelfand, A. E. and Holland, D. M. (2014). Assessing Exceedance of Ozone Standards: A Space Time Downscaler for Fourth Highest Ozone Concentrations. Environmetrics, 279 – 291.

Cardoso, R. M., Soares, P. M. M., Miranda, P. M. A. and Belo-Pereira, M. (2013). WRF High Resolution Simulation of Iberian Mean and Extreme Precipitation Climate. International Journal of Climatology, 2591 – 2608.

De Zea Bermudez, P. and Kotz, S. (2010). Parameter estimation of the generalized Pareto distribution - Part I. J. Stat. Plan. Inference, 1353 – 1373.

Foley, K. M. and Fuentes, M. (2008). A Statistical Framework to Combine Multivariate Spatial Data and Physical Models for Hurricane Surface Wind Prediction. Journal of Agricultural, Biological, and Environmental Statistics, 37 – 59.

Fuentes, M. and Raftery, A. E. (2005). Model evaluation and spatial interpolation by Bayesian combination of observations with outputs from numerical models. Biometrics, 36 – 45.

Kennedy, M. and O'Hagan, A. (2001). Bayesian Calibration of Computer Models. Journal of the Royal Statistical Society, Series B, 425 – 464.

Kalnay (2003). Atmospheric modeling, data assimilation and predictability. Cambridge University Press.

Kang, P., Koo, C. and Roh, H. (2017). Reversed inverse regression for the univariate linear calibration and its statistical properties derived using a new methodology. International Journal of Metrology and Quality Engineering.

Lavagnini, I. and Magno, F. (2007). A statistical overview on univariate calibration, inverse regression, and detection limits: application to gas chromatography/mass spectrometry technique. Mass Spectrometry Techniques, 1 – 18.

McMillan, N. J., Holland, D. M., Morara, M. and Feng, J. (2010). Combining numerical model output and particulate data using Bayesian space-time modeling. Environmetrics, 48 – 65.

Michelangeli, P. A., Vrac, M. and Loukos, H. (2009). Probabilistic downscaling approaches: Application to wind cumulative distribution functions. Geophys. Res. Lett., 1 – 6.

Muehleisen, R. T. and Bergerson, J. (2016). Bayesian Calibration - What, Why And How. International High Performance Buildings Conference. Paper 167.

Naveau, P., Huser, R., Ribereau, P. and Hannart, A. (2016). Modeling jointly low, moderate, and heavy rainfall intensities without a threshold selection. Water Resources Research, 2753 – 2769.

Osborne, C. (1991). Statistical Calibration: A Review. International Statistical Review / Revue Internationale de Statistique, 309 – 336.

Pereira, S., Pereira, P., de Carvalho, M. and de Zea Bermudez, P. (2019). Calibration of extreme values of simulated and real data. Proceedings of International Workshop on Statistical Modelling 2019.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability, vol. 104. Chapman & Hall: London.

Racine-Poon, A. (1988). A Bayesian Approach to non-linear calibration problems. JASA, 650 – 656.

Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Barker, D. M., Duda, M. G., Huang, X. Y., Wang, W. and Powers, J. G. (2008). A description of the advanced research WRF version 3. NCAR Tech. Note TN-475_STR.

Sigrist, F., Künsch, H. R. and Stahel, W. A. (2015). Stochastic partial differential equation based modelling of large space-time data sets. Journal of the Royal Statistical Society, Series B, 3 – 33.

Sturtz, S., Ligges, U. and Gelman, A. (2005). R2WinBUGS: A Package for Running WinBUGS from R. Journal of Statistical Software, 1 – 16.

Towe, R. P., Sherlock, E. F., Tawn, J. A. and Jonathan, P. (2017). Statistical downscaling for future extreme wave heights in the North Sea. Annals of Applied Statistics, 2375 – 2403.

Wilkinson, R. D. (2010). Bayesian Calibration of expensive multivariate computer experiments. In Computational Methods for Large scale inverse problems and quantification of uncertainty. J. Wiley and Sons.

Zidek, J. V., Le, N. D. and Liu, Z. (2012). Combining data and simulated data for space-time fields: Application to ozone. Environmental and Ecological Statistics, 19