Modeling Spatial Dependence with Cauchy Convolution Processes
aa r X i v : . [ s t a t . M E ] F e b Modeling Spatial Data with CauchyConvolution Processes
Pavel Krupskii , Rapha¨el Huser February 16, 2021
Abstract
We study the class of models for spatial data obtained from Cauchy convolution processesbased on different types of kernel functions. We show that the resulting spatial processes have someappealing tail dependence properties, such as tail dependence at short distances and independence atlong distances with suitable kernel functions. We derive the extreme-value limits of these processes,study their smoothness properties, and consider some interesting special cases, including Marshall–Olkin and H¨usler–Reiss processes. We further consider mixtures between such Cauchy processesand Gaussian processes, in order to have a separate control over the bulk and the tail dependencebehaviors. Our proposed approach for estimating model parameters relies on matching model-basedand empirical summary statistics, while the corresponding extreme-value limit models may be fittedusing a pairwise likelihood approach. We show with a simulation study that our proposed inferenceapproach yields accurate estimates. Moreover, the proposed class of models allows for a wide rangeof flexible dependence structures, and we demonstrate our new methodology by application to atemperature dataset. Our results indicate that our proposed model provides a very good fit to thedata, and that it captures both the bulk and the tail dependence structures accurately.
Keywords:
Copula; Extreme-value model; Kernel convolution process; Short-range spatial depen-dence; Spatial process; Tail dependence University of Melbourne, Parkville, Victoria, 3010, Australia. E-mail: [email protected]. Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah Universityof Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.. E-mail: [email protected].
Introduction
Assessment of environmental risk associated with unprecedented air or sea temperatures (Davi-son and Gholamrezaee, 2012; Huser and Genton, 2016; Hazra and Huser, 2020), extreme flooding(Thibaud et al., 2013; Huser and Davison, 2014; Castro-Camilo and Huser, 2020), strong wind gusts(Engelke et al., 2015; Oesting et al., 2017; Huser et al., 2017), or high air pollution levels (Vettoriet al., 2019, 2020) requires the computation of joint tail probabilities. Because the process of inter-est is always observed at a finite set of monitoring sites, spatial modeling is needed whenever therequired probabilities involve one or more unobserved locations, and the assumed tail dependencestructure plays a crucial role in estimating these risks.Gaussian processes have been widely used in the literature to model spatio-temporal dependence,because they are computationally convenient and they are parameterized using various types of co-variance functions that can capture features such as the dependence range and the smoothness ofrealizations (see, e.g., Gneiting, 2002, and Gneiting et al., 2007). However, Gaussian models haverestrictive symmetries and cannot capture strong tail dependence that is often found in spatial data,which makes them unsuitable when the main interest lies in the tails. More flexible yet computation-ally feasible spatial models are required. To circumvent the limitations of multivariate normality,copula models have become increasingly popular and have found a wide range of environmentalapplications in geology (Gr¨aler and Pebesma, 2011), hydrology (B´ardossy, 2006; B´ardossy and Li,2008) and climatology (Erhardt et al., 2015), among others. A copula is simply defined as a multi-variate cumulative distribution function (CDF) with standard uniform Unif(0 ,
1) marginals. Sklar(1959) showed that for any continuous d -dimensional CDF F with marginals F , . . . , F d there existsa unique copula C such that F ( z , . . . , z d ) = C { F ( z ) , . . . , F d ( z d ) } . A random vector ( Z , Z ) ⊤ with margins F , F and copula C is said to be upper tail-dependent if the limit λ U = lim u → Pr { Z > F − ( u ) | Z > F − ( u ) } = lim u → − u + C ( u, u )1 − u , (1)1xists and is positive, i.e., λ U >
0, and is upper tail-independent if λ U = 0. An analogous definitionholds for the lower tail based on the coefficient λ L = lim u → Pr { Z ≤ F − ( u ) | Z ≤ F − ( u ) } =lim u → C ( u, u ) /u . A two-dimensional copula C is said to be tail-symmetric if λ L = λ U and isreflection-symmetric if C = C R , where C R ( u , u ) := − u + u + C (1 − u , − u ) is thereflected copula. While multivariate normal vectors are always reflection-symmetric, as well astail-independent when the correlation is less than one (Ledford and Tawn, 1996), other copulamodels can be used to construct flexible multivariate distributions with arbitrary marginals andvarious tail dependence structures. Certain copula families, such as vine models (Aas et al., 2009;Kurowicka and Joe, 2011), are very flexible but lack interpretability with spatial data. By contrast,factor copula models (Krupskii and Joe, 2013) have been proposed as flexible models capturingnon-Gaussian features like reflection and/or tail asymmetry and strong tail dependence, and theycan be naturally extended to the spatial context; see Krupskii et al. (2018) and Krupskii and Genton(2019). However, since these models are built from common underlying random factors affecting allspatial sites simultaneously, they are unable to capture full independence at large distances. Hence,these models are only suitable for spatial data observed over small homogeneous spatial regions.To accurately model the data’s tail behavior, an alternative approach might be to rely on modelsjustified by Extreme-Value Theory; see Davison and Huser (2015) for a general review on statisticsof extremes, and Davison et al. (2012), Davison et al. (2019) and Huser and Wadsworth (2020)for reviews on spatial extremes. Classical extreme-value models, such as max-stable processes—characterized by extreme-value copulas—and generalized Pareto processes, stem from asymptotictheory for block maxima and high threshold exceedances, respectively. However, despite theirpopularity, these extremal models suffer from several drawbacks: first, these limit models cannotcapture weakening of dependence for increasing quantile levels. In particular, with Pareto processes,the conditional exceedance probability { − u + C ( u, u ) } / (1 − u ) that appears in (1) is constantacross all quantiles u ∈ [0 ,
1] (Rootz´en et al., 2018), while with extreme-value copulas, one has2 k ( u /k , u /k ) = C ( u , u ) for all ( u , u ) ⊤ ∈ [0 , , k = 1 , , . . . . While these strong restrictionson the form of the copula C are indeed justified asymptotically, they may not be satisfied atfinite levels (always considered in finite samples), and this has major implications in practice forassessing the risk of simultaneous extremes over a spatial region. Several recent studies have indeedshown that environmental extreme events are often found to be more spatially localized when theyare more extreme (Huser and Wadsworth, 2019), a property that these asymptotic extreme-valuemodels cannot capture. Second, a consequence of these stability properties is that these extreme-value models are always tail-dependent unless they are exactly independent. As a result, non-trivialextreme-value models cannot capture independence at large distances, which makes them unsuitableover large spatial domains similarly to factor copula models. Thirdly, these models have complicatedlikelihood functions that are costly to evaluate for inference (Padoan et al., 2010; Castruccio et al.,2016; de Fondeville and Davison, 2018; Huser et al., 2019), though recent progress on graphicalmodels for Pareto processes opens the door to higher-dimensional likelihood inference (Engelke andHitz, 2020). Finally, because these models describe the limiting behavior of extreme events, theyare typically fitted using only a fraction of observations, thus wasting a lot of information that ispotentially useful for accurate estimation of unknown model parameters.To circumvent some of the limitations of classical extreme-value models, recent work has focusedon the development of “sub-asymptotic” models for extremes that provide additional tail flexibil-ity at finite levels, and that can smoothly bridge both tail dependence classes under the sameparametrization; see, e.g., Wadsworth and Tawn (2012), Wadsworth et al. (2016), Hua (2017), Suand Hua (2017), Huser et al. (2017), Huser et al. (2019), Bopp et al. (2020), Huser et al. (2020), andthe recent review paper Huser and Wadsworth (2020). More recently, an alternative approach basedon single-site conditioning has been proposed by Wadsworth and Tawn (2019) to flexibly capturevarious forms of tail dependence structures, although the proposed model does not possess a con-venient unconditional formulation. However, except for the max-mixture model of Wadsworth and3awn (2012) and the conditional extremes model of Wadsworth and Tawn (2019), sub-asymptoticmodels proposed in the literature cannot capture changes in the tail dependence class as a functionof distance between sites. In other words, while it is reasonable to expect that strong tail depen-dence prevails at short distances and tail independence at larger distances, most proposed modelsin the literature assume either tail independence or tail dependence between all pairs of sites at anydistance, and do not capture full independence as the distance between sites increases arbitrarily.In this paper, we address these shortcomings by considering process convolutions of the form Z ( s ) = Z R q k ( s , s ∗ ) W (d s ∗ ) , s ∈ R q , (2)where k ( s , s ∗ ) ≥ R R q k ( s , s ∗ )d s ∗ < ∞ for all s ∈ R q ) and W is a particular L´evy process (Sato, 1999) with independent increments. Forsimplicity, we hereafter assume that q = 2 (unless specified otherwise), although most of our resultsand models can easily be extended to the case q = 1 or q >
2. Process convolutions have been usedextensively to model spatial data; see for example Higdon (2002), Paciorek and Schervish (2006),Calder and Cressie (2007) and Zhu and Wu (2010). However, the marginal CDF of W ( s ), F W , isusually assumed to be Gaussian, thus leading to a Gaussian process Z ( s ) in (2), which does not havetail dependence. Trans-Gaussian processes Z ∗ ( s ) = t { Z ( s ) } , where t ( · ) is a monotone increasingtransformation, could be used to model spatial data with non-Gaussian marginals (Bousset et al.,2015), but the process Z ∗ ( s ) still possesses the Gaussian copula and thus has the same restrictivedependence structure. More flexible tail structures can be obtained by considering non-Gaussiandistributions for F W in (2) (J´onsd´ottir et al., 2013; Noven et al., 2018), and Opitz (2017) studiedthe dependence properties of the resulting process for an indicator kernel k ( s , s ∗ ) = { s ∗ ∈ A ( s ) } defined in terms of a hypograph indicator set A ( s ). In this paper, we consider instead general classesof kernels in (2) but assume that F W is the Cauchy CDF. Because the Cauchy distribution is stable,the process Z ( s ) in (2) remains Cauchy, which facilitates inference and theoretical calculations, and4hanks to the heavy-tailedness of F W , we will see that the resulting copula can have interesting taildependence structures depending on the choice of the kernel k . While it would also be interestingto consider other types of (potentially skewed) stable distributions for F W , we here focus on theCauchy family, which yields tractable inference and already provides a fairly rich class of models. Inthis paper, we study the dependence properties of these Cauchy convolution process models undergeneral forms of kernel functions k , and we derive their limiting extreme-value copulas, which turnout to be characterized in terms of a moving maximum representation. We show that a wide classof extremal dependence structures can be obtained, recovering some popular existing models andoffering some new models, as well. In particular, when the kernel function is compactly supported,the resulting process Z ( s ) has the appealing property of local tail dependence, in the sense that itpossesses strong tail dependence at short distances only and full independence at larger distances.To make our Cauchy model even more flexible in the bulk of the distribution, we further extend itby mixing the Cauchy process with a ligther-tailed Gaussian process, so that we retain the sameextremal dependence structure while capturing (non-trivial) tail independence at large distances.Furthermore, we provide some theoretical results on the spatial dependence strength of the resultingextreme-value copulas at short distances, providing new insights into their “smoothness” properties.To make inference for the Cauchy process convolution model (2) or its more flexible mixtureextension efficiently, we develop a fast estimation approach that consists in matching suitable em-pirical and model-based summary statistics. Compared to likelihood-based inference, this approachallows us to easily fit our models in higher dimensions. Unlike most extreme-value inference meth-ods, which typically rely on extreme data from one tail only and discard all the other observations,we opt here for fitting the proposed model to the whole dataset (i.e., without applying any kindof censoring) for several reasons: first and foremost, we are not interested in modeling extremesonly, but the whole distribution, as joint moderately large events from the “bulk” may be as crucialin practice for risk assessment as severe individual extremes; second, our approach based on the5omplete dataset makes full use of the available information, thus getting more accurate parameterestimates; and lastly, our proposed model is highly flexible in the bulk and the tails, so it cangenerally provide a good overall fit, without compromising any part of the distribution. For com-pleteness, we also discuss how the corresponding extreme-value copula may be fitted by pairwiselikelihood.The rest of the paper is organized as follows. In Section 2, we define our proposed Cauchyconvolution model in more detail, study its dependence properties and tail behavior, we derivesome interesting special cases, and we discuss approximate simulation algorithms for the Cauchyconvolution process itself and its extreme-value limits. We also study our proposed process mixture,and explore its improved flexibility. In Section 3, we describe our proposed inference approach, whileIn Section 4, we report the results of a simulation study, and we illustrate the proposed methodologyby application to a temperature dataset from the state of Oklahoma, US. Section 5 concludes witha discussion and some perspective on future research. All proofs are deferred to the SupplementaryMaterial. We consider the process convolution (2), where W is a Cauchy white noise process, i.e., suchthat W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ) are independent and identically distributed (i.i.d.) increments,where Cauchy( c ) is the Cauchy distribution with scale parameter c and probability density function(pdf) f C ( z ; c ) = π − c/ ( z + c ), z ∈ R . The finite-dimensional distributions of the Cauchy processconvolution Z ( s ) in (2) are not tractable in the general case. However, it is possible to derivethe extreme-value (EV) limit of this process as Proposition 1 below shows. Before stating thisresult, we first recall some fundamentals about extreme-value theory. Let X = ( X , . . . , X d ) ⊤ bea d -dimensional random vector with margins F , . . . , F d and copula C . The copula C n describes6he copula of the vector of componentwise maxima from i.i.d. copies X i = ( X i , . . . , X id ) ⊤ of X , i = 1 , . . . , n , i.e., M n = ( M n , . . . , M nd ) ⊤ with M nj = max( X j , . . . , X nj ). Extreme-value copulas,denoted C EV , describe the class of dependence structures that arise as limits of M n (when properlyrenormalized), i.e., C EV ( u , . . . , u d ) = lim n →∞ C n ( u /n , . . . , u /nd ) , ( u , . . . , u d ) ⊤ ∈ [0 , d . (3)It can be shown that extreme-value copulas are such that for any k = 1 , , . . . , one has C EV ( u , . . . , u d ) = C k EV ( u /k , . . . , u /kd ), ( u , . . . , u d ) ⊤ ∈ [0 , d , and they can be characterized as C EV ( u , . . . , u d ) = exp {− ℓ ( − log u , . . . , − log u d ) } , ( u , . . . , u d ) ⊤ ∈ [0 , d , (4)where ℓ is called the stable (upper) tail dependence function and completely determines the limitingextremal dependence structure in the upper tail. From (3) and (4), the stable tail dependencefunction can be expressed as the limit ℓ ( w , . . . , w d ) = lim n →∞ n { − C (1 − w /n, . . . , − w d /n ) } ,and it lies between the bounds max( w , . . . , w d ) ≤ ℓ ( w , . . . , w d ) ≤ P dj =1 w j , w , . . . , w d ≥
0, corre-sponding to perfect dependence and independence, respectively. Therefore, extreme-value copulascannot be negatively dependent. As Cauchy processes are reflection-symmetric, their extremal de-pendence structures are identical in both tails, so hereafter we shall simply refer to ℓ as the stabletail dependence function . More details on extreme-value theory, copula models, and their propertiescan be found, e.g., in Gudendorf and Segers (2010), Segers (2012), and Davison and Huser (2015). Proposition 1
Consider the Cauchy process convolution defined as in (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ). For any collection of sites s , . . . , s d ∈ R , we write Z = Z ( s ) , . . . , Z d = Z ( s d ).Assume that k ( s , s ∗ ) ≤ k max < ∞ is a nonnegative bounded integrable kernel function. Let ℓ ( w , . . . , w d ) : [0 , ∞ ) d [0 , ∞ ) be the stable tail dependence function of the random vector( Z , . . . , Z d ) ⊤ , then ℓ ( w , . . . , w d ) = Z R max j =1 ,...,d w j ζ ( s j , s ∗ )d s ∗ , ζ ( s j , s ) = k ( s j , s ) R R k ( s j , s ∗ )d s ∗ . Z EV ( s ) = sup i =1 , ,... ξ i ζ ( s , s ∗ i ) , ζ ( s , s ∗ ) = k ( s , s ∗ ) R R k ( s , s ∗ )d s ∗ , (5)where { ( ξ i , s ∗ i ) } are points from a Poisson process on (0 , ∞ ) × R with intensity ξ − d ξ × d s ∗ . Thelimit process in (5) is max-stable and has unit Fr´echet margins, i.e., Pr { Z EV ( s ) ≤ z } = exp( − /z ), z >
0. A prominent example is the model introduced by Smith (1990), where the kernel has theshape of a Gaussian density, but the class is much wider than this specific example. We also notethat Fasen (2005) obtained a result similar to Proposition 1 for continuous-time mixed movingaverage processes. Similarly, Rootz´en (1978) studied the tail properties of stable moving averageprocesses and established continuity of sample paths of these processes.The Smith (1990) model is known to have very smooth sample paths; see, e.g., Davison et al.(2019). Although this is already quite clear from the stochastic representation (5) and from spatialrealizations, we now show more formally that the extreme-value limits of Cauchy processes of theform (2) are indeed “smooth” in a certain sense, and therefore that this process is not suitable formodeling spatial data with rough extremal dependence. “Smoothness” of realizations is determinedby the form of dependence at short distances, and the next proposition precisely details the behaviorof the stable tail dependence function for two variables from the limiting extreme-value process thatare located close to each other.
Proposition 2
Suppose that the assumptions of Proposition 1 hold. Moreover, assume that thekernel function in (2) may be written as k ( s , s ∗ ) = g ( k s − s ∗ k ), where g is an integrable nonnegativemonotonically decreasing function. Then, for any sites s , s ∈ R , the stable tail dependencefunction ℓ ( w , w ) of { Z ( s ) , Z ( s ) } ⊤ satisfies ℓ ( w , w ) ≤ { Kδ + o ( δ ) } max( w , w ), as δ :=8 s − s k →
0, where K is some constant that does not depend on w and w . Furthermore, wecan select K such that ℓ (1 ,
1) = 1 + Kδ + o ( δ ).Let Z EV ( s ) be the extreme-value limit (5) of the Cauchy convolution process Z ( s ) defined as in(2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ), which is characterized by the stable tail dependence function ℓ given in Proposition 1. To summarize the dependence structure in a spatial process, it is common toconsider measures of association such as Spearman’s rank correlation coefficient, S ρ ( δ ), or the uppertail dependence coefficient defined in (1), λ U ( δ ), here expressed as a function of the spatial distance δ between two sites. As Cauchy processes are reflection symmetric, we have λ U ( δ ) = λ L ( δ ), so we shallsimply write λ ( δ ) to denote the coefficient of (lower or upper) tail dependence. Recall that while λ ( δ ) is informative about the strength of tail dependence (and is thus identical for Z ( s ) and Z EV ( s )),while S ρ ( δ ) mostly controls dependence in the bulk (and generally differs for Z ( s ) and Z EV ( s )). If( Y , Y ) ⊤ is a random vector distributed according to a joint distribution with continuous margins F , F and underlying copula C , then Spearman’s rank correlation is defined as corr { F ( Y ) , F ( Y ) } and may be equivalently expressed in terms of the copula C as 12 RR [0 , C ( u , u )d u d u −
3. Thenext corollary exploits Proposition 2 to show that the coefficients λ ( δ ) and S ρ ( δ ) correspondingto the extreme-value copula C EV stemming from the limiting extreme-value process Z EV ( s ) haveindeed a quite restrictive behavior at the origin, i.e., for small distances δ ≈ Corollary 1
Under the assumptions of Proposition 2, the limiting extreme-value process Z EV ( s )of the Cauchy convolution process Z ( s ) (defined as in (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ )) hasdependence coefficients satisfying λ ( δ ) = 1 − Kδ + o ( δ ) and S ρ ( δ ) > − Kδ , as δ → S ρ ( δ ) = 1 + O ( δ α ),or λ ( δ ) = 1 + O ( δ α ), with α <
1. In Section 2.2, we describe some specific examples to illustratethis property, and in Section 2.3, we show how to extend Cauchy convolution processes in order to9apture the sub-asymptotic dependence structure more flexibly using mixtures.As already seen, the shape of the kernel k in (2) is crucial as it determines the extremal de-pendence structure of Cauchy convolution processes. In the next corollary, we further show thatthe support of the bivariate extreme-value copula C EV may not be the whole unit square [0 , depending on the kernel. This result may be used to guide the selection of a suitable kernel at apreliminary modeling stage, in order to avoid unreasonable joint behaviors. Corollary 2
Assume that the assumptions of Proposition 2 hold, and that G ( δ ) = max ( s ∗ ,s ∗ ) ⊤ ∈S ∪ + ( δ ) g ( k ( s ∗ + δ, s ∗ ) ⊤ k ) /g ( k ( s ∗ , s ∗ ) ⊤ k ) < ∞ , where S ∪ + ( δ ) := { ( s ∗ , s ∗ ) ∈ R : g ( k ( s ∗ , s ∗ ) ⊤ k ) > g ( k ( s ∗ + δ, s ∗ ) ⊤ k ) > } , with theconvention that x/ ∞ for all x >
0. Then, ℓ ( w , w ) = w for all ( w , w ) ⊤ ∈ [0 , ∞ ) with w /w > G ( δ ) ≥
1. Similarly, ℓ ( w , w ) = w for all ( w , w ) ⊤ ∈ [0 , ∞ ) with w /w > G ( δ ) ≥ C EV ( u , u ) has density zero on the regiondefined by { ( u , u ) ⊤ ∈ [0 , : u < u G ( δ )2 or u < u G ( δ )1 } . In particular, for g ( t ) = exp( − t α ), t ≥
0, with 0 < α ≤
1, we obtain by the triangle inequality that G ( δ ) ≤ max ( s ∗ ,s ∗ ) ∈ R exp {k ( s ∗ , s ∗ ) ⊤ k α −k ( s ∗ + δ, s ∗ ) ⊤ k α } ≤ exp {k ( s ∗ , s ∗ ) ⊤ k α −k ( s ∗ , s ∗ ) ⊤ k α + δ α } = exp( δ α ) , which is finite for all δ ≥
0. Thus, for all kernels of the form k ( s , s ∗ ) = exp {− ( k s − s ∗ k /λ ) α } , with λ > < α ≤
1, the extreme-value copula C EV resulting from Cauchy convolution processesdoes not have full support, thus preventing very low values at one location to occur with very highvalues at another location. By contrast, it is easy to verify that G ( δ ) = ∞ for all α >
1, and for allkernels k ( s , s ∗ ) = g ( k s − s ∗ k ) that are compactly supported, i.e., such that g ( t ) = 0 for all t > r for some range r >
0. In practice, it may be sensible to restrict ourselves to α > k ( s , s ∗ ) = exp {− ( k s − s ∗ k /r ) α } , or to use a different (potentially compactly supported) kernel.10 .2 Special cases We now give some specific examples that have a tractable bivariate extreme-value copula C EV . Example 1 (Marshall–Olkin copula)
Consider the indicator kernel k ( s , s ∗ ) = { s ∗ ∈ A ( s ) } ,where A ( s ) ⊂ R q is a compact subset of R q with area < |A ( s ) | < ∞ for each s ∈ R q . FromProposition 1, it is easy to check that the stable tail dependence function of { Z ( s ) , Z ( s ) } ⊤ is ℓ ( w , w ) = |A ( s ) \A ( s ) ||A ( s ) | · w + |A ( s ) \A ( s ) ||A ( s ) | · w + |A ( s ) ∩ A ( s ) | · max (cid:26) w |A ( s ) | , w |A ( s ) | (cid:27) . The corresponding limiting extreme-value process is thus driven by the Marshall–Olkin copula witha singular component (Marshall and Olkin, 1967).
It is also possible to obtain the extreme-value limit of the Cauchy convolution process in closedform for non-trivial kernels in some special cases. In particular, one can consider the stationarykernel k ( s , s ∗ ) = g ( k s − s ∗ k ) where g is a nonnegative continuous function. Example 2 (Smith (1990) model)
Assume that q = 1 and that k ( s, s ∗ ) = φ ( s − s ∗ ; σ ) , where φ ( · ; σ ) is the Gaussian density function with mean zero and variance σ . For simplicity, let alsoassume that s < s . Adopting the notation of Proposition 2, it follows that { s ∗ ∈ R : w ζ ( s , s ∗ ) >w ζ ( s , s ∗ ) } = { s ∗ ∈ R : s ∗ < σ s − s log( w /w ) + s + s } , and thus ℓ ( w , w ) = w Φ (cid:18) λ ∗ λ ∗ log w w (cid:19) + w Φ (cid:18) λ ∗ λ ∗ log w w (cid:19) , λ ∗ = | s − s | σ , where Φ( · ) denotes the standard normal CDF. The limiting extreme-value copula is thus the H¨usler–Reiss copula (H¨usler and Reiss, 1989). Note that λ ∗ = { γ ( s − s ) } / , where γ ( h ) = ( | h | /σ ) is a valid variogram. This model corresponds to the Smith (1990) max-stable model, which is asmooth limiting case of the Brown-Resnick model (Kabluchko et al., 2009) where the variogram is γ ( h ) = ( | h | /σ ) α , σ > , α ∈ (0 , , instead. More details for the case of q = 2 are provided in theSupplementary Material. xample 3 (Laplace kernel in R ) Assume that q = 1 and let k ( s, s ∗ ) = λ exp ( −| s − s ∗ | /λ ) .For simplicity assume s < s . It follows that for G λ ( δ ) = exp( δ/λ ) with δ = | s − s | , { s ∗ ∈ R : w ζ ( s , s ∗ ) ≥ w ζ ( s , s ∗ ) } = (cid:8) s ∗ ∈ R : s ∗ ≤ λ log( w /w ) + s + s (cid:9) , G λ ( δ ) ≤ w w ≤ G λ ( δ ) , R , w w > G λ ( δ ) , ∅ , w w < G λ ( δ ) , and, for any sites s , s ∈ R , ℓ ( w , w ) = w + w − p w w /G λ ( δ ) , G λ ( δ ) ≤ w w ≤ G λ ( δ ) ,w , w w > G λ ( δ ) ,w , w w < G λ ( δ ) . Thus, the corresponding extreme-value copula density is positive only when u G λ ( δ )2 < u < u /G λ ( δ )2 . Example 4 (Kernels with finite support)
Notice that if the set S ∩ + ( s , s ) := { s ∗ ∈ R q : k ( s , s ∗ ) > , k ( s , s ∗ ) > } = ∅ for some sites s , s ∈ R q , then ℓ ( w , w ) = w + w . In particular,if the kernel k is compactly supported such that k ( s , s ∗ ) = 0 whenever k s − s ∗ k > r for some radius r > , then S ∩ + ( s , s ) is empty if and only if k s − s k ≥ r . This implies that the variables Z EV ( s ) and Z EV ( s ) from the limiting extreme-value process are independent whenever k s − s k ≥ r . Byconstruction, two realizations Z ( s ) and Z ( s ) from the process convolution in (2) are not onlytail-independent but fully independent in this case. A flexible family of compactly supported kernelsincludes k ( s , s ∗ ) = { − ( k s − s ∗ k /r ) α } η + , where a + = max(0 , a ) , for some parameters r, α, η > ,though for identifiability concerns one may fix either α and/or η in practice. Here, the parameter r defines the range of spatial dependence for this process and should not be fixed. Another examplewith compactly supported kernel is detailed in the Supplementary Material. The Cauchy convolution process Z ( s ) defined as in (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ) has theappealing property of being tail-dependent (unless exactly independent) and the strength of depen-dence as a function of distance between two spatial locations is controlled by the kernel function12 ( s , s ∗ ). Moreover, if the kernel k has a compact support, the process is only dependent locally(i.e., at small distances), and is independent at large enough distances. Unfortunately, Cauchyconvolution processes have three main drawbacks when modeling spatial data:1. As Gaussian processes, the resulting copula is reflection-symmetric (and in particular, tail-symmetric), which might not be realistic in some applications;2. Depending on the kernel k and the distance between sites s , s , two variables Z ( s ) and Z ( s )can either be tail-dependent, or exactly independent, but the intermediate case of (non-trivial)tail independence is not possible. In other words, the process cannot capture tail independence(unless exactly independent) and thus, it lacks flexibility at sub-asymptotic levels;3. The strength of dependence in the bulk of the joint distribution of { Z ( s ) , . . . , Z ( s d ) } ⊤ andin its tails cannot be controlled separately with this process.While the issue highlighted in the first point is application-specific and should be addressed infuture research, we have not found it to be a limiting factor in our temperature data applicationdescribed in Section 4.2. The second and third points highlight, however, issues that are morecritical from a risk assessment perspective where the tail dependence structure needs to be estimatedaccurately. In applications, it is common to observe very weak or zero tail dependence at largedistances, while fairly strong overall dependence prevails in the bulk. In other words, it can happenin practice that the data suggest simultaneously λ ( δ ) ≈ S ρ ( δ ) ≫ δ , but the Cauchy convolution process (2) cannot capture this situation, as the strengthand range of dependence in the bulk and the tails are necessarily similar to each other, and cannotbe controlled separately.To increase flexibility of the Cauchy process convolution model and circumvent the issues high-lighted in the second and third points above, we can modify the original process by mixing it with13 tail-independent process possessing lighter tails. We define˜ Z ( s ) = Z ( s ) + βZ G ( s ) , (6)where Z ( s ) is the Cauchy process (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ), Z G ( s ) is a stationaryGaussian process with standard normal N (0 ,
1) marginals and some correlation function ρ G ( δ ), δ ≥
0, and β ≥
0. For simplicity, we assume that ρ G ( δ ) is an isotropic correlation function, thoughall the results in this section can be readily extended to the anisotropic or non-stationary context.As the next proposition shows, the tail properties of the new process remain intact, such that themixture model (6) still captures local tail dependence if the kernel is compactly supported. Proposition 3
Under the assumptions of Proposition 1, the stable tail dependence function corre-sponding to the joint distribution of { ˜ Z ( s ) , . . . , ˜ Z ( s d ) } ⊤ , with the process ˜ Z ( s ) defined as in (6),is ℓ ( w , . . . , w d ) given in Proposition 1.In the next section we will see that the mixture process ˜ Z ( s ) in (6) has much rougher realizationsthan the Cauchy proces Z ( s ) alone, and so it is more realistic for modeling real environmental data.The improved flexibility of model (6) is further illustrated in the Supplementary Material. Approximate simulations from the Cauchy convolution process Z ( s ) defined in (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ) can be obtained at locations s , . . . , s d ∈ R as Z ( s j ) ≈ m − P mk,l =1 k ( s j , s k,l ) W k,l , W k,l ∼ i.i.d Cauchy(1), j = 1 , . . . , d , where { s k,l } mk,l =1 is a fine rectangular grid in S m ⊂ R coveringthe simulation sites, with large m (see the proof of Proposition 1). Similarly, the process ˜ Z ( s ) in (6)can be simulated based on a finite approximation as ˜ Z ( s j ) ≈ m − P mk,l =1 k ( s j , s k,l ) W k,l + βZ G ( s j ), W k,l ∼ i.i.d Cauchy(1), j = 1 , . . . , d , where { Z G ( s ) , . . . , Z G ( s d ) } ⊤ is a realization from the Gaussianrandom field Z G ( s ) at the sites s , . . . , s d , independent of the Cauchy variables { W k,l } .14imulations from the associated limiting extreme-value process Z EV ( s ) are also quite straightfor-ward. Given that the kernel function k ( s , s ∗ ) is bounded, simulating moving maximum max-stablerandom fields can be performed exactly by exploiting the stochastic representation (5), simulatingthe Poisson points { ξ i } in decreasing order and setting an appropriate stopping rule; see Schlather(2002). If the kernel is compactly supported with range r >
0, then the study region should beexpanded on all sides by r units, and if it has support over R , then the study region should beexpanded sufficiently to ensure that the contributions of “distant” points in (5) become negligi-ble. Alternatively, fast approximate simulations are also possible; we provide more details in theSupplementary Material.Figure 1 shows realizations of the processes Z ( s ), ˜ Z ( s ) and Z EV ( s ) in [0 , obtained for differentkernel functions. For the indicator kernel with finite support we used a rectangular grid on [ − r, r ] to prevent edge effects. For kernels with infinite support, we used a regular grid on S m = [ − , because k ( s , s ∗ ) < · − for k s − s ∗ k > m = 200 in all cases.We can see that the process ˜ Z ( s ) (third column) has a rougher spatial dependence than theCauchy process Z ( s ) (second column), confirming the higher degree of flexibility of ˜ Z ( s ) to capturethe sub-asymptotic dependence structure. Moreover, in all simulations, the effect of the kernelon the spatial dependence structure is obvious, especially—but not only—with the extreme-valueprocess Z EV ( s ) (fourth column). Smoother kernels result in smoother random fields. Figure 2 showsbivariate scatter plots for datasets of size 1000 generated from the extreme-value copulas linking Z EV ( s ) and Z EV ( s ) at distance k s − s k = 0 .
125 for the four extremal processes displayed inFigure 1. With the indicator kernel, the singular component on the diagonal u = u is clearlyseen. Green lines u = u ± exp(20 δ )2 and u = u ± exp(10 δ . )2 , with δ = 0 . .0 0.2 0.4 0.6 0.8 1.0 . . . . . . distance k e r ne l . . . . . . s X s Y . . . . . . s X s Y . . . . . . s X s Y . . . . . distance k e r ne l . . . . . . s X s Y . . . . . . s X s Y . . . . . . s X s Y . . . . . . distance k e r ne l . . . . . . s X s Y . . . . . . s X s Y . . . . . . s X s Y . . . . . . distance k e r ne l . . . . . . s X s Y . . . . . . s X s Y . . . . . . s X s Y Figure 1:
Realizations on [0 , from the Cauchy convolution process Z ( s ) defined in (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ) (2 nd column), ˜ Z ( s ) defined in (6) (3 rd column), and their extreme-value limit Z EV ( s ) (4 th column) fordifferent isotropic kernels k ( s , s ∗ ) = g ( k s − s ∗ k ) (the 1 st column shows the function g ). Each simulated process ismarginally transformed to have Unif(0 ,
1) marginals. The same random seed was used for all realizations. The kernelis k ( s , s ∗ ) = ( k s − s ∗ k < .
25) (1 st row), k ( s , s ∗ ) = exp (cid:0) − k s − s ∗ k / (cid:1) (2 nd row), k ( s , s ∗ ) = exp ( − k s − s ∗ k )(3 rd row), and k ( s , s ∗ ) = exp (cid:0) − k s − s ∗ k (cid:1) (4 th row). We use β = 2 and ρ G ( δ ) = exp( − δ ) for ˜ Z ( s ). .0 0.2 0.4 0.6 0.8 1.0 . . . . . . indicator kernel U U . . . . . . powered exponential kernel U U . . . . . . exponential kernel U U . . . . . . Gaussian kernel U U Figure 2:
Bivariate scatter plots of datasets of size 1000 generated from the extreme-value copulas linking Z EV ( s )and Z EV ( s ) at distance δ = k s − s k = 0 .
125 for the extreme-value process Z EV ( s ) with kernel k ( s , s ∗ ) = ( k s − s ∗ k < . k ( s , s ∗ ) = exp (cid:0) − k s − s ∗ k / (cid:1) , k ( s , s ∗ ) = exp ( − k s − s ∗ k ), and k ( s , s ∗ ) = exp (cid:0) − k s − s ∗ k (cid:1) (leftto right). Green lines u = u ± exp(10 δ / )2 and u = u ± exp(20 δ )2 delimit the area with positive density in the secondand third panels, respectively. unit square [0 , , and the indicator kernel has a positive density on [0 , \ { ( u , u ) ⊤ : u = u } .In the next section, we detail how inference may be performed for the Cauchy convolutionprocess Z ( s ) observed at d locations s , . . . , s d , for the modified mixture process ˜ Z ( s ), and for theircorresponding extreme-value limit, Z EV ( s ). Throughout this section, we assume that { y i = ( y i , . . . , y id ) ⊤ } ni =1 are n i.i.d. realizations of astationary process Y ( s ) measured at d spatial locations s , . . . , s d ∈ R , and whose dependencestructure (i.e., copula) is the same as either (i) the Cauchy convolution process Z ( s ) defined asin (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ ) using some parametric kernel function k (Section 3.2); or(ii) the extended mixture model ˜ Z ( s ) defined as in (6) (Section 3.3), but with potentially differentmarginal distributions. We also assume that F ( · ; θ F ) and f ( · ; θ F ) are the marginal distribution anddensity functions, respectively, of the observed process Y ( s ) and that θ F is the vector of marginal17arameters. In other words, the process Y ( s ) has the same dependence properties (both in the bulkand the tails) as Z ( s ) or ˜ Z ( s ) or Z EV ( s ), but the marginal distributions are essentially arbitrary,thus allowing for a greater flexibility by modeling margins and dependence separately.We may estimate marginal parameters non-parametrically in a first step using the empiricaldistribution (computed for each site separately, or for the entire pooled dataset). Alternatively, wecan also estimate the vector of marginal parameters θ F by maximizing the marginal (composite)log-likelihood function ℓ F ( y ; θ F ) = P ni =1 P dj =1 log f ( y ij ; θ F ), and we denote the respective estimateby b θ F . Such an approach, which neglects spatial dependence to estimate marginal parameters,is known to be valid (i.e., yielding consistency and asymptotic normality of estimators) undermild regularity conditions (Varin et al., 2011). The data can then be transformed to the uniformUnif(0 ,
1) scale using the probability integral transform. More precisely, pseudo-uniform scores { u i = ( u i , . . . , u id ) ⊤ } ni =1 can be obtained by setting u ij = F ( y ij ; b θ F ).In the next sections, we show how dependence parameters may be estimated from the pseudo-uniform scores { u i } ni =1 by matching some suitable empirical and model-based summary statistics.We focus here on the models Z ( s ) in (2) and ˜ Z ( s ) in (6), while composite likelihood inference for Z EV ( s ) is discussed in the Supplementary Material. Z ( s ) defined as in (2) Here, we assume that the pseudo-uniform scores { u i } ni =1 obtained in Section 3.1 are driven by thecopula stemming from the Cauchy convolution process Z ( s ) in (2) with W (d s ∗ ) ∼ i.i.d. Cauchy(d s ∗ )using some parametric kernel function k ( s , s ∗ ; θ K ), where θ K is the vector of dependence parameterscontrolling the kernel function. While the joint likelihood function for the process Z ( s ) is nottractable in general, we can exploit its stochastic representation in (2) to estimate θ K .We first back-transform the pseudo-uniform scores to the standard Cauchy scale. More precisely,18e compute { z ∗ i = ( z ∗ i , . . . , z ∗ id ) ⊤ } ni =1 with z ∗ ij = F − C ( u ij ), where F − C ( q ) = tan { π ( q − . } is thequantile function of the standard Cauchy distribution. By stability of the Cauchy distribution,the marginal distribution of the process convolution Z ( s ) in (2) are Cauchy with scale parameter c ( s ; θ K ) = R R k ( s , s ∗ ; θ K )d s ∗ . This implies that Z ∗ ( s ) := Z ( s ) /c ( s ; θ K ) ∼ Cauchy(1) and that { z ∗ i } ni =1 can be treated as pseudo-observations from Z ∗ ( s ). Moreover, we have Z ∗ jk = Z ∗ ( s j ) − Z ∗ ( s k ) = Z ( s j ) c ( s j ; θ K ) − Z ( s k ) c ( s k ; θ K ) = Z R { ζ ( s j , s ∗ ) − ζ ( s k , s ∗ ) } W (d s ∗ ) ∼ Cauchy { c jk ( θ K ) } , where c jk ( θ K ) = Z R | ζ ( s j , s ∗ ) − ζ ( s k , s ∗ ) | d s ∗ , ζ ( s , s ∗ ) ≡ ζ ( s , s ∗ ; θ K ) = k ( s , s ∗ ; θ K ) R R k ( s , s ∗ ; θ K )d s ∗ . (7)For each pair of sites { s j , s k } , the scale parameter c jk may be estimated non-parametrically fromthe pseudo-observations { z ∗ ijk = z ∗ ij − z ∗ ik } ni =1 by maximizing the corresponding Cauchy likelihoodfunction. The maximum likelihood estimator b c jk satisfies the equation P ni =1 b c jk / ( b c jk + y ∗ ijk ) = n/
2, whose positive root can be easily found using numerical routines. The estimator b c jk is aconsistent and asymptotically normal estimator of c jk . Alternatively, the median of absolute values,median {| y jk | , . . . , | y njk |} , may also be used as a more robust and faster-to-compute non-parametricestimator of c jk , but which is about 20% less efficient than the maximum likelihood estimator (interms of the ratio of their variances). To estimate the vector of parameters θ K , we can then use aleast squares approach and compute b θ K = arg min θ K X j
0, may be helpful to reduce the computational burden and/orprioritize goodness-of-fit at small distances. The estimator in (8) is a special case of minimum19istance estimators and therefore it is a consistent and asymptotically normal estimator of θ K (Millar, 1984). ˜ Z ( s ) defined as in (6) We now assume that the pseudo-uniform scores { u i } ni =1 obtained in Section 3.1 are driven by thecopula stemming from the extended model ˜ Z ( s ) in (6). In addition to the parametric kernel function k ( s , s ∗ ; θ K ) described by the vector of parameters θ K , we now need to estimate the correlationfunction ρ G ( δ ; θ G ) parametrized by a vector θ G , and the mixture parameter β ≥ Z EV ( s ) only depends onthe kernel parameters θ K . In the first step, θ K can thus be estimated by matching empirical andmodel-based estimates of the tail dependence coefficient for different pairs of sites. More precisely,let λ jk ( θ K ) be the tail dependence coefficient defined in (1) corresponding to the pair of variables { ˜ Z ( s j ) , ˜ Z ( s k ) } . On the one hand, λ jk can be estimated non-parametrically from the pseudo-uniformscores { ( u ij , u ik ) ⊤ } ni =1 for each pair of sites { s j , s k } as b λ jk = 1 n (1 − u ) n X i =1 ( u ij > u, u ik > u ) , (9)where u ≈ b λ jk is consistent as n → ∞ and u ≡ u n → n (1 − u n ) → ∞ . Many other valid non-parametric estimators ofthe tail dependence coefficient exist. In particular, as the copula stemming from ˜ Z ( s ) is reflection-symmetric, it would be possible to combine information from the lower and upper tails to estimate λ jk ( θ K ) but here we restrict ourselves to the empirical estimator (9), which is quite standard andbetter reflects the upper tail. On the other hand, from Propositions 1 and 3, we have λ jk ( θ K ) = 2 − ℓ (1 ,
1) = 2 − Z R max { ζ ( s j , s ∗ ) , ζ ( s k , s ∗ ) } d s ∗ = 1 − Z ζ ( s j , s ∗ ) >ζ ( s k , s ∗ ) ζ ( s j , s ∗ )d s ∗ + 1 − Z ζ ( s k , s ∗ ) >ζ ( s j , s ∗ ) ζ ( s k , s ∗ )d s ∗ Z ζ ( s j , s ∗ ) ≤ ζ ( s k , s ∗ ) ζ ( s j , s ∗ )d s ∗ + Z ζ ( s j , s ∗ ) ≥ ζ ( s k , s ∗ ) ζ ( s k , s ∗ )d s ∗ , (10)where ℓ ( w , w ) is the stable tail dependence function of { ˜ Z ( s j ) , ˜ Z ( s k ) } ⊤ , and ζ ( s , s ∗ ) ≡ ζ ( s , s ∗ ; θ K ) = k ( s , s ∗ ; θ K ) / R R k ( s , s ∗ ; θ K )d s ∗ . The coefficient λ jk ( θ K ) can be expressed in simple form as a func-tion of θ K for many parametric families of kernels. We consider one example in the simulationstudy reported in Section 4.1. The parameter θ K can then be estimated by least squares as follows: b θ K = arg min θ K X j 0, we then back-transform the pseudo-uniform scores to the scale of F ( · ; 1 , β ) as { ˜ z i = (˜ z i , . . . , ˜ z id ) ⊤ } ni =1 , where ˜ z ij = F − ( u ij ; 1 , β ). If β is the true value of β , then the vectors { ˜ z i } ni =1 can be considered as pseudo-observations fromthe process ˜ Z ( s ) in (6). Now, notice that for any two sites { s j , s k } , we have˜ Z + jk = ˜ Z ( s j ) + ˜ Z ( s k ) ∼ F {· ; 2 , β + jk ( θ G ) } , ˜ Z − jk = ˜ Z ( s j ) − ˜ Z ( s k ) ∼ F {· ; c jk ( θ K ) , β − jk ( θ G ) } , c jk ( θ K ) is given in (7) and β + jk ( θ G ) = β { ρ G ( δ jk ; θ G ) } / , β − jk ( θ G ) = β { − ρ G ( δ jk ; θ G ) } / , δ jk = k s j − s k k . To obtain empirical estimates of θ G , we first obtain non-parametric estimates b β + jk of β + jk ( θ G ) by assuming that ˜ Z + jk ∼ F {· ; 2 , β + jk ( θ G ) } and maximizing the corresponding likelihoodfunction using the pseudo-observations { ˜ z + jk = ˜ z ij + ˜ z ik } ni =1 . We use the same maximum like-lihood approach to get non-parametric estimates b β − jk of β − jk ( θ G ), but now assuming that ˜ Z − jk ∼ F {· ; c jk ( b θ K ) , β − jk ( θ G ) } with b θ K obtained in (11) and using the pseudo-observations { ˜ z − jk = ˜ z ij − ˜ z ik } ni =1 instead. The vector of parameters θ G and the parameter β ≥ b θ ⊤ G , b β ) ⊤ = arg min ( θ ⊤ G ,β ) ⊤ X j 0, and then the value β can be selected in a second step as the one thatprovides the lowest objective function overall. The estimators in (11) and (13) are again specialcases of minimum distance estimators and they are therefore consistent and asymptotically normalif b λ j,k and b β + jk , b β − jk are consistent and asymptotically normal estimators of λ j,k and β + jk ( θ G ) , β − jk ( θ G ),respectively (Millar, 1984). We now perform a simulation study to verify the performance of the inference schemes proposed inSections 3.2 and 3.3, before illustrating our proposed methodology by application to real tempera-ture data in Section 4.2.We use the algorithms presented in Section 2.4 to simulate datasets comprised of n = 100 , , Z ( s ) in (2) for some kernel function k ( s , s ∗ ; θ K ),22ts mixture process extension ˜ Z ( s ) in (6), and the extreme-value limit Z EV ( s ) in (5), at d =9 , , 25 sites on the regular grid (cid:8) ( j/ ( m + 1) , k/ ( m + 1)) ⊤ (cid:9) mj,k =1 ⊂ [0 , , for m = 3 , , 5. Toillustrate the methods, we consider here the compactly supported kernel function k ( s , s ∗ ; θ K ) =(1 − k s − s ∗ k /r ) η + with true kernel parameters chosen as θ K = ( η, r ) ⊤ = (1 , . ⊤ . For the mix-ture process ˜ Z ( s ) in (6), we additionally specify the correlation function of the Gaussian process tobe ρ G ( δ ; θ G ) = exp( − θ G δ ) with rate θ G = 1, and the mixture parameter is fixed to β = 2.Following the notation from Section 3.2, we need to find the theoretical expression of the scaleparameter c jk ( θ K ) in (7). By symmetry, straightforward calculations yield c jk ( θ K ) = Z R | ζ ( s j , s ∗ ) − ζ ( s k , s ∗ ) | d s ∗ = 2 Z ζ ( s j , s ∗ ) >ζ ( s k , s ∗ ) ζ ( s j , s ∗ )d s ∗ + 2 Z ζ ( s j , s ∗ ) <ζ ( s k , s ∗ ) ζ ( s k , s ∗ )d s ∗ − 2= 4 c ∗ ( θ K ) Z k s ∗ − s k k≤ min { r, k s j − s ∗ k} (cid:18) − k s k − s ∗ k r (cid:19) η d s ∗ − , where ζ ( s , s ∗ ) = k ( s , s ∗ ; θ K ) / R R k ( s , s ∗ ; θ K )d s ∗ and the normalizing factor is equal to c ∗ ( θ K ) = R R k ( s , s ∗ ; θ K )d s ∗ = πr ( η +1)( η +2) . After a change of variables, we find that c jk ( θ K ) = 4 r c ∗ ( θ K ) Z ππ (cid:26) g jk ( φ ; r ) η +1 η + 1 − g jk ( φ ; r ) η +2 η + 2 (cid:27) d φ = 2( η + 1)( η + 2) π Z ππ (cid:26) g jk ( φ ; r ) η +1 η + 1 − g jk ( φ ; r ) η +2 η + 2 (cid:27) d φ, where g jk ( φ ; r ) = 1 − max { , k s j − s k k / (2 r sin φ ) } . This integral can be easily computed nu-merically. Furthermore, from (10), we can also deduce that λ jk ( θ K ) = 1 − c jk ( θ K ) / n replicates of the process Z ( s ), or ˜ Z ( s ), or Z EV ( s )), weassume that the marginal distributions are unknown, and for simplicity, we estimate them usingthe empirical distribution in this simulation study (separately at each site). Dependence (i.e.,copula) parameters are then estimated in a second step based on the inference methods describedin Sections 3.2, 3.3, or by a pairwise likelihood approach with equal weights, respectively. Here, we23imply set the weights to ω + jk = ω − jk = 1 for all pairs of sites { s j , s k } in (8), (11), and (13), though thismay not be the optimal choice. Parameters to be estimated are the kernel parameters θ K = ( η, r ) ⊤ for Z ( s ) and Z EV ( s ), and ( η, r, θ G , β ) ⊤ for the mixture process ˜ Z ( s ). To assess the estimators’performance, we repeat the simulations N = 500 times and compute the root mean squared errors(RMSE) for all estimated parameters. We also compute ∆ max = N P Ni =1 max x ∈ (0 , | k ( s , s ( x ); θ K ) − k ( s , s ( x ); b θ K,i ) | , and ∆ avg = N P Ni =1 R | k ( s , s ( x ); θ K ) − k ( s , s ( x ); b θ K,i ) | d x , where s = (0 , ⊤ and s ( x ) = (0 , x ) ⊤ , while θ K = (1 , . ⊤ denotes the true kernel parameters and b θ K,i is its estimatefor the for i -th simulation ( i = 1 , . . . , N ). Hence, ∆ max and ∆ avg represent the maximum and meanintegrated absolute differences between the true kernel and its estimate along a horizontal segmentpassing through the origin, averaged across the N = 500 simulations.Table 1 reports the results. As expected, estimates are more accurate with larger sample sizes, asshown by significantly smaller RMSE and (∆ max , ∆ avg ) values as n increases. Moreover, using datafrom more locations (i.e., increasing d ) can further improve the accuracy of parameter estimates.For small n and d , RMSE values are quite large due to kernel parameters being more tricky toidentify. Similar kernels (hence, dependence structures) may be obtained for different combinationsof parameters θ K = ( η, r ) ⊤ , and thus the effects of η (shape) and ρ (dependence range) are difficultto distinguish, especially in low sample sizes. Figure 3 shows the estimated kernel profiles alongthe x -axis, i.e., k ( s , s ( x ) , b θ K,i ) with 0 < x < 1, for each simulated dataset of Z ( s ) and Z EV ( s )with d = 25. The true kernel appears to be nevertheless very well estimated, even when the samplesize is not very large. Similar results are obtained with other sets of parameters and with differentkernel functions. We now illustrate the proposed methodology to analyze temperature data from the state of Ok-lahoma, United States. Here, we use daily temperature averages measured at 20 stations, namely24able 1: Performance metrics (i.e., RMSE of estimated parameters and (∆ max , ∆ avg )) calculated for the inferenceschemes detailed in Sections 3.2, 3.3, and for a pairwise likelihood method for data simulated from the processes Z ( s ) (top panel), ˜ Z ( s ) (middle panel), and Z EV ( s ) (bottom panel), respectively. The true kernel function is chosenas k ( s , s ∗ ; θ K ) = (1 − k s − s ∗ k /r ) η + , with parameters θ K = ( η, r ) ⊤ = (1 , . ⊤ . For the mixture process ˜ Z ( s ), thecorrelation function of the underlying Gaussian process is ρ G ( δ ; θ G ) = exp( − θ G δ ) with θ G = 1, and the mixtureparameter is β = 2. Datasets are simulated on a regular grid at d = 9 , , 25 locations, with n = 100 , , Data simulated from the Cauchy process Z ( s ) in (2), with inference based on Section 3.2.RMSE for θ ⊤ K = ( η, r ) (top) and (∆ max , ∆ avg ) (bottom)Sample size d = 9 d = 16 d = 25 n = 100 (3.85,0.25) (3.26,0.19) (2.93,0.17)(0.37,0.05) (0.19,0.03) (0.19,0.03) n = 200 (2.47,0.17) (1.58,0.10) (1.08,0.06)(0.29,0.04) (0.13,0.02) (0.12,0.02) n = 500 (1.14,0.08) (0.47,0.03) (0.38,0.02)(0.15,0.02) (0.08,0.01) (0.06,0.01)Data simulated from the mixture process ˜ Z ( s ) in (6), with inference based on Section 3.3.RMSE for ( η, r, θ G , β ) (top) and (∆ max , ∆ avg ) (bottom)Sample size d = 9 d = 16 d = 25 n = 100 (3.08,0.17,0.78,1.18) (1.88,0.11,0.83,1.25) (1.71,0.10,0.86,1.32)(0.25,0.03) (0.17,0.02) (0.17,0.02) n = 200 (1.67,0.10,0.71,1.02) (0.83,0.05,0.69,1.02) (0.41,0.03,0.68,1.01)(0.19,0.03) (0.14,0.02) (0.12,0.01) n = 500 (0.48,0.04,0.47,0.70) (0.32,0.03,0.41,0.61) (0.25,0.03,0.37,0.55)(0.14,0.02) (0.12,0.01) (0.10,0.01)Data simulated from the extreme-value process Z EV ( s ) in (5), with inference based on a pairwiselikelihood.RMSE for θ ⊤ K = ( η, r ) (top) and (∆ max , ∆ avg ) (bottom)Sample size d = 9 d = 16 d = 25 n = 100 (3.53,0.23) (2.04,0.14) (1.59,0.11)(0.30,0.04) (0.14,0.02) (0.11,0.02) n = 200 (2.76,0.18) (1.23,0.08) (0.84,0.06)(0.21,0.03) (0.11,0.02) (0.08,0.01) n = 500 (1.59,0.10) (0.41,0.03) (0.30,0.02)(0.12,0.02) (0.07,0.01) (0.05,0.01)25 .0 0.2 0.4 0.6 0.8 1.0 . . . . . . (cid:81) =100 distance k e r ne l . . . . . . (cid:81) =200 distance 0.0 0.2 0.4 0.6 0.8 1.0 . . . . . . (cid:81) =500 distance0.0 0.2 0.4 0.6 0.8 1.0 . . . . . . distance k e r ne l . . . . . . distance 0.0 0.2 0.4 0.6 0.8 1.0 . . . . . . distance Figure 3: Estimated kernel profiles k ( s , s ( x ); b θ K,i ), 0 < x < i = 1 , . . . , N (light gray lines), based on N = 500simulated processes Z ( s ) (top row) and extreme-value limit Z EV ( s ) (bottom row), with n = 100 (left column),200 (middle column) and 500 (right column) independent replicates. Black thick solid lines show the true kernels k ( s , s ( x ; θ K )) with θ K = (1 , . ⊤ . Acme , Ada , Ardmore2 , Byars , Centrahoma , Chikasha , Durant , Fittstown , Ketchum Ranch , Lane , Madill , Minco , Newport , Norman , Pauls Valley , Ringling , Shawnee , Tishomingo , Washington ,and Waurika . The time period considered here is the year 2018, which contains n = 366 days intotal. The dataset can be freely downloaded from the website mesonet.org . After removing theobvious seasonal component, we then fit an AR(3) model to account for temporal dependence, andwe fit the skew- t distribution jointly to the residuals using the marginal likelihood approach de-scribed in Section 3.1. Finally, we transform the residuals to the Unif(0 , 1) scale using the estimatedmarginal distribution functions. To explore the dependence features of the data, the left panel of26 cme −3 −1 1 3 −3 −1 1 3 − − − − Centrahoma Ketchum Ranch − − − − Newport−3 −1 1 3 −3 −1 1 3 −3 −1 1 3 − − Shawnee . . . . . distance in km S pea r m an ’ s c o rr e l a t i on Figure 4: Left: Bivariate scatter plots of normal scores based on the temperature data residuals (obtained frompreliminary marginal fits) for some selected pairs of stations. Right: Spearmans’s correlation estimates for all pairsof stations, plotted as a function of distance between stations. Figure 4 displays bivariate scatter plots of normal scores (i.e., residuals further transformed to thestandard normal distribution) for some selected pairs of stations. The sharp tails that can be seenin all bivariate scatter plots indicate clear evidence of non-Gaussianity and strong tail dependence.Furthermore, these diagnostics do not reveal any significant tail asymmetry. Therefore, while a(transformed) Gaussian process would clearly not be suitable to model the data due to its veryweak form of dependence in the tails, our proposed Cauchy convolution process provides a moreadequate alternative. Furthermore, the right panel of Figure 4 shows Spearman’s correlation esti-mates for all pairs of stations, plotted as a function of distance between stations. This plot revealsthat the behavior of Spearman’s correlation function is approximately linear near the origin, whichsuggests that the observed process is quite smooth and that our Cauchy convolution model shouldfit well. 27 50 100 150 200 . . . . . . distance in km k ( s , s * ) . . . . distance in km S pea r m an ’ s c o rr e l a t i on . . . . . . distance in km l ( d ) Figure 5: Left: Fitted kernel function k ( s , s ∗ ; b θ K ) = (1 − k s − s ∗ k / b r ) b η + with b θ K = ( b η, b r ) ⊤ = (53742 , ⊤ . Middle: Empirical estimates of Spearman’s correlation for all pairs of stations (black dots) and fitted Spearman’scorrelation function S ρ ( δ ) (solid green light), plotted as a function of distance δ between stations. Right: Empiricalupper tail dependence coefficient estimates for all pairs of stations (black dots) and fitted tail dependence function λ ( δ ) (solid green light), plotted as a function of distance δ between stations. All the stations are located in a rather small geographical region in the Southern part of Ok-lahoma State, and the largest distance between any two stations is only 212 km. We can there-fore reasonably assume that the data are stationary over space. We select the kernel function k ( s , s ∗ ; θ K ) = (1 − k s − s ∗ k /r ) η + , θ K = ( η, r ) ⊤ ∈ (0 , ∞ ) , and estimate its parameters usingthe inference approach described in Section 3.3. Notice that although this kernel is compactlysupported on [0 , r ] with r > k ( s , s ∗ ; θ K ) = exp( − ξ k s − s ∗ k ) when η := η ( r ) such that η ( r ) → ∞ and η ( r ) /r → ξ > r → ∞ . The estimated kernel parameters are b θ K = ( b η, b r ) ⊤ = (53742 , ⊤ with the es-timated range b r > k ( s , s ∗ ) ≈ e − . k s − s ∗ k ; thefitted kernel function is plotted on the left panel of Figure 5, from which the exponential decayis evident. Nevertheless, notice that our modeling approach has the great benefit of estimatingwhether the dependence range is finite or infinite (obtained as a boundary limiting case), rather28han keeping it fixed a priori. We then estimate the remaining parameters of the mixture pro-cess (6) with the underlying Gaussian process specified with a powered exponential correlationfunction ρ G ( δ ) = { − τ I ( δ = 0) } exp( − θ G δ α ), δ > 0, where I ( · ) is the indicator function, θ G > α ∈ (0 , τ ∈ [0 , 1] is the nugget effect capturing small scale variability. More precisely, wehere fix the kernel parameters to θ K = (53742 , ⊤ as estimated above, and then estimate theremaining parameters ( θ G , α, τ, β ) ⊤ using the least squares approach (13) described in Section 3.3.We obtain b θ G = 2 . 97 (0 . b α = 1 . 98 (0 . b τ = 0 . 035 (0 . b β = 1 . 80 (0 . b β is positiveand quite far from zero, the fitted mixture process ˜ Z ( s ) turns out to be quite different from theCauchy convolution model Z ( s ), providing increased flexibility to capture the behavior in the bulkof the distribution. Furthermore, notice that the estimated nugget effect b τ is very small, indicatingnegligible (yet statistically significant) small-scale variability.To assess the goodness-of-fit for the mixture process (6), we then compute empirical and model-based Spearman’s correlation estimates S ρ ( δ ) for all pairs of sites { s j , s k } and plot them in themiddle panel of Figure 5 as a function of distance δ = k s j − s k k . Model-based estimates areobtained by Monte Carlo using a large number of simulations from the fitted model. Similarly, theright panel of Figure 5 shows empirical and model-based estimates of the upper tail dependencecoefficient λ ( δ ) plotted as function of distance δ . We obtain very similar results for the lower taildependence coefficient (not shown) as the data show no significant tail asymmetry. These plotsshows that the model (6) fits the data very well, both the in the bulk of the distribution (asmeasured by S ρ ( δ )) and in the tails (as measured by λ ( δ )). In this paper, we have proposed a Cauchy kernel convolution copula model for non-Gaussian spa-29ial data and studied its dependence properties, with a particular eye on its tail behavior. Inparticular, with compactly support kernels, our proposed model can capture complex dependencestructures that possess short-range tail dependence, and long-range independence. Moreover, tofurther increase its flexibility in the bulk of the distribution and better capture the sub-asymptoticdependence behavior, we have also proposed a parsimonious copula model constructed by mixinga Cauchy convolution process with a Gaussian process. With this extended model, bulk and tailproperties can be separately controlled with a few parameters, and a smooth transition betweentail dependence classes is achieved as a function of distance between stations.Our proposed inference scheme relies on a least-square minimum distance approach, whichmatches suitably chosen empirical and model-based summary statistics. It yields consistent es-timates and it is guaranteed to be very fast even in high dimensions, thus circumventing the com-putational difficulties of likelihood-based inference. We have shown that our inference scheme workswell using an extensive simulation study, and we have successfully applied it to a real temperaturedata example. Model parameters are generally easy to identify from each other, and the underlyingkernel function can be accurately estimated, even in low sample sizes.A limitation of our approach is that, by construction, the proposed Cauchy model and its mixtureextension are tail symmetric, and can only capture smooth extreme-value dependence structurescharacterized by moving maximum processes similar to the well-known Smith (1990) max-stableprocess. However, as opposed to fitting the (rigid) Smith model directly to spatially-indexed blockmaxima, we here propose to fit the Cauchy convolution process or the flexible mixture processto the whole dataset instead. Therefore, even if the limiting extreme-value dependence structureis very smooth, our proposed models, which have much rougher realizations, can still adequatelycapture dependence characteristics at sub-asymptotic levels. Interesting research directions includegeneralizing our modeling approach to capture tail asymmetry, e.g., by considering kernel convolu-tion processes constructed from asymmetric stable noise. It would also interesting to study how to30odify our model to capture rougher tail dependence structures of Brown–Resnick type. References Aas, K., Czado, C., Frigessi, A., Bakken, H., 2009. Pair-copula constructions of multiple dependence.Insurance: Mathematics and Economics 44, 182–198.B´ardossy, A., 2006. Copula-based geostatistical models for groundwater quality parameters. WaterResources Research 42.B´ardossy, A., Li, J., 2008. Geostatistical interpolation using copulas. Water Resources Research44.Bopp, G., Shaby, B.A., Huser, R., 2020. A hierarchical max-infinitely divisible spatial model forextreme precipitation. Journal of the American Statistical Association To appear.Bousset, L., Jumel, S., Garreta, V., Picault, H., Soubeyrand, S., 2015. Transmission of leptosphaeriamaculans from a cropping season to the following one. Annals of Applied Biology 166(3), 530–543.Calder, C., Cressie, N. A., 2007. Some topics in convolution-based spatial modeling. Bulletin ofthe International Statistical Institute 62, 132–139.Castro-Camilo, D., Huser, R., 2020. Local likelihood estimation of complex tail dependence struc-tures, applied to U.S. precipitation extremes. Journal of the American Statistical Association115, 1037–1054.Castruccio, S., Huser, R., Genton, M.G., 2016. High-order composite likelihood inference for max-stable distributions and processes. Journal of Computational and Graphical Statistics 25, 1212–1229.Davison, A.C., Gholamrezaee, M.M., 2012. Geostatistics of extremes. Proceedings of the RoyalSociety A: Mathematical, Physical & Engineering Sciences 468, 581–608.Davison, A.C., Huser, R., 2015. Statistics of Extremes. Annual Review of Statistics and its Appli-cation 2, 203–235.Davison, A.C., Huser, R., Thibaud, E., 2019. Spatial Extremes, in: Gelfand, A.E., Fuentes, M.,Hoeting, J.A., Smith, R.L. (Eds.), Handbook of Environmental and Ecological Statistics. CRCPress, pp. 711–744.Davison, A.C., Padoan, S., Ribatet, M., 2012. Statistical Modelling of Spatial Extremes (withDiscussion). Statistical Science 27, 161–186.Engelke, S., Hitz, A.S., 2020. Graphical models for extremes. Journal of the Royal StatisticalSociety, Series B (with Discussion) 82, 871–932.Engelke, S., Malinowski, A., Kabluchko, Z., Schlather, M., 2015. Estimation of Huesler–Reissdistributions and Brown–Resnick processes. Journal of the Royal Statistical Society: Series B(Statistical Methodology) 77, 239–265.Erhardt, T.M., Czado, C., Schepsmeier, U., 2015. R-vine models for spatial time series with anapplication to daily mean temperature. Biometrics 71(2), 323–332.31asen, V., 2005. Extremes of regularly varying L´evy-driven mixed moving average processes. Ad-vances in Applied Probability 37(4), 993–1014.de Fondeville, R., Davison, A.C., 2018. High-dimensional peaks-over-threshold inference. Biometrika105, 575–592.Gneiting, T., 2002. Nonseparable, stationary covariance functions for space-time data. Journal ofthe American Statistical Association 97, 590–600.Gneiting, T., Genton, M. G., Guttorp, P., 2007. Geostatistical space-time models, stationarity,separability and full symmetry. In Finkenstaedt, B., Held, L. and Isham, V.(eds), Statistics ofSpatio-Temporal Systems , Chapman & Hall / CRC Press, Monograph in Statistics and AppliedProbability, Boca Raton.Gr¨aler, B., Pebesma, E., 2011. The pair-copula construction for spatial data: a new approach tomodel spatial dependency. Procedia Environmental Sciences 7, 206–211.Gudendorf, G., Segers, J., 2010. Extreme-value copulas, in: Jaworski, P., Durante, F., H¨ardle, W.,Rychlik, T. (Eds.), Copula Theory and Its Applications, Proceedings of the Workshop Held inWarsaw, 25–26 September 2009, pp. 127–145. Lecture Notes in Statistics — Proceedings.de Haan, L., 1984. A Spectral Representation for Max-stable Processes. Annals of Probability 12,1194–1204.Hazra, A., Huser, R., 2020. Estimating high-resolution Red sea surface temperature hotspots, usinga low-rank semiparametric spatial model. Annals of Applied Statistics To appear.Higdon, D., 2002. Space and Space-Time Modeling using Process Convolutions. In: Anderson,C. W., El-Shaarawi, A. H., Chatwin, P. C., Barnett, V. (eds) Quantitative Methods for CurrentEnvironmental Issues. Springer, London.Hua, L., 2017. On a bivariate copula with both upper and lower full-range tail dependence. Insur-ance: Mathematics and Economics 73, 94–104.Huser, R., Davison, A.C., 2014. Space-time modelling of extreme events. Journal of the RoyalStatistical Society: Series B (Statistical Methodology) 76, 439–461.Huser, R., Dombry, C., Ribatet, M., Genton, M.G., 2019. Full likelihood inference for max-stabledata. Stat 8, e218.Huser, R., Genton, M.G., 2016. Non-stationary dependence structures for spatial extremes. Journalof Agricultural, Biological and Environmental Statistics 21, 470–491.Huser, R., Opitz, T., Thibaud, E., 2017. Bridging asymptotic independence and dependence inspatial extremes using Gaussian scale mixtures. Spatial Statistics. 21, 166–186.Huser, R., Opitz, T., Thibaud, E., 2020. Max-infinitely divisible models and inference for spatialextremes. Scandinavian Journal of Statistics To appear.Huser, R., Wadsworth, J.L., 2019. Modeling spatial processes with unknown extremal dependenceclass. Journal of the American Statistical Association 114, 434–444.Huser, R., Wadsworth, J.L., 2020. Advances in statistical modeling of spatial extremes. WileyInterdisciplinary Reviews (WIREs): Computational Statistics To appear.32¨usler, J., Reiss, R. D., 1989. Maxima of normal random vectors: between independence andcomplete dependence. Statistics and Probability Letters 7, 283–286.J´onsd´ottir, K.Y., Rønn-Nielsen, A., Mouridsen, K., Jensen, E.B.V., 2013. L´evy-based modelling inbrain imaging. Scandinavian Journal of Statistics 40(3), 511–529.Kabluchko, Z., Schlather, M., de Haan, L., 2009. Stationary max-stable fields associated to negativedefinite functions. Annals of Probability 37, 2042–2065.Krupskii, P., Genton, M. G., 2019. A copula model for non-Gaussian multivariate spatial data.Journal of Multivariate Analysis 169, 264–277.Krupskii, P., Huser, R., Genton, M. G., 2018. Factor copula models for replicated spatial data.Journal of the American Statistical Association 521, 467–479.Krupskii, P., Joe, H., 2013. Factor copula models for multivariate data. Journal of MultivariateAnalysis 120, 85–101.Kurowicka, D., Joe, H., 2011. Dependence Modeling: Vine Copula Handbook. World Scientific,Singapore.Ledford, A.W., Tawn, J.A., 1996. Statistics for near independence in multivariate extreme values.Biometrika 83, 169–187.Marshall, W. A., Olkin, I., 1967. A multivariate exponential distribution. Journal of the AmericanStatistical Association 62, 30–44.Millar, P. W., 1984. A general approach to the optimality of minimum distance estimators. Trans-actions of the American Mathematical Society 286, 377–418.Noven, R.C., Veraart, A.E.D., Gandy, A., 2018. A latent trawl process model for extreme values.Journal of Energy Markets 11(3), 1–24.Oesting, M., Schlather, M., Friederichs, P., 2017. Statistical post-processing of forecasts for extremesusing bivariate Brown-Resnick processes with an application to wind gusts. Extremes 20, 309–332.Opitz, T., 2017. Spatial random field models based on l´evy indicator convolutions. arXiv preprintarXiv:1710.06826.Paciorek, C. J., Schervish, M. J., 2006. Spatial modelling using a new class of nonstationaryco-variance functions. Environmetrics 17, 483–506.Padoan, S.A., Ribatet, M., Sisson, S.A., 2010. Likelihood-Based Inference for Max-Stable Processes.Journal of the American Statistical Association 105, 263–277.Rootz´en, H., 1978. Extremes of moving average of stable processes. The Annals of Probability 6(5),847–869.Rootz´en, H., Segers, J., Wadsworth, J.L., 2018. Multivariate peaks over thresholds models. Ex-tremes 21, 115–145.Sato, K., 1999. L´evy processes and infinitely divisible distributions. Cambridge University Press,UK. 33chlather, M., 2002. Models for Stationary Max-Stable Random Fields. Extremes 5, 33–44.Segers, J., 2012. Max-stable models for multivariate extremes. REVSTAT 10, 61–82.Sklar, A., 1959. Fonctions de r´epartition `a nn