[PDF] Modelling Extremes of Spatial Aggregates of Precipitation using Conditional Methods

Abstract

Inference on the extremal behaviour of spatial aggregates of precipitation is important for quantifying river flood risk. There are two classes of previous approach, with one failing to ensure self-consistency in inference across different regions of aggregation and the other requiring highly inflexible marginal and spatial dependence structure assumptions. To overcome these issues, we propose a model for high-resolution precipitation data, from which we can simulate realistic fields and explore the behaviour of spatial aggregates. Recent developments in spatial extremes literature have seen promising progress with spatial extensions of the Heffernan and Tawn (2004) model for conditional multivariate extremes, which can handle a wide range of dependence structures. Our contribution is twofold: new parametric forms for the dependence parameters of this model; and a novel framework for deriving aggregates addressing edge effects and sub-regions without rain. We apply our modelling approach to gridded East-Anglia, UK precipitation data. Return-level curves for spatial aggregates over different regions of various sizes are estimated and shown to fit very well to the data.

Full PDF

MModelling Extremes of Spatial Aggregates of Precipitationusing Conditional Methods

Jordan Richards A , Jonathan A. Tawn A , and Simon Brown BA STOR-i Centre for Doctoral Training, Department of Mathematics andStatistics, Lancaster University, LA1 4YR, UK B Hadley Centre for Climate Prediction and Research, Met Oﬃce, FitzRoyRoad, Exeter, EX1 3PB, UK

Abstract

Inference on the extremal behaviour of spatial aggregates of precipitation is importantfor quantifying river ﬂood risk. There are two classes of previous approach, with one failingto ensure self-consistency in inference across diﬀerent regions of aggregation and the otherrequiring highly inﬂexible marginal and spatial dependence structure assumptions. Toovercome these issues, we propose a model for high-resolution precipitation data, fromwhich we can simulate realistic ﬁelds and explore the behaviour of spatial aggregates.Recent developments in spatial extremes literature have seen promising progress withspatial extensions of the Heﬀernan and Tawn (2004) model for conditional multivariateextremes, which can handle a wide range of dependence structures. Our contribution istwofold: new parametric forms for the dependence parameters of this model; and a novelframework for deriving aggregates addressing edge eﬀects and sub-regions without rain.We apply our modelling approach to gridded East-Anglia, UK precipitation data. Return-level curves for spatial aggregates over diﬀerent regions of various sizes are estimated and a r X i v : . [ s t a t . M E ] F e b hown to ﬁt very well to the data. Keywords— extremal dependence; extreme precipitation; spatial aggregates; spatial conditionalextremes

Fluvial ﬂooding is typically not caused by high intensity extreme rainfall at single locations, butby the extremes of precipitation events which are aggregated over spatial catchment areas. Accuratemodelling of such events can help to mitigate the ﬁnancial impacts associated with ﬂoods, especially ifriver defences are built to withstand a T − year event of this kind. Approaches to quantifying the tailbehaviour of spatial aggregates exist in the literature; however, these techniques are often simplistic ormake unrealistic assumptions about the behaviour of the process for which they are trying to model.We present a novel methodology for making inference on the tail behaviour of spatial aggregates, whichwe apply in the context of extreme precipitation aggregates.We deﬁne a spatial process { Y ( s ) : s ∈ S} for some spatial domain S . Our interest lies in theupper tail behaviour of the spatial aggregate R A on regions A ⊂ S ⊂ R , R A = (cid:90) A Y ( s )d s, (1)for diﬀerent, possibly overlapping, A , and the joint behaviour of ( R A , R B ) for A , B ⊂ S . Typically, thedata we would have available for inference are realisations of Y t = ( Y t ( s ) , . . . , Y t ( s d )) for t = 1 , . . . , n ,which are observations of said process { Y ( s ) } at d sampling locations s = ( s , . . . , s d ) ⊂ S at n samplingtimes. Note that s need not be point locations; they can instead be non-overlapping grid-boxes. Dataproduced by climate models are often available in this form and observations of Y ( s i ) , i = 1 , . . . , d, correspond to spatial aggregates themselves, as they are typically presented as an average over the gridbox s i . In these circumstances, the integral in (1) can be replaced with the equivalent summation, butour methodology is still applicable; see Section 4. We assume that both the full marginal behaviour,and dependence, of { Y t ( s ) } is stationary with respect to time. Marginally, the upper tail behaviourof Y ( s ) is assumed to be characterised by a generalised Pareto distribution (GPD) with scale andshape parameters, υ ( s ) > ξ ( s ), respectively, that vary smoothly over s ∈ S (Davison and mith, 1990). Dependence in { Y ( s ) } is characterised through a marginal transformation to the process { X ( s ) : s ∈ S} , which has standardised margins; further details are given in Section 2.2. Thus, we canrewrite (1) as R A = (cid:90) A Y ( s )d s = (cid:90) A F − Y ( s ) { F X [ X ( s )] } d s, where F Y ( s ) ( · ) and F X ( · ) are the marginal CDFs of { Y ( s ) } and { X ( s ) } , respectively. We will focuson the situation where { X ( s ) } is a stationary process; an assumption that we ﬁnd holds well for ourapplication (see Section 4.4). If this assumption did not hold, a wide-range of literature exists forboth non-parametric, and parametric, methods that account for non-stationarity in extremal depen-dence (Huser and Genton, 2016, Richards and Wadsworth, 2021), and these methods can easily beincorporated into our methodology.There are three main existing modelling approaches for inference on the upper tails of R A :univariate methods, spatial approaches that focus on modelling all of the data, and spatial approachesthat focus on modelling only the extremes; our approach falls in the latter class, making less restrictiveassumptions than previous methods of this type.We ﬁrst consider the univariate case. Within an extreme value analysis framework, univariatemethods for estimating the size of T − year events are well studied and cemented in asymptotic theory(Coles, 2001). If we can create a sample of observations of R A , we can use univariate methods tomake inference on its upper tail, i.e., ﬁt a GP D distribution to exceedances of a sample of R A abovesome ﬁxed threshold and then extrapolate to high quantiles. However, creating this sample can bechallenging. If s are regularly spaced point locations, or contiguous non-overlapping grid-boxes, then(1) can be approximated using the sum of the elements of Y t . However, if s are irregularly spaced, wemay be required to compute a weighted sum to approximate (1), with the weights to be determinedsomehow. Further complications arise if we have partially missing observations of { Y ( s ) } . Evenif these issues are overcome, when using univariate methods we lose the information present in themargins of { Y ( s ) } and dependence of { X ( s ) } . If the process we are considering is precipitation, thiscan lead to inference that is not self-consistent and may be physically unrealistic; a trait that can beundesirable to practitioners. To explain this further, observe that, for precipitation, { Y ( s ) } is non-negative everywhere, i.e., Y ( s ) ≥ s ∈ S . Trivially it follows that R A ≥ R B for all B ⊆ A ⊂ S nd hence return levels should be similarly ordered. This natural ordering may not follow if we takea simple univariate approach to modelling the upper tail behaviour of the R A and R B aggregatesseparately (Nadarajah et al., 1998). To prevent this from occurring, we ﬁt a model for the process { Y ( s ) } , of which observations may be partially missing or complete. We then simulate from our modelfor { Y ( s ) } for s ∈ S and compute realisations of R A .In the context of precipitation aggregates, one richly studied approach has developed a class ofstationary stochastic processes to model the whole precipitation intensity process, continuous in bothtime and space. These models typically describe the intensity as the accumulation, at each point intime and space, over a random number of simple shaped individual stochastic rain cells, which clusterin time and space, and move on stochastic trajectories. These models were ﬁrst developed for a singlesite by Rodriguez-Iturbe et al. (1987), then developed spatially by Northrop (1998) and some of themore recent methods are summarised by Wheater et al. (2005). These models are typically estimatedby optimising the ﬁt against a range of characteristics of observed ﬁelds. As a result, these modelscan capture well the features of typical precipitation ﬁelds. However, for deriving the distribution ofquantities like the upper tail of R A , the models and their inference have limitations as there is noguarantee that models for the body of a process ﬁt well to the extremes. Yet it is precipitation ﬁeldsthat are extreme somewhere in A that yield extremes of R A unless A is very large relative to therange of spatial dependence, but in that case their method’s assumption of stationarity is likely to beunreliable.A typical approach to modelling extreme ﬁelds is the use of max-stable models, see Padoan et al.(2010), Westra and Sisson (2011), Reich and Shaby (2012). These models are predominately ﬁt tocomponent-wise block maxima, typically annual maxima, at sampling locations, but cannot be usedto make inference about the extremal dependence structure of individual precipitation ﬁelds as theycannot account for zeros, and hence neither can be used to describe the distribution of the aggregate.Typically annual maxima do not occur concurrently for diﬀerent sampling locations and so aggregatingover realisations from a max-stable process is not appropriate for inference on aggregates.Coles (1993) rectiﬁed some of these issues by using a point-process representation of a max-stableﬁeld to derive the proﬁle of concurrent events. Coles and Tawn (1996) used this formulation to deriveclosed form results for the tail behaviour of R A where the tail parameters are determined by the arginal GP D parameters of { Y ( s ) } and its dependence structure; Ferreira et al. (2012) formalisethese results and provide some non-parametric extensions. Further extensions of this framework byEngelke et al. (2018) relate not only the extremal behaviour of { Y ( s ) } and the aggregates R A , butalso the joint behaviour of aggregates over diﬀerent regions, A . de Fondeville and Davison (2020) usefunctional Pareto processes to model the dependence in { Y ( s ) } . All of these modelling approachesrely on the marginal shape parameters of { Y ( s ) } to be spatially homogeneous i.e., ξ ( s ) = ξ for all s ∈ A , for each A of interest. This assumption is unlikely to hold for applications to larger regions.When ξ ( s ) varies over a region, models based on the limiting behaviour of the aggregates of { Y ( s ) } are likely to fail. For example, Richards and Tawn (2021) show that when ξ ( s ) > s ∈ S , the tail behaviour of R A will be driven solely by the upper tail behaviour at locations s = arg max { ξ ( s ) : s ∈ S} . We construct a sub-asymptotic spatial model that avoids the spatialhomogeneity constraint.A particular restriction of using models based on max-stable, or Pareto, processes is that theyallow for a restrictive class of dependence structures only. Asymptotic dependence describes the co-occurrence of extremal events and is often quantiﬁed through the upper tail index χ ( s A , s B ) (Joe,1997) for all s A , s B ∈ S , which can be deﬁned for { Y ( s ) } as χ ( s A , s B ) = lim q ↑ χ q ( s A , s b ), where χ q ( s A , s B ) = Pr { Y ( s B ) > F − Y ( s B ) ( q ) | Y ( s A ) > F − Y ( s A ) ( q ) } . (2)In practice, we cannot estimate χ ( s A , s B ) as q ↑

1. Instead, estimates are provided by ﬁxing somehigh threshold q < χ ( s A , s B ) using χ q ( s A , s B ). Max-stable, or Pareto, processesare asymptotically dependent (Coles et al., 1999, Coles, 2001), or perfectly independent, at all spatialdistances. That is, for any max-stable, or Pareto, process exhibiting positive spatial association, wehave χ ( s A , s B ) > s A , s B ∈ S . These models are then unable to account for cases where wehave positive association, but χ ( s A , s B ) = 0 for some s A , s B ∈ S which holds for all Gaussian processeswhen s A (cid:54) = s B ; we refer to this scenario as asymptotic independence, i.e., the tendency for extremeevents to occur increasingly independently as the magnitude of the events gets larger. Extensions ofmax-stable processes, such as max-inﬁnitely divisible processes (Huser et al., 2020), for component-wise maxima can account for asymptotic independence in data. Bopp et al. (2020) illustrate good ﬁtsfor these models to block-maxima data, but they are not appropriate for precipitation event data. adsworth and Tawn (2019) have developed a conditional approach to spatial extremes. Theyprovide a spatial extension of the multivariate Heﬀernan and Tawn (2004) model, which enables themodelling of processes given that at least one location in the process is extreme. Dependence parame-ters within the Heﬀernan and Tawn (2004) model are represented as smooth functions, parametric orsplines, of distance between variables at the site of interest and the conditioning site, and the residualprocess is driven by a latent Gaussian process, see Section 2.2. This modelling approach allows forboth asymptotic dependence and asymptotic independence at diﬀerent spatial distances. We adaptthis approach for modelling extreme precipitation ﬁelds. Our model outperforms approaches that re-strict the dependence in { Y ( s ) } to asymptotic dependence when the true process is asymptoticallyindependent; this is illustrated in Section 4.4. However, even if the true process was asymptoticallydependent our model is able to capture this.Extensions of Wadsworth and Tawn (2019) are provided by Tawn et al. (2018), Shooter et al.(2020), Simpson and Wadsworth (2021), Simpson et al. (2020), Huser and Wadsworth (2020). Thesepapers cover extremal modelling of air and sea temperature ﬁelds and spatial wave heights. Most ofthese applications use a small numbers of sampling locations ( d < d using INLA; however, this imposes restrictionson the dependence structure parameters that are not appropriate in our application. We have d = 934and so ﬁnd that some non-parametric approaches to modelling dependence parameters are infeasi-ble; we explore novel parametric forms for these. Due to the high value of d , we explore a stratiﬁedsampling scheme for model ﬁtting and develop a novel bootstrap method that allows us to estimateuncertainty for estimated parameters. We ﬁnd that when using Monte-Carlo methods to approximatePr { R A > r } , that the position and size of A within S are important considerations that must be takeninto account, as we observe edge eﬀects on this inference for R A .The model is applied to precipitation data from the 2018 UK climate projections (Lowe et al.,2018). The data are from a convection permitting model, and we ﬁnd that the extremal behaviourof the underlying process is driven by spatially-localised events consistent with intensive convectiverainfall. We observe high variability in the ﬁtted model for { Y ( s ) } as we move further away fromthe centre of an event, which corresponds to the observed roughness in events that generate extreme recipitation. We further ﬁnd that the dependence model for { Y ( s ) } ﬁts well and that we are able tocomfortably handle zeroes in the data. We ﬁnd strong indication that we can replicate the empiricaldistribution of R A using Monte-Carlo methods, and so we have evidence to suggest that the furtherinferences we make about the tail behaviour of the aggregates are well-founded.The layout of this paper is as follows: Section 2 describes our model for the process { Y ( s ) } .We describe methods for model inference and simulation of events in Section 3, which includes ourcensoring technique for handling zero values. In Section 4 we discuss the marginal and dependencemodel ﬁts for the precipitation data, and inference on the tail behaviour of spatial aggregates of thesedata. We compare the results from our approach with those using GP D ﬁtted to the sample aggregatesand using a spatial asymptotically dependent model in Section 4.4. We end with further discussionand model extensions in Section 5.

The site-wise marginals of { Y ( s ) } can be modelled using a ﬁtted GP D distribution above somehigh threshold and the empirical distribution below (Coles, 2001). We extend this approach by incor-porating a third component, which we denote p ( s ), that describes the probability that there is no rainat site s . The marginal distribution function of Y ( s ) for each s ∈ S is F Y ( s ) ( y ) =  p ( s ) if y = 0 − λ ( s ) − p ( s )˜ F Y +( s ) ( q ( s )) ˜ F Y + ( s ) ( y ) + p ( s ) if 0 < y ≤ q ( s )1 − λ ( s ) (cid:104) ξ ( s )( y − q ( s )) υ ( s ) (cid:105) − /ξ ( s )+ if y > q ( s ) , (3)where υ ( s ) > F Y + ( s ) ( y ) denotes the empirical distribution function of strictly positive valuesof Y ( s ) and p ( s ) ≥ , λ ( s ) > p ( s ) + λ ( s ) <

1; this ensures that the marginal distribution iscontinuous across components. We expect spatial smoothness over F Y ( s ) and so deﬁne full spatialmodels for the three components of (3) which also enable us to make inference about F Y ( s ) for all s ∈ S , i.e., including where Y ( s ) is not observed.We ﬁrst consider the distribution of Y ( s ) above q ( s ). Following the approach of Youngman GP D model (GAM) to exceedances Y ( s ) − q ( s ). This allows us torepresent the GP D scale and shape parameters, υ ( s ) and ξ ( s ), respectively, through a basis of smoothsplines. We set λ ( s ) = λ for all s ∈ S , allowing us to estimate q ( s ) for s ∈ S . We use a non-parametricapproach, and simply ﬁt a thin-plate spline to point-wise estimates of q ( s ) for the associated λ , as theparametric method of Youngman (2019) fails as we have discrete data below q ( s ).We estimate p ( s ) = Pr { Y ( s ) = 0 } for s ∈ S as a spatially smooth surface by using a logisticGAM (Wood, 2006); that is, we ﬁt p ( s ) = logit { E [ g ( s )] + (cid:15) s } where g ( · ) is a smoothing spline and theerrors (cid:15) s ∼ N (0 , K ) , K > s .For the distribution for 0 < y ≤ q ( s ), we require a spatial extension of the marginal empiricaldistribution of strictly positive values, ˜ F Y + ( s ) ( · ). We use the site-wise empirical distribution for ﬁttingand recognise that, should we require simulation of Y ( s ) for s ∈ S \ s , we can use additive quantileregression (Fasiolo et al., 2020) to compute the empirical distributions at unobserved locations. Thiscan be performed, if necessary, using only a local neighbourhood of sampling locations, as this methodis too computationally expensive for large d and so is not viable without dimension reduction.We use (3) to perform a site-wise standardisation of the margins of the data. For modellingdependence within a process { X ( s ) : s ∈ S} using the Wadsworth and Tawn (2019) conditionalextremes framework, we require its margins to have standard exponential upper tails, i.e., Pr { X ( s ) >x } ∼ C exp( − x ) for some C >

0, as x → ∞ and for all s ∈ S . We follow Tawn et al. (2018) and useLaplace margins. Wadsworth and Tawn (2019) model the underlying extremal dependence in our standardisedprocess { X ( s ) } , given that it is extreme for some s ∈ S , by ﬁrst conditioning on the process being abovesome high threshold u at a speciﬁed site s O ∈ S . We introduce the function h ( s A , s B ) = (cid:107) s A − s B (cid:107) for s A , s B ∈ S , where (cid:107) · (cid:107) is some distance metric (we use the anisotropic measure (12)). Underthe assumption that there exists normalising functions { a : ( R , R + ) → R } , with a ( x,

0) = x , and b : ( R , R + ) → (0 , ∞ ) } , such that as u → ∞ , they assume that for each s O ∈ S (cid:18)(cid:26) X ( s ) − a { X ( s O ) , h ( s, s O ) } b { X ( s O ) , h ( s, s O ) } : s ∈ S (cid:27) , X ( s O ) − u (cid:19) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:18) X ( s O ) > u (cid:19) d −→ (cid:32)(cid:26) Z ( s | s O ) : s ∈ S (cid:27) , E (cid:33) , (4)where E is a standard exponential variable and process { Z ( s | s O ) } which is non-degenerate for all s ∈ S where s (cid:54) = s O . That is, there is convergence in distribution of the normalised process to { Z ( s | s O ) : s ∈ S} , termed the residual process, which is independent of E, and Z ( s O | s O ) = 0 almostsurely. Characterisations of the normalising functions, a and b and the residual process Z ( s | s O ) aregiven in Sections 2.2.1 and 2.2.2, respectively.To make inference on the upper tail of R A for any A ⊂ S we require the process { X ( s ) } givenan extreme value somewhere in the domain S , i.e., (cid:26) X ( s ) : s ∈ S (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) (cid:18) max s ∈S X ( s ) > u (cid:19) (5)for large u . Limit (4) conditions only on observing an exceedance at a speciﬁc site s O ∈ S , socannot be immediately used. However, this limit provides a core building block for what is requiredwhen combined with a limiting model for { X ( s O ) > u : s O ∈ S}| (max s ∈S X ( s ) > u ) as u → ∞ . For astationary process, this limiting model will be invariant to s O for all s O ∈ S which are suﬃciently farfrom the boundaries of S . Wadsworth and Tawn (2019) show how simulation from process (5) can beachieved through an importance sampling method; the outline of this is given in Section 3.4.For the process { X ( s ) } to be ergodic over R , we need conditions on a and b and Z ( s | s O ) so thatindependence is achieved as h = h ( s, s O ) → ∞ for any s O ∈ S and suitably distanced s ∈ S . Thisrequires for any ﬁxed x >

0, that a ( x, h ) → b ( x, h ) → h → ∞ . Further, the residual process Z ( s | s O ) must have identical margins to X ( s ) as h → ∞ ; in particular, we require standard Laplacemargins for Z ( s | s O ) as h → ∞ . For inference, we assume parametric forms for the a and b normalising functions in limit (4).Wadsworth and Tawn (2019) provide a discussion of these normalising functions and provide somesuggestions for their possible parametric forms. We considered the range of parametrics forms discussed y Wadsworth and Tawn (2019), Tawn et al. (2018), Shooter et al. (2020), but for brevity we reportonly the models that provided the best ﬁt. We let a ( x, h ) = xα ( h ) , with α ( h ) =  , h ≤ ∆ , exp( −{ ( h − ∆) /κ α } κ α ) , h > ∆ , (6)where ∆ ≥ κ α , κ α > { X ( s ) } to be asymptotically dependent up to distance ∆from s O , and asymptotically independent thereafter. We also take b ( x, h ) = x β ( h ) , with β ( h ) = κ β exp( −{ h/κ β } κ β ) (7)for κ β , κ β > κ β ∈ [0 , b (0 , x ) = x κ β . Ergodicity holds for { X ( s ) } whatever theparameters of a and b , as a ( x, h ) → b ( x, h ) → h → ∞ for ﬁxed x > { Z ( s | s O ) } We follow Shooter et al. (2020) by imposing that the residual process { Z ( s | s O ) } has delta-Laplacemargins; a random variable follows a delta-Laplace distribution, i.e., DL ( µ, σ, δ ), with location, scaleand shape parameters µ ∈ R , σ > δ >

0, respectively, if its density is f ( z ) = δ kσ Γ (cid:0) δ (cid:1) exp (cid:40) − (cid:12)(cid:12)(cid:12)(cid:12) z − µkσ (cid:12)(cid:12)(cid:12)(cid:12) δ (cid:41) , ( z ∈ R ) (8)with Γ( · ) as the standard gamma function and k = Γ(1 /δ ) / Γ(3 /δ ). The scaling by k is used to improveidentiﬁability between σ and δ , as the random variable has expectation µ and variance σ regardlessof the value of δ . Use of the delta-Laplace distribution introduces ﬂexibility in the marginal choice for Z ( s | s O ), as for δ = 1 or 2, we have the Laplace or Gaussian distributions respectively. As with thenormalising functions, we parametrise the delta-Laplace parameters as smooth functions of distancefrom the conditioning site s O . That is, Z ( s | s O ) ∼ DL ( µ { h ( s, s O ) } , σ { h ( s, s O ) } , δ { h ( s, s O ) } ), with µ ( h ) = κ µ h κ µ exp {− h/κ µ } , ( κ µ > , κ µ > ,σ ( h ) = √ − exp {− ( h/κ σ ) κ σ } ) , ( κ σ > , κ σ > ,δ ( h ) = 1 + ( κ δ h κ δ − κ δ ) exp {− h/κ δ } , ( κ δ ≥ , κ δ > , κ δ > , κ δ < , (9) or h ≥

0. These functions satisfy the constraint that µ (0) = σ (0) = 0, which ensures that Z ( s O | s O ) = 0holds and provides a ﬂexible modelling choice for δ . We do not constrain any particular value of δ (0)and instead let δ (0) = 1 − κ δ >

0. Furthermore, ergodicity of X ( s ) is achieved as µ ( h ) → σ ( h ) → δ ( h ) → h → ∞ , where the variance of a standard Laplace random variable is 2.Following the approach of Shooter et al. (2020), dependence in { Z ( s | s O ) } is induced by ﬁrst con-sidering the process { W ( s | s O ) } = { W ( s ) | ( W ( s O ) = 0) } for all s ∈ S , where { W ( s ) } is a standard sta-tionary Gaussian process with correlation function ρ ( h ). We set { Z ( s | s O ) } = { F − Z ( s | s O ) { Φ[ W ( s | s O )] }} for all s ∈ S , where Φ( · ) and F Z ( s | s O ) are the CDFs of a standard Gaussian distribution and Z ( s | s O ),respectively. The corresponding density function to F Z ( s | s O ) is f Z ( s | s O ) , deﬁned by (8) and (9).To illustrate the dependence in { Z ( s | s O ) : s ∈ S} , we consider the joint distribution of Z ( s | s O )in a ﬁnite-dimensional setting, which we achieve by using a Gaussian copula model. Consider any s O ∈ ( s , . . . , s d ), and without loss of generality, rewrite the sampling locations as s , ( s , . . . , s d − ),i.e., here we illustrate with s O = s d . The joint distribution of { Z ( s | s O ) , . . . , Z ( s d − | s O ) } for s O is, for z = ( z , . . . , z d − ), F s O ( z ) = Φ d − (cid:8) Φ − ( F Z ( s | s O ) ( z )) , . . . , Φ − ( F Z ( s d − | s O ) ( z d − )); , Σ (cid:9) , (10)where Φ d − ( · ; , Σ) is the CDF of a ( d − − dimensional Gaussian distribution with mean . Thecorrelation matrix Σ must account for the conditioning W ( s ) | ( W ( s O ) = 0). To create Σ, we initialisea stationary correlation matrix Σ ∗ using correlation function ρ ( · ) evaluated for all pairwise distances,and we condition on observing W ( s O ) = 0. That is, the correlation matrix Σ has ( i, j )-th elementΣ ij = Σ ∗ ij − Σ ∗ i Σ ∗ j (1 − Σ ∗ i ) / (1 − Σ ∗ j ) / . (11)Note the elements of Σ are normalised such that the diagonal elements are equal to one. In ourapplication, ρ ( · ) is taken to be the Mat´ern correlation function ρ ( h ) = 12 κ ρ − Γ( κ ρ ) (cid:18) h √ κ ρ κ ρ (cid:19) κ ρ K κ ρ (cid:18) h √ κ ρ κ ρ (cid:19) , ( κ ρ > , κ ρ > , where K κ ρ ( · ) is the modiﬁed Bessel function of the second kind of order κ ρ . o account for spatial anisotropy in the extremal dependence structure of { X ( s ) } we use thetransformation of coordinates s ∗ =  /L   cos θ − sin θ sin θ cos θ  s, (12)where θ ∈ [ − π/ ,

0] controls rotation and

L > L = 1recovering the isotropic model. We deﬁne our distance metric (cid:107) s A − s B (cid:107) = (cid:107) s ∗ A − s ∗ B (cid:107) ∗ , where (cid:107) · (cid:107) ∗ denotes great-circle, or spherical, distance. The non-zero probability of zeroes for precipitation Y ( s ) causes the X ( s ), for all s ∈ S , tohave non-zero mass at a ﬁnite lower endpoint. Consequently, the Gaussian copula and delta-Laplacemarginal model described in (10) are not appropriate for the transformed precipitation data in theselower tail regions. To circumvent this issue, we apply censoring at all points where Y t ( s ) = 0 for all s ∈ S and t = 1 , . . . , n . A spatially-varying censoring threshold c ( s ) is attained by transforming p ( s )in Section 2.1 to the Laplace scale, using c ( s ) = F − L { p ( s ) } , where F L ( · ) is the standard Laplace CDF.We then assert that Y t ( s ) = 0 ⇔ X t ( s ) ≤ c ( s ), which in turn implies that Z t ( s | s O ) ≤ c ( s O ) t ( s ) where c ( s O ) t ( s ) is dependent on the value observed at the conditioning site and given in (13).For inference, the number of censored components varies at each time point; the number, locationsand censoring values can all vary, with a maximum value of d − s ∈ S \ s . These features are detailed in Appendix A. Our censored triplewise likelihood approach for model ﬁtting is based on the pseudo-likelihoodapproach of Padoan et al. (2010); their pairwise approach provides unbiased estimation of model ependence parameters. Recall that some observations are right-censored at diﬀerent sampling loca-tions with varying rate of occurrence over time. To deﬁne a single likelihood contribution at time t , we begin by considering a single conditioning site amongst the observed sites s i ∈ ( s , . . . , s d )such that y t ( s i ) > F − Y ( s i ) { F L ( u ) } and hence x t ( s i ) > u . We then deﬁne the set of all such times by T ( s i ) = { t = 1 , . . . , n : x t ( s i ) ≥ u } . For the observed sites, we deﬁne h i,j = h ( s i , s j ) for i, j = 1 , . . . , d with i (cid:54) = j . Then for each site s j , j = 1 , . . . , d, j (cid:54) = i , we deﬁne the residual for time t ∈ T ( s i ) forconditioning site s i as z ( s i ) t ( s j ) =  [ x t ( s j ) − a { x t ( s i ) , h i,j } ] /b { x t ( s i ) , h i,j , } if x t ( s j ) > c ( s j ) ,c ( s i ) t ( s j ) , otherwise, (13)where a and b are described in Section 2.2.1, and c ( s i ) t ( s j ) = c ( s j ) − a { x t ( s i ) ,h i,j } b { x t ( s i ) ,h i,j } is the censored residualfor site s j with conditioning site s i and t ∈ T ( s i ) . The full pseudo-likelihood is given, for residuals z ( s i ) t with conditioning site s i , by L CL ( ψ ) = d (cid:89) i =1 (cid:89) t ∈T ( si ) L s i CL ( ψ ; z ( s i ) t ) = d (cid:89) i =1 (cid:89) t ∈T ( si ) (cid:89) ∀ jc ( si ) t ( s j ) } b { x t ( s i ) , h i,k } { z t ( s k ) >c ( si ) t ( s k ) } , where 1 {·} is the indicator function. The parameter vector ψ contains all parameters of the normalisingfunctions a and b , the marginal parameter functions µ, σ and δ , the correlation ρ and the anisotropy.Estimation of ψ can be achieved by maximising (14). Clearly, maximising (14) is computationally infeasible if d is large, as evaluation of (14) requires( d − d − (cid:80) di =1 |T ( s i ) | / g ( s i ) , which can require double integrals and grows as O ( d n ).We detail a stratiﬁed sampling regime to create a pseudo-likelihood that circumvents the computational ssue. To construct a sub-sample of data that can be used to estimate the parameters ψ via pseudo-likelihood estimation, we ﬁrst need to consider what these parameters represent. These parameterscontrol characteristics of the dependence functions described in Sections 2.2.1 and 2.2.2, which arefunctions of either distance to the conditioning site h i,j or pairwise distances h j,k for s i , s j , s k ∈ s . Thus,we construct our sub-sample by drawing triples of sites ( s i , s j , s k ) ∈ s ; there are d ( d − d − / s i , s j , s k ) is chosen so that the distances h i,j , h i,k and h j,k do not exceeda speciﬁed threshold. This is a natural extension of the approach of Huser and Davison (2013) whosuggest using only pairs of locations that are within some low distance h max > d s (cid:28) d ( d − d − / s i ∈ ( s , . . . , s d ). A pair of sites are then drawn randomly from the set { s j , s k ∈ s \ s i , j < k : max { h i,j , h i,k } < h max } without replacement, and the process is repeated. Insampling in this way, only sites within distance h max of the conditioning site are used for inference. Thatis, we estimate the spatial functions of the dependence parameters for h i,j < h max and h i,k < h max ,and so h j,k < h max only, and then extrapolate to larger distances. There is a trade-oﬀ involved inchoosing the value of h max : if too low, then extrapolations to larger distances are likely to be poor; iftoo high, ﬁt at small distances is compromised. In Section 4.3, we describe a heuristic technique forchoosing h max and we ﬁnd this works well in practice. The number of triples d s is chosen to be aslarge as possible, whilst pseudo-likelihood estimation remains computationally feasible. Our aim is to estimate the sampling distribution of ˆ ψ d s , the parameter estimates of the model ﬁtusing the d s triples sampled under the regime described in Section 3.2. However, to derive this usinga bootstrapped sample of ˆ ψ d s , denoted ˆ ψ ∗ d s , may be computationally infeasible; if d s is large, as it is omputationally expensive to get the required number of replicated values. We detail a bootstrappingregime that uses less data, and hence is computationally feasible, but it is still reliable.We denote ˆ ψ m of length Q as parameter estimates using a sample of m ∈ N stratiﬁed sampledtriples with m ≤ d S . We further denote ˆ ψ ∗ m as a bootstrap samples of parameter estimates achievedvia maximum pseudo-likelihood estimation of (14) with m triples, and V m as the Q × Q variance matrixof the estimator of ˆ ψ m which are calculated using ˆ ψ ∗ m . We take a scaling approach to approximatethe sampling distribution of ˆ ψ d s . This is achieved by creating a bootstrap sample of ˆ ψ ∗ d s ; although wecannot compute this directly, we can compute a bootstrap sample ˆ ψ ∗ d s /w for w >

1, such that d s /w ∈ N ;that is, a sample created using d s /w < d s triples where the replicates of ˆ ψ ∗ d s /w can be estimated in afeasible time-frame. We then apply a linear transformation to ˆ ψ ∗ d s /w to create an approximate sampleof ˆ ψ ∗ d s . To illustrate this, let ˜ ψ d s /w be the component-wise mean of the ˆ ψ ∗ d s /w replicates. Then eachreplicate is transformed to giveˆ ψ ∗ d s = ˆ ψ d s + V − / d s V / d s /w (cid:16) ˆ ψ ∗ d s /w − ˜ ψ d s /w (cid:17) = ˆ ψ d s + λ − / (cid:16) ˆ ψ ∗ d s /w − ˜ ψ d s /w (cid:17) , (15)where λ > V d s is to be speciﬁed below. This ensures that the boot-strap sample ˆ ψ ∗ d s has expectation ˆ ψ d s and variance V d s . The sampling distribution of ˆ ψ d s is thenapproximated empirically from ˆ ψ ∗ d s .To estimate λ , we begin by estimating V d s ; although direct computation is infeasible, we canestimate V d s /w for d s > d s /w ∈ N , i.e., the variance of the parameter estimates for the model ﬁt using d s /w triples. As long as the same sampling mechanism is used to create the sub-sampled triples ofsize d s and d s /w , i.e., that described in Section 3.2, it follows that V d s ≈ λV d s /w for some λ >

1. Ifobservations in both samples are truly independent of one another, we have that λ = w . However, thisis unlikely to be the case as observations will exhibit spatial dependence. To estimate λ , we note that | V d s | ≈ | λV d s /w | = λ Q | V d s /w | , (16)where | · | denotes the matrix determinant and Q is the size of ψ ; this follows from the property | λM | = λ Q | M | for constant λ > M a Q × Q matrix. Ideally, we would rewrite (16) to approximate λ ; however, we cannot compute V d s directly. Instead we estimate V d s /w and V d s / (2 w ) , and use (16) to stimate λ such that V d s /w ≈ λ V d s / (2 w ) . It follows that λ = λ log ( w )2 = (cid:18) | V d s /w || V d s / (2 w ) | (cid:19) log ( w ) /Q , (17)and we can use this to estimate (15). The exponent in (17) follows as λ corresponds to the variancematrix scaling factor if the sample size doubles; it would take log ( w ) repetitions of doubling d s /w toreach d s . We now detail a technique that will allow us to draw realisations of { Y ( s ) : s ∈ S} . First, we notethat the model in Section 2.2 does not describe the dependence in all of { Y ( s ) } ; instead, it describes (cid:26) Y ( s ) : s ∈ S (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) (cid:18) max s ∈S (cid:8) F − L ( F Y ( s ) { Y ( s ) } ) (cid:9) > v (cid:19) ≡ (cid:26) F − Y ( s ) ( F L { X ( s ) } ) : s ∈ S (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) (cid:18) max s ∈S { X ( s ) } > v (cid:19) , (18)for v ≥ u with u used for ﬁtting in Section 3.1. Thus, to create a realisation of { Y ( s ) : s ∈ S} , wedraw realisations of (18) with probabilityPr (cid:26) max s ∈S (cid:8) F − L ( F Y ( s ) { Y ( s ) } ) (cid:9) > v (cid:27) , (19)and otherwise draw realisations of (cid:26) Y ( s ) : s ∈ S (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) (cid:18) max s ∈S (cid:8) F − L ( F Y ( s ) { Y ( s ) } ) (cid:9) < v (cid:19) . (20)As we do not expect realisations of (20) to contribute to the tail behaviour of R A , we simply drawrealisations of (20) from the observed data. We estimate (19) empirically; although this could beinferred using the parametric model of Section 2. If S does not correspond to the set of samplinglocations, then we would have to approximate (20) though some form of inﬁlling, i.e., using the quantileregression technique (Fasiolo et al., 2020) discussed in Section 2.1.We now describe a simulation technique that will allow us to draw realisations of (18). That is,the ﬁeld { Y ( s ) : s ∈ S} given that an extreme value above a threshold is observed anywhere in thedomain. This threshold varies with s and corresponds to the relative quantile v on the Laplace scale. adsworth and Tawn (2019) detail the procedure for achieving this. There are three steps: drawingconditioning sites s O ∈ S , simulating the ﬁelds { Y ( s ) : s ∈ S}| ( F − L ( F Y ( s O ) { Y ( s O ) } ) > v ) using theﬁtted model described in Section 2, and then using importance sampling to approximate (18), seeAlgorithm 1, Step 2. The ﬁrst step requires random sampling of conditioning sites s O for some s O ∈ S ;we do this uniformly, which provides a good ﬁrst approximation of the occurrence of these sites in S and then improve on this via the importance sampling regime described below.To simulate N realisations from process (18), we follow Wadsworth and Tawn (2019) and draw aninitial N (cid:48) > N realisations of the process { X ( s ) : s ∈ S} on the Laplace scale. Then, using importancesampling, we sub-sample N realisations from { X ( s ) : s ∈ S}| max s ∈S X ( s ) > v , and transform themargins of the sample to the original scale, { Y ( s ) } . The sub-sampling regime adds extra weight torealisations for which the conditioning site is near the boundary of the domain. This is to alleviatethe edge eﬀect caused by not using conditioning sites outside of the boundaries of S . A discussion of arelated issue is given in Section 3.5. We found that setting N (cid:48) ≈ N was suﬃcient for our application,although this may be dependent on the size of S and value of N . Algorithm 1

Simulating (18)1. For i = 1 , . . . , N (cid:48) with N (cid:48) > N :(a) Draw a conditioning location s ( i ) O from S with uniform probability density 1 / |S| .(b) Simulate E ( i ) ∼ Exp (1) and set x i ( s ( i ) O ) = v + E ( i ) .(c) Simulate a ﬁeld { z i ( s | s ( i ) O ) : s ∈ S} from the residual process model deﬁned in Section2.2.2.(d) Set { x i ( s ) : s ∈ S} = a { x i ( s ( i ) O ) , h ( s, s ( i ) O ) } + b { x i ( s ( i ) O ) , h ( s, s ( i ) O ) } × { z i ( s | s ( i ) O ) : s ∈ S} .2. Assign each simulated ﬁeld { x i ( s ) : s ∈ S} an importance weight of (cid:26)(cid:90) S { x i ( s ) > v } d s (cid:27) − , for i = 1 , . . . , N (cid:48) , and sub-sample N realisations from the collection with probabilitiesproportional to these weights.3. Transform each { x i ( s ) : s ∈ S} to { y i ( s ) : s ∈ S} using the marginal transformation (3).If x i ( s ) ≤ c ( s ), set y i ( s ) = 0, where for some s (cid:48) ∈ S , y i ( s (cid:48) ) is above its F L ( v )-th quantile. Using the sample of realisations of { Y ( s ) : s ∈ S} generated in Section 3.4, we make inferenceabout the tail behaviour of R A in (1) or the corresponding sum; here we focus on the latter, but a iscussion of the integral is given in Section 5. The possible size of the aggregation region A in relationto the region S is of particular interest. Trivially, we require A ⊆ S . However, we cannot have A = S ,as if we did, the simulation algorithm will never generate an event for which the conditioning site liesoutside of the boundaries of S , but we still observe an extreme event somewhere inside A . To avoidsuch edge-eﬀects, we require the boundaries of A to be far enough inside the interior of S , such thatthe distribution (18) does not change if the size of S increases. Informally, we require a buﬀer zonebetween the boundaries of A and S which is large enough, such that any event with conditioning siteoutside of S has negligible eﬀect on the distribution within A . We select the width τ of this buﬀerzone by using the measure χ q ( s A , s B ) given in (2) and stationarity. We choose τ such that for any s ∈ A and s O ∈ R \ S , such that for h ( s, s O ) > τ , we have χ q ( s O , s ) < γ for small γ > q ; that is, we have small probability less than γ of observing a large event at s given that there isan extreme event at any site outside of S . This measure can be evaluated empirically or by simulatingfrom the ﬁtted model; we take the latter approach in Section 4.4. We consider data consisting of average hourly precipitation rate (mm/hour) taken from the UKclimate projections 2018 (UKCP18) (Lowe et al., 2018). Data are from a convection permitting modelwhich produces values over hourly intervals between the years 1980 and 2000, using the observedatmospheric conditions. The sampling locations are (5 km ) grid boxes corresponding to the BritishNational Grid from Ordnance Survey (OSGB). The spatial domain S of interest is East-Anglia, UK(see Figure 1) and only data sampled over land have been included, leaving 934 sampling locations.Each observation corresponds to the average over the assigned spatio-temporal grid-box. The datarepresent the average in each grid-box, and so a natural quantity of interest is ¯ R A := R A / |A| , ratherthan R A , but we present results on R A as this variable must satisfy the ordering constraints discussedin Section 1. To remove any seasonal eﬀect observed in the data, we use summer (JJA) observationsonly, leaving 43200 ﬁelds. We chose to take summer precipitation events as these typically exhibithigher intensity than winter events (Sharkey and Winter, 2019). We treat the centre of each grid boxas a sampling location, and as the grid-boxes are non-overlapping and contiguous, we can approximate he integral R A in (1) using a sum. We use the great-circle distance as our distance metric describedin Section 2.2.2. Initial analysis shows that the data consists of 8 .

7% hours with zero precipitation, but much ofthe data with non-zero values exhibits noise around zero produced by the climate model. Thus, thedata less than 1 × − mm/hour were set to zero , increasing the average number of dry hours to83 . p ( s ) within agiven hour; this is estimated using the logistic regression GAM detailed in Section 2.1. We observesome spatial variation in p ( s ), with slightly lower estimates being found along the north-east coast.We ﬁt the spatial marginal model detailed in Section 2.1. We take λ ( s ) = 0 .

995 for all s ∈ S in(3) and the corresponding GP D threshold q ( s ), estimated using a thin-plate spline, is illustrated inFigure 1 with q ( s ) varying roughly over S ; larger values are found along the east coast. The GP D

GAM model with spatially smooth estimate parameters is then ﬁt to site-wise exceedances above ˆ q ( s )at each site; a spatial map of the shape parameters are given in Figure 1. We take the approach ofYoungman (2019) and use as many knots as is computationally feasible in the thin-plate splines, whichis 300. This creates a potentially overly rough spline which may overﬁt the data and not capturetrue physical smoothness; however, our primary interest is in the dependence structure when studyingaggregates as this is the novel element of our model, so we chose this approach to ensure that theempirical marginal distributions are as well modelled as possible. We observe ˆ ξ ( s ) > s ∈ S ,and so the marginal upper tails are unbounded at each site. Q-Q plots of the marginal ﬁts at ﬁverandomly sampled locations are presented in Figure S2, all showing good ﬁts. To evaluate the ﬁt overall locations, we use a pooled Q-Q plot, transforming all data onto standard exponential margins usingthe ﬁtted model, see Figure S2. Again the ﬁt is remarkably good, although conﬁdence intervals arenot provided due to the spatial dependence in the pooled data. All dependence models are ﬁtted by taking the exceedance threshold u in (4) to be the standardLaplace 98% quantile. This leaves 864 ﬁelds for ﬁtting the extremal dependence model given an A level which would be recorded as zero by a rain gauge. p ( s ), centre: ˆ q ( s ), right: ˆ ξ ( s ). ˆ υ ( s ) is illustrated in Figure S1. observed extreme at a single conditioning site. The empirical estimate (and 95% conﬁdence interval)for the probability in (19) is 0 .

273 (0 . , . u was considered; however, we found that this leadsto poorer model ﬁts as the data exhibits a partial mixing of dependence structures. We believe thisis due to the presence of multiple data generating processes in the climate model. Precipitation istypically generated by either high intensity events with localised spatial proﬁles, i.e., convective cells,or low intensity events with much large spatial proﬁles, i.e., frontal storms (Thomassen et al., 2020). Inthe absence of covariates to distinguish between these events in the data, we use a higher exceedancethreshold to remove any frontal events; this is discussed further in Section 5.We proceed with an initial analysis by ﬁtting a simple version of the model of Section 2.2 tothese data. We ﬁt the model with two caveats: we make the temporary assumption that the residualprocess { Z ( s | s O ) } is independent at all distances; and evaluate a sequence of “free” pairwise parameterestimates (Wadsworth and Tawn, 2019) for the normalising functions and those functions that describethe marginal characteristics of { Z ( s | s O ) } . That is, we ﬁt individual parameters, i.e., α ( s O ) s i etc. for i = 1 , . . . , d with s i (cid:54) = s O , rather than a spatial function α { h ( s, s ) } , and we do this for seven diﬀerentconditioning sites s O sampled randomly over S . This approach can be used to assess non-stationarityin { X ( s ) } ; if we observe clear disagreement in the parameter estimates for the diﬀerent conditioningsites, then the assumption of stationarity of { X ( s ) } is unlikely to be appropriate. We ﬁnd no evidence h , which is calculated under the anisotropy transformation for thefull spatial model. Estimates from the free ﬁts described in Section 4.3 are given by the blackpoints, parametric spatial functions are given in red (asymptotically independent model) andblue (asymptotically dependent model). Bottom right: estimates from model for χ q ( s, s O ) in(2) with q = 1 / (24 × km ) are given in the spatial anisotropy setting. for non-stationarity in the parameter estimates presented in Figure 2; while we observe some volatilityin the free parameter estimates, the general patterns appear to be the same regardless of the choiceof conditioning site. We use the spatial structure in the free estimates to motivate our choice for theforms of the parameter functions detailed in Sections 2.2.1 and 2.2.2.Using the sampling method described in Section 3.2, the full spatial ﬁt uses d s = 5000 triples ofsites with each sampling location being used as a conditioning site at least once. As the estimates of α and β in Figure 2 decay quickly with increasing spatial distance, i.e., for any distance greater than25 km , α ≈ β < .

5; this illustrates that the underlying process Y ( s ) exhibits fairly localisedstrong extremal dependence. This suggests that we should focus on modelling extremal dependencelocally, as this will be the driving factor of the aggregate behaviour. A distance of 25 km in theanisotropic setting corresponds to an approximate distance of 28 km in the original setting, and so weset h max = 28 km . Although 5000 triples of sites represents a very small proportion of all possibletriples, we observe a good model ﬁt from Figure 2, which shows that, even at distances greater than28 km , the ﬁtted parametric functions for the dependence parameters correspond well to the sequencesof free estimates. We further investigate the choice of h max after ﬁtting the model igure 2 can be used to make further inference about the underlying dependence structure of theprecipitation process. For example, we ﬁnd that ∆ in (6) can be taken to be zero without restricting thequality of the ﬁt and similarly we can set κ δ = κ β = 1. The estimate ∆ = 0 suggests that the processis asymptotically independent at even the closest distances as asymptotic dependence requires both α ( h ) = 1 and β ( h ) = 0 for all h , which the estimates in Figure 2 suggest is not the case; a ﬁt imposingasymptotic dependence is discussed later. Furthermore, we found that incorporating spatial anisotropyinto the dependence model improved the overall ﬁt; stronger extremal dependence was found alongan approximate − ◦ bearing, reducing by at most 7% over diﬀerent directions. Parameter estimates(and standard errors) are provided in Table S1. Although not illustrated in Figure 2, ρ decays quicklywith distance, with ρ (100) ≈ .

2. Standard errors are estimated using the bootstrap scheme describedin Section 3.3 with w = 20 and the use of 250 bootstrap samples. We estimate ˆ λ = 1 .

042 in (17).To further support the choice of h max , we estimate χ q ( s, s O ) in (2) for s O ∈ S in the centreof S , taking q corresponding to a one-year return level probability, and look to see how far away s must be for χ q ( s, s O ) to be less than γ for small γ > χ q ( s, s O ) bysimulating 5 × replications, using Algorithm 1, from the ﬁtted model and is illustrated in Figure 2,bottom-right panel; for γ = 0 .

1, we ﬁnd that a distance of h max is suﬃcient and so we set τ = h max inSection 3.5, discussed further in Section 4.4.Figure 3 illustrates six extreme ﬁelds: three realisations from the model deﬁned in (18) andthree observations from the data. Fields are chosen such that the site in the centre of S exceeds its99 . α quickly goes to zero with distance, all extremal dependence is instead exhibited through β . This is atypical of ﬁts of this model for other applications (Wadsworth and Tawn, 2019, Shooteret al., 2020, Simpson and Wadsworth, 2021), where the α function drives the extremal behaviourof the considered processes, e.g., temperature and sea wave heights. Having β controlling extremal dependence would suggest that the process that generates the extreme precipitation we are modellingis somewhat rough; this concurs with the observed ﬁelds containing an extreme value shown in Figure3, and consistent with the spatial nature of strong convective rainfall. To illustrate this, we notethat, for small h = h ( s, s O ), we have α ( h ) ≈ β ( h ) ≈

1, then E [ X ( s ) | X ( s O ) = x ] ≈ µ ( h ) x andvar (() X ( s ) | X ( s O ) = x ) ≈ x σ ( h ), and so the largest events at s O are the most variable. This has notbeen observed in other applications as the extremal dependence in these processes is typically quitesmooth with var (() X ( s ) | X ( s O ) = x ) ≈ σ ( h ) as β ( h ) ≈ h is small. Even at the largest h ,the process { X ( s ) } does not exhibit independence; although α and β tend to zero the residual processdoes not attain standard Laplace margins with δ ( h ) = 1.Existing literature for approaches that rely on modelling the underlying process to make inferencethe extremal behaviour of spatial aggregates of precipitation typically use models that only allow forasymptotic dependence (Coles, 1993, Coles and Tawn, 1996, Buishand et al., 2008). We ﬁt such amodel to illustrate that imposing asymptotic dependence may lead to poor inference for the tails ofspatial aggregates. We term this the “AD model” and the model described above as the “AI model”.To specify the AD model, we ﬁx α ( h ) = 1 and β ( h ) = 0 for all h and we change σ ( h ) in (9) to σ ( h ) = κ σ (1 − exp {− ( h/κ σ ) κ σ } ) with κ σ , κ σ , κ σ > , as we no longer require that σ ( h ) → √ h ) → ∞ . The corresponding µ ( · ) and δ ( · ) functional forms remain the same and the spatial anisotropysetting described in (12) is still used. To fully capture the behaviour of µ ( · ), we found we had to take max = 75 km . The estimated spatial functions for the AD model are illustrated in Figure 2. With α and β ﬁxed, we observe that the other parameters are forced to compensate for this misspeciﬁcation.For example, we observe a strictly negative µ function; this is to compensate for ﬁxing the α valuetoo large for the data. Given this, we re-estimated the free parameters with α = 1 and β = 0 ﬁxedand observed good agreement between the spatial functions and these new estimates. However, thisdoes not imply that the model as a whole ﬁts well, this is emphasised in Section 4.4 where spatialaggregates of simulated ﬁelds { Y ( s ) } are studied. Q-Q plots, presented in Figure 4, assess how well the tails of the simulated distributions compareagainst the tails of the empirical distribution of the spatial averages. Conﬁdence intervals given for thesimulated quantiles are derived using the bootstrap; for each of the 250 bootstrap parameter estimatesdiscussed in Section 3.3, we draw 5 × realisations of { Y ( s ) : s ∈ S} using the regime described inSection 3.4. Then ¯ R A is calculated for each sample and for each region A ; per the discussion in Section4.3, each A is at least τ = 28 km away from the boundaries of S . Figure 4 illustrates generally goodﬁts for the tails of ¯ R A with nested regions A . The AI model appears to slightly underestimate thetrue magnitude of the largest aggregates for the largest regions; while this may suggest that the modelis not capturing dependence at further distances from the conditioning site, Figure 2 suggests thatthe model ﬁts well even at the furthest distances. This leads us to suspect that there is a mixture ofevents present in the data when we consider large spatial regions for aggregation, and that the modelis not ﬂexible enough to capture these mixtures, see Section 5. To illustrate the beneﬁts of using ourapproach, Figure 4 also illustrates the same diagnostics for the AD model described in Section 4.3 forthe smallest and largest regions. The AD model provides much poorer ﬁts than the AI models, asit always overestimates the quantiles; this suggests that the AD model overestimates the dependencewithin the original process even for the smallest aggregation regions. A similar plot for non-overlappingregions is illustrated in Figure S5; here we observe some underestimation in the largest estimated returnlevels for the regions closest to the east coast, which is a possible indication of non-stationarity alongthis coast.As discussed in Section 1, obtaining physically consistent return level estimates of spatial aggre-gates is essential. We compare two methods for achieving this: (i) performing a long-run simulation R A of regions of increasing size. Left: AI model,right: AD model. Probabilities range from 0 . A with corresponding areas(125 , , , , , − km . Regions 1-6 are coloured red, green, blue, cyan, purple,yellow; regions include both the coloured and interior points. from our model, deriving empirical estimates of return-levels from these replicates; (ii) ﬁtting a GP D distribution to the observed aggregate tails and extrapolating to the desired return-level. Ideally, wewant to use only the former approach as this mitigates the potential issues with using method (ii)discussed in Section 1; however, for computational eﬃciency, we perform a shorter run for method (i)with 5 × realisations and use a ﬁtted GP D to extrapolate to the largest return-levels. Figure 5presents estimates of return level curves for R A over the nested regions, illustrated in Figure 4, usingmethods (i) and (ii), top-left and top-right panels, respectively. For each region A , a GP D distributionis ﬁtted to exceedances of the respective sample R A above the 99 .

9% quantile and return level curvesare estimated from these ﬁts. In the top-right panel of Figure 5, we observe intersection in the returnlevel curves estimated for the two smallest regions using method (ii). This problem does not ariseusing the computationally eﬃcient version of approach (i), e.g., in the top-left panel of Figure 5, wherewe have 5 × × / ≈ R A usingboth methods, for a single region A . Conﬁdence intervals are derived for both methods by ﬁtting a GP D to 250 bootstrap samples of R A : in method (i), these are the samples as described at the top ofthis section; for (ii), we perform a simple bootstrap of the observed data, assuming temporal indepen- ence. A higher exceedance threshold for approach (i) was considered, but we found that the diﬀerencein estimates was negligible; to support the use of the 99 .

9% quantile, we illustrate a pooled Q-Q plot inFigure 5, transforming exceedances from all 250 bootstrap samples onto standard Exponential marginsusing their respective

GP D ﬁts and observe an excellent overall ﬁt.

Figure 5: Top: Estimated return level curves of R A using the model (left) and observations(right). Colours correspond to the regions illustrated in Figure 4. Bottom-left: return levelestimates for Region 5 in Figure 4 using methods (i) and (ii) in black and red, respectively.95% conﬁdence intervals for the methods are given by the coloured dashed lines. Bottom-right:Q-Q plot for pooled GP D ﬁt for approach (i), over all 250 bootstrap samples, on standardExponential margins. 95% tolerance bounds are given by the dashed lines.

A further point of interest for practitioners is inference on the joint behaviour of ( ¯ R A , ¯ R B ) fordiﬀerent regions A , B ∈ S . We investigate this joint behaviour for the diﬀerent aggregate regionsgiven in Figure 4. Figure 6 illustrates realisations of pairwise ( ¯ R A , ¯ R B ) for both model and empiricalestimates with nested regions A , B , showing that the model captures the joint distributions well; asimilar ﬁgure is given for non-overlapping regions in Figure S6, in which we observe that extreme eventsdo not typically occur together. This suggests that the extremal behaviour of the aggregates is drivenby spatially-localised events. Further evidence for this can be found in Figure 6 for aggregates overnested regions, as we observe weakening extremal dependence between aggregates over the smallest,and increasingly larger, regions. × realisations of pairwise ( ¯ R A , ¯ R B ) for nested regions A , B , illustratedin Figure 4. Black points are model estimates, red points are from the data. The regions A , B are labelled on the respective panels. We have presented extensions of the Heﬀernan and Tawn (2004) and Wadsworth and Tawn (2019)models for modelling the extremal dependence for precipitation data. As illustrated in Section 2.2, thismodel provides ﬂexibility over existing models for extreme precipitation as it can capture asymptoticindependence. Simulating from this model is simple, and replications can be used to make reliableinference about the tail behaviour of spatial aggregates of the underlying process once issues linkedto edge eﬀects are addressed. This approach circumvents an issue that is common with independentinference on the tails of spatial aggregates over diﬀerent regions, namely that they run the risk ofmaking inference that is inconsistent with the physical properties of the process.A particular drawback of our our approach is that inference using the full likelihood is compu-tationally infeasible. To overcome this issue, we proposed methods for model ﬁtting and assessingparameter uncertainty that are based on a pseudo-likelihood approach which requires speciﬁcation of ahyper-parameter h max and a novel scaling approach respectively. We found that these methods workedwell in our application, as we were able to choose a suitable h max quite low for which the model ﬁtswell in a reasonable time-frame, see Figure 2. This is because our data exhibits fairly localised extremeevents; in applications where this is not the case, a larger h max will be required which could potentiallylead to more samples being required for ﬁtting.Data used in Section 4 are from a climate model, which means that sampling locations arecomprised of non-overlapping grid-boxes, rather than point locations. In our application, we take R A to be the corresponding summations, rather than the integrals deﬁned in (1); however, this is not o say that our approach cannot be used if we require inference on the tail of an integral. We havedetailed a fully spatial model for both the dependence and marginal behaviour of { Y ( s ) } , and so it ispossible to create a sample of { Y ( s ) : s ∈ S} where S is not necessarily the sampling locations. Wecan then approximate R A by specifying S as a ﬁne-grid and taking the sum of { Y ( s ) : s ∈ A} .When considering spatial aggregates over the largest regions A , we ﬁnd that our approach slightlyunderestimates the largest events, see Figure 4. Whilst this may be caused by boundary eﬀects, itcould also be caused by a complexity of the data generating process that is not captured by the model.As the size of A increases, it becomes less likely that the tail behaviour of R A is driven by a singletype of extreme event. There are two possible areas of complexity that are missed for regions thatare suﬃciently large: (i) multiple occurrences of localised high-intensity convective events (Schroeeret al., 2018), whereas we modelled single occurrences in Section 4; (ii) events consisting of a mixtureof localised high-intensity convective and widespread low-intensity non-convective, events. Our dataappears to exhibit these; recall in Section 4.3 we remarked that we considered a lower threshold u in(4) for modelling, but we found that this was not feasible as the data exhibits mixtures of dependence.A higher threshold had to be speciﬁed to remove observed ﬁelds that exhibited long-range spatialdependence to improve model ﬁtting. Further improvements can be made to inference on the tails of R A by modelling frontal events. To illustrate this, consider that we model R A | (max s ∈S X ( s ) > v ), i.e., R A given an extreme event somewhere in S , and undo said conditioning using the data. We do notmodel R A given that there is no extreme event anywhere in S , i.e., such caused by a frontal event. Asthe size of A grows, we will increasingly ﬁnd that these events will drive the extremal behaviour of R A ;this could be further explanation behind the underestimation in Figure 4, and so should be incorporatedinto the model. We also considered another measure of the extremal dependence in Appendix B, whichsuggests improvement may be possible using mixture modelling. A possible approach to this problemis to incorporate covariates on precipitation ﬁeld type into the model. Acknowledgements

Jordan Richards and Jonathan Tawn gratefully acknowledge funding through the STOR-i Doc-toral Training Centre and Engineering and Physical Sciences Research Council (grant EP/L015692/1).Simon Brown was supported by the Met Oﬃce Hadley Centre Climate Programme funded by BEISand Defra. The authors are grateful to Robert Shooter of the Met Oﬃce Hadley Centre, UK, and o Jennifer Wadsworth and Emma Simpson of STOR-i, Lancaster University, UK, for helpful discus-sions. References

Bopp, G. P., Shaby, B. A., and Huser, R. (2020). A hierarchical max-inﬁnitely divisible spatial modelfor extreme precipitation.

Journal of the American Statistical Association , to appear.Buishand, T. A., de Haan, L., and Zhou, C. (2008). On spatial extremes: With application to a rainfallproblem.

Ann. Appl. Stat. , 2(2):624–642.Coles, S. G. (1993). Regional modelling of extreme storms via max-stable processes.

Journal of theRoyal Statistical Society. Series B (Methodological) , 55(4):797–816.Coles, S. G. (2001).

An Introduction to Statistical Modeling of Extreme Values . Springer series inStatistics. Springer-Verlag, London.Coles, S. G., Heﬀernan, J. E., and Tawn, J. A. (1999). Dependence measures for extreme value analyses.

Extremes , 2(4):339–365.Coles, S. G. and Tawn, J. A. (1996). Modelling extremes of the areal rainfall process.

Journal of theRoyal Statistical Society. Series B (Methodological) , 58(2):329–347.Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds (with discussion).

Journal of the Royal Statistical Society: Series B (Methodological) , 52(3):393–425.de Fondeville, R. and Davison, A. C. (2020). Functional peaks-over-threshold analysis. arXiv e-prints ,arXiv:2002.02711.Engelke, S., De Fondeville, R., and Oesting, M. (2018). Extremal behaviour of aggregated data withan application to downscaling.

Biometrika , 106(1):127–144.Fasiolo, M., Wood, S. N., Zaﬀran, M., Nedellec, R., and Goude, Y. (2020). Fast calibrated additivequantile regression.

Journal of the American Statistical Association , to appear.Ferreira, A., de Haan, L., and Zhou, C. (2012). Exceedance probability of the integral of a stochasticprocess.

Journal of Multivariate Analysis , 105(1):241–257. eﬀernan, J. E. and Tawn, J. A. (2004). A conditional approach for multivariate extreme values (withdiscussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 66(3):497–546.Huser, R. and Davison, A. C. (2013). Composite likelihood estimation for the Brown–Resnick process.

Biometrika , 100(2):511–518.Huser, R. and Genton, M. G. (2016). Non-Stationary dependence structures for spatial extremes.

Journal of Agricultural, Biological, and Environmental Statistics , 21(3):470–491.Huser, R., Opitz, T., and Thibaud, E. (2020). Max-inﬁnitely divisible models and inference for spatialextremes.

Scandinavian Journal of Statistics .Huser, R. and Wadsworth, J. L. (2020). Advances in statistical modeling of spatial extremes. arXive-prints , arXiv:2007.00774.Joe, H. (1997).

Multivariate Models and Multivariate Dependence Concepts . CRC Press.Lowe, J., Bernie, D., Bett, P., Bricheno, L., Brown, S., Calvert, D., Clark, R., Karen, Eagle, Edwards,T., Fosser, G., Maisey, P., McInnes, R., Mcsweeney, C., Yamazaki, K., and Belcher, S. (2018).UKCP18 science overview report.

Met Oﬃce Hadley Centre: Exeter, UK .Nadarajah, S., Anderson, C. W., and Tawn, J. A. (1998). Ordered multivariate extremes.

Journal ofthe Royal Statistical Society: Series B (Statistical Methodology) , 60(2):473–496.Northrop, P. (1998). A clustered spatial-temporal model of rainfall.

Proceedings of the Royal Societyof London. Series A: Mathematical, Physical and Engineering Sciences , 454(1975):1875–1888.Padoan, S. A., Ribatet, M., and Sisson, S. A. (2010). Likelihood-based inference for max-stableprocesses.

Journal of the American Statistical Association , 105(489):263–277.Politis, D. N. and Romano, J. P. (1994). The stationary bootstrap.

Journal of the American StatisticalAssociation , 89(428):1303–1313.Reich, B. J. and Shaby, B. A. (2012). A hierarchical max-stable spatial model for extreme precipitation.

The Annals of Applied Statistics , 6(4):1430.Richards, J. and Tawn, J. A. (2021). On the tail behaviour of aggregated random variables.

In draft . ichards, J. and Wadsworth, J. L. (2021). Spatial deformation for non-stationary extremal dependence. Environmetrics , to appear.Rodriguez-Iturbe, I., Cox, D. R., and Isham, V. (1987). Some models for rainfall based on stochasticpoint processes.

Proceedings of the Royal Society of London Series A , 410(1839):269–288.Schroeer, K., Kirchengast, G., and O, S. (2018). Strong dependence of extreme convective precipitationintensities on gauge network density.

Geophysical Research Letters , 45(16):8253–8263.Sharkey, P. and Winter, H. C. (2019). A Bayesian spatial hierarchical model for extreme precipitationin Great Britain.

Environmetrics , 30(1):e2529.Shooter, R., Tawn, J. A., Ross, E., and Jonathan, P. (2020). Basin-wide spatial conditional extremesfor severe ocean storms.

Extremes , pages 1–25.Simpson, E. S., Opitz, T., and Wadsworth, J. L. (2020). High-dimensional modeling of spatialand spatio-temporal conditional extremes using INLA and the SPDE approach. arXiv e-prints ,arXiv:2011.04486.Simpson, E. S. and Wadsworth, J. L. (2021). Conditional modelling of spatio-temporal extremes forred sea surface temperatures.

Spatial Statistics , 41:100482.Tawn, J. A., Shooter, R., Towe, R., and Lamb, R. (2018). Modelling spatial extreme events withenvironmental applications.

Spatial Statistics , 28:39 – 58.Thomassen, E. D., Sørup, H. J. D., Scheibel, M., Einfalt, T., and Arnbjerg-Nielsen, K. (2020). Data-driven distinction between convective, frontal and mixed extreme rainfall events in radar data.

Hydrology and Earth System Sciences Discussions , 2020:1–26.Wadsworth, J. L. and Tawn, J. A. (2019). Higher-dimensional spatial extremes via single-site condi-tioning. arXiv e-prints , arXiv:1912.06560.Westra, S. and Sisson, S. A. (2011). Detection of non-stationarity in precipitation extremes using amax-stable process model.

Journal of Hydrology , 406(1-2):119–128.Wheater, H., Chandler, R., Onof, C., Isham, V., Bellone, E., Yang, C., Lekkas, D., Lourmas, G., nd Segond, M.-L. (2005). Spatial-temporal rainfall modelling for ﬂood risk estimation. StochasticEnvironmental Research and Risk Assessment , 19(6):403–416.Wood, S. (2006).

Generalized Additive Models: An Introduction with R . Chapman & Hall/CRC Textsin Statistical Science. Taylor & Francis.Youngman, B. D. (2019). Generalized additive models for exceedances of high thresholds with anapplication to return level estimation for U.S. wind gusts.

Journal of the American StatisticalAssociation , 114(528):1865–1879.

AppendixA Delta-Laplace conditional distribution functions

A.1 Connection to main text

This appendix supports the material presented in Sections 2.2.3 and 3.1 of the main paper. Thisappendix describes contributions to the pairwise likelihood function in (14) given by the conditionaldistribution associated with the residual distribution described in Section 2.2.2 of the main paper.

A.2 Bivariate conditional distribution

We note that if we had Y ( s i ) > i = 1 , . . . , d , then the corresponding joint density of (10)for observed residuals z = ( z , . . . , z d − ) ∈ (( c , ∞ ) × · · · × ( c d − , ∞ )) is f s O ( z ) = φ d − (cid:8) Φ − ( F Z ( s | s O ) ( z )) , . . . , Φ − ( F Z ( s d − | s O ) ( z d − )); , Σ (cid:9) d − (cid:89) i =1 f Z ( s i | s O ) ( z i ) φ (Φ − ( F Z ( s i | s O ) )) , (21)where c , . . . , c d − are the censoring thresholds deﬁned in Section 3.1 of the main text, φ ( · ) is the PDFof a standard Gaussian distribution and φ d − ( · ; , Σ) is the CDF of a ( d − − dimensional Gaussiandistribution with mean and correlation matrix Σ deﬁned in (11); components of z are not censoredwhen in this domain. or censoring thresholds c , c , the bivariate density with the conditioning constraints is g s O ( z , z ; c , c ) =  f s O ( z , z ) if z > c , z > c ,f s O ( z ) F s O , | { c , z } if z > c , z ≤ c ,f s O ( z ) F s O , | { c , z } if z ≤ c , z > c ,F s O ( c , c ) if z ≤ c , z ≤ c , (22)where the conditional distribution F s O , i | j ( c, z ) is given by F s O , i | j ( c i , z j ) = Φ (cid:8) Φ − ( F Z ( s i | s O ) ( c i )); µ i | j , Σ i | j (cid:9) , (23)where µ i | j = Σ ij Σ − jj Φ − { F Z ( s j | s O ) ( z j ) } and Σ i | j = Σ ij − Σ ij Σ − jj with Σ deﬁned in (11). Both f s O ( · , · )and F s O ( · , · ) are the bivariate analogues of (10) and (21) and F Z ( s i | s O ) is the CDF of a delta-Laplacedistribution with parameters given in (9); these and Σ are all identiﬁable as they are all fully determinedby pairwise distances. A.3 Multivariate extension of (23)

Suppose we have some s O ∈ S and s A ∈ S \ s O , which need not be any of the original samplinglocations, and is indexed such that s A = ( s , . . . , s n A ). We partition the indices of set s A into twosets; indices for censored sites and non-censored sites, A c and A nc , respectively and without loss ofgenerality we write A c = ( s , . . . , s n c ) and A nc = ( s n c +1 , . . . , s n A ). If sites in A c are censored withthresholds c , . . . , c n c , then the conditional distribution function of (cid:18) Z ( s ) | s O ) , . . . , Z ( s n c ) | s O ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:18) Z ( s n c +1 | s O ) = z n c +1 , . . . , Z ( s n A | s O ) = z n A (cid:19) is F s O , A c | A nc ( c , . . . , c n c , z n c +1 , . . . , z n A ) equals toΦ n c (cid:8) Φ − ( F Z ( s | s O ) ( c )) , . . . , Φ − ( F Z ( s nc | s O ) ( c n c )); µ A c | A nc , Σ A c | A nc (cid:9) , (24) here µ A c | A nc = Σ A c A nc Σ − A nc A nc (Φ − { F Z ( s nc +1 | s O ) ( z n c +1 ) } , . . . , Φ − { F Z ( s nA | s O ) ( z n A ) } ) T Σ A c | A nc = Σ A c A c − Σ A c A nc Σ − A nc A nc Σ A nc A c . The notation Σ AB denotes the partition of Σ that takes rows indexed by the set A and columns indexedby the set B . B Application dependence model evaluation

B.1 Connection to main text

This appendix supports the material in Section 4.3 of the main paper. Appendix B.2 details theparameter estimates for the spatial AI model and Appendix B.3 details a diagnostic, based on χ , forevaluating the ﬁtted dependence model. B.2 Table of AI model parameter estimates

Table S1 gives the estimates and standard errors for the model parameters discussed in Section4.3 of the main paper. Note that those parameters without standard errors were treated as ﬁxed inthe model ﬁt.