Wave Extremes in the North East Atlantic from Ensemble Forecasts

Abstract

A method for estimating return values from ensembles of forecasts at advanced lead times is presented. Return values of significant wave height in the North-East Atlantic, the Norwegian Sea and the North Sea are computed from archived +240-h forecasts of the ECMWF ensemble prediction system (EPS) from 1999 to 2009. We make three assumptions: First, each forecast is representative of a six-hour interval and collectively the data set is then comparable to a time period of 226 years. Second, the model climate matches the observed distribution, which we confirm by comparing with buoy data. Third, the ensemble members are sufficiently uncorrelated to be considered independent realizations of the model climate. We find anomaly correlations of 0.20, but peak events (>P97) are entirely uncorrelated. By comparing return values from individual members with return values of subsamples of the data set we also find that the estimates follow the same distribution and appear unaffected by correlations in the ensemble. The annual mean and variance over the 11-year archived period exhibit no significant departures from stationarity compared with a recent reforecast, i.e., there is no spurious trend due to model upgrades. EPS yields significantly higher return values than ERA-40 and ERA-Interim and is in good agreement with the high-resolution hindcast NORA10, except in the lee of unresolved islands where EPS overestimates and in enclosed seas where it is biased low. Confidence intervals are half the width of those found for ERA-Interim due to the magnitude of the data set.



Øyvind Breivik∗†‡, Ole Johan Aarnes§, Jean-Raymond Bidlot‡, Ana Carrasco§, Øyvind Saetra§
July 30, 2018

∗ Final version published as Breivik, Ø, O J Aarnes, J-R Bidlot, A Carrasco and Ø Saetra (2013). Wave Extremes in the North East Atlantic from Ensemble Forecasts, J Climate, doi:10.1175/JCLI-D-12-00738.1 (in press)
† Corresponding author. E-mail: [email protected]
‡ European Centre for Medium-Range Weather Forecasts (ECMWF), Shinfield Park, Reading, RG2 9AX, United Kingdom
§ Norwegian Meteorological Institute

Introduction

Extreme value estimates of atmospheric and oceanographic variables are usually derived from observational records or from model reconstructions of the past (reanalyses and hindcasts).

A given probability of exceedance, or equivalently return period, corresponds to a return value of the geophysical variable in question. This return value is normally approximated by fitting the Generalized Extreme Value (GEV) distribution to blocked maxima (such as annual maxima) or by fitting the Generalized Pareto (GP) distribution to exceedances above a threshold, see Smith (1990) and Coles (2001). In the atmospheric and oceanographic sciences 100-year return values are usually sought (Lopatoukhin et al., 2000), which means that observational or modeled time series are rarely long enough to cover the return period, even for the longest reanalyses and hindcasts (see Uppala et al. 2005; Kalnay et al. 1996; Wang et al. 2012 for descriptions of some recent reanalyses). Extrapolation of the parametric fit to lower probabilities of exceedance (return periods longer than the observational record or modeled time series) is then required. This affects the confidence intervals of the return value estimates and is a concern when using shorter records like altimeter measurements (Alves and Young, 2003; Young et al., 2011). Model or observational bias will further increase the confidence intervals, but is much harder to identify than the unsystematic error stemming from insufficient length of the time series.

Trends and low-frequency oscillations can seriously influence return value estimates from time series. This can be handled using non-stationary techniques (see Coles 2001, Chapter 6 for an introduction). Due to imminent climate change (IPCC, 2007), estimating return values from time series with trends has recently received some attention in the earth sciences.
Kharin and Zwiers (2000) and Kharin and Zwiers (2005) investigated the impact of a linear trend on the GEV distribution of the annual extremes, while Parey et al. (2007) looked at extending the extreme value theory to assess the return values of temperature extremes in the presence of a linear trend over a 54-year period for French observing stations. de Winter et al. (2012) investigated the changing wave extremes in a regional climate projection of the North Sea for the time-slice 2071-2100. Similarly, Wang et al. (2004) and Wang and Swail (2006) investigated the impact of changing wave climate on wave extremes in the span of the 21st century using statistical projections and coupled climate models.

Even if non-stationarity can be handled it raises the question of what exactly the return value estimates are to be used for. If a probability of exceedance valid for a certain time period is required, similar to what Kharin and Zwiers (2000) did for 21-year time slices from climate projections considered sufficiently stationary, then a long time series is not necessarily of much interest. What is then needed is an estimate of exceedance levels for that given time slice. Such a repository of possible weather realizations does in fact exist. The ensemble prediction system (EPS) operated by the European Centre for Medium-Range Weather Forecasts (ECMWF) has now been in operation for 20 years (Molteni et al., 1996; Buizza et al., 2007; Hagedorn et al., 2008). The individual ensemble members start from almost identical initial conditions with only small perturbations added to the best guess analysis (Buizza et al., 1999; Leutbecher and Palmer, 2008) to spread the ensemble in a way representative of the uncertainty of the forecast system.
Although there is considerable forecast skill after a lead time of five days (Richardson, 2010), the skill drops rapidly after day six, and on day 10 the individual members are only weakly correlated with each other and with observations, as we will show in Sec 2. If the quantiles of the entire cumulative distribution of the ensemble compare well with observations then the forecasts can be considered random realizations of a realistic model climate.

Return values for significant wave height have been estimated from a wide variety of data sources in the past, ranging from relatively short observational records (Battjes 1972 and Muir and El-Shaarawi 1986), satellite altimeters (Cooper and Forristall, 1997; Alves and Young, 2003; Vinoth and Young, 2011), long-term global reanalyses (Caires and Sterl, 2005a; Sterl and Caires, 2005), regional model hindcasts (Wang and Swail, 2001, 2002; Weisse and Günther, 2007; Aarnes et al., 2012), and statistical downscaling (Breivik et al., 2009). Here we explore a new approach to estimating return values of significant wave height using ensemble forecasts at advanced lead times instead of a time series. A similar approach has been explored by Van den Brink et al. (2005) for the special case of river flooding protection using seasonal forecast ensembles from ECMWF's earlier System 2 Seasonal Forecast System (Anderson et al., 2003). Van den Brink et al. (2005) employed the entire seasonal forecasts from a lead time of one month up to six months, arguing that the modelled North Atlantic Oscillation (NAO) was only weakly correlated with observed NAO after one month, dropping further to essentially zero for the subsequent months. We employ a different approach where we instead extract the significant wave height for a fixed forecast time (+240 h) from the EPS version of the Integrated Forecast System (IFS) of ECMWF. We have gathered all forecasts at +240 h generated during the period 1999-2009, equivalent (as will be explained in Sec 2) to ∼226 years if the data had formed a continuous time series. As will be explained in Sec 3 we assume that each forecast represents a six-hour interval, which is a reasonable assumption for a coarse model and analogous to the temporal resolution of traditional reanalyses. However, this also means that we estimate return values of the six-hourly average sea state. We address this in Sec 3 and discuss the implications further in Sec 5.

The method to be explored allows us to utilize a vast unused resource of climate realizations, and their lack of skill is actually a prerequisite since extreme value theory demands that events be uncorrelated. However, there are important caveats to the interpretation and use of the method. First, climate trends are by construction not captured by the method since we base our estimates on a time-slice of ∼10 years. Likewise, quasi-cyclical phenomena like El Niño with a period longer than what is covered by the archive may influence the results. This suggests the following use and interpretation of EPS return values: If probabilities of exceedance for the present time period are sought, then the ensemble data set is superior since it is not affected by long-term trends and low-frequency cycles. If on the other hand long-term return values are required then techniques for estimating extremes from time series with trends must be considered (see Kharin and Zwiers 2000, 2005; Parey et al. 2007), or at least comparison with traditional time series covering a sufficiently long period.

The paper is organized as follows. Sec 2 presents the observational records and the reanalysis and hindcast data sets used to test the method. Sec 3 presents the method used to compute return values from forecast ensembles at long lead times and how it differs from traditional return value estimates from observational records and modeled time series. We then investigate the independence of ensemble members and the climatology of the archived forecasts by comparing against a model climatology from a recent reforecast (see Hagedorn et al. 2012) and observations. Sec 4 compares the return values found from the EPS with three reference model data sets, namely the reanalyses ERA-40 and ERA-Interim (ERA-I hereafter) and a high-resolution regional hindcast for the Norwegian Sea and adjacent seas, NORA10 (see Reistad et al. 2011; Aarnes et al. 2012). Sec 5 discusses the differences in method and results, and points at possible weaknesses of the method. Finally, Sec 6 presents our conclusions on the general usefulness of the method and its application to significant wave height and the ECMWF EPS system.

To assess the validity of our return value estimates from EPS forecasts, we will make a number of comparisons with observational records, reanalyses and hindcasts of significant wave height.
This section presents the observations used and the five model data sets (ERA-40, ERA-I, NORA10, EPS and EPS reforecasts). We investigate the EPS climatology at analysis time (labeled EPS0) and at +240-h lead time (labeled EPS240) and assess the stationarity and independence of EPS240. Time series of all model data have been interpolated to buoy locations.

Time series have been extracted from ERA-40, ERA-I, NORA10 and EPS and interpolated to the same 1° × 1° grid of the northeastern part of the Atlantic Ocean, the North Sea and the Norwegian Sea in order to make geographical comparisons of extreme value estimates. The regridding and interpolation will inevitably smooth the field slightly, which will influence the return values somewhat. It is of interest to compare our EPS return estimates with these reference data sets because all three archives (ERA-40, ERA-I and NORA10) are frequently used for return value estimation (see e.g. Caires and Sterl 2005a; Aarnes et al. 2012).

Significant wave height from the ERA-40 reanalysis (Uppala et al., 2005) is available for the period September 1957 to August 2002 at six-hourly temporal resolution. The atmospheric model was coupled to a deep-water version of the wave model (WAM) through exchange of a wave-modified Charnock parameter (Janssen et al. 2002 and Janssen 2004, pp 232–234). WAM was run on a regular 1.5° grid. At this resolution the Shetland and Faroe archipelagoes are not resolved and the modeled wave field on the lee side of these islands is consequently biased high. It is also well known that ERA-40 is biased low in general (Caires and Sterl, 2005b; Reistad et al., 2011).
For this study we have not attempted any correction either to the time series themselves, which is how Caires and Sterl (2005b) came up with the corrected semi-global fields referred to as the Corrected ERA-40, or by correction of the 100-year return values, which is how Caires and Sterl (2005a) and Sterl and Caires (2005) constructed the global maps of return values from ERA-40. We argue that for this study it is better to compare the original data sets to avoid confounding artefacts of the new approach with artefacts of the statistical correction algorithms employed by Caires and Sterl (2005a) and Caires and Sterl [...].

ERA-I is a continually updated coupled atmosphere-wave reanalysis which originally covered the period from 1989 (roughly coincident with the satellite era), but which has recently been extended back to 1979 (Simmons et al., 2007; Uppala et al., 2008; Dee et al., 2011). The resolution is 1.0° for the wave model at the equator, but the resolution is kept nearly constant towards the poles by the use of an irregular latitude-longitude grid. The wave model is coupled to the atmospheric model in the same fashion as outlined above for ERA-40, but the ERA-I wave model physics include shallow-water effects important in areas like the southern North Sea. ERA-I also differs from ERA-40 in its use of a four-dimensional variational assimilation scheme and a substantially larger amount of observations, especially after 1991. ERA-I uses a subgrid scheme to represent the downstream impact of unresolved islands (Bidlot, 2012). Though a clear improvement over ERA-40, the wave field in the lee of the Faroes and the Shetland Isles is still biased a little high.

NORA10, a recently completed atmospheric downscaling of ERA-40 and wave model hindcast at 10-11 km resolution, is described by Reistad et al. (2011). The model domain covers the North Sea, the Norwegian Sea and the Barents Sea. The temporal resolution of the archived fields is three hours.
The hindcast initially covered the ERA-40 period (September 1957 to August 2002), but an extension with boundary and initial fields from the ECMWF IFS has since been added. The hindcast archive is continually updated. The breach of stationarity due to the change in boundary and initial values after August 2002 was investigated by Aarnes et al. (2012) and no statistically significant changes were found. The median and upper percentiles of NORA10 $H_s$ show little bias and generally close correspondence with the wave observations used in this study. The model resolves the main coastal features and the archipelagoes in the Norwegian Sea. Like ERA-I the wave model is run in shallow water mode.

We have extracted the significant wave height from archived operational ECMWF EPS wave forecasts for the period 1999-2009, a total of 11 years. The data set is not homogeneous, i.e., the resolution and the model physics of the operational EPS forecast system have been continually upgraded (see Fig 1 for the most important changes affecting the wave field). The wave model has been coupled to the atmospheric model in the same fashion as for ERA-I. The data assimilation scheme has been upgraded several times during the archived period, and the amount of data entering the assimilation cycle has steadily increased. It is also important to note that the forecast system started issuing two forecasts per day on 2003-03-25 (00 and 12 UTC analysis time). This means that the amount of data is not uniform over the period. We have extracted the analysis and the +240-h forecasts from the 50 perturbed ensemble members plus the control member (forced by unperturbed wind fields). We have also extracted the forecasts at +228 h (EPS228 hereafter). This data set is naturally slightly more correlated than EPS240 and is used here primarily to assess the validity of the method.
Since 2008 every new model cycle of the EPS has been accompanied by a model climatology based on reforecasting a five-member ensemble of the current model cycle (four perturbed and one unperturbed member) from ERA-I initial conditions every Thursday from 18 years back and up until the present day (Hagedorn et al., 2008; Hagedorn, 2008; Hagedorn et al., 2012; Prates and Buizza, 2011). These reforecasts are run to 32 days, similar to the operational forecasts. We have extracted the reforecasts valid at +240 h from model cycle Cy36r4, in operation after the end of the archived period, from 60 locations in the north-east Atlantic, the Norwegian Sea and the North Sea. The reforecasts are in principle also useful for extreme value analysis, but unfortunately the data set is too small by two orders of magnitude to allow the kind of analysis attempted with the EPS forecasts:

$$\frac{5\ \text{weekly members} \times 18\ \text{yr}}{51\ \text{members} \times [\ldots] \times [\ldots] \times 11\ \text{yr}} \approx [\ldots] \qquad (1)$$

Wave observations are routinely archived and quality-controlled by ECMWF as part of the wave model intercomparison effort (Bidlot et al., 2002). To make the observations comparable with model output Bidlot et al. (2002) averaged observations over four hours centered on the synoptic times. The rationale behind this averaging procedure is as follows. Typical wind conditions in the open ocean are on the order of 10 m s⁻¹. For fully developed wind sea the group speed, which dictates the propagation speed across the model grid, will be comparable to the wind speed (Holthuijsen, 2007; World Meteorological Organization, 1998). If the resolution is ∼[...]° then the time interval that is represented by the model output is 4-6 h. Archived model values, although "instantaneous" in the sense that they are model output, are thus slowly changing and should be considered averages representative of intervals of 4-6 h in the case of the coarser archives discussed below (ERA-40, ERA-I and EPS). The NORA10 archive has much higher resolution (10-11 km) and is consequently also archived at three-hourly resolution. Both ERA-40 and ERA-I are archived at six-hourly resolution and the return values derived from these reanalyses should be interpreted as six-hourly averages of the significant wave height. EPS is of comparable resolution and we assign the same interval to the EPS240 forecasts. We discuss this further in Sec 5.

Three locations with quite different wave climate were selected from a total of 60 observation stations in the North East Atlantic, the Norwegian Sea and the North Sea for inspection of the relative performance of the reanalyses and the EPS:

• P40 (Ekofisk oil field, WMO code LF5U, 56.50°N, 003.20°E) in the central North Sea
• P35 (Heidrun oil field, WMO LF3N, 65.30°N, 007.30°E) in the eastern Norwegian Sea
• B16 (K5 buoy, WMO 64045, 59.10°N, 011.40°W) in the north-east Atlantic

The two most commonly used statistical methods for estimating return values are based on the GEV distribution for blocked maxima and the GP distribution for values exceeding a set threshold. For completeness we repeat here the general form of the GEV and the GP distributions but refer to Coles (2001) for a more in-depth discussion. The GEV distribution (GEVD) is an asymptotic limit for a distribution of blocked maxima $M_n = \max\{X_1, X_2, \ldots, X_n\}$. The method is routinely used to approximate the probability distribution of blocked maxima such as annual maxima (Coles 2001, pp 45–51). However, the method can also be used on ensembles of independent and identically distributed (iid) forecasts since the blocking procedure itself makes no assumption of the grouping other than to ensure that all blocks have the same statistical properties. The blocking can thus be performed in many ways, but it is natural to block by ensemble member or some subset of time and member which is sufficiently large to ensure that the GEVD is a reasonably good approximation to the parent distribution. Following Coles (2001) the cumulative distribution function (CDF) of the block maxima formed from a random sequence of independent variables can be written

$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z - \mu_n}{\sigma_n}\right)\right]^{-1/\xi}\right\}, \qquad (2)$$

where $\sigma_n$ is the scale parameter, $\mu_n$ is known as the location parameter, and $\xi$ is the so-called shape parameter. The GEV distribution contains as special cases the Fréchet ($\xi > 0$), Gumbel ($\xi = 0$) and reversed Weibull ($\xi < 0$) distributions. The width of confidence intervals will depend strongly on the sign of the shape parameter (Hosking, 1984; Coles, 2001).

The GP method retains only values exceeding a threshold $u$. The transformed variable is written $y = X_i - u$, $y > 0$. It can be shown (Coles 2001, pp 75–77) that the GP distribution (GPD) is applicable if the data $y$ are independent and the maxima $M_n$ formed from the original variable belong to the GEV distribution, Eq (2). The GPD is written

$$H(y) = 1 - \left(1 + \frac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi}. \qquad (3)$$

Here $\tilde{\sigma} = \sigma + \xi(u - \mu)$ and $\xi$ is the shape parameter found in Eq (2). For $\xi =$ [...] members represent independent events and we can retain all values exceeding the chosen threshold. In this case the return values are more properly referred to as GP threshold estimates rather than GP/POT estimates.

The following assumptions must be shown to hold in order to estimate return values from ensemble forecasts:

1. Each forecast is representative of a time interval (e.g. six hours)
2. The model climatology distribution is comparable to the observed climatology distribution
3. No spurious trend in the mean and the variance due to model updates
4. No significant correlation between ensemble members at advanced lead times

If these conditions are met the ensemble can be assumed to be independent and identically distributed and representative of the observed wave climate. We will address each of these conditions below.
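To make the link between the two distribution functions and a return value concrete, the following minimal Python sketch evaluates Eqs (2) and (3) and inverts them in the standard way described by Coles (2001). It is an illustration, not the fitting procedure used in the paper: the function names are our own, the parameter values in the usage note are hypothetical, and the parameters are assumed to have been estimated beforehand.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z), Eq (2); xi = 0 is the Gumbel limit."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(mu, sigma, xi, p):
    """Level exceeded with probability p in any one block, i.e. G(z) = 1 - p."""
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def gp_cdf(y, sigma_tilde, xi):
    """GP distribution function H(y) for an excess y > 0 over the threshold, Eq (3)."""
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma_tilde)
    return 1.0 - (1.0 + xi * y / sigma_tilde) ** (-1.0 / xi)

def gp_return_level(u, sigma_tilde, xi, zeta_u, n_per_year, years):
    """Level exceeded on average once per `years` years.

    zeta_u is the fraction of values above the threshold u (0.03 for a P97
    threshold) and n_per_year the number of values per year (1460 for
    six-hourly data); both are inputs supplied by the user.
    """
    m = years * n_per_year * zeta_u  # expected number of threshold exceedances
    if abs(xi) < 1e-12:
        return u + sigma_tilde * math.log(m)
    return u + (sigma_tilde / xi) * (m ** xi - 1.0)
```

With hypothetical parameters, `gev_return_level(10.0, 1.5, -0.1, 0.01)` gives the level exceeded by one annual maximum in a hundred, and `gp_return_level(6.0, 1.2, -0.1, 0.03, 1460, 100)` gives the corresponding 100-year GP threshold estimate for six-hourly data with a P97 threshold.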

Turning $M$ ensemble forecasts with $N$ ensemble members each into the equivalent of a time period is necessary in order to convert from probability of exceedance to a return period. We assume each forecast to represent a six-hour interval based on the following reasoning. First, $\Delta t =$ [...] equivalent time period $T_\mathrm{eq} = MN\Delta t$. We discuss the validity of this assumption in Sec 5.

To convince ourselves that the EPS240 dataset is identically distributed we need only look at the quantile-quantile (QQ) distribution of two members. Panel (c) of Fig 2 clearly demonstrates the similarity of the distributions of two randomly selected ensemble members (the statistics look essentially the same for all combinations of members, not shown). The QQ-plot against observations in panel (a) of Fig 2 makes it clear that the ensemble members at +240 h represent the observed climate well. In fact, the cumulative distribution of the ensemble at +240 h is somewhat better aligned with observations than at analysis time, as can be seen in panel (b) of Fig 2 for the Ekofisk location in the North Sea, but the differences in distribution are small from analysis time to +240 h for all the observation locations. We conclude that we can assume that the ensemble members are identically distributed and that they represent the observed wave climate well.

To address the question of whether there is a spurious trend in the model climate over the archived period we compare against the reforecasts for the same period from Cy36r4. We are only interested in the behavior of the +240 h forecasts as our objective is to investigate whether the changes in model physics and resolution, especially in the early days of the EPS forecast system, will have significant impact on our return value estimates.

The EPS forecasts at lead time +240 h are compared with the reforecasts at the same lead time and at the same analysis dates over the period 1999-2009 in Fig 3. Panel (a) presents the time series of the annual mean deviations of the significant wave height at 60 locations in the north-east Atlantic, the North Sea and the Norwegian Sea. The annual mean difference in their standard deviation is shown in panel (b). It is clear from the boxplots that there is considerable deviation in both mean and standard deviation between the EPS forecasts and the EPS reforecasts throughout the model domain from one year to another, but there is no consistent drift in either statistic. However, it is clear that the reforecasts have slightly higher mean (∼[...]

The motivation for using ensemble forecasts at long lead times is the anticipation that the upper percentile of the data will be found to be independent. In other words, we are looking for forecasts with as little skill as possible (Wilks, 2006). With a set of ensemble forecasts at advanced lead time (e.g. 240 hours), the question is then whether the correlation between two arbitrary ensemble members is sufficiently low for the members to be assumed independent. The residual correlation (after subtraction of the seasonal mean) between two fields can be investigated through the centered anomaly correlation (Wilks 2006, pp 311–312), which for two ensemble members $i$ and $j$ can be defined as

$$r_\mathrm{ACC} = \frac{\mathrm{Cov}(x'_i, x'_j)}{s'^2}, \qquad (4)$$

where primes indicate centered anomalies.
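A minimal Python sketch of the centered anomaly correlation in Eq (4) follows. It assumes that a seasonal mean (climatology) value is available for every forecast valid time; how that climatology is built is not specified here and is left to the user. The function name is our own, and the denominator uses the product of the two members' anomaly standard deviations, which reduces to $s'^2$ when the members share a common variance.

```python
import math
from statistics import fmean

def centered_anomaly_correlation(x_i, x_j, climatology):
    """Correlation of two members after removing the seasonal mean.

    x_i, x_j:    values of members i and j, one per forecast
    climatology: seasonal mean valid at the same times (supplied by the user)
    """
    # Anomalies relative to the seasonal mean.
    a_i = [x - c for x, c in zip(x_i, climatology)]
    a_j = [x - c for x, c in zip(x_j, climatology)]
    # Center the anomalies by removing their own sample mean.
    m_i, m_j = fmean(a_i), fmean(a_j)
    a_i = [a - m_i for a in a_i]
    a_j = [a - m_j for a in a_j]
    cov = fmean(p * q for p, q in zip(a_i, a_j))
    s_i = math.sqrt(fmean(p * p for p in a_i))
    s_j = math.sqrt(fmean(q * q for q in a_j))
    return cov / (s_i * s_j)
```

Removing the climatology first is what separates the anomaly correlation from the ordinary correlation: two members that merely share the same seasonal cycle would otherwise appear correlated.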
The anomaly correlation at all three locations is approximately 0.20, while the actual correlation varies from 0.46 in the open ocean (B16 and P35) to 0.35 at Ekofisk (P40) in the central North Sea. With small variations this is the level of anomaly correlations found throughout the domain studied here.

Even weak correlations between ensemble members (if significant) may have a deleterious effect on the equivalent or effective ensemble size. This is a well known problem often discussed in the context of autocorrelated time series (von Storch and Zwiers 1999, pp 371–372). Denote the sample size (in our case the number of archived forecasts) $M$ and the ensemble size $N$. The entire ensemble is $X \in \mathbb{R}^{M \times N}$. The ensemble variance-covariance matrix is written $\langle e_i e_j \rangle \in \mathbb{R}^{N \times N}$, where $e_i$ represent departures from the ensemble mean. If we assume all members to have equal variance $\langle e_i e_i \rangle = s^2$ and common correlation $r$ (a reasonable assumption since there is nothing to distinguish one member from another) such that $\langle e_i e_j \rangle = r s^2$ (where we note that $r \equiv 1$ for $i = j$) we arrive at the following relation for the variance of the ensemble mean,

$$s_{\bar{x}}^2 = \frac{s^2}{N}\left(1 + (N - 1)\,r\right). \qquad (5)$$

The effective sample size is now found from $s_{\bar{x}}^2 = s^2 / N^*$, i.e., the variance of the mean of a smaller ensemble of uncorrelated members should equal that of our correlated ensemble. The effective ensemble size becomes

$$N^* = \frac{N}{1 + (N - 1)\,r} \qquad (6)$$

and it is clear that even quite weak correlations can seriously reduce the effective ensemble size and have a detrimental impact on the mean properties of the ensemble. However, assessing the impact of correlations on the ensemble mean alone is of limited value since only the upper percentiles of the data set are actually used for the return value estimates. To investigate the possible impact of correlations on the tail of the data set we first computed the correlation and the Spearman rank correlation (see Press et al. 1992) for a subset of the forecasts where at least one ensemble member exceeded the 97th percentile (P97). Members not exceeding the threshold were set to zero. The average rank correlation and Pearson's correlation coefficients were 0.05 for this subset of forecasts. This shows that the higher percentiles of the ensemble tend to be uncorrelated even if the ensemble itself exhibits weak correlation. This is not surprising given the nature of our analysis. We are selecting the upper percentiles from a large data set. This means that we are only selecting storm events, which are transient and fast-moving. It is unlikely that storm events exceeding P97 will occur simultaneously in many ensemble members after a 10-day integration. Average sea state will on the other hand be more correlated at long lead times since such weather patterns are less transient (e.g., high pressure situations).

To assess the impact of any residual correlation on the return values we followed a heuristic approach suggested by M. Leutbecher (pers comm) where return values from $N$ individual ensemble members are compared with return values from decimated subsets of similar sample size where all members are used, but only every $N$-th forecast. We thus arrive at two distributions of return values drawn from samples of the same size $M$. Splitting the total data set in $N$ parts obviously increases the uncertainty associated with the return value estimates, but we are only interested in comparing the distributions of the two sets of return values. Fig 4 compares quantiles of the 51 member-wise return values (first axis) with the 51 return values from the decimated samples for location P40 in the central North Sea. It is evident that the two distributions are very similar with only slightly higher standard deviation (1.83 m versus 1.67 m) for the member-wise estimates. The average is practically the same (11.30 m).
We conclude that the weak correlations found between ensemble members in the mean have no discernible impact on the expected value or the spread of the return estimates.

Gridded estimates of the 100-year return value H of the significant wave height were made from EPS240 interpolated to a 1.0° grid for the North Atlantic, the Norwegian Sea and the North Sea using both blocked maxima (GEV) and threshold exceedances (GP). Note that this is an extraction procedure and does not reflect the underlying model resolution, which as Fig 1 shows has increased over the archived period. Ice-infested locations, i.e., locations where the modeled ice concentration ever exceeds 30%, have been removed from the analysis.

We start by looking at the differences we can expect from the existing reanalyses and hindcasts available to us. Fig 5 shows the difference between return values estimated from ERA-40 and ERA-I (panel a) and similarly between ERA-40 and NORA10 (panel b) using GP (POT) with a threshold at the 97th percentile. The differences between ERA-40 and ERA-I (panel a) are moderate in the open ocean, ∼[...] P97, but varying this threshold did not affect the return values considerably (results not shown). Caires and Sterl (2005a) and Sterl and Caires (2005) find that ERA-40 when calibrated against buoy data yields return values on the order of 24 m in the northern North Atlantic; higher but much closer to our findings than the uncorrected ERA-40 return values.

Fig 7 presents the difference between the GP 100-year return values found for EPS240 and those found for ERA-I (left panel) and NORA10 (right panel) using GP with a P97 threshold. It is clear that EPS240 predicts considerably higher return values than what is found for ERA-I. The differences approach 5 m in the North Atlantic, and throughout the Norwegian Sea we see differences on the order of 2-3 m. The differences between NORA10 and EPS240 are smaller (panel b), but here we see significant differences throughout the domain that must be considered separately.
First, the influence of ERA-40 on NORA10 is visible along the open boundary in the south-west, so the boundary zone should not be taken into account in this analysis. Second, EPS240 does not properly resolve the Faroes and the Shetland archipelagos. The middle North Sea is biased low, which is probably also a resolution issue. Away from these areas we see that the agreement is generally good, except in the northern part of the Norwegian Sea, where EPS240 is up to 3 m lower than the NORA10 estimates. Aarnes et al. (2012) discuss the large impact of an individual storm event on the NORA10 estimates in this area and we note that the confidence intervals are exceptionally wide here (14-22 m, see Aarnes et al. 2012, Fig 3).

4.1 Bootstrapping confidence intervals

The GEV shape parameter ξ in Eq (2) and its counterpart for GPD in Eq (3) determine the width of confidence intervals (Hosking, 1984; Coles, 2001). The significant wave height from the NORA10 hindcast has been shown by Aarnes et al. (2012) to exhibit a wide range of extreme value shape parameters within the Norwegian Sea and the adjacent seas with correspondingly varied confidence intervals.

We have estimated confidence intervals for EPS240 and ERA-I using a bootstrapping technique similar to that employed by Aarnes et al. (2012). For ERA-I, which represents a traditional time series, we have made 100 random draws with replacement from the POT data (see Fig 7). In the case of EPS240, we have similarly made random draws from the tail of the data set exceeding the 97th percentile (note that this is technically not peaks-over-threshold since the EPS240 data are considered independent).

The upper limits of the confidence intervals found for ERA-I and EPS240 are shown in Fig 8, panels (a) and (b). The differences are pronounced. First, the confidence interval is much tighter for EPS240 (panel d), ranging from less than 1 m in the sheltered parts of the North Sea to approximately 2 m in the open ocean (relative width 5-10% of the return values).
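The bootstrap procedure described above can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure actually used: the method-of-moments GP fit is an assumption (the estimator is not specified in this section), the return-level expression is the standard GP form found in textbooks such as Coles (2001) rather than a transcription of Eq (3), and the threshold, sample sizes and synthetic exponential exceedances are made up.

```python
import math
import random
import statistics

N_PER_YEAR = 1461  # six-hourly values per year (365.25 * 4)


def gp_fit_moments(exceedances):
    """Method-of-moments GP fit (assumed estimator). Returns (scale, shape)."""
    mean = statistics.mean(exceedances)
    var = statistics.variance(exceedances)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # shape
    sigma = 0.5 * mean * (ratio + 1.0)  # scale
    return sigma, xi


def gp_return_level(u, sigma, xi, zeta, n_per_year, years):
    """Level exceeded on average once per `years` years for threshold u."""
    m = years * n_per_year * zeta  # expected number of exceedances in `years`
    if abs(xi) < 1e-9:
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)


def bootstrap_ci(exceedances, u, zeta, years=100, n_boot=100, seed=0):
    """95% interval from n_boot resamples (with replacement) of the tail."""
    rng = random.Random(seed)
    levels = []
    for _ in range(n_boot):
        resample = rng.choices(exceedances, k=len(exceedances))
        sigma, xi = gp_fit_moments(resample)
        levels.append(gp_return_level(u, sigma, xi, zeta, N_PER_YEAR, years))
    levels.sort()
    return levels[int(0.025 * (n_boot - 1))], levels[int(0.975 * (n_boot - 1))]


# Synthetic exceedances over a made-up threshold of 8 m; zeta = 0.03 for P97.
rng = random.Random(1)
small_tail = [rng.expovariate(1.0 / 1.2) for _ in range(200)]
large_tail = [rng.expovariate(1.0 / 1.2) for _ in range(5000)]

lo_small, hi_small = bootstrap_ci(small_tail, u=8.0, zeta=0.03)
lo_large, hi_large = bootstrap_ci(large_tail, u=8.0, zeta=0.03)
```

With the larger synthetic tail the interval is markedly narrower, which mirrors the qualitative difference between a conventional time series and the much larger pooled ensemble data set. As in the text, the interval says nothing about model bias.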
ERA-I (panels a and c) has confidence intervals up to 5 m (relative width 30% of the return values) in the north-east Atlantic. Second, the spatial variability of the confidence intervals is very low for EPS240, while the ERA-I intervals vary substantially throughout the domain due to sensitivity to individual storm events.

It is important to stress that even though the confidence intervals become much tighter with a larger data set, the bootstrapping method does not account for model bias. The bias must be assessed by comparing the observed and modeled wave height distributions, see Sec 3.2. We discuss the impact of model bias further in Sec 5.

Estimating return values from ensembles at advanced lead times is a new technique, and the assumptions underlying the method have been outlined in Sec 3. Here we discuss some of the perceived weaknesses of the method in general and how applicable the method appears to be for significant wave height. The main caveats to be aware of when using the technique on archived EPS forecasts in general are:

1. Spurious trends caused by model upgrades
2. Upper-percentile biases
3. Conversion to an equivalent time period
4. Correlations within the ensemble
5. Return value estimates in a changing climate

Although we do not find evidence of any spurious trend in the mean and the variance of the significant wave height at +240 h (see Fig 3), we are aware that the IFS model updates over the past decade have led to an apparent increase in the 10-m wind speed at analysis time. S. Abdalla (pers comm) quantified this effect at analysis time to be about 29 cm s⁻¹, i.e., the earlier analyses were biased low. The wave height will be somewhat affected by this, but it is thought to have a small effect on the extremes of waves found at advanced lead times, especially since some of the removed bias stems from changes to the data assimilation and will fade as the model integration becomes dynamically balanced at advanced lead times.
The effect is also evident from inspection of Fig 2 where the wave height at analysis time (panel b) is seen to be biased low. Since this bias disappears for EPS240 (panel a), we believe that the model updates have had only a modest impact on the wave climatology at advanced lead times.

We have investigated the robustness of the return value estimates by also looking at EPS228. We select the maximum from each pair of EPS228 and EPS240 since the +228 and +240-h forecasts are strongly correlated (see Fig 9). This combined data set is now assumed equivalent to 2 × 226 years. The combined 100-year GP return value estimates (indicated by blue circles) fall between the 100-year return values from the two data sets (EPS228, green circles, and EPS240, red circles), which is what we expect when going to larger data sets. This suggests that even larger data sets may be built by selecting maxima from longer forecast sequences. However, care must be taken to avoid getting too close to the beginning of the forecast where the ensemble members are correlated.

We have also looked at the possible sources of bias to the extremes from EPS forecasts. Such a bias cannot be estimated from a bootstrap procedure. Instead we have compared the return values of ERA-40, ERA-I and EPS240 against NORA10, which has been shown to represent the upper percentiles well (Reistad et al., 2011; Aarnes et al., 2012). Biases can enter a model data set in two distinct ways. The first is through poor representation of physical processes. For example (as pointed out by Aarnes et al. 2012), ERA-40 applies deep-water physics in areas where this is questionable (such as the southern North Sea). If a bias is due to poor model physics or poor model resolution then the bias should also be different from one region to another. The second way biases can enter a data set is through poor spatial and temporal sampling of the model fields. For example, both ERA-40 and ERA-I are archived with six-hourly resolution and will consequently miss some modeled storm maxima, even if a coarse model as discussed in Sec 2.6 is slowly varying and representative of significant wave height averaged over 6 h.
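The conversion to an equivalent time period, and the merging of the two strongly correlated lead times discussed earlier in this section, can be written out as a small Python sketch. The helper names are hypothetical, the count of pooled values is back-computed from the stated 226 years rather than taken from the paper, and treating each pairwise maximum as representative of 12 h is our reading of the "2 × 226 years" assumption.

```python
HOURS_PER_YEAR = 365.25 * 24.0


def equivalent_years(n_values, hours_per_value=6.0):
    """Length of the period that a pooled set of forecast values is taken to represent."""
    return n_values * hours_per_value / HOURS_PER_YEAR


def pairwise_maximum(lead_a, lead_b):
    """Keep the larger value of each correlated pair of lead times."""
    if len(lead_a) != len(lead_b):
        raise ValueError("both lead times must cover the same forecasts and members")
    return [max(a, b) for a, b in zip(lead_a, lead_b)]


# Back-computed illustration, not a number from the paper: 226 years of
# six-hour intervals corresponds to 226 * 1461 = 330186 pooled values.
n_values = 226 * 1461
single = equivalent_years(n_values)          # one lead time, 6 h per value
combined = equivalent_years(n_values, 12.0)  # pairwise maxima, 12 h per value

eps228 = [3.1, 7.4, 5.0]  # made-up wave heights [m]
eps240 = [2.8, 7.9, 5.0]
merged = pairwise_maximum(eps228, eps240)
```

The merged series has the same number of values as either lead time alone, but each value now stands in for twice the time interval, which is why the equivalent period doubles while the sample size does not.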
The data sets are also typically interpolated in space, leading to further reduction of extremes. This means that the return values from EPS and coarse-resolution reanalyses such as ERA-40 and ERA-I should be interpreted as return values of the six-hourly average sea state, as discussed in Sec 2.6, and will thus generally be lower than those found from a high-resolution hindcast such as NORA10 where the model values represent shorter time intervals. Fig 10 shows how the return values ranging from H1, ..., H100 line up. It is clear that ERA-40 and ERA-I, due to their negative bias, give significantly lower return values than NORA10, especially in the open-ocean conditions in the North Atlantic (panel b). EPS240 on the other hand matches the return values better, and the shorter return periods also line up well here. For Ekofisk (panel a) in the central North Sea the situation is different. Here the shorter return periods match well, but due to the relatively coarse resolution of EPS240, H100 seems to converge to approximately the same value as ERA-40 and ERA-I (11.3 m). NORA10 yields 100-year return values closer to 13 m for this location. It seems likely that the EPS240 return values are biased low in enclosed seas and should consequently be used with some care in such areas. The EPS240 return values are close to NORA10 estimates in the open ocean and we conclude that the bias from spatial and temporal interpolation is of less importance, since otherwise the return values should be depressed everywhere.

Lack of forecast skill at advanced lead times is an important requirement since the ensemble members must be assumed uncorrelated to be considered independent draws from the model climate.
We have shown that the weak correlations in the mean are not present in the tail of the distribution in the case of significant wave height (see Sec 3). However, it seems likely that the method is not equally applicable to the investigation of the extremal behavior of parameters representing large-scale features, e.g. the North Atlantic Oscillation (NAO) index (Hurrell, 1995), or long-term (seasonal, say) averages. Here we do expect the ensemble forecast system to retain skill at advanced lead times, and indeed forecast skill in reproducing large-scale features is the rationale behind seasonal forecast systems (Stockdale et al., 1998, 2011), where the lead time typically goes to six months (Van den Brink et al., 2005). We therefore find it prudent to advise against employing the method on large-scale spatial averages or long-term temporal averages.

It is also clear that the forecasts only differ from the initial conditions by as much as 240-h integrations allow and will still be influenced by the slow components of the earth system, like the Arctic ice cover. This means that for parameters influenced by climate change, or where quasi-cyclical phenomena with long-periodic components such as the El Niño-Southern Oscillation (ENSO) are present, we must be careful when assessing the return values since we must be convinced that the archive covers a sufficiently long period to capture all the stages of the phenomenon. As noted in Sec 1, under such circumstances non-stationary techniques employed on traditional time series and climate projections will be more relevant. If on the other hand return values valid for the present period are sought then ensemble forecasts are superior.

Return values estimated from long lead-time ensemble forecasts have been investigated and found to yield good results. The immediate advantage is clear: a huge data set of forecasts is readily available from the ECMWF archive.
The method yields return values of significant wave height that are comparable to what is found from NORA10 but significantly higher than what is found from ERA-I and ERA-40. This result was not totally unexpected since it is known that ERA-I and especially ERA-40 tend to underestimate the upper percentiles of the wave height distribution. The EPS estimates are probably too low in enclosed seas (see Fig 10). Although we have only investigated the extremes in the North Atlantic, the North Sea and the Norwegian Sea, it appears likely that the extreme value estimates found from ERA-40 and ERA-I are too low globally (as discussed by Caires and Sterl 2005a; Sterl and Caires 2005 in the case of ERA-40). However, the return value estimates from NORA10 (Aarnes et al., 2012) and the present findings suggest that the corrected ERA-40 return estimates reported by Caires and Sterl (2005a) and Sterl and Caires (2005) are too high.

Return value estimation from large ensembles at advanced lead times is a general method which should be applicable to a wide range of atmospheric and oceanographic variables if the conditions discussed in Sec 3 and Sec 5 are met. It is clear that the EPS archive represents an unused resource which complements and perhaps yields more precise return values than traditional reanalyses and hindcasts.

Acknowledgment

This work has been supported by the Research Council of Norway through the project "Wave Ensemble Prediction for Offshore Operations" (WEPO, grant no 200641) and through the European Union FP7 project MyWave (grant no 284455). This study has also been part of a PhD program partially funded by the Norwegian Centre for Offshore Wind Energy (NORCOWE) for OJA. The Norwegian Deepwater Programme (NDP) financed the construction of the NORA10 hindcast archive. The patient advice of Saleh Abdalla, Peter Janssen, Hans Hersbach and Martin Leutbecher is greatly appreciated. We would also like to thank the three anonymous reviewers.
Their constructive comments helped improve the manuscript.

References

Aarnes, O. J., Breivik, Ø., Reistad, M., 2012. Wave Extremes in the Northeast Atlantic. J Climate 25, 1529–1543, doi:10.1175/JCLI-D-11-00132.1, 10/bvbr7k.

Alves, J. H. G., Young, I. R., 2003. On estimating extreme wave heights using combined Geosat, Topex/Poseidon and ERS-1 altimeter data. Appl Ocean Res 25 (4), 167–186, doi:10.1016/j.apor.2004.01.002.

Anderson, D., Stockdale, T., Balmaseda, M., Ferranti, L., Vitart, F., Doblas-Reyes, R., Hagedorn, R., Jung, T., Vidard, A., Palmer, T., 2003. Comparison of the ECMWF seasonal forecast System 1 and 2, including the relative performance for the 1997/98 El Nino. ECMWF Technical Memoranda 404, European Centre for Medium-Range Weather Forecasts.

Battjes, J., 1972. Long-term wave height distributions at seven stations around the British Isles. Deutsch Hydrogr Z 25 (4), 179–189, doi:10.1007/BF02312702.

Bidlot, J., Holmes, D., Wittmann, P., Lalbeharry, R., Chen, H., 2002. Inter-comparison of the performance of operational ocean wave forecasting systems with buoy data. Weather and Forecasting 17 (2), 287–310, doi:10.1175/1520-0434(2002)017 < >

Cooper, C., Forristall, G., 1997. The use of satellite altimeter data to estimate the extreme wave climate. J Atmos Ocean Tech 14 (2), 254–266, doi:10.1175/1520-0426(1997)014 < >

Kharin, V., Zwiers, F., 2005. Estimating extremes in transient climate change simulations. Journal of Climate 18 (8), 1156–1173, doi:10.1175/JCLI3320.1.

Leutbecher, M., Palmer, T., 2008. Ensemble forecasting. Journal of Computational Physics 227 (7), 3515–3539, doi:10.1016/j.jcp.2007.02.014.

Lopatoukhin, L., Rozhkov, V., Ryabinin, V., Swail, V., Boukhanovsky, A., Degtyarev, A., 2000. Estimation of extreme wind wave heights. Tech. Rep. JCOMM Technical Report No 9, World Meteorological Organization.

Molteni, F., Buizza, R., Palmer, T. N., Petroliagis, T., 1996. The ECMWF ensemble prediction system: methodology and validation. Q J R Meteorol Soc 122 (529), 73–119, doi:10.1002/qj.49712252905.

Muir, L. R., El-Shaarawi, A., 1986. On the calculation of extreme wave heights: A review. Ocean Engng 13 (1), 93–118, doi:10.1016/0029-8018(86)90006-5.

Naess, A., Clausen, P., 2001. Combination of the Peaks-over-threshold and bootstrapping methods for extreme value prediction. Structural Safety 23, 315–330.

Parey, S., Malek, F., Laurent, C., Dacunha-Castelle, D., 2007. Trends and climate evolution: Statistical approach for very high temperatures in France. Climatic Change 81, 331–352, doi:10.1007/s10584-006-9116-4.

Prates, F., Buizza, R., 2011. PRET, the Probability of RETurn: a new probabilistic product based on generalized extreme-value theory. Q J R Meteorol Soc 137 (655), 521–537, doi:10.1002/qj.759.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P., 1992. Numerical Recipes in FORTRAN, 2nd edition. Cambridge University Press, Cambridge.

Reistad, M., Breivik, Ø., Haakenstad, H., Aarnes, O. J., Furevik, B. R., Bidlot, J.-R., 2011. A high-resolution hindcast of wind and waves for the North Sea, the Norwegian Sea, and the Barents Sea. J Geophys Res 116, 1–18, C05019, doi:10.1029/2010JC006402, doi:10/fmnr2m, arXiv:1111.0770.

Richardson, D., 2010. Landmark in forecast performance. ECMWF newsletter, European Centre for Medium-Range Weather Forecasts.

Simmons, A., Uppala, S., Dee, D., Kobayashi, S., 2007. ERA-Interim: New ECMWF reanalysis products from 1989 onwards. ECMWF newsletter 110, 25–35.

Smith, R., 1990. Extreme value theory. In: Ledermann, E., Lloyd, E., Vajda, S., Alexander, C. (Eds.), Handbook of Applicable Mathematics. Wiley, pp. 437–471.

Sterl, A., Caires, S., 2005. Climatology, variability and extrema of ocean waves: the Web-based KNMI/ERA-40 wave atlas. International Journal of Climatology 25 (7), 963–977, doi:10.1002/joc.1175.

Stockdale, T., Anderson, D., Alves, J., Balmaseda, M., 1998. Global seasonal rainfall forecasts using a coupled ocean–atmosphere model. Nature 392 (6674), 370–373, doi:10.1038/32861.

Stockdale, T., Anderson, D., Balmaseda, M., Doblas-Reyes, F., Ferranti, L., Mogensen, K., Palmer, T., Molteni, F., Vitart, F., 2011. ECMWF seasonal forecast system 3 and its prediction of sea surface temperature. Climate dynamics, 455–471, doi:10.1007/s00382-010-0947-3.

Uppala, S., Dee, D., Kobayashi, S., Berrisford, P., Simmons, A., 2008. Towards a climate data assimilation system: Status update of ERA-Interim. ECMWF newsletter, European Centre for Medium-Range Weather Forecasts.

Uppala, S., Kållberg, P., Simmons, A., et al., 2005. The ERA-40 Re-analysis. Q J R Meteorol Soc 131, 2961–3012, doi:10.1256/qj.04.176.

Van den Brink, H., Können, G., Opsteegh, J., Van Oldenborgh, G., Burgers, G., 2005. Estimating return periods of extreme events from ECMWF seasonal forecast ensembles. International Journal of Climatology 25 (10), 1345–1354, doi:10.1002/joc.1155.

Vinoth, J., Young, I., 2011. Global Estimates of Extreme Wind Speed and Wave Height. J Climate 24 (6), 1647–1665, doi:10.1175/2010JCLI3680.1.

von Storch, H., Zwiers, F., 1999. Statistical Analysis in Climate Research. Cambridge University Press.

Wang, X., Feng, Y., Swail, V., 2012. North Atlantic wave height trends as reconstructed from the 20th Century Reanalysis. Geophys Res Lett 39, L18705, 6 pp, doi:10.1029/2012GL053381, 2012GL053381.

Wang, X., Swail, V., 2001. Changes of extreme wave heights in Northern Hemisphere oceans and related atmospheric circulation regimes. J Climate 14 (10), 2204–2221, doi:10.1175/1520-0442(2001)014 < >

[Figure 2 plot annotations, location P40: Entries: 5924; Cor: 0.36; SI: 68.03; BIAS: 0.01. Legend: Data, 1:1, Q-Q.]

Figure 2: Panel a: The correlation of all 51 ensemble member forecasts at +240 h with observations of significant wave height at location P40 (Ekofisk, central North Sea) over the whole period 1999-2009. The QQ curve is shown in green. It is clear that the +240 h climate is quite similar to the observed climate in this location and better than the wave height distribution found at analysis time (b). Panel b: Same as panel (a) for analysis time. The EPS is biased low at analysis time. Panel c: The correlation between the +240 h forecasts of two ensemble members (member 0 represents the unperturbed atmospheric integration) over the whole period 1999-2009. The QQ curve is shown in green. The centered anomaly correlation relative to the weekly observed wave climate is 0.20.

[Figure 3 panel titles: (a) Annual bias in mean Hs at 60 positions, EPS: archive−reforecast; (b) Annual bias in STD Hs at 60 positions, EPS: archive−reforecast. Axes: year v annual bias [m].]

Figure 3: Panel a: Time series (1999-2009) of the difference in annual mean significant wave height [m] of EPS forecasts at lead time +240 h and the reforecast at the same lead time from model cycle Cy36r4 (operational in May 2011) at 60 locations in the north-east Atlantic, the North Sea and the Norwegian Sea. Panel b: The difference in annual standard deviation [m] of the significant wave height of EPS240 and the reforecast. Considerable variations are found from one year to another, but no significant drift due to model upgrades is seen.

Figure 4: Comparison of the quantiles of H100 (100-year return values) from 51 ensemble members over the whole period 1999-2009 v the quantiles of H100 estimates from 51 decimated samples of all ensemble members taken every 51 forecasts. This decimation gives 51 estimates of the same sample size as the member-wise estimates. The distributions are very similar with common averages (11.30 m) and standard deviations of 1.83 m and 1.67 m, respectively.

[Figure 5 panel titles: (a) Discrepancy in 100-year Hs based on POT (threshold: 97.0), ERA40−ERAi [m]; (b) Discrepancy in 100-year Hs based on POT (threshold: 97.0), ERA40−NORA10 [m]. Scale: −5 to 5 m.]

Figure 5: Panel a: Difference of H100 of ERA-40 and ERA-I using the GP (POT) method with a threshold of 97%. The differences are ∼ ◦ resolution, and all ice-infested grid points have been excluded. The three buoy locations are marked as P40 (Ekofisk), B16 (K5) and P35 (Heidrun). Panel b: Difference of ERA-40 and NORA10, same threshold as for panel (a). The difference between NORA10 and ERA-40 is consistently ∼

Figure 6: Panel a: Gridded estimates of H100 from EPS240 using GEV with blocked maxima from individual ensemble members. The grid is 1.0°, and all ice-infested grid points have been excluded. Panel b: Same as panel (a) but for GP with a threshold of 97% of the data.

[Figure 7 panel titles: (a) Discrepancy in 100-year Hs based on GP (threshold: 97.0), EPS240−ERAi [m]; (b) Discrepancy in 100-year Hs based on GP (threshold: 97.0), EPS240−NORA10 [m]. Scale: −5 to 5 m.]

Figure 7: Panel a: Difference between H100 estimates of EPS240 and ERA-I using the GP distribution with a threshold of 97% (positive means EPS240 is higher). EPS240 consistently predicts higher return values throughout the domain. The differences are ∼ ◦ , and all ice-infested grid points have been excluded. Panel b: Difference between EPS240 and NORA10, same GP threshold as for panel (a). The difference between NORA10 and EPS240 is generally smaller than what was found in panel (a) for ERA-I, but significant geographical differences exist. Near the south-western boundary NORA10 is influenced by ERA-40, and behind the Faroe and Shetland archipelagos the resolution of EPS240 is too coarse to provide a meaningful comparison.

[Figure 8 panel titles: (a) CI95 (high) 100-year Hs based on POT (threshold: 97.0), ERAi [m]; (b) CI95 (high) 100-year Hs based on GP (threshold: 97.0), EPS240 [m]; (c) CI95 (width) 100-year Hs based on POT (threshold: 97.0), ERAi [m]; (d) CI95 (width) 100-year Hs based on GP (threshold: 97.0), EPS240 [m].]

Figure 8: Panel a: Upper limit of 95% confidence interval for ERA-I H100 based on 100 bootstraps of the POT exceeding the 97th percentile. Panel b: Upper limit of 95% confidence interval for EPS240 H100 based on 100 bootstraps of the data exceeding the 97th percentile. Panel c: Width of 95% confidence interval for ERA-I. The relative width reaches 30% of the return values in parts of the north-east Atlantic. The geographic variability is pronounced, largely due to influence from individual storm events. Panel d: Width of 95% confidence interval for EPS240. The relative width varies from 5% in sheltered areas to 10% in the open ocean. The geographic variability is very low.

Figure 9: Panel a: Probability of non-exceedance for the EPS forecasts in location P40 (Ekofisk, central North Sea). The GPD estimates are shown as curves (with 100-year values marked as circles) and are based on a 97% threshold while the individual forecast values are shown as asterisks. Green and red indicate EPS228 and EPS240, respectively. Blue is a combined estimate for EPS228 and EPS240 where the maximum of each pair of EPS228 and EPS240 forecasts is chosen. This is done because the two forecasts are separated by only 12 h and strongly correlated. The combined data set thus represents the equivalent of 452 years of data since EPS228 and EPS240 each represent the equivalent of 226 years. The combined data set lies below EPS228 and EPS240 on the vertical axis since it has a lower probability of non-exceedance due to being twice the size of EPS228 and EPS240. Panel b: Same as panel (a) but for location B16 in the eastern North Atlantic (note that the upper three values of EPS228 are masked by EPS240 as the values are almost identical). Panel c: Same as panel (a) but for location P35 in the eastern Norwegian Sea (Heidrun). It is evident from all three panels that the combined 100-year estimate is bracketed by the estimates from the two individual data sets, as expected from a larger data set.

[Figure 10 axes: Return value estimates 1 to 100 years, NORA10 (first axis) v return value estimates 1 to 100 years, EPS/ERAi/ERA40 (second axis). Panels: P40, B16, P35. Threshold: 97-percentile. Legend: ERA-40, ERA-interim, EPS +240.]

Figure 10: Panel a. "Comet plot" comparison of H1, ..., H100 return values in the central North Sea (P40, Ekofisk) for NORA10 (first axis) against ERA-40 (red asterisks), ERA-I (blue) and EPS240 (green). All return estimates were made from the GP distribution with a 97% threshold. Panel b. Same as panel (a) but for location B16 in the eastern North Atlantic. Panel c. Same as panel (a) but for location P35 in the eastern Norwegian Sea (Heidrun). ERA-40 and ERA-I estimates are significantly lower than NORA10 in all three locations. EPS240 shows good correspondence in open-ocean conditions over the whole range up to H100