Elemental unbiased estimators for the Generalized Pareto tail
arXiv [math.ST]
Allan McRobie
Cambridge University Engineering Department
Trumpington St, Cambridge, CB2 1PZ, UK
[email protected]
31, 2018
Abstract
Unbiased location- and scale-invariant 'elemental' estimators for the GPD tail parameter are constructed. Each involves three log-spacings. The estimators are unbiased for finite sample sizes, even as small as N = 3. It is shown that the elementals form a complete basis for unbiased location- and scale-invariant estimators constructed from linear combinations of log-spacings. Preliminary numerical evidence is presented which suggests that elemental combinations can be constructed which are consistent estimators of the tail parameter for samples drawn from the pure GPD family.
The Generalized Pareto Distribution (GPD) and the Generalized Extreme Value (GEV) distribution play a central role in extreme value theory. Each has three parameters (µ, σ, ξ) corresponding to location, scale and tail (or shape) respectively. This paper describes a particularly simple set of location- and scale-invariant 'elemental' estimators for the GPD tail parameter. Each 'elemental' involves three log-spacings of the data, and each is unbiased over all tail parameters −∞ < ξ < ∞, and for all sample sizes, as small as N = 3. The elemental estimator is

ξ̂_IJ = log( τ^{J−1} / t^{I} )    (1)

where τ = (X_I − X_{J−1})/(X_I − X_J) and t = (X_{I+1} − X_J)/(X_I − X_J), with J ≥ I + 2, and where the X_I are the upper-order statistics, numbered in decreasing order starting from I = 1 (the largest).

Figure 1: Between any two non-adjacent data points X_I and X_J an elemental estimator ξ̂_IJ can be defined. It involves three log-spacings - the one between X_I and X_J, together with two shorter log-spacings connecting each end-point to the data point immediately inside the other end.

The Generalized Pareto Distribution arises as the limiting distribution of maxima in Peaks-Over-Threshold approaches (see for example Embrechts et al. (1999)). It has distribution function:

F(x) = 1 − (1 + ξ(x − µ)/σ)^{−1/ξ}    (2)

The parameters µ and ξ can take any value on the real line, whilst σ can be any positive value (and when ξ = 0 the exponential distribution is recovered as a limit). For ξ ≥ 0 the support (µ ≤ x) is unbounded at the right, with ξ positive giving long- or heavy-tailed distributions. For ξ negative, the support is bounded both below and above (µ ≤ x ≤ µ − σ/ξ).

Estimators for the tail parameter can be loosely classed into: maximum likelihood (ML); method of moments; Pickands-like and Bayesian. Standard texts such as Embrechts et al. (1999) and Reiss and Thomas (2001) provide detailed background, with Coles (2001) giving the Bayesian perspective.
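As a concrete illustration (a sketch, not from the paper; the function names are ours), the elemental of Eqn (1) can be computed directly from a sample sorted in decreasing order. Location- and scale-invariance is automatic, since τ and t are ratios of differences:

```python
import numpy as np

def elemental(x, I, J):
    """Elemental estimator xi_hat_IJ of Eqn (1).

    x : 1-D sample sorted in DECREASING order (x[0] is the largest),
        so the paper's 1-based order statistic X_I is x[I-1].
    Requires J >= I + 2.
    """
    assert J >= I + 2, "elementals need J >= I + 2"
    XI, XJ = x[I - 1], x[J - 1]
    tau = (XI - x[J - 2]) / (XI - XJ)   # (X_I - X_{J-1}) / (X_I - X_J)
    t = (x[I] - XJ) / (XI - XJ)         # (X_{I+1} - X_J) / (X_I - X_J)
    return np.log(tau ** (J - 1) / t ** I)

def gpd_sample(n, xi, mu=0.0, sigma=1.0, rng=None):
    """Draw n GPD points by inverse transform of Eqn (2): G = 1 - F is uniform."""
    rng = np.random.default_rng() if rng is None else rng
    G = rng.uniform(size=n)
    x = mu + sigma * (G ** (-xi) - 1.0) / xi if xi != 0 else mu - sigma * np.log(G)
    return np.sort(x)[::-1]   # decreasing order
```

For any a > 0 and any b, `elemental(a * x + b, I, J)` returns exactly `elemental(x, I, J)`.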
de Zea Bermudez and Kotz (2010) provide a comprehensive review, such that only a brief survey is presented here. The maximum likelihood approach to the GPD is described in Smith (1987). Although it possesses some desirable properties, the numerical maximization algorithms can experience problems for small sample sizes and for negative tail parameters, as there are occasions when the likelihood function does not possess a local maximum (Castillo and Daoudi (2009)). To avoid such problems, a method of moments approach was proposed by Hosking and Wallis (1987).

The classical tail parameter estimator is that of Hill (1975). However, it is not location invariant and is only valid in the heavy-tailed Fréchet region (ξ positive) of the GEV, although an extension into the Weibull region (ξ negative) was proposed by Dekkers et al. (1989) using the method of moments. Pickands (1975) proposed an estimator based on log-spacings which overcame many of these shortcomings. This estimator is popular in current applications, and a substantial literature exists on its generalization (Drees (1998); Yun (2002), for example), the most general and efficient of which appear to be those of Segers (2005), derived using second order theory of regular variation. These are optimised for estimation of the tail index in the more general case of data drawn from any distribution within the domain of attraction of the particular GPD.

Although the main concern of Extreme Value Theory is the domain of attraction case, this paper restricts attention to distributions within the pure GPD family. The possibility that results derived in this specific setting may be extended to the more general case is left for later consideration. Throughout, there is an emphasis on results that are valid for small sample sizes.

The main result here is the proof in Appendix 1 that each elemental estimator is absolutely unbiased within the GPD family.
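The claimed lack of bias is easy to check by simulation (our own sketch, in the spirit of Figure 2 below; the sampler uses inverse-transform sampling with µ = 0, σ = 1, and the function names are ours):

```python
import numpy as np

def mc_mean(xi, N=7, I=1, J=3, reps=150_000, seed=0):
    """Monte Carlo average of the elemental xi_hat_IJ over GPD samples of size N.

    Uses the expanded form of Eqn (1):
    (J-1) log(X_I - X_{J-1}) - (J-I-1) log(X_I - X_J) - I log(X_{I+1} - X_J).
    """
    rng = np.random.default_rng(seed)
    G = rng.uniform(size=(reps, N))
    X = (G ** (-xi) - 1.0) / xi          # GPD with mu = 0, sigma = 1, xi != 0
    X = -np.sort(-X, axis=1)             # decreasing order along each row
    vals = ((J - 1) * np.log(X[:, I - 1] - X[:, J - 2])
            - (J - I - 1) * np.log(X[:, I - 1] - X[:, J - 1])
            - I * np.log(X[:, I] - X[:, J - 1]))
    return vals.mean()
```

For example, `mc_mean(0.5)` returns approximately 0.5 and `mc_mean(-1.0, I=2, J=5)` approximately −1.0, for a sample size as small as N = 7.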
That the proof is valid, remarkably, for ALL ξ may be appreciated by inspection of Eqn. 7 there. For ξ negative, the expectation of the log-spacing is expressed in terms of the tail probabilities G_i and G_j via a term ⟨log(G_j^γ − G_i^γ)⟩ with γ = −ξ. This trivially decomposes into a simple term γ log G_j and a complicated term log(1 − (G_i/G_j)^γ). The proof shows how the simple terms provide the expectation ⟨ξ̂⟩ = −γ, and how the elementals combine the complicated terms in such a manner that they cancel (obviating the need to evaluate them explicitly). For ξ positive, the absolute lack of bias is maintained by an additional simple term −γ log(G_i G_j), whose weighted expectation adds 2γ to the −γ result for ξ negative. This elegant correspondence between the results for ξ positive and negative is absent in previous approaches to GPD tail estimation. As further demonstration of the absolute lack of bias of each elemental triplet, even at small sample sizes, the numerically-returned average values for each of the fifteen elemental estimators available for N = 7 are shown in Figure 2 over the range −10 ≤ ξ ≤ 10.
[Figure 2 near here: mean value of estimators plotted against tail parameter ξ; panel title 'Performance of elemental estimators, N = 7'.]
Figure 2: Averages over 50,000 samples for the 15 elemental estimators for N = 7, plotted against ξ. The fifteen lines are almost indistinguishable from the diagonal, indicating that each is indeed absolutely unbiased.

Trivially, any unit-weight linear combination of elementals will also be unbiased. Whilst it will be of interest to form efficient combinations, no detailed analysis of efficiency or variance is undertaken here. Instead, the performance of a simple combination is reported.

Linear combinations of elementals are most conveniently described via the upper triangular matrix M of dimension N × N of Table 1 containing all possible log-spacings. The general term is M_IJ = log(X_I − X_J) for J ≥ I + 1 (the N × N form with zero diagonal allows for easier indexing). Each element above the secondary diagonal (J ≥ I + 2) may be uniquely identified with an elemental estimator, involving that log-spacing, that to its left and that below it in M. Corresponding weights of elemental estimator combinations may thus be stored in an N × N upper triangular matrix R. Weighting an elemental ξ̂_IJ by r_IJ requires (from Eqn. 1) that the three weights r_IJ × {J − 1, −(J − I − 1), −I} be given, respectively, to the corresponding left, upper-right and lower log-spacings in M. These weights are illustrated in the grid G shown in Table 2. The totals may then be collected in an N × N matrix A of log-spacing weights. The sum of the entrywise product of A and M is then the unbiased estimate ξ̂.

In summary, the matrix M is the roadmap of all log-spacings and the grid G gives the set of weights to be used within each elemental. A linear combination of elementals is then defined by a unit-sum matrix R, and the corresponding log-spacing weights are collected in the zero-sum matrix A.

An example combination
A natural choice of linear combination might give equal weight to each elemental. However here we give further consideration to a simple "linearly-rising" combination. Numerical experiments indicate that many simple choices of linear combination lead to good estimators, and further research may seek the optimal combination. There is thus nothing special about the "linearly-rising" combination considered here. As will be demonstrated, it has a good all-round performance, but more importantly it illustrates the great simplicity that the elementals permit, allowing the ready creation of unbiased tail estimators with efficiencies comparable to current leading (and often highly complicated) alternatives.

The "linearly-rising" combination has elemental weights r_IJ ∝ N + 1 − J, and the resulting log-spacing weights are a_IJ = (2N − 3J + 2)/(N(N − 1)(N − 2)/6) for J ≥ I + 1.

log(X_1−X_2)  log(X_1−X_3)  log(X_1−X_4)  log(X_1−X_5)  log(X_1−X_6)  log(X_1−X_7)
              log(X_2−X_3)  log(X_2−X_4)  log(X_2−X_5)  log(X_2−X_6)  log(X_2−X_7)
                            log(X_3−X_4)  log(X_3−X_5)  log(X_3−X_6)  log(X_3−X_7)
                                          log(X_4−X_5)  log(X_4−X_6)  log(X_4−X_7)
                                                        log(X_5−X_6)  log(X_5−X_7)
                                                                      log(X_6−X_7)

Table 1: The address matrix M for N = 7. Each elemental involves three adjacent cells in an inverted-L formation. That corresponding to one elemental ξ̂_IJ is shaded for illustration.

 2 -1   3 -2   4 -3   5 -4   6 -5
-1     -1     -1     -1     -1
 3 -1   4 -2   5 -3   6 -4
-2     -2     -2     -2
 4 -1   5 -2   6 -3
-3     -3     -3
 5 -1   6 -2
-4     -4
 6 -1
-5

Table 2: The grid G of exponents of the elemental estimators, for N = 7. Again, the terms involved in one elemental ξ̂_IJ are shaded for illustration.
For example, for N = 7, with entries shown in units of 1/35 (R occupies J ≥ I + 2, A occupies J ≥ I + 1),

35 R =
  .   .   5   4   3   2   1
  .   .   .   4   3   2   1
  .   .   .   .   3   2   1
  .   .   .   .   .   2   1
  .   .   .   .   .   .   1
  .   .   .   .   .   .   .
  .   .   .   .   .   .   .

giving

35 A =
  .  10   7   4   1  -2  -5
  .   .   7   4   1  -2  -5
  .   .   .   4   1  -2  -5
  .   .   .   .   1  -2  -5
  .   .   .   .   .  -2  -5
  .   .   .   .   .   .  -5
  .   .   .   .   .   .   .

Further illustration is given in Figure 3 (centre) for a sample size of 40, showing how the log-spacing weights have a simple linearly-rising distribution with zero mean. For comparison, the unusual pattern of the corresponding log-spacing weights of an unconstrained (i.e. optimised but biased) Segers estimator is also shown. Since the Segers estimator has only a single non-zero weight in each column it immediately follows that it cannot be constructed from the elemental triplets.
Figure 3: The A matrix of log-spacing weights for a typical elemental, the combined elementals and a Segers estimator.

In Figure 4, the errors of the elemental combination are compared with those of the unconstrained Segers estimator for pure GPDs with the somewhat extreme cases of ξ = ±3, and small sample sizes. The elemental combination has comparatively large variance around an unbiased mean, in contrast to the Segers estimator which is more tightly bunched around a biased offset. Despite the complexity, the extensive optimization and the substantial bias in the Segers estimator, its mean square error is nevertheless typically only marginally less than that of the simple elemental combination.

Proposition: if ξ̂ is a linear combination of log-spacings and is an absolutely-unbiased, location- and scale-invariant estimator of the tail parameter of the GPD, then ξ̂ is a linear combination of elementals.

The proof, presented in Appendix 2, shows that the requirement for lack of bias imposes N − 1 constraints on the N(N − 1)/2 log-spacing weights, leaving (N − 1)(N − 2)/2 degrees of freedom, the number of independent elementals.

Efficiency and Optimality

Given that the efficiency of the estimator depends on the actual unknown value of the parameter ξ, there is no unique definition of optimality. The question as to which linear combination is in some sense the 'best' is thus a matter of judgement.

Fig. 5 shows the relative efficiency of various linear combinations of elementals, wherein relative efficiency is defined with respect to the minimal possible variance (given the tail parameter ξ) within the class of location- and scale-invariant unbiased estimators that are linear combinations of log-spacings.

Figure 4: The mean, (mean ± std. dev.), and (actual + rmse) for the elemental combination (solid lines) and the unconstrained Segers (dashed) for 10,000 samples of sizes N = 4 to 20, at ξ = 3 and ξ = −
3. Means are highlighted with + and actual-plus-rmse by ◦. Note the large bias in the Segers estimator (particularly for ξ = −3 and small N).

[Figure 5 near here: 'Relative efficiency of elemental combinations', relative efficiency against ξ for the curves Lin Ris, Segers, A1, B1, C1, D1.]
Figure 5: The relative efficiency of various linear combinations of elementals (for GPD samples of size N = 20 over a range of ξ). Efficiency is defined relative to the numerically-computed minimal variance at given ξ within the class of location- and scale-invariant unbiased estimators that are linear combinations of log-spacings. The linearly-rising combination r_ij ∝ N + 1 − j is seen to be a good compromise, giving high relative efficiency over the whole range −3 ≤ ξ ≤ 3.

For any given ξ and for any sample size N, the unbiased combination giving minimum variance within this class can be estimated numerically by constructing, via repeated samples, the numerical covariance matrix for the set of (N − 1)(N − 2)/2 elementals, and minimising the variance over unit-sum coefficients r_ij. The Lagrange multiplier is then an estimate of the minimum variance. Since the computed coefficients are minimal for that set of samples, it will, for that sample set, perform better than the actual global optimum, and thus provide a lower bound on the minimum variance. The computed coefficients will not be fully optimal for other randomly drawn sample sets, and since the global optimum is, on average, optimal for other sets, then an upper bound on the minimum possible variance (within this class of estimators) can be obtained by applying the numerically-computed optimal coefficients to a large set of samples which were not used in their computation. By this procedure, using two separate blocks of 8000 samples of size N = 20
drawn from GPDs, the (approximate) optimal linear combination within the class was constructed for various ξ.

The optimal elemental coefficients r_ij and corresponding log-spacing coefficients a_ij computed for ξ = 0 are shown in Figure 6. The coefficients are small near the i ≈ j diagonal, rising in amplitude near the top corner i ≈ 1, j ≈ N. At all values of ξ investigated, the optimal coefficients had this characteristic. Moreover all exhibited the decidedly non-smooth character reminiscent of the measure λ in Segers' optimisation procedure (Segers (2005)).

Figure 6: The numerically computed matrix elements r_ij and a_ij that minimise the variance at ξ = 0.

While ξ-specific optimal combinations have thus been computed, these are not optimal in any global sense, as their performance is far from optimal away from the values of ξ at which they were optimised. This can be seen in Fig. 5, where the curve D1 shows the performance of the optimal ξ = 0 combination, with much lower efficiency for ξ values different from zero.

Fig. 5 also shows the performance of some representative examples of other linear combinations. Curve A1 is for equal elemental weights (r_ij = constant), showing good performance in the very heavy-tailed region, but with much lower efficiency for ξ small or negative. Curve B1 has r_ij ∝ 1/(i(i + 1)), performing well in the ξ-negative region, but has low efficiency for ξ positive. Curve C1 (dashed) is for r_ij ∝ (j − i). This emulates the numerically-computed optimals in rising from zero near the diagonal to larger values in the i ≈ 1, j ≈ N corner. Efficiency is good in the region near ξ =
1, but is low elsewhere, especially for ξ ≈ −1. Interestingly, the efficiency stays close to that of the Segers estimator (+) throughout.

The relative efficiency of the Segers estimator is also shown in Fig. 5. The reason it can have a relative efficiency greater than unity (as it does near ξ ≈ 2.5) is that it is not strictly within the class of estimators under consideration, in that it is biased and, moreover, is a nonlinear function of the log-spacings (since the weights are adaptively selected after an initial log-spacing-based estimate). It should also be noted that, despite having the advantages of bias-variance trade-off and access to a larger class of possibilities, its relative efficiency is nevertheless comparatively poor for ξ negative.

Finally, it can be seen from Fig. 5 that the linearly rising combination with r_ij ∝ N − j + 1 performs well over the range of ξ from 0 to 3, which is often of great interest in practice, and although the efficiency falls somewhat for ξ negative, it does not do so by much. For this reason, this combination will be considered further throughout the remainder of the paper.

Consistency - preliminary results

Although there is as yet no proof of consistency for any elemental combination, numerical evidence suggests that the "linearly-rising" combination is consistent for samples drawn from within the GPD family. Fig. 7 shows how the Root Mean Square Error (RMSE) for the "linearly-rising" combination of elementals decreases as the sample size grows. At each point on each graph, the RMSE was determined numerically from 10,000 samples of size N drawn from a GPD at various ξ in the range −3 ≤ ξ ≤
3, with N increasing from 20 to 1000. At each ξ, and for N large, the errors appear to be converging with increasing sample size N at a rate proportional to 1/√N, with the constant of proportionality dependent on ξ.

There is a consistency proof already in existence which has some relevance here, and covers many elemental combinations for the more general domain of attraction case (which trivially includes the pure GPD case). This is Theorem 3.1 of Segers (2005), which guarantees weak consistency of many elemental combinations in the limit N → ∞, k → ∞, k/N → 0. Applied here, it requires the matrix A to have zero weights in the vicinity of the diagonal and of the top row. The linearly-rising combination does not satisfy this condition, although it is clear that minor adjustments can be made to zero the weights in the appropriate vicinities.

Finally, it could be noted that the presence (or lack) of asymptotic consistency results is arguably of limited interest if the emphasis, as here, is on providing estimators which perform well with small or moderately sized samples.
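The experiment above can be reproduced in outline as follows (our own sketch, with far smaller replication counts than the 10,000 used in the paper; the weight bookkeeping follows Table 2 and the r_IJ ∝ N + 1 − J combination, and the function names are ours):

```python
import numpy as np

def lin_rising_A(N):
    """Log-spacing weight matrix A for the linearly-rising combination.

    Elemental weights r_IJ proportional to N + 1 - J (unit total), each
    spread over three cells: +(J-1), -(J-I-1), -I  (1-based I, J)."""
    R = np.zeros((N, N))
    for I in range(1, N - 1):
        for J in range(I + 2, N + 1):
            R[I - 1, J - 1] = N + 1 - J
    R /= R.sum()
    A = np.zeros((N, N))
    for I in range(1, N - 1):
        for J in range(I + 2, N + 1):
            r = R[I - 1, J - 1]
            A[I - 1, J - 2] += (J - 1) * r
            A[I - 1, J - 1] -= (J - I - 1) * r
            A[I, J - 1] -= I * r
    return A

def estimate(x, A):
    """xi_hat = sum_ij A_ij log(X_i - X_j), with x sorted in decreasing order."""
    i, j = np.triu_indices(len(x), k=1)
    return np.sum(A[i, j] * np.log(x[i] - x[j]))

def rmse(xi, N, reps=2000, seed=0):
    """Monte Carlo RMSE of the linearly-rising combination at tail parameter xi."""
    rng = np.random.default_rng(seed)
    A = lin_rising_A(N)
    errs = []
    for _ in range(reps):
        G = rng.uniform(size=N)
        x = (G ** (-xi) - 1.0) / xi if xi != 0 else -np.log(G)
        x = np.sort(x)[::-1]
        errs.append(estimate(x, A) - xi)
    return float(np.sqrt(np.mean(np.square(errs))))
```

For N = 7 this reproduces the A matrix tabulated earlier (entries 10, 7, 4, 1, −2, −5 in units of 1/35), and the RMSE at, say, N = 80 is visibly smaller than at N = 20.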
Figure 7: Dependence of root mean square error RMSE on sample size N (20 ≤ N ≤ 1000), for ξ in the range −3 ≤ ξ ≤ 3. Positive and negative ξ are indicated with circles and crosses respectively. Horizontal axes are scaled to bring infinite N to unity, by plotting 1 − √(2/N). This numerical evidence suggests that, for large N, the RMSE decreases in proportion to 1/√N, with the constant of proportionality depending on ξ.

'Elemental' absolutely-unbiased, location- and scale-invariant estimators for the tail parameter ξ of the GPD have been presented, valid for all ξ and all N ≥
3. The elemental estimators were shown to forma complete basis for those unbiased, location and scale-invariant estimators of the GPD tail parameterwhich are constructed from linear combinations of log-spacings. Numerical evidence was presented which8upports consistency of at least one elemental combination for samples drawn from the pure GPD family.
References
Castillo, J., Daoudi, J., 2009. Estimation of the generalized Pareto distribution. Statistics and Probability Letters 79, 684–688.

Coles, S., 2001. An Introduction to Statistical Modelling of Extreme Values. Springer, London.

de Zea Bermudez, P., Kotz, S., 2010. Parameter estimation for the generalized Pareto distribution - parts I and II. J. Stat. Planning and Inference 140 (6), 1353–1388.

Dekkers, A. L. M., Einmahl, J. H. J., de Haan, L., 1989. A moment estimator for the index of an extreme value distribution. Annals of Statistics 17, 1833–1855.

Drees, H., 1998. A general class of estimators of the extreme value index. J. Stat. Planning and Inference 66, 95–112.

Embrechts, P., Klüppelberg, C., Mikosch, T., 1999. Modelling Extreme Events for Insurance and Finance. Springer, Berlin.

Hill, B. M., 1975. A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163–1174.

Hosking, J. R. M., Wallis, J. R., 1987. Parameter and quantile estimation for the Generalized Pareto Distribution. Technometrics 29 (3), 339–349.

Pickands, J., 1975. Statistical inference using extreme order statistics. Annals of Statistics 3, 119–131.

Reiss, R.-D., Thomas, M., 2001. Statistical Analysis of Extreme Values: with applications to insurance, finance, hydrology and other fields. Birkhäuser, Basel.

Segers, J., 2005. Generalized Pickands estimators for the extreme value index. J. Stat. Planning and Inference 128 (2), 381–396.

Smith, R. L., 1987. Estimating tails of probability distributions. Annals of Statistics 15 (3), 1174–1207.

Yun, S., 2002. On a generalized Pickands estimator of the extreme value index. J. Stat. Planning and Inference 102, 389–409.

Appendix 1: Proof that log(τ^{J−1}/t^{I}) is unbiased for the GPD

Consider a random variable x with distribution function F(x) and complement G(x) = 1 − F(x), and via the probability integral transform, construct the inverse function x = u(G) which maps an exceedance probability G to a point x in the data space.
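As a small numerical illustration (ours, not the paper's) of evaluating expectations through G rather than x: the GPD mean µ + σ/(1 − ξ) (valid for ξ < 1) is recovered by integrating u(G) over the unit interval, exactly as in Eqn (3) below:

```python
import numpy as np

mu, sigma, xi = 2.0, 1.5, -0.5          # a bounded-support example (xi < 0)

def u(G):
    """Inverse of the GPD distribution function: x = u(G), Eqn (6)."""
    return mu + (sigma / xi) * (G ** (-xi) - 1.0)

# <x> = integral_0^1 u(G) dG, approximated by the midpoint rule
G_mid = (np.arange(100_000) + 0.5) / 100_000
mean_via_G = np.mean(u(G_mid))

print(mean_via_G, mu + sigma / (1.0 - xi))   # both approximately 3.0
```

The one-dimensional integral over G replaces an integral of x p(x) over the (here bounded) support [µ, µ − σ/ξ].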
Since dG = −(dF/dx) dx = −p(x) dx, where p(x) is the density, the expected value of any function h(x) may be evaluated by transforming integrals over x to integrals over G, using:

⟨h(x)⟩ = ∫ h(x) p(x) dx = ∫₀¹ h(u(G)) dG    (3)

A consequence of the well-known uniform density of the G's is that the second integral (over G) can be considerably simpler than the first integral (over x). For the GPD, integrals via p(x) lead to lengthy expressions involving hypergeometric (Lauricella) functions. Although a proof that the elemental estimators are unbiased has been constructed by that route, the approach via the transformed integrals over G is considerably simpler, and is presented here.

Consider an ordered sample X of n data points drawn from the distribution F, ordered such that X_n ≤ X_{n−1} ≤ … ≤ X_2 ≤ X_1. The expected value of any function h(X) is

⟨h(X)⟩ = n! ∫₀¹ dG_n … ∫₀^{G_2} dG_1 h(u(G))    (4)

the integral being over the n-dimensional unit simplex containing all possible G. The GPD has distribution function

F(x) = 1 − (1 + ξ(x − µ)/σ)^{−1/ξ}    (5)

such that

x = u(G) = µ + (σ/ξ)(G^{−ξ} − 1)    (6)

Depending whether ξ is positive or negative, the expected value of the log-spacing between the i-th and j-th order statistics is

⟨log(X_i − X_j)⟩ = ⟨log(G_j^γ − G_i^γ)⟩ − γ⟨log(G_i G_j)⟩ + log(σ/γ)    for ξ = γ, γ > 0
⟨log(X_i − X_j)⟩ = ⟨log(G_j^γ − G_i^γ)⟩ + log(σ/γ)                      for ξ = −γ, γ > 0    (7)

Now consider ξ̂(X) = Σ_{i,j} a_ij log(X_i − X_j), a linear combination of log-spacings. For ξ̂ to be scale invariant, the weights a_ij must sum to zero to remove the σ dependence in Eqn. (7). Moreover, for ξ̂ to be unbiased for both positive and negative ξ, Eqn. (7) requires

Σ_{i,j} a_ij ⟨log(G_j^γ − G_i^γ)⟩ = −γ    and    −Σ_{i,j} a_ij γ⟨log(G_i G_j)⟩ = 2γ    (8)

To determine the expected value of any function h(X_i, X_j), all other G variables can be integrated out, leaving

⟨h(X_i, X_j)⟩ = C_ij ∫₀¹ dG_j ∫₀^{G_j} dG_i G_i^{i−1} (1 − G_j)^{n−j} (G_j − G_i)^{j−i−1} h(u(G_i), u(G_j))    (9)

where C_ij = n!/((i − 1)!(n − j)!(j − i − 1)!). In particular,

Σ a_ij ⟨log G_j⟩ = Σ a_ij C_ij ∫₀¹ G_j^{j−1} (1 − G_j)^{n−j} log G_j dG_j · ∫₀¹ φ^{i−1} (1 − φ)^{j−i−1} dφ    (10)

where φ = G_i/G_j. The φ integral leads immediately to the beta function B(i, j − i). For the G_j integral, standard Mellin transform theory gives

∫₀¹ G_j^{j−1} (1 − G_j)^{n−j} log G_j dG_j = [ (d/ds) ∫₀¹ G_j^{s−1+j−1} (1 − G_j)^{n−j} dG_j ]_{s=1} = B(j, n − j +
1)(ψ(j) − ψ(n + 1))    (11)

where ψ is the digamma function, the derivative of the logarithm of the gamma function. Both beta functions have integer arguments and may be expressed as ratios of factorials. These cancel with the leading factorial terms C_ij, such that the expected value of the weighted sum is

Σ a_ij ⟨log G_j⟩ = Σ a_ij (ψ(j) − ψ(n + 1)) = Σ a_ij ψ(j)    (12)

the last step following from Σ a_ij = 0. It follows similarly that

Σ a_ij ⟨log G_i⟩ = Σ a_ij (ψ(i) − ψ(n + 1)) = Σ a_ij ψ(i)    (13)

For an individual elemental ξ̂_IJ the weights a_ij and the relevant values of i and j are given in Table 3.

Table 3: The indices of an elemental estimator, and, in the final column, the weights.

term                  i      j      a_ij
log(X_I − X_{J−1})    I      J−1    J − 1
log(X_I − X_J)        I      J      −(J − I − 1)
log(X_{I+1} − X_J)    I+1    J      −I

Using the relation ψ(1 + x) = ψ(x) + 1/x we obtain the contributions from the individual elemental ξ̂_IJ to be

Σ a_ij ψ(j) = (J − 1)ψ(J − 1) − (J − I − 1)ψ(J) − Iψ(J) = (J − 1)(ψ(J − 1) − ψ(J)) = −1    (14)

Σ a_ij ψ(i) = (J − 1)ψ(I) − (J − I − 1)ψ(I) − Iψ(I + 1) = I(ψ(I) − ψ(I + 1)) = −1    (15)

Thus, via Eqns (12) and (13), each elemental ξ̂_IJ (and hence any unit-weight combination of elementals) provides an unbiased estimate for ξ via the ⟨log G_i⟩ and ⟨log G_j⟩ terms. Explicitly, with γ = |ξ|, we have

⟨ξ̂⟩ = γ Σ a_ij ⟨log G_j⟩ + Σ a_ij ⟨log(1 − φ^γ)⟩     for ξ < 0
⟨ξ̂⟩ = −γ Σ a_ij ⟨log G_i⟩ + Σ a_ij ⟨log(1 − φ^γ)⟩    for ξ > 0    (16)

    = ξ + Σ a_ij ⟨log(1 − φ^γ)⟩    (17)

It only remains to prove that the second term is zero. This term involves somewhat more complicated integrals leading to sums of digamma functions with non-integer arguments dependent on γ. Explicit evaluation of these can, however, be avoided here by observing how the elemental terms combine and cancel. From Eqn. (9), we obtain

⟨log(1 − φ_ij^γ)⟩ = C_ij D_j B_ij    (18)

where

D_j = ∫₀¹ y^{j−1}(1 − y)^{n−j} dy = (j − 1)!(n − j)!/n!    and    B_ij = ∫₀¹ φ^{i−1}(1 − φ)^{j−i−1} log(1 − φ^γ) dφ

Summing over a single elemental using the weights a_ij given in Table 3, and collecting the leading a_ij C_ij D_j terms gives

Σ a_ij ⟨log(1 − φ^γ)⟩ = [(J − 1)!/((I − 1)!(J − I − 2)!)] (B_{I,J−1} − B_{I,J} − B_{I+1,J})    (19)

Now

B_{I,J−1} − B_{I,J} − B_{I+1,J} = ∫₀¹ { φ^{I−1}(1 − φ)^{J−I−2} − φ^{I−1}(1 − φ)^{J−I−1} − φ^{I}(1 − φ)^{J−I−2} } log(1 − φ^γ) dφ    (20)

and the bracketed integrand is φ^{I−1}(1 − φ)^{J−I−2} [1 − (1 − φ) − φ] =
0, thus

Σ a_ij ⟨log(1 − φ^γ)⟩ = 0    (21)

for all ξ ≠ 0. The proof for ξ = 0 uses the exponential distribution, G(x) = exp(−(x − µ)/σ), whence, using Eqn. (9), we obtain

Σ a_ij ⟨log(X_i − X_j)⟩ = Σ a_ij ⟨log σ⟩ + Σ a_ij ⟨log[−log φ_ij]⟩    (22)

The first term is trivially zero due to the zero sum of the elemental a_ij, and the second term is zero for the same reason that Σ a_ij ⟨log(1 − φ^γ)⟩ is zero, as shown above, in that the elementals combine terms in such a way as to eliminate the expectation of any function of the φ_ij. This completes the proof that for any elemental ξ̂_IJ, the expectation ⟨ξ̂_IJ⟩ = ξ for all ξ.

Appendix 2: Completeness

Here we prove that for an estimator ξ̂ of the tail parameter ξ of the GPD, with the preconditions that ξ̂ is i) a linear combination of log-spacings, ii) absolutely-unbiased for all ξ and iii) location- and scale-invariant, then ξ̂ may be expressed as a linear combination of the elementals. That is, the elementals form a complete basis for the set of invariant unbiased log-spacing estimators of the GPD tail parameter.

Precondition ii requires that ξ̂ must be unbiased at both ξ = γ and ξ = −γ for any γ >
0. This symmetry is embodied in Eqn. (7), addition and subtraction of which (together with precondition iii and elementary integrations such as Eqns 10-12) leads to the requirements on the log-spacing weights a_ij that

γ Σ a_ij (⟨log G_j⟩ + ⟨log G_i⟩) = γ Σ a_ij [ψ(j) + ψ(i)] = −2γ    (23)

γ Σ a_ij (⟨log G_j⟩ − ⟨log G_i⟩) = γ Σ a_ij [ψ(j) − ψ(i)] = −2 Σ a_ij ⟨log(1 − φ_ij^γ)⟩    (24)

where ψ is the digamma function and Σ means sum over i from 1 to N − 1 and j from i + 1 to N.

We now prove that the preconditions imply that the right-hand side of Eqn (24) is zero. Eqn (24) may be written as

cγ = Σ a_ij ⟨log(1 − φ_ij^γ)⟩ =: I    where    c = −(1/2) Σ a_ij [ψ(j) − ψ(i)]    (25)

Similar to Eqn (10), each term in the I summation may be considered individually and all variables irrelevant to that term may be integrated out to give

I = Σ_ij a_ij C_ij ∫₀¹ G_j^{j−1}(1 − G_j)^{n−j} dG_j · ∫₀¹ φ_ij^{i−1}(1 − φ_ij)^{j−i−1} log(1 − φ_ij^γ) dφ_ij    (26)

The G_j integral gives a beta function B(j, n − j + 1) which combines with the C_ij term to give a factor 1/B(i, j − i). Since each expectation integral over φ_ij is definite, we can set each φ_ij = φ. Each product φ^{i−1}(1 − φ)^{j−i−1} involves integer exponents and can thus be expressed as a polynomial of degree j −
2. Passing the summation through the integral sign, the various polynomials can be collected into a single polynomial p_{n−2}(φ) = Σ_{k=0}^{n−2} b_k φ^k of degree n − 2. Passing the summation back out of the integral gives:

I = Σ_{k=0}^{n−2} b_k g_k(γ)    where    g_k(γ) := ∫₀¹ φ^k log(1 − φ^γ) dφ    (27)

The binomial expansion of each (1 − φ)^{j−i−1} factor in Eqn. 26 leads to the total polynomial

p_{n−2}(φ) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} [(j − 1)!/((i − 1)!(j − i − 1)!)] a_ij Σ_{q=0}^{j−i−1} [(−1)^q (j − i − 1)!/(q!(j − i − 1 − q)!)] φ^{(i−1)+q}    (28)

Collecting together equal powers of φ gives

p_{n−2}(φ) = Σ_{k=0}^{n−2} φ^k Σ_{i=1}^{k+1} Σ_{j=k+2}^{n} (j − 1)! a_ij (−1)^{k−i+1} / ((i − 1)!(k − i + 1)!(j − k − 2)!)    (29)

such that the coefficients b_k are

b_k = Σ_{i=1}^{k+1} Σ_{j=k+2}^{n} (j − 1)! a_ij (−1)^{k−i+1} / ((i − 1)!(k − i + 1)!(j − k − 2)!) = (k + 1) Σ_{j=k+2}^{n} C(j − 1, k + 1) Σ_{i=1}^{k+1} C(k, i − 1) (−1)^{k−i+1} a_ij    (30)

where C(·,·) denotes the binomial coefficient. Having determined the polynomial coefficients, we consider the integrals g_k(γ) of the various φ^k terms, as defined in Equation (27). The transformation φ^γ = (1 − ρ) leads to

g_k(γ) = (1/γ) ∫₀¹ (1 − ρ)^{(k+1)/γ − 1} log ρ dρ = (1/γ) [ (d/ds) ∫₀¹ ρ^{s−1} (1 − ρ)^{(k+1)/γ − 1} dρ ]_{s=1}
       = (1/γ) B(1, (k + 1)/γ) [ψ(1) − ψ(1 + (k + 1)/γ)] = [1/(k + 1)] [ψ(1) − ψ(1 + (k + 1)/γ)]    (31)

We thus have

I = cγ = Σ_{k=0}^{n−2} b_k g_k(γ) = Σ_{k=0}^{n−2} [b_k/(k + 1)] [ψ(1) − ψ(1 + (k + 1)/γ)]    (32)

If this is true at all γ >
0, then it must be true for γ large (γ = 1/ε with ε > 0 small). There

g_k(1/ε) = [1/(k + 1)] [ψ(1) − ψ(1 + (k + 1)ε)]

Since the digamma function is well-behaved (i.e. infinitely differentiable) near ψ(1), we may take the Taylor series ψ(1 + δ) = ψ(1) + ψ′(1)δ + O(δ²) to obtain

g_k(1/ε) = −ψ′(1)ε + O(ε²)    (33)

thus

I = cγ = c/ε = Σ_{k=0}^{n−2} b_k g_k(1/ε) = −εψ′(1) Σ_{k=0}^{n−2} b_k + O(ε²)    (34)

whence

c = −ε²ψ′(1) Σ_{k=0}^{n−2} b_k + O(ε³)    (35)

Since c is a constant with respect to γ and since (nonzero) ε = 1/γ can be arbitrarily small, we thus infer from (35) that the preconditions imply that c = 0. Then I = cγ = 0, and from I = Σ_{k=0}^{n−2} b_k g_k(γ) the independence of the n − 1 functions g_k(γ) implies that

b_k = 0,    ∀ k ∈ {0, 1, …, n − 2}    (36)

These are the N − 1 constraints. In the b_k summation, the terms associated with each elemental form the crosshair pattern of the grid G. Each b_k = 0 constraint involves a rectangle of terms in the A matrix, as illustrated in Table 4.

Consider an element a_IJ away from the rectangle boundaries. Consider the b_k summation terms associated with the elemental ξ̂_IJ, which thus involves the term a_IJ, the term a_{I,J−1} to its left and the term a_{I+1,J} below it. The b_k summation weights given to each of these terms can be obtained from Eqn 30 as

b_{k,(I,J)} = (J − 1)!/((I − 1)!(k − I + 1)!(J − k − 2)!)    (37)

b_{k,(I,J−1)} = (J − 2)!/((I − 1)!(k − I + 1)!(J − k − 3)!)    (38)

b_{k,(I+1,J)} = (J − 1)!/(I!(k − I)!(J − k − 2)!)    (39)

Table 4: Each constraint b_k = 0 involves a rectangle of terms in the A matrix. For N =
7, the rectangle of terms for one particular k is shaded. The elemental ξ̂_IJ contributes in proportions −(J − I − 1), (J − 1) and −I to a_IJ, a_{I,J−1} and a_{I+1,J} respectively. The weights in the b_k summation are thus such as to eliminate the contribution from the elemental ξ̂_IJ, since it readily follows from the above that

−(J − I − 1) b_{k,(I,J)} + (J − 1) b_{k,(I,J−1)} + (−I) b_{k,(I+1,J)}    (40)

    ∝ −(J − I − 1) + (k − I + 1) + (J − k − 2) = 0    (41)

(The treatment of terms at the boundaries of the b_k rectangle is similar.) Since any elemental satisfies the constraints then so does any linear combination thereof, and since there are (N − 1)(N − 2)/2 independent elementals, matching the (N − 1)(N − 2)/2 degrees of freedom remaining after the N − 1 constraints, the elementals span the full solution space. This completes the proof.
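The constraint structure can be verified numerically (a sketch; Eqn (30) is implemented directly in its first form, and the helper names are ours). Any linear combination of elementals returns b_k = 0 for every k, while a lone log-spacing weight does not:

```python
import numpy as np
from math import factorial

def bk_coeffs(a, n):
    """Constraint values b_k of Eqn (30), for a log-spacing weight matrix a
    (paper's 1-based indices stored 0-based: a[i-1, j-1] for j >= i+1)."""
    b = []
    for k in range(n - 1):                       # k = 0 .. n-2
        s = 0.0
        for j in range(k + 2, n + 1):
            for i in range(1, k + 2):
                s += (factorial(j - 1) * (-1) ** (k - i + 1)
                      / (factorial(i - 1) * factorial(k - i + 1)
                         * factorial(j - k - 2))) * a[i - 1, j - 1]
        b.append(s)
    return np.array(b)

n = 7
rng = np.random.default_rng(0)
A = np.zeros((n, n))
for I in range(1, n - 1):                        # random elemental combination,
    for J in range(I + 2, n + 1):                # spread per Table 3 weights
        r = rng.uniform()
        A[I - 1, J - 2] += (J - 1) * r
        A[I - 1, J - 1] -= (J - I - 1) * r
        A[I, J - 1] -= I * r

print(np.max(np.abs(bk_coeffs(A, n))))           # numerically zero
```

By contrast, a weight matrix consisting of a single log-spacing (e.g. a_{12} = 1) gives b_0 = 1, so it cannot be a combination of elementals, in line with the completeness argument.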