Characterization of the tail of the distribution of earthquake magnitudes by combining the GEV and GPD descriptions of Extreme Value Theory
1 1
Characterization of the Tail of the Distribution of Earthquake Magnitudes by combining the GEV and GPD descriptions of Extreme Value Theory
V. F. Pisarenko , A. Sornette , D. Sornette and M.V. Rodkin International Institute of Earthquake Prediction Theory and Mathematical Geophysics Russian Ac. Sci., Profsoyuznaya 84/32, Moscow 117997, Russia ETH Zurich, Swiss Seismological Service HPP P6.1, Hönggerberg, CH-8093 Zürich, Switzerland ETH Zurich, D-MTEC, Kreuzplatz 5 CH-8032 Zurich, Switzerland University of California, Los Angeles, California 90095 Geophysical Center of Russian Ac. Sci. Molodezhnaya 3, Moscow 119206, Russia
Abstract:
The present work is a continuation and improvement of the method suggested in [Pisarenko et al. 2008] for the statistical estimation of the tail of the distribution of earthquake sizes. The chief innovation is to combine the two main limit theorems of Extreme Value Theory (
EVT ) that allow us to derive the distribution of T -maxima (maximum magnitude occurring in sequential time intervals of duration T ) for arbitrary T . This distribution enables one to derive any desired statistical characteristic of the future T -maximum. We propose a method for the estimation of the unknown parameters involved in the two limit theorems corresponding to the Generalized Extreme Value distribution ( GEV ) and to the Generalized Pareto Distribution (
GPD ). We establish the direct relations between the parameters of these distributions, which permit to evaluate the distribution of the T -maxima for arbitrary T . The duality between the GEV and
GPD provides a new way to check the consistency of the estimation of the tail characteristics of the distribution of earthquake magnitudes for earthquake occurring over arbitrary time interval. We develop several procedures and check points to decrease the scatter of the estimates and to verify their consistency. We test our full procedure on the global Harvard catalog (1977-2006) and on the Fennoscandia catalog (1900-2005). For the global catalog, we obtain the following estimates: max ˆ M = 9.53 ± )97.0(ˆ Q =9.21 ± For Fennoscandia, we obtain max ˆ M = ± )97.0(ˆ Q = ± The estimates of all related parameters for the
GEV and
GPD , including the most important form parameter, are also provided. We demonstrate again the absence of robustness of the generally accepted parameter characterizing the tail of the magnitude-frequency law, the maximum possible magnitude M max , and study the more stable parameter Q T (q) , defined as the q -quantile of the distribution of T -maxima on future interval of duration T . 2 2 The magnitude-frequency Gutenberg-Richter (
G-R ) law is the most documented and robust statistical law of seismology. It has been subjected to numerous investigations (see, e.g. Bird and Kagan, 2004; Cosentino et al., 1977; Kagan, 1991; 1996; 1999; 2002a; 2002b; Kijko and Sellevol 1989, 1992; Knopoff et al., 1982; Main et al., 1999; Ogata and Katsura, 1993; Pisarenko and Sornette, 2003; 2004; Sornette et al., 1996; Utsu, 1999; Wu, 2000; Pisarenko and Rodkin, 2007). For small and moderate magnitudes, and for large space-time volumes, the Gutenberg-Richter law is valid with a large degree of accuracy. However, for the largest magnitudes, some more or less significant deviations of the distribution of earthquake magnitudes from the
G-R law have been documented (see e.g. Pisarenko and Sornette (2004) and references therein). The intrinsic difficulty of the investigation of the largest magnitudes is the insufficient number of large earthquakes. Inevitably, the numerous proposals for novel models of the deviations of the tail of the distribution of earthquake magnitudes from the
G-R suffer from large statistical uncertainty. As a consequence, the problem of finding an adequate description of the tail of the magnitude distribution cannot be considered as definitely settled. One of the best-known modifications of the
G-R (Kagan, 1997; Kagan and Schoenberg, 2001; Bird and Kagan 2004) consists in multiplying the power law distribution of seismic moments (which corresponds to the
G-R exponential distribution of magnitudes) by an exponential factor (also referred to as an exponential taper), which leads to a Gamma-distribution for seismic moments. The characteristic index in the exponential taper is often referred to as the “corner” moment, as it constitutes the typical magnitude beyond which the distribution departs significantly downward from the pure
G-R law. Rather than introducing a “soft” truncation of the
G-R law, a different class of models assume that the
G-R law holds up to a maximum magnitude M max , beyond which no earthquake are observed (Cosentino et al., 1977; Dargahi-Noubary, 1983; Main et al., 1999; Pisarenko, 1991; Pisarenko et al., 1996)
0; x < m ; F(x) = [ 10 - β m - 10 - β x ] / [ 10 - β m - 10 - β Mmax ]; m ≤ x ≤ M max ; 1; x > M max . The parameter M max represents the maximum possible earthquake size in the region under study . This parameter plays a very important role in seismic risk assessment and in seismic hazard mitigation (see e.g. Bender and Perkins, 1993; Pisarenko et al., 1996; Kijko and Graham, 1998; Kijko et al., 2001). It should be noted that the truncated
G-R ensures the finiteness of the mean seismic energy, whereas the
G-R in its unlimited form corresponds to a regime with infinite seismic energy, which is, of course, an undesirable property of the model. The parameter M max is convenient for engineers and insurers: having a reliable estimate of M max , it is comparatively easy to take adequate decisions on the construction standards of buildings or on insurance policy. As a consequence, the truncated G-R has undergone a wide dissemination. Unfortunately, all attempts so far for a reliable statistical estimation of M max in various seismic regions did not give satisfactory results due to large statistical scatter and lack of reliability of its estimates. Attempts to attribute a maximum magnitude to individual faults rather than to regions suffer from the same problems and in addition face the fact that many large earthquakes involve several faults or fault segments which are difficult if not impossible to determine in advance (Black et al., 2004; Ward, 1997). In order to avoid these undesirable properties of the parameter M max we have suggested in [Pisarenko et al.2008] to use the more stable parameter Q T (q) – the q -quantile of the distribution of T -maximum in intervals of duration T . In the present paper, we refine this approach and 3 3 establish important relations between the two main limit theorems of Extreme Value Theory ( EVT) , allowing us to characterize the distribution of future T -maximum for any desired T . The paper is structured as follows. Section 2 presents the two main theorems of EVT and 3 Corollaries that we shall use in our estimation methods (the Corollaries are proved in the Appendix). Then, two approaches for the estimation of the parameters corresponding to the
GPD and
GEV distributions are defined and tested on synthetic catalogs. We explain in details the bootstrap methods and the statistical methods developed to (1) decrease the scatter of the estimated parameters and (2) quantify the remaining level of uncertainty. Section 3 presents the application of our method to the global worldwide Harvard catalog. Section 4 applies our method to the local catalog of Fennocsandia. Section 5 concludes.
2. The method of combined use of the two main theorems of Extreme Value Theory. 2.1 Statement of the two main theorems of extreme value theory
Extreme value theory (
EVT ) studies the limiting distribution of the maxima of iid rv (identically independently distributed random variables) as the sample size n tends to infinity. Two main limit theorems constitute the statistical basis for applications of EVT : (i) the main limit theorem of
EVT (proved by Frechet (1927) for Pareto-type limit distribution and by Fisher and Tippet (1928) for Weibull and Gumbel limit distributions) leading to the Generalized Extreme Value distribution (
GEV ) and (ii) the theorem by Gnedenko-Pickands-Balkema-Haan leading to the Generalized Pareto Distribution (
GPD ). We now state these two theorems in turn.
Frechet(1927)-Fisher-Tippett(1928) theorem (FFT), (see e.g. [Embrechts et al. 1997])
Let x ,… x n ,… be iid rv with a continuous distribution function (DF) F(x); Let M n = max(x ,… x n ) be the maximum of a sample of n such rv. If there exist two series of numbers a n , b n , such that (M n - a n )/ b n weakly converges to a non-degenerate rv with DF Φ (x), then up to a shift and to a scale change, Φ (x) can only take one of the three forms: Φ (x| ξ ) = exp(-1/x ξ ), x > 0; ξ > 0; (Pareto type); Φ (x| ξ ) = exp(-|x| -1/ ξ ), x < 0; ξ < 0; Φ (x)=1, x ≥
0; (Weibull type); Φ (x) = exp(-exp(-x)), |x| < ∞ ; (Gumbel type). These three distributions can be written in a unified form, using the notations µ and σ to refer to the centering and scale parameters: Φ (x| µ , σ , ξ ) = exp{ -[1+ ξ (x- µ )/ σ ] -1/ ξ }, 1+ ξ (x- µ )/ σ > 0; σ > 0; 0 <| ξ | < ∞ ; | µ | < ∞ . (1) Φ (x| µ , σ ,0 ) = exp{ -exp[-(x- µ )/ σ ] }; (corresponding to ξ = 0) The form parameter ξ varies from minus infinity to plus infinity. Its sign determines the domain of definition of the rv: ξ > 0; x ≥ µ – σ / ξ ; Pareto; ξ < 0; x ≤ µ – σ / ξ ; Weibull; ξ = 0; |x| < ∞ ; Gumbel. The distribution (1) is called the
Generalized Extreme Value ( GEV ) distribution. 4 4
In order to formulate the second theorem, we introduce the right limit point x F of the distribution F(x) (which is infinity for unbounded rv) and the excess function F H (x) (defined as the conditional distribution over the threshold H ): x F = sup { x: F(x) < 1 } , (2) F H (x) = P { X – H < x X > H } , x ≥ (3) Gnedenko-Pickands-Balkema-Haan theorem (GPBH), (see e.g. [Bassi et al. 1998])
Let x , x ,…x n be iid rv with the continuous distribution function F(x) and excess function F H (x). We denote M n = max(x , x ,…x n ) the maximum of a sample of n such rv.. Suppose that there exist some normalizing constants c n , d n , such that the normalized maximum (M n - d n )/ c n weakly converges to a non-degenerate rv. Then, there exists a non-negative function s(H) such that lim sup | F H (x) –G(x | γ , s(H)) | = 0, (4) H ↑ xF ≤ x ≤ xF- H where G(x | γ , s) is the Generalized Pareto Distribution (GPD): G(x | γ , s) = 1 – (1 + γ x/s) - 1 / γ , 0 <| γ | < + ∞ ; s > 0 , (5) G(x |0, s) = 1 – exp(-x/s); (corresponding to γ = 0). The domain of definition of G is determined by the sign of γ : x ≥
0 for γ ≥ ≤ x ≤ - s / γ for γ < 0. (6) This second theorem implies that the limit of any excess function F H (x) , which satisfies the conditions of this theorem, is an universal distribution – the Generalized Pareto Distribution (GPD). These two limit theorems show that information on the distribution of extremes can be gathered in two ways: (1) by measuring the maximum of a sample whose size n goes to infinity, or (2) by recording the excess function F H (x) of that sample when increasing the threshold H to its upper limit x F . These two different ways are closely connected: the form parameters are identical γ = ξ , and as we shall show (see Corollary 1 and Corollary 3 below), if one observes a Poissonian flow of rv characterized with the GPD -distribution of exceedance, then the maximum of this Poissonian flow is distributed with the
GEV distribution. We shall use this duality for the estimation of the tail characteristics of the distribution of earthquake magnitudes for earthquake occurring over arbitrary time interval.
Definition of the maximum magnitude of an earthquake flow given by a Poisson process : M T is called the T-maximum of a Poissonian flow of observations X ,…, X ν , generated by the distribution function (DF) F(x), if M T = max(X ,…, X ν ) and the following properties are verified: ν is a random Poissonian rv with parameter λ T, which is independent of the X j ’s; λ is the intensity of the Poissonian flow. The DF of M T under the condition that one or more observations occur in the interval (0; T) is equal to Ψ T (x) = )exp(1 )exp()])(1[exp( T TxFT ! !! "" """" ≅ exp(- λ T[1-F(x)]) if λ T >>1 (Lomnitz formula). (7)
We state three Corollaries from the
FFT and
GPBH theorems.
Corollary 1.
Up to terms of order exp(- λ T), the T-maximum M T can have a GEV-distribution if and only if the DF F(x) is a GPD-distribution: F(x) = G H (x | ξ , s) ≡ ξ (x-H)/s] - 1 / ξ , x ≥ H, for some ξ , s and H. Corollary 2.
Let X be a rv distributed according to a GPD with DF G H (x | ξ , s) and K is some threshold K > H. Then, the conditional distribution of X under the condition X > K is the GPD written as G K (x | ξ , S) with S = s + ξ (K –H). (8) Corollary 3.
If the rv X has the GPD G H (x | ξ , s) and one observes the maximum M T (for λ Т >>1), then M T has (up to terms of order exp(- λ Т )) the GEV-distribution GEV(x | ζ ( Т ), µ ( Т ), σ ( Т )) with ζ ( Т ) ≡ ξ ; σ ( Т ) = s ⋅ ( λ T) ξ ; µ ( Т ) = H –(s/ ξ ) ⋅ [1 – ( λ T) ξ ]. (9) These corollaries are proved in the three Appendix A, B and C.
Definition of quantiles of the DF of magnitudes : Once the parameters H ˆ , s ˆ and ! ˆ have been estimated by one of the methods described below, we can determine any desired statistical characteristic of the tail distribution. One such characteristic, which is both useful and well-behaved, is the quantile Q τ (q) at a prescribed confidence level q : Q τ (q) = H ˆ – ( s ˆ / ! ˆ ) ⋅ [1 – ( )/1log( q !" ! ˆ ) ]. (10) In this section, we give the two algorithms based on the two main theorems of EVT that allows us to estimate the distribution of extreme values from a given data set. Choose an interval of values (T L ; T H ) for time interval durations T , for which the limit FFT and
GPBH theorems are (approximately) valid and, at the same time, the catalog still contains a sufficient number of T -intervals; Choose in this interval (T L ; T H ) a finite set of u time-interval durations T (T L ≤ T 3. Derive the estimates of the GEV parameters by the method of moments (found to be more efficient than the ML method for small sample sizes (Pisarenko et al., 2008) for each of the u time interval durations T , which yields the following set of parameters: ζ (T ), ζ (T ),…, ζ (T u ), σ (T ), σ (T ),…, σ (T u ), µ (T ), µ (T ),…, µ (T u ); ξ = ζ and the relations log( σ (T k )) = log(s) + ξ log( λ T k ), k=1,…,u, (11) H = µ (T k ) +(s/ ξ ) ⋅ [1 – ( λ T k ) ξ ], k=1,…,u, (12) following from equations (9), estimate the average values ! ˆ , s ˆ , H ˆ of the GPD parameters ξ , s, H by regressing log( σ (T k )) and H as a function of T k for k=1,…, u . 5. Using Lomnitz formula (7), estimate the DF of the maximum magnitude M τ of a Poissonian flow of main shocks over an arbitrary time interval τ : Ψ τ (x) = exp(- λτ [1 + ! ˆ (x – H ˆ )/ s ˆ ! ˆ/1 ] " ) . (13) Choose an interval (H L ; H H ) of possible thresholds H for which the limit FFT and GPBH theorems are (approximately) valid, and at the same time there is a sufficient number of observations over these thresholds; Choose in this interval (H L ; H H ) a finite set of r thresholds H (H L ≤ H 3. Derive the estimates of the GPD parameters by the ML (maximum likelihood) method for each of these r thresholds, which yields ξ (H ), ξ (H ),… ξ (H r ); s(H ), s(H ),… s(H r ); (14) 4. Using the relations s(H k ) = s(H ) + ξ (H k -H ), k=2,…,r, (15) following from equation (8), estimate the average values ! ˆ , ˆ s of the GPD parameters ξ , s(H ) by regressing s(H k ) as a function of H k . 5. Use Lomnitz formula (7) to get the estimated DF of the maximum magnitude M τ of a Poissonian flow of main shocks over an arbitrary time interval τ : Ψ τ (x) = exp(- λτ [1 + ! ˆ (x - H )/ ˆ s ! ˆ/1 ] " ). (16) Sections 2.2 and relations (8,9) describe a one-to-one relationship between the two limit distributions for maxima, GPD and GEV , for sufficiently high threshold H and large time intervals T . The two relations (8,9) are very important. They provide the estimates of the GEV -parameters for different time window sizes T using regressions based on equations (10,11) and of the GDP parameters for different thresholds H using a regression based on equations (14). The limit distribution of maxima can thus be obtained in two ways: through the GEV -approach (when we choose appropriate intervals T and estimate three parameters µ , σ , ζ = ξ ) or through the GPD -approach (when we choose sufficiently large thresholds H and estimate two parameters H, ξ ) , Which one is better? A priori, each method needs to set one arbitrary value: the time interval duration T in the GEV method and the threshold H in the GPD method. On the one hand, the GPD method requires the determination of only two parameters, compared with the three 7 7 parameters needed in the GEV method. On the other hand, the GEV method relies on T -maxima (magnitudes in time intervals of duration T ), which can include magnitudes smaller than the lower threshold H used in the GPD method. This can reveal more information from the catalog. A contradicting argument is that, if two or more magnitudes larger than H are observed in a given T -interval, the GEV method keeps only the largest one while the GDP method will use all of them. It should be remembered that increasing H decreases the Poisson intensity λ of the flow of events with magnitudes exceeding H , thus lowering the product λ T, which is dangerous for reliable statistical estimations with the GDP method. On the other hand, increasing T so as to get closer to the asymptotic conditions of application of the FFT theorem decreases the number of T -intervals inversely proportionally, leading to poorer statistical estimations with the GEV method. Our practical experience with the Harvard global catalog of seismic moments over the period 1977-2006 and with the Fennoscandia region which is presented below shows that the two methods give basically the same efficiency. We thus recommend using both methods simultaneously to check their mutual consistency. In order to implement them, both GEV and GPD methods can be improved by bootstrap-like procedures that we now describe. In the GEV -approach, this can be done by using the following property of Poissonian flows: if the number of Poissonian events occurring in the interval (0, T) is fixed , then these times are distributed as uniform iid rv on this interval. As a consequence, starting from the initial catalog, we can reshuffle the occurrence times of all the main shock earthquakes in the catalog n s times and thus obtain n s catalog replicas, which should have the same statistical properties. Then, applying step 3 of section 2.3.1 to each of these n s catalog replicas, we can average over the n s resulting estimates of the parameters. However, it would be an illusion to believe that making n s go to infinity would result in making the scatter of the average vanish, as it would happen for independent samples by the law of large numbers. Because general bootstrap procedures are performed using the fixed initial sample , the scatter does not decrease anymore when n s becomes of the order of ≅ ÷ , due to the dependences between the bootstrap replicas. Our numerical experiments on samples with known values of the parameters show that it is sufficient to take n s ≅ ÷ , which results in a reduction of the standard deviation of the estimates by to This method was advocated and used in (Pisarenko et al., 2008). Similarly for the GPD -approach, the “classical" bootstrap method can be applied before implementing step 3 (estimation of parameters) of section 2.3.2. The bootstrap proceeds as follows. Starting with a fixed sample of Peaks Over Threshold values, say, magnitudes exceeding e.g. ,M ,...,M r ; M j >=6.4, j=1,...r}, we pick up randomly r times one value from this set, i.e., we allow the choice of the same value several times, corresponding to the so-called “sampling with replacement”. Such r successive picks of one value from the set {M ,M ,...,M r ; M j >=6.4, j=1,...r} defines one bootstrap sample. By construction, this bootstrap sample has a sample size of r values. Repeating this operation n b times provides n b bootstrap samples, each of them with a sample size r values. For each of the bootstrap sample, we get the statistical estimates of the parameters of the GPD. We then average over these n b statistical estimates and evaluate the corresponding std as well. This bootstrap procedure provides any desired number n b of bootstrap samples of observations exceeding a given threshold H , and one can average the resulting estimates over the n b samples. This procedure slightly decreases the scatter of the estimates. But of course, making n b tend to infinity does not result in decreasing the scatter to zero (as it would be for independent samples), since all bootstrap procedures are performed with the fixed initial sample . This situation is quite similar to the general bootstrap situation. As in the GEV approach, our numerical experiments on samples with known results 8 8 show that it is enough to take approximately n b ≅ ÷ to exhaust the gains of this bootstrapping. Further increasing n b does not lead to any more decrease of the scatter of the estimates. max versus quantiles Q τ (q) In the spirit of (Pisarenko et al., 2008), we would like to stress the instability of the traditional parameter M max . It follows from eq.(1) that, for negative form parameters ξ of the GPD distribution, the maximum magnitude is given by M max = H - s / ξ . Thus, if ξ is close to zero (say, ξ ∈ ( - 0.2; - 0.1) which corresponds to the range of values found for the global Harvard catalog), then the sample estimates of M max can exhibit large spurious bursts due to random errors occurring in the estimation of the form parameter ξ . These spurious outliers are clearly visible in synthetic tests using a perfect sample distributed according to a GPD distribution with a slightly negative form parameter ξ . For instance, in a simulation of our estimation procedure on synthetic realizations with true GPD -parameters equal to ξ = - 0.1; s =0.5; H= 4.5; M max = 9.5 , we find that one among the realizations gives the fantastic value of M max =17.7! On these synthetic realizations, the quantile estimates for M max = are (8.25; 9.65; 12.15). In contrast, for the quantile Q (0.97) , the estimates are (7.03; 7.22; 7.42), which should be compared with the true quantile is . The stability of the estimation of the quantiles Q τ (q) can be understood from the following formula Q τ (q) = µ – ( σ / ξ ) ⋅ [ ( )/1log( q !" ) ξ ] , showing that Q τ (q) converges to a finite value (under the condition that q < 1 ) as ξ goes to zero, namely: Q τ (q) → µ + σ⋅ log( )/1log( q !" ), ξ → In contrast, M max = H - s / ξ diverges as ξ goes to zero. This is the source of the instability of the estimation of M max compared with the relative stability of the estimation of Q τ (q) for samples characterized by negative ξ values close to zero. This reasoning explains why Q τ (q) should be intrinsically more stable than M max for small ξ . Of course, if q → , then Q τ (q) → M max , but this convergence occurs extremely slowly, as the logarithm of the logarithm of the difference Besides, going from the parameter M max to the q- quantile Q τ (q) does not give up any useful information, since one can always use Q τ (q) for sufficiently large τ and for q close enough to unity, for which Q τ (q) becomes arbitrarily close to M max . 3. Application of the method to global Harvard catalog 1977-2006. 3.1 Preliminary steps m i and occurrence time t i . We assume that, for each i -th earthquake, its magnitude m i is drawn from an unknown function F(m |H) = F(m) describing the probability density distribution of magnitudes exceeding some lower threshold H . For our analysis, we shall also assume the Poissonian property of the occurrence times t i of the earthquakes. This assumption is evidently in contradiction with the existence of aftershocks, seismic swarms and other clustered seismic events. We thus restrict our analysis to sub-catalogs obtained in the following way, which constitute better approximations of realizations of Poisson processes. In a first step, we decluster our catalogs using the standard Knopoff-Kagan algorithm [Knopoff et al. 1982]. The algorithm works as follows. We start by identifying the largest event in a catalog, whose time, location and magnitude are denoted as (time t , location ( φ , λ ) in longitude-latitude , magnitude m W ). Then, we exclude all events in the time-space window: (t; t +10 -0.31+0.46 m W ); R( φ , λ ; φ , λ ) ≤ -0.85+0.46 m W ; (17) where R is the distance in km between points ( φ , λ ) and ( φ , λ ). The window (17) was taken from ( Knopoff et al.1982). After the first elimination, we identify the next largest event of the remaining earthquakes (excluding the previous one already accounted for). And we apply the same pruning with the same rule for the space-time window associated with this second largest event among the remaining earthquakes. We iterate until the algorithm stops. This leads to a sequence of events (m i , t i ), i = where t i is close to a Poissonian flow, whose intensity is denoted as λ (H) = λ . Pisarenko et al. (2008) have quantified the strong gain in declustering obtained by this method, and have documented quantitatively how the remaining events approximate a Poissonian flow. Then, we calculate the T- maxima of the magnitudes (magnitude maxima in successive time intervals of length T ). This operation transforms a point process of events occurring at random times into a discrete time random process of T -maxima, which is much more convenient for statistical analysis. Besides, this operation improves further the declustering of the catalog, since clusters are usually formed by relatively weak events that are mostly eliminated by keeping only the T -maxima in each successive time intervals of length T . Our analysis restricts T to take sufficiently large values in order to avoid (with some high probability) the occurrence of empty T -intervals, with no value for the maximum magnitude. This restriction is equivalent to the condition λ T>>1 . We first study the Harvard catalog, from 01.01.1977 to 16.06.2006, of seismic moments M (dyne-cm) transformed into magnitudes m W by the formula m W = [log (M) - 16.1]. The next section 4 studies the Fennoscandia catalog covering the time period 01.01.1900 – 31.12.2005. For the Harvard catalog, we restrict our analysis to earthquake of depth smaller than 70 km and of magnitudes m W larger than the lower threshold 5, corresponding to 8102 events. After the application of the Knopoff-Kagan algorithm [Knopoff et al. 1982] described above, we are left with 4193 so-called main shock events suggesting that, according to this declustering method, aftershocks and clustered events constituted about 49.4% of the total set of earthquakes. The complementary cumulative distribution of the magnitudes of the main shocks is shown in Fig.1. One can observe that the graph is very close to a straight line up to m W =7.7 (seismic moment M=4.5 ⋅ ergs ), in agreement with the standard Gutenberg-Richter law, while a 10 10 significant downward bend occurs beyond that value, which contains the 39 largest events. As discussed previously, e.g. by Pisarenko and Sornette (2003, 2004), this small number of events limits the detailed characterization of the departure from the Gutenberg-Richter law. (n = 4193) We now apply the method described in section 2.3.1 to estimate the parameters of the GEV -distribution of the T -maxima. We use the method of statistical moments, since this method is slightly more efficient for moderate sample sizes than the maximum likelihood method, as shown previously in (Pisarenko et al. 2008). We calculated these estimates for T -intervals in the range (50; 250) days. The numbers N T of different T -intervals in the catalog and the corresponding products λ T are the following: T 50 75 100 125 150 175 200 225 250 N T 214 143 107 85 71 61 53 47 42 (18) λ T 19.5 29.3 39.1 48.9 58.6 68.4 78.2 87.9 97.7 The Poisson intensity λ is determined as n/number of days = . All T -intervals are non-empty. For T- values exceeding 150 days, the number N T of different T -intervals in the catalog is too small for a reliable estimation of the three GEV -parameters. Figs.2-4 show the moment-estimates of the GEV -parameters for the declustered Harvard catalog. Fig.2 shows an approximate “stabilization” of the ξ -estimates in the range ≤ T ≤ . The scale and location estimates of σ (T) and µ (T) do not provide any useful restriction on the usable values of T , as they behave as prescribed by equation (9). In order to choose an appropriate interval for the T- values, we consider the Kolmogorov distances between the sample distribution of the T -maxima and the fitted GEV -distribution function. The Kolmogorov distance KD is defined as follows: KD = ! n T 1/ 2 max | ! ˆ F T (x) – Φ T (x| ! ) µ , ! ) " , ! ˆ " ) |, (19) where ( ! ) µ , ! ) " , ! ˆ " ) are the ML -estimates of the GEV -parameters, ! ˆ F T (x) is the sample cumulative distribution of the T -maxima. Since we use a theoretical function Φ T with parameters fitted on the data, we cannot use the standard Kolmogorov distribution to check the statistical significance of the sample value of KD . Instead, in order to determine the confidence level of a given KD -distance obtained for a given sample, one has to use a numerically simulated distribution of KD -distances in the simulation procedure using random GEV samples with the fitted parameters ( ! ) µ , ! ) " , ! ˆ " ). The artificial GEV samples simulating our real sample were taken with the following parameters T = 80 days; σ ( Т ) = s ⋅ ( λ T) ξ = 0.84 (31.3) -0.185 = 0.45; (20) µ ( Т ) = H –(s/ ξ ) ⋅ [1 – ( λ T) ξ ] = 4.98 +(0.84/0.185) ⋅ [1- (31.3) -0.185 ] = 7.12; ζ = ξ = -0.185; H=4.98; s=0.85 . These values are close to our estimates for the declustered Harvard catalog. We simulated our estimation procedure times and obtained numerically the probability p(z) = P{ KD ≥ z | ξ = -0.185; s = 0.85; h = 4.98}. The obtained probabilities for different values of the argument z are p(0.55) = 0.53; p(0.6) = 0.30; p(0.65) = 0.17; p(0.7) = 0.073; p(0.75) = 0.027; (21) 11 11 We can thus consider a value KD ≤ as acceptable, while KD ≥ is taken as contradicting the GEV-distribution. Fig.5 shows that we should therefore exclude from our estimation procedure T -intervals with T < 50 days. Combining the evidence for an approximate “stabilization” of the ξ -estimates in the range ≤ T ≤ and the criterion provided by the KD statistics, we decide to choose the range ≤ T ≤ for the T -intervals, in discrete values T = 75; 100; 125; 150 , to apply equations (8,9) for the estimates of ξ , σ , µ . We used shuffled samples to decrease the scatter of these regression estimates. As final estimates, we suggest to take the medians of the corresponding estimates. We show below these final estimates together with the 16%- and 84%-quantiles, characterizing the bootstrap scatter: quantile level median (50%) ξ -0.221 -0.185 -0.152 (22) s H 4.669 M max Q (0.97) 8.96 As we shall see below, the estimates of the form parameter ξ and of the tail variables M max and Q (0.97) are compatible with those obtained below in section 3.3 using the GPD method. However, we will see that the estimates of the other two parameters σ ,µ differ rather significantly between the two methods, and we provide the explanation of this discrepancy. In order to estimate the Mean Square Error ( MSE ) of these estimates, we implemented our whole estimation procedure on N=500 artificially generated GEV samples with the following parameters: ξ = -0.185; σ = 0.45; µ = 7.12; n = 4193; λ =0.3908; T = 80 days; N s =100. We obtained the following results: MSE( ξ ) = 0.047; MSE( σ ) = 0.145; MSE(M max ) = 0.68; MSE( Q (0.97) ) = 0.23 (23) It should be remarked that in all cases the bias was much smaller than the Std for all the parameters. We stress once more than the scatter of M max is much larger than that of Q (0.97). The values reported in (23) can be taken as estimates of the real scatter . This “real” scatter is different from and larger than that obtained with the bootstrap procedure. This is not surprising since the latter gives only the scatter conditional to the same unique data sample. Thus, our final results for the estimation of the GEV parameters by the GEV method can be summarized as: ! ˆ = -0.185 ± (24) s ˆ = 0.847 ± H ˆ = 4.982 ± max ˆ M = 9.58 ± )97.0(ˆ Q =9.06 ± (n = 4193) We now apply the method described in section 2.3.2 to estimate the parameters of the GPD -distribution of the earthquake magnitudes in the declustered Harvard catalog above thresholds H chosen in the interval (6.0; 7.6). The set of thresholds and corresponding numbers n H of exceedances were the following: 12 12 H 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 n H (25) Thresholds larger than are hardly admissible, since the number of exceedances becomes too small. Figs. 6 and 7 plot the estimates of the parameters ξ , and s obtained by the maximum likelihood method with N b = 100 bootstrap samples. Equation (15) predicts that, if the form parameter ξ is negative, the s -estimates should decrease with the threshold H . However, one can observe in Fig. 7 that the negative ξ - estimates shown in Fig.6 are associated with decreasing s- estimates only for m W ≥ One can thus draw the conclusion that threshold magnitudes smaller than are too small for a proper application of the limit GBPH theorem. For our following analysis, we thus impose the restriction that all thresholds will be taken larger than . Fig.6 exhibits a stabilization (approximate constant value) of the estimates of the form parameter ξ (although the scatter for h > 7.2 become very large), in the interval 6.6 ≤ H ≤ This stabilization suggests that the interval 6.6 ≤ H ≤ for the estimation of the GPD parameters is appropriate. The Kolmogorov distances between the sample excess functions and the fitted GPD -distribution function provides a systematic approach to determine the appropriate interval for the thresholds. In the present application, the Kolmogorov distance KD is defined as follows: KD = ! n H 1/ 2 max | ! ˆ F H (x) – G H (x | ! ˆ , s ˆ )|, (26) where ( ! ˆ , s ˆ ) are the ML -estimates of the GPD -parameters, and ! ˆ F H (x) is the sample stepwise excess function. Since we use a theoretical function G H with parameters fitted on the data, we cannot use the standard Kolmogorov distribution to check the sample value of the KD . In order to determine the confidence level of a given KD -distance obtained for a given sample, one has to use a numerically simulated distribution of KD -distances in the simulation procedure using random GPD samples with the same fitted parameters ( ! ˆ , s ˆ ) . Fig.8 shows the KD -distances for thresholds H in the interval 6.4 ≤ H ≤ Artificial GPD samples simulating our real sample were taken with parameters ξ = -0.2; s =0.53 close to the values obtained from the empirical data. We simulated our estimation procedure times and estimated numerically the probability p(z) = P{ KD ≥ z | ξ = -0.2; s = 0.53} . We obtained the following values: p(0.9) = 0.52; p(1.0) = 0.27; p(1.1) = 0.08; p(1.2) = 0.04. (27) This leads us to consider KD ≤ as acceptable, while KD ≥ as contradicting the hypothesis that the data is generated by the GPD -distribution. We thus exclude values of thresholds H smaller than . Combining these different conditions into the regressions used to derive the estimates of ξ , and s = s(h ) (with equation (15)), we decided to restrict the range of adequate thresholds to the interval ≤ H ≤ , sampled with the following discrete values: h = 6.6; 6.8; 7.0; 7.2. We used 100 bootstrap samples so as to decrease the scatter of these regression estimates. As final estimates, we suggest to consider the medians of these 100 estimates. The later are shown below together with the 16%- and 84%-quantiles used to characterize the bootstrap scatter: quantile level median (50%) ξ -0.350 -0.204 -0.148 (28) s 0.481 13 13 M max Q (0.97) 9.196 Note that the distribution of ξ - estimates is rather asymmetrical, and that the scatter of M max is significantly larger than that of Q (0.97). In order to estimate the Mean Square Error ( MSE ) of these estimates given by MSE = Bias +Std , we simulated N=500 times our full estimation procedure for artificially generated GPD samples with parameters: ξ = -0.2; s = 0.53; H= 6.6; n H = 293; N b =100 and obtained the following results: MSE( ξ ) = 0.049; MSE(s) = 0.039; MSE(M max ) = 0.50; MSE(Q (0.97)) = 0.20 (29) In all cases, the bias was found much smaller than the Std for all parameter values. The values reported in (29) can be taken as reasonable estimates of the real scatter (in contrast to the conditional scatter obtained from the bootstrap procedure). Thus, our final results on the estimation of the GPD parameters by the GPD method are ! ˆ = -0.204 ± s ˆ = 0.529 ± max ˆ M = 9.53 ± (30) )97.0(ˆ Q =9.21 ± Comparison between the results of the GEV and GPD methods We now discuss the consistency of the estimations of the form parameter ξ , of the scale parameter s and of the tail parameters M max and Q (0.97) , obtained respectively by the GEV and GPD methods. Let us first address the large apparent discrepancy between the scale parameter estimate in eq.(22) obtained by the GEV and the corresponding estimate in eq.(28) obtained by the GPD method. It turns out that there is a simple explanation for this discrepancy, which has to do with (i) the fact that the parameters obtained with the GEV method imply a lower threshold which is different from the threshold used in the GPD method and (ii) the dependence of the scale factor on the value of the used threshold. To show this, let us recall that the correspondence stated in the Corollaries 1 and 3 of section 2.2 between the GEV and GPD distributions asserts that exp{ -[1+ ξ (x- µ )/ σ ] -1/ ξ } = exp(- λ T[1-F(x)]), x ≤ µ – σ / ξ ; ( λ T >>1), (31) where F(x) is the DF of the observed magnitudes. It follows from eq.(31) that, up to terms of order exp(- λ T) , we have F(x) = 1 - T ! [1 + ! " (x - µ )] -1/ ξ , (32) which shows that the DF F(x) has the form of a GPD distribution. Since F(x) is non-negative, we have to restrict the domain of definition of F(x) from below, that is, F(x) approximated by (32) is found non-negative for µ + !" [ ( λ T) - ξ - 1] ≤ x . Since the upper bound x ≤ µ – σ / ξ is also imposed from (31), this implies that, up to terms of order exp(- λ T) , the DF F(x) varies from zero to approximately unity over the interval 14 14 µ + !" [ ( λ T) - ξ - 1] ≤ x ≤ µ – σ / ξ (33) However, there is no reason for the left boundary in (33) to be equal to the threshold H used in the GPD method. Indeed, if we accept for ξ , σ , µ the values given by equation (20), we get the lower boundary of the interval defined in (33) equal to µ + !" [ ( λ T) - ξ - 1] ≅ This is very different from the lower threshold H = 6.6 used in the GPD method. But Corollary 2 shows that the GPD -distribution has its scale parameter changed under a shifting threshold according to the formula S = s + ξ (H –K). If we take H - K = 6.6 – 5.0 = 1.6; ξ =-0.185; s =0.847 (see eq.(24)), then we get S = 0.847 – 0.185 ⋅ which is close to the s -estimate in eq.(28) obtained by the GPD method. We can thus conclude that the values (eq.(22) obtained by the GEV ) and (eq.(28) obtained by the GPD method) are actually compatible, when adjusted to the same effective threshold H . Concerning the form parameter ξ , both methods give very similar results ! ˆ = -0.185 ± (eq.(24) for the GEV method) and ! ˆ = -0.204 ± (eq.(30) for the GPD method), with basically the same scatter. The estimates of max ˆ M = 9.58 ± for the GEV method and max ˆ M = 9.53 ± for the GPD method are highly consistent; similarly for the quantile )97.0(ˆ Q =9.06 ± for the GEV method and )97.0(ˆ Q =9.21 ± for the GPD method. As the two methods exploit the data in complementary ways and appear to be similarly efficient, we recommend using both of them simultaneously to check their consistency on unknown data sets. These results as well as additional synthetic tests confirm that our methods provide efficient ways to recover the correct parameters. However, while there is no significant bias, the uncertainty on max ˆ M is about two to three times larger than that on )97.0(ˆ Q . In addition, the distribution of max ˆ M exhibits a fat tail towards large values, making its determination uncertain. 4. Estimation of GEV and GPD parameters for discrete magnitudes. Application to Fennoscandia (01.01.1900 – 31.12.2005) 4.1 Preliminary considerations In many regions, magnitude catalogs report magnitudes with a discrete scale. This is mainly due to human truncation biases associated with the use of Intensity scales converted to magnitude scales. We thus investigate how to adapt our methods to this situation. As a representative example, we consider the catalog of the Fennoscandia [Ahjes and Uski, 1992; Uski and Pelkonen, 2006] covering the time period 01.01.1900 – 31.12.2005 and the space domain restricted by polygons with the coordinates: Latitude Longitude -5.0 15 15 A total number of events with magnitudes -0.7 ≤ m ≤ depth ≤ H ≤ 97 km occurred in this space-time interval . The event with maximum magnitude occurred on with coordinates λ = 66.4; φ = 14.4; depth unknown (Norway). The seismic flow exhibits some variability. Fig.9 shows the yearly average number of shocks in moving windows of years duration for four lower thresholds: m = -0.7 (n = 7298 events); m = 3.0 (n = 964 events); m = 3.5 (n = 450 events); m = 4.0 (n = 197 events). For the lowest threshold m = -0.7 , one can observe a positive trend taking off around 1940-2005, contrasted by a negative trend before 1940. For higher thresholds (m = 3.0; 3.5; 4.0) , the positive and negative oscillations are comparable, and no clear tendency prevails on the whole time interval 1900-2005. Thus, we can conclude that 1. On the time interval 1900-2005, the seismic flow m ≥ can be considered as approximately stationary; 2. A more cautionary approach to ensure a better stationarity of the seismic flow would consists in taking the lower threshold m = 3.5 , although the corresponding sample size becomes rather low (n=450). Since we are interested in the analysis of main shocks, we have applied the algorithm developed by Knopoff-Kagan to remove a significant part of the aftershocks from the catalog. We obtained the following numbers of main shocks: m = -0.7 main shocks; intensity λ =0.177 events/day = events/year m = 3.0 main shocks; intensity λ =0.0240 events/day = events/year m = 3.5 main shocks; intensity λ =0.0112 events/day = events/year m = 4.0 main shocks; intensity λ =0.0049 events/day = events/year The intensity of the seismic flow restricted to the set of main shocks for these 4 thresholds is paralleling very closely that shown in fig. 9 for all earthquakes. It should be noted that the percentage of aftershocks is very low for Fennoscandia, about 6%, compared to other seismic zones, where it reaches 50% and more. Fig.10 shows the histogram of the main shock magnitudes. The discreteness of the reported magnitudes is clearly visible. The histogram becomes smoother and regularly decreasing for m ≥ . There is some slight evidence of a preeminence of half-integer magnitude values, but the effect is not large. This leads to conclude that we can apply our methods to the main shocks in the magnitude range m ≥ , which ensures an approximate stationary seismic flow and a reasonably smooth distribution. Fig. 11 shows the sample tail of the main shock magnitudes ( ). For m ≥ , an approximate linear dependence can be observed, followed beyond m = 4.0 by a somewhat steeper slope, which is indicative of a faster decay of the magnitude PDF in this range. We now apply the GEV and GPD methods to this sample of main shocks with magnitudes m ≥ , n = 928 . The magnitudes in the catalog are quantized with 0.1 units. First of all, let us show that this quantization does not affect significantly the resulting estimates. We take just one illustrative example, namely synthetic samples generated with a GPD -distribution with parameters close to the estimates determined below for the real catalog: ξ = - 0.275; s = 0.776. For a lower threshold H = 2.5, we took a sample size n = 2200 (which is close to the value for the Fennoscandia catalog). We use the GPD method of estimation with N b bootstraps. We quantized our sample of magnitudes in step dm = 0.1; 0.15; 0.2; 0.25 and then compared the estimates obtained with the continuous and discrete samples. We found that, for dm = 0.1 and even perhaps dm = , the mean square difference of the estimates for discretely and continuously sampled magnitudes is negligible compared with the std of the estimates. For dm = 0.2 and larger, the effect of quantization should be taken into account. These tests are summarized in the following table. Denoting the Mean Square Difference of the estimates 16 16 obtained with the continuous and discrete magnitude sampling by MSD( ξ ) and MSD(s) , we have: dm 0.10 0.15 0.20 0.25 MSD( ξ )/std( ξ ) Our conclusion about neglecting the quantization effect for dm = 0.1 , which corresponds to the case of the Fennoscandia catalog, can be considered as justified. For the sake of completeness, let us briefly point out the modifications that would be needed for data with larger magnitude steps. One can indeed account exactly for discrete magnitudes by using the discrete analog of the Likelihood L d (for the GPD approach): L d = (n ,…, n r | H, s, s ξ ) = ! = rk P k (n k | H, s, ξ ). Here n k is the number of occurrences of a given discrete magnitude m k , and P k are the probabilities calculated from the GPD- distribution: P k (n k | H, s, ξ ) = G H (m+k Δ m |s, ξ ) – G H (m+(k-1) Δ m |s, ξ ), k=1,2,…,r. The discrete likelihood L d must be maximized over the parameters (s, ξ ) by numerical methods. For the GEV approach, we would recommend to use the so-called Sheppard’s corrections (see (Cramer 1940) ) to statistical moments calculated from discrete data. We calculated the estimates of the GEV distribution for T -intervals in the range (100; days. The numbers N T of different T -intervals in the catalog and the corresponding products λ T are the following: T 100 300 500 700 900 1100 1300 1500 N T 387 129 77 55 43 35 29 25 (34) λ T 2.40 7.19 12.0 16.8 21.6 26.4 31.2 36.0 The Poisson intensity λ was determined as n/number of days = . All T -intervals larger than T = 300 are found non-empty. T- values exceeding days should not be used in the estimation procedure, as the number of different T -intervals in the catalog becomes too small for a reliable estimation of the three GEV -parameters. Figs. 12-14 show the moment-estimates of the three GEV -parameters. Fig.12 exhibits an approximate plateau for the dependence of the ξ -estimates as a function of T , in the range ≤ T ≤ . The estimates of σ (T) and µ (T) shown respectively in Figs. 13 and 14 do not impose any restriction on the values of T , since they behave as prescribed by equations (11,12). The calculations of the KD -distances defined by expression (19) following the method of section 3.2 show that we should exclude windows with T ≤ days. For the calculations of the KD -distances, we used artificial GEV samples simulating our real data with parameters T = 400 days; σ ( Т ) = s ⋅ ( λ T) ξ = 0.36; µ ( Т ) = H –(s/ ξ ) ⋅ [1 – ( λ T) ξ ] =4.05; ζ = ξ = -0.275; H=2.93; s=0.67. We simulated our estimation procedure times and estimated numerically the probability p(z) = P{ KD ≥ z | ξ = -0.275; s = 0.67; H = 2.93}. We obtained p(0.55)= 0.52; p(0.6) = 0.27; p(0.65) = 0.15; p(0.7) = 0.05; p(0.75) = 0.02 , showing that values KD ≤ are 17 17 acceptable, while values of KD ≥ reject at the confidence level the hypothesis that the data could be generated with the GEV -distribution. Combining these different constraints, we decided to keep for the analysis and for the regression estimates of ξ , s, H the T -intervals with T in the range ≤ T ≤ , sampled with the following discrete values: T = 400; 500; 600; 700; 800. We used shuffled samples in order to decrease the scatter of these regression estimates. As final estimates, we suggest to take the medians of the corresponding estimates. We show below these final estimates together with the and quantiles, which are characterizing the scatter of the bootstrap procedure: quantile level median (50%) ζ -0.315 -0.262 -0.213 (35) s 0.631 H 2.70 M max Q (0.97) 5.36 In order to evaluate the Mean Square Error ( MSE ) of these estimates, we simulated N=500 times our whole estimation procedure on artificially generated GEV samples with parameters ζ = -0.275; σ = 0.36; µ = 4.05; n = 928; λ =0.024; T = 400 days; N s =100 and obtained the following results: MSE( ζ ) = 0.0434; MSE(s) = 0.0769; MSE(M max ) =0.211; MSE( Q (0.97) ) = 0.091 (36) The biases are much smaller than the Std for all parameters. We stress once more than the scatter of M max is much larger than that of Q (0.97). The MSE (36) can be taken as estimates of the real scatter (in contrast to the conditional scatter of the bootstrap). Thus, our final results on the estimation of the parameters by the GEV method can be summarized as: ! ˆ = -0.262 ± (37) s ˆ = 0.763 ± H ˆ = 2.94 ± max ˆ M = 5.86 ± (0.97) = 5.41 ± The ML -estimates of the parameters ξ , s, M max are shown in Fig.15-17 as functions of the lower magnitude threshold H. We see in Fig.16 that the s -estimates decrease as it is requested from eq.(9) only for H > 3.0 . Thus, thresholds H ≤ are excluded from our analysis. An approximate plateau for the ξ -estimates is observed in the range ≤ H ≤ The numbers of exceedences n h decreases from for H=3.0 to for H=4.0 , which is sufficient for our analysis over the whole range ≤ H ≤ We also calculated the KD -distances to help constrain further the range of admissible thresholds H . Artificial GPD samples simulating our real sample were taken with parameters ξ = -0.275; s =0.67 , close to the estimates using the real data. We simulated our whole estimation procedure times and evaluated numerically the probability p(z) = P{ KD ≥ z | ξ = -0.275; s = 0.67}. We obtained p(0.9) = 0.58; p(1.0) = 0.31; p(1.1) = 0.15; p(1.2) = 0.07 , showing that KD ≤ can be considered acceptable, while KD ≥ is rejecting the hypothesis that the data is generated with the GPD -distribution. This leads to exclude threshold values less than . Combining all above mentioned restrictions, we decided to choose the thresholds in the interval ≤ H ≤ , sampled with the following discrete 18 18 values h =3.05 ; 3.15; 3.25; 3.35; 3.45; 3.55; 3.65; 3.75. We used bootstrap samples to decrease the scatter of the regression estimates ξ , s = s(h ) via equation (15). Following our above analyses, we use the median of the 100 estimates as the final estimates of the parameters and represent their bootstrap scatter with the 16%- and 84%-quantiles: quantile level median (50%) ξ -0.312 -0.278 -0.245 (38) s 0.632 M max Q (0.97) 5.35 The distribution of the ξ - estimates is asymmetrical, and the scatter of M max is again larger than that of Q (0.97). As in the previous calculations, in order to evaluate the Mean Square Error ( MSE ) of these estimates, we simulated N=500 times our whole estimation procedure on artificially generated GPD samples with parameters ξ = -0.275; s = 0.67; H= 3.05; n H = 928; N b =100. We obtained the following results: MSE( ξ ) = 0.0254; MSE(s) = 0.0294; MSE(M max ) = 0.165; MSE( Q (0.97) ) = 0.073 (39) As in previous examples, the biases are much smaller than the Std for all the parameters. The MSE given by (39) provide reasonable estimates of the real scatter. Thus, our final results for the estimation of the parameters by the GPD method can be summarized by the following numbers: ! ˆ = -0.275 ± s ˆ = 0.669 ± (40) max ˆ M = 5.76 ± )97.0(ˆ Q =5.44 ± These estimates are completely consistent with those obtained with the GEV method. This can be checked directly for the form parameter ξ and for the tail parameters M max , Q (0.97) . The estimates of the parameters s,H differ, but this discrepancy results only from using different thresholds, as explained for the Harvard catalog in section 3.4. For the Fennoscandia catalog, the reasoning is the same and the results are parallel to those obtained for the Harvard catalog. We do not repeat the detailed calculation. 5. Conclusion We have applied a new approach combining the two main theorems of EVT to estimate the parameters quantifying the tails of the distribution of large earthquakes in the global Harvard catalog and in the local Fennoscandia catalog. We have developed several procedures and check points to decrease the scatter of the estimates and to verify their consistency. The results are satisfactory and can be used as reasonable estimates both for scientific applications and for risk assessment. We expect that this methodology can be fruitfully applied to many other catalogs, by providing both checks of the quality and reliability of the catalogs and useful estimates of the large seismic risks in terms of both maximum possible magnitudes and quantiles. The estimation of characteristics of the tail of the earthquake magnitude distribution beyond the range of magnitudes available in the historical record, i.e. for a probability level q > 1- 1/n where n is sample size, is only possible if some additional assumptions about the distribution function are imposed. Sometimes, such assumptions can be made on physical grounds. In the present paper, such additional assumptions have been formulated on the basis of general limit theorems for maximum values of iid sequence of random values. Of course, there is no a-priori guarantee 19 19 that such assumptions will be fulfilled in a real situation. In our case, these assumptions boil down to assuming a regular behavior of the tail in the vicinity of the right limit point of the distribution. The fact that there is no limit theorem without such regular behavior serves as a certain justification of such an assumption. But strictly speaking, it is not verifiable in practice. Extreme Value Theory offers a methodology to extrapolate outside the range of the available data. The question of whether the conditions of EVT are satisfied in a given real problem should be solved in each specific case. As Richard Smith [Smith 1990] said: “But what EVT is doing is making the best use of whatever data you have about extreme phenomena”. 20 20 Appendix. Proofs of the three Corollaries. Corollary 1. Let F H (x) be the GPD -distribution F H (x) = G H (x | ξ , s) = 1 – (1 + ξ (x-H)/s) - 1 / ξ , x ≥ H (A1) In accordance with Lomnitz formula (7), up to terms of order exp(- λ T) , the distribution function of the T -maxima M T is given by Ψ T (x) = exp(- λ T ⋅ (1 + ξ (x-H)/s) - 1 / ξ ) if λ T >>1. (A2) If we set σ = σ (T) = s ⋅ ( λ T); µ = µ (T,H) = H – (s/ ξ )[1- ( λ T) ξ ] (A3) then we can rewrite (A2) in the form of a GEV -distribution Ψ T (x) = exp{- [1 + ξ (x- µ )/ σ ] - 1 / ξ }. (A4) It should be noted that, in equation (A1), F H (x) is defined only for x ≥ H, while Ψ T (x) does not vanish at x = H , since: Ψ T (H) = exp(- λ T). But according to the condition of Corollary 1, we can neglect terms of order exp(- λ T). Therefore, one can complement the domain of definition of Ψ T (H) for x < H (as it is required for the GEV -distribution), since Ψ T (x) remains smaller than exp(- λ T) for x < H. Inversely, if we assume that M T have a GEV -distribution, then using the transformation law (A3) for the parameters, we have s = σ / ⋅ ( λ T); H = µ + (s/ ξ )[1- ( λ T) ξ ] , (A5) and we get the distribution function of M T in the form (A1), from which it follows that F H (x) is a GPD -distribution. Corollary 2. Let X be distributed according to the GPD -distribution: F H (x) = ξ (x-H)/s) - 1 / ξ , x ≥ H . (A6) Then, for any K > H the conditional distribution of X under the condition X > K is F K (x) = { F H (x) – F H (K)} /{ 1 - F h (K)}, x ≥ K. (A7) Putting (A6) into (A7), we get F K (x) = 1 – (1 + ξ (x-K)/S) - 1 / ξ , x ≥ K , (A8) where S = s + ξ (K-H). (A9) Corollary 3 . This corollary follows from the proof of Corollary 1 above. 21 21 References. Ahjes, T. and Uski M. (1992) Earthquakes in Northern Europe in 1375-1989, Tectonophysics, vol.207, pp.1-23. Bassi, F., Embrechts, P. and M.Kafetzaki (1998) Risk Management and Quantile Estimation, in A Practical Guide to Heavy Tails, R.Adler, R.Feldman, M.Taqqu (Eds), Birkhauser, 1998, pp.111-130. Bender, B.K. and D.M. Perkins (1993) Treatment of parameter uncertainty and variability for a single seismic hazard map, Earthquake Spectra 9 (2), 165-195. Bird, P., and Y. Y. Kagan (2004) Plate-tectonic analysis of shallow seismicity: Apparent boundary width, beta, corner magnitude, coupled lithosphere thickness, and coupling in seven tectonic settings, Bull. Seismol. Soc. Am., 94 (6), 2380-2399. Black, N., Jackson, D. and Rockwell, T. (2004) Maximum Magnitude in Relation to Mapped Fault Length and Fault Rupture, American Geophysical Union, Fall Meeting 2004, abstract Bull. Seism. Soc. Am. 32, 163-191. Gutenberg B., Richter C. (1954) Seismicity of the Earth. 2-nd Edition, Princeton Univ. Press. Gutenberg, B. and C. Richter (1956) Earthquake magnitude, intensity, energy, and acceleration, part II, Bull. Seism. Soc. Am. 46, 105-145. Hosking, J.R., Wallis, J.R., and Wood, E.F. (1985) Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments, Technometrics 27, 251-261. Kagan Y.Y. (1991) Seismic moment distribution. Geophys. J. Int. 106, 123-134. Kagan Y.Y. (1996) Comment on “The Gutenberg-Richter or characteristic earthquake distribution, which is it?” by Steven G. Wesnousky. Bull. Seism. Soc. Am. 86, 274-285. Kagan Y.Y. (1997) Seismic moment-frequency relation for shallow earthquakes: Regional comparison. Journ. of Geophys. Research. 102, 2835-2852. 22 22 Kagan Y.Y. (1997) Earthquake Size Distribution and Earthquake Insurance. Commun. Statist.-Stachastic Models, 13(4), 775-797. Kagan Y.Y. (1999) Universality of the Seismic Moment-frequency Relation. Pure Appl. Geophys. 155, 537-573. Kagan Y.Y. (2002) Seismic moment distribution revisited: I. Statistical results. Geophys. J. Int. 148, 520-541. Kagan Y.Y. (2002) Seismic moment distribution revisited: II. Moment conservation principle. Geophys. J. Int. 149, 731-754. Kagan Y.Y. and F. Schoenberg (2001) Estimation of the upper cutoff parameter for the tapered distribution. J. Appl. Probab. 38A, 901-918. Kijko A. (1999) Statistical Estimation of Maximum Regional Earthquake Magnitude mmax , 12-th European Conference on Earthquake Engineering, FW:022, 1-22. Kijko, A. (2004) Estimation of the Maximum Earthquake Magnitude, M max , Pure appl. Geophys, 161, 1-27. Kijko, A. and G. Graham (1998) Parametric-historic procedure for probabilistic seismic hazard analysis. Part I: estimation of maximum regional magnitude Mmax, Pure Appl. Geophys. 152, 413-442. Kijko, A., S. Lasocki and G. Graham (2001) Non-parametric seismic hazard in mines. Pure Appl. Geophys. 158, 1655-1675. Kijko A. and Sellevoll M.A. (1989) Estimation of earthquake hazard parameters from incomplete data files. Part I, Utilization of extreme and complete catalogues with different threshold magnitudes. Bull. Seism. Soc. Am. 79, 645-654. Kijko A. and Sellevoll M.A. (1992) Estimation of earthquake hazard parameters from incomplete data files. Part II, Incorporation of magnitude heterogeneity. Bull. Seism. Soc. Am. 82, 120-134. Knopoff L. and Kagan Y. (1977) Analysis of the Extremes as Applied to Earthquake Problems, J. Geophys. Res. 82, 5647–5657. Knopoff L., Kagan Y., and R. Knopoff (1982) b-values for foreshocks and aftershocks in real and simulated earthquake sequences, Bull. Seism. Soc. Amer.72 (5), 1663-1675. Main Y., Irving D., Musson R. (1999) and A.Reading. Constraints on frequency-magnitude relation and maximum magnitudes in the UK from observed seismicity and glacio-isostatic recovery rates. Geophys. J. Int. 137, 535-550. Main Y. (2000) Apparent Breaks in Scaling in Earthquake Cumulative Frequency-Magnitude Distribution: Fact or Artifact? Bull. Seism. Soc. Am. 90, 86-97. Molchan G., Kronrod T., and Panza G.F. (1997) Multi-scale seismicity model for seismic risk. Bull. Seis. Soc. Am. 87, 1220-1229. Ogata Y., and K. Katsura (1993) Analysis of temporal and special heterogeneity of magnitude frequency distribution inferred from earthquake catalogues. Geophys. J. Int. 113, 727-738. Pisarenko V.F. (1991) Statistical evaluation of maximum possible magnitude. Izvestia, Earth Physics, 27, 757-763. Pisarenko V.F., Lyubushin A.A., Lysenko V.B., and T.V.Golubeva (1996) Statistical Estimation of Seismic Hazard Parameters: maximum possible magnitude and related parameters. Bull. Seism. Soc. Am. 86, 691-700. 23 23 Pisarenko V.F., and D.Sornette (2003) Characterization of Frequency of Extreme Earthquake Events by the Generalized Pareto Distribution. Pure Appl. Geophys. 160, 2343-2364. Pisarenko V.F., and D.Sornette (2004) Statistical Detection and Characterization of a Deviation from the Gutenberg-Richter Distribution above Magnitude 8. Pure Appl. Geophys. 161, 839-864. Pisarenko V.F., and M.V.Rodkin (2007) Distributions with Heavy Tails: Application to the Analysis of catastrophes, Computational Seismology issue 38, 242 p. (in Russian). Pisarenko V.F., Sornette A., Sornette D., and M.V.Rodkin (2008) New approach to the Characterization of M max and of the Tail of the Distribution of Earthquake Magnitudes, Pure Appl. Geophys. 165, 1-42. Smith R. (1985) Maximum Likelihood Estimation in a Class of Non-Regular Cases, Biometrika, 72, 67-92. Smith R. (1990) Extreme value theory. In: Ledermann, W. (Chief Ed.) Handbook of Applicable Mathematics, Supplement, pp.437-472. Wiley, Chichester. Sornette, A., Davy, P. and Sornette, D. (1990) Growth of fractal fault patterns, Phys. Rev. Letters, 65, 2266-22-69. Sornette, A. and Sornette, D. (1989) Self-organized criticality and earthquakes, Europhys. Lett.9, 197-202. Sornette, A. and Sornette, D. (1990) Earthquake rupture as a critical point, Tectonophysics, 179, 327-334. Sornette, A. and Sornette, D. (1999) Renormalization of earthquake aftershocks, Geophys. Res. Lett., 26, 1981-1984. Sornette, D., L. Knopoff, Y.Y. Kagan and C. Vanneste (1996) Rank-ordering statistics of extreme events : application to the distribution of large earthquakes, J.Geophys.Res. 101, 13883-13893. Thompson, E.M, L.G. Baise and R.M. Vogel 2007) An Index Earthquake Frequency Distribution, in press in J. Geophys. Res. (Solid Earth). Uski, M. and Pelkonen E. (2006) Earthquakes in Northern Europe in 2005, Inst. Seismology, Univ. Helsinki, Report R-232. Utsu T. (1999) Representation and Analysis of the Earthquake Size Distribution: A Historical Review and Some New Approaches. Pure Appl. Geophys. 155, 509-535. Ward, S.N. (1997) More on Mmax, Bulletin of the Seismological Society of America 87 (5), 1199-1208. Wesnousky S.G. (1994) The Gutenberg-Richter or characteristic earthquake distribution, which is it? Bull. Seism. Soc. Am. 84, 1940-1959. Wu Z.L. (2000) Frequency-size distribution of global seismicity seen from broad-band radiated energy. Geophys. J. Int. 142, 59-66. 24 24 Fig. 1: Complementary cumulative distribution (log-scale) of the main shock magnitudes obtained after declustering of the Harvard catalog (01.01.1977 to 16.06.2006), as described in the text. 25 25 Fig. 2: Dependence as a function of the duration T of the time intervals of the Moment estimate of the form parameter ξ of the GEV distribution to the declustered Harvard catalog (01.01.1977 to 16.06.2006), as described in the text. The upper and lower thin lines give the plus-and-minus one standard deviation around the central estimate shown as the thick line. We used 100 shuffling samples, as explained in section 2.5. 26 26 Fig. 3: Dependence as a function of the duration T of the time intervals of the Moment estimate of the scale parameter σ of the GEV distribution to the declustered Harvard catalog (01.01.1977 to 16.06.2006), as described in the text. The upper and lower thin lines give the plus-and-minus one standard deviation around the central estimate shown as the thick line. We used 100 shuffling samples, as explained in section 2.5. 27 27 Fig. 4: Dependence as a function of the duration T of the time intervals of the Moment estimate of the location parameter µ of the GEV distribution for the declustered Harvard catalog (01.01.1977 to 16.06.2006), as described in the text. The upper and lower thin lines give the plus-and-minus one standard deviation around the central estimate shown as the thick line. We used 100 shuffling samples, as explained in section 2.5. 28 28 Fig. 5: Kolmogorov distance KD (defined in (19)) between the sample distribution of the T -maxima and the fitted GEV -distribution function as a function of the duration T of the time intervals. The upper and lower thin lines give the plus-or-minus one standard deviation of the KD value, obtained by simulating times synthetic GEV samples with parameters given by (20) chosen close to our estimates for the declustered Harvard catalog. 29 29 Fig. 6: Dependence as a function of the lower threshold H of the estimate of the parameter ξ , obtained by the maximum likelihood method with N b = 100 bootstrap samples, for the declustered Harvard catalog. 30 30 Fig. 7: Dependence as a function of the lower threshold H of the estimate of the parameter s, obtained by the maximum likelihood method with N b = 100 bootstrap samples, for the declustered Harvard catalog. 31 31 Fig. 8: Dependence as a function of the lower threshold H of the median KD -distance (thick middle line) defined in equation (26) obtained from artificial GPD samples simulating our real sample taken with the parameters ξ = -0.2; s =0.53 close to the values obtained from the empirical data (declustered Harvard catalog). 32 32 Fig. 9: Intensity of the seismic flow in Fennoscandia over the time period 01.01.1900 – 31.12.2005, defined as the average number of shocks in moving windows of years duration for four lower thresholds (each intensity value is plotted versus the center of the time window).: m = -0.7 (n = 7298 events, top curve); m = 3.0 (n = 964 events, second curve from top); m = 3.5 (n = 450 events, third curve from top); m = 4.0 (n = 197 events, bottom curve). 33 33 Fig. 10: Histogram of the main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005). 34 34 Fig. 11: Sample tail of the main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005). 35 35 Fig. 12: Moment-estimate of the GEV -parameter ξ as a function of T , for main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005). 36 36 Fig. 13: Moment-estimate of the GEV -parameter σ (T) as a function of T , for main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005). 37 37 Fig. 14: Moment-estimate of the GEV -parameter µ (T) as a function of T , for main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005). 38 38 Fig. 15: ML -estimate of the form parameter ξ as a function of the lower magnitude threshold H, for main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005). 39 39 Fig. 16: ML -estimate of the scale parameter s as a function of the lower magnitude threshold H, for main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005). 40 40 Fig. 17: ML -estimate of the tail parameter M max as a function of the lower magnitude threshold H, for main shock magnitudes that occurred over Fennoscandia (01.01.1900 – 31.12.2005).over Fennoscandia (01.01.1900 – 31.12.2005).