Hypothesis testing for tail dependence parameters on the boundary of the parameter space
Anna Kiriliouk
Université de Namur, Faculté des sciences économiques, sociales et de gestion, Rempart de la vierge 8, B-5000 Namur, Belgium. E-mail: [email protected]
December 17, 2018
Abstract
Modelling multivariate tail dependence is one of the key challenges in extreme-value theory. Multivariate extremes are usually characterized using parametric models, some of which have simpler submodels at the boundary of their parameter space. Hypothesis tests are proposed for tail dependence parameters that, under the null hypothesis, are on the boundary of the alternative hypothesis. The asymptotic distribution of the weighted least squares estimator (Einmahl, Kiriliouk and Segers, Extremes 21, pages 205–233, 2018) is given when the true parameter vector is on the boundary of the parameter space, and two test statistics are proposed. The performance of these test statistics is evaluated for the Brown–Resnick model and the max-linear model. In particular, simulations show that it is possible to recover the optimal number of factors for a max-linear model. Finally, the methods are applied to characterize the dependence structure of two major stock market indices, the DAX and the CAC40.
Keywords:
Brown–Resnick model; hypothesis testing; max-linear model; multivariate extremes; stable tail dependence function; tail dependence.
Extreme-value theory is the branch of statistics concerned with the characterization of extreme events. These occur in a large variety of fields, such as hydrology, meteorology, finance and insurance, but also in settings like human life span or athletic records. For some examples, see Einmahl and Magnus (2008), Towler et al. (2010), Chavez-Demoulin et al. (2016), Thomas et al. (2016) and Rootzén and Zholud (2017). In the univariate case, the limiting distribution of suitably normalized block maxima or threshold exceedances can be characterized entirely using the generalized extreme-value distribution (Fisher and Tippett, 1928; Gnedenko, 1943) and the generalized Pareto distribution (Balkema and De Haan, 1974; Pickands III, 1975) respectively. However, many extreme events are inherently multivariate, and an important challenge is to model the tail dependence between two or more random variables of interest. If dependence disappears as the variables take on more and more extreme values, we say that they are asymptotically independent. Testing for asymptotic independence and modelling asymptotically independent data has been done in Ledford and Tawn (1996) and Draisma et al. (2004), among others.

Background
Let $X_i = (X_{i1}, \dots, X_{id})$, $i \in \{1, \dots, n\}$, be random vectors in $\mathbb{R}^d$ with cumulative distribution function $F$ and marginal cumulative distribution functions $F_1, \dots, F_d$. Let $M_n := (M_{n1}, \dots, M_{nd})$ with $M_{nj} := \max(X_{1j}, \dots, X_{nj})$ for $j = 1, \dots, d$. We say that $F$ is in the max-domain of attraction of an extreme-value distribution $G$ if there exist sequences of normalizing constants $a_n = (a_{n1}, \dots, a_{nd}) > 0$ and $b_n = (b_{n1}, \dots, b_{nd}) \in \mathbb{R}^d$ such that
\[
P\left[ \frac{M_n - b_n}{a_n} \le x \right] = F^n(a_n x + b_n) \xrightarrow{d} G(x), \qquad \text{as } n \to \infty. \tag{2.1}
\]
The margins, $G_1, \dots, G_d$, of $G$ are univariate extreme-value distributions,
\[
G_j(x_j) = \exp\left\{ -\left( 1 + \gamma_j \frac{x_j - \mu_j}{\sigma_j} \right)_+^{-1/\gamma_j} \right\}, \qquad \sigma_j > 0, \; \gamma_j, \mu_j \in \mathbb{R},
\]
where $x_+ := \max(x, 0)$. The dependence structure of $G$ is determined by
\[
G(x) = \exp\{ -\ell(-\log G_1(x_1), \dots, -\log G_d(x_d)) \},
\]
where $\ell : [0, \infty)^d \to [0, \infty)$ is called the (stable) tail dependence function,
\[
\ell(x) := \lim_{t \downarrow 0} t^{-1}\, P[\, 1 - F_1(X_1) \le t x_1 \text{ or } \dots \text{ or } 1 - F_d(X_d) \le t x_d \,]. \tag{2.2}
\]
The cumulative distribution function $F$ is in the max-domain of attraction of a $d$-variate extreme-value distribution $G$ if and only if the limit in (2.2) exists and the marginal distributions in (2.1) converge to univariate extreme-value distributions. In what follows, we only assume existence of (2.2), which concerns the dependence structure of $F$, but not the marginal distributions $F_1, \dots, F_d$.

Because the class of stable tail dependence functions is infinite-dimensional, one usually considers parametric models for $\ell$. Henceforth we assume that $\ell$ belongs to a parametric family $\{ \ell(\,\cdot\,; \theta) : \theta \in \Theta \}$ with $\Theta \subset \mathbb{R}^p$. Some examples can be found below; see also de Haan and Ferreira (2006), Falk et al. (2010), Segers (2012) and references therein.

Example 2.1.
The $d$-dimensional logistic model (Gumbel, 1960) has stable tail dependence function
\[
\ell(x; \theta) = \left( x_1^{1/\theta} + \dots + x_d^{1/\theta} \right)^{\theta}, \qquad \theta \in (0, 1].
\]
If $\theta = 1$, the variables are (asymptotically) independent, while $\theta \downarrow 0$ corresponds to complete dependence.

Example 2.2.
The $d$-dimensional Brown–Resnick model defined on spatial locations $s_1, \dots, s_d \in \mathbb{R}^2$ has stable tail dependence function
\[
\ell(x; \theta) = \sum_{j=1}^{d} x_j \, \Phi_{d-1}\big( \eta^{(j)}(1/x); \Upsilon^{(j)} \big),
\]
where $\Phi_{d-1}$ denotes the $(d-1)$-variate normal distribution function,
\[
\eta^{(j)}(x) = \big( \eta_1^{(j)}(x_1, x_j), \dots, \eta_{j-1}^{(j)}(x_{j-1}, x_j), \eta_{j+1}^{(j)}(x_{j+1}, x_j), \dots, \eta_d^{(j)}(x_d, x_j) \big) \in \mathbb{R}^{d-1},
\]
\[
\eta_l^{(j)}(x_l, x_j) = \sqrt{\frac{\gamma(s_j - s_l)}{2}} + \frac{\log(x_l / x_j)}{\sqrt{2\,\gamma(s_j - s_l)}} \in \mathbb{R},
\]
$\gamma(s) = (\lVert s \rVert / \rho)^{\alpha}$, and $\Upsilon^{(j)} \in \mathbb{R}^{(d-1) \times (d-1)}$ is the correlation matrix with entries
\[
\Upsilon^{(j)}_{lk} = \frac{\gamma(s_j - s_l) + \gamma(s_j - s_k) - \gamma(s_l - s_k)}{2 \sqrt{\gamma(s_j - s_l)\, \gamma(s_j - s_k)}}, \qquad l, k = 1, \dots, d; \; l, k \neq j,
\]
(Kabluchko et al., 2009; Huser and Davison, 2013). The parameter vector is $\theta = (\rho, \alpha) \in (0, \infty) \times (0, 2]$. The Smith model (Smith, 1989) is obtained when $\alpha = 2$.

Example 2.3.
The max-linear model with $r$ factors has stable tail dependence function
\[
\ell(x; \theta) = \sum_{t=1}^{r} \max_{j=1,\dots,d} b_{jt} x_j, \qquad x \in [0, \infty)^d, \tag{2.3}
\]
where the factor loadings $b_{jt}$ are non-negative constants such that $\sum_{t=1}^{r} b_{jt} = 1$ for every $j \in \{1, \dots, d\}$ and all column sums of the $d \times r$ matrix $B := (b_{jt})_{j,t}$ are positive (Einmahl et al., 2012). Since the rows of $B$ sum up to one, the parameter matrix has only $d \times (r - 1)$ free elements. Rearranging the columns of $B$ will not change the value of the stable tail dependence function. For identification purposes, we define the parameter vector $\theta$ by stacking the columns of $B$ in decreasing order of their sums, leaving out the column with the lowest sum. An example of a random vector $Z = (Z_1, \dots, Z_d)$ that has stable tail dependence function (2.3) is
\[
Z_j = \max_{t=1,\dots,r} b_{jt} S_t, \qquad j \in \{1, \dots, d\},
\]
where $S_1, \dots, S_r$ are independent unit Fréchet variables, $P[S_t \le x] = \exp(-1/x)$ for $x > 0$.

Example 2.4.
Let $I_1, \dots, I_d$ denote (possibly dependent) Bernoulli random variables with $p_j = P[I_j = 1] \in (0, 1]$ for $j \in \{1, \dots, d\}$. Let, for $J \subset \{1, \dots, d\}$, $p(J) = P[\{ j = 1, \dots, d : I_j = 1 \} = J]$, so that $(p(J))_{J \subset \{1,\dots,d\}}$ is a probability distribution. The multivariate Marshall–Olkin model (Embrechts et al., 2003; Segers, 2012) has stable tail dependence function
\[
\ell(x; \theta) = \sum_{\emptyset \neq J \subset \{1,\dots,d\}} p(J) \max_{j \in J} \frac{x_j}{p_j}.
\]
In this model, any subset of components of the vector is assigned a shock that influences all components of that subset. An interpretation in terms of credit risk modelling can be found in Embrechts et al. (2003).

The Marshall–Olkin model is obtained as a special case of the max-linear model (2.3) by setting $b_{jt} = \{ p(J_t)/p_j \}\, \mathbb{1}(j \in J_t)$, where $J_t \in \mathcal{P}(\{1, \dots, d\})$, $J_t \neq \emptyset$, and $\mathcal{P}(A)$ denotes the power set of $A$. The Marshall–Olkin model is well known in dimension $d = 2$, where
\[
B = \begin{pmatrix} b_{11} & 1 - b_{11} \\ b_{21} & 1 - b_{21} \end{pmatrix}
  = \begin{pmatrix} \dfrac{P[I_1 = 1, I_2 = 0]}{P[I_1 = 1]} & \dfrac{P[I_1 = 1, I_2 = 1]}{P[I_1 = 1]} \\[2ex] \dfrac{P[I_1 = 0, I_2 = 1]}{P[I_2 = 1]} & \dfrac{P[I_1 = 1, I_2 = 1]}{P[I_2 = 1]} \end{pmatrix}.
\]

2.2 Estimation of the stable tail dependence function

Let $k = k_n \in (0, n]$ be such that $k \to \infty$ and $k/n \to 0$ as $n \to \infty$. A straightforward nonparametric estimator of $\ell$ is obtained by replacing $P$ and $F_1, \dots, F_d$ in (2.2) by the (modified) empirical distribution functions and replacing $t$ by $k/n$, yielding
\[
\widetilde{\ell}_{n,k}(x) := \frac{1}{k} \sum_{i=1}^{n} \mathbb{1}\{ R_{i1} > n + 1/2 - k x_1 \text{ or } \dots \text{ or } R_{id} > n + 1/2 - k x_d \}.
\]
Here, $R_{ij} = \sum_{t=1}^{n} \mathbb{1}\{ X_{tj} \le X_{ij} \}$ denotes the rank of $X_{ij}$ among $X_{1j}, \dots, X_{nj}$. This estimator, the empirical tail dependence function, was introduced in slightly different form for $d = 2$ in Huang (1992) and studied further in Drees and Huang (1998). Using $n + 1/2$ rather than $n$ allows for better finite-sample properties, that is, for a lower mean squared error (Einmahl et al., 2012).

Another nonparametric estimator is the beta tail dependence function, which was very recently proposed in Kiriliouk et al. (2018b).
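The rank-based estimator $\widetilde{\ell}_{n,k}$ defined above is straightforward to implement; here is a minimal sketch (the function and variable names are our own, not from the paper):

```python
import numpy as np

def empirical_stdf(X, k, x):
    """Empirical tail dependence function at a point x = (x_1, ..., x_d).

    Computes (1/k) * #{ i : R_i1 > n + 1/2 - k*x_1  or ... or  R_id > n + 1/2 - k*x_d },
    where R_ij is the rank of X_ij among X_1j, ..., X_nj. Ties are not handled;
    the data are assumed to come from continuous distributions.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # double argsort gives 0-based ranks per column; +1 makes them 1-based
    R = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    x = np.asarray(x, dtype=float)
    exceed = R > (n + 0.5 - k * x)   # n x d boolean array, broadcast over columns
    return np.count_nonzero(np.any(exceed, axis=1)) / k
```

As a sanity check, for comonotone data the estimate at $x = (1, 1)$ equals 1, matching complete dependence ($\ell(x) = \max(x_1, x_2)$), while for countermonotone data it equals 2, matching asymptotic independence ($\ell(x) = x_1 + x_2$).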
Contrary to the empirical tail dependence function, the beta tail dependence function is a smooth estimator, which leads to some improvement in its finite-sample behaviour. Its name stems from the fact that it is based on the empirical beta copula (Segers et al., 2017). We define
\[
\widetilde{\ell}^{\beta}_{n,k}(x) = \frac{n}{k} \left\{ 1 - C^{\beta}_n\!\left( 1 - \frac{k x}{n} \right) \right\}, \quad \text{where} \quad C^{\beta}_n(u) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} F_{n, R_{ij}}(u_j),
\]
and for $r \in \{1, \dots, n\}$, $F_{n,r}(u) = \sum_{s=r}^{n} \binom{n}{s} u^s (1-u)^{n-s}$ is the cumulative distribution function of a Beta$(r, n + 1 - r)$ random variable.

A drawback of $\widetilde{\ell}_{n,k}$ or $\widetilde{\ell}^{\beta}_{n,k}$ might be their possibly growing bias as $k$ increases: Fougères et al. (2015) show that, under suitable conditions, the estimator $\widetilde{\ell}_{n,k}$ satisfies the asymptotic expansion
\[
\widetilde{\ell}_{n,k}(x) - \ell(x) \approx k^{-1/2} Z_{\ell}(x) + \alpha(n/k) M(x),
\]
where $Z_{\ell}$ is a continuous centered Gaussian process, $\alpha$ is a function such that $\lim_{x \to \infty} \alpha(x) = 0$ and $M$ is a continuous function. When $\sqrt{k}\, \alpha(n/k)$ tends to a non-zero constant, an asymptotic bias appears. In Fougères et al. (2015) and Beirlant et al. (2016), this bias is estimated and subtracted from the estimator $\widetilde{\ell}_{n,k}(x)$ in order to propose new bias-corrected estimators.

Finally, many other nonparametric estimators exist of the Pickands dependence function, which is the restriction of $\ell$ to the unit simplex. See, for instance, Capéraà et al. (1997), Zhang et al. (2008), Gudendorf and Segers (2012), Berghaus et al. (2013), Vettori et al. (2016) and Marcon et al. (2017). While these can be transformed into estimators of the stable tail dependence function, they rely on stronger assumptions, i.e., they are based on data from an extreme-value distribution, and we will not consider them here.

Nonparametric estimation forms a stepping stone for semi-parametric estimation (Einmahl et al., 2012, 2016, 2018). Here, we focus on the method proposed in Einmahl et al. (2018). Let $\widehat{\ell}_{n,k}$ denote any initial estimator of $\ell$ based on $X_1, \dots, X_n$. Let $c_1, \dots, c_q \in [0, \infty)^d$, with $c_m = (c_{m1}, \dots, c_{md})$ for $m = 1, \dots, q$, be $q$ points in which we will evaluate $\ell$ and $\widehat{\ell}_{n,k}$. Consider for $\theta \in \Theta$ the $q \times 1$ column vectors
\[
\widehat{L}_{n,k} := \big( \widehat{\ell}_{n,k}(c_m) \big)_{m=1}^{q}, \qquad L(\theta) := \big( \ell(c_m; \theta) \big)_{m=1}^{q}, \qquad D_{n,k}(\theta) := \widehat{L}_{n,k} - L(\theta). \tag{2.4}
\]
The points $c_1, \dots, c_q$ need to be chosen in such a way that the map $L : \Theta \to \mathbb{R}^q$ is one-to-one, i.e., $\theta$ is identifiable from $\ell(c_1; \theta), \dots, \ell(c_q; \theta)$. In particular, we will need to assume that $q \ge p$. For $\theta \in \Theta$, let $\Omega(\theta)$ be a symmetric, positive definite $q \times q$ matrix and define
\[
f_{n,k}(\theta) := D_{n,k}^T(\theta)\, \Omega(\theta)\, D_{n,k}(\theta). \tag{2.5}
\]
The continuous updating weighted least squares estimator for $\theta_0$ is defined as
\[
\widehat{\theta}_{n,k} := \operatorname*{arg\,min}_{\theta \in \Theta} f_{n,k}(\theta) = \operatorname*{arg\,min}_{\theta \in \Theta} \big\{ D_{n,k}(\theta)^T \Omega(\theta) D_{n,k}(\theta) \big\}. \tag{2.6}
\]
In Section 4, we will study the performance of this estimator for $\Omega(\theta) = I_q$, the $q \times q$ identity matrix. Expression (2.6) then simplifies to
\[
\widehat{\theta}_{n,k} = \operatorname*{arg\,min}_{\theta \in \Theta} \sum_{m=1}^{q} \big( \widehat{\ell}_{n,k}(c_m) - \ell(c_m; \theta) \big)^2.
\]
Einmahl et al. (2018) show the consistency and asymptotic normality of $\widehat{\theta}_{n,k}$ under the assumption that the true parameter vector $\theta_0$ is in the interior of $\Theta$. In the next section we give the asymptotic distribution of $\widehat{\theta}_{n,k}$ without this restriction.

The asymptotic results presented in Section 3.1 build on Andrews (1999), who established the asymptotic distribution of a general extremum estimator when one or more parameters lie on the boundary of the parameter space. The methodology to obtain the asymptotic distribution of $\sqrt{k}(\widehat{\theta}_{n,k} - \theta_0)$ can be summarized as follows: first, one shows that as $n \to \infty$, the criterion function $f_{n,k}(\theta)$ in (2.5) is equal to a quadratic function $q_{n,k}(\sqrt{k}(\theta - \theta_0))$ plus a term that does not depend on $\theta$.
If $\widehat{\theta}_{n,k}$ is consistent, then its asymptotic distribution is only affected by the part of $\Theta$ close to $\theta_0$; equivalently, we are only concerned with the shifted parameter space $\Theta - \theta_0$ near the origin. If $\Theta - \theta_0$ can be approximated near the origin by a convex cone $\Lambda$, one can show that minimizing $f_{n,k}(\theta)$ over $\theta \in \Theta$ is asymptotically equivalent to minimizing $q_{n,k}(\lambda)$ over $\lambda \in \Lambda$. Finally, $\sqrt{k}(\widehat{\theta}_{n,k} - \theta_0)$ converges in distribution to the argument minimizing the limit of $q_{n,k}(\lambda)$ as $n \to \infty$.

In Section 3.2, we show how a closed-form expression can be obtained for the asymptotic distribution of $\sqrt{k}(\widehat{\theta}_{n,k} - \theta_0)$. For $\beta \subset \theta$, we show in Section 3.3 how to test $H_0 : \beta = \beta^*$ against $H_1 : \beta \neq \beta^*$ when, under the null hypothesis, $\beta^*$ is on the boundary of the alternative hypothesis.

We set up some notation first. Let $B_{\varepsilon}(\theta)$ denote an open ball centered at $\theta$ with radius $\varepsilon$ and let $C_{\varepsilon}(\theta)$ denote an open cube centered at $\theta$ with sides of length $2\varepsilon$. Let $\mathrm{cl}(\Theta)$ denote the closure of $\Theta$. A set $\Gamma \subset \mathbb{R}^p$ is said to be locally equal to a set $\Lambda \subset \mathbb{R}^p$ if $\Gamma \cap B_{\varepsilon}(0) = \Lambda \cap B_{\varepsilon}(0)$ for some $\varepsilon > 0$. Finally, a set $\Lambda \subset \mathbb{R}^p$ is a cone if $\lambda \in \Lambda$ implies $a\lambda \in \Lambda$ for all $a > 0$.

3.1 Estimation, consistency and asymptotic normality
When $\theta_0$ is on the boundary of $\Theta$, the map $L$ in (2.4) is not defined and thus not differentiable on a neighbourhood of $\theta_0$. We will need the following assumption.

(A1) $\Theta$ includes a set $\Theta^+$ such that $\Theta^+ - \theta_0$ equals the intersection of a union of orthants and an open cube $C_{\varepsilon}(0)$ for some $\varepsilon > 0$. Moreover, $\Theta \cap B_{\varepsilon}(\theta_0) \subset \Theta^+$ for some $\varepsilon > 0$.

If $\Theta - \theta_0$ happens to be locally equal to a union of orthants, we can simply set $\Theta^+ = \Theta \cap C_{\varepsilon}(\theta_0)$. This is the case for the models considered in Section 4. We will assume existence of the so-called left/right (l/r) partial derivatives on $\Theta^+$; a formal definition is given in the appendix. The shape of $\Theta^+$ is such that these can always be defined. Write $\dot{L} := (\partial / \partial \theta) L(\theta) \in \mathbb{R}^{q \times p}$ for $\theta \in \Theta^+$, where $\dot{L}$ denotes the matrix of l/r partial derivatives. Let $\lambda_1(\theta) > 0$ denote the smallest eigenvalue of $\Omega(\theta)$.

Theorem 3.1 (Existence, uniqueness and consistency). Let $c_1, \dots, c_q \in [0, \infty)^d$ be $q \ge p$ points such that the map $L : \theta \mapsto (\ell(c_m; \theta))_{m=1}^{q}$ is a homeomorphism. Let $\theta_0 \in \mathrm{cl}(\Theta)$ and assume that (A1) holds, that each element of $L(\theta)$ has continuous l/r partial derivatives of order two on $\Theta^+$, that $\dot{L}(\theta_0)$ is of full rank, that $\Omega : \Theta \to \mathbb{R}^{q \times q}$ has continuous l/r partial derivatives on $\Theta^+$ and that $\inf_{\theta \in \Theta} \lambda_1(\theta) > 0$. Finally assume, for $m = 1, \dots, q$,
\[
\widehat{\ell}_{n,k}(c_m) \xrightarrow{p} \ell(c_m; \theta_0), \qquad \text{as } n \to \infty. \tag{3.1}
\]
Then with probability tending to one, the minimizer $\widehat{\theta}_{n,k}$ in (2.6) exists and is unique. Moreover, $\widehat{\theta}_{n,k} \xrightarrow{p} \theta_0$ as $n \to \infty$.

We omit the proof of this theorem since it is directly obtained by replacing $B_{\varepsilon}(\theta_0)$ by $\Theta^+$ in the proof of Einmahl et al. (2018, Theorem 1).

When $f_{n,k}$ is not defined on a neighbourhood of $\theta_0$, but there exists a set $\Theta^+$ which satisfies (A1), a Taylor expansion of $f_{n,k}(\theta)$ around $f_{n,k}(\theta_0)$ holds (Andrews, 1999, Theorem 6). For each $\theta \in \Theta^+$, we have
\[
f_{n,k}(\theta) = f_{n,k}(\theta_0) + D f_{n,k}(\theta_0)^T (\theta - \theta_0) + \tfrac{1}{2} (\theta - \theta_0)^T D^2 f_{n,k}(\theta_0) (\theta - \theta_0) + R_{n,k}(\theta),
\]
where $D f_{n,k}$ and $D^2 f_{n,k}$ are based on l/r partial derivatives and $R_{n,k}(\theta)$ is the remainder term. We suppress dependence of $\Omega$, $\dot{L}$ and $D_{n,k}$ on $\theta_0$ for ease of notation. From Einmahl et al. (2018, Proof of Theorems 3.2.1 and 3.2.2) we know that
\[
D f_{n,k}(\theta_0) = -2\, D_{n,k}^T \Omega \dot{L} + o_p(1), \qquad D^2 f_{n,k}(\theta_0) = 2\, \dot{L}^T \Omega \dot{L} + o_p(1),
\]
so the quadratic expansion above is equal to
\[
f_{n,k}(\theta) = f_{n,k}(\theta_0) - 2\, D_{n,k}^T \Omega \dot{L}\, (\theta - \theta_0) + (\theta - \theta_0)^T \dot{L}^T \Omega \dot{L}\, (\theta - \theta_0) + R_{n,k}(\theta).
\]
Define
\[
J := \dot{L}^T \Omega \dot{L} \in \mathbb{R}^{p \times p}, \qquad Y_{n,k} := \sqrt{k}\, J^{-1} \dot{L}^T \Omega\, D_{n,k} \in \mathbb{R}^p,
\]
and $q_{n,k}(\lambda) = (\lambda - Y_{n,k})^T J (\lambda - Y_{n,k})$. Then $f_{n,k}(\theta)$ can be written as
\[
f_{n,k}(\theta) = f_{n,k}(\theta_0) - k^{-1} Y_{n,k}^T J\, Y_{n,k} + k^{-1} q_{n,k}\big( \sqrt{k}(\theta - \theta_0) \big) + R_{n,k}(\theta).
\]
We see that $f_{n,k}(\theta)$ is equal to a quadratic function plus a term that does not depend on $\theta$ plus $R_{n,k}(\theta)$. In Andrews (2002, Lemma 3), it is shown that under the assumptions of Theorem 3.1 and (A3) below, the remainder term $R_{n,k}(\theta)$ is sufficiently small so that minimizing the quadratic approximation to $f_{n,k}(\theta)$ is equivalent to minimizing $q_{n,k}\big( \sqrt{k}(\theta - \theta_0) \big)$. Now, assume that

(A2) $\Theta - \theta_0$ is locally equal to a convex cone $\Lambda \subset \mathbb{R}^p$.

Then
\[
\inf_{\theta \in \Theta} q_{n,k}\big( \sqrt{k}(\theta - \theta_0) \big) = \inf_{\lambda \in \Lambda} q_{n,k}(\lambda) + o_p(1).
\]
Finally, assume that

(A3) $\sqrt{k}\, D_{n,k}(\theta_0) \xrightarrow{d} D \sim \mathcal{N}_q(0, \Sigma(\theta_0))$ as $n \to \infty$, for some covariance matrix $\Sigma(\theta_0)$.

Assumptions (A3) and (3.1) hold for the nonparametric estimators presented in Section 2.2 under some general regularity conditions. The matrix $\Sigma$ can be expressed in terms of $\ell$ and its first-order partial derivatives. We have
\[
Y_{n,k} \xrightarrow{d} Y := J^{-1} \dot{L}^T \Omega D, \qquad q_{n,k}(\lambda) \xrightarrow{d} q(\lambda) := (\lambda - Y)^T J (\lambda - Y),
\]
for all $\lambda \in \mathbb{R}^p$.

Theorem 3.2 (Asymptotic normality). If all of the above assumptions hold,
\[
\sqrt{k}(\widehat{\theta}_{n,k} - \theta_0) \xrightarrow{d} \widehat{\lambda} = \operatorname*{arg\,min}_{\lambda \in \Lambda} q(\lambda), \qquad \text{as } n \to \infty.
\]
Here $\widehat{\lambda}$ can be interpreted as the projection of $Y$ on $\Lambda$ with respect to the norm $\lVert y \rVert_J := y^T J y$. This theorem is a special case of Andrews (1999, Theorem 3). In the appendix, we verify that our assumptions are sufficient. Note that if $\Theta$ includes a neighbourhood of $\theta_0$, then $\Lambda = \mathbb{R}^p$ and we find the same distribution as in Einmahl et al. (2018, Theorem 2) since $\widehat{\lambda} = Y$.
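The limit $\widehat{\lambda} = \arg\min_{\lambda \in \Lambda} q(\lambda)$ is a projection onto the cone $\Lambda$ in the norm induced by $J$. For orthant-type cones this projection can also be computed numerically as a cross-check of the closed forms in Andrews (1999); a sketch (the function name and the box-bound encoding of the cone are our own):

```python
import numpy as np
from scipy.optimize import minimize

def project_onto_cone(Y, J, bounds):
    """Minimize q(lam) = (lam - Y)' J (lam - Y) over a product cone.

    The cone is encoded through box bounds, e.g. [(None, 0.0), (None, None)]
    for Lambda = (-inf, 0] x R. This is the projection of Y onto the cone
    with respect to the quadratic form induced by J.
    """
    Y = np.asarray(Y, dtype=float)
    J = np.asarray(J, dtype=float)
    q = lambda lam: (lam - Y) @ J @ (lam - Y)        # quadratic objective
    grad = lambda lam: 2.0 * J @ (lam - Y)           # its exact gradient
    res = minimize(q, x0=np.zeros_like(Y), jac=grad, bounds=bounds,
                   method="L-BFGS-B")
    return res.x
```

For instance, with $J = I_2$ and $\Lambda = (-\infty, 0] \times \mathbb{R}$, a point $Y = (0.7, -0.3)$ projects to $(0, -0.3)$, while a point already inside the cone is returned unchanged.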
The goal of this section is to simplify the asymptotic distribution of $\sqrt{k}(\widehat{\theta}_{n,k} - \theta_0)$ and to give conditions under which (part of) $\widehat{\lambda}$ has a closed-form expression. Let $G := \dot{L}^T \Omega D$, i.e., $Y = J^{-1} G$. We start by partitioning the vector $\theta \in \mathbb{R}^p$ into two subvectors, $\beta \in \mathbb{R}^c$ and $\delta \in \mathbb{R}^{p-c}$ for $c \in \{1, \dots, p\}$, where $\delta$ consists of all parameters that are in the interior of the parameter space. We partition $\widehat{\theta}_{n,k}$, $Y$, $J$, $G$ and $\lambda$ accordingly:
\[
\widehat{\theta}_{n,k} = \begin{pmatrix} \widehat{\beta}_{n,k} \\ \widehat{\delta}_{n,k} \end{pmatrix}, \quad
Y = \begin{pmatrix} Y_{\beta} \\ Y_{\delta} \end{pmatrix}, \quad
J = \begin{pmatrix} J_{\beta} & J_{\beta\delta} \\ J_{\delta\beta} & J_{\delta} \end{pmatrix}, \quad
G = \begin{pmatrix} G_{\beta} \\ G_{\delta} \end{pmatrix}, \quad
\lambda = \begin{pmatrix} \lambda_{\beta} \\ \lambda_{\delta} \end{pmatrix}.
\]
Let $I_c$ denote the $c \times c$ identity matrix. For $H := (I_c : 0) \in \mathbb{R}^{c \times p}$, define
\[
q_{\beta}(\lambda_{\beta}) := (\lambda_{\beta} - Y_{\beta})^T \big( H J^{-1} H^T \big)^{-1} (\lambda_{\beta} - Y_{\beta}).
\]
Suppose that

(A4) The cone $\Lambda$ of assumption (A2) is equal to the product set $\Lambda_{\beta} \times \mathbb{R}^{p-c}$, where $\Lambda_{\beta} \subset \mathbb{R}^c$ is a cone.

Corollary 3.3. If all of the above assumptions hold, then
\[
\sqrt{k}(\widehat{\beta}_{n,k} - \beta_0) \xrightarrow{d} \widehat{\lambda}_{\beta} := \operatorname*{arg\,min}_{\lambda_{\beta} \in \Lambda_{\beta}} q_{\beta}(\lambda_{\beta}), \qquad
\sqrt{k}(\widehat{\delta}_{n,k} - \delta_0) \xrightarrow{d} J_{\delta}^{-1} G_{\delta} - J_{\delta}^{-1} J_{\delta\beta}\, \widehat{\lambda}_{\beta}.
\]
When $\Lambda_{\beta}$ is defined by equality and/or inequality constraints, a closed-form expression for $\widehat{\lambda}_{\beta}$ can be computed; we give some examples that are relevant for Section 4. For a more formal solution, see Andrews (1999, Theorem 5).

Example 3.1 ($c = 1$). Suppose that $\Theta = [0, 1]^p$. If $\beta_0 = 0$, then $\Lambda_{\beta} = [0, \infty)$ and $\widehat{\lambda}_{\beta} = \max(Y_{\beta}, 0)$. If $\beta_0 = 1$, then $\Lambda_{\beta} = (-\infty, 0]$ and $\widehat{\lambda}_{\beta} = \min(Y_{\beta}, 0)$.

Example 3.2 ($c = 2$). Suppose that $\Theta = [0, 1]^p$. If $\beta_0 = (1, 1)$, then $\Lambda_{\beta} = (-\infty, 0]^2$. Let $V := H J^{-1} H^T$ with entries $v_{ij}$, and set $\rho_1 := v_{12}/v_{22}$ and $\rho_2 := v_{12}/v_{11}$. Then
\[
\widehat{\lambda}_{\beta} = \mathbb{1}\{ Y_{\beta 1} < 0,\, Y_{\beta 2} < 0 \}\, Y_{\beta}
+ \mathbb{1}\{ Y_{\beta 1} - \rho_1 Y_{\beta 2} < 0,\, Y_{\beta 2} \ge 0 \}\, (Y_{\beta 1} - \rho_1 Y_{\beta 2},\, 0)^T
+ \mathbb{1}\{ Y_{\beta 1} \ge 0,\, Y_{\beta 2} - \rho_2 Y_{\beta 1} < 0 \}\, (0,\, Y_{\beta 2} - \rho_2 Y_{\beta 1})^T.
\]
If $\beta_0 = (0, 0)$, $\beta_0 = (0, 1)$ or $\beta_0 = (1, 0)$, analogous expressions hold.

We are interested in constructing hypothesis tests for parameter values that, under the null hypothesis, are on the boundary of the alternative hypothesis. We propose two test statistics, whose asymptotic distributions follow from the results in Andrews (2001). Assume that there are no nuisance parameters on the boundary, i.e., all components of $\theta_0$ that lie on the boundary are part of the null hypothesis. We are interested in testing
\[
H_0 : \beta = \beta^* \quad \text{vs} \quad H_1 : \beta \neq \beta^*, \qquad \beta^* \in \mathbb{R}^c.
\]
Let $\Theta_0 := \{ \theta \in \Theta : \theta = (\beta^*, \delta) \text{ for some } \delta \in \mathbb{R}^{p-c} \}$ denote the restricted parameter space. Assume that

(A5) For all $\theta \in \Theta_0$, $\Theta$ is a product set with respect to $(\beta, \delta)$ local to $\theta$. That is, for all $\theta \in \Theta_0$, $\Theta \cap B_{\varepsilon}(\theta) = (B \times \Delta) \cap B_{\varepsilon}(\theta)$ for some $B \subset \mathbb{R}^c$, $\Delta \subset \mathbb{R}^{p-c}$ and $\varepsilon > 0$.

Define $\widehat{\theta}^{(0)}_{n,k} := \operatorname{arg\,min}_{\theta \in \Theta_0} f_{n,k}(\theta)$. A deviance test statistic can be defined as
\[
T^{(1)}_{n,k} := k \big( f_{n,k}(\widehat{\theta}^{(0)}_{n,k}) - f_{n,k}(\widehat{\theta}_{n,k}) \big).
\]
Corollary 3.4. Suppose $\theta_0 \in \Theta_0$ and all previous assumptions hold. Then
\[
T^{(1)}_{n,k} \xrightarrow{d} \widehat{\lambda}_{\beta}^T \big( H J^{-1} H^T \big)^{-1} \widehat{\lambda}_{\beta}.
\]
Recall that $G = \dot{L}^T \Omega D$ and let $\bar{J}$ denote its covariance matrix, i.e., $G \sim \mathcal{N}_p(0, \bar{J})$ with $\bar{J} = \dot{L}^T \Omega \Sigma \Omega \dot{L} \in \mathbb{R}^{p \times p}$. Note that $\bar{J} = J$ if and only if $\Omega = \Sigma^{-1}$. If $\Lambda_{\beta} = \mathbb{R}^c$ (no parameters on the boundary) and $\bar{J} = J$, then
\[
T^{(1)}_{n,k} \xrightarrow{d} Y_{\beta}^T \big( H J^{-1} H^T \big)^{-1} Y_{\beta} \sim \chi^2_c,
\]
where $\chi^2_c$ denotes a chi-squared random variable with $c$ degrees of freedom.

A Wald-type test statistic can be based on the quadratic form in $\widehat{\beta}_{n,k} - \beta^*$. Let $\widehat{V}_n^{-1} := \big( H J_n^{-1} H^T \big)^{-1} \in \mathbb{R}^{c \times c}$ denote a weight matrix, where $J_n := J(\widehat{\theta}_{n,k})$. Define
\[
T^{(2)}_{n,k} := k \big( \widehat{\beta}_{n,k} - \beta^* \big)^T \widehat{V}_n^{-1} \big( \widehat{\beta}_{n,k} - \beta^* \big).
\]
Corollary 3.5.
Suppose $\theta_0 \in \Theta_0$ and all of the above assumptions hold. Then
\[
T^{(2)}_{n,k} \xrightarrow{d} \widehat{\lambda}_{\beta}^T \big( H J^{-1} H^T \big)^{-1} \widehat{\lambda}_{\beta}.
\]
Corollary 3.4 is a special case of Andrews (2001, Theorem 4c) and Corollary 3.5 is a special case of Andrews (2001, Theorem 6d); their proofs are direct and thus omitted.
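The null distribution given by these corollaries is non-standard when $\beta^*$ lies on the boundary. In the simplest case $c = 1$ with $\Lambda_{\beta} = (-\infty, 0]$ and $\bar{J} = J$, the limit of $T^{(2)}_{n,k}$ is $\min(Z, 0)^2$ for a standard normal $Z$, i.e., a 50:50 mixture of a point mass at zero and a $\chi^2_1$ distribution. Critical values can then be obtained by Monte Carlo; a sketch under these assumptions (the function name is ours):

```python
import numpy as np

def boundary_wald_critical_value(alpha=0.05, n_sim=200_000, seed=1):
    """Monte Carlo critical value for the Wald-type statistic when a single
    parameter (c = 1) sits on the boundary with Lambda_beta = (-inf, 0] and
    J-bar = J, so that the studentized Y_beta is standard normal. The limit
    of the statistic is then min(Z, 0)^2, a 50:50 mixture of a point mass
    at 0 and a chi-squared(1) random variable.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_sim)
    return np.quantile(np.minimum(z, 0.0) ** 2, 1.0 - alpha)
```

The resulting 5% critical value is about 2.71 (the 0.90 quantile of $\chi^2_1$), noticeably below the usual $\chi^2_{1, 0.95} \approx 3.84$; using the latter despite the boundary would make the test conservative.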
We conduct simulation experiments where we simulate 1000 samples of size $n = 5000$ from a parametric model, and we assess the quality of the weighted least squares estimator in terms of its root mean squared error (RMSE) for settings where one or more of the parameters are on the boundary of the parameter space. We take $k \in \{25, 50, \dots, 300\}$ and we use the empirical and the beta tail dependence function as initial estimators. Next, we study the performance of the two test statistics introduced in Section 3.3 in terms of empirical level and power for $k \in \{25, 50, 75, 100\}$, as we observe that higher $k$ leads to a steadily growing bias that quickly deteriorates the performance of the hypothesis test. In all experiments, we take $\Omega = I_q$, the $q \times q$ identity matrix, because using an optimal weight matrix has a very minor effect on the quality of estimation while severely slowing down the estimation procedure. Moreover, inverting $\Sigma$ may be hindered by numerical problems in the case of a max-linear model (Einmahl et al., 2018).

We simulate data from a Brown–Resnick model with $\alpha = 2$ and $\rho = 1$ on a regular $3 \times 2$ grid ($d = 6$) and on a regular $4 \times 4$ grid ($d = 16$). As in Einmahl et al. (2018), we let $c_m \in \{0, 1\}^d$ be such that exactly two components of $c_m$ are equal to one (meaning that we consider a pairwise estimator) and we focus on pairs of neighbouring locations only, i.e., locations that are at most a distance $\sqrt{2}$ apart, leading to $q = 11$ and $q = 42$ respectively.

Figure 1 shows the RMSE of the parameter estimates based on the empirical tail dependence function (solid lines) and the beta tail dependence function (dashed lines). We see that the empirical tail dependence function outperforms the beta tail dependence function for $\alpha$, while the opposite is true for $\rho$, although differences are very minor. For small values of $k$, the RMSE is lower for $d = 16$ than for $d = 6$.

Figure 1: RMSE for parameter estimators of the Brown–Resnick model based on the empirical tail dependence function (solid lines) and the beta tail dependence function (dashed lines) for $(\alpha, \rho) = (2, 1)$; $d = 6$ (left) and $d = 16$ (right).

Next, we study the empirical level and power of the test $H_0 : \alpha = 2$. Assumption (A4) holds with $\Lambda = (-\infty, 0] \times \mathbb{R}$. Table 1 shows the empirical level of the test statistic $T^{(2)}_{n,k}$ based on a significance level of 0.05. We do not display $T^{(1)}_{n,k}$ as its behaviour is identical to that of $T^{(2)}_{n,k}$. Surprisingly, the empirical tail dependence function is preferred for $d = 16$ while the beta tail dependence function performs best for $d = 6$. In general, we can conclude that low values of $k$ are to be preferred and that the test performs better in lower dimensions. Figure 2 shows the empirical power of $T^{(2)}_{n,k}$ as a function of $\alpha \in [1.5, 2]$ and $k$.

Table 1: Empirical level of $T^{(2)}_{n,k}$ based on the empirical and the beta tail dependence function for a significance level of 0.05; $k \in \{25, 50, 75, 100\}$ and $d \in \{6, 16\}$.

Figure 2: Empirical power of $T^{(2)}_{n,k}$ based on the empirical tail dependence function (solid lines) and the beta tail dependence function (dashed lines) for $k \in \{25, 50, 75, 100\}$; $d = 6$ (top) and $d = 16$ (bottom).

The constants $c_1, \dots, c_q$ need to be chosen such that the max-linear model is identifiable. In Einmahl et al. (2018) it was observed that taking extremal coefficients for $c_m$, i.e., $c_m \in \{0, 1\}^d$ with at least two non-zero elements, is not enough for identifiability. Instead, one needs to choose $c_m$ such that its non-zero elements are unequal. For some theoretical considerations, see also Einmahl et al. (2018, Appendix B). Simulation experiments showed that in practice, taking a large grid of values (meaning that $q \gg p$) on $[0, 1]^d$ will lead to good estimators in terms of RMSE. Hence, in all following simulation experiments, we choose the points $c_m$, $m \in \{1, \dots, q\}$, on a regular grid of values in $[0, 1]^d$. We consider three different scenarios:
Model 1: Let $r = 2$ and $d = 3$, so that $\Theta = [0, 1]^3$. Let $\theta_0 = (b_{11}, b_{21}, b_{31}) = (1, 0.7, 0.2)$ and test $H_0 : b_{11} = 1$. Assumption (A4) holds with $\Lambda = (-\infty, 0] \times \mathbb{R}^2$.

Model 2: Let $r = 2$ and $d = 3$, so that $\Theta = [0, 1]^3$. Let $\theta_0 = (b_{11}, b_{21}, b_{31}) = (1, 0.7, 0)$ and test $H_0 : (b_{11}, b_{31}) = (1, 0)$. Assumption (A4) holds with $\Lambda = (-\infty, 0] \times [0, \infty) \times \mathbb{R}$.

Model 3: Let $r = 3$ and $d = 2$, so that $\Theta = [0, 1]^4$. Let $\theta_0 = (b_{11}, b_{21}, b_{12}, b_{22}) = (0, 0.8, 0.6, 0)$ and test $H_0 : (b_{11}, b_{22}) = (0, 0)$. Assumption (A4) holds with $\Lambda = [0, \infty) \times [0, \infty) \times \mathbb{R}^2$. If the null hypothesis cannot be rejected, it means that a Marshall–Olkin model suffices.

Figure 3 shows the RMSE of the parameter estimates of the three max-linear models explained above, based on the empirical tail dependence function (solid lines) and the beta tail dependence function (dashed lines). We see again that these two initial estimators lead to similar results. Parameters whose true values are on the boundary are better estimated for low $k$, while higher values of $k$ are preferred for parameters whose true values are in the interior of the parameter space.

Table 2 shows the empirical level of the test statistics $T^{(1)}_{n,k}$ and $T^{(2)}_{n,k}$ for $k \in \{25, 50, 75, 100\}$, using a significance level of 0.05. The tests perform well for model 1 in general, while low values of $k$ are necessary for models 2 and 3. The beta tail dependence function outperforms the empirical tail dependence function for model 2, while the opposite is the case for model 3. For model 3, test statistic $T^{(2)}_{n,k}$ performs better than $T^{(1)}_{n,k}$.
Figure 3: RMSE for models 1–3 based on the empirical tail dependence function (solid lines) and the beta tail dependence function (dashed lines).

Table 2: Empirical level of $T^{(1)}_{n,k}$ and $T^{(2)}_{n,k}$ for models 1–3 (M1–M3) based on the empirical and the beta tail dependence function for a significance level of 0.05; $k \in \{25, 50, 75, 100\}$.

The empirical power of the tests is evaluated over grids of parameter values: $b_{11}$ for model 1, $(b_{11}, b_{31})$ for model 2, and $(b_{11}, b_{22})$ for model 3. Figures 4 and 5 show the empirical power of $T^{(1)}_{n,k}$ for models 1–2 and of $T^{(2)}_{n,k}$ for model 3. The most striking result of Figure 5 is that the graphs are not symmetric, i.e., the power is much higher when $b_{11}$ is near zero (and $b_{22}$ is not) than when $b_{22}$ is near zero (and $b_{11}$ is not).

Figure 4: Top: empirical power in % of $T^{(1)}_{n,k}$ for model 1 and $k = 50$ (dashed line), $k = 75$ (solid line), based on the empirical tail dependence function. Bottom: empirical power in % of $T^{(1)}_{n,k}$ for model 2 and $k = 50$ based on the empirical (left) and the beta (right) tail dependence function.

Figure 5: Empirical power in % of $T^{(2)}_{n,k}$ for model 3 and $k = 50$, based on the empirical (left) and the beta (right) tail dependence function.

We studied the performance of the test statistics in case we wanted to identify a specific submodel of the max-linear model. We would also like to investigate the capability of the test statistics to correctly retrieve the true number of factors, i.e., for $s \in \{1, \dots, r\}$, we wish to test the hypothesis $H_0 : (b_{1s}, \dots, b_{ds}) = 0$. However, as mentioned in Section 2.1, the max-linear model is only defined for parameter values with $\sum_{j=1}^{d} b_{js} > 0$ for $s \in \{1, \dots, r\}$; when this condition is not met, there are zeroes on the diagonal of $\Sigma(\theta_0)$, making computation of $\widehat{\lambda}_{\beta}$ impossible. We solve this problem by computing the asymptotic distribution of the test statistics using $\widehat{\theta}_{n,k}$ rather than $\theta_0$. We consider a model with $d = 2$ and $r = 3$, with parameter vector $\theta_0 = (b_{11}, b_{21}, b_{12}, b_{22}) = (0.8, 0.6, 0.2, 0.4)$, so that the third column of $B$ contains only zeroes and the model has effectively two factors. We estimate a three-factor model and we test $H_0 : (b_{13}, b_{23}) = (0, 0)$. Figure 6 shows the RMSE for $(b_{11}, b_{21}, b_{12}, b_{22})$ (and hence for the other parameters as well).

Table 3 shows the empirical level of the test statistic $T^{(2)}_{n,k}$ using a significance level of 0.05. We included the results based on the beta tail dependence function for completeness: because it tends to overestimate $(b_{13}, b_{23})$, it rejects the null hypothesis far too often. Test statistic $T^{(1)}_{n,k}$ is not presented because for any $k$ and any initial estimator, it has an empirical level of 0. Even when $|\widehat{\theta}_{n,k} - \widehat{\theta}^{(0)}_{n,k}|$ is large, $|f_{n,k}(\widehat{\theta}_{n,k}) - f_{n,k}(\widehat{\theta}^{(0)}_{n,k})|$ is small, and hence the deviance-type test is not adapted to this type of model. Finally, Figure 7 shows the empirical power of $T^{(2)}_{n,k}$ for $b_{13}$ and $b_{23}$ over a grid of values, for two values of $k$. We remark that the power is similar when one parameter is far from its boundary and when both parameters are far from their boundary.
Figure 6: RMSE based on the empirical tail dependence function (solid lines) and the beta tail dependence function (dashed lines); the four panels correspond to $b_{11} = 0.8$, $b_{21} = 0.6$, $b_{12} = 0.2$ and $b_{22} = 0.4$.

Table 3: Empirical level of $T^{(2)}_{n,k}$ based on the empirical and the beta tail dependence function for a significance level of 0.05; $k \in \{25, 50, 75, 100\}$.

Figure 7: Empirical power in % of $T^{(2)}_{n,k}$, based on the empirical tail dependence function for $k = 50$ (left) and $k = 75$ (right).

Application to stock market indices
Consider two major European stock market indices, the German DAX and the French CAC40. We take the daily negative log-returns of the prices of these two indices from https://finance.yahoo.com for the period of January 1st, 1997 to December 31st, 2017. We remove all dates for which at least one of the two series has missing values, ending up with a sample of size n = 5313. Figure 8 shows the time series plots of the log-returns and the dependence structure of the returns standardized to unit Pareto margins, plotted on the exponential scale. The time series plots look stationary: we perform an Augmented Dickey–Fuller test, which gives p-values below 0.01 for both series, and a KPSS test, which does not reject stationarity either.

Let u_j denote a high empirical quantile of X_{1j}, ..., X_{nj} for j ∈ {1, ..., d}. We fit a generalized Pareto distribution (GPD) to the excesses X_{ij} − u_j given X_{ij} > u_j and obtain the parameter estimates σ̂₁ = 1.12, γ̂₁ = 0.03 for the DAX and σ̂₂ = 1.10, γ̂₂ = 0.02 for the CAC40.
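The marginal step described above can be sketched as follows, using `scipy.stats.genpareto`. Treating the threshold u_j as the empirical 95% quantile is an assumption made here for illustration, not necessarily the choice made in the paper.

```python
import numpy as np
from scipy.stats import genpareto

def fit_gpd_tail(x, prob=0.95):
    """Fit a GPD to the excesses of x over its empirical `prob`-quantile.

    Returns (u, gamma, sigma): threshold, shape and scale estimates.
    """
    x = np.asarray(x, dtype=float)
    u = np.quantile(x, prob)
    excesses = x[x > u] - u
    # floc=0: excesses start at zero, so the GPD location is fixed at 0.
    gamma, _, sigma = genpareto.fit(excesses, floc=0)
    return u, gamma, sigma

# Synthetic heavy-tailed data standing in for daily negative log-returns.
rng = np.random.default_rng(1)
x = rng.standard_t(df=4, size=5000)
u, gamma, sigma = fit_gpd_tail(x)
```

As a sanity check, for Student-t data with four degrees of freedom the fitted shape γ̂ should be roughly 1/4, the reciprocal of the tail index.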
Figure 8: Left and middle: time series plots of daily negative log-returns of the DAX and the CAC40; right: scatterplot of daily negative log-returns of the DAX versus the CAC40 on the unit Pareto scale, plotted on the exponential scale.

Table 4 shows the estimated parameter matrix B = (b_{jt})_{j,t} for j = 1, 2 and t = 1, ..., 4. A range of values of k was considered: we chose k = 40 because the parameter estimates are stable around this value. Standard errors were calculated using the asymptotic variance matrix of the estimator. The results in Table 4 suggest testing whether two of the coefficients are zero. The value of T^{(2)}_{n,k} is 0.04; comparing with the corresponding critical value at a significance level of 0.05, we find that we cannot reject the null hypothesis. Critical values are calculated by simulation from the asymptotic distribution of the test statistics, which is given in Corollaries 3.4 and 3.5. We do not consider T^{(1)}_{n,k} or the beta tail dependence function because of their bad performance (see Section 4.2.2).

Table 5 shows the estimated parameter matrix B = (b_{jt})_{j,t} for j = 1, 2 and t = 1, 2, 3, and we test the hypothesis that the model reduces to a Marshall–Olkin model, i.e., that two of its coefficients are zero. The value of T^{(2)}_{n,k} is 75.9; comparing to a critical value of 8.03, we reject the null hypothesis of a Marshall–Olkin model.

Figure 9 shows two goodness-of-fit measures and the estimated probability of a joint exceedance of the two stocks. On the left, we plotted the level sets {(x₁, x₂) : ℓ(x₁, x₂) = c} for four values of c, based on the empirical tail dependence function (solid lines) and on the fitted max-linear model (dashed lines).

Table 4: Parameter matrix for a bivariate max-linear model with r = 4 factors for k = 40; standard errors are in parentheses.

Table 5: Parameter matrix for a bivariate max-linear model with r = 3 factors for k = 40; standard errors are in parentheses.

    DAX     0.41 (0.11)   0.14 (0.06)   0.46 (0.09)
    CAC40   0.43 (0.12)   0.44 (0.10)   0.13 (0.05)

A common summary measure of dependence is the tail dependence coefficient

    χ := lim_{q↑1} P[F₁(X₁) > q | F₂(X₂) > q] = 2 − ℓ(1, 1),

see (2.2). Figure 9 (middle) shows nonparametric estimates of the tail dependence coefficient (black dots), χ̃_{n,k} = 2 − ℓ̃_{n,k}(1, 1), and its model-based counterpart (red dots) for decreasing k. The horizontal grey line corresponds to the value k = 40 that was used for the parameter estimators and the tests, corresponding to χ̂_{n,k} ≈ 0.68, which is equal to the model-based χ. The dotted lines correspond to 95% pointwise bootstrap confidence intervals.
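The two kinds of estimates in the middle panel can be sketched as follows: the nonparametric one evaluates the rank-based empirical stable tail dependence function at (1, 1), while the model-based one plugs the fitted coefficient matrix into the bivariate max-linear stdf ℓ(x₁, x₂) = Σ_t max(b_{1t} x₁, b_{2t} x₂). This is a minimal sketch of both estimators, not the paper's exact implementation.

```python
import numpy as np
from scipy.stats import rankdata

def chi_empirical(X, k):
    """Nonparametric tail dependence coefficient 2 - l_{n,k}(1, 1), where
    l_{n,k} is the rank-based empirical stable tail dependence function."""
    n = X.shape[0]
    ranks = np.column_stack([rankdata(X[:, j]) for j in range(2)])
    # Observation i is extreme in margin j if its rank exceeds n - k.
    extreme_in_either = (ranks[:, 0] > n - k) | (ranks[:, 1] > n - k)
    l_hat = extreme_in_either.sum() / k
    return 2.0 - l_hat

def chi_maxlinear(B):
    """Model-based coefficient 2 - l(1, 1) for the bivariate max-linear
    stdf l(x1, x2) = sum_t max(b_1t * x1, b_2t * x2)."""
    return 2.0 - np.maximum(B[0], B[1]).sum()

# Fitted coefficients from Table 5 (DAX / CAC40, r = 3 factors).
B_hat = np.array([[0.41, 0.14, 0.46],
                  [0.43, 0.44, 0.13]])
print(round(chi_maxlinear(B_hat), 2))  # 0.67, close to the reported 0.68
```

As a sanity check, `chi_empirical` returns exactly 1 for comonotone data and values near 0 when the two margins share no large factors.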
Figure 9: Left: level sets {(x₁, x₂) : ℓ(x₁, x₂) = c} for four values of c, based on the empirical tail dependence function (solid lines) and on the fitted max-linear model (dashed lines); middle: nonparametric (black dots) and model-based (red dots) estimates of the tail dependence coefficient χ̂_{n,k} for a range of values of k; right: probability of a joint exceedance based on the fitted max-linear model (solid line) and on the empirical excesses (dashed line).

The probability of a joint excess F̄(x₁, x₂) = P[X₁ > x₁, X₂ > x₂] can be approximated for large (x₁, x₂) by

    F̄(x₁, x₂) ≈ x₁* + x₂* − {1 − exp(−ℓ(x₁*, x₂*))},    where x_j* = 1 − F_j(x_j), j = 1, 2.

Let H̄(·; γ_j, σ_j) denote the survival function of a GPD with shape γ_j and scale σ_j. For x₁ > u₁ and x₂ > u₂, we can estimate F̄(x₁, x₂) by

    F̄̂(x₁, x₂) = x̂₁* + x̂₂* − {1 − exp(−ℓ(x̂₁*, x̂₂*; θ̂_{n,k}))},

where ℓ(·; θ̂_{n,k}) denotes the max-linear stable tail dependence function based on the fitted parameter estimates and x̂_j* is the probability of exceeding the threshold u_j multiplied by H̄(x_j − u_j; γ̂_j, σ̂_j). The right-hand plot of Figure 9 shows the probabilities F̄̂(x, x) on the log-scale for a grid of values of x (solid line). These can be compared to the empirical excess probabilities n⁻¹ Σ_{i=1}^n 1{X_{i1} > x, X_{i2} > x} (dashed line), which are zero for the largest values of x.

Conclusion

We proposed two test statistics for parameters that, under the null hypothesis, are on the boundary of the alternative hypothesis. Simulation studies showed that especially the Wald-type test statistic performs well. Moreover, these test statistics are convenient because their asymptotic distribution has an explicit expression. In practice, this distribution becomes cumbersome when testing for a higher-dimensional (c > 4) parameter vector. A possible solution might be to consider a multiple testing procedure with a Bonferroni-type correction.

We applied the test statistics to the max-linear model and the Brown–Resnick model, but the results are generic and could be used for any multivariate extreme-value model where submodels appear at boundary values or where the number of "factors" is of interest. Examples include Fougères et al. (2009), where a mixture model is obtained based on a certain number of stable random variables, and Gissibl and Klüppelberg (2018), where a max-linear model is defined on a directed acyclic graph. For the latter, our methods could be useful in the procedure to reconstruct the graph structure of a dataset (Gissibl et al., 2018), since it is based on identifying pairwise tail dependence coefficients that are zero.
A Proofs
Definition (Left/right partial derivatives). Let f be a function whose support includes X ⊂ R^p and let a ∈ X. Suppose that X − a equals the intersection of a union of orthants and an open cube C_ε(0) for some ε > 0, i.e., X − a is locally equal to a union of orthants. The function f is said to have left/right (l/r) partial derivatives of order 1 on X if:

1. it has partial derivatives at each interior point of X;
2. it has partial derivatives at each boundary point of X with respect to coordinates that can be perturbed to the left and right;
3. it has left (right) partial derivatives at each boundary point of X with respect to coordinates that can be perturbed only to the left (right).

The shape of X is such that for all x ∈ X and for all i ∈ {1, ..., p}, it is possible to perturb x_i to the left, the right, or both and stay within X. This means that it is always possible to define the left, right, or two-sided partial derivative of f with respect to x_i. We say that f has l/r partial derivatives of order k on X for k ≥ 2 if f has l/r partial derivatives of order k − 1 on X and each of the latter has l/r partial derivatives on X. When we say that f has continuous l/r partial derivatives, continuity is defined in terms of local perturbations within X only.

Proof of Theorem 3.2.
This theorem is a special case of Andrews (1999, Theorem 3b). A closely related work is Andrews (2002), where the focus is on generalized method of moments estimators; that setting is closer to ours, but it uses a convergence rate of √n, whereas Andrews (1999) allows for more generality. The quantities ℓ_T, B_T and R_T in Andrews (1999) correspond to −(k/2) f_{n,k}, √k I_p and −(k/2) R_{n,k}, respectively, in this paper. Theorem 3b in Andrews (1999) holds under Assumptions 1, 2∗, 3∗, 5∗ and 6 of that paper. We show that these are implied by ours:

Assumption 1: The assumptions made in Theorem 3.1 imply Assumption 1.

Assumption 2∗: Assumption GMM2 in Andrews (2002) implies Assumption 2∗; this is proven in Andrews (2002, Lemma 3). We show that our assumptions imply GMM2. Assumption GMM2(a) holds because D_{n,k}(θ) converges in probability to L(θ) − L(θ₀). Assumption GMM2(b) holds if Assumption GMM2∗(b) holds, which in turn is implied by the assumptions of Theorem 3.1 and (A1). Assumption GMM2(c) holds since D(θ₀) = 0. Assumption GMM2(d) holds since D_{n,k}(θ₀) − D(θ₀) − D_{n,k}(θ₀) = 0. Finally, Assumption GMM2(e) holds automatically since our weight matrix is not random.

Assumption 3∗: Assumption 3∗ holds because of (A3) and because J is non-random, symmetric and non-singular.

Assumption 5∗: Assumption 5∗ holds because of (A2) and because B_T = √k I_p and √k → ∞ as n → ∞.

Assumption 6: Assumption (A2) implies Assumption 6.
Proof of Corollary 3.3.
This corollary is a special case of Andrews (1999, Corollary 1b), where no parameter ψ appears. Assumptions 1, 2∗, 3∗, 5∗ and 6–8 in that paper are needed: in the proof of Theorem 3.2, we have already shown that our assumptions imply 1, 2∗, 3∗, 5∗ and 6; Assumptions 7 and 8 hold by our assumption (A4).

Acknowledgements
The author would like to thank two reviewers and an associate editor for their careful readingof the paper and their constructive comments that greatly improved the generality of thepaper. She would also like to thank Johan Segers for helpful comments on an earlier versionof this paper.
REFERENCES
Andrews, D. W. K. (1999). Estimation when a parameter is on a boundary. Econometrica, 67(6):1341–1383.
Andrews, D. W. K. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69(3):683–734.
Andrews, D. W. K. (2002). Generalized method of moments estimation when a parameter is on a boundary. Journal of Business & Economic Statistics, 20(4):530–544.
Balkema, A. A. and De Haan, L. (1974). Residual life time at great age. The Annals of Probability, 2(5):792–804.
Beirlant, J., Escobar-Bach, M., Goegebeur, Y., and Guillou, A. (2016). Bias-corrected estimation of stable tail dependence function. Journal of Multivariate Analysis, 143(1):453–466.
Berghaus, B., Bücher, A., and Dette, H. (2013). Minimum distance estimators of the Pickands dependence function and related tests of multivariate extreme-value dependence. Journal de la Société Française de Statistique, 154(1):116–137.
Brigo, D., Mai, J., Scherer, M., and Sloot, H. (2018). Consistent iterated simulation of multivariate defaults: Markov indicators, lack of memory, extreme-value copulas, and the Marshall–Olkin distribution. In Innovations in Insurance, Risk- and Asset Management.
Burtschell, X., Gregory, J., and Laurent, J.-P. (2009). A comparative analysis of CDO pricing models. The Journal of Derivatives, 16(4):9–37.
Capéraà, P., Fougères, A.-L., and Genest, C. (1997). A nonparametric estimation procedure for bivariate extreme value copulas. Biometrika, 84(3):567–577.
Castruccio, S., Huser, R., and Genton, M. G. (2016). High-order composite likelihood inference for max-stable distributions and processes. Journal of Computational and Graphical Statistics, 25(4):1212–1229.
Chavez-Demoulin, V., Embrechts, P., and Hofert, M. (2016). An extreme value approach for modeling operational risk losses depending on covariates. Journal of Risk and Insurance, 83(3):735–776.
Cui, Q. and Zhang, Z. (2018). Max-linear competing factor models. Journal of Business & Economic Statistics, 36(1):62–74.
Davison, A. C., Padoan, S. A., and Ribatet, M. (2012). Statistical modeling of spatial extremes. Statistical Science, 27(2):161–186.
de Fondeville, R. and Davison, A. C. (2018). High-dimensional peaks-over-threshold inference. Biometrika, 105(3):575–592.
de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer-Verlag Inc.
Dombry, C., Engelke, S., and Oesting, M. (2017). Bayesian inference for multivariate extreme value distributions. Electronic Journal of Statistics, 11(2):4813–4844.
Draisma, G., Drees, H., Ferreira, A., and De Haan, L. (2004). Bivariate tail estimation: dependence in asymptotic independence. Bernoulli, 10(2):251–280.
Drees, H. and Huang, X. (1998). Best attainable rates of convergence for estimators of the stable tail dependence function. Journal of Multivariate Analysis, 64(1):25–47.
Einmahl, J. H., Kiriliouk, A., Krajina, A., and Segers, J. (2016). An M-estimator of spatial tail dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(1):275–298.
Einmahl, J. H., Kiriliouk, A., and Segers, J. (2018). A continuous updating weighted least squares estimator of tail dependence in high dimensions. Extremes, 21:205–233.
Einmahl, J. H. and Magnus, J. R. (2008). Records in athletics through extreme-value theory. Journal of the American Statistical Association, 103(484):1382–1391.
Einmahl, J. H. J., Krajina, A., and Segers, J. (2012). An M-estimator for tail dependence in arbitrary dimensions. The Annals of Statistics, 40(3):1764–1793.
Embrechts, P., Lindskog, F., and McNeil, A. (2003). Modelling dependence with copulas and applications to risk management. In Rachev, S., editor, Handbook of Heavy Tailed Distributions in Finance, chapter 8, pages 329–384. Elsevier.
Falk, M., Hüsler, J., and Reiss, R.-D. (2010). Laws of Small Numbers: Extremes and Rare Events. Springer Science & Business Media.
Fisher, R. A. and Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 24, pages 180–190. Cambridge University Press.
Fougères, A.-L., de Haan, L., and Mercadier, C. (2015). Bias correction in multivariate extremes. The Annals of Statistics, 43(2):903–934.
Fougères, A.-L., Nolan, J. P., and Rootzén, H. (2009). Models for dependent extremes using stable mixtures. Scandinavian Journal of Statistics, 36(1):42–59.
Gissibl, N., Klüppelberg, C., and Otto, M. (2018). Tail dependence of recursive max-linear models with regularly varying noise variables. Econometrics and Statistics, 6:149–167.
Gissibl, N. and Klüppelberg, C. (2018). Max-linear models on directed acyclic graphs. Bernoulli, 24(4A):2693–2720.
Gnedenko, B. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Annals of Mathematics, 44(3):423–453.
Gudendorf, G. and Segers, J. (2012). Nonparametric estimation of multivariate extreme-value copulas. Journal of Statistical Planning and Inference, 142(12):3073–3085.
Guillou, A., Padoan, S. A., and Rizzelli, S. (2018). Inference for asymptotically independent samples of extremes. Journal of Multivariate Analysis, 167:114–135.
Gumbel, E. J. (1960). Bivariate exponential distributions. Journal of the American Statistical Association, 55(292):698–707.
Huang, X. (1992). Statistics of Bivariate Extreme Values. PhD thesis, Tinbergen Institute Research Series.
Huser, R. and Davison, A. (2013). Composite likelihood estimation for the Brown–Resnick process. Biometrika, 100(2):511–518.
Hüsler, J. and Li, D. (2009). Testing asymptotic independence in bivariate extremes. Journal of Statistical Planning and Inference, 139(3):990–998.
Kabluchko, Z., Schlather, M., and de Haan, L. (2009). Stationary max-stable fields associated to negative definite functions. Annals of Probability, 37(5):2042–2065.
Kiriliouk, A., Rootzén, H., Segers, J., and Wadsworth, J. L. (2018a). Peaks over thresholds modelling with multivariate generalized Pareto distributions. To be published in Technometrics.
Kiriliouk, A., Segers, J., and Tafakori, L. (2018b). An estimator of the stable tail dependence function based on the empirical beta copula. Extremes, 21(4):581–600.
Ledford, A. W. and Tawn, J. A. (1996). Statistics for near independence in multivariate extreme values. Biometrika, 83(1):169–187.
Marcon, G., Padoan, S., Naveau, P., Muliere, P., and Segers, J. (2017). Multivariate nonparametric estimation of the Pickands dependence function using Bernstein polynomials. Journal of Statistical Planning and Inference, 183:1–17.
Padoan, S., Ribatet, M., and Sisson, S. (2010). Likelihood-based inference for max-stable processes. Journal of the American Statistical Association (Theory and Methods), 105(489):263–277.
Pickands III, J. (1975). Statistical inference using extreme order statistics. The Annals of Statistics, 3(1):119–131.
Rootzén, H. and Tajvidi, N. (2006). Multivariate generalized Pareto distributions. Bernoulli, 12(5):917–930.
Rootzén, H. and Zholud, D. (2017). Human life is unlimited – but short. Extremes, 20(4):713–728.
Segers, J. (2012). Max-stable models for multivariate extremes. REVSTAT — Statistical Journal, 10(1):61–92.
Segers, J., Sibuya, M., and Tsukahara, H. (2017). The empirical beta copula. Journal of Multivariate Analysis, 155:35–51.
Smith, R. L. (1989). Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone. Statistical Science, pages 367–377.
Smith, R. L. (1990). Max-stable processes and spatial extremes. Unpublished manuscript.
Su, J. and Furman, E. (2017). Multiple risk factor dependence structures: Copulas and related properties. Insurance: Mathematics and Economics, 74:109–121.
Tawn, J. A. (1990). Modelling multivariate extreme value distributions. Biometrika, 77(2):245–253.
Thomas, M., Lemaitre, M., Wilson, M. L., Viboud, C., Yordanov, Y., Wackernagel, H., and Carrat, F. (2016). Applications of extreme value theory in public health. PLoS ONE, 11(7):e0159312.
Towler, E., Rajagopalan, B., Gilleland, E., Summers, R. S., Yates, D., and Katz, R. W. (2010). Modeling hydrologic and water quality extremes in a changing climate: A statistical approach based on extreme value theory. Water Resources Research, 46(11).
Vettori, S., Huser, R., and Genton, M. G. (2016). A comparison of non-parametric and parametric estimators of the dependence function in multivariate extremes. Submitted.
Wadsworth, J., Tawn, J., Davison, A., and Elton, D. (2016). Modelling across extremal dependence classes. To be published in the Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Wadsworth, J. L. and Tawn, J. A. (2014). Efficient inference for spatial extreme-value processes associated to log-Gaussian random functions. Biometrika, 101(1):1–15.
Zhang, D., Wells, M. T., and Peng, L. (2008). Nonparametric estimation of the dependence function for a multivariate extreme value distribution. Journal of Multivariate Analysis, 99(4):577–588.
Zhao, Z. and Zhang, Z. (2018). Semiparametric dynamic max-copula model for multivariate time series. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):409–432.