A simulation study of semiparametric estimation in copula models based on minimum Alpha-Divergence
arXiv [stat.ME]
Morteza Mohammadi ∗, Mohammad Amini †, and Mahdi Emadi ‡
∗, †, ‡ Department of Statistics, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, P.O. Box 1159, Mashhad 91775, Iran
September 14, 2020
Abstract
The purpose of this paper is to introduce two semiparametric methods for the estimation of the copula parameter. These methods are based on the minimum Alpha-Divergence between a non-parametric estimate of the copula density, obtained with the local likelihood probit transformation method, and the true copula density function. A Monte Carlo study is performed to measure the performance of these methods based on the Hellinger distance and the Neyman divergence as special cases of the Alpha-Divergence. The simulation results are compared with Maximum Pseudo-Likelihood (MPL) estimation, a conventional estimation method, in well-known bivariate copula models. These results show that the proposed method based on Minimum Pseudo Hellinger Distance estimation performs well in small-sample and weak-dependency situations. The parameter estimation methods are applied to a real data set in Hydrology.
Key words and Phrases:
Alpha-Divergence; Copula Density; Hellinger Distance; Semi-parametric Estimation.
Copulas describe the dependence between the components of a random vector. Unlike marginal and joint distributions, which are directly observable, the copula of a random vector is a hidden dependence structure that connects the joint distribution with its margins. The copula parameter captures the inherent dependence between the marginal variables, and it can be estimated by either parametric or semiparametric methods. Maximum likelihood estimation (MLE), which can be used to estimate the parameters of any type of model, is the most efficient method when the model is correctly specified. It can also be applied to copulas, but the problem becomes complicated as the number of parameters and the dimension of the copula increase, because the parameters of the margins and of the copula are estimated simultaneously. Therefore, MLE is highly affected by misspecification of the marginal distributions.

A rather straightforward alternative, at the cost of some efficiency, is the inference functions for margins (IFM) method put forward by Joe (2005). As with MLE, the margins of the copula are important in this method, because the parameter estimation is dependent on the [...]

† Corresponding author.

[...] $L^2$-variants of the CvM-statistic based on the empirical copula process, Kendall's dependence function and Rosenblatt's probability integral transform.

Tsukahara (2005) proposed the Hellinger distance based on the copula density to improve the performance of the minimum distance (MD) estimator, but did not pursue it, because it required the estimation of the copula density function. The Hellinger distance is a special case of the Alpha-Divergence. The authors present semiparametric methods based on minimum Alpha-Divergence estimation between a non-parametric estimate of the copula density and the true copula density, which they call "Minimum Pseudo Alpha-Divergence" (MPAD) estimation.
In this method, the copula density is estimated using the local likelihood probit transformation (LLPT) method that was recently suggested by Geenens et al. (2017). The purpose of this paper is to present a comprehensive simulation study on the performance of the MPL estimator and special cases of the MPAD estimator for bivariate parametric copulas.

In what follows, the discussion is restricted to bivariate observations for simplicity. The rest of the paper is arranged as follows. In Section 2, the preliminaries for copulas and the MPL method are described. The estimation of the copula density function using the local likelihood probit transformation method is provided in Section 3. In Section 4, the copula parameter estimation based on minimum Alpha-Divergence is introduced. The simulation results comparing the MPL and MPHD methods are provided in Section 5. In Section 6, the performance of the considered methods for real data in Hydrology is presented. Concluding remarks are given in Section 7.

Some definitions related to a copula function will be briefly reviewed. Sklar (1959) was the first to present the fundamental concept of the copula. Let $(X, Y)$ be a continuous random vector with joint cumulative distribution function (cdf) $F$; then the copula $C$ corresponding to $F$ is defined by

$$F(x, y) = C(F_X(x), F_Y(y)), \quad (x, y) \in \mathbb{R}^2, \tag{1}$$

where $F_X$ and $F_Y$ are the marginal distributions of $X$ and $Y$, respectively. A bivariate copula function $C$ is the cumulative distribution function of a random vector $(U, V)$, defined on the unit square $[0, 1]^2$, with uniform marginal distributions, where $U = F_X(X)$ and $V = F_Y(Y)$.

The authors shall write $C(u, v; \theta)$ for a family of copulas indexed by the parameter $\theta$. If $C(u, v; \theta)$ is an absolutely continuous copula distribution on $[0, 1]^2$, then its density function is $c(u, v; \theta) = \frac{\partial^2 C(u, v; \theta)}{\partial u \, \partial v}$. As a result, the relationship between the copula density function ($c$) and the joint density function ($f$) of $(X, Y)$ according to equation (1) can be represented as

$$f(x, y) = c(F_X(x), F_Y(y); \theta) \, f_X(x) \, f_Y(y), \quad (x, y) \in \mathbb{R}^2, \tag{2}$$

where $f_X$ and $f_Y$ are the marginal density functions of $X$ and $Y$, respectively.

Table 1 presents summary information on some well-known bivariate copulas, namely their parameter space and Kendall's tau ($\tau$). In this table, the Clayton, Gumbel, and Frank copulas belong to the class of Archimedean copulas, and the Gaussian and T copulas belong to the class of Elliptical copulas. The copula-based Kendall's tau association for continuous variables $X$ and $Y$ with copula $C$ is given by $\tau = 4 \int_{[0,1]^2} C(u, v) \, dC(u, v) - 1$.

Table 1: Some well-known bivariate copulas
Copula      $C(u, v; \theta)$      Parameter space      Kendall's tau
Clayton     $(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$      $\theta \in (-1, +\infty) \setminus \{0\}$      $\frac{\theta}{\theta + 2}$
Gumbel      $\exp\{-[(-\ln u)^{\theta} + (-\ln v)^{\theta}]^{1/\theta}\}$      $\theta \in [1, +\infty)$      $\frac{\theta - 1}{\theta}$
Frank       $-\frac{1}{\theta} \log\{1 + \frac{(e^{-u\theta} - 1)(e^{-v\theta} - 1)}{e^{-\theta} - 1}\}$      $\theta \in (-\infty, +\infty) \setminus \{0\}$      $1 + \frac{4}{\theta}(D_1(\theta) - 1)$
Gaussian    $\Phi_2(\Phi^{-1}(u), \Phi^{-1}(v); \theta)$      $\theta \in [-1, +1]$      $\frac{2}{\pi} \arcsin(\theta)$
T           $t_{2,\nu}(t_{\nu}^{-1}(u), t_{\nu}^{-1}(v); \theta)$      $\theta \in [-1, +1]$, $\nu > 0$      $\frac{2}{\pi} \arcsin(\theta)$
Notes: $D_k(\theta) = \frac{k}{\theta^k} \int_0^{\theta} \frac{t^k}{e^t - 1} \, dt$. $\Phi^{-1}$ is the inverse of the standardized univariate Gaussian distribution and $\Phi_2$ is the standardized bivariate Gaussian distribution with correlation parameter $\theta$. $t_{\nu}^{-1}$ is the inverse of the standardized univariate Student's t distribution with $\nu$ degrees of freedom and $t_{2,\nu}$ is the standardized bivariate Student's t distribution with correlation coefficient $\theta$ and $\nu$ degrees of freedom.

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample of size $n$ from a pair $(X, Y)$. The empirical copula, initially introduced by Deheuvels (1979), is defined as

$$C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} I\{\tilde{U}_i \le u, \tilde{V}_i \le v\}, \tag{3}$$

where $\tilde{U}_i = n \hat{F}_X(x_i)/(n + 1)$ and $\tilde{V}_i = n \hat{F}_Y(y_i)/(n + 1)$, for $i = 1, \ldots, n$, are the pseudo-observations, and $\hat{F}_X$ and $\hat{F}_Y$ are the empirical cumulative distribution functions of the observations $X_i$ and $Y_i$, respectively.

2.1 Semiparametric maximum likelihood estimation

In view of (2), the log-likelihood function takes the form

$$L(\theta) = \sum_{i=1}^{n} \log c(F_X(x_i), F_Y(y_i); \theta) + \sum_{i=1}^{n} \log f_X(x_i) + \sum_{i=1}^{n} \log f_Y(y_i).$$

Hence the MLE of $\theta$, which we denote by $\hat{\theta}_{ML}$, is the global maximizer of $L(\theta)$, and $\sqrt{n}(\hat{\theta}_{ML} - \theta_0)$ converges to a Gaussian distribution with mean zero, where $\theta_0$ is the true value. Since we assume that the model is correctly specified, and hence that $L(\theta)$ is the correct log-likelihood, the MLE enjoys some optimality properties and is the preferred first option. If the model is not correctly specified, so that $L(\theta)$ is not the correct log-likelihood, then the maximizer of $L(\theta)$ is not the MLE and it may lose its preferred status.

In the MPL method, the marginal distributions have unknown functional forms and are estimated non-parametrically by their empirical distribution functions. Then $\theta$ is estimated by the maximizer of the pseudo log-likelihood,

$$\hat{\theta}_{MPL} = \arg\max_{\theta} \sum_{i=1}^{n} \log c(\tilde{U}_i, \tilde{V}_i; \theta), \tag{4}$$

where $(\tilde{U}_i, \tilde{V}_i)$, $i = 1, \ldots, n$, are the pseudo-observations. The authors shall refer to (4) as the maximum pseudo likelihood (MPL) estimator of $\theta$. Genest et al. (1995) and Tsukahara (2005) showed that $\hat{\theta}_{MPL}$ is a consistent estimator. This non-linear optimization problem can easily be solved in the statistical programming language R or in Mathematica.
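The MPL estimator (4) can be sketched in a few lines. The example below is a minimal illustration in Python rather than the R code used for the paper, and it is restricted to the Clayton copula with $\theta > 0$; the function names, the search bounds, and the conditional-inversion sampler are assumptions introduced for this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata


def pseudo_observations(x):
    """Rescaled ranks R_i / (n + 1), i.e. n * F_hat(x_i) / (n + 1)."""
    x = np.asarray(x, dtype=float)
    return rankdata(x) / (len(x) + 1.0)


def clayton_log_density(u, v, theta):
    """Log-density of the Clayton copula, valid for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta)
            - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


def mpl_clayton(x, y, bounds=(1e-3, 20.0)):
    """Maximise the pseudo log-likelihood (4) over theta within `bounds`."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    result = minimize_scalar(
        lambda theta: -np.sum(clayton_log_density(u, v, theta)),
        bounds=bounds, method="bounded")
    return result.x


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v
```

Because only ranks enter the estimator, any strictly increasing transformation of the margins leaves the estimate unchanged, which is the practical appeal of the MPL approach.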
The transformation method was introduced to kernel copula density estimation by Charpentier et al. (2007). The simple idea is to transform the data so that it is supported on the full $\mathbb{R}^2$ (instead of the unit square). On this transformed domain, standard kernel techniques can be used to estimate the density. An adequate back-transformation then yields an estimate of the copula density. The inverse of the standard Gaussian cdf is most commonly used for the transformation, since kernel estimators are known to do well for Gaussian random variables.

Let $(U_i, V_i)$, $i = 1, \ldots, n$, be independent and identically distributed observations from the bivariate copula $C$; the purpose is to estimate the corresponding copula density function. Denote by $\Phi$ the standard Gaussian distribution function and by $\phi$ its first-order derivative. Then $(S_i, T_i) = (\Phi^{-1}(U_i), \Phi^{-1}(V_i))$ is a random vector with Gaussian margins and copula $C$. According to (2), the corresponding density function can be written as $f(s, t) = c(\Phi(s), \Phi(t)) \, \phi(s) \, \phi(t)$. Thus, an estimate of the copula density function is given by

$$\hat{c}_n^{(PT)}(u, v) = \frac{\hat{f}_n(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u)) \, \phi(\Phi^{-1}(v))}, \quad (u, v) \in (0, 1)^2. \tag{5}$$

However, the $(U_i, V_i)$ are unavailable, and one has to use the pseudo-transformed sample $(\hat{S}_i, \hat{T}_i) = (\Phi^{-1}(\hat{U}_i), \Phi^{-1}(\hat{V}_i))$ instead, where $(\hat{U}_i, \hat{V}_i)$ denote the pseudo-observations. As a first natural idea, the standard kernel density estimator for $\hat{f}_n$ in (5) can be considered:

$$\hat{f}_n(s, t) = \frac{1}{n |H_{ST}|} \sum_{i=1}^{n} K\left(H_{ST}^{-1} \begin{pmatrix} s - \hat{S}_i \\ t - \hat{T}_i \end{pmatrix}\right),$$

where $K : \mathbb{R}^2 \to \mathbb{R}$ is a kernel function and $H_{ST} = \begin{bmatrix} b_n & 0 \\ 0 & b_n \end{bmatrix}$ is a bandwidth matrix.

This kernel estimator has asymptotic problems at the edges of the distribution support. To remedy this problem, the local likelihood probit transformation (LLPT) method was recently suggested by Geenens et al. (2017).
Instead of applying the standard kernel estimator, they locally fit a polynomial to the log-density of the transformed sample. The advantages of estimating $f(s, t)$ by local likelihood methods instead of raw kernel density estimation are discussed in detail in Geenens (2014). This method fixes the boundary issues in a natural way and is able to cope with unbounded copula densities. The notation is similar to that used in Geenens et al. (2017). Recently, Nagler (2018) showed in a comprehensive simulation study that the LLPT method for copula density estimation yields very good results.

Around $(s, t) \in \mathbb{R}^2$ and for $(s', t')$ close to $(s, t)$, the local log-quadratic likelihood approximation of $\log f(s, t)$ from the pseudo-transformed sample is defined as

$$\log f(s', t') = a_{2,0}(s, t) + a_{2,1}(s, t)(s' - s) + a_{2,2}(s, t)(t' - t) + a_{2,3}(s, t)(s' - s)^2 + a_{2,4}(s, t)(t' - t)^2 + a_{2,5}(s, t)(s' - s)(t' - t) \equiv P_{a_2}(s' - s, t' - t).$$

The vector $a_2(s, t) \equiv (a_{2,0}(s, t), \ldots, a_{2,5}(s, t))$ is then estimated by solving the weighted maximum likelihood problem

$$\hat{a}_2(s, t) = \arg\max_{a_2} \left\{ \sum_{i=1}^{n} K\left(H_{ST}^{-1} \begin{pmatrix} s - \hat{S}_i \\ t - \hat{T}_i \end{pmatrix}\right) P_{a_2}(\hat{S}_i - s, \hat{T}_i - t) - n \int_{\mathbb{R}^2} K\left(H_{ST}^{-1} \begin{pmatrix} s - s' \\ t - t' \end{pmatrix}\right) \exp\left(P_{a_2}(s' - s, t' - t)\right) ds' \, dt' \right\}.$$

Therefore, the estimate of $f(s, t)$ is $\tilde{f}_p(s, t) = \exp\{\hat{a}_{2,0}(s, t)\}$ (here with $p = 2$), and thus the LLPT estimator of a copula density is

$$\hat{c}_n^{(LLPT)}(u, v) = \frac{\tilde{f}_p(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u)) \, \phi(\Phi^{-1}(v))}, \quad (u, v) \in [0, 1]^2. \tag{6}$$

When the underlying density is supported on $[0, 1]^2$, the performance of the kernel estimator depends on the choice of the kernel function and the bandwidth (smoothing parameter). For bandwidth choice, a practical approach is to minimize the asymptotic mean integrated squared error (AMISE) on the level of the transformed data. In this article, the bandwidth is chosen by the nearest-neighbor method (see Geenens et al. (2017), Section 4).
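To make the back-transformation concrete, the sketch below implements the plain probit-transformation estimator (5), not the local-likelihood refinement (6). It is a minimal Python illustration under stated assumptions: the function name is hypothetical, and the bandwidth is SciPy's default (Scott's rule applied to the sample covariance) rather than the diagonal matrix $H_{ST}$ or the nearest-neighbor choice described above.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm, rankdata


def pt_copula_density(x, y):
    """Return a function (u, v) -> c_hat(u, v) built from eq. (5)."""
    n = len(x)
    u_hat = rankdata(x) / (n + 1.0)
    v_hat = rankdata(y) / (n + 1.0)
    # Probit-transformed pseudo-sample (S_hat_i, T_hat_i).
    transformed = np.vstack([norm.ppf(u_hat), norm.ppf(v_hat)])
    # Assumption: SciPy's default bandwidth, not the paper's choice.
    kde = gaussian_kde(transformed)

    def c_hat(u, v):
        s = norm.ppf(np.atleast_1d(u))
        t = norm.ppf(np.atleast_1d(v))
        # Back-transform: divide the kernel estimate by phi(s) * phi(t).
        return kde(np.vstack([s, t])) / (norm.pdf(s) * norm.pdf(t))

    return c_hat
```

For independent margins the returned estimate should be close to 1 in the interior of the unit square; near the corners the division by $\phi(s)\phi(t)$ amplifies estimation error, which is the boundary problem the LLPT method is designed to address.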
Initially, Chernoff (1952) proposed the Alpha-Divergence, which is a generalization of the Kullback-Leibler (KL) divergence. For some Alpha-Divergence investigations see, for example, Amari and Nagaoka (2000), Cichocki and Amari (2010), and Read and Cressie (2012). The Alpha-Divergence measure can be derived from the Csiszár f-divergence with

$$f(t) = \frac{t^{\alpha} - \alpha(t - 1) - 1}{\alpha(\alpha - 1)}, \quad t \ge 0, \ \alpha \ne 0, 1.$$

The Alpha-Divergence (AD) between two probability density functions $f_1$ and $f_2$ of a continuous random variable can be defined as

$$AD_{\alpha}(f_1 \| f_2) = \frac{1}{\alpha(\alpha - 1)} \left( \int_{\mathbb{R}} f_1^{\alpha}(x) \, f_2^{1 - \alpha}(x) \, dx - 1 \right), \quad \alpha \in \mathbb{R} \setminus \{0, 1\}. \tag{7}$$

The AD is non-negative, and it equals zero if and only if $f_1(x) = f_2(x)$. If $\alpha \to 1$, the Kullback-Leibler divergence (KLD) is obtained from equation (7). The KL divergence between two densities $f_1$ and $f_2$, introduced by Kullback and Leibler (1951), is given by

$$KL(f_1 \| f_2) = \int_{\mathbb{R}} \log f_1(x) \, dF_1(x) - \int_{\mathbb{R}} \log f_2(x) \, dF_1(x),$$

where $F_1(x) = \int_{-\infty}^{x} f_1(t) \, dt$. Two other special cases of the Alpha-Divergence, the Hellinger distance and the Neyman divergence, will be used in practice. The well-known Hellinger distance (HD) and the Neyman (Neyman Chi-square) divergence (ND) are obtained from equation (7) for $\alpha = 1/2$ and $\alpha = 2$, respectively, as

$$HD(f_1 \| f_2) = \frac{1}{4} AD_{1/2}(f_1 \| f_2) = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{f_1(x)} - \sqrt{f_2(x)} \right)^2 dx,$$

$$ND(f_1 \| f_2) = AD_{2}(f_1 \| f_2) = \frac{1}{2} \int_{\mathbb{R}} \frac{(f_1(x) - f_2(x))^2}{f_2(x)} \, dx.$$

It is well known that maximizing the likelihood is equivalent to minimizing the KL divergence. Let $c(u, v; \theta)$ be the true copula density function associated with the copula $C$. The MPL estimator is equivalent to the minimum pseudo KL divergence (MPKLD) estimator between the copula density estimate $\hat{c}(u, v)$ and the true copula density $c(u, v; \theta)$, given by

$$\begin{aligned}
\hat{\theta}_{MPKLD} &= \arg\min_{\theta} KL(\hat{c} \| c) \\
&= \arg\min_{\theta} \left\{ \int_{[0,1]^2} \log \hat{c}(u, v) \, dC_n(u, v) - \int_{[0,1]^2} \log c(u, v; \theta) \, dC_n(u, v) \right\} \\
&= \arg\max_{\theta} \int_{[0,1]^2} \log c(u, v; \theta) \, dC_n(u, v) \\
&= \arg\max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log c(\tilde{U}_i, \tilde{V}_i; \theta) \equiv \hat{\theta}_{MPL}.
\end{aligned} \tag{8}$$

The factor $1/n$ in equation (8) does not affect the arg max with respect to $\theta$, so the two approaches MPL and MPKLD give the same result.
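The special cases of (7) can be checked numerically. The following Python sketch evaluates the Alpha-Divergence by quadrature and compares it with the Hellinger, Neyman, and KL expressions above; the two Gaussian densities and the integration range are arbitrary choices made for this illustration, not quantities taken from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def _integrate(g, lo=-12.0, hi=12.0):
    """Integrate g over [lo, hi]; the range is wide enough for the densities below."""
    value, _ = quad(g, lo, hi, limit=200)
    return value


def alpha_divergence(f1, f2, alpha):
    """AD_alpha(f1 || f2) as in eq. (7)."""
    integral = _integrate(lambda x: f1(x) ** alpha * f2(x) ** (1.0 - alpha))
    return (integral - 1.0) / (alpha * (alpha - 1.0))


def hellinger_distance(f1, f2):
    return 0.5 * _integrate(lambda x: (np.sqrt(f1(x)) - np.sqrt(f2(x))) ** 2)


def neyman_divergence(f1, f2):
    return 0.5 * _integrate(lambda x: (f1(x) - f2(x)) ** 2 / f2(x))


def kl_divergence(f1, f2):
    return _integrate(lambda x: f1(x) * (np.log(f1(x)) - np.log(f2(x))))


# Illustrative densities (an arbitrary choice): N(0, 1) and N(0.5, 1.2^2).
f1 = norm(0.0, 1.0).pdf
f2 = norm(0.5, 1.2).pdf

print(alpha_divergence(f1, f2, 0.5) / 4.0, hellinger_distance(f1, f2))
print(alpha_divergence(f1, f2, 2.0), neyman_divergence(f1, f2))
print(alpha_divergence(f1, f2, 0.999), kl_divergence(f1, f2))
```

Each printed pair should agree: exactly (up to quadrature error) for $\alpha = 1/2$ and $\alpha = 2$, and approximately for $\alpha$ close to 1.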
The MPAD estimator is obtained by minimizing the Alpha-Divergence between the copula density estimate $\hat{c}(u, v)$ and the true copula density $c(u, v; \theta)$, and is defined as $\hat{\theta}_{MPAD} = \arg\min_{\theta} AD_{\alpha}(\hat{c} \| c)$.

The minimum pseudo Hellinger distance (MPHD) estimator is given by

$$\begin{aligned}
\hat{\theta}_{MPHD} = \arg\min_{\theta} HD(\hat{c} \| c) &= \arg\min_{\theta} \int_{[0,1]^2} \hat{c}(u, v) \left( 1 - \sqrt{\frac{c(u, v; \theta)}{\hat{c}(u, v)}} \right)^2 du \, dv \\
&= \arg\min_{\theta} \int_{[0,1]^2} \left( 1 - \sqrt{\frac{c(u, v; \theta)}{\hat{c}(u, v)}} \right)^2 dC_n(u, v) \\
&= \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \sqrt{\frac{c(\tilde{U}_i, \tilde{V}_i; \theta)}{\hat{c}(\tilde{U}_i, \tilde{V}_i)}} \right)^2.
\end{aligned} \tag{9}$$

Similarly, the minimum pseudo Neyman divergence (MPND) estimator is defined as

$$\begin{aligned}
\hat{\theta}_{MPND} = \arg\min_{\theta} ND(\hat{c} \| c) &= \arg\min_{\theta} \int_{[0,1]^2} \hat{c}(u, v) \left( 1 - \frac{c(u, v; \theta)}{\hat{c}(u, v)} \right)^2 du \, dv \\
&= \arg\min_{\theta} \int_{[0,1]^2} \left( 1 - \frac{c(u, v; \theta)}{\hat{c}(u, v)} \right)^2 dC_n(u, v) \\
&= \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \frac{c(\tilde{U}_i, \tilde{V}_i; \theta)}{\hat{c}(\tilde{U}_i, \tilde{V}_i)} \right)^2.
\end{aligned} \tag{10}$$

In practice, instead of $\hat{c}$ in equations (9) and (10), the local likelihood probit transformation estimate of the copula density, $\hat{c}_n^{(LLPT)}$, obtained from equation (6), is used. Tsukahara (2005) explores the asymptotic properties of minimum distance estimators based on copulas, closely following Beran (1984) in investigating these properties.

A simulation study was performed to compare the MPL estimator with the MPHD and MPND estimators as special cases of the minimum Alpha-Divergence estimator described in Section 4. All computations were performed using the copula and kdecopula packages in R. The aim of this simulation study is to compare the true parameter $\theta$ with the parameter estimate $\hat{\theta}$, under the assumption that the copula's parametric form is correctly selected. This aim is accomplished by comparing the Bias, mean square error (MSE), and relative efficiency (rMSE) of the three approaches to copula parameter estimation, given by

$$Bias(\hat{\theta}) \equiv E(\hat{\theta}) - \theta,$$

$$MSE(\hat{\theta}) \equiv E(\hat{\theta} - \theta)^2,$$

$$rMSE(\hat{\theta}_1, \hat{\theta}_2) \equiv \sqrt{MSE(\hat{\theta}_2) / MSE(\hat{\theta}_1)}.$$
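The sample versions of (9) and (10) translate directly into objective functions. The Python sketch below is a minimal illustration, not the authors' R implementation: it takes the values of a non-parametric density estimate at the pseudo-observations as an input array (`c_hat_values`, a hypothetical name), uses the Clayton density as the parametric family, and adds a coarse grid search before the local refinement, which is a design choice of this sketch to guard against local minima.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def clayton_density(u, v, theta):
    """Clayton copula density, valid for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * s ** (-2.0 - 1.0 / theta)


def mphd_objective(theta, u, v, c_hat_values, density):
    """Sample version of eq. (9)."""
    ratio = density(u, v, theta) / c_hat_values
    return np.mean((1.0 - np.sqrt(ratio)) ** 2)


def mpnd_objective(theta, u, v, c_hat_values, density):
    """Sample version of eq. (10)."""
    ratio = density(u, v, theta) / c_hat_values
    return np.mean((1.0 - ratio) ** 2)


def minimum_pseudo_divergence(objective, u, v, c_hat_values, density, grid):
    """Coarse grid search over `grid`, then a bounded local refinement."""
    args = (u, v, c_hat_values, density)
    values = [objective(theta, *args) for theta in grid]
    k = int(np.argmin(values))
    lo = grid[max(k - 1, 0)]
    hi = grid[min(k + 1, len(grid) - 1)]
    result = minimize_scalar(objective, bounds=(lo, hi), args=args, method="bounded")
    return result.x
```

In a real application `c_hat_values` would come from a copula density estimator such as the LLPT estimate (6) evaluated at the pseudo-observations. A simple sanity check is to pass the parametric density at a known $\theta$ in place of the estimate, in which case both objectives are zero at that $\theta$ and the minimizer recovers it.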
The data are generated from three Archimedean copulas (Clayton, Gumbel, and Frank) and two Elliptical copulas (Gaussian and T, the latter with $\nu = 2$ and $\nu = 10$), all presented in Table 1, with Kendall's tau equal to 0.1, 0.2, 0.4, 0.6, and 0.8. These copulas cover different dependence structures. The Gaussian and Frank copulas exhibit symmetric and weak tail dependence in both lower and upper tails. The Clayton copula exhibits strong left tail dependence and the Gumbel copula has strong right tail dependence. In the T copula with positive dependency and small degrees of freedom ($\nu < 10$), tail dependency occurs in both lower and upper tails, and as the degrees of freedom increase, dependency in the tail areas decreases (see Demarta and McNeil (2005)). Moreover, 1000 Monte Carlo samples of sizes $n = 30$, 75, and 150 are generated from each type of copula, and the three estimates are computed: MPL, MPHD, and MPND.

Results of the simulation study are presented in Tables 2-7. These tables present the Bias, MSE, and rMSE of the three estimators for the respective copulas, for different sample sizes and values of Kendall's tau. The simulation procedure was performed for positive and negative values of Kendall's tau; because the results obtained were symmetric, only those for positive values of Kendall's tau are reported. As the results for sample sizes greater than 150 were in line with our expectation that an increase in sample size improves the parameter estimation, the corresponding results were omitted from the tables for brevity. Also, the results show that the MPL method outperforms MPHD and MPND for large sample sizes and strong dependency ($\tau \ge 0.5$), whereas for small sample sizes and weak dependency ($\tau < 0.5$), minimum Hellinger distance estimation outperforms the MPL estimation method. Among the two new minimum distance estimators, the results show that $\hat{\theta}_{MPHD}$ is always better than $\hat{\theta}_{MPND}$ in terms of MSE. This advantage of $\hat{\theta}_{MPHD}$ is clearer in Archimedean copulas than in Elliptical copulas. Thus, there is no evident reason why one would be inclined to use $\hat{\theta}_{MPND}$. In addition to these results, the estimated bias seems to be considerably higher for Archimedean copulas than for Elliptical copulas. In all tables, the biases of the MPL estimators are almost always lower in absolute value than the biases of the MPHD and MPND estimators for the large sample size.

Table 2: Estimated Bias of the estimators for Archimedean copulas
Copula       τ      n = 30                       n = 75                       n = 150
                    MPL      MPHD     MPND       MPL      MPHD     MPND       MPL      MPHD     MPND
Clayton      0.1    0.0140  -0.0037  -0.0124     0.0095  -0.0022  -0.0088     0.0011  -0.0013  -0.0014
             0.2    0.0288  -0.0180  -0.0973     0.0216  -0.0146  -0.0714     0.0107  -0.0129  -0.0582
             0.4    0.0624  -0.0516  -0.1825     0.0334  -0.0376  -0.1306     0.0181  -0.0228  -0.1133
             0.6    0.0807  -0.2256  -0.4554     0.0432  -0.1633  -0.3761     0.0347  -0.1119  -0.2790
             0.8    0.1069  -0.4127  -0.8107     0.0844  -0.3835  -0.6848     0.0439  -0.2381  -0.5727
Gumbel       0.1    0.0362   0.0157  -0.0359     0.0106  -0.0091   0.0217     0.0017  -0.0062  -0.0106
             0.2    0.0373  -0.0219  -0.0329     0.0119  -0.0113  -0.0248     0.0021  -0.0076  -0.0213
             0.4    0.0460  -0.0414  -0.0622     0.0124  -0.0328  -0.0575     0.0028  -0.0106  -0.0432
             0.6    0.0730  -0.2323  -0.2425     0.0157  -0.1512  -0.1797     0.0045  -0.1357  -0.1427
             0.8    0.1188  -0.5503  -0.5853     0.0319  -0.5195  -0.5455     0.0113  -0.3847  -0.4163
Frank        0.1    0.0924  -0.0331  -0.0502     0.0744  -0.0229  -0.0371     0.0501  -0.0163  -0.0198
             0.2    0.1222  -0.1032  -0.1172     0.0911  -0.0905  -0.0947     0.0685  -0.0737  -0.0850
             0.4    0.1436  -0.1247  -0.1595     0.1271  -0.1060  -0.1361     0.0894  -0.0918  -0.1169
             0.6    0.1588  -0.2594  -0.2994     0.1474  -0.2376  -0.2635     0.1208  -0.2004  -0.2127
             0.8    0.1822  -0.3829  -0.4165     0.1658  -0.2992  -0.3487     0.1401  -0.2654  -0.3183

Table 3: Estimated Bias of the estimators for Elliptical copulas
Copula       τ      n = 30                       n = 75                       n = 150
                    MPL      MPHD     MPND       MPL      MPHD     MPND       MPL      MPHD     MPND
Gaussian     0.1   -0.0171  -0.0093   0.0109     0.0129  -0.0063   0.0072    -0.0069  -0.0011  -0.0023
             0.2   -0.0188  -0.0146  -0.0227    -0.0136  -0.0123  -0.0165    -0.0081  -0.0095  -0.0126
             0.4   -0.0215  -0.0192  -0.0432    -0.0183  -0.0140  -0.0375    -0.0023  -0.0116  -0.0296
             0.6   -0.0164  -0.0326  -0.0366    -0.0065  -0.0302  -0.0338    -0.0010  -0.0227  -0.0297
             0.8   -0.0022  -0.0111  -0.0529    -0.0002  -0.0073  -0.0415    -0.0002  -0.0051  -0.0337
T (ν = 2)    0.1    0.0284   0.0128   0.0159     0.0110  -0.0084   0.0127    -0.0039  -0.0026   0.0115
             0.2   -0.0230  -0.0214  -0.0541    -0.0138  -0.0170  -0.0437    -0.0101  -0.0124  -0.0329
             0.4   -0.0158  -0.0483  -0.0901    -0.0147  -0.0223  -0.0813    -0.0129  -0.0162  -0.0669
             0.6   -0.0148  -0.0516  -0.1126    -0.0118  -0.0463  -0.0911    -0.0088  -0.0326  -0.0761
             0.8   -0.0031  -0.0488  -0.0568    -0.0024  -0.0423  -0.0534    -0.0017  -0.0188  -0.0232
T (ν = 10)   0.1    0.0258   0.0015   0.0129     0.0146  -0.0011   0.0112     0.0038  -0.0009  -0.0076
             0.2    0.0065  -0.0042  -0.0268     0.0036  -0.0031  -0.0159     0.0005  -0.0024  -0.0125
             0.4    0.0030  -0.0384  -0.0389     0.0011  -0.0268  -0.0313     0.0003  -0.0124  -0.0236
             0.6   -0.0025  -0.0460  -0.0485     0.0009  -0.0314  -0.0375     0.0007  -0.0194  -0.0317
             0.8   -0.0011  -0.0163  -0.0427     0.0002  -0.0141  -0.0206     0.0001  -0.0095  -0.0143

Table 4: Estimated MSE of the estimators for Archimedean copulas
Copula       τ      n = 30                       n = 75                       n = 150
                    MPL      MPHD     MPND       MPL      MPHD     MPND       MPL      MPHD     MPND
Clayton      0.1    0.0791   0.0396   0.0742     0.0469   0.0256   0.0437     0.0161   0.0131   0.0181
             0.2    0.0944   0.0689   0.0956     0.0533   0.0428   0.0632     0.0232   0.0216   0.0298
             0.4    0.1092   0.0818   0.1206     0.0736   0.0622   0.1004     0.0341   0.0525   0.0737
             0.6    0.2121   0.2925   0.3135     0.1391   0.2312   0.2402     0.0834   0.1753   0.2002
             0.8    0.5243   0.8571   0.8686     0.4549   0.8129   0.8345     0.3227   0.7778   0.7902
Gumbel       0.1    0.0282   0.0164   0.0260     0.0110   0.0087   0.0103     0.0055   0.0048   0.0082
             0.2    0.0349   0.0226   0.0387     0.0199   0.0165   0.0236     0.0086   0.0079   0.0159
             0.4    0.0486   0.0342   0.0603     0.0285   0.0260   0.0370     0.0121   0.0216   0.0278
             0.6    0.1077   0.1185   0.1453     0.0595   0.0863   0.0894     0.0254   0.0537   0.0640
             0.8    0.4591   0.7942   0.8325     0.3228   0.6535   0.6886     0.1488   0.3877   0.3988
Frank        0.1    0.5431   0.4164   0.5143     0.4390   0.3680   0.4525     0.2375   0.2119   0.2596
             0.2    0.5950   0.5167   0.5859     0.4520   0.4206   0.4767     0.2554   0.2611   0.2997
             0.4    0.6116   0.5691   0.6437     0.4775   0.4692   0.5319     0.2693   0.2918   0.3487
             0.6    0.6642   0.6984   0.7158     0.4831   0.5742   0.5983     0.3207   0.4379   0.5157
             0.8    0.8096   0.8749   0.8967     0.6711   0.8494   0.8807     0.4098   0.7760   0.8616

Table 5: Estimated MSE of the estimators for Elliptical copulas
Copula       τ      n = 30                       n = 75                       n = 150
                    MPL      MPHD     MPND       MPL      MPHD     MPND       MPL      MPHD     MPND
Gaussian     0.1    0.0421   0.0218   0.0255     0.0178   0.0147   0.0196     0.0075   0.0071   0.0112
             0.2    0.0270   0.0161   0.0216     0.0141   0.0124   0.0158     0.0070   0.0068   0.0108
             0.4    0.0220   0.0141   0.0189     0.0109   0.0098   0.0138     0.0048   0.0062   0.0117
             0.6    0.0085   0.0101   0.0126     0.0033   0.0061   0.0071     0.0015   0.0032   0.0048
             0.8    0.0047   0.0069   0.0094     0.0020   0.0044   0.0053     0.0011   0.0027   0.0038
T (ν = 2)    0.1    0.0442   0.0322   0.0343     0.0261   0.0211   0.0337     0.0204   0.0186   0.0296
             0.2    0.0372   0.0305   0.0333     0.0205   0.0194   0.0310     0.0122   0.0160   0.0266
             0.4    0.0324   0.0276   0.0327     0.0163   0.0172   0.0280     0.0088   0.0142   0.0217
             0.6    0.0173   0.0248   0.0279     0.0066   0.0105   0.0219     0.0035   0.0089   0.0174
             0.8    0.0042   0.0084   0.0139     0.0031   0.0083   0.0115     0.0013   0.0039   0.0082
T (ν = 10)   0.1    0.0292   0.0251   0.0282     0.0218   0.0199   0.0241     0.0131   0.0126   0.0197
             0.2    0.0275   0.0245   0.0273     0.0167   0.0159   0.0229     0.0091   0.0115   0.0159
             0.4    0.0242   0.0226   0.0249     0.0139   0.0136   0.0204     0.0066   0.0090   0.0138
             0.6    0.0096   0.0178   0.0182     0.0065   0.0141   0.0169     0.0032   0.0076   0.0111
             0.8    0.0044   0.0091   0.0116     0.0025   0.0062   0.0094     0.0011   0.0033   0.0063

Table 6: Estimated MSE of the MPL estimator relative to the MPHD and MPND estimators (rMSE), in percent, for Archimedean copulas
Copula       τ      rMSE(MPL, MPHD)              rMSE(MPL, MPND)
                    n = 30   n = 75   n = 150    n = 30   n = 75   n = 150
Clayton      0.1     70.8     73.9     90.2       96.9     96.5    106.1
             0.2     85.4     89.6     96.5      100.7    108.9    113.3
             0.4     86.6     91.9    124.1      105.1    116.8    147.0
             0.6    117.4    128.9    145.0      121.6    131.4    154.9
             0.8    127.9    133.7    155.2      128.7    135.4    156.5
Gumbel       0.1     76.3     89.0     93.7       95.9     96.9    122.7
             0.2     80.5     91.0     95.8      105.3    108.8    135.8
             0.4     84.0     95.5    133.7      111.4    113.8    151.9
             0.6    104.9    120.4    145.3      116.1    122.6    158.6
             0.8    131.5    142.3    161.4      134.7    146.1    163.7
Frank        0.1     87.6     91.6     94.4       97.3    101.5    104.5
             0.2     93.2     96.5    101.1       99.2    102.7    108.3
             0.4     96.5     99.1    104.1      102.6    105.5    113.8
             0.6    102.5    109.0    116.9      103.8    111.3    126.8
             0.8    104.0    112.5    137.6      105.2    114.6    145.0

Table 7: Estimated MSE of the MPL estimator relative to the MPHD and MPND estimators (rMSE), in percent, for Elliptical copulas
Copula       τ      rMSE(MPL, MPHD)              rMSE(MPL, MPND)
                    n = 30   n = 75   n = 150    n = 30   n = 75   n = 150
Gaussian     0.1     72.0     90.8     97.4       77.8    104.8    122.1
             0.2     77.2     93.8     99.1       89.5    105.8    124.7
             0.4     80.3     95.1    113.2       92.9    112.6    155.5
             0.6    109.1    136.0    146.9      121.4    147.1    178.8
             0.8    120.7    148.9    153.8      140.8    164.6    182.9
T (ν = 2)    0.1     85.4     90.0     95.4       88.1    113.5    120.5
             0.2     90.6     97.3    114.3       94.6    123.1    147.3
             0.4     92.3    102.7    127.1      100.5    131.0    157.2
             0.6    119.9    126.0    159.5      127.2    182.0    222.8
             0.8    141.0    163.5    172.0      181.6    192.1    250.2
T (ν = 10)   0.1     92.7     95.5     98.1       98.2    105.0    122.5
             0.2     94.5     97.7    112.5       99.7    117.2    132.3
             0.4     96.6     99.0    117.2      101.4    121.0    145.3
             0.6    136.2    147.5    154.4      137.4    161.1    185.9
             0.8    144.3    157.1    169.1      162.9    193.2    234.2
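The relation between the MSE tables and the rMSE tables can be checked by hand. The short Python sketch below reads rMSE(MPL, MPHD) as the square root of MSE(MPHD) divided by MSE(MPL), expressed in percent; this reading is an inference from the tabulated values, and the function name is hypothetical. Values below 100 then indicate that the second estimator has the smaller MSE.

```python
import math


def rmse_percent(mse_first, mse_second):
    """rMSE(first, second) = sqrt(MSE(second) / MSE(first)), in percent."""
    return 100.0 * math.sqrt(mse_second / mse_first)


# Clayton copula, tau = 0.1, n = 30: MSE values taken from Table 4.
mse_mpl, mse_mphd, mse_mpnd = 0.0791, 0.0396, 0.0742

print(round(rmse_percent(mse_mpl, mse_mphd), 1))  # 70.8, as in Table 6
print(round(rmse_percent(mse_mpl, mse_mpnd), 1))  # 96.9, as in Table 6
```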
An application of the estimation methods to a dataset in Hydrology is now demonstrated. Wong et al. (2008) established a joint distribution function of drought intensity, duration, and severity by using Gaussian and Gumbel copulas. Song and Singh (2010a) used several meta-elliptical copulas in drought analysis and found that the meta-Gaussian and T copulas had a better fit. Ma et al. (2013) investigated the drought events in the Weihe river basin and selected the Gaussian and T copulas to model the joint distribution among drought duration, severity, and peaks. Recently, a very comprehensive book on the application of copulas in Hydrology was published by Chen and Guo (2019), and the concepts in this section are taken from this book.

McKee et al. (1993) proposed the concept of the standardized precipitation index (SPI), based on the long-term precipitation record for a specific period such as 1, 3, 6, or 12 months. Guttman (1998) recommended the use of the SPI as a primary drought index because it is simple, spatially invariant in its interpretation, and probabilistic. Therefore, the SPI series is used in this article. Fitting the long-term precipitation record to a probability distribution is the first step in calculating the SPI series. Once the probability distribution is determined, the cumulative probability of the observed precipitation is computed and then transformed by the inverse of the standard Gaussian distribution function, which gives the SPI series. A drought event is thus defined as a continuous period in which the SPI is below 0.

The objective of this section is the estimation of the copula parameter between drought characteristics (events) based on the SPI, including drought duration, drought severity, and drought interval time. Drought characteristics are recognized as important factors in water resource planning and management. Drought duration ($D_d$) is defined as the number of consecutive intervals (months) in which the SPI remains below the threshold value 0 (see Shiau (2006)).
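The definition of a drought event as a run of consecutive months with SPI below the threshold can be turned into a small routine. The Python sketch below is an illustration only; the function name and the returned tuple layout are assumptions of this sketch. For each event it returns the start index, the duration in months, and the cumulative SPI over the run.

```python
import numpy as np


def drought_events(spi, threshold=0.0):
    """Return (start_index, duration, cumulative_spi) for each run with SPI < threshold."""
    spi = np.asarray(spi, dtype=float)
    events = []
    start = None
    for i, below in enumerate(spi < threshold):
        if below and start is None:
            start = i  # a new drought event begins
        elif not below and start is not None:
            events.append((start, i - start, float(spi[start:i].sum())))
            start = None
    if start is not None:  # the series ends during a drought
        events.append((start, len(spi) - start, float(spi[start:].sum())))
    return events


print(drought_events([0.5, -0.2, -1.0, 0.3, -0.4, 0.1]))
```

For the toy series above, two events are found: one of two months starting at index 1 and one of a single month starting at index 4.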
Drought severity ($S_d$) is defined as the cumulative SPI value during a drought period, $S_d = \sum_{i=1}^{D_d} SPI_i$, where $SPI_i$ is the SPI value in the $i$-th month (see Mishra and Singh (2010)). The drought interval time ($I_d$) [...] $S_d$, $D_d$, and $I_d$ are used for copula parameter estimation. The sample version of Kendall's tau correlation coefficient ($\hat{\tau}_n$) of the drought variables is calculated. The results confirm that the two pairs $(S_d, I_d)$ and $(D_d, I_d)$ have positive and weak dependency. The values of $\hat{\tau}_n$ for the two pairs $(S_d, I_d)$ and $(D_d, I_d)$ of drought variables are given in Table 8.

Figure 1: The 1-month SPI time series for the Mashhad station [left panel] and scatter plots for the empirical distributions of the pair $(S_d, I_d)$ [middle panel] and the pair $(D_d, I_d)$ [right panel]

A goodness-of-fit testing procedure based on the parameter estimation methods is applied. In the large-scale Monte Carlo experiments carried out by Genest et al. (2009), the CvM statistic

$$S_n = n \int_{[0,1]^2} \left( C_n(u, v) - C_{\hat{\theta}}(u, v) \right)^2 dC_n(u, v) = \sum_{i=1}^{n} \left( C_n(\tilde{U}_i, \tilde{V}_i) - C_{\hat{\theta}}(\tilde{U}_i, \tilde{V}_i) \right)^2$$

gave the best results overall, where $C_n$ is the empirical copula defined in (3) and $C_{\hat{\theta}}$ is an estimator of $C$ under the hypothesis that $H_0 : C \in \{C_{\theta}\}$ holds. The estimators $\hat{\theta}$ of $\theta$ are those appearing in (4) and (9). An approximate P-Value for $S_n$ can be obtained by means of a parametric bootstrap procedure, as described in Genest et al. (2009).

One of the challenges that we face is the specification of a suitable copula. Since there are a large number of copulas, specifying one that would suit a particular case in practice is not easy.
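The CvM statistic $S_n$ is straightforward to compute once the pseudo-observations and a fitted copula cdf are available. The Python sketch below evaluates the empirical copula (3) at each pseudo-observation and sums the squared differences; it uses the Gumbel cdf from Table 1 as the parametric family. The function names are assumptions of this sketch, and the parametric bootstrap needed for the P-Value is not included.

```python
import numpy as np
from scipy.stats import rankdata


def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v; theta), theta >= 1 (theta = 1 gives independence)."""
    return np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta))


def cvm_statistic(x, y, copula_cdf, theta):
    """S_n = sum_i (C_n(U_i, V_i) - C_theta(U_i, V_i))^2 at the pseudo-observations."""
    n = len(x)
    u = rankdata(x) / (n + 1.0)
    v = rankdata(y) / (n + 1.0)
    # Empirical copula (3): share of points j with u_j <= u_i and v_j <= v_i.
    c_n = np.mean((u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None]), axis=1)
    return float(np.sum((c_n - copula_cdf(u, v, theta)) ** 2))
```

A smaller value of $S_n$ indicates that the fitted copula is closer to the empirical copula at the observed points.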
Therefore, a reasonable strategy is to consider different copulas and evaluate their goodness of fit. To this end, the Archimedean and Elliptical copulas in Table 1 are considered, which have attracted considerable interest because of their flexibility and simplicity. The diagnostic checks of the dependence structure for the pairs $(S_d, I_d)$ and $(D_d, I_d)$ suggested that the Gumbel and Gaussian copulas fit well and better than the others considered. The Gumbel and Gaussian copulas are fitted by the MPL and MPHD methods. The estimates and various relevant quantities are presented in Table 8.

Table 8: Parameter estimates and summary statistics for the SPI-Mashhad data
Pair          Copula    Method    $\hat{\theta}$    $\tau(\hat{\theta})$    $S_n$     P-Value    AIC
$(S_d, I_d)$  Gumbel    MPL       1.4176            0.2946                  0.0234    0.6287     -16.1803
              Gumbel    MPHD      1.3047            0.2335                  0.0212    0.6418     -17.0441
[...]
$(D_d, I_d)$  Gumbel    MPHD      1.5608            0.3593                  0.0336    0.3390     -27.4128
[...]

The scatter plots for the empirical distributions of the pair $(S_d, I_d)$ [middle panel] and the pair $(D_d, I_d)$ [right panel] are shown in Figure 1. This figure shows that the points tend to concentrate near (1, 1). Thus, the Gumbel copula, which has upper tail dependence, appears to be more appropriate for both pairs. On the other hand, according to the values of the Akaike Information Criterion (AIC) in Table 8, it can be concluded that for both pairs $(S_d, I_d)$ and $(D_d, I_d)$ the Gumbel copula is more suitable than the Gaussian copula, because it has the smallest AIC value. The P-Values and the values of the statistic $S_n$ can be used to compare the goodness of fit. These are given here just as a point of reference, but we recognize that they do not have the usual meaning of a P-Value. The largest P-Value for the pair $(S_d, I_d)$ based on $S_n$ is 0.6418, obtained for the Gumbel copula with parameter estimation by MPHD. Also, the largest P-Value for the pair $(D_d, I_d)$ based on $S_n$ is 0.3390, obtained for the Gumbel copula with parameter estimation by MPHD.
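The $\tau(\hat{\theta})$ column of Table 8 follows from the Gumbel relation $\tau = (\theta - 1)/\theta$ in Table 1. The short Python check below reproduces the three surviving Gumbel entries; the function names are hypothetical and introduced only for this illustration.

```python
def gumbel_tau(theta):
    """Kendall's tau of the Gumbel copula (Table 1)."""
    return (theta - 1.0) / theta


def gumbel_theta(tau):
    """Inverse relation: the Gumbel parameter for a given Kendall's tau."""
    return 1.0 / (1.0 - tau)


for theta_hat in (1.4176, 1.3047, 1.5608):
    print(theta_hat, round(gumbel_tau(theta_hat), 4))
# 1.4176 0.2946
# 1.3047 0.2335
# 1.5608 0.3593
```

The inverse relation is convenient when a target Kendall's tau is fixed, as in the simulation design, and the corresponding copula parameter is needed.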
The values of the copula parameter are difficult to interpret, but the corresponding values of Kendall's tau have more intuitive interpretations. Using the relations in Table 1, the values of Kendall's tau corresponding to the different estimates of $\theta$, denoted $\tau(\hat{\theta})$, are given in Table 8. Note that for the pair $(S_d, I_d)$, the Gumbel copula based on the MPHD method has $\hat{\theta}_{MPHD} = 1.3047$ and $\tau(\hat{\theta}) = 0.2335$; this $\tau(\hat{\theta})$ is nearly identical to the non-parametric sample estimate $\hat{\tau}_n$.

In this paper, two methods of copula parameter estimation based on the Alpha-Divergence were presented for some bivariate Archimedean and Elliptical copulas. The minimum Kullback-Leibler divergence, Hellinger distance, and Neyman divergence, as special cases of the Alpha-Divergence based on pseudo-observations, were used to obtain the copula parameter estimates. The simulation results suggest that the minimum pseudo Hellinger distance estimation method performs well for small sample sizes and weak dependency ($\tau < 0.5$) when compared with the MPL estimation method for Archimedean and Elliptical copulas. Also, the simulation results show that $\hat{\theta}_{MPHD}$ is almost always better than $\hat{\theta}_{MPND}$. The estimation methods were applied in a goodness-of-fit test based on the CvM distance for a data set in Hydrology, and the results show that the MPHD method is more accurate than the MPL method.

References