Binary Outcome Copula Regression Model with Sampling Gradient Fitting
arXiv preprint [stat.ME]. Statistica Sinica.
Weijian Luo and Mai Wo
School of Mathematical Sciences and National School of Development, Peking University
Abstract: Using copulas to model dependence among variables extends the multivariate Gaussian assumption. In this paper we first study a copula regression model with continuous response empirically; both a simulation study and a real-data study are given. Second, we propose a novel copula regression model with binary outcome, together with a score-gradient estimation algorithm based on sampling to fit it. Simulation and real-data studies are given for our model and fitting algorithm.

Key words and phrases: Copula regression; binary outcome; semi-parametric estimation; gradient estimation; sampling.
1. Introduction
The copula has been a powerful mathematical tool for modelling the dependence structure of random variables over the last decades. Let X = (X_1, ..., X_d)^T be a random vector of dimension d >= 1 and let Y be the random variable of main interest, i.e., the response. Assume each X_i has cumulative distribution function F_i and density f_i, while Y has cumulative distribution function F and density f. A copula function is defined as C_θ(u_1, ..., u_d, v) = P(F_1(X_1) <= u_1, ..., F_d(X_d) <= u_d, F(Y) <= v); a copula function is thus a joint cumulative distribution function with uniform marginals. Assume the copula function of (X_1, ..., X_d, Y) is C_θ(u_1, ..., u_d, v). Sklar's theorem states that the joint cumulative distribution function of (X_1, ..., X_d, Y) can be expressed as the composition of a copula function and the marginal cumulative distribution functions:

P(X_1 <= x_1, ..., X_d <= x_d, Y <= y) = C_θ(F_1(x_1), ..., F_d(x_d), F(y)).

The copula formulation gives a clean separation between the marginal distributions and the dependence structure. Copulas have found use in a wide variety of applied fields, such as quantitative risk control (Wu et al. (2011)) and statistical modelling, and intense research on copula-based statistical methods has followed. Fréchet (1951) introduced early precursors of the copula, and Sklar (1959) gave the theoretical background. Most classical results on copulas can be found in Nelsen (1999). Galiani (2003) proposed a copula application for managing financial derivative products. Parsa et al. (2011) and Noh et al. (2013) proposed regression models based on copulas, which they named copula regression. Copula-based regression methods have continued to appear in recent years: Winkelmann (2012) and Radice et al. (2016) studied copula regression for binary outcomes.

Joint distributions built from special copulas show properties different from the joint Gaussian assumption, so copulas give researchers a good way to go beyond the joint Gaussian distribution. Interesting properties such as tail dependence have motivated the statistical community to keep studying copulas, and copulas have been used successfully in quantitative risk control. However, most of this work focuses on modelling the dependence and extreme behavior of multivariate variables; few have tried to use copulas for regression inference or classification prediction. An interesting idea is therefore to use copula-based methods for regression: Parsa et al. (2011) proposed a copula regression method, and Noh et al. (2013) proposed a copula-based regression method and analyzed its asymptotic properties. In this paper, we first give an implementation of a copula-based regression model; we then derive and analyze a copula-based model with binary outcomes built on a latent variable formulation, together with corresponding real-data experiments.
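The Sklar decomposition above can be checked numerically for a bivariate normal vector, whose copula is the Gaussian copula. The sketch below is our own illustration (SciPy, standard normal margins, correlation 0.6 chosen for convenience), not part of the paper's experiments:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Sklar's theorem for a bivariate normal vector (X, Y) with correlation rho:
# H(x, y) = C(F(x), F(y)), where C is the Gaussian copula
rho, x, y = 0.6, 0.4, -0.2
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

u, v = norm.cdf(x), norm.cdf(y)                    # marginal CDF values
copula_side = mvn.cdf([norm.ppf(u), norm.ppf(v)])  # C(u, v) for the Gaussian copula
joint_side = mvn.cdf([x, y])                       # joint CDF H(x, y)
print(copula_side, joint_side)                     # equal, as Sklar's theorem states
```

The two numbers agree because, for the Gaussian copula, C(u, v) is exactly the bivariate normal CDF evaluated at the normal scores Φ^{-1}(u), Φ^{-1}(v).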
2. Preliminaries

2.1 Copulas and Background
Assume we have variables (X_1, ..., X_d, Y). Assume each covariate X_i, 1 <= i <= d, is dependent on the response Y, while the covariates X_i and X_j are also dependent on each other. An elegant formulation of the model is to impose a copula dependence among all the variables. Assume (X_1, ..., X_d, Y) has copula C_θ(u_1, ..., u_d, v), which represents the dependence among the variables, and that the variables have marginal distributions F_1, ..., F_d, F. The joint cumulative distribution function is then naturally

P(X_1 <= x_1, ..., X_d <= x_d, Y <= y) = C_θ(F_1(x_1), ..., F_d(x_d), F(y)).

Before we go further, we give some basic lemmas to demonstrate the model.

Lemma 1.
Assume (X_1, ..., X_d, Y) has smooth copula C_θ(u_1, ..., u_d, v) and marginal cumulative distributions (densities) F_i (f_i) and F (f). Then (X_1, ..., X_d, Y) has joint density

f(x_1, ..., x_d, y) = c_θ(F_1(x_1), ..., F_d(x_d), F(y)) [Π_{i=1}^d f_i(x_i)] f(y),

where c_θ(u_1, ..., u_d, v) = ∂^{d+1} C_θ(u_1, ..., u_d, v) / (∂u_1 ... ∂u_d ∂v).

The lemma relates the cumulative distribution function to the density function.

2.2 General Copula Regression Method

Lemma 2.
Assume (X_1, ..., X_d, Y) has smooth copula C_θ(u_1, ..., u_d, v) and marginal cumulative distributions (densities) F_i (f_i) and F (f). Then the conditional mean of Y given the covariates X = x is

m(x) = E(Y | X = x) = ∫ y c_θ(F_1(x_1), ..., F_d(x_d), F(y)) f(y) dy / C_X(F_1(x_1), ..., F_d(x_d)),

where C_X(u_1, ..., u_d) = ∂^d C_θ(u_1, ..., u_d, v = 1) / (∂u_1 ... ∂u_d).

For many copula families, m(x) has a closed-form or otherwise informative expression; we give one copula as an example.

Example 1.
Assume ρ = (corr(Y, X_1), ..., corr(Y, X_d))^T and let Σ_X denote the correlation matrix of X. If the copula of (Y, X^T)^T is the Gaussian copula, then

m(x) = E[ F^{-1}( Φ( u^T Σ_X^{-1} ρ + sqrt(1 - ρ^T Σ_X^{-1} ρ) Z ) ) ],

where u = (Φ^{-1}(F_1(x_1)), ..., Φ^{-1}(F_d(x_d)))^T and Z ~ N(0, 1).

Linear models and generalized linear models are popular statistical models for predicting continuous or categorical responses. For brevity we focus on the linear regression model; the generalized linear model can be viewed as its extension. Linear regression models the conditional expectation of Y given X as a linear function, which may lead to lack of fit because of the simplicity of linear functions. Another view of linear regression can be derived via generative modelling, and it is this view that leads us to copula regression and classification later. Assume (X_1, ..., X_d, Y) has a joint Gaussian distribution with mean μ = (μ_x^T, μ_y)^T and covariance matrix

Σ = ( Σ_xx  Σ_xy ; Σ_yx  Σ_yy ).

Then E(Y | X) = μ_y − Σ_yx Σ_xx^{-1} (μ_x − X). So under the assumption that (X_1, ..., X_d, Y) is jointly Gaussian, the prediction function is naturally a linear function. Evidence has accumulated that in many real-world problems the joint distribution of covariates and response is far from Gaussian (Winkelmann (2012)). Embrechts et al. (2002) showed how the Pearson correlation coefficient can be misleading when the underlying distributions are not normal; they advise using copulas to model non-normal data, because such models capture a greater variety of relationships (essentially being nonparametric). So there is a need for statistical models that capture more complex dependence structures among variables. Noh et al. (2013) proposed a generic method that uses copula dependence to fit a regression model, estimating the model in a semi-parametric way. Other fitting methods have been proposed as well.
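Lemmas 1–2 and Example 1 can be checked numerically in a small sketch. We use a bivariate Gaussian copula with standard normal margins (d = 1, our own illustrative choice): Lemma 1 should recover the bivariate normal density, and, since F^{-1}(Φ(t)) = t for standard normal margins, the conditional mean of Lemma 2 and Example 1 should reduce to the familiar linear formula E(Y | X = x) = ρx:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.integrate import quad

rho, x = 0.6, 0.8
cov = [[1.0, rho], [rho, 1.0]]

def c(u, v):
    # bivariate Gaussian copula density
    a, b = norm.ppf(u), norm.ppf(v)
    det = 1 - rho**2
    return np.exp(-(rho**2 * (a**2 + b**2) - 2 * rho * a * b) / (2 * det)) / np.sqrt(det)

# Lemma 1: c(F(x), F(y)) f(x) f(y) recovers the joint (bivariate normal) density
y0 = -0.5
lhs = c(norm.cdf(x), norm.cdf(y0)) * norm.pdf(x) * norm.pdf(y0)
rhs = multivariate_normal(mean=[0.0, 0.0], cov=cov).pdf([x, y0])

# Lemma 2 with d = 1: C_X(u) = dC(u, v=1)/du = 1, so the denominator is 1 and
# m(x) = int y c(F(x), F(y)) f(y) dy; for this copula m(x) = rho * x
m_x, _ = quad(lambda y: y * c(norm.cdf(x), norm.cdf(y)) * norm.pdf(y), -8, 8)

print(lhs - rhs, m_x, rho * x)
```

The integration range is kept to [-8, 8] so that norm.cdf stays strictly inside (0, 1) in floating point; the truncated tail mass is negligible.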
Chib and Greenberg (2007) and Marra and Radice (2013) introduced Bayesian and likelihood estimation methods based on penalized splines. Winkelmann (2012) discussed a modification of the recursive bivariate probit that maintains the Gaussian assumption for the marginal distributions of the two equations while introducing non-Gaussian dependence between them through the Frank and Clayton copulas. However, these methods only consider the bivariate case, and the multivariate case is largely different. In this part, we carry out an empirical study of the copula regression model of Noh et al. (2013) and discuss its pros and cons; in a later section we derive our copula regression model with binary outcome, motivated by Noh et al. (2013).

Assume (X_1, ..., X_d, Y) has smooth copula C_θ(u_1, ..., u_d, v) and marginal cumulative distributions (densities) F_i (f_i). Noh et al. (2013) proposed to estimate the marginal distributions non-parametrically while fitting a parametric copula family by maximum likelihood. More concretely, they use the kernel-smoothed estimate

F̄_j(x) = (1/n) Σ_{i=1}^n K((x − X_{i,j}) / h)

of the marginal distribution, where K is an integrated kernel and h a bandwidth. As for copula estimation, a number of methods exist. Nonparametric methods for estimating the copula density c include kernel smoothing estimators (see for example Gijbels and Mielniczuk (1990), Charpentier et al. (2006) and Chen and Huang (2010)) and the Bernstein estimator (see Bouezmarni et al. (2013)). In spite of their great flexibility, nonparametric methods are typically affected by the curse of dimensionality and come with the difficult problem of selecting a good smoothing parameter. On the other hand, imposing a parametric structure on both the copula and the marginal distributions can lead to a severely biased and inconsistent (fully parametric) estimator in case of misspecification.
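The kernel-smoothed marginal CDF F̄_j can be sketched as follows. The integrated Gaussian kernel (K = Φ) and the rule-of-thumb bandwidth are our own illustrative choices, not prescribed by the paper:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.standard_normal(500)               # one covariate column X_{., j}
h = 1.06 * X.std() * len(X) ** (-1 / 5)    # rule-of-thumb bandwidth (assumption)

def F_hat(x):
    # (1/n) sum_i K((x - X_i) / h), with K the integrated Gaussian kernel Phi
    return norm.cdf((x - X) / h).mean()

print(F_hat(0.0), norm.cdf(0.0))  # smoothed estimate vs true CDF at 0
```

Unlike the raw empirical CDF, F̄_j is smooth in x, which matters when it is plugged into the copula likelihood.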
So a non-parametric estimate of the marginals together with a parametric copula estimate is a natural compromise.

2.3 Simulation Study for Copula Regression

The objective of this section is to compare the semi-parametric copula regression estimator proposed by Noh et al. (2013) with OLS, both when the true copula family is known and when the copula family and its parameters are adaptively selected from the data. To this end, we consider the following data generating procedures (DGPs):

• DGP I.a: (F(Y), F_1(X_1)) ~ Clayton copula with parameter δ = 1; Y ~ N(μ_Y = 1, σ_Y = 1), X_1 ~ N(μ_X = 0, σ_X = 1). The resulting regression function is m(x) = μ_Y + E[σ_Y Φ^{-1}(T^{-1/δ})], where T has density f_T(t) = (1/δ + 1)(1 + ξ)^{1/δ + 1} / (t + ξ)^{1/δ + 2} for t > 1, with ξ = F_X(x)^{-δ} − 1.

• DGP I.b: (F(Y), F_1(X_1)) ~ FGM copula with parameter θ; Y ~ N(μ_Y = 0, σ_Y = 1), X_1 is generated from the Gumbel distribution F_X(x) = 1 − exp(−exp(x)). The resulting regression function is m(x) = μ_Y − (θ/√π) σ_Y + (2θ/√π) σ_Y F_X(x).

• DGP I.c: (F(Y), F_1(X_1), ..., F_d(X_d)) ~ Gaussian copula with correlation matrix Σ = [1, ρ^T; ρ, Σ_X], where ρ is a d-dimensional vector; Y ~ U(0, 1), X_j ~ N(μ_{X_j} = 0, σ_{X_j} = 1), j = 1, ..., d, d = 3. We choose the correlation matrix as

Σ = [ 1     0.23  0.90  0.67
      0.23  1     0.51  0.26
      0.90  0.51  1     0.49
      0.67  0.26  0.49  1    ].

The resulting regression function is m(x) = Φ( Σ_{j=1}^d a_j Φ^{-1}(F_j(x_j)) / sqrt(1 − ρ^T a) ), where a = (a_1, ..., a_d)^T ≡ Σ_X^{-1} ρ.

• DGP II.a: the same as DGP I.c.

• DGP II.b: (F(Y), F_1(X_1), ..., F_d(X_d)) ~ R-Vine copula with the same structure and parameters as the illustrating example in the help page of the function RVineMatrix of the R package VineCopula; Y ~ U(0, 1), X_j ~ N(μ_{X_j} = 0, σ_{X_j} = 1), j = 1, ..., d, d = 4.

• DGP II.c: (F(Y), F_1(X_1), ..., F_d(X_d)) ~ Clayton copula with parameter δ = 1; Y is generated from the Beta distribution with parameters α = 0.5, β = 0.5; X_j ~ N(μ_{X_j} = 0, σ_{X_j} = 1), j = 1, ..., d, d = 2.

• DGP II.d: (F(Y), F_1(X_1), ..., F_d(X_d)) ~ t copula with correlation matrix Σ as in DGP I.c and degrees of freedom df = 5; Y is generated from the Beta distribution with parameters α = 0.5, β = 0.5; X_j ~ N(μ_{X_j} = 0, σ_{X_j} = 1), j = 1, ..., d, d = 3.

As mentioned above, we conduct two sets of simulation studies. In the first part, data are generated from DGP I.a to DGP I.c, and when we estimate the copula regression we know the true copula structure and estimate the copula parameters by pseudo-MLE. In the second part, data are generated from DGP II.a to DGP II.d, and when we estimate the copula regression we adaptively select the copula structure and parameters from the data. In all seven experiments, we run N = 200 simulations, each with a data sample of n = 100 observations. We then calculate IMSE, IBIAS and IVAR on a fixed evaluation set of I = 150 observations in each experiment as follows:

IMSE = (1/N) Σ_{l=1}^N ISE(m̂^(l))
     ≡ (1/N) Σ_{l=1}^N [ (1/I) Σ_{i=1}^I ( m̂^(l)(x_i) − m(x_i) )² ]
     = (1/I) Σ_{i=1}^I ( m(x_i) − m̄(x_i) )² + (1/I) Σ_{i=1}^I [ (1/N) Σ_{l=1}^N ( m̂^(l)(x_i) − m̄(x_i) )² ]
     ≡ IBIAS + IVAR,

where {(y_i, x_i), i = 1, ..., I} is a fixed evaluation set, corresponding to a random sample of size I = 150 generated from the DGP, m̂^(l)(·) is the estimated regression function from the l-th data sample, and m̄(x_i) = N^{-1} Σ_{l=1}^N m̂^(l)(x_i). We should point out that in the second part of the simulation study, the multi-dimensional X makes it difficult to calculate the true regression function m(x), so we replace m(x_i) with y_i in the calculation of IMSE and IBIAS. The variances of the error terms y_i − m(x_i) are then included in IMSE and IBIAS, making them larger than their counterparts in the first part of the simulation study.

In this part, data are generated from DGP I.a to DGP I.c, and when we estimate the copula regression we know the true copula structure and estimate the parameters by pseudo-MLE. Table 1 shows the IMSE together with the IBIAS and IVAR of copula regression and of OLS (with intercept). In all three settings, copula regression has much lower bias but higher variance than OLS; overall, copula regression attains lower IMSE. This simulation study reveals the potential of copula regression. But to apply it practically, we need to adaptively select the copula structure and parameters from the data, which is dealt with in the following section.
Table 1: Copula regression and OLS: Known Copula Structure

                         IMSE              IBIAS             IVAR
Y margin   copula     copula   OLS      copula   OLS      copula   OLS
normal     Clayton    0.0197   0.0611   0.0031   0.0463   0.0166   0.0147
normal     FGM        0.0159   0.0241   0.0006   0.0068   0.0153   0.0173
Uniform    Gaussian   0.0010   0.0037   0.0001   0.0033   0.0009   0.0004

In this part, data are generated from DGP II.a to DGP II.d, and we adaptively select the copula structure and parameters from the data. This step can be difficult, especially when the number of covariates is large: the set of high-dimensional copulas available in the literature is limited to very special and restrictive families such as elliptical and Archimedean copulas. For this reason, we make use of recent work on the simplified pair-copula decomposition. The main idea is to decompose a multivariate copula into a cascade of bivariate copulas, so that we can take advantage of the relative simplicity of bivariate copula selection and estimation. In our simulation, we choose one decomposition (the R-Vine structure) for the data and use the R package VineCopula to select the copula structure and estimate the parameters.

Again, we compare copula regression with OLS (with intercept). In all four settings, copula regression has lower bias but higher variance than OLS; overall, copula regression attains lower IMSE. Note that the variances of the error terms are now also included in IMSE and IBIAS, so these are higher in the Gaussian setting than in DGP I.c. This simulation study illustrates that copula regression may have higher predictive power than OLS in practice; in the next section, we show some evidence on real data.

Table 2: Copula regression and OLS: Unknown Copula Structure

                              IMSE              IBIAS             IVAR
Y margin        copula     copula   OLS      copula   OLS      copula   OLS
Uniform         Gaussian   0.0077   0.0078   0.0054   0.0074   0.0023   0.0004
Uniform         R-Vine     0.0116   0.0136   0.0083   0.0129   0.0033   0.0007
Beta(0.5,0.5)   Clayton    0.0887   0.0910   0.0854   0.0881   0.0033   0.0029
Beta(0.5,0.5)   T          0.0126   0.0204   0.0082   0.0191   0.0043   0.0013

2.4 Real Data Study for Copula Regression

In this section, we analyze the Boston Housing data. The data consist of 506 observations on 14 variables. The dependent variable is MEDV, the median value of owner-occupied homes in $1000s; the independent variables include crime rate, pupil-teacher ratio, etc. To estimate the regression function, we consider 4 methods:

• (OLS) Least-squares estimator
• (GAM) Generalized additive model estimator
• (CART) Classification and regression tree
• (CR) Copula regression method

We use a smoothing spline to fit the GAM with the function gam of the R package mgcv. For CART, we use the R package rpart with complexity parameter 0.001. As an evaluation measure for each estimator, we randomly split the data into a training set (n = 337) and a testing set (n = 169) 100 times and compute the mean and standard error of the test-set MSE for each estimator. Table 3 shows that copula regression attains much lower prediction error than OLS, slightly lower than CART, and almost the same as GAM.
As for the stability of prediction, copula regression has slightly higher MSE standard error than OLS and CART, but much lower than GAM. All together, copula regression balances prediction precision and stability and does a fairly good job on the real data.

Table 3: Real Data Comparison of OLS, GAM, CART and CR

MSE     OLS      GAM      CART     CR
mean    0.465    0.295    0.318    0.296
sd      0.0670   0.2302   0.0711   0.0781
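The repeated random-split protocol can be sketched as follows. The 337/169 split sizes, the 100 repetitions, and the OLS baseline follow the text; the synthetic stand-in data are our own, since the Boston Housing file is not shipped here:

```python
import numpy as np

rng = np.random.default_rng(7)

# stand-in data with the same number of observations as the paper's setup (506);
# the real study uses the Boston Housing variables instead
n, d = 506, 3
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -0.5, 0.2]) + 0.3 * rng.standard_normal(n)

mses = []
for _ in range(100):                      # 100 random train/test splits
    idx = rng.permutation(n)
    tr, te = idx[:337], idx[337:]         # 337 train / 169 test, as in the text
    Xtr = np.column_stack([np.ones(len(tr)), X[tr]])
    Xte = np.column_stack([np.ones(len(te)), X[te]])
    beta_hat, *_ = np.linalg.lstsq(Xtr, y[tr], rcond=None)   # OLS with intercept
    mses.append(np.mean((Xte @ beta_hat - y[te]) ** 2))

print(np.mean(mses), np.std(mses))        # mean and sd of test MSE over splits
```

Any of the four estimators can be dropped into the loop in place of the OLS fit; the mean/sd pair is what each column of Table 3 reports.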
3. Binary Outcome Model

3.1 Latent Variable Formulation
In this section, we give our formulation of the binary outcome regression model, along with our proposed sampling-based fitting method. Recall that in the last section we gave a copula regression model and a corresponding semi-parametric fitting method, and the evidence showed the model performs well when both covariates and response are continuous. In real-world applications, however, variables with binary outcomes play an important role. In the medical field, a doctor uses a patient's observed variables to predict whether the patient has a certain disease. In individual credit risk management, a bank uses a customer's observed variables to judge whether the customer may default in the near future. Such cases reveal the importance of predicting binary responses. Classical models such as logistic regression and linear discriminant analysis have been proposed for predicting binary variables, but the simplicity of the linear formula restricts the dependence structure of the variables and can lead to lack of fit. In this paper, we consider a latent variable model with a copula to model the variable dependence. Assume we observe (X_1, ..., X_d, Y), where Y takes values in {0, 1} and the X_i are continuous variables taking values in R. Assume the relationship between X and Y is connected through one latent variable Z ∈ [0, 1]:

Y | X, Z ≡ Y | Z ~ Ber(Z),
(X, Z) ~ c_θ(F_X(x), F_Z(z)) f_X(x) f_Z(z),

where c_θ(u, v) is the copula density of (X, Z). The model can be interpreted as follows: the response Y is determined by the latent probability Z via a Bernoulli experiment Y | Z ~ Ber(Z), while the covariates X and the latent probability Z share the joint density c_θ(F_X(x), F_Z(z)) f_X(x) f_Z(z). Our attempt to use a latent variable to connect covariates and a binary outcome is not the first: Winkelmann (2012) and Radice et al. (2016) proposed latent probit models with copula dependence, but they assumed a Gaussian latent variable Z, which does not have much interpretable meaning. To the best of our knowledge, we are the first to use a Bernoulli response with a latent probability variable, with a copula representing the dependence. Using a Bernoulli response to model binary outcomes has two benefits. First, if we assume the marginal distribution of the latent probability has a Beta form, the single-peak property of the Beta distribution can represent the prior response strength, while the covariates X then adjust that strength. Second, once the parameters are fitted, the conditional mean m(x) = E(Z | X = x) not only predicts the response but also gives the probability with which the response takes the value 1 or 0. This probability is of great importance in many statistical applications, such as individual credit scoring or customer click-rate prediction. In the following part we give our proposed method for fitting the model.

3.2 Fitting Algorithm

Assume (X, Y), X ∈ R^d, Y ∈ {0, 1}, are the observed variables and Z ∈ [0, 1] is the latent variable. Assume Y | Z, X ~ Ber(Z) and (X, Z) ~ c_θ(F_X(x), F_φ(z)) f_X(x) f_φ(z). The joint density of (X, Y, Z) is

p(x, y, z) = p(x, z) p(y | z) = c_θ(F_X(x), F_φ(z)) f_X(x) f_φ(z) × z^y (1 − z)^{1−y}.

The likelihood for the parameters (θ, φ) is

L(θ, φ) = p(x, y) = ∫ c_θ(F_X(x), F_φ(z)) f_X(x) f_φ(z) × z^y (1 − z)^{1−y} dz.

Under regularity conditions, the derivatives of the likelihood with respect to the parameters are

∂L(θ, φ)/∂θ = ∫ [∂c_θ(F_X(x), F_φ(z))/∂θ] f_X(x) f_φ(z) × z^y (1 − z)^{1−y} dz,

∂L(θ, φ)/∂φ = ∫ [ (∂c_θ(u, v)/∂v)|_{u = F_X(x), v = F_φ(z)} ∂F_φ(z)/∂φ + c_θ(F_X(x), F_φ(z)) (∂f_φ(z)/∂φ) / f_φ(z) ] × f_X(x) z^y (1 − z)^{1−y} f_φ(z) dz.

For F_X(·) we can use a non-parametric estimate such as the kernel smoothing method. In most cases the integral has no explicit formula, but fortunately, thanks to the structure of the model, the integral can be interpreted as an expectation over Z when Z has density f_φ(z).
This means that, under the current parameters (θ, φ), one can sample (z_1, ..., z_K) ~ f_φ(z) and use sample means to estimate the derivatives:

L̂_θ(θ, φ) = (1/K) Σ_{k=1}^K [∂c_θ(F_X(x), F_φ(z_k))/∂θ] f_X(x) × z_k^y (1 − z_k)^{1−y},

L̂_φ(θ, φ) = (1/K) Σ_{k=1}^K [ (∂c_θ(u, v)/∂v)|_{u = F_X(x), v = F_φ(z_k)} ∂F_φ(z_k)/∂φ + c_θ(F_X(x), F_φ(z_k)) (∂f_φ(z_k)/∂φ) / f_φ(z_k) ] × f_X(x) z_k^y (1 − z_k)^{1−y}.

In practice, since the likelihood is bounded, we work with the log-likelihood instead; we briefly give the gradient formulas here:

∂l(θ, φ)/∂θ = ∫ [∂c_θ(F_X(x), F_φ(z))/∂θ] f_φ(z) × z^y (1 − z)^{1−y} dz / ∫ c_θ(F_X(x), F_φ(z)) f_φ(z) × z^y (1 − z)^{1−y} dz,

∂l(θ, φ)/∂φ = ∫ [ (∂c_θ(u, v)/∂v)|_{u = F_X(x), v = F_φ(z)} ∂F_φ(z)/∂φ + c_θ(F_X(x), F_φ(z)) (∂f_φ(z)/∂φ) / f_φ(z) ] z^y (1 − z)^{1−y} f_φ(z) dz / ∫ c_θ(F_X(x), F_φ(z)) f_φ(z) × z^y (1 − z)^{1−y} dz,

where l(θ, φ) = log[ ∫ c_θ(F_X(x), F_φ(z)) f_φ(z) × f_X(x) z^y (1 − z)^{1−y} dz ]. Our algorithm is shown below.

Algorithm 1: Sampling Gradient Fitting Algorithm

Input: observed data (X_i, Y_i), 1 <= i <= N; step size ε
Output: fitted parameters (θ̂, φ̂)

initialize (θ_0, φ_0)
for t in 1, ..., MaxIter do
    sample N × K latent draws z_{nk} ~ f_φ
    ĝ_θ = Σ_{nk} [∂c_θ(F_X(x_n), F_φ(z_{nk}))/∂θ × z_{nk}^{y_n} (1 − z_{nk})^{1−y_n}] / Σ_{nk} [c_θ(F_X(x_n), F_φ(z_{nk})) × z_{nk}^{y_n} (1 − z_{nk})^{1−y_n}]
    ĝ_φ = Σ_{nk} [ (∂c_θ(u, v)/∂v)|_{u = F_X(x_n), v = F_φ(z_{nk})} ∂F_φ(z_{nk})/∂φ + c_θ(F_X(x_n), F_φ(z_{nk})) (∂f_φ(z_{nk})/∂φ) / f_φ(z_{nk}) ] × z_{nk}^{y_n} (1 − z_{nk})^{1−y_n} / Σ_{nk} [c_θ(F_X(x_n), F_φ(z_{nk})) × z_{nk}^{y_n} (1 − z_{nk})^{1−y_n}]
    θ_{t+1} = θ_t + ε ĝ_θ
    φ_{t+1} = φ_t + ε ĝ_φ
end

3.3 Some Examples of the Method

In this section, we give some examples with different c_θ(u, v) and f_φ(z) to obtain explicit update formulas for our algorithm.

Proposition I.
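A minimal end-to-end sketch of the sampling gradient idea follows. Everything here is our own illustration, not the paper's code, and it makes several simplifying assumptions: a bivariate Gaussian copula with a single parameter θ = ρ, a Beta(2, 2) latent margin held fixed at its true value, F_X treated as known, and ∂c/∂ρ approximated by a central finite difference. We also sum per-observation score ratios, a natural variant of the pooled ratio in Algorithm 1 that corresponds to the gradient of the total log-likelihood:

```python
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(5)

# --- simulate data from the latent model: Gaussian copula + Beta(2,2) margin ---
true_rho, n = 0.6, 500
g = rng.multivariate_normal([0.0, 0.0], [[1.0, true_rho], [true_rho, 1.0]], size=n)
x = g[:, 0]                                   # X ~ N(0, 1)
z_true = beta(2, 2).ppf(norm.cdf(g[:, 1]))    # latent probability Z in [0, 1]
y = rng.binomial(1, z_true)                   # Y | Z ~ Ber(Z)

a = x[:, None]  # normal scores Phi^{-1}(F_X(x_n)); equal to x for a known N(0,1) margin

def cop(a, b, r):
    # Gaussian copula density expressed in the normal scores a, b
    det = 1 - r**2
    return np.exp(-(r**2 * (a**2 + b**2) - 2 * r * a * b) / (2 * det)) / np.sqrt(det)

# --- sampling-gradient ascent on rho, with phi (the Beta margin) held fixed ---
rho, eps, K, h = 0.0, 0.3, 50, 1e-4
for t in range(100):
    zk = rng.beta(2.0, 2.0, size=(n, K))          # z_{nk} ~ f_phi
    b = norm.ppf(beta(2, 2).cdf(zk))              # Phi^{-1}(F_phi(z_{nk}))
    bern = np.where(y[:, None] == 1, zk, 1 - zk)  # z^y (1 - z)^{1-y}
    dc = (cop(a, b, rho + h) - cop(a, b, rho - h)) / (2 * h)   # finite-diff dc/drho
    score = (dc * bern).sum(axis=1) / (cop(a, b, rho) * bern).sum(axis=1)
    rho = float(np.clip(rho + eps * score.mean(), -0.99, 0.99))

print(rho)  # should drift from 0 toward the data-generating value 0.6
```

In a full implementation, F_X would be replaced by its kernel-smoothed estimate, the finite difference by the analytic derivative (e.g., Proposition I for the Gaussian copula), and φ would be updated with its own sampled gradient.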
When c_θ is the Gaussian copula, we have

∂l(Σ, φ)/∂Σ = ∫ z^y (1 − z)^{1−y} c_Σ(F_X(x), F_φ(z)) (1/2)[Σ^{-1} t t^T Σ^{-1} − Σ^{-1}] f_φ(z) dz / ∫ z^y (1 − z)^{1−y} c_Σ(F_X(x), F_φ(z)) f_φ(z) dz,

where t = (Φ^{-1}(F_φ(z)), Φ^{-1}(F_{X_1}(x_1)), ..., Φ^{-1}(F_{X_d}(x_d)))^T and Σ is the correlation matrix of the Gaussian copula.

3.4 Simulation Studies

This section aims to evaluate the proposed binary-outcome copula regression method. We compare the new method with logit regression to gain some insight into its performance. To this end, we consider the following data generating procedures (DGPs):

• DGP III.a: (F(Z), F_1(X_1), ..., F_d(X_d)) ~ Clayton copula with parameter δ = 1; Z ~ U(0, 1), X_j ~ N(μ_{X_j} = 0, σ_{X_j} = 1), d = 3. The outcome Y follows a Bernoulli distribution with success rate Z.

• DGP III.b:
The structure of (Z, X_1, ..., X_d) is the same as in DGP I.c. The outcome Y follows a Bernoulli distribution with success rate Z.

• DGP III.c: (X_1, ..., X_d), d = 4, is generated from a multivariate normal distribution with correlation matrix as in DGP I.c and standard normal marginals. Then we generate Z = sigmoid(Xβ), where β = (1, −1, −1, 1)^T. The outcome Y follows a Bernoulli distribution with success rate Z.

We first generate a data sample of n = 300 observations from each of DGP III.a to DGP III.c. We then randomly split the sample into a training set (200 observations) and a testing set (100 observations), estimate binary-outcome copula regression (BOCR) and logit regression on the training set, and calculate AUC and KS-value for both on the testing set. Table 4 shows the average AUC and KS-value for the two methods. On average, our binary-outcome copula regression attains slightly better AUC and KS-value than logit regression.

Table 4: BOCR and Logit

               AUC              KS value
           logit   BOCR     logit   BOCR
Clayton    0.639   0.651    0.284   0.295
Gaussian   0.747   0.742    0.375   0.408
Logit      0.604   0.612    0.218   0.242

3.5 Real Data Studies

In this section, we analyze the Breast-Cancer-Wisconsin data. The data consist of 699 observations with 10 variables. The outcome variable is CLASS: whether the cancer is benign or malignant. The independent variables include clump thickness, uniformity of cell size and shape, marginal adhesion, etc. We consider 4 classification algorithms:

• (Logit) Logit regression
• (CART) Classification and regression tree
• (SVM) Support vector machine
• (BOCR) Binary-outcome copula regression

We use the R package rpart to estimate CART and the package e1071 to estimate SVM. We randomly split the data into a training set (n = 466) and a testing set (n = 233) and calculate AUC and KS-value on the testing set for each method. Table 5 shows that BOCR attains AUC and KS-values similar to the other three methods.
From these results it seems fairly easy to classify a tumor as benign or malignant, since all four methods attain KS-values higher than 0.9.

Table 5: Real Data Analysis for BOCR

            logit    CART     SVM      BOCR
AUC         0.9956   0.9780   0.9964   0.9964
KS value    0.9482   0.9350   0.9539   0.9548
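AUC and KS can be computed directly from predicted scores; the small self-contained helper below is our own sketch (assuming untied scores), not the paper's code:

```python
import numpy as np

def auc_ks(y_true, score):
    # AUC via the rank (Mann-Whitney) formula, assuming untied scores;
    # KS as the largest gap between the score CDFs of the two classes
    y_true, score = np.asarray(y_true), np.asarray(score)
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1 = y_true.sum()
    n0 = len(y_true) - n1
    auc = (ranks[y_true == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
    cdf1 = np.array([(score[y_true == 1] <= t).mean() for t in np.sort(score)])
    cdf0 = np.array([(score[y_true == 0] <= t).mean() for t in np.sort(score)])
    ks = np.abs(cdf1 - cdf0).max()
    return auc, ks

# perfectly separating scores give AUC = 1 and KS = 1
auc, ks = auc_ks([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(auc, ks)
```

Both metrics depend only on the ranking of the scores, which is why they are natural for comparing BOCR's predicted probabilities E(Z | X = x) with logit scores.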
4. Conclusion and Future Work
In this paper, we empirically investigated the copula regression model and proposed a binary-outcome copula regression model. We gave a sampling gradient method for fitting the model, together with a simulation study and a real-data study. The evidence shows that our model can outperform the logistic regression model as well as machine learning models such as CART and SVM. To the best of our knowledge, ours is the first copula-based model to deal with multivariate covariates and a single binary outcome, and ours is the first attempt to introduce sampling gradient estimation for fitting such models. However, there is still much work to do in the future; we briefly discuss two aspects.

One future direction is the evaluation of various copula functions within our model framework. In this paper we mainly focused on the Gaussian copula as a template, but nearly all other copulas can be evaluated in the same way.

Another future direction is variance reduction in the sampling gradient estimation procedure. There are many methods to reduce the variance of Monte Carlo estimates, including importance sampling and Rao-Blackwellization. The application of variance reduction tricks to our model will be an interesting research direction.

Acknowledgements

This paper is motivated by our final project for Multivariate Statistics, offered by the Guanghua School of Management, Peking University, in fall 2020. In class, Professor Chen Songxi offered us great help; we want to express our sincere thanks to Professor Chen for both his profound knowledge and his enthusiastic help.

References
Wu, Y., Zheng, Z., Zhou, S. and Yang, J. (2011). Dependence structure between LIBOR rates by copula method. Frontiers of Mathematics in China.

Fréchet, M. (1951). Sur les tableaux de corrélation dont les marges sont données. Ann. Univ. Lyon. Sect. A (3) 14, 53–77.

Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8, 229–231.

Nelsen, R. B. (1999). An Introduction to Copulas. Lecture Notes in Statistics, vol. 139. Springer-Verlag, New York.

Galiani, S. S. (2003). Copula functions and their application in pricing and risk managing multi-name credit derivative products.

Parsa, R. A. and Klugman, S. A. (2011). Copula regression. Variance: Casualty Actuarial Society 5(1), 45–54.

Noh, H., El Ghouch, A. and Bouezmarni, T. (2013). Copula-based regression estimation and inference. Journal of the American Statistical Association.

Winkelmann, R. (2012). Copula bivariate probit models: with an application to medical expenditures. Health Economics.

Radice, R., Marra, G. and Wojtyś, M. (2016). Copula regression spline models for binary outcomes. Statistics and Computing.

Embrechts, P., McNeil, A. and Straumann, D. (2002). Correlation and dependence in risk management: properties and pitfalls. In Risk Management: Value at Risk and Beyond, 176–223.

Chib, S. and Greenberg, E. (2007). Semiparametric modeling and estimation of instrumental variable models. Journal of Computational and Graphical Statistics 16(1), 86–114.

Marra, G. and Radice, R. (2013). Estimation of a regression spline sample selection model. Computational Statistics & Data Analysis 61, 158–173.

Gijbels, I. and Mielniczuk, J. (1990). Estimating the density of a copula function. Communications in Statistics 19(2), 445–464.

Charpentier, A., Fermanian, J.-D. and Scaillet, O. (2006). Nonparametric estimation of copula densities.

Chen, S. X. and Huang, T. M. (2010). Nonparametric estimation of copula functions for dependence modelling. Canadian Journal of Statistics 35(2), 265–282.

Bouezmarni, T., El Ghouch, A. and Taamouti, A. (2013). Bernstein estimator for unbounded copula densities. Statistics & Risk Modeling 30(4), 343–360.