Giampiero Marra
University College London
Publication
Featured research published by Giampiero Marra.
Computational Statistics & Data Analysis | 2011
Giampiero Marra; Simon N. Wood
The problem of variable selection within the class of generalized additive models, when there are many covariates to choose from but the number of predictors is still somewhat smaller than the number of observations, is considered. Two very simple but effective shrinkage methods and an extension of the nonnegative garrote estimator are introduced. The proposals avoid having to use nonparametric testing methods for which there is no general reliable distributional theory. Moreover, component selection is carried out in one single step as opposed to many selection procedures which involve an exhaustive search of all possible models. The empirical performance of the proposed methods is compared to that of some available techniques via an extensive simulation study. The results show under which conditions one method can be preferred over another, hence providing applied researchers with some practical guidelines. The procedures are also illustrated analysing data on plasma beta-carotene levels from a cross-sectional study conducted in the United States.
Statistical Methods in Medical Research | 2010
Giampiero Marra; Rosalba Radice
Generalised additive models (GAMs) allow for flexible functional dependence of a response variable on covariates. The aim of this article is to provide an accessible overview of GAMs based on the penalised likelihood approach with regression splines. In contrast to the classical backfitting, the penalised likelihood framework taken here provides researchers with an efficient computational method for automatic multiple smoothing parameter selection, which can determine the functional form of any relationship from the data. We illustrate through an example how the use of this methodology can help to gain insights into medical research.
Statistics and Computing | 2016
Rosalba Radice; Giampiero Marra; Małgorzata Wojtyś
We introduce a framework for estimating the effect that a binary treatment has on a binary outcome in the presence of unobserved confounding. The methodology is applied to a case study which uses data from the Medical Expenditure Panel Survey and whose aim is to estimate the effect of private health insurance on health care utilization. Unobserved confounding arises when variables which are associated with both treatment and outcome are not available (in economics this issue is known as endogeneity). Also, treatment and outcome may exhibit a dependence which cannot be modeled using a linear measure of association, and observed confounders may have a non-linear impact on the treatment and outcome variables. The problem of unobserved confounding is addressed using a two-equation structural latent variable framework, where one equation essentially describes a binary outcome as a function of a binary treatment whereas the other equation determines whether the treatment is received. Non-linear dependence between treatment and outcome is dealt with using copula functions, whereas covariate-response relationships are flexibly modeled using a spline approach. Related model fitting and inferential procedures are developed, and asymptotic arguments presented.
Electronic Journal of Statistics | 2013
Giampiero Marra; Rosalba Radice
Sample selection models are employed when an outcome of interest is observed for a restricted non-randomly selected sample of the population. We consider the case in which the response is binary and continuous covariates have a nonlinear relationship to the outcome. We introduce two statistical methods for the estimation of two binary regression models involving semiparametric predictors in the presence of non-random sample selection. This is achieved using a multiple-stage procedure, and a newly developed simultaneous equation estimation scheme. Both approaches are based on the penalized likelihood estimation framework. The problems of identification and inference are also discussed. The empirical properties of the proposed approaches are studied through a simulation study. The methods are then illustrated using data from the American National Election Study where the aim is to quantify public support for school integration. If non-random sample selection is neglected then the predicted probability of giving, for instance, a supportive response may be biased, an issue that can be tackled using the proposed tools.
Statistical Modelling | 2011
Giampiero Marra; Rosalba Radice
Classical regression model literature has generally assumed that measured and unmeasured covariates are statistically independent. For many applications, this assumption is clearly tenuous. When unobservables are associated with included regressors and have an impact on the response, standard estimation methods will not be valid. This means that estimation results from observational studies, whose aim is to evaluate the impact of a treatment of interest on a response variable, will be biased and inconsistent in the presence of unmeasured confounders. One method for obtaining consistent estimates of treatment effects when dealing with linear models is the instrumental variable (IV) approach. Linear models have been extended to generalized linear models (GLMs) and generalized additive models (GAMs), and although IV methods have been proposed to deal with GLMs, fitting methods to carry out IV analysis within the GAM context have not been developed. We propose a simple but effective two-stage procedure for IV estimation when dealing with GAMs represented using any penalized regression spline approach and a correction procedure for confidence intervals. We explain under which conditions the proposed method works and illustrate its empirical validity through an extensive simulation experiment and a health study where unmeasured confounding is suspected to be present.
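The instrumental variable idea in the abstract above can be illustrated with its simplest linear analogue, classic two-stage least squares. This is a minimal sketch on simulated data and not the GAM-based procedure proposed in the paper; the function names, the data-generating process, and the true effect of 2.0 are all assumptions made for illustration.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=2.0, seed=1):
    """Hypothetical data: u is an unmeasured confounder, z is an instrument."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    # The treatment x depends on the instrument and on the confounder.
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    # The outcome y depends on the treatment and on the same confounder.
    y = [true_effect * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


def naive_ols(x, y):
    """Slope from regressing y on x, ignoring the confounder."""
    return cov(x, y) / statistics.variance(x)


def two_stage(z, x, y):
    """Linear two-stage least squares with a single instrument."""
    # Stage 1: regress the treatment on the instrument and keep fitted values.
    b1 = cov(z, x) / statistics.variance(z)
    a1 = statistics.fmean(x) - b1 * statistics.fmean(z)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the stage-1 fitted values.
    return cov(x_hat, y) / statistics.variance(x_hat)


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive OLS slope:", naive_ols(x, y))
    print("two-stage slope:", two_stage(z, x, y))
```

In this simulated setting the naive slope is pulled away from the true effect because the confounder enters both equations, while the two-stage slope stays close to it. The naive confidence intervals from the second stage are not valid as they stand, which is the motivation for the correction procedure the abstract mentions.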
Bulletin of Economic Research | 2012
Luca Zanin; Giampiero Marra
During the last decade, economists have shown that the inverse relationship between economic growth and unemployment rate varies over time. Rolling regression has been the main tool used to quantify such a relationship. This methodology suffers from several well-known problems which lead to spurious non-linear patterns in the behaviour of Okun's coefficient over time. Here, we take a penalized regression spline approach to estimate Okun's time-varying effects. As a result, spurious non-linearities are suppressed and hence important time-varying coefficient features revealed. Our empirical results show that the inverse relationship in some Euro area countries is spatially heterogeneous and time-varying. The findings are complemented by the calculation of the rate of output growth needed for a stable unemployment rate, as proposed by Knotek.
Computational Statistics & Data Analysis | 2013
Giampiero Marra; Rosalba Radice
It is often the case that an outcome of interest is observed for a restricted non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models which are based on the estimation of two regressions: a binary selection equation determining whether a particular statistical unit will be available in the outcome equation, and an outcome equation. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. In the case of continuous response, methods in which covariate effects are modeled flexibly have been previously proposed, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart which has the advantage of being computationally fast is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. The approaches are finally illustrated by analyzing data from the RAND Health Insurance Experiment on annual health expenditures.
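The bias that motivates sample selection models can be seen in a small simulation. The sketch below only illustrates the problem and does not implement the penalized likelihood estimator described in the abstract; the function name, the correlation value, and the selection rule are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_selected_mean(n=50000, rho=0.7, true_mean=1.0, seed=7):
    """Mean of the outcome among selected units only (hypothetical setup).

    rho is the correlation between the selection error and the outcome error.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        e_sel = rng.gauss(0, 1)
        # Outcome error, constructed to have correlation rho with e_sel.
        e_out = rho * e_sel + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        # The outcome is recorded only when the selection error is positive.
        if e_sel > 0:
            observed.append(true_mean + e_out)
    return statistics.fmean(observed), len(observed)


if __name__ == "__main__":
    for rho in (0.0, 0.7):
        mean, kept = simulate_selected_mean(rho=rho)
        print(f"rho={rho}: mean of {kept} selected outcomes = {mean:.3f}")
```

When the two errors are uncorrelated, the mean of the selected outcomes stays near the population mean. When they are correlated, the selected mean drifts away from it, which is the kind of bias a joint model of the selection and outcome equations is meant to remove.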
Journal of the American Statistical Association | 2017
Giampiero Marra; Rosalba Radice; Till Bärnighausen; Simon N. Wood; Mark E. McGovern
Estimates of HIV prevalence are important for policy to establish the health status of a country’s population and to evaluate the effectiveness of population-based interventions and campaigns. However, participation rates in testing for surveillance conducted as part of household surveys, on which many of these estimates are based, can be low. HIV positive individuals may be less likely to participate because they fear disclosure, in which case estimates obtained using conventional approaches to deal with missing data, such as imputation-based methods, will be biased. We develop a Heckman-type simultaneous equation approach that accounts for nonignorable selection, but unlike previous implementations, allows for spatial dependence and does not impose a homogeneous selection process on all respondents. In addition, our framework addresses the issue of separation, where for instance some factors are severely unbalanced and highly predictive of the response, which would ordinarily prevent model convergence. Estimation is carried out within a penalized likelihood framework where smoothing is achieved using a parameterization of the smoothing criterion, which makes estimation more stable and efficient. We provide the software for straightforward implementation of the proposed approach, and apply our methodology to estimating national and sub-national HIV prevalence in Swaziland, Zimbabwe, and Zambia. Supplementary materials for this article are available online.
Computational Statistics & Data Analysis | 2017
Giampiero Marra; Rosalba Radice
Rigby & Stasinopoulos (2005) introduced generalized additive models for location, scale and shape (GAMLSS) where the response distribution is not restricted to belong to the exponential family and its parameters can be specified as functions of additive predictors that allow for several types of covariate effects (e.g., linear, non-linear, random and spatial effects). In many empirical situations, however, modeling simultaneously two or more responses conditional on some covariates can be of considerable relevance. In this article, we extend the scope of GAMLSS by introducing a bivariate copula additive model with continuous margins for location, scale and shape. The framework permits the copula dependence and marginal distribution parameters to be estimated simultaneously and, as in GAMLSS, each parameter to be modeled using an additive predictor. Parameter estimation is achieved within a penalized likelihood framework using a trust region algorithm with integrated automatic multiple smoothing parameter selection. The proposed approach allows for straightforward inclusion of potentially any parametric continuous marginal distribution and copula function. The models can be easily used via the copulaReg() function in the R package SemiParBIVProbit. The usefulness of the proposal is illustrated on two case studies (which use electricity price and demand data, and birth records) and on simulated data.
Journal of the Operational Research Society | 2016
Raffaella Calabrese; Giampiero Marra; Silvia Angela Osmetti
We introduce a binary regression accounting-based model for bankruptcy prediction of small and medium enterprises (SMEs). The main advantage of the model lies in its predictive performance in identifying defaulted SMEs. Another advantage, which is especially relevant for banks, is that the relationship between the accounting characteristics of SMEs and the response is not assumed a priori (e.g., linear, quadratic or cubic) and can be determined from the data. The proposed approach uses the quantile function of the generalized extreme value distribution as link function as well as smooth functions of accounting characteristics to flexibly model covariate effects. Therefore, the usual assumptions in scoring models of symmetric link function and linear or pre-specified covariate-response relationships are relaxed. Out-of-sample and out-of-time validation on Italian data shows that our proposal outperforms the commonly used (logistic) scoring model for different default horizons.
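The asymmetric link mentioned in the abstract above can be sketched directly from the generalized extreme value (GEV) distribution. The code assumes the standardized form with location 0 and scale 1, so that the response function is the GEV cdf and the link is its quantile function; the function names and the shape value used in the example are hypothetical and are not taken from the paper.

```python
import math


def gev_response(eta, xi):
    """Probability from a predictor eta via the standardized GEV cdf."""
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV family as the shape tends to zero.
        return math.exp(-math.exp(-eta))
    t = 1.0 + xi * eta
    if t <= 0:
        # Outside the support: cdf is 0 below it (xi > 0) or 1 above it (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_link(p, xi):
    """GEV quantile function: maps a probability p in (0, 1) to a predictor."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if abs(xi) < 1e-12:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-xi) - 1.0) / xi


if __name__ == "__main__":
    xi = 0.25  # hypothetical shape parameter
    for eta in (-2.0, 0.0, 2.0):
        print(eta, gev_response(eta, xi))
```

Unlike the logistic response, which equals 0.5 at a predictor of zero and is symmetric around it, this response equals exp(-1) at zero for any shape value, so the curve approaches 0 and 1 at different rates. That asymmetry is what makes such a link attractive when one class, such as defaulted firms, is rare.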