Daniela Marella
Roma Tre University
Publications
Featured research published by Daniela Marella.
Computational Statistics & Data Analysis | 2008
Pier Luigi Conti; Daniela Marella; Mauro Scanu
A new matching procedure is introduced that imputes missing data by means of a local linear estimator of the underlying population regression function, which is not assumed to be linear. The procedure is compared with traditional approaches, namely hot deck methods and methods based on kNN estimators. Performance is measured by the matching noise, that is, the discrepancy between the distribution generating the genuine data and the distribution generating the imputed values.
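As an illustration of the imputation step described above, the following is a minimal Python sketch, not the authors' procedure. It assumes a single continuous matching variable X, a Gaussian kernel with a fixed bandwidth, and imputed values formed as the local linear prediction plus a residual drawn at random from the donor file; the function names local_linear_fit and impute_local_linear are hypothetical.

```python
import math
import random


def local_linear_fit(x0, xs, ys, bandwidth):
    """Local linear estimate of E[Y | X = x0] with a Gaussian kernel (assumed choice)."""
    # Kernel weights centred at x0
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    d = [x - x0 for x in xs]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * di * y for wi, di, y in zip(w, d, ys))
    # Intercept of the weighted least-squares line y ~ a + b * (x - x0)
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def impute_local_linear(donor_x, donor_y, recipient_x, bandwidth, rng):
    """Local linear prediction plus a randomly drawn donor residual
    (the residual draw is an assumption made here to preserve variability)."""
    fitted = [local_linear_fit(x, donor_x, donor_y, bandwidth) for x in donor_x]
    residuals = [y - f for y, f in zip(donor_y, fitted)]
    return [local_linear_fit(x, donor_x, donor_y, bandwidth) + rng.choice(residuals)
            for x in recipient_x]


# Usage on simulated data with a non-linear regression function
rng = random.Random(0)
donor_x = [rng.uniform(0.0, 1.0) for _ in range(200)]
donor_y = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in donor_x]
imputed = impute_local_linear(donor_x, donor_y, [0.1, 0.5, 0.9], bandwidth=0.05, rng=rng)
```

A local linear fit reproduces a linear regression function exactly, so any departure of the imputed distribution from the true one comes from curvature, bandwidth choice and the residual draw.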
Computational Statistics & Data Analysis | 2013
Pier Luigi Conti; Daniela Marella; Mauro Scanu
This paper analyzes uncertainty in statistical matching for ordered categorical variables. Statistical matching consists of estimating a joint distribution when only samples from its marginals are observed. Unless very restrictive conditions are met, the observed data do not identify the joint distribution to be estimated, and this is the source of uncertainty. The notion of uncertainty is first formally introduced, and a measure of uncertainty is then proposed. Moreover, the reduction of uncertainty in the statistical model due to the introduction of logical constraints is investigated and evaluated via simulation.
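For ordered categorical variables, the marginals alone only bound the joint distribution function H(i, j) = P(X <= i, Y <= j) between the Fréchet limits max(0, F(i) + G(j) - 1) and min(F(i), G(j)). The sketch below computes these bounds and a simple weighted average of their width; that average is a hypothetical summary chosen for illustration, not the uncertainty measure proposed in the paper.

```python
from itertools import accumulate


def frechet_bounds(p_x, p_y):
    """Frechet bounds on H(i, j) = P(X <= i, Y <= j) implied by the two marginals alone."""
    f = list(accumulate(p_x))  # marginal CDF of X over its ordered categories
    g = list(accumulate(p_y))  # marginal CDF of Y over its ordered categories
    lower = [[max(0.0, fi + gj - 1.0) for gj in g] for fi in f]
    upper = [[min(fi, gj) for gj in g] for fi in f]
    return lower, upper


def mean_bound_width(p_x, p_y):
    """Hypothetical summary: bound width averaged with weights p_x[i] * p_y[j]."""
    lower, upper = frechet_bounds(p_x, p_y)
    return sum(px * py * (upper[i][j] - lower[i][j])
               for i, px in enumerate(p_x)
               for j, py in enumerate(p_y))


# Usage: two ordered variables with three categories each
lower, upper = frechet_bounds([0.2, 0.5, 0.3], [0.4, 0.4, 0.2])
```

A logical constraint (for example a structural zero) would remove joint distributions from the set and so could only narrow these bounds.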
Communications in Statistics - Theory and Methods | 2013
Daniela Marella; Paola Vicard
In this article, Object-Oriented Bayesian Networks (OOBNs) are proposed as a tool to model measurement errors in a categorical variable caused by the respondent. A mixed measurement error model is presented, and an OOBN implementing this model is introduced. Inserting the evidence given by the observed value and propagating it throughout the network yields, for each unit, the probability distribution of the true value given the observed one. Two methods are used to predict each individual's true value, and their performance is evaluated via simulation.
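The posterior computation described above can be sketched for a single categorical variable with Bayes' rule. This is a minimal sketch under an assumed simple misclassification model (the true category is reported with probability 1 - alpha, otherwise another category is reported uniformly at random), not the mixed model or the OOBN of the article; the two predictors shown, the posterior mode and a random draw from the posterior, are illustrative choices and may differ from the methods the authors compare.

```python
import random


def posterior_true_given_observed(prior, alpha, observed):
    """P(T = t | O = observed) under an assumed misclassification model: the true
    category is reported with probability 1 - alpha, otherwise one of the other
    K - 1 categories is reported uniformly at random."""
    k = len(prior)
    likelihood = [(1.0 - alpha) if t == observed else alpha / (k - 1) for t in range(k)]
    joint = [lik * p for lik, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]


def predict_mode(posterior):
    # Deterministic prediction: the most probable true category
    return max(range(len(posterior)), key=posterior.__getitem__)


def predict_draw(posterior, rng):
    # Stochastic prediction: draw the true category from its posterior
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]


# Usage: prior over three categories, 10% misreport probability, category 1 observed
post = posterior_true_given_observed([0.5, 0.3, 0.2], alpha=0.1, observed=1)
best = predict_mode(post)
drawn = predict_draw(post, random.Random(0))
```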
Communications in Statistics - Theory and Methods | 2017
Pier Luigi Conti; Daniela Marella; Mauro Scanu
Statistical matching consists of estimating the joint characteristics of two variables that are observed in two distinct and independent sample surveys, respectively. In a parametric setup, ranges of estimates for non-identifiable parameters are the only estimable quantities, unless restrictive assumptions are imposed on the probabilistic relationship between the variables that are not jointly observed. These ranges correspond to the uncertainty due to the absence of joint observations on the pair of variables of interest. The aim of this paper is to analyze the uncertainty in statistical matching in a nonparametric setting. A measure of uncertainty is introduced and its properties are studied. This measure reflects the “intrinsic” association between the pair of variables: when the two samples are the only knowledge available on the pair, it is constant and equal to 1/6, whatever the form of the marginal distribution functions of the two variables. The measure becomes useful when uncertainty is reduced by knowledge beyond the data themselves, as in the case of structural zeros: it then detects how the additional knowledge shrinks the intrinsic uncertainty from 1/6 to smaller values, zero corresponding to no uncertainty. Sampling properties of the uncertainty measure and of the bounds of the uncertainty intervals are also proved.
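The value 1/6 can be checked numerically if, as assumed here, the measure is the integrated width of the Fréchet bounds on the copula scale: the integral over the unit square of min(u, v) - max(u + v - 1, 0), which equals 1/3 - 1/6 = 1/6. The sketch approximates that integral with the midpoint rule; the function name frechet_area is hypothetical.

```python
def frechet_area(n=400):
    """Midpoint-rule approximation of the integral over [0, 1]^2 of
    M(u, v) - W(u, v), with M(u, v) = min(u, v) and W(u, v) = max(u + v - 1, 0)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += min(u, v) - max(u + v - 1.0, 0.0)
    return total * h * h


# Usage: compare the approximation with the exact value 1/6
approx = frechet_area()
error = abs(approx - 1.0 / 6.0)
```

Because the integrand is expressed in terms of u = F(x) and v = G(y), the result does not depend on the marginal distributions, which is consistent with the constancy stated in the abstract.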
Archive | 2006
Pier Luigi Conti; Daniela Marella; Mauro Scanu
In this paper, the discrepancy between the data generating process and the imputation procedures used in statistical matching is evaluated. The imputation procedures investigated are distance hot deck and methods based on kNN, with both a fixed and a variable number of donors k. The matching noise is evaluated formally and investigated via simulation.
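A minimal sketch of the two families of donor-based imputation named above, assuming a single continuous matching variable and absolute distance. In the kNN variant each recipient receives the value of a donor drawn at random among its k nearest donors; averaging the k donor values is another common variant, and the paper's exact rules (including how a variable k is chosen) are not reproduced. Function names are hypothetical.

```python
import random


def distance_hot_deck(donor_x, donor_y, recipient_x):
    """Give each recipient the Y value of the donor closest in X (ties: first donor)."""
    imputed = []
    for x in recipient_x:
        nearest = min(range(len(donor_x)), key=lambda i: abs(donor_x[i] - x))
        imputed.append(donor_y[nearest])
    return imputed


def knn_impute(donor_x, donor_y, recipient_x, k, rng):
    """Give each recipient the Y value of a donor drawn at random among its k nearest donors."""
    imputed = []
    for x in recipient_x:
        by_distance = sorted(range(len(donor_x)), key=lambda i: abs(donor_x[i] - x))
        imputed.append(donor_y[rng.choice(by_distance[:k])])
    return imputed


# Usage: four donors, two recipients
donor_x, donor_y = [0.0, 1.0, 2.0, 3.0], [10, 20, 30, 40]
hot_deck = distance_hot_deck(donor_x, donor_y, [0.9, 2.6])  # -> [20, 40]
knn = knn_impute(donor_x, donor_y, [0.9, 2.6], k=2, rng=random.Random(0))
```

With k = 1 the kNN rule reduces to distance hot deck; larger k spreads the imputed values over more donors.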
Statistical Methods and Applications | 2017
Pier Luigi Conti; Daniela Marella; Andrea Neri
One of the most important goals of statistical matching is the estimation of the joint distribution of variables that are not jointly observed in a single sample survey but are separately available from independent sample surveys. The absence of joint information on the variables of interest leads to uncertainty about the data generating model, since the available sample information cannot discriminate among a set of plausible joint distributions. The present paper briefly reviews the concept of uncertainty in statistical matching under logical constraints, as well as how to measure uncertainty for continuous variables. The notion of matching error is related to an appropriate measure of uncertainty, and a criterion for selecting matching variables, which chooses the variables that minimize this uncertainty measure, is introduced. Finally, a method to choose a plausible joint distribution for the variables of interest via the iterative proportional fitting (IPF) algorithm is described. The proposed methodology is then applied to household income and expenditure data when extra-sample information on the average propensity to consume is available. The result is a reconstructed complete dataset in which each record includes measures of both income and expenditure.
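Iterative proportional fitting alternately rescales the rows and columns of a starting table until both sets of margins match their targets. The generic two-way version below is a sketch only; how the paper encodes the uncertainty constraints and the information on the average propensity to consume is not reproduced, and the function name ipf is hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: rescale a strictly positive seed table until its
    row and column sums match the target marginals (which must share the same total)."""
    table = [list(row) for row in seed]
    for _ in range(max_iter):
        # Scale each row to match its target total
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [v * target / s for v in table[i]]
        # Scale each column to match its target total
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
        # Columns now match exactly; stop once the rows also match
        if max(abs(sum(row) - t) for row, t in zip(table, row_targets)) < tol:
            break
    return table


# Usage: a seed encoding an association pattern, adjusted to new margins
fitted = ipf([[2.0, 1.0], [1.0, 3.0]], row_targets=[0.4, 0.6], col_targets=[0.5, 0.5])
```

IPF preserves the odds ratios of the seed table while matching the margins, which is why the choice of seed determines which plausible joint distribution is selected.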
Journal of the American Statistical Association | 2016
Pier Luigi Conti; Daniela Marella; Mauro Scanu
The goal of statistical matching is to estimate a joint distribution when only samples from its marginals have been observed. The lack of joint observations on the variables of interest is the source of uncertainty about the joint population distribution function. In this article, the notion of matching error is introduced and upper-bounded by an appropriate measure of uncertainty. Then, an estimate of the distribution function of the variables not jointly observed is constructed from a modification of the conditional independence assumption in the presence of logical constraints. The corresponding measure of uncertainty is estimated from sample data. Finally, a simulation study is performed and an application to a real case is provided. Supplementary materials for this article are available online.
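Under the classical conditional independence assumption (CIA), each file supplies one conditional distribution given the common variables Z, and the joint distribution is P(x, y) = sum over z of P(z) P(x | z) P(y | z). The sketch below computes this for categorical variables; it shows the standard CIA only, not the modified version under logical constraints proposed in the article, and the function name cia_joint is hypothetical.

```python
def cia_joint(p_z, p_x_given_z, p_y_given_z):
    """Joint P(X = x, Y = y) under conditional independence of X and Y given Z:
    the sum over z of P(z) * P(x | z) * P(y | z)."""
    nx = len(p_x_given_z[0])
    ny = len(p_y_given_z[0])
    joint = [[0.0] * ny for _ in range(nx)]
    for z, pz in enumerate(p_z):
        for x in range(nx):
            for y in range(ny):
                joint[x][y] += pz * p_x_given_z[z][x] * p_y_given_z[z][y]
    return joint


# Usage: binary Z, X and Y; P(x | z) from file A, P(y | z) from file B
joint = cia_joint([0.4, 0.6], [[0.7, 0.3], [0.2, 0.8]], [[0.9, 0.1], [0.5, 0.5]])
```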
CLADAG 2013: 9th Meeting of the Classification and Data Analysis Group | 2015
Daniela Marella; Paola Vicard
In this paper we propose using the object-oriented Bayesian network (OOBN) architecture to model measurement errors in the 2008 Italian Survey on Household Income and Wealth (SHIW) when the variable of interest is categorical. The network is used to stochastically impute microdata for households. Imputation is performed both by assuming a misreport probability that is constant over the whole population and by learning a Bayesian network to estimate that probability. Finally, the potential and possible extensions of this approach are discussed.
Archive | 2014
Daniela Marella; Paola Vicard
In this paper we propose using the object-oriented Bayesian network (OOBN) architecture to model measurement errors. We then apply the model to the 2008 Italian Survey on Household Income and Wealth (SHIW). Attention is focused on errors caused by respondents. The parameters of the error model are estimated using a validation sample. The network is used to stochastically impute microdata for households; in particular, imputation is also performed using an auxiliary variable. Performance indices for the correction procedure are computed, and they show that accounting for auxiliary information improves the results. Finally, the potential and possible extensions of the Bayesian network approach, both in the measurement error context and for official statistics problems in general, are discussed.
Communications in Statistics - Simulation and Computation | 2017
Daniela Marella; Paola Vicard
In this article, the quality of data produced by national statistical institutes and governmental institutions is considered. In particular, the problem of measurement error is analyzed, and an integrated Bayesian network decision support system based on non-parametric Bayesian networks is proposed for its detection and correction. Non-parametric Bayesian networks are graphical models that express the dependence structure via bivariate copulas associated with the edges of the graph. The network structure and the misreport probability are estimated using a validation sample. The Bayesian network model is used to decide (i) which records have to be corrected and (ii) the kind and amount of correction to be adopted. The proposed correction procedure is applied to the Banca d’Italia Survey on Household Income and Wealth, specifically to bond amounts. Finally, the sensitivity of the conditional distribution of the true value given the observed one to different evidence configurations is studied.
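For a single edge from the observed to the true value, the conditional distribution studied above can be sketched with a bivariate Gaussian copula, a common choice in non-parametric Bayesian networks; the copula family, the correlation parameter rho and the function names here are assumptions, not the article's fitted model. Both variables are expressed on the uniform scale through their marginal distribution functions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def gaussian_copula_conditional_cdf(v, u, rho):
    """C(v | u) = P(V <= v | U = u) for a bivariate Gaussian copula with correlation rho;
    u and v are the observed and true values mapped to (0, 1) by their marginal CDFs."""
    z_u = STD_NORMAL.inv_cdf(u)
    z_v = STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((z_v - rho * z_u) / (1.0 - rho ** 2) ** 0.5)


def sample_true_given_observed(u_obs, rho, rng):
    """Draw V given U = u_obs: on the normal scale, Z_V | Z_U = z ~ N(rho * z, 1 - rho^2)."""
    z_u = STD_NORMAL.inv_cdf(u_obs)
    z_v = rho * z_u + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    return STD_NORMAL.cdf(z_v)


# Usage: observed value at its 90th percentile, assumed copula correlation 0.7
rng = random.Random(42)
draws = [sample_true_given_observed(0.9, 0.7, rng) for _ in range(5)]
```

Mapping each draw back through the inverse marginal distribution function of the true value would give a corrected amount; changing u_obs or rho shows how sensitive the conditional distribution is to the evidence.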