Mind the wealth gap: a new allocation method to match micro and macro statistics for household wealth
Michele Cantarella,* Andrea Neri and Maria Giovanna Ranalli
Centre for Consumer Society Research, University of Helsinki; Economic and Financial Statistics Department, Banca d'Italia; Department of Political Science, Università degli Studi di Perugia
Abstract
The financial and economic crisis recently experienced by many European countries has increased demand for timely, coherent and consistent distributional information for the household sector. In the Euro area, most of the national central banks collect such information through income and wealth surveys, which are often used to inform their decisions. These surveys, however, may be affected by non-response and under-reporting behaviours, which lead to a mismatch with macroeconomic figures coming from national accounts. In this paper, we develop a novel allocation method which combines information from a power law (Pareto) model and imputation procedures so as to address these issues simultaneously, when only limited external information is available. Finally, we produce distributional indicators for four Euro-area countries.
Key words:
Wealth distribution, Non-response, Measurement error, Pareto distribution, Survey calibration, Household Finance and Consumption Survey.
JEL Codes:
D31, E01, E21, N3
* Corresponding author. E-mail: michele.cantarella@helsinki.fi

The financial and economic crisis recently experienced by many European countries has increased demand for timely, coherent, and consistent distributional information relating to household income and wealth. Such information is receiving high priority, especially in the agenda of national central banks (NCBs), which use it in several ways (Eurosystem Household Finance and Consumption Network, 2009). Distributional information is used for financial stability purposes, for example, to assess how much debt is concentrated in the hands of financially vulnerable households (see, for instance, Ampudia et al., 2016; Michelangeli and Rampazzi, 2016). Moreover, distributional information makes it possible to estimate the aggregate consumption response to wealth shocks when individual responses are heterogeneous (Paiella, 2007; Guiso et al., 2005) and, more generally, to understand the interplay between monetary policy measures, especially non-standard ones, and the distribution of income and wealth (Casiraghi et al., 2018; Colciago et al., 2019; Coibion et al., 2017).
Recent years have also been characterised by a surge of interest in the study of the dynamics of wealth accumulation over the last century (Garbinti et al., 2018; Alvaredo et al., 2018; Frémeaux and Leturcq, 2020).

Sample surveys are the main source of distributional information on household wealth. In the Euro area, most NCBs conduct the Eurosystem Household Finance and Consumption Survey (HFCS), which collects harmonized household-level data on households' finances and consumption (Eurosystem Household Finance and Consumption Network, 2009).

The second source of information relating to household wealth comes from national accounts, which record the stock of assets, both financial and non-financial, and liabilities at a particular point in time.

In theory, since the HFCS is designed to be representative of all households, the aggregated microdata should correspond to the macro aggregates. In practice, however, differences are large: aggregate totals based on surveys are often substantially below the totals found in national accounts. Before using the distributional information from survey data, it is therefore crucial to explain and possibly eliminate the differences between the two sources of information.

There are several possible reasons for the differences (Expert Group on Linking macro and micro data, 2020). From the survey side, two relevant issues are unit non-response and measurement error. There is substantial evidence that households' decision whether or not to participate in the survey is not random. In particular, wealthy households are difficult to contact and to convince to participate (Chakraborty et al., 2019; Kennickell, 2008, 2019; Vermeulen, 2018). Since these households own a large share of total wealth, their under-representation in the final sample is likely to result in a biased picture of the wealth distribution.

Moreover, wealth surveys generally include both complex and sensitive items.
As a consequence, respondents are not always able or even willing to report the correct amount of wealth they hold. Similar to non-response, measurement error is not random and differs across population subgroups and portfolio items (D'Alessio and Neri, 2015; Ranalli and Neri, 2011).

The ideal solution for overcoming these problems would be to link survey data with administrative records (such as tax records or credit registers, as in Blanchet et al., 2018; Garbinti et al., 2018, 2020). Alternative approaches to data linkage are directly based on the use of wealth (tax) records (Alvaredo and Saez, 2009) or on the use of capital income information from tax records to construct wealth estimates assuming certain rates of return on wealth (Saez and Zucman, 2016).

Footnote: In 2015, the European System of Central Banks (ESCB) established an expert group with the aim of comparing and bridging macro data (i.e. national accounts/financial accounts) and microdata (i.e. the Household Finance and Consumption Survey) on wealth.

Unfortunately, when such administrative records exist and are not limited in scope, they are usually not available for confidentiality reasons. Because of that, the recent literature has developed methods to combine survey data with the limited external information publicly available, such as aggregate figures from national accounts or lists of rich individuals' total wealth. Vermeulen (2018, 2016) uses Forbes World Billionaires lists in combination with some wealth surveys to estimate the total wealth held by rich households. He shows that the use of such lists increases the quality of the results (compared to estimating a Pareto model from survey data alone). Building on this approach, Chakraborty et al. (2019), Waltl (2018), and Chakraborty and Waltl (2018) extend the analysis by benchmarking survey results to the national accounts. In another recent study, Bach et al.
(2019) implement these methodologies to impute rich list data to wealth surveys.

The common assumption behind all these studies is that unit non-response of wealthy households is the only reason for the micro-macro gap.

This paper contributes to the literature which tries to produce distributional indicators of wealth that are consistent with the national accounts, by proposing a methodology that draws on existing and well-established methods. We contribute to this literature in four ways.

First, whereas previous studies focus only on the missing part of the tail, assuming existing survey observations to be representative, we claim that differential non-response also affects the representativeness of existing survey observations, which in turn affects estimates of the total number of households in the Pareto tail and of their total wealth. We propose a correction for differential non-response that accounts for the missing rich, but focuses on observed survey households. By no means does this correction substitute for the imputation (Bach et al., 2019) or simulation (Waltl, 2018) procedures developed in the literature; rather, it complements them by allowing for the correction of non-response bias among existing survey observations.

Second, while existing papers only focus on non-response at the tail of the distribution, we present a methodology that also allows us to correct for measurement error. Dealing with both aspects simultaneously is important, even when the research purpose is to estimate the share of total wealth held by wealthy households. Indeed, some rich households may misreport their true wealth and could therefore be misclassified in the adjustment process.
An advantage of our approach is that it enables us to compute distributional indicators that refer to "non-rich" households, such as those relating to financial vulnerability.

Our third contribution is that, even if we apply methods that are well-established (such as the Pareto model, imputation, and calibration), we show how to combine and use them in a single framework and how to evaluate the precision of the results.

The fourth contribution is to produce a modified and readily usable dataset in which survey values are adjusted for the above-mentioned quality issues and, by construction, the totals add up to the national accounts. While the existing papers are mainly focused on methods to estimate total wealth held at the top, our adjusted dataset can be used for estimating any distributional indicator that may be of interest.

The paper is structured as follows. Section 2 describes the data sources used in our application and motivating example. Section 3 presents the Pareto approach (section 3.1) and calibration (section 3.2) and the methodologies we use to combine them in a single approach (sections 3.3, 3.4, 3.5). Section 4 describes the tools used to assess the properties of the proposed methods. Section 5 describes how the method is applied to our data, while section 6 discusses the results and the main findings of the application. Section 7 provides some conclusions and lines for future research.
This paper uses the Household Finance and Consumption Survey (HFCS) and two sources of auxiliary information: the national accounts, which include both financial and non-financial accounts, and rich list data.

The Household Finance and Consumption Survey (HFCS) is a joint project of all the national central banks (NCBs) of the Eurosystem and several national statistical institutes (NSIs). The survey collects detailed household-level data on various aspects of household balance sheets and related economic and demographic variables, including income, private pensions, employment, and measures of consumption. The HFCS is conducted in a decentralised manner. A group of experts from the European Central Bank (ECB) and from the NCBs (the Household Finance and Consumption Network, HFCN) coordinates the whole project, ensuring the cross-country comparability of the final data.

We use the second wave of the HFCS (2014) and we restrict our analysis to four countries: Italy, France, Germany, and Finland. This choice is motivated by two considerations. First, rich lists and non-financial accounts are available for this subset of countries. Second, these surveys present methodological differences that can be used to evaluate our method. For example, some countries over-sample rich households using individual tax records (as in the French and Finnish surveys) or using information at the regional level (as in the German one), while others do not over-sample (as in the Italian case). Moreover, in some cases, the survey is linked with administrative data (as in the Finnish one). In both cases, over-sampling and use of administrative records, we should expect a smaller effect of the adjustment method.

Our variable of interest is household net wealth, defined as the sum of deposits, bonds, shares, mutual funds, money owed to the household, the value of insurance policies and pension funds, business wealth, and housing wealth, minus debts.

The second source of information is national accounts.
The financial component (financial accounts) is produced by NCBs and reports the total financial assets and liabilities held by households, classified by financial instrument, in order of liquidity based on the original maturity and negotiability (from cash to deposits and insurance and pension instruments). Non-financial accounts are produced by NSIs and contain the total value of dwellings, other buildings and structures, and land owned by households. Even if national accounts figures may suffer from quality issues and may adopt different concepts and definitions from the ones used in the survey, we use them as a benchmark to correct survey data.

Rich lists are our third source of information. They have already been used in the literature to adjust for missing rich households (Vermeulen, 2018; Chakraborty and Waltl, 2018). Their use may generate concerns, since the methodology adopted is often obscure and usually only figures for net worth are provided, with no financial instrument breakdown. Some studies have tried to overcome this issue by using different types of Pareto adjustments (Blanchet et al., 2017; Waltl, 2018). Other studies (such as Schröder et al., 2019) have also explored new ways of sampling high-wealth individuals with adequate precision. However, these methods can only be employed in specific instances when information on these households exists and is easily accessible. When these sources are not available, rich lists remain a reliable alternative, and evidence from Waltl (2018) indicates that, after the integration with rich lists, there might be little difference between the wealth estimated by different Pareto adjustments.

In our case, we use wealthy household data from the 2014 Forbes' Billionaires List. This information has been replaced by that from larger region-specific lists, such as the 2014 editions of Challenges' "Les 500 plus grandes fortunes de France" for France, Manager Magazin's list for Germany and Arvopaperi's list for Finland, when available. We also adjust this rich list data by estimating debts and portfolio composition, based on portfolio shares from top wealth observations in the HFCS. In this way, estimates for portfolio compositions among top fortunes can be obtained, and rich list data can be fully integrated with the HFCS for estimation purposes.
Footnote: The portfolio allocation applied to rich list data is a simplifying assumption. An improvement over this form of portfolio allocation can be offered by the approach used in Chakraborty and Waltl (2018).

Let $w$ be household net wealth and $t(w)$ the population total to be estimated using survey data. Let $\hat{t}(w) = \sum_{i=1}^{S} d_i w_i$ be the Horvitz-Thompson estimator, where $d_i$ is the sampling weight and $w_i$ the net wealth of each household $i$ in the sample of respondents $S = \{1, 2, \ldots, S\}$, ordered by net wealth rank.

Because of unit non-response and measurement error, the expected value of the Horvitz-Thompson estimator $\hat{t}(w)$ is generally lower than $t(w)$, the corresponding macro figure. Unit non-response occurs when some households refuse to participate in the survey. If this decision is related to household wealth (i.e. richer households are more difficult to enrol in the survey than others), the sample of respondents $S$ may not adequately represent the upper tail of the distribution. Measurement error occurs when the information collected in the survey, $w$, differs from the true unknown value $w^{*}$. The error term $(w^{*} - w)$ may depend on many factors, such as respondents' difficulty in recalling the required information or their unwillingness to report their true wealth.

Our methodology to address these issues is based on two techniques that are well-established in the literature. We use the Pareto distribution to compensate for unit non-response of wealthy households (section 3.1), and the calibration methods commonly used in survey sampling to deal with the issue of measurement error (section 3.2).

The two correction methods depend on each other and must be implemented simultaneously. The Pareto correction starts with an assessment of the rich households available in the survey. Because of measurement error, some households could be misclassified and therefore a preliminary calibration adjustment is required.
On the other hand, calibration is used again to adjust for measurement error across the whole distribution, which requires that the survey adequately represents the upper tail of the distribution.

Our solution for conducting the two adjustments simultaneously is to run them in an iterative process, based on the procedure described in the following sections.

The final product of the methodology is an adjusted survey data set with total estimates of net wealth, real assets, financial assets and liabilities that match the aggregate figures in the national accounts balance sheet. This data set can be used to compute several distributional indicators of interest.

Before applying the method, we reclassify some definitions of wealth items used in the survey data in order to remove as many of the conceptual differences with national accounts as possible (see for instance EG-LMM, 2017; Chakraborty et al., 2019). In particular, we remove from national accounts totals the wealth held by non-profit institutions serving households (NPISHs), and we only focus on the items with the highest level of comparability.
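As a minimal illustration of the Horvitz-Thompson total defined at the start of this section, the sketch below computes the weighted survey total and its ratio to a macro aggregate. The function names and the toy figures are assumptions made for illustration only, and the "coverage ratio" is taken here to mean the survey total divided by the macro total.

```python
def horvitz_thompson_total(values, weights):
    """Weighted survey total: sum of d_i * w_i over responding households."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(d * w for d, w in zip(weights, values))


def coverage_ratio(values, weights, macro_total):
    """Share of the macro aggregate reproduced by the survey total (assumed definition)."""
    return horvitz_thompson_total(values, weights) / macro_total


# Toy data (hypothetical): three households and a made-up macro aggregate.
net_wealth = [100.0, 200.0, 1000.0]
sampling_weights = [10.0, 5.0, 1.0]
survey_total = horvitz_thompson_total(net_wealth, sampling_weights)
ratio = coverage_ratio(net_wealth, sampling_weights, macro_total=4000.0)
```

A ratio below one corresponds to the situation described above, in which the survey total falls short of the national accounts figure.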
The Pareto adjustment assumes that, over a certain wealth threshold ($w_0$), the complementary cumulative distribution function (CCDF) of wealth is approximated by a power law, so that (for $w_i \geq w_0$) the cumulative distribution function can be expressed as:

$$P(W \leq w_i) = 1 - \left(\frac{w_0}{w_i}\right)^{\alpha} \tag{1}$$

where the parameter $\alpha \in \mathbb{R}^{+}$ indicates the shape of the tail. The lower the value of $\alpha$, the fatter the tail and the more concentrated wealth is.

The first step of the adjustment is the estimation of the threshold $w_0$. Previous research has often adopted the arbitrary threshold of €1 million and, as a robustness check, of €1.5 or €2 million. We relax this assumption by using a less arbitrary method, based on the properties of the mean excess function (Yang, 1978):

$$E[W - w_i \mid W > w_i] = \frac{\sum_{j=1}^{i} d_j (w_j - w_i)}{\sum_{j=1}^{i} d_j} \tag{2}$$

with $j \leq i$. The expectation expressed by this function is estimated with the weighted mean of the deviation from $w_i$ for all observations $j$ whose wealth exceeds $w_i$. Essentially, every value of $w_i$ is treated as a possible threshold when the corresponding expected value $E[W - w_i \mid W > w_i]$ is estimated.

A useful property of this function is its linearity in $w_i$ if the distribution is Pareto (Yang, 1978; Davison and Smith, 1990). Following from this property, we estimate $E[W - w_i \mid W > w_i]$ for each value of $w_i$ in $S$, and then we find the threshold after which the mean excess function is linear in $w_i$. This can be achieved by selecting the value $w^{*}$ for which the R-squared of the linear regression of $E[W - w_i \mid W > w_i]$ on $w_i$ is maximised (Langousis et al., 2016).

It is worth stressing that the threshold $w_0$ is the point where the Pareto distribution starts, which differs from the truncation point ($w_1$) after which the survey has no rich households. Indeed, survey data will generally include observations in the bottom part of the Pareto tail while missing those at the very top of the distribution.
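The threshold search based on equation (2) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the unweighted regression and the `min_points` cut-off (a hypothetical minimum number of observations per candidate fit) are all assumptions.

```python
import numpy as np


def mean_excess(wealth, weights):
    """Weighted mean excess for every observation, richest first (eq. 2)."""
    w = np.asarray(wealth, dtype=float)
    d = np.asarray(weights, dtype=float)
    order = np.argsort(-w)                 # rank 1 = richest household
    w, d = w[order], d[order]
    cum_d = np.cumsum(d)                   # sum of d_j for j <= i
    cum_dw = np.cumsum(d * w)              # sum of d_j * w_j for j <= i
    excess = (cum_dw - w * cum_d) / cum_d  # sum d_j (w_j - w_i) / sum d_j
    return w, d, excess


def select_threshold(wealth, weights, min_points=5):
    """Candidate threshold whose upper sample gives the highest R-squared
    in a simple (unweighted) linear fit of mean excess on wealth."""
    w, _, excess = mean_excess(wealth, weights)
    best_w, best_r2 = None, -np.inf
    for k in range(min_points, len(w) + 1):
        x, y = w[:k], excess[:k]           # the k richest observations
        if np.ptp(x) == 0 or np.ptp(y) == 0:
            continue                       # R-squared undefined without variation
        r2 = np.corrcoef(x, y)[0, 1] ** 2  # equals R-squared in a simple regression
        if r2 > best_r2:
            best_w, best_r2 = w[k - 1], r2
    return best_w, best_r2
```

Each candidate threshold is the wealth of the poorest household included in the corresponding fit, so the search treats every observed wealth value as a possible starting point of the Pareto tail.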
In the presence of truncation, the relationship between mean excesses and wealth tends to show a downward bias the closer we get to the truncation point (see Aban et al., 2006). To account for this issue, we weight the regression of $E[W - w_i \mid W > w_i]$ on $w_i$ by the sum of survey weights for all $j \leq i$.

After the threshold $w_0$ is found, the shape parameter $\alpha$ can be estimated using the method described in Vermeulen (2018).

Define $S_T = \{1_T, 2_T, \ldots, m_T\}$ as the sub-sample of respondents with wealth higher than $w_0$. The rich list $S_R = \{1_R, 2_R, \ldots, m_R\}$ and the sample $S_T$ are appended, creating a new file $S_I = \{1_I, 2_I, \ldots, m_I\}$ with $m_I$ observations. For simplicity, we will drop the sample subscript from now on. Households are again ordered by wealth rank $i$, where the lower the rank, the higher the household's wealth. So the rank of the richest household in the sample is one, the rank of the second richest is two, and so on until household $m$, whose wealth $w_m$ equals the threshold $w_0$, is reached.

Survey weights are taken into account by assigning to observations in the rich list the weight $d_i = 1$, while survey observations (a subset of the sample $S$) retain their original survey weight. Denote by $\bar{D}$ the average survey weight of all observations in sample $S_I$ (i.e. $\bar{D} = \sum_{j=1}^{m} d_j / m$). Denote the sum of all weights as $D = \sum_{j=1}^{m} d_j$, representing an estimate of the number of households that have wealth at least as high as $w_0$. Define $\bar{D}_i$ as the average weight of the first $i$ sample points (i.e. $\bar{D}_i = \sum_{j=1}^{i} d_j / i$).

Linear estimates for $\alpha$ can then be obtained through the following least squares specification (see also Gabaix and Ibragimov, 2011):

$$\ln\left( (i - 1/2)\, \bar{D}_i / \bar{D} \right) = C - \alpha \ln(w_i) \tag{3}$$

As discussed earlier, Chakraborty and Waltl (2018) and Waltl (2018) showed that this estimator produces unbiased and consistent estimates of $\alpha$ when information on top tail observations is provided. The rich list sample is only used for the estimation of the Pareto tail parameters $\alpha$ and $w_0$. Afterward, the adjustment method is applied to survey sample $S$.

Footnote: As a robustness check, we also used the 1 million threshold to estimate the Pareto shape parameter and ran all our adjustment methods afterward. These conservative estimates are very close to the ones obtained using our estimated threshold and are available on request.

The third step of the adjustment consists of estimating the total wealth in the top tail $\hat{t}(w; top)$ by multiplying the total number of rich households $D$ resulting from the $S$ sample by the mean of the estimated Pareto distribution (given by $\alpha w_0 / (\alpha - 1)$ for $\alpha > 1$). We will later use this information to calibrate the sampling weights of rich households in the survey to the total wealth implied by the Pareto adjustment.

This approach assumes that the sample estimate of $D$ (the total number of rich households) is unbiased. In practice, however, some households have zero probability of being included in the survey (the missing tail from now on) once wealth exceeds the truncation point $w_1$. This may be due to the difficulties in contacting such rich households to even negotiate an interview, or to a specific decision by the data producer to exclude them for operational or confidentiality reasons. Appended rich list observations will rarely be representative of all missing households.
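The rank regression in equation (3) and the Pareto mean used in the third step can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the input is assumed to already contain only tail observations (survey households with their weights plus rich list entries with weight 1), and ordinary least squares is run with `numpy.polyfit`.

```python
import numpy as np


def estimate_alpha(wealth, weights):
    """Least squares estimate of the Pareto shape parameter from eq. (3)."""
    w = np.asarray(wealth, dtype=float)
    d = np.asarray(weights, dtype=float)
    order = np.argsort(-w)                   # rank 1 = richest household
    w, d = w[order], d[order]
    m = len(w)
    rank = np.arange(1, m + 1)
    avg_weight_to_i = np.cumsum(d) / rank    # average weight of the first i points
    avg_weight = d.sum() / m                 # average weight of all points
    lhs = np.log((rank - 0.5) * avg_weight_to_i / avg_weight)
    slope, _intercept = np.polyfit(np.log(w), lhs, 1)
    return -slope                            # eq. (3): lhs = C - alpha * ln(w_i)


def pareto_mean(alpha, w0):
    """Mean of a Pareto distribution with threshold w0, defined for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the Pareto mean is finite only for alpha > 1")
    return alpha * w0 / (alpha - 1.0)
```

Multiplying `pareto_mean(alpha, w0)` by an estimate of the number of households above the threshold gives the tail wealth total described in the third step.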
The presence of differential non-response also implies that observed households in the Pareto tail are under-represented, as the probability of a household being interviewed approaches zero the closer its wealth is to the truncation point.

As a result of the underestimation of the number of households in the Pareto tail, total wealth in the tail will also be underestimated. We therefore propose a novel method for estimating the number of missing rich households and their wealth.

Consider the sample $S_T$ of Pareto-tailed households ordered by their wealth, and recall that $w_1$ is the truncation point above which there are no rich households in the sample. Following from the Glivenko-Cantelli theorem, because of the truncation, the empirical cumulative distribution function resulting from this sample differs from the theoretical distribution implied by the Pareto adjustment.

In particular, the following relation holds:

$$\inf_{i \in R} \left[ \frac{D - D_{i-1}}{D} - \left( 1 - \left( \frac{w_0}{w_i} \right)^{\alpha} \right) \right] \geq 0 \tag{4}$$

where $D_{i-1}$ is the sum of weights of all households richer than $w_i$ ($D_i = \sum_{j=1}^{i} d_j$) and $D$ is the sum of the survey weights of observations in the survey Pareto tail (so that $D - D_{i-1} = \sum_{j=i}^{m} d_j$). This relation means that the empirical CDF will always suffer from a bias greater than or equal to zero, since units whose wealth exceeds $w_1$ are unobserved.

The theoretical Pareto CDF can then be used to correct the survey-based estimate by dividing the cumulative sum of survey weights at any point by the value of the Pareto CDF at that point:

$$t(d_i; top) \approx \frac{D - D_{i-1}}{1 - (w_0 / w_i)^{\alpha}}. \tag{5}$$

Analytically, the estimate from equation (5) should be the same for each $i$-th observation in the tail. In practice, with empirical data, variability in survey weights will affect the estimate of the number of households in the tail.
Because of differential non-response, this becomes a particularly relevant problem, as weight quality can deteriorate the closer observed wealth gets to the truncation point. The estimate can then be improved by estimating $t(d_i; top)$ for each value of wealth over a range of top tail observations, and then taking the mean $\hat{t}(d; top)$ as follows:

$$\hat{t}(d; top) = \frac{1}{m} \sum_{i=1}^{m} \frac{D - D_{i-1}}{1 - (w_0 / w_i)^{\alpha}} \tag{6}$$

An estimator of the number of missing, unobserved households beyond the truncation point can be computed as $\hat{t}(d; miss) = \hat{t}(d; top)\,(w_0 / w_1)^{\alpha}$. To account for these missing households, the total number of observable households is estimated as $\hat{t}(d; obs) = \hat{t}(d; top)\,(1 - (w_0 / w_1)^{\alpha})$.

Finally, the total wealth in the top tail $\hat{t}(w; top)$ can be estimated by the product of the estimated number of households and the Pareto mean:

$$\hat{t}(w; top) = \frac{\alpha w_0}{\alpha - 1}\, \hat{t}(d; top) \tag{7}$$

Wealth in the missing part of the tail can similarly be computed as $\hat{t}(w; miss) = \hat{t}(d; miss)\, \alpha w_1 / (\alpha - 1)$, setting the new threshold at the truncation point $w_1$.

Calibration is a method whose aim is to correct the sampling weights $d_i$ through re-weighting while keeping the individual responses $w_i$ unchanged (Deville and Särndal, 1992; Särndal, 2007). In the literature, this approach is referred to as design-based and it is mainly used: (i) to force consistency of certain survey estimates with known population quantities; (ii) to reduce non-sampling errors such as non-response errors and coverage errors; (iii) to improve the precision of estimates (Haziza et al., 2017).

Alternatively, the so-called model-based approach aims at adjusting the individual responses collected through the survey, $w_i$, while sampling weights $d_i$ are left unchanged. It requires a model for the distribution of the measurement error and auxiliary information to estimate the parameters of the model. Among the several models available in the literature, those most suitable for our purposes are imputation methods. For a general description, see the seminal works by Rubin (1976, 1987).

The two approaches have some shared traits, so that the distinction is not always clear-cut. For example, the weighting adjustment can also be seen as a method of imputation that compensates for the missing responses by using those of the respondents with the most similar characteristics; in the same way, the imputation of plausible estimates in lieu of respondents' claimed values can be thought of as a re-weighting method.

The choice of the method of adjustment is driven by three factors. First, it depends on the estimator of interest. For example, if the interest is to estimate the share of total wealth held by rich households, the use of the Pareto method (as described in section 3.1) could be sufficient.
Second, the choice depends on the magnitude of the gap to fill and the reasons behind it. If the gap is considerable and depends on both measurement error and non-response, one single approach may not be sufficient. Therefore, one may need to combine several methods. Finally, the choice depends on the information that is available. If, for example, the only available auxiliary information is in the form of population totals, then the calibration approach might be the only feasible way. However, if auxiliary data are available at the individual level, then the model-based methods may represent the most effective solution.

Footnote: Setting the new threshold at the truncation point is possible because the Pareto shape parameter does not change along the Pareto distribution.

In the design-based approach, calibration for estimating the population total of a variable of interest is formulated as the following optimisation problem for finding a new set of weights $d_i^{*}$:

$$\min_{d_i^{*}} \sum_{i=1}^{S} c_i\, G(d_i^{*}; d_i) \quad \text{s.t.} \quad t(y) = \sum_{i=1}^{S} d_i^{*} y_i, \tag{8}$$

where $d_i^{*} = d_i a_i$, $c_i G(d_i^{*}; d_i)$ is a distance function between the basic design weights and the new calibrated weights, $c_i$ are known constants whose role will be discussed in more detail later, and $y$ represents an auxiliary variable, possibly vector valued. The adjustment factor $a_i$ is a function of the sample values of the variables used in the calibration procedure, $y_i = (y_{i1}, y_{i2}, \ldots, y_{ik})$, and it is computed so that the final weights meet the benchmark constraints, $t(y)$, while, at the same time, being kept as close as possible to the initial ones. Closeness can be defined by means of several distance functions (see table 1 in Deville and Särndal, 1992), the most common being the chi-squared type

$$c_i\, G(d_i^{*}; d_i) = \frac{(d_i^{*} - d_i)^{2}}{d_i c_i} \tag{9}$$

for which an analytical solution always exists.
The benchmark constraints are defined with respect to $t(y) = (t(y_1), t(y_2), \ldots, t(y_k))$, that is, the known vector of population totals or counts of the calibration variables.

The final output is a single new set of weights to be used for all variables. The magnitude of the adjustment factors, and therefore the variability of the final set of weights, is a function of the number of constraints (the dimension $k$ of the vector $t(y)$) and the imbalance (the difference between the Horvitz-Thompson estimate and the population total). Highly variable weights hinder the quality of final estimates for sub-populations and for variables that are not involved in the calibration procedure. For these reasons, weights are usually required to meet range restrictions, such as being positive and/or within a chosen range. This can be achieved by suitably choosing and tuning the distance function $G(\cdot)$.

The method was originally proposed to improve the efficiency of the estimators and to ensure coherence with population information, but it was then also widely applied to adjust for non-response (Särndal and Lundström, 2005). For example, Little and Vartivarian (2005) showed that if the variables used to construct the weights are associated both with non-participation and with the variable of interest, the bias and the variance of the estimator are reduced.

The main problem with the use of household balance sheet data in re-weighting methods is that wealth is generally skewed and concentrated in the hands of a small group of the population that has both a low propensity to participate in the survey and different socio-demographic characteristics from the average population.
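For the chi-squared distance, the calibration problem in (8) has a closed-form solution, which the sketch below illustrates. It assumes the distance takes the form $(d_i^{*} - d_i)^2 / (d_i c_i)$ and imposes no range restrictions, so calibrated weights are not guaranteed to stay positive; the function name and argument layout are hypothetical.

```python
import numpy as np


def linear_calibration(design_weights, aux, totals, c=None):
    """Calibrate weights to known totals under a chi-squared distance.

    aux is an (S, k) array of calibration variables and totals the k benchmarks.
    Returns the calibrated weights d_i * a_i and the adjustment factors a_i.
    """
    d = np.asarray(design_weights, dtype=float)
    Y = np.asarray(aux, dtype=float)
    t = np.asarray(totals, dtype=float)
    c = np.ones_like(d) if c is None else np.asarray(c, dtype=float)

    gap = t - Y.T @ d                       # benchmark minus Horvitz-Thompson totals
    M = (Y * (d * c)[:, None]).T @ Y        # sum_i d_i c_i y_i y_i'
    lam = np.linalg.solve(M, gap)           # Lagrange multipliers
    a = 1.0 + c * (Y @ lam)                 # a_i = 1 + c_i * y_i' lambda
    return d * a, a


# Toy example (hypothetical numbers): calibrate to a household count and a wealth total.
d0 = np.array([10.0, 10.0, 10.0, 10.0])
Y0 = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 10.0]])
d_star, a = linear_calibration(d0, Y0, totals=[44.0, 200.0])
```

The first column of `Y0` is a constant, so the first constraint fixes the number of households, while the second fixes the wealth total.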
We begin by exploiting the information obtained after fitting a Pareto distribution, as in subsection 3.1, to adjust the wealth distribution in the survey for differential non-response using the calibration methods described in section 3.2.

We proceed by using $\hat{w}_0$ and $\hat{\alpha}$ and equation (6) to estimate the total number of observable households over the threshold, $\hat{t}(d; obs)$, and their total net wealth, $\hat{t}(w; obs)$ (and the corresponding figures for households below the threshold $w_0$).

We then calibrate the sampling weights from sample $S$ using the following constraints:

$$t(y) = \left( \hat{t}(w; obs),\ \hat{t}(d; obs),\ t(w; bot),\ \hat{t}(d; bot),\ t(x) \right) \tag{10}$$

where $\hat{t}(d; obs)$ is the estimated number of observed households in the Pareto tail, $\hat{t}(d; bot)$ relates to the observations not in the tail, $\hat{t}(w; obs)$ is the estimated observable wealth in the Pareto tail, $t(w; bot)$ is a vector of Horvitz-Thompson estimators decomposing the initial wealth of observations below the threshold into their corresponding portfolio items, and $t(x)$ is a vector of population counts for demographic characteristics.

Let the indicator variable $I_i = 1$ for $w_i \geq w_0$ and $I_i = 0$ otherwise, then set the auxiliary variables vector for calibration to

$$y_i = \left( w_i I_i,\ I_i,\ w_i (1 - I_i),\ 1 - I_i,\ x_i \right) \tag{11}$$

After calibrating survey data to these parameters, we obtain non-response adjusted weights $d^{*}$. This approach will be referred to as 'Pareto-calibration' from now on.

Should the survey be suffering from differential non-response issues only, this step might be sufficient to fill the gap with financial accounts. However, this is not always the case: provided that we have a good approximation of the wealth distribution in the tail, the remaining differences in coverage between the estimate obtained in equation (7) and the national accounts can then be attributed to measurement error.
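The tail quantities that feed the constraints in (10) follow from equations (6) and (7), and can be sketched as below. Several choices here are assumptions rather than the authors' implementation: the truncation point defaults to the largest observed wealth in the tail, observations whose Pareto CDF is zero (wealth equal to the threshold) are skipped because equation (5) is undefined there, the mean is taken over all remaining tail observations, and observable wealth is computed as top-tail wealth minus missing wealth.

```python
import numpy as np


def tail_totals(wealth, weights, w0, alpha, w_trunc=None):
    """Household counts and wealth totals in the Pareto tail (eqs. 6 and 7)."""
    w = np.asarray(wealth, dtype=float)
    d = np.asarray(weights, dtype=float)
    in_tail = w >= w0
    order = np.argsort(-w[in_tail])               # rank 1 = richest household
    w, d = w[in_tail][order], d[in_tail][order]
    if w_trunc is None:
        w_trunc = w[0]                            # assumed truncation point

    D = d.sum()
    weight_at_or_below = D - (np.cumsum(d) - d)   # D - D_{i-1}
    pareto_cdf = 1.0 - (w0 / w) ** alpha
    usable = pareto_cdf > 0                       # eq. (5) undefined at w_i = w0

    n_top = float(np.mean(weight_at_or_below[usable] / pareto_cdf[usable]))
    n_miss = n_top * (w0 / w_trunc) ** alpha
    n_obs = n_top - n_miss
    wealth_top = n_top * alpha * w0 / (alpha - 1.0)
    wealth_miss = n_miss * alpha * w_trunc / (alpha - 1.0)
    return {
        "n_top": n_top,
        "n_miss": n_miss,
        "n_obs": n_obs,
        "wealth_top": wealth_top,
        "wealth_miss": wealth_miss,
        "wealth_obs": wealth_top - wealth_miss,   # assumed: top minus missing
    }
```

The entries `n_obs` and `wealth_obs` play the role of the household count and wealth benchmarks for the observable part of the tail in the calibration constraints.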
In order to correct for measurement error, we combine the adjustment for differential non-response described in subsection 3.3 with the following procedure.
The first step is to run the Pareto-calibration adjustment, as described earlier. Let $d^*_i$ be the final weight from the non-response adjustment procedure.
As a second step, we run a calibration procedure as in (8) in which (i) the $d^*_i$'s are considered to be the basic weights and (ii) the set of benchmark constraints $t(y)$ is given by the macro aggregates. The adjustment factor $a_i$, for $i = 1, \ldots, S$, obtained by this procedure is such that
$$\sum_{i=1}^{S} d^*_i a_i y_i = t(y) \tag{12}$$
We apply this adjustment factor directly to the variables of interest so that
$$y^*_i = a_i y_i. \tag{13}$$
This approach shares similar traits with reverse calibration, introduced by Chambers and Ren (2004) to deal with outlier-robust imputation.
Recall that $y_i$ is vector-valued. Then, note that this calibration is multivariate because it accounts for all constraints with respect to macro estimates in a single procedure and, by (13), the same factor $a_i$ applies to every component of $y_i$.
Footnote on the bottom-tail benchmark: When weights in the bottom part of the distribution are calibrated to the initial, unadjusted wealth in that part of the survey, average wealth among these observations will increase. To account for this issue, the calibration benchmark could be adjusted by subtracting $t(w;\mathrm{bot}) - \left(t(d;\mathrm{bot}) - \hat{t}(d;\mathrm{bot})\right) t(w;\mathrm{bot}) / t(d;\mathrm{bot})$. However, the disparity between the number of households in the Pareto tail and the ones in the bottom part of the distribution is so large that this adjustment is unlikely to affect our analysis. Therefore, in order not to over-stress the computational requirements of the model and to focus on the part of the distribution where the effect of Pareto-calibration is significant, wealth in the non-Pareto part of the survey has been kept fixed.
In addition, every household has a different adjustment factor $a_i$ that depends on all the values of $y$.
A special case of multivariate calibration is proportional allocation, which consists of allocating the gap by multiplying each component of $y_i$ by the inverse of the corresponding item-specific coverage ratio. This equivalence sheds some light on the role of the constants $c_i$ in the distance function (8). In univariate calibration, if they are chosen to be the inverse of the variable in the constraint, then the adjustment factors are shrunk towards a common value for all households, as in proportional allocation. On the contrary, if they are set to be constant, the adjustment factors would be roughly proportional to the values of the item. For this reason, in the proposed multivariate calibration for imputation, we have set the constants to possibly depend on the wealth of the household, that is
$$c_i = \left(\frac{1}{w_i}\right)^{\tau}, \tag{14}$$
where $\tau \geq 0$ can be seen as a shrinkage factor: larger values provide adjustment factors that are more uniform across households, while values towards 0 provide adjustment factors with a higher variability and correlation with $w_i$.
In order to account for the missing wealthy households, a single observation with weight $\hat{t}(d;\mathrm{miss})$ and wealth $\hat{t}(w;\mathrm{miss}) / \hat{t}(d;\mathrm{miss})$ is created and imputed at the top of the sample. This observation's portfolio is also allocated using portfolio shares in the Pareto tail of the distribution.
At the end of the multivariate calibration the gap is filled. However, the distribution of $w_i$ has changed, because its components have changed. Some households which were initially classified as not rich may have moved into the top tail of the wealth distribution. Therefore, we need to find the new Pareto threshold and apply again the Pareto-calibration procedure described earlier.
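The role of the shrinkage factor $\tau$ in equation (14) can be illustrated in the univariate case, where total wealth is the only benchmark. The sketch below assumes the chi-squared distance $\sum_i d_i (a_i - 1)^2 / c_i$, which is consistent with the stated equivalence with proportional allocation but is not reproduced from the paper's equations (8) and (9); the function name `univariate_factors` and the data are hypothetical, and wealth must be strictly positive.

```python
import numpy as np

def univariate_factors(d, w, total, tau):
    """Adjustment factors a_i = 1 + c_i * w_i * lam with c_i = (1 / w_i) ** tau."""
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    c = (1.0 / w) ** tau                                  # equation (14)
    lam = (total - np.sum(d * w)) / np.sum(d * c * w**2)  # single multiplier
    return 1.0 + c * w * lam

# Toy data (hypothetical): the macro total is 50% above the survey total
d = np.array([100.0, 100.0, 100.0])
w = np.array([10.0, 100.0, 1000.0])
total = 1.5 * np.sum(d * w)

a_uniform = univariate_factors(d, w, total, tau=1.0)  # proportional allocation
a_steep = univariate_factors(d, w, total, tau=0.0)    # factors increase with w
```

With `tau=1.0` every household receives the same factor (the inverse of the coverage ratio), while with `tau=0.0` most of the gap is allocated to the wealthiest households; both choices satisfy the benchmark exactly.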
Re-estimating the threshold after each adjustment requires an iterative procedure that alternates a Pareto-calibration step that improves coverage and a multivariate calibration step that addresses measurement error. The two steps are iterated until convergence. Convergence has been set on the parameter $\alpha$ of the Pareto distribution: if the estimated values in two consecutive steps differ by less than a small predefined threshold, the procedure stops.
Footnote on proportional allocation: In fact, if we focus on a single item, $y$, the adjustment factor used by proportional allocation can be obtained as the solution to a univariate calibration procedure in which (i) the starting weights are again the $d^*_i$'s, (ii) there is only one benchmark constraint $\sum_{i=1}^{S} d^*_i a_i y_i = t(y)$, and (iii) the distance function $G(\cdot)$ is chi-squared as in (9) with constants $c_i = 1/y_i$. The proof is omitted for brevity, but it is close in spirit to Example 1 in Deville and Särndal (1992).
Footnote on the choice of $\tau$: For this work, we set $\tau = 1$. Future research might seek to retrieve information on $\tau$ using external data where no misreporting behaviour is present.
Footnote on convergence: It is worth stressing that the convergence of the process might not be achieved, especially when the gap to be filled is sizeable.
If one is willing to assume that (1) the relative error is independent of the observed wealth, at least among the very rich, and that (2) the relative error converges in probability to a constant, $(w^* - w)/w \xrightarrow{p} \zeta$, so that, on average, the unobserved 'true' total wealth will be given by $\hat{w}^*_i = \zeta w_i$, provided that $\zeta \perp w$, the method simplifies. Thanks to Slutsky's theorem, survey wealth would still be Pareto distributed with tail parameter $\alpha$ after adjusting for measurement error. It follows that total wealth in the survey would scale up to $\sum_{i=1}^{S} \zeta d^*_i w_i$, and the Pareto CDF would turn into $F_\alpha(\zeta w_i) = 1 - (\zeta w_0 / \zeta w_i)^{\alpha}$.
Simplifying this last formula and updating equation (7) for measurement error, we obtain the following estimate for total wealth:
$$\zeta \hat{t}(w) = \zeta \left( \frac{\alpha w_0}{\alpha - 1}\, \hat{t}(d;\mathrm{top}) + \sum_{i=s}^{S} d^*_i w_i \right) \tag{15}$$
This means that our estimate for $\alpha$ does not depend on the scaling of the variables. In this case, the coefficient for the Pareto-adjusted coverage ratio, given the national accounts total wealth, as in $\zeta = t(w)/\hat{t}(w)$, will yield the scalar by which to rescale reported survey wealth. It is straightforward that, to account for the missing wealth, wealth should be scaled to $\zeta(\hat{t}(w) - \hat{t}(w;\mathrm{miss}))$, which, after Pareto-calibration, simplifies to $\zeta \sum_{i=1}^{S} d^*_i w_i$.
As the Pareto shape parameter is unaffected by the re-scaling, the iterative procedure would no longer be needed. The adjustment for measurement error and for non-response at the tail of the distribution can be run independently from each other.
In theory, because of the assumptions mentioned above, whichever adjustment method for measurement error is used, the final data should still be Pareto distributed among rich households.
In practice, if one wants to make sure that this is the case, it is advisable to correct for measurement error using calibration in a slightly different manner than the one described in section 3.2. Traditional calibration methods find the optimal adjustment factor $a_i$ which minimises the quadratic distortion of new weights relative to prior ones. We propose to change the objective function so that the adjustment factor $a_i$ is chosen to minimise a quadratic loss function for reported wealth values, as follows:
$$\min \sum_{i=1}^{S} \frac{(a_i w_i - w_i)^2}{w_i} \quad \text{s.t.} \quad \zeta \sum_{i=1}^{S} d_i w_i = \sum_{i=1}^{S} d^*_i a_i w_i \tag{16}$$
In this method, the correction for measurement error is based on univariate calibration using total wealth as a sole benchmark. As the objective function minimises distortions relative to the initial reported value, the final imputed data will be Pareto distributed.
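The scale invariance of the tail parameter under the simplifying assumptions can be checked numerically. The sketch below uses the weighted maximum-likelihood (Hill-type) estimator of $\alpha$ as a stand-in, not the regression method with imputed rich list used in the paper; the function name `weighted_pareto_alpha`, the simulated data and the value of $\zeta$ are hypothetical.

```python
import numpy as np

def weighted_pareto_alpha(w, d, w0):
    """Weighted maximum-likelihood estimate of the Pareto shape above w0."""
    w = np.asarray(w, dtype=float)
    d = np.asarray(d, dtype=float)
    tail = w >= w0
    return np.sum(d[tail]) / np.sum(d[tail] * np.log(w[tail] / w0))

rng = np.random.default_rng(0)
w0, alpha_true = 1_000_000.0, 1.5
# numpy draws Lomax variates; (1 + draw) * w0 is classical Pareto with scale w0
w = w0 * (1.0 + rng.pareto(alpha_true, size=5_000))
d = np.full(w.size, 10.0)  # equal weights, for illustration only

zeta = 1.25  # hypothetical rescaling constant
alpha_before = weighted_pareto_alpha(w, d, w0)
alpha_after = weighted_pareto_alpha(zeta * w, d, zeta * w0)
```

Rescaling wealth and the threshold by the same constant leaves every ratio $w_i / w_0$ unchanged, so the estimated shape is unaffected while weighted total wealth is multiplied by $\zeta$.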
The ideal approach for assessing the quality of the results would be to compare them with an external benchmark, for instance one coming from highly reliable administrative records. Without such auxiliary information, we can assess the method in two ways. First, we assess the robustness of our results by comparing them with other estimators based on different assumptions. Second, we assess the precision of our results by estimating their variability.
Beyond our simultaneous approach, we compute five alternative estimators:
• ‘Survey & missing tail’. The results are produced using the unadjusted survey data, plus an estimation of the total wealth held by rich households with zero probability of being in the survey (missing tail).
• ‘Pareto-calibration & missing tail’. Survey data are adjusted with the Pareto-calibration model. Survey weights are calibrated and the total wealth of the missing tail is included in the estimate.
• ‘Pareto-calibration, proportional allocation & missing tail’. This method adds to the previous one a correction for measurement error based on proportional allocation, as in Fesseau and Mattonetti (2013). This is a very naive method based on the assumption that measurement error is equal across households and that it only depends on the instrument. Moreover, it does not allow adjusting for non-reporting.
• ‘Single-iteration approach & missing tail’. In this method, the correction for measurement error is based on the univariate calibration method described in subsection 3.5. Adjustments are applied on the $y$ variable (gross wealth). After rescaling the threshold $w_0$ to account for measurement error, the missing tail is re-estimated and included.
• ‘Single-iteration approach, portfolio calibration & missing tail’. This method extends the previous one by adding an extra step in which portfolios are calibrated using financial accounts totals (adjusted to account for the missing part of the tail) and Pareto distributional information as benchmarks. Calibration in the extra step works again on weights.
Variance estimation in our methodology has two main components. The first one is the sampling variance, which indicates the variability introduced by choosing a sample instead of enumerating the whole population, assuming that the information collected in the survey is otherwise exactly correct. A second source of variability is imputation variance, which refers to the fact that the methodology for filling the gap can produce several different plausible imputed data sets. The uncertainty due to the imputation process adds to the sampling variance.
To estimate the overall variability we use the Rao-Wu rescaled bootstrap weights released with HFCS data to account for sampling variability (Eurosystem Household Finance and Consumption Network, 2020). For each of the 1,000 sets of bootstrap weights we replicate all the methods previously described. In each replication, the parameters of the Pareto distribution are re-estimated, introducing additional variability. We then obtain the mean and standard deviation from all successful simulations and compute the coefficient of variation to evaluate the robustness of our methods and derive a measure of their variability.
Footnote on unsuccessful simulations: A simulation is flagged as unsuccessful, and discarded, whenever a calibration procedure fails because of lack of convergence under the chosen constraints.
The method described in the previous sections has been applied to the second (2014) wave of the HFCS. The first step consists of estimating the parameters of the Pareto distribution ($w_0$ and $\alpha$). Figure 1 provides a graphical intuition of the automatic selection of the threshold for the four selected countries, showing the estimated $w_0$ and, given this threshold, linear fits for the mean excess conditional on wealth. Table 1 summarises the final results.
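The mean excess diagnostic behind Figure 1 can be sketched as follows. For a Pareto tail with $\alpha > 1$, the mean excess over a threshold $u$ is $u/(\alpha - 1)$, so it is linear in $u$ above the true threshold. The code below computes a weighted empirical mean excess only; it does not reproduce the automatic threshold selection of subsection 3.1, and the function name `mean_excess` and the toy data are hypothetical.

```python
import numpy as np

def mean_excess(w, d, thresholds):
    """Weighted mean of (w_i - u) over households with w_i > u, for each u.

    Every threshold must lie below the largest observed wealth value.
    """
    w = np.asarray(w, dtype=float)
    d = np.asarray(d, dtype=float)
    out = []
    for u in thresholds:
        above = w > u
        out.append(np.sum(d[above] * (w[above] - u)) / np.sum(d[above]))
    return np.array(out)

# Toy data (hypothetical): four households, the richest carries a larger weight
wealth = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 1.0, 3.0]
excess = mean_excess(wealth, weights, thresholds=[0.0, 2.0])
```

Plotting `excess` against a grid of candidate thresholds and looking for the point beyond which the relation becomes approximately linear gives the graphical intuition described for Figure 1.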
The automatic selection provides benefits over an arbitrary threshold: in all cases, the new threshold is found to be lower than €1 million, meaning that subsequent estimates of tail behaviour will benefit significantly in precision.
Figure 2 illustrates the outcome of the Pareto-calibration process, showing the empirical CCDF on a log-log scale before and after the adjustment. Re-weighted figures are produced by using the proposed Pareto-calibration method; $\alpha$ indicates the Pareto shape parameter estimated by imputing the rich list, while $\theta$ shows the estimation results with survey data only.
Table 2 shows coverage ratios between survey wealth estimates and financial accounts. Column (1) shows initial coverage ratios, column (2) displays coverage ratios for adjusted data, and column (3) grosses up survey wealth by estimating total wealth after truncation and adding it to the previous estimate. Columns (4) and (5) show the estimated number of households in the Pareto tail, along with the number of “missing rich”.
Overall, these figures suggest that the proposed Pareto-calibration approach can produce substantial improvements in survey coverage, especially in the absence of over-sampling or administrative data. In the case of Finland and Germany, the discrepancies between micro and macro figures virtually disappear after calibrating survey data and accounting for the unobservable households.
Coverage is also significantly improved for Italy and France, but the persistence of a mismatch between survey data and financial accounts points to the presence of measurement error.
Having re-estimated the number of households in the Pareto tail of the survey, our method also shows substantial improvements in coverage over the grossing-up methods already explored in the literature, and suggests that adjustments for non-response should also focus on correcting the number of households in the Pareto tail, rather than only the wealth contained in it.
After dealing with the issue of non-response at the tail of the distribution, we use multivariate calibration to adjust for measurement error along the whole distribution. As benchmark constraints $t(y)$ we use the financial instruments with high conceptual comparability between survey and financial accounts (namely deposits, bonds, shares, funds, insurance and pensions, money owed to the household, and liabilities), following the comparability scale provided by EG-LMM (2017). The resulting adjustment factors are then applied to financial instruments with lower comparability (business and housing wealth); assuming that measurement error is comparable within comparable financial instruments, this should ensure that the adjustment is not biased by the presence of instruments with low comparability.
We then iterate the Pareto-calibration and the multivariate calibration until convergence. Convergence has been set on the parameter $\alpha$ of the Pareto distribution: if the estimated value in two consecutive steps differs by less than a small predefined threshold, the procedure stops. Convergence is usually achieved in a limited number of steps (between 1 and 3 in the application at hand).
Table 3 shows the average values of the adjustment factors $a_i$ (as well as coefficients of variation) as a function of gross wealth percentiles at the end of the iterative procedure for the four countries.
These factors are the overall adjustment of the survey variables at the end of the procedure, obtained as the ratio between the final imputed values and those from the original survey.
Table 4 shows distributional results indicating the proportion of net wealth held by the top 1, 5, 10, and 20 weighted percentiles, along with the bottom 50%. Weighted Gini inequality indices are also presented in column (6), while column (7) provides the estimated Pareto tail parameter $\alpha$ given the data. These figures are reproduced under each allocation method. The bootstrap-based coefficient of variation is reported in parentheses for each estimate.
The first set of rows (‘Base Survey’) presents distributional figures from the unadjusted HFCS data. As is well known, truncation in the top of the wealth distribution and measurement error can cause survey estimates to understate the true level of wealth inequality, and the figures presented in the table provide support for this possibility. Indeed, estimates from the unadjusted HFCS would suggest wealth inequality in Italy, which has one of the largest micro-macro gaps, to be close to the inequality level in Finland, where the gap is lower.
Column (7) displays the Pareto tail coefficients. In the first set of rows, the $\alpha$ parameter is estimated using survey data only, meaning that this is the Pareto estimate that survey data yield when truncation is not corrected through the imputation of a rich list.
For all following sets of rows, which correspond to the alternative estimators discussed in section 4, we also include an adjustment for the unobserved part of the Pareto tail as presented in section 3.1.
To do so, the missing households are imputed as a single observation in which the weight and wealth are respectively equal to the estimated number of unobserved households and the estimated average wealth in the unobserved Pareto tail.
The second set of rows (‘Survey & missing tail’) displays estimates produced using the unadjusted survey data, plus the missing tail households. Depending on the size of the truncation in the Pareto tail, inequality estimates can be affected considerably. For surveys, such as the Italian and German ones, in which truncation bias is particularly pronounced, the sole inclusion of these unobserved households increases the proportion of wealth held by the top 1% of households by 10.7 and 10.3 percentage points, respectively. This increase is much less pronounced for the French and Finnish surveys, where the truncation is also much more modest.
The inclusion of the unobserved tail raises inequality levels for all the surveys considered, but again these increases are proportional to the size of the truncation. Finally, [...] $\theta$ parameters (obtained without imputation of the rich list) as well, which are now closer to the rich-list imputed $\alpha$ parameters, as shown in Figure 2.
Footnote on the convergence threshold: In the current application, this tolerance was set at 0.05.
The fourth to sixth sets of rows adjust the survey by applying the estimators described in section 4. For countries like Finland and Germany, where measurement error seems to be a negligible issue, these adjustments might not be needed, and remaining divergences in portfolio item coverage against macroeconomic aggregates should be treated as sampling issues and adjusted through weight calibration, as detailed in section 4 and shown in the last set of rows in Table 4.
In the fourth set of rows (‘Par-cal, proportional allocation & missing tail’), portfolio items are scaled proportionally to the Financial Accounts aggregates. Proportional allocation, however, seems like an inadequate solution.
While proportionally allocated items do not generate severe distortions in the estimated Pareto distribution, the proportional allocation will most likely affect the portfolio allocation within each household. Since it is based on very unreliable assumptions, this method should be considered only in cases where the gap to fill is minimal.
The fifth (‘Par-cal, Wealth calibration, & missing tail’) and sixth (‘Par-cal, Wealth/portfolio calibration, & missing tail’) sets of rows show how distributional figures are affected by the approach described in subsection 3.2.
In both cases, substantial differences from proportional allocation can be noted. First, the Pareto tail parameter is always closer to the initial estimate, meaning that the reallocation process, this time, leaves the distributional features of the survey intact. Secondly, inequality figures are much closer to the estimates produced in the previous steps. Indeed, the final output shows comparable results across all surveys, in which the increases in inequality, compared to the initial survey data, are proportional to the severity of both truncation and measurement error problems.
Most importantly, the $\alpha$ parameter is still close enough to the one estimated initially, suggesting, once again, that neither adjustment gives rise to unnecessary distortions in the tail wealth distribution.
This is a relevant result that validates the assumption of relative error converging in probability to a constant.
While wealth calibration should not be treated as a substitute for proper models for adjusting for measurement error, especially when this error is linked to socio-economic or behavioural factors, these calibration-based methods can still assist in the production of distributional figures without exposing the researcher to the risk of misrepresenting the distribution of household wealth and individual asset compositions.
Also, the use of portfolio calibration (as in the penultimate set of rows) can help when measurement error is assumed to be absent (Finland, and Germany to a lesser degree), and when models have been used to address such a problem. In these cases, the wealth calibration step can be skipped entirely, while the portfolio calibration can be paired with Pareto-calibration within the same step, so that the weighted sum of each portfolio item is kept consistent with the corresponding macroeconomic aggregate, producing consistent and correct distributional figures.
The results obtained using the simultaneous approach are presented in the final set of rows. Here, we see that the distributional estimates are broadly in line with the results produced by the other methods. In particular, the simultaneous approach provides very similar results to the single-iteration approach, suggesting that the simplifying assumptions of the latter are likely to hold, at least in the four countries used in the analysis.
Overall, all the methods consistently show that household finance surveys underestimate the level of wealth inequality. Moreover, the larger the wealth gap between micro and macro data, the higher the increase in the measures of inequality.
As to variance estimation, the adjustment methods generally produce a decrease in the reliability of the results.
This is expected, since the imputation process adds variability.
For each method, the precision increases when the statistic relates to the bottom or median part of the wealth distribution. The estimators of the wealth share held by the top 1 percent have a low precision in all countries.
Compared to other methods, the simultaneous approach produces the lowest increase in variability. This is also due to the use of multivariate calibration, a method that was originally developed to increase the precision of estimators. The final coefficients of variation are not very different from those based on the unadjusted survey data, especially for the statistics that do not relate to the top tail of the distribution.
In this paper, we show how a combination of well-established methodologies for the fitting of a Pareto distribution and the calibration of survey data can be used to correct for non-response and misreporting when only limited external information is available.
We apply these methods to the HFCS data, using the 2014 Finnish, French, German, and Italian surveys, and employing rich list data from Forbes or national press sources, along with household sector aggregates from national accounts, as auxiliary sources of information.
We show that these adjustment methods improve the production of distributional national accounts for the household sector, since inequality estimates from the survey data understate the population parameters, depending on the severity of both non-response and measurement error. We also discuss how to assess the quality of these distributional indicators.
Further work is needed for the refinement of the methodology we propose. For example, the estimation of the number of wealthy households could be further validated and improved, for instance by using alternatives to rich lists (such as tax records) or by applying additional methods (such as the Type II Pareto or the Estate Multiplier Method).
Also, the correction of measurement error could be further improved by enriching the auxiliary variables vector with more granular external information (if available).
Nonetheless, our framework has the advantage of offering a set of adaptable tools that can be fine-tuned on a case-by-case basis. Indeed, both the Pareto-calibration adjustment and the multivariate calibration methods can be enhanced with external information and can be run separately when needed.
Moreover, our contribution makes it possible to compute several distributional indicators using the adjusted micro data set, while most studies in the current literature only focus on providing aggregate estimates for wealthy households.
Acknowledgements
The paper has greatly benefited from the discussions with all members of the EG-LMM and EG-DFA working groups.
References
Aban, I. B., Meerschaert, M. M., and Panorska, A. K. (2006). Parameter estimation for the truncated Pareto distribution. Journal of the American Statistical Association, 101(473):270–277.
Alvaredo, F., Atkinson, A. B., and Morelli, S. (2018). Top wealth shares in the UK over more than a century. Journal of Public Economics, 162:26–47.
Alvaredo, F. and Saez, E. (2009). Income and wealth concentration in Spain from a historical and fiscal perspective. Journal of the European Economic Association, 7(5):1140–1167.
Ampudia, M., van Vlokhoven, H., and Żochowski, D. (2016). Financial fragility of euro area households. Journal of Financial Stability, 27:250–262.
Bach, S., Thiemann, A., and Zucco, A. (2019). Looking for the missing rich: tracing the top tail of the wealth distribution. International Tax and Public Finance, 26(6):1234–1258.
Blanchet, T., Flores, I., and Morgan, M. (2018). The weight of the rich: Improving surveys using tax data. WID.world Working Paper Series No. 2018/12.
Blanchet, T., Fournier, J., and Piketty, T. (2017). Generalized Pareto curves: theory and applications. WID.world Working Paper Series No. 2017/3.
Casiraghi, M., Gaiotti, E., Rodano, L., and Secchi, A. (2018). A “reverse Robin Hood”? The distributional implications of non-standard monetary policy for Italian households. Journal of International Money and Finance, 85(C):215–235.
Chakraborty, R., Kavonius, I., Perez-Duarte, S., and Vermeulen, P. (2019). Is the top tail of the wealth distribution the missing link between the household finance and consumption survey and national accounts? Journal of Official Statistics, 35:31–65.
Chakraborty, R. and Waltl, S. R. (2018). Missing the wealthy in the HFCS: micro problems with macro implications. ECB Working Paper Series No 2163, European Central Bank.
Chambers, R. L. and Ren, R. (2004). Outlier robust imputation of survey data. The Proceedings of the American Statistical Association, pages 3336–3344.
Coibion, O., Gorodnichenko, Y., Kueng, L., and Silvia, J. (2017). Innocent bystanders? Monetary policy and inequality. Journal of Monetary Economics, 88:70–89.
Colciago, A., Samarina, A., and de Haan, J. (2019). Central bank policies and income and wealth inequality: A survey. Journal of Economic Surveys, 0(0).
D’Alessio, G. and Neri, A. (2015). Income and wealth sample estimates consistent with macro aggregates: some experiments. Questioni di Economia e Finanza (Occasional Papers) 272, Bank of Italy, Economic Research and International Relations Area.
Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society. Series B (Methodological), 52(3):393–442.
Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418):376–382.
EG-LMM (2017). Understanding, quantifying and explaining the differences between macro and micro data of household wealth: Final report. Mimeo, European Central Bank.
Eurosystem Household Finance and Consumption Network (2009). Survey data on household finance and consumption research: summary and policy use. ECB Occasional Paper Series No 100, European Central Bank.
Eurosystem Household Finance and Consumption Network (2020). The household finance and consumption survey: methodological report for the 2017 wave. ECB Statistics Paper Series No 35, European Central Bank.
Expert Group on Linking macro and micro data (2020). Understanding household wealth: linking macro and micro data to produce distributional financial accounts. Statistics Paper Series 37, European Central Bank.
Fesseau, M. and Mattonetti, M. L. (2013). Distributional measures across household groups in a national accounts framework. OECD Statistics Working Papers No. 2013/08, Organisation for Economic Co-Operation and Development (OECD).
Frémeaux, N. and Leturcq, M. (2020). Inequalities and the individualization of wealth. Journal of Public Economics, 184:104145.
Gabaix, X. and Ibragimov, R. (2011). Rank − 1/2: A simple way to improve the OLS estimation of tail exponents. Journal of Business & Economic Statistics, 29(1):24–39.
Garbinti, B., Goupille-Lebret, J., and Piketty, T. (2018). Income inequality in France, 1900–2014: Evidence from distributional national accounts (DINA). Journal of Public Economics, 162:63–77.
Garbinti, B., Goupille-Lebret, J., and Piketty, T. (2020). Accounting for Wealth-Inequality Dynamics: Methods, Estimates, and Simulations for France. Journal of the European Economic Association.
Guiso, L., Paiella, M., and Visco, I. (2005). Do capital gains affect consumption? Estimates of wealth effects from Italian households’ behavior. Long-run Growth and Short-run Stabilization: Essays in Memory of Albert Ando.
Haziza, D., Beaumont, J.-F., et al. (2017). Construction of weights in surveys: A review. Statistical Science, 32(2):206–226.
Kennickell, A. (2008). The role of over-sampling of the wealthy in the survey of consumer finances. Irving Fisher Committee Bulletin, 28.
Kennickell, A. (2019). The tail that wags: differences in effective right tail coverage and estimates of wealth inequality. The Journal of Economic Inequality.
Langousis, A., Mamalakis, A., Puliga, M., and Deidda, R. (2016). Threshold detection for the generalized Pareto distribution: Review of representative methods and application to the NOAA NCDC daily rainfall database. Water Resources Research, 52(4):2659–2681.
Little, R. J. and Vartivarian, S. (2005). Does weighting for nonresponse increase the variance of survey means? Survey Methodology.
Michelangeli, V. and Rampazzi, C. (2016). Indicators of financial vulnerability: a household level study. Questioni di Economia e Finanza (Occasional Papers) 369, Bank of Italy, Economic Research and International Relations Area.
Paiella, M. (2007). Does wealth affect consumption? Evidence for Italy. Journal of Macroeconomics, 29(1):189–205.
Ranalli, M. G. and Neri, A. (2011). To misreport or not to report? The case of the Italian survey on household income and wealth. Statistics in Transition new series, 12(2):281–300.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3):581–592.
Rubin, D. B., editor (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, Inc.
Saez, E. and Zucman, G. (2016). Wealth inequality in the United States since 1913: Evidence from capitalized income tax data. The Quarterly Journal of Economics, 131:qjw004.
Särndal, C.-E. (2007). The calibration approach in survey theory and practice. Survey Methodology, page 99.
Särndal, C.-E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. John Wiley & Sons, Ltd.
Schröder, C., Bartels, C., Grabka, M. M., König, J., Kroh, M., and Siegers, R. (2019). A novel sampling strategy for surveying high net-worth individuals: a pretest application using the socio-economic panel. Review of Income and Wealth.
Vermeulen, P. (2016). Estimating the top tail of the wealth distribution. American Economic Review, 106(5):646–50.
Vermeulen, P. (2018). How fat is the top tail of the wealth distribution? Review of Income and Wealth, 64(2):357–387.
Waltl, S. (2018). Multidimensional Wealth Inequality: A Hybrid Approach toward Distributional National Accounts in Europe. In Proc. 35th IARIW General Conference (IARIW 2018).
Yang, G. L. (1978). Estimation of a biometric function.
The Annals of Statistics , 6(1):112–116.3 F IGURE
1: Pareto Threshold detection. Mean excess plots for gross recorded wealthin the HFCS. Predicted Pareto thresholds and linear fits estimated using the proposedmethodology.4F
IGURE
2: Pareto Tail Re-weighting. Empirical cumulative distribution functions (logscale) for survey wealth distributions in the Pareto Tail. Re-weighting achieved by usingthe Pareto-calibration method, using the calibration benchmarks from equation (10). θ parameters estimated using survey data only, α estimated using Vermeulen’s 2018 regres-sion method with imputed rich list.5T ABLE θ α w Country (1) (2) (3)IT 1.952 1.491 310,084FR 1.771 1.537 567,378DE 1.499 1.362 254,000FI 2.145 1.718 880,806
Notes: Pareto tail parameters and estimated thresholds. θ parameters estimated using survey data only, α estimated using Vermeulen’s 2018 regression method with imputed rich list.

TABLE
Notes: Coverage ratios and estimated number of households in the tail. Re-weighting achieved with the Pareto-calibration method, using the calibration benchmarks from equation (10).

TABLE
Notes:
Mean and coefficient of variation of overall adjustment factors a_i, equations (12) and (13), from the multivariate calibration approach for imputation as a function of gross wealth percentiles.

TABLE
                                                                   α       S.r.
Country     (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)
Base Survey
IT         0.114    0.293    0.424    0.598    0.103    0.597    1.952
          (0.074)  (0.032)  (0.021)  (0.013)  (0.039)  (0.012)  (0.075)
FR         0.178    0.361    0.494    0.664    0.069    0.663    1.771
          (0.089)  (0.036)  (0.022)  (0.012)  (0.040)  (0.011)  (0.052)
DE         0.226    0.443    0.575    0.744    0.030    0.741    1.499
          (0.111)  (0.050)  (0.031)  (0.015)  (0.073)  (0.014)  (0.159)
FI         0.120    0.286    0.416    0.595    0.101    0.596    2.145
          (0.050)  (0.019)  (0.012)  (0.007)  (0.025)  (0.007)  (0.069)
Survey & missing tail
IT         0.221    0.379    0.495    0.648    0.091    0.647    1.491    0.999
          (0.140)  (0.069)  (0.044)  (0.024)  (0.056)  (0.019)  (0.015)
FR         0.198    0.377    0.507    0.672    0.067    0.671    1.537    1.000
          (0.254)  (0.111)  (0.067)  (0.034)  (0.079)  (0.021)  (0.030)
DE         0.329    0.520    0.634    0.779    0.026    0.776    1.362    0.971
          (0.077)  (0.038)  (0.024)  (0.012)  (0.074)  (0.012)  (0.017)
FI         0.136    0.299    0.427    0.602    0.099    0.604    1.718    1.000
          (0.072)  (0.026)  (0.016)  (0.009)  (0.027)  (0.009)  (0.039)
Pareto-calibration & missing tail
IT         0.266    0.431    0.540    0.682    0.084    0.677    1.456    0.999
          (0.087)  (0.044)  (0.029)  (0.015)  (0.045)  (0.014)  (0.015)
FR         0.274    0.438    0.556    0.705    0.061    0.704    1.500    1.000
          (0.182)  (0.084)  (0.050)  (0.025)  (0.039)  (0.023)  (0.029)
DE         0.360    0.548    0.657    0.793    0.025    0.789    1.365    0.971
          (0.084)  (0.040)  (0.024)  (0.010)  (0.071)  (0.010)  (0.017)
FI         0.192    0.350    0.472    0.635    0.091    0.634    1.718    1.000
          (0.165)  (0.075)  (0.046)  (0.022)  (0.030)  (0.020)  (0.039)
Par-cal, Proportional allocation & missing tail
IT         0.299    0.470    0.581    0.715    0.077    0.716    1.427    0.999
          (0.108)  (0.054)  (0.035)  (0.018)  (0.048)  (0.017)  (0.009)
FR         0.239    0.408    0.531    0.687    0.070    0.699    1.558    0.996
          (0.181)  (0.086)  (0.053)  (0.027)  (0.048)  (0.025)  (0.032)
DE         0.337    0.517    0.632    0.774    0.033    0.734    1.376    0.971
          (0.092)  (0.044)  (0.026)  (0.011)  (0.071)  (0.012)  (0.016)
FI         0.190    0.349    0.470    0.634    0.092    0.633    1.735    1.000
          (0.172)  (0.079)  (0.049)  (0.024)  (0.031)  (0.021)  (0.039)

TABLE
                                                                   α       S.r.
Country     (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)
Par-cal, Wealth calibration & missing tail
IT         0.300    0.465    0.569    0.702    0.078    0.674    1.442    0.984
          (0.225)  (0.117)  (0.077)  (0.042)  (0.102)  (0.042)  (0.009)
FR         0.264    0.428    0.548    0.698    0.063    0.714    1.534    0.988
          (0.230)  (0.111)  (0.068)  (0.033)  (0.068)  (0.033)  (0.034)
DE         0.357    0.549    0.658    0.795    0.024    0.761    1.365    0.970
          (0.085)  (0.046)  (0.030)  (0.016)  (0.080)  (0.015)  (0.016)
FI         0.188    0.349    0.472    0.636    0.090    0.635    1.707    1.000
          (0.168)  (0.084)  (0.056)  (0.032)  (0.049)  (0.028)  (0.039)
Single-iteration approach & missing tail
IT         0.303    0.470    0.580    0.712    0.073    0.707    1.432    0.983
          (0.297)  (0.121)  (0.077)  (0.042)  (0.110)  (0.042)  (0.009)
FR         0.244    0.422    0.543    0.695    0.064    0.693    1.511    0.987
          (0.246)  (0.104)  (0.059)  (0.026)  (0.071)  (0.026)  (0.038)
DE         0.362    0.549    0.658    0.794    0.025    0.789    1.365    0.942
          (0.115)  (0.050)  (0.032)  (0.016)  (0.091)  (0.016)  (0.021)
FI         0.142    0.312    0.439    0.613    0.100    0.610    1.779    0.992
          (0.110)  (0.044)  (0.029)  (0.017)  (0.032)  (0.013)  (0.029)
Simultaneous approach
IT         0.299    0.483    0.595    0.730    0.069    0.708    1.465    0.999
          (0.081)  (0.038)  (0.023)  (0.013)  (0.055)  (0.013)  (0.006)
FR         0.287    0.454    0.569    0.710    0.071    0.698    1.502    0.996
          (0.148)  (0.065)  (0.040)  (0.021)  (0.044)  (0.018)  (0.019)
DE         0.340    0.524    0.636    0.774    0.034    0.767    1.388    0.970
          (0.092)  (0.043)  (0.025)  (0.012)  (0.068)  (0.012)  (0.018)
FI         0.173    0.336    0.461    0.628    0.097    0.617    1.748    0.999
          (0.115)  (0.049)  (0.034)  (0.020)  (0.050)  (0.017)  (0.016)
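
As an illustration of the mean excess plots referred to in the caption of Figure 1, the following minimal Python sketch computes a survey-weighted mean excess function and reads a tail index off a fitted line. It is not the threshold-detection procedure proposed in this paper: the function names (`weighted_mean_excess`, `ols_line`, `implied_pareto_alpha`) and the toy data are hypothetical and chosen only for illustration. The sketch relies on the standard property that, for a Pareto tail with shape α > 1, the mean excess e(u) = E[X - u | X > u] equals u / (α - 1) and is therefore linear in the threshold u.

```python
from typing import Sequence, Tuple


def weighted_mean_excess(values: Sequence[float],
                         weights: Sequence[float],
                         threshold: float) -> float:
    """Weighted mean of (x - threshold) over observations with x > threshold."""
    num = 0.0
    den = 0.0
    for x, w in zip(values, weights):
        if x > threshold:
            num += w * (x - threshold)
            den += w
    if den == 0.0:
        raise ValueError("no observations above the threshold")
    return num / den


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def implied_pareto_alpha(slope: float) -> float:
    """Invert slope = 1 / (alpha - 1), the Pareto mean excess slope (alpha > 1)."""
    if slope <= 0.0:
        raise ValueError("a Pareto tail implies a positive mean excess slope")
    return 1.0 + 1.0 / slope


if __name__ == "__main__":
    # Toy data (hypothetical): four households with survey weights.
    wealth = [1.0, 2.0, 3.0, 4.0]
    weights = [1.0, 1.0, 1.0, 3.0]
    print(weighted_mean_excess(wealth, weights, 2.0))  # 1.75

    # Exact mean excess line of a Pareto tail with alpha = 1.5: e(u) = 2u.
    thresholds = [1.0, 2.0, 3.0]
    mean_excess = [2.0, 4.0, 6.0]
    _, slope = ols_line(thresholds, mean_excess)
    print(implied_pareto_alpha(slope))
```

In practice one would evaluate `weighted_mean_excess` on a grid of thresholds and look for the region where the plot becomes approximately linear; with heavy tails the empirical mean excess is very noisy at high thresholds, so a slope-based α should be read as a rough diagnostic rather than as an estimator comparable to those reported in the tables.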