[PDF] Fully Bayesian Estimation under Dependent and Informative Cluster Sampling

Abstract

Survey data are often collected under multistage sampling designs where units are binned to clusters that are sampled in a first stage. The unit-indexed population variables of interest are typically dependent within cluster. We propose a Fully Bayesian method that constructs an exact likelihood for the observed sample to incorporate unit-level marginal sampling weights for performing unbiased inference for population parameters while simultaneously accounting for the dependence induced by sampling clusters of units to produce correct uncertainty quantification. Our approach parameterizes cluster-indexed random effects in both a marginal model for the response and a conditional model for published, unit-level sampling weights. We compare our method to plug-in Bayesian and frequentist alternatives in a simulation study and demonstrate that our method most closely achieves correct uncertainty quantification for model parameters, including the generating variances for cluster-indexed random effects. We demonstrate our method in two applications with NHANES data.

Full PDF

OO R I G I N A L A R T I C L E

Fully Bayesian Estimation under Dependent andInformative Cluster Sampling

Luis G. León-Novelo PhD | Terrance D. Savitsky PhD Department of Biostatistics and DataScience, University of Texas Health ScienceCenter at Houston- School of Public Health,Houston, TX, 77030, USA Oﬃce of Survey Methods Research,Bureau of Labor Statistics, Washington, DC,20212-0001, USA

Correspondence

Luis León-Novelo PhD, Department ofBiostatistics and Data Science, Universityof Texas Health Science Center at Houston-School of Public Health, 1200 Pressler St.Suite E805, Houston, TX, 77030, USAEmail: [email protected]

Present address † Department of Biostatistics and DataScience, University of Texas Health ScienceCenter at Houston- School of Public Health,1200 Pressler St. Suite E805, Houston, TX,77030, USA

Funding information

Survey data are often collected under multistage samplingdesigns where units are binned to clusters that are sam-pled in a ﬁrst stage. The unit-indexed population variablesof interest are typically dependent within cluster. We pro-pose a Fully Bayesian method that constructs an exact like-lihood for the observed sample to incorporate unit-levelmarginal sampling weights for performing unbiased infer-ence for population parameters while simultaneously ac-counting for the dependence induced by sampling clustersof units to produce correct uncertainty quantiﬁcation. Ourapproach parameterizes cluster-indexed random eﬀects inboth a marginal model for the response and a conditionalmodel for published, unit-level sampling weights. We com-pare our method to plug-in Bayesian and frequentist alter-natives in a simulation study and demonstrate that our methodmost closely achieves correct uncertainty quantiﬁcation formodel parameters, including the generating variances forcluster-indexed random eﬀects. We demonstrate our methodin two applications with NHANES data.

K E Y W O R D S

Cluster sampling; Inclusion probabilities; Informative sampling;

Abbreviations:

BMI, Body Mass Index; CI, Credible Interval; cSRS, cluster simple random sampling; dep- y , association among theresponses within the same PSU; dep- π , association among inclusion probabilities within the same PSU; LS2019, León-Novelo andSavitsky[2019]; IS, Informative Sampling; kcal, kilocalories; NHANES, National Health and Nutrition Examination Survey; MSE, MeanSquare Error; PBF, Proportion of Body Fat; PPS, probability proportional to size; PSU, primary stage sampling unit; PSU-REs, PSU-speciﬁc random eﬀects; SE, Standard Error; SLR, Simple Linear Regression; SRS, Simple Random Sampling. * Equally contributing authors. a r X i v : . [ s t a t . M E ] J a n Bayes Estimation/Informative Cluster Sampling

Mixed eﬀects linear model; NHANES; primary stage samplingunit; Sampling weights; Survey sampling | INTRODUCTION

Inference with data from a complex sampling scheme, such as that collected by the National Health and NutritionExamination Survey (NHANES), requires consideration of the sampling design. A common multistage sampling schemein public survey datasets: i Divide survey population into H strata. ii Each stratum contains N h clusters of individuals called primary stage sampling units (PSU) from which J h PSUs areselected. PSU hj is selected with probability π hj . By design, at least one PSU is selected in each stratum, i.e. J h ≥ , (cid:91) h . iii Within each selected PSU, n hj individuals are sampled out of the total N hj in the PSU. Each individual is sampledwith probability π i | hj , i = 1 , . . . , N hj . We call a sampled individual survey participant.The indices i , j , h index individual, PSU and stratum, respectively. The marginal probability of including an individualin the sample is then π (cid:48) ihj = π i | hj π hj .In addition to sampling clusters of dependent individuals, both clusters and individuals-within-clusters are typ-ically selected with unequal sampling inclusion probabilities in order to improve estimation power for a populationsubgroup or to reduce variance of a global estimator. The sample inclusion probabilities are constructed to be corre-lated with or “informative" about the response variable of interest to reduce variance of the estimator. On the onehand, stratiﬁcation reduces the standard error (SE) of estimates while, on the other hand, clustering tends to increaseit since clustering induces dependence and is used for convenience and to reduce cost. Utilizing unequal inclusionprobabilities can reduce the variance of the estimator where a subset of units drives the computation, such as is thecase where larger-sized employers drive the estimation of total employment for the Current Employment Statisticssurvey administered by the U.S. Bureau of Labor Statistics; more often, the use of unequal inclusion probabilitiestends to increase the variance of the estimator due to the variation in the information about the population reﬂectedin observed samples. Ignoring PSU and unequal sampling inclusion probabilities underestimates the SE because ofthe dependence among individuals within a PSU and the variation of information about the population reﬂected ininformative samples drawn from it.The statistical analyst receives variables of interest for each survey participant along with the stratum and PSUidentiﬁers to which s/he belongs to as well as sampling weights, w ihj ∝ / π ihj . The inclusion probability, π ihj , is aversion of π (cid:48) ihj after adjusting for oversampling of subpopulations and nonresponse.In the context of NHANES, a stratum is deﬁned by the intersection of geography with concentrations of minoritypopulations and a PSU is constructed as a county or a group of geographically continuous counties. Secondary andtertiary stage sampling units include segments (contiguous census blocks) and households. The ﬁnal unit is an eligibleparticipant in the selected household. NHANES released masked stratum and PSU information to protect participant’sprivacy. Every 2-year NHANES-data cycle (CDC, 2011) releases information obtained from H = 15 strata with n h = 2 PSU per stratum.In this paper, we focus on a two-stage sampling design characterized by the ﬁrst stage sampling of PSUs and,subsequent, second stage sampling of units without loss of generality. We employ a fully Bayesian estimation ap- ayes Estimation/Informative Cluster Sampling 3 proach that co-models the response variable of interest and the marginal inclusion probabilities as is introduced inLeón-Novelo and Savitsky (2019), hereafter referred to LS2019. We extend their approach by constructing PSU-indexed random eﬀects speciﬁed in both the marginal model for the response variable and the conditional model forthe sampling inclusion probabilities.Since, from now on, we do not deal with strata we do not consider point i and subindex h in the discussion above.Sampled individual, ( ij ) , denotes individual i ∈ ( , . . . , n j ) in cluster j ∈ ( , . . . , J pop ) included in the sample, where J pop denotes the total number of PSUs included in the population. Let J ≤ J pop denote the number of PSUs actuallysampled. We assume that the sampling weight, w ij , is proportional to the inverse marginal inclusion probability, π ij ofindividual ij included in the sample; or π ij ∝ / w ij . We denote the vector of predictors associated to individual ij as x ij . The data analyst aims to estimate the parameters, θ , of a population model, p ( y | θ , x ) , that they specify fromthese data. Relabeling PSU indices in the sample so they run from , . . . , J , the analyst observes sample of size, n = (cid:205) Jj =1 n j , and the associated variables, { y ( s ) ij , x ( s ) ij , π ( s ) ij ∝ / w ( s ) ij , j } i =1 ,..., n j , j =1 ,..., J with n j the number of participantsfrom PSU j and superindex ( s ) denotes in the sample. By contrast, y ij without index ( s ) denotes a response of anindividual in the survey population but not, necessarily, a survey participant included in the observed sample.The probability of inclusion of each PSU (denoted as π j in point ii, above) is unknown to the analyst becauseit is not typically published for the observed sample, though the PSU inclusion probabilities are used to constructpublished unit marginal inclusion probabilities, (such that inclusion probabilities within the same PSU tend to be moresimilar or correlated), but the dependence of units in the same PSU may not be fully accounted for by the dependenceon their inclusion probabilities.A sampling design is informative for inference about individuals within a group when y ij (cid:54)⊥ π ij | x ij . A samplingdesign will also be informative for PSUs in the case that ¯ y · j − ¯ y = ( / N j ) (cid:205) N j i =1 y ij − ¯ y (cid:54)⊥ π j | ¯ x j with ¯ x j = ( / N j ) (cid:205) N j i =1 x ij .Even if a sampling design is not informative for individuals and/or groups, however, there are typically scale eﬀectsinduced by within group dependence that must be accounted for to produce correct uncertainty quantiﬁcation.LS2019 propose a model-based Bayesian approach appropriate under informative sampling that incorporates thesampling weights into the model by modelling both the response given the parameter of interest and the inclusionprobability given the response, π ij | y ij . The main advantages of this approach is that it yields (1) consistent pointestimates [LS2019] (point estimates converge in probability to true values), (2) credible intervals that achieve nominal(frequentist) coverage, and (3) robust inference against mis-speciﬁcation of π ij | y ij .LS2019 only focus on ﬁxed eﬀect models and ignore the dependence induced by the sampling design; that is,both association among the responses within the same PSU (that we label, dep- y ), and possible association amonginclusion probabilities within the same PSU (that we label, dep- π ). This paper extends the approach of LS2019 toaccount for these associations via mixed eﬀect models. More speciﬁcally, we include PSU-speciﬁc random eﬀects(PSU-REs) in both the model for the responses and in the model for the inclusion probabilities. We introduce our gen-eral approach in Section 3, but, in the rest of the paper, we focus on the linear regression setting. Competing methodsare summarized in this section as well. In Section 4, we show via simulation that our approach yields credible intervalswith nominal (frequentist) coverage, while the competing methods do not in some simulation scenarios. In Section5 we demonstrate our approach by applying it to two NHANES datasets. The ﬁrst application models the relation-ship between percentage of body fat to body mass index while the second estimates the daily kcal consumption ofdiﬀerent demographic groups in the U.S. population. Inference under our Fully Bayes approach is compared againstinference under competing plug-in Bayesian and frequentist methods. We provide a ﬁnal discussion section and anAppendix containing details not discussed, but referred to in the main paper. Bayes Estimation/Informative Cluster Sampling | REVIEW OF LS2019

LS2019 introduces the inclusion probabilities into the Bayesian paradigm by assuming them to be random. In thissection we review their approach to extend it to include PSU information, in the next section. The superpopulationapproach in LS2019 assumes that the ﬁnite population of size N , ( y , π , x ) , . . . ( y N , π N , x N ) is a realization of ( y i , π i ) | x i , θ , κ ∼ p ( y i , π i | x i , θ , κ ) = p ( π i | y i , x i , κ ) p ( y i | x i , θ ) , i = 1 , . . . , N . (1)Here, y i is the response for individual i with vector of covariates x i and π i > is proportional to probability ofindividual i being sampled. It is assumed that ( y i , π i ) ⊥ ( y i (cid:48) , π i (cid:48) ) | x i , x i (cid:48) , θ , κ , for i (cid:44) i (cid:48) and x i is not modelled. Notealso that equation above presumes that π i ⊥ θ | y i , x i , κ and y i ⊥ κ | x i , θ . The population parameter θ determinesthe relationship between the x i and y i , and is of main interest. The parameter κ is a nuisance parameter that allowsmodeling the association between π i and y i , though we later see in our simulation study section that it provides insighton the informativeness of the sampling design for a particular response variable of interest. The informative sampleof size n is drawn so that P [ individual i in sample ] = π i (cid:46) (cid:205) Ni (cid:48) =1 π i (cid:48) . Bayes theorem implies, p ( y i , π i | x i , θ , κ , individual i in sample ) = Pr ( individual i in sample | y i , π i , x i , θ , κ ) × p ( y i , π i | x i , θ , κ ) denominator , (2)By the way the informative sample is drawn, and (1), the numerator in (2) is π i (cid:205) Ni (cid:48) =1 π i (cid:48) × p ( π i | y i , x i , κ ) p ( y i | x i , θ ) . (3)The denominator is obtained by integrating out ( y i , π i ) in the numerator, this is (cid:205) Ni (cid:48) =1 π i (cid:48) × ∫ π (cid:63) i p ( π (cid:63) i | y (cid:63) i , x i , κ ) p ( y (cid:63) i | x i , θ ) d π (cid:63) i dy (cid:63) i = 1 (cid:205) Ni (cid:48) =1 π i (cid:48) × E y (cid:63) i | x i , θ (cid:2) E (cid:2) π (cid:63) i | y (cid:63) i , x i , κ (cid:3)(cid:3) (4)The superindex (cid:63) is used to distinguish the quantities integrated out from the ones in the numerator. Plugging (3) and(4) in (2) we obtain equation (5) in LS2019 given by, p s ( y i , π i | x i , θ , κ ) =  π i p ( π i | y i , x i , κ ) E y (cid:63) i | x i , θ (cid:2) E ( π (cid:63) i | y (cid:63) i , x i , κ ) (cid:3)  × p ( y i | x i , θ ) (5)Where the LHS, p s (· · · | · · · ) , denotes the joint distribution of ( y i , π i ) conditioned on the individual i being in thesample, i.e. , the LHS of (2) that is equal to p (· · · | · · · , individual i in sample ) . Inference is based on this exact likelihoodfor the observed sample with, (cid:96) ( θ , κ ; y ( s ) , π ( s ) , x ( s ) ) = n (cid:214) i =1 (cid:104) p s ( y ( s ) i , π ( s ) i | x i ( s ) , θ , κ ) (cid:105) where the superindex ( s ) is used to emphasize that these are the values observed in the sample. We also relabel theindex i running from , . . . , N in the population so it runs from , . . . , n in the sample. A Bayesian inference model iscompleted by assigning priors to θ and κ .Notice that the π i s are proportional, and not equal, to the inclusion probabilities and besides being positive, no ayes Estimation/Informative Cluster Sampling 5 restriction is imposed on (cid:205) ni =1 π ( s ) i or (cid:205) Ni =1 π i . Also note that under noninformative sampling, i.e. when y i ⊥ π i | x i ,the quantity between curvy brackets in (5) does not depend on y i and therefore inference on θ does not depend onthe inclusion probabilities, or the π i s. In other words, inference using (5) is the same as if treating the sample as an SRSfrom the model y i ∼ p ( y i | x i , θ ) . For the informative sampling case, in theory, we can assume any distribution for y i | x i , θ and π i | y i , x i , κ . In practice, the calculation of E y (cid:63) i | x i , θ [· · · ] in (5) is a computational bottleneck. Theorem 1in LS2019, stated below, provides conditions to obtain a closed form for this expected value.Let v i and u i be subvectors of x i , the covariates used to specify the conditional distribution of π i | y i , x i , κ and y i | x i , θ , respectively; that is, π i | y i , x i , κ ∼ π i | y i , v i , κ and y i | x i , θ ∼ y i | u i , θ . Note that we allow for v i and u i to have common covariates. Let normal ( x | µ , s ) denote the normal distribution pdf with mean µ and variance s evaluated at x , and lognormal (· | µ , s ) denote the lognormal pdf, so that X ∼ lognormal ( µ , s ) is equivalent tolog X ∼ normal ( µ , s ) . Theorem 1 (Theorem 1 in LS2019) If p ( π i | y i , v i , κ ) = lognormal ( π i | h ( y i , v i , κ ) , σ π ) , with the function h ( y i , v i , κ ) ofthe form h ( y i , v i , κ ) = g ( y i , v i , κ ) + t ( v i , κ ) where σ π = σ π ( κ , v i ) , possibly a function of ( κ , v i ) then p s ( y i , π i | u i , v i , θ , κ ) = normal (cid:16) log π i | g ( y i , v i , κ ) + t ( v i , κ ) , σ π (cid:17) exp (cid:8) t ( v i , κ ) + σ π / (cid:9) × M y ( κ ; u i , v i , θ ) × p ( y i | u i , θ ) with M y ( κ ; u i , v i , θ ) : = E y (cid:63) i | u i , θ (cid:2) exp (cid:8) g ( y (cid:63) i , v i , κ ) (cid:9)(cid:3) . If both M y and p ( y i | · · · ) admit closed form expressions, then p s ( y i , π i | · · · ) has a closed form, as well; forexample, when g ( y i , v i , κ ) = κ y y i with κ y ∈ κ ∈ (cid:210) , then M y ( κ ; u i , v i , θ ) is the moment generating function (MGF)of y i | θ evaluated at κ y , which may have a closed form deﬁned on (cid:210) . This implies a closed form for p s ( y i , π i |· · · ) . Analogously, we may consider an interaction between y and v , using g ( y i , v i , κ ) = ( κ y + v ti κ v ) y i ≡ t y i with κ = ( κ y , κ v ) . In this case, we achieve, M y ( t ; · · · ) , which is the MGF evaluated at t . As mentioned in LS2019, theassumption of a lognormal distribution for π i is mathematically appealing. The inclusion probability, ∝ π i , for individual, i , is composed from the product of inclusion probabilities of selection across the stages of the multistage survey design.If each of these stagewise probabilities are lognormal then their product, ∝ π i , is lognormal as well. This is particularlyhelpful in the setting including PSU in next section. | INCLUSION OF PSU INFORMATION INTO FULLY BAYESIAN APPROACH

In Subsection 3.1, we extend the approach in LS2019, reiviewed in Section 2, that co-models the response variableand sampling weights, modifying their notation by adding cluster-indexed parameters, in preparation to include PSUinformation into the analysis in Subsection 3.2 to capture within PSU dependence in the response and sample inclusionprobabilities. In Subsection 3.3 we introduce the Fully Bayes joint population model for the response and the sampleinclusion probabilities in the linear regression case. In Subsections 3.4 and 3.5 we brieﬂy review competing approachesto analyze informative samples that we will compare in a simulation study.

Bayes Estimation/Informative Cluster Sampling | Extend Joint Population Model to incorporate PSU-indexed Parameters

We assume a population with a total of J pop PSUs and size N = (cid:205) Jj =1 N j with N j the number of individuals in PSU j .More speciﬁcally, the population consists of ( y , , π | , x , ) , . . . , ( y N , , π N | , x N , ) (cid:124) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:125) PSU , . . . , ( y , j , π | j , x , j ) , . . . , ( y N j , j , π N j | j , x N j , j ) (cid:124) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:125) PSU j , . . . ( y , J pop , π | J pop , x , J pop ) , . . . , ( y N Jpop , J pop , π N J pop | J pop , x N Jpop , J pop ) (cid:124) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:125) PSU J pop and also of, π , π , . . . , π j , . . . , π J pop > with π i | j > , (cid:91) i , j . Notice that, as before, we do not impose any restriction on (cid:205) N j i =1 π i | j or (cid:205) J pop j =1 π j . The sample ofsize n = (cid:205) Jj =1 n j (with n j and J speciﬁed by the survey sampler) is drawn in two steps:• Step 1: PSU sampling. Sample J diﬀerent PSUs j , . . . , j J ∈ { , . . . , J pop } so that P r [ PSU j is in the sample ] = π j (cid:30) J pop (cid:213) j (cid:48) =1 π j (cid:48) • Step 2: Individual sampling. Within each PSU in sample j (cid:96) , (cid:96) = 1 , . . . , J , draw n (cid:96) diﬀerent individuals so thatindividual i (in the sampled PSU j ) is in the sample with probability P [ Individual ij is in the sample | PSU j is in the sample ] = π i | j (cid:30) N j (cid:213) i (cid:48) =1 π i (cid:48) | j and therefore the marginal inclusion probability is proportional to π ij : = π j π i | j .The superpopulation approach assumes that the population is a realization of the joint distribution for responseand inclusion probabilities, ( y ij , π ij ) | x ij , θ , η yj , κ , π j ∼ p ( y ij , π ij | x ij , θ , η yj , κ , π j ) = p ( π ij | y ij , x ij , κ , π j ) p ( y ij | x ij , θ , η yj ) , (6)We model π ij | y ij with p ( π ij | y ij , x ij , κ , π j ) . The (population) parameter of interest is θ . We have augmentedthe parameters used in LS2019, given in (1), to incorporate η yj and π j that are shared by all observations in PSU j .Parameters, η yj ,induce a correlation in the response for individuals in the same PSU (dep- y ) while π j induces associ-ation of marginal inclusion probabilities (dep- π ) among respondents nested in the same PSU. We will later constructpriors on these parameters to deﬁne PSU-indexed random eﬀects. κ is a nuisance parameter used to model the inclu-sion probabilities. After relabeling the sampled PSU indices j , . . . j J to , . . . J , and the indices i in the sample to runfrom i = 1 , . . . , n j , the sample of size n = (cid:205) Jj =1 n j consists of { y ( s ) ij , x ( s ) ij , π ( s ) ij , j } i =1 ,..., n j ; j =1 ,..., J with j indicating fromwhich PSU the individual was sampled, n j the number of participants from PSU j , and J the total number of sampledPSUs. Recall, superindex ( s ) denotes in the sample. The equality in (6) assumes that y ij ⊥ ( κ , π j ) | x ij , θ , η yj and ayes Estimation/Informative Cluster Sampling 7 π ij ⊥ ( θ , η yj ) | y ij , x ij , κ , π j .We extend (5) that captures the joint probability model for the sample by replacing θ and κ with ( ( θ , η y ) and ( κ , π j ) ) to achieve, p s ( y ij , π ij | x ij , θ , η yj , κ , π j ) = π ij p ( π ij | y ij , x ij , κ , π j ) E y (cid:63) ij | x ij , θ , η yj (cid:104) E ( π (cid:63) ij | y (cid:48) ij , x ij , κ , π j ) (cid:105) × p ( y ij | x ij , θ , η yj ) . (7)The subindex s on the joint distribution p s on the LHS denotes that we are conditioning on individual ij being in thesample; that is, p s ( y ij , π ij | . . . ) = p ( y ij , π ij | individual ij is in the sample , . . . ) . In contrast the distributions on theRHS are population distributions. Inference on θ is based on the likelihood (cid:96) ( θ , η y , κ , π ; y ( s ) , π ( s ) , x ( s ) ) = J (cid:214) j =1 n j (cid:214) i =1 (cid:104) p s ( y ( s ) ij , π ( s ) ij | x ( s ) ij , θ , η yj , κ , π j ) (cid:105) with η y : = ( η y , . . . , η yJ ) , π : = ( π , . . . , π J ) . Inference for θ is achieved via the posterior distribution of the modelparameters: p s ( θ , η y , κ , π | y ( s ) , π ( s ) , x ( s ) ) ∝ (cid:96) ( θ , η y , κ ; y ( s ) , π ( s ) , x ( s ) ) × Prior ( θ ) × Prior ( η y ) × Prior ( π ) × Prior ( κ ) . To obtain a closed form of the likelihood we need a closed form of the expected value in the denominator inEquation (7). Theorem 1 (same as Theorem 1 in LS2019) is here extended under our extended PSU-indexed param-eterization, ( ( θ , η y ) and ( κ , π j ) ), to provide conditions that allow a closed form expression of this expected value.Similar to the set-up for Theorem 1, let v and u be subvectors of x , the covariates used to specify the conditionaldistribution of π ij | y , x , κ , π j and y | x , θ , η y , respectively; that is, π ij | y ij , x ij , κ , π j ∼ π ij | y ij , v ij , κ , π j and y ij | x ij , θ , η y ∼ y ij | u ij , θ , η y . Theorem 2 If p ( π ij | y ij , v ij , κ , π j ) = lognormal ( π ij | h ( y ij , v ij , κ , π j ) , σ π ) , with the function h ( y ij , v ij , κ , π j ) of theform h ( y ij , v ij , κ , π j ) = g ( y ij , v ij , κ ) + t ( v ij , κ , π j ) where σ π = σ π ( v ij , κ , π j ) , possibly a function of ( v ij , κ , π j ) then p s ( y ij , π ij | u ij , v ij , θ , η yj , κ , π j ) = normal (cid:16) log π ij | g ( y ij , v ij , κ ) + t ( v ij , κ , π j ) , σ π (cid:17) exp (cid:8) t ( v ij , κ , π j ) + σ π / (cid:9) × M y ( κ ; u ij , v ij , θ , η yj ) × p ( y ij | u ij , θ , η yj ) with M y ( κ ; u ij , v ij , θ ) : = E y (cid:63) ij | u ij , θ , η yj (cid:104) exp (cid:110) g ( y (cid:63) ij , v ij , κ ) (cid:111)(cid:105) . So, analogously to the discussion after Theorem 1, if g ( y , v , κ ) = κ y y with κ y depending on κ and, perhaps, on v ,then M y ( κ ; u ij , v ij , θ , η yj ) : = E y (cid:63) ij | u ij , θ , η yj (cid:104) exp (cid:16) κ y y (cid:63) ij (cid:17)(cid:105) is the moment generating function of y evaluated at κ y . So when both the population distribution of y , p ( y | θ ) , hasa close form and the moment generating function has a close form over the real line, then the likelihood, p s , has a Bayes Estimation/Informative Cluster Sampling closed form, as well. | Inclusion of PSU Information into Conditional Population Model for Weights

The marginal inclusion probability of the individual ij , ∝ π ij , is the product of the probability of selecting PSU j , ∝ π j ,and the probability of selecting the individual i conditioning on PSU being in the sample, ∝ π i | j such that π ij = π j π i | j . Therefore, log π ij = log π i | j + log π j Specifying, log π j ∼ normal ( µ j , σ P SU ) where µ j could depend on PSU covariates (e.g. county population, etc) but, forsimplicity, we assume that it does not and set µ j = 0 . Choosing a normal distribution for log π i | j ∼ normal ( g ( y , v , κ ) + t (cid:48) ( v , κ ) , σ π ) yields log π ij | y ij , v ij , κ , η πj ∼ normal ( g ( y ij , v ij , κ ) + t (cid:48) ( v ij , κ ) + η πj , σ π ) (8)with η πj : = log π j iid ∼ Normal ( , σ P SU ) PSU-speciﬁc random eﬀects. So deﬁning t ( v , κ , π j ) : = t (cid:48) ( v , κ ) + log π j = t (cid:48) ( v , κ ) + η πj , the distribution of π ij satisﬁes the conditions of Theorem 2.This set-up is coherent with our assumption that the data analyst does not have information about the PSU-indexed sampling weights for either the population or sampled units because they are not typically published bysurvey administrators. Nevertheless, our derivation of (8) by factoring the marginal inclusion probabilities, π ij , demon-strates how we may capture within PSU dependence among ( π ij ) by inclusion of random eﬀects, η πj . Our Fully Bayesconstruction has the feature that it does not require availability of PSU-indexed random eﬀects, unlike the pseudo-likelihood approach of Williams and Savitsky (2020). | Linear Regression Joint Population Model

Assume the linear regression model for the population, y ij | u ij , θ , η yj ∼ normal (cid:16) u tij β + η yj , σ y (cid:17) , with θ = ( β , σ y ) (9)with the PSU-speciﬁc random eﬀect η yj in (9) playing the roll of η yj in (6). The conditional population model forinclusion probabilities is speciﬁed as in (8), with π ij | y ij , v ij , κ , η πj ∼ lognormal (cid:16) κ y y ij + v tij κ v + η πj , σ π (cid:17) , with κ = ( κ y , κ v , σ π ) (10)This construction results from setting, g ( y ij , v ij , κ ) = k y y ij , t ( v ij , κ , π j ) = v tij κ v + η πj (remember η πj = log π j ), and σ π ( κ , v ij , π j ) = σ π in (8). Here β and κ v are vectors of regression coeﬃcients that include an intercept, so the ﬁrst ayes Estimation/Informative Cluster Sampling 9 entry of both u ij and v ij equals 1. We select prior distributions, β ∼ MVN ( , I ) κ ∼ MVN ( , I ) η y , . . . , η yJ iid ∼ normal ( , σ η y ) η π , . . . , η πJ iid ∼ normal ( , σ η π ) σ y , σ π , σ η y , σ η π iid ∼ normal + ( , ) (11)with normal + ( m , s ) denoting a normal distribution with mean m and variance s restricted to the positive real line;MVN ( m , Σ ) denotes the multivariate normal distribution with mean vector m and variance-covariance matrix Σ ; and I the identity matrix. Since y ∼ normal ( m , s ) admits a closed form expression for moment generating function M y ( t ) = exp ( t m + t s / ) , we apply Theorem 2 to obtain, p s (cid:16) y ij , π ij | u ij , v ij , θ , η yj , κ , η πj (cid:17) = normal (cid:16) log π ij | κ y y ij + v tij κ v + η πj , σ π (cid:17) exp (cid:110) v tij κ v + η πj + σ π / κ y ( u tij β + η yj ) + κ y σ y / (cid:111) (12) × normal (cid:16) y ij | u tij β + η yj , σ y (cid:17) The implementation of the Gibbs sampler is not straightforward. To obtain a posterior sample of the model parameters,we rely on the ‘black box" solver, “Stan" (Carpenter et al., 2016), which performs an eﬃciently-mixing HamiltonianMonte Carlo sampling algorithm with a feature that non-conjugate model speciﬁcations are readily accommodated. | Pseudolikelihood Approach

In a related work, Savitsky and Williams (2019) propose an approach to incorporate sampling weights using a plug-inaugmented pseudolikelihood:  J (cid:214) j =1 n j (cid:214) i =1 p ( y ( s ) ij | θ ) w ( s ) ij  × J (cid:214) j =1 p ( η yj | σ η y ) w ( s ) j (13)where an augmented pseudolikelihood exponentiates the random eﬀects prior for η yj by the PSU-indexed samplingweight, w ( s ) j , to permit asymptotically unbiased inference about σ η y . They standardize marginal individual samplingweights so that (cid:205) Jj =1 (cid:205) n j i =1 w ( s ) ij = n and the PSU sampling weights are standardized as, (cid:205) Jj =1 w ( s ) j = J to approximatelyreﬂect the amount of posterior uncertainty in the sample. Nevertheless, as in LS2019, they did not account for eitherdep- y nor dep- π , so resultant credibility intervals are overly optimistic (short). In a related work, Williams and Savitsky(2020) propose to utilize a separate, post-processing step applied to the pseudoposterior samples produces posteriordraws with the sandwich ( H − ∗ J ∗ H − ) variance estimator of the pseudo MLE to account for the dependenceinduced by clustering. Their post-processing step, however, assumes a pseudolikelihood without random eﬀects;that is, they assume a ﬁxed eﬀects model and show asymptotically correct uncertainty quantiﬁcation under clustersampling where their sandwich adjustment captures the PSU-induced dependence. This approach, however, does notpermit inference on PSU-indexed random eﬀects. Their post-processing algorithm is computationally-intensive as it requires re-sampling PSUs and units within PSUs to estimate the sandwich variance construction.Since it is usually the case that the ( w ( s ) j ) are not published with the survey, they propose to use instead w ( s )· j = (cid:205) n j i =1 w ( s ) ij . The sampling weights not only play a role for individual-indexed pseudolikelihood, but also on the weightingof the prior of the PSU-indexed random eﬀects. Inference on θ is done treating the pseudoposterior as if it were aposterior distribution. In Section 4 we label this approach, “Pseudo.w". In Section 4, we will also study the performanceof the pseudolikelihood approach when the prior of the random eﬀects is not weighted (equivalent to w ( s ) j : = 1 in(13)) and label this approach, “Pseudo".Related frequentist approaches of Pfeﬀermann et al. (1998) and (Rabe-Hesketh and Skrondal, 2006) also employan augmented likelihood similar to that of (13) where they integrate out the η yj to perform estimation. They also focuson consistent estimation of parameters, rather than correct uncertainty quantiﬁcation.We will see in the sequel that because our approach uses a joint model for ( y ij , π i | x ij ) , the asymptotic covariancematrix of the joint posterior is H − , which is the same for the MLE, resulting in correct uncertainty quantiﬁcation.In the context of the linear regression in (9), the quantity between square brackets in (13) matches the likelihoodof the weighted regression model y ij | u ij , θ , η yj ∼ normal (cid:16) u tij β + η yj , σ y / w ij (cid:17) (See Subsection A.3 for details.) So (13)becomes  J (cid:214) j =1 n j (cid:214) i =1 normal (cid:16) y ( s ) ij | u tij β + η yj , σ y / w ( s ) ij (cid:17) × J (cid:214) j =1 p ( η yj | σ η y ) w ( s ) j This becomes useful in the applications in stan where one can specify the weighted linear regression model and addthe log of p ( η y | . . . ) to the log of the full conditional or “Target”.. | Frequentist Approach

The frequentist approach is designed-based, assuming the population is ﬁxed. It employs the pseudolikelihood con-struction, but without PSU-REs. The point estimate of θ , called ˜ θ f req , maximizes p pseudo ( θ ; y ( s ) , π ( s ) , x ( s ) ) = (cid:206) Jj =1 (cid:206) n j i =1 (cid:104) p ( y ( s ) ij | x ( s ) ij , θ ) (cid:105) w ( s ) ij with w ( s ) ij ∝ / π ( s ) ij standardized so that (cid:205) ij w ( s ) ij = n .The PSU indices, together with p pseudo , are used to estimate the standard error of ˜ θ f req via resampling methods ( i.e. ,balanced repeated replication, Jack-Knife, Bootstrap) or Taylor series linearization. The R function svyglm in the Rpackage survey(Lumley, 2019) uses the latter (as default) to ﬁt common generalized regression models such as linear,Poisson, logistic, etc.In the context of multiple linear regression with θ = ( β , σ y ) , inference ( i.e. conﬁdence intervals for each entry) forthe ( p + 1 ) -dimension vector of regression coeﬃcients β (that includes an intercept) is based on that, asymptotically, ˜Σ / ( ˜ β f req − β ) ∼ ( p + 1 ) -variate Student-t with degrees of freedom equal to df = P SU − St r at a − p ; and ˜Σ / isthe matrix such that ˜Σ / ˜Σ / = ˜Σ with ˜Σ the estimate of the variance-covariance matrix of ˜ β f req . No stratiﬁcation isequivalent to having one stratum and the degrees of freedom are then df = P SU − ( p +1 ) . This frequentist approachfor uncertainty quantiﬁcation is similar to the post processing correction of Williams and Savitsky (2020) in that theanalysis model for the population does not employ a PSU-indexed random eﬀects term; rather, the re-sampling ofclusters captures the dependence within clusters. Both methods perform nearly identically, in practice, so that wefocus on comparing our Fully Bayes approach to this frequentist re-sampling method in the simulation study thatfollows. ayes Estimation/Informative Cluster Sampling 11 | SIMULATION

We perform a Monte Carlo simulation study to compare the performance of our fully Bayes method of (12) thatemploys PSU-indexed random eﬀects in both the models for the response and inclusion probabilities to the pseudo-posterior and frequentist methods, presented in Sections 3.4 and 3.5, respectively. In each Monte Carlo iteration,we generate a population of J pop clusters and N j individuals per cluster. The response variable is generated propor-tionally to size-based group and marginal inclusion probabilities to induce informativeness (dependence between theresponse variable and inclusion probabilities). We next take a sample of groups and, subsequently, individuals withingroup. A clustered simple random sample (cSRS) is also generated from the same population and the population es-timation for the response that includes sampling weights is used to estimate parameters. The cSRS is included toserve as a gold standard for point estimation and uncertainty quantiﬁcation and is compared to methods estimatedon the informative sample taken from the same population. For each population and sample we apply the estimationapproaches of our Fully Bayes method and associated comparator methods, assessing the bias, MSE and coverageproperties.Here gamma ( a , b ) denotes the gamma distribution with shape and rate parameters a and b ( i.e. , mean a / b ). | Monte Carlo Simulation Scheme

Steps 1-6 describe how the synthetic population dataset is generated, steps 7-8 how the samples are generated and9 how they are analyzed. We use the superindex ‘DG’ to refer to data generating model as opposed to the analysismodel: Generate π i | j iid ∼ gamma ( a π = 2 , b π = 2 ) for i = 1 , . . . , ( N j = 20 ) individuals nested in PSU, j = 1 , . . . , ( J pop = 10 ) total PSUs. The total population size is J pop × N j = 20 , . Deﬁne PSU j inclusion probability π tem j : = (cid:205) N j i =1 π i | j (therefore π tem j iid ∼ gamma ( N j a π , b π ) ). Standardize π j : = π tem j / (cid:205) J pop j (cid:48) =1 π tem j (cid:48) , so ( π , , π , , . . . , π , J pop ) ∼ Dirichlet (cid:16) a π × ( N , N , . . . , N J pop ) (cid:17) . (Thus, b π does not play a roll on the distribution of π j .) Generate η y , DGj iid ∼ N ( , σ η y , DG = 0 . ) PSU speciﬁc random eﬀects and predictor x ij iid ∼ Uniform ( , ) Generate the response. We consider three simulation scenarios to generate the response, in all of them y ij = β DG + β DG x ij + β π , π j + β π , π i | j + β η y , DG η y , DGj + (cid:15) DGij , with (cid:15) DGij iid ∼ Normal ( , ( σ DGy ) ) , but the last three regression coeﬃcients diﬀer:Scenario β π , β π , β η y , DG S Informative PSU-RE J pop S Non-informative PSU-RE 0 1 1 S No stage is informative 0 0 1with ( σ DGy ) = 0 . , β DG = 0 , β DG = 1 . Note that, in scenario S , β π , = 1 / E ( π j ) = J pop so that β π , E ( π j ) = 1 .Informative random eﬀects are instantiated in Scenario S by generating y ij from π j , where π j , the inclusionprobability for PSU, j , is equivalent to a PSU-indexed random eﬀect. We set the regression coeﬃcient for π j equal to and that for η y , DGj equal to in Scenario S , where we generated random eﬀects as non-informative(uncorrelated with the selection probabilities). Take a clustered simple random sample (cSRS) from the population: From each population dataset we draw twosamples, one informative and the other under two-stage cluster SRS . Both samples contain J = 30 PSUs and n j = 5 individuals, thus sample size is n = (cid:205) Jj =1 n j = 150 . Results under cSRS will serve us as Gold Standard andwill be compared with the results of using methods analysing the informative sample. To implement the clusteredrandom sample we, a. Draw an SRS (without replacement) of size J of PSUs indices from { , . . . , J pop } . b. Within each drawn PSU j , obtain a SRS (without replacement) of size n j . c. Relabel the PSU indices so they run from to J and individual indices so they run from to n j . d. The cSRS consists of {( y ij , x ij , j ) } i =1 ,..., n j ; j =1 ,..., J Generate an informative sample: a. Draw without replacement, J PSU indices j , . . . , j J ∈ { , . . . , J pop } with P r ( j ∈ sample ) = π j . b. For each j (cid:96) drawn, sample, without replacement, n (cid:96) individual indices i ∈ { , . . . , N j (cid:96) } with probability P r ( i ∈ sample from PSU j ) = π i | j / (cid:205) N j i (cid:48) =1 π i (cid:48) | j . c. Deﬁne π ij = π j π i | j and relabel the PSU and individual indices so they run from to J and from to n j ,respectively, and add superindex “ ( s ) ” to denote sampled quantities. d. The informative sample consists of {( y ( s ) ij , x ( s ) ij , π ( s ) ij , j ) } i =1 ,..., n j , j =1 ,..., J . Analyze the informative sample by estimating parameters under the following approaches: a. FULL.both : Denotes the approach enumerated in Subsection 3.2 that employs PSU-REs in models for bothresponse and inclusion probability; the model for the response includes PSU-indexed random eﬀects with, y ij = β Ana + β Ana + η y , Anaj + (cid:15) Anaij with (cid:15)

Anaij iid ∼ Normal ( , ( σ Anay ) ) , (14)where the superscript, “ Ana " denotes the model for analysis or estimation as contrasted with the DG modelused for population data generation. We are interested in estimating β Ana and the standard deviation ofthe PSU-REs, σ η y , Ana where η y , Anaj iid ∼ Normal ( , ( σ η y , Ana ) ) . (Note that the estimation of β Ana is unbiasedregardless of the sampling scheme) We, subsequently, use the conditional estimation model for the marginalinclusion probabilities of (10) to include a PSU-indexed random eﬀects term,log π ij | y ij , η π ∼ Normal ( κ + κ y y ij + κ x x ij + η πj , σ π ) . These two formulations describe the joint population model that employs random eﬀects in both the marginalmodel for the response and conditional model for the inclusion probabilities. We leverage the Bayes ruleapproach of Section 3.1 under the linear regression population to produce (12) that adjusts the populationmodel to condition on the observed sample. We use this equation to estimate the Fully Bayes populationmodel on the observed sample.It bears noting that FULL.both assumes that the data analyst does not have access to PSU-indexed samplingweights ( ∝ / π j ). Yet, we show in the simulation study results that FULL.both is able to adjust for informativesampling of PSUs for estimation of population model parameters (e.g., the intercept and PSU random eﬀectsvariance). This relatively good result owes to the inclusion of PSU-indexed random eﬀects in the conditionalmodel for the inclusion probabilities, π ij , because it captures the within PSU dependence among them.Note that the Full.both analysis assumes that log π ij | y ij , . . . is a normal distribution, but that does nothold under any simulation scenario, which allows our assessment of the robustness of FULL.both to modelmisspeciﬁcation. ayes Estimation/Informative Cluster Sampling 13 b. FULL.y : This alternative is a variation of FULL.both that uses the same population estimation for the responsestated in (14). In this option, however, PSU-REs are excluded from the conditional model for the marginalinclusion probabilities; i.e. , log π ij | y ij , η π ∼ Normal ( κ + κ y y ij + κ x x ij , σ π ) does not include PSU-REs. c. Pseudo.w : Denotes the pseudolikelihood of Section 3.4 where the likelihood is exponentiated by marginalsampling weights for individuals and the prior for the random eﬀects is expontentiated with PSU-indexedsampling weights. In (13) we use w ( s ) j ∝ / π ( s ) j (and not w ( s )· j ) standardized so that (cid:205) Jj =1 w ( s ) j = n d. Pseudo : Denotes the pseudolikelihood that exponentiates the likelihood by marginal sampling weights butdoes not exponentiate the prior distribution for random eﬀects by PSU-indexed sampling weights, as de-scribed in Subsection 3.4. e. Freq : Denotes the frequentist, design-based, inference under simple linear regression model as described inSubsection 3.5. Note that this analysis model does not include PSU- REs because we employ a step thatre-samples the PSUs in order to estimate conﬁdence intervals. To ﬁt the model, we use R function svyglm inlibrary survey (Lumley, 2019). f. Pop : Ignore the informative sampling and ﬁt model in (14) (as if the sample were a cSRS). The inclusionprobabilities do not play a roll in the inference, though the model for the response includes PSU-REs. This isequivalent to Pseudo with all (group and individual) sampling weights set equal to 1. Analyze the cluster simple random sample. cSRS : Fit the model in (14) to the sample taken under a cSRS (generatedin step 6) design. The same as Pop but applied to the cSRS generated in step 6. Save parameter estimates to compute Bias, MSE, coverage probability of central 95% credible intervals and theirexpected length. The parameters of inferential interest are the point estimate of β Ana , T RUE : ˜ β Ana : = E ( β Ana | dat a ) (or ˜ β Ana , f req for Freq), and its central 95% credible (or conﬁdence for Freq) interval lower and upper limits. Wealso produce point and interval estimates for σ Ana , T RUEy for those methods that include PSU-REs in the marginalresponse model ( i.e. exclude Freq). The computation of β Ana , T RUE and σ Ana , T RUEy is discussed in Subsection 4.2.Once we have run steps 1-10 times, we use the quantities stored in step 10 to estimate the bias and MSE of thepoints estimate of β Ana , T RUE (deﬁned below in Subsection 4.2) of each method as the average of ˜ β Ana − β Ana , T RUE andaverage of ( ˜ β Ana − β Ana , T RUE ) , respectively. The coverage and expected length of the 95% credible (or conﬁdence)are estimated as the proportion of times that the credible intervals contain β Ana , T RUE and their average length. Wedo the same for σ T RUE η y , Ana also deﬁned in subsection 4.2. | True Model Parameters under Analysis Model

In this section we compute the true values of the intercept and random eﬀects variance parameters for the analysis(

Ana ) models that are obtained from associating parameters of the analysis model to the data generating ( DG ) model.Having true values for the intercept and random eﬀect variance under our analysis models allows our assessment ofbias, MSE and coverage. We use the superindex “ T RU E " to refer to the true parameter values for the

Ana modelimplied by the simulation true parameter values in the DG model. The true value of the intercept parameter underthe analysis model is achieved by integration, β Ana , T RUE = E ( y ij | x ij = 0 ) = β DG + β π , E ( π j ) + β π , E ( π i | j ) + β η y , DG E ( η y , DG ) , yielding, β Ana , T RUE = 2 , and under simulation scenarios S , S and S , respectively. The true value for thepopulation random eﬀect is, η y , Anaj , T RUE = β π , (cid:2) π j − E ( π j ) (cid:3) + β η y , DG η y , DGj and the true values random errors under the analysis model are (cid:15)

Ana , T RUEij = β π , π i | j + (cid:15) DGij . Since π i | j s are not normally distributed, (cid:15) Ana , T RUEij s arealso not normally distributed, which as earlier mentioned allows our assessment of the robustness of the FULL.bothanalysis model to misspeciﬁcation. Since normality assumption of the errors of the simple regression model is violated,the variance of random eﬀects for the marginal population model for the response, v ar ( η y , Anaj T RUE ) = β π , v ar ( π j ) + β η y , DG v ar ( η y , DGj ) (cid:124) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:125) σ η y , DG , is diﬀerent from ( σ T RUE η y , Ana ) . Nevertheless, we may compute σ T RUE η y , Ana by ﬁtting the linear mixed eﬀect population modelin (14), which corresponds to the analysis model for the response, directly to the population dataset via the lmer Rfunction lmer( y ∼ x + ( | PSU index),data= population). In practice, the PSU inclusion probabilities, ( π j ) , are notavailable to the data analyst for individuals in both sampled and non-sampled individuals from the population. Underscenario ai σ T RUE η y , Ana ≈ . , and under the other two scenarios σ T RUE η y , Ana ≈ . . | Simulation Results

Tables 1, 2 and 3 show the simulation results under all scenarios. As expected in all scenarios cSRS credible intervalshave coverage close to nominal level (0.95) and the lowest MSE. In informative scenarios, S and S , (i) Pop performspoorly showing the consequences of ignoring the informative sampling scheme, (ii) all methods to analyze informativesamples yield similar quality point estimators (similar MSE), and (iii) Full.both and Full.y credible intervals maintain nom-inal coverage while Pseudo.w, Pseudo and Freq do not. Under non-informative scenario S , all methods to analyzeinformative samples yield similar results to Pop (now correctly speciﬁed model). Interestingly, only Freq pays the priceof the noise introduced by the non-informative sampling weights producing considerable wider conﬁdence intervalsfor β Ana , T RUE than all other methods.Overall, Tables 1-3 show that Full.both and Full.y are the best methods to analyze informative samples, particularlyin terms of uncertainty quantiﬁcation. But, so far, results have not shown advantage of Full.both over Full.y. To doso, under scenario S , we increase level of informativeness of the PSUs by increasing the value of β π , (See step 5 inSubsection 4.1) from J pop to J pop and J pop . As shown in Table 4, coverage of Full.y credible intervals deterioratesas informativeness increases while Full.both, in contrast to all other methods, maintains coverage similar to cSRS (atnominal level).The strength of Full.both over all other considered approaches is that it accounts for the association among theinclusion probabilities within the same PSU. Table 4 shows that Full.both is the only approach that performs well undersimulation scenario S when the level of informativeness of π ij (or correlation between y ij and π j ) increases. This isFull.both is the only method whose inference quality is not aﬀected when π j ⊥ y ij | x ij (cid:91) i , j . ayes Estimation/Informative Cluster Sampling 15 FULL.both FULL.y Pseudo.w Pseudo Freq Pop cSRS β Ana Bias 0.035 0.046 0.067 0.091 0.015 0.518 0.005MSE 0.028 0.029 0.028 0.032 0.030 0.294 0.01695% CI Coverage 0.947 0.938 0.909 0.883 0.906 0.091 0.95795% CI Length 0.671 0.669 0.537 0.541 0.618 0.619 0.520 σ η y , Ana

Bias 0.016 0.017 0.047 0.045 0.018 -0.009MSE 0.010 0.010 0.010 0.010 0.010 0.00895% CI Coverage 0.960 0.963 0.886 0.911 0.959 0.95195% CI Length 0.425 0.422 0.359 0.359 0.418 0.355

TA B L E 1

Simulation Scenario S : Informative PSU-RE. cSRS analyses the cSRS sample while all otherapproaches analyze the informative sample. CI denotes central credible interval except for Freq where it denotesconﬁdence interval. FULL.both FULL.y Pseudo.w Pseudo Freq Pop cSRS β Ana Bias 0.008 0.009 0.048 0.055 0.011 0.494 0.001MSE 0.020 0.020 0.022 0.022 0.025 0.263 0.01495% CI Coverage 0.958 0.962 0.911 0.908 0.926 0.064 0.96295% CI Length 0.623 0.624 0.495 0.496 0.565 0.574 0.479 σ η y , Ana

Bias 0.063 0.065 0.108 0.107 0.065 0.049MSE 0.008 0.008 0.018 0.017 0.008 0.00695% CI Coverage 0.967 0.971 0.760 0.752 0.966 0.95695% CI Length 0.332 0.331 0.313 0.312 0.329 0.285

TA B L E 2

Simulation Scenario S : Non informative PSU-REs. cSRS analyses cSRS samples while all otherapproaches analyze informative samples. FULL.both FULL.y Pseudo.w Pseudo Freq Pop cSRS β Ana Bias 0.000 0.000 0.000 -0.000 0.000 -0.000 0.000MSE 0.001 0.001 0.001 0.001 0.001 0.001 0.00195% CI Coverage 0.957 0.964 0.941 0.944 0.947 0.956 0.95195% CI Length 0.103 0.103 0.104 0.104 0.134 0.102 0.103 σ η y , Ana

Bias 0.002 0.002 0.006 0.006 0.002 0.003MSE 0.000 0.000 0.000 0.000 0.000 0.00095% CI Coverage 0.935 0.936 0.931 0.933 0.938 0.95595% CI Length 0.067 0.067 0.068 0.068 0.067 0.068

TA B L E 3

Simulation Scenario S : No stage informative. cSRS analyses cSRS samples while all other approachesdo the same as if the samples were informative. β π , FULL.both FULL.y Pseudo.w Pseudo Freq cSRS J pop .947,.960 .938,.963 .909,.886 .883,911 .906,NA .957,.951 J pop .949,.929 .914,.927 .906,.917 .852,.929 .908,NA .961,.927 J pop .949,.946 .85,.954 .897,.916 .808,.914 .910,NA .957,.949 TA B L E 4

Coverage of central 95% credible (conﬁdence for Freq) intervals for β Ana , T RUE , σ T RUE η y , Ana under scenario S increasing the level of informativeness of the PSUs (by increasing β π , ) . ayes Estimation/Informative Cluster Sampling 17 | APPLICATIONS

The National Health and Nutrition Examination Survey (NHANES) is designed to assess the health and nutritionalstatus of the non-institutionalized civilian population living in one of the 50 U.S states or Washington D.C. Althoughnationally representative, NHANES is designed to oversample speciﬁc subpopulations ( e.g. persons 60 and older,African Americans, Asians, and Hispanics) and follows a complex sampling design (CDC, 2017). The NHANES samplingdesign is constructed as multi-stage with stages that include sampling strata and nested primary sampling units (PSUs)that further nest respondents. A PSU is a cluster or grouping of spatially contiguous counties, while a stratum isconstructed as a region nesting multiple PSUs. NHANES publishes respondent-level marginal sampling weights basedon resulting respondent marginal inclusion probabilities in the sample after accounting for clustering. The samplingweights measure the number of people in the population represented by that sampled individual, reﬂecting unequalprobability of selection, nonresponse adjustment, and adjustment to independent population controls.The survey consists of both interviews and physical examinations. The NHANES interview includes demographic,socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, andphysiological measurements, as well as laboratory tests. Data obtained from 15 strata with two PSU per stratum arereleased in two-year cycles. Our ﬁrst example considers examination data and the second dietary data.In both applications the priors are given in (11), and posterior inference under non-frequentist methods is basedon a posterior sample of the model parameters of size 10,000. The Gibbs sampler was run 10,000 iterations, after aburn-in period of another 10,000 iterations, on Stan. | Proportion of Body Fat and BMI

In Heo et al. (2012), here after referred as H2012, the authors model relationship of percentage of body fat (PBF) andbody mass index (BMI in k g / m ) using the simple linear regression (SLR) PBF = β + β ( / BMI ) . H2012 combinedata from three NHANES biannual cycles: 1999-2000,2001-2002 and 2003-2004. They ﬁt a SLR model for eachcombination of sex (men and women), 3 age groups (18–29, 30–49, and 50–84 years of age), and 3 race-ethnicitygroups (non-Hispanic Whites, non-Hispanic Blacks, and Mexican Americans). Table 3 of H2012 reports the estimatedvalues and standard errors of β and β . Their table 4 reports the predicted PBF, ˆ β + ˆ β / BMI, for individuals with BMIlevels of . , , , and that represent BMI cutoﬀs for underweight, normal, overweight, and obesity classes I,II, and III, respectively. The PBF variable expresses a high rate of missing values. NHANES releases two datasets (percycle) with ﬁve sets of imputed PBF values; the ﬁrst dataset includes participants with observed PBF or with imputedPBF values with low variability, while the second data set participants with high variability in their imputed values.H2012 analysis considers sampling design and multiple imputation of missing values.In this section we mimic their analysis but with the 2005-2006 NHANES dataset. We use a multiple linear regres-sion model where we control for stratiﬁcation variables in H2012. Since PBF in the − cycle is reported onlyfor − year old participants, we categorize age into groups: − , − , and − years of age and excludedparticipants not in these age ranges. We also include two more race/ethnicity groups: “Other Hispanic” and “OtherRace - Including Multi-Racial”. As in H2012, we exclude participants with missing BMI, women who tested positive inpregnant test or who claimed to be pregnant (for which by design PBF is not measured), or, with PBF imputed valueswith high variability. Our ﬁnal sample size is n = 4099 . For more detail about the dataset see subsection A.1. Theanalysis model of the non-frequentist methods is the mixed eﬀect linear regression with PBF as the response variable,along with the following predictors: (1/BMI), gender, age group and race ethnicity, with male, 18-29 age group andnon-Hispanic White as reference groups, and PSU-REs. The frequentist analysis model is same (now ﬁxed eﬀect) model but without PSU-REs. We recall that u denotes predictors in the marginal model for y in (9), and construct, u t = (cid:0) , / BMI , ( gender = Female ) , ( Age ∈ [ , ]) , ( Age ∈ [ , ])) , ( R ace / E th = MX-AME ) , ( R ace / E th = Other-Hisp ) , ( R ace / E th = NonHisp black ) , ( R ace / E th = Other or Multiracial ) (cid:1) (15)with dimension p + 1 = 9 , where ( A ) denotes the indicator function of the individual in the set A . We analyze thedataset with the ﬁrst set of PBF imputed values under the following comparator models used in the simulation study:Full.both, Full.y, Pseudo.w, Pseudo, Freq and Pop. We recall that we jointly model the response and sampling inclusionprobabilities under Full.both and Full.y, with the response denoted as y = PBF, and we use the same predictors inthe conditional model for the inclusion probabilities and the marginal model for the response such that v = u in(10). In the implementation of pseudo.w we use sampling weights, w ( s )· j of (13), that sum individual sampling weightsfor those units nested in each PSU in order to exponentiate the random eﬀects prior distribution. Our analysesconsider PSU information and sampling weights as provided by NHANES but not strata information. We add thecomparator method Freq.strata that does consider strata information, which would be expected to produce moreoptimistic conﬁdence intervals, ﬁtted using the R package Survey (Lumley, 2019). We recall that the NHANES designemploys 15 strata with 2 PSUs per stratum such that frequentist inferences will be based on the Student- t distributionwith df = P SU − st r at a − p degrees of freedom, equal to df = 21 and df = 7 under Freq and Freq.strata, respectively.The left panel in Figure 1 compares violin plots and central 95% credibility (or conﬁdence) intervals for the ex-pected PBF value for a person in the reference group with “normal" BMI or BMI = 18 . (where BMI < . is labeledas underweight), which represents uncertainty intervals of β + β / . . All point estimates are close to the valuereported in Table 4 of H2012 of 14.5% for this group with FULL.both and FULL.y at . % are closest to H2012.The right panel of Figure 1 depicts the same point estimates and uncertainty intervals but now for non-HispanicWhite woman with BMI = 18 . , which are computed computed as the uncertainty intervals of β + β / . β . Here,again, all CIs contain the PBF estimated in H2012 for this group of 26.9%.In both ﬁgures, the Frequentist CIs ( i.e. Freq and Freq.strata) are much wider than the other methods, which indi-cates an ineﬃciency of this Freq.strata despite it’s consideration of strata should produce smaller uncertainty intervals.By contrast, inference under Full-both and Full-y is similar to Pop indicating the possibility of a non-informative design.This is conﬁrmed by the central 95% CI for κ y , (− . , . ) in FULL.both and FULL.y that contains 0 indicating a non-informative sample; more formally, y ij ⊥ π ij | x ij . The posterior mean estimates for the plug-in Pseudo.w and Pseudoexpress slightly more bias (relative to the . % of H2012) because of the use of use of noisy weights, which are notnecessary since the NHANES design is non-informative for PBF. The fully Bayesian methods (FULL.both, FULL.y), bycontrast, performs weight smoothing to mitigate bias induced by noisy weights.Figure 2 displays violin plots of the posterior distribution of the standard deviation of the PSU-REs, σ η y (in themarginal model for y ), for the non-frequentist methods (as we recall that the frequentist comparator methods do notinclude PSU-REs). As before, the inference under Full.Both and Full.y are similar to Pop due to the non-informativenessof the sampling design for PBF.Interpreting the coeﬃcients in the model for y (top rows in Table 5), after controlling for BMI, women have, onaverage, 12.3% higher PBF than men, PBF increases with age and Other or multiracial and MX-AME people have thehighest PDF followed by white and other Hispanic while nonhispanic blacks have the lowest PBF.When implementing our comparator frequentist approaches Lohr (2019) (see also Heeringa et al., 2017, p 62)recommends to ﬁt the model with and without weights. If the point and standard error estimates are diﬀerent betweenthe two, they recommend that the analyst should explain the diﬀerence based on the construction of the sampling ayes Estimation/Informative Cluster Sampling 19 lllllll PopFreq.strataFreqPseudoPseudo.wFULL.yFULL.both 0.14 0.15 0.16 0.17 b + b lllllll b + b + b F I G U R E 1

Left: Violin plots along with, mean (dot) and central 95% credible or conﬁdence interval (horizontalline) for the expected PBF for subjects in the reference group (non-Hispanic White man in age group 18-29) withBMI = 18 . , β + β / . , under all considered methods. Right: the same but for non-Hispanic White woman in agegroup 18-29, β + β / . β . Individuals with BMI < . are labeled as underweight.weights, e.g. oversampling of certain minority or age groups. In our example, inference under Freq and Pop generatesimilar point estimated but Freq yields, in general, greater standard errors. The analyst needs to decide and justifywhich model to use inference under frequentist modeling; though under NHANES guidelines (CDC, 2007) the weightsshould be used for all NHANES 2005-2006 analyses. By contrast, the fully Bayesian approaches do not require thedata analyst to make this choice about whether to use the weighted or unweighted estimates. Inference for modelparameter κ y , under Full.Both or Full.y, informs the analyst if the design is informative; κ y = 0 implies non-informativedesign and the magnitude of κ y is a measure of informativeness (in the scale of y ). Full.both and Full.y correct inferencewhen the design is informative, but also mitigate against bias induced by weights when the sampling design is non-informative (through weight smoothing), as in this particular application, and produces results that are similar to thePop method that ignores the sampling weights. Full.Both, is the only method that, as shown in Subsection 4.3, alsoprovides appropriate model uncertainty estimation when the PSUs are informative (not the case in this application);e.g., when y ij (cid:54)⊥ j | x ij for some i and j . Parameter mean sd 2.5% 97.5%Parameters for y ij | · · · intercept 0.518 0.003 0.512 0.524 / BMI -6.862 0.071 -6.999 -6.721gender: Female 0.123 0.001 0.121 0.125Age: 30-49 0.005 0.001 0.002 0.007Age: 50-69 0.022 0.001 0.019 0.025Race/eth: MX-AME 0.004 0.002 0.001 0.007Race/eth: Other-Hisp -0.001 0.003 -0.007 0.005Race/eth: Non Hisp Black -0.018 0.002 -0.021 -0.015Race/eth: Other-Multiracial 0.007 0.003 0.002 0.013 σ η y σ y π ij | y ij , · · · PBF -0.042 0.223 -0.479 0.397Intercept -0.429 0.128 -0.682 -0.177 / BMI 2.240 1.850 -1.375 5.812gender: Female -0.025 0.031 -0.086 0.038Age: 30-49 -0.549 0.020 -0.587 -0.510Age: 50-69 -0.207 0.021 -0.249 -0.167Race/eth: MX-AME 1.589 0.025 1.539 1.637Race/eth: Other-Hisp 0.454 0.045 0.366 0.542Race/eth: Non Hisp Black 1.277 0.023 1.231 1.323Race/eth: Other-Multiracial 0.367 0.040 0.290 0.446 σ η π σ π TA B L E 5

Inferende under Full.both using the ﬁrst set of NHANES imputed values of PBF. Column headers: mean,sd, 2.5% and 97.5% denote the posterior expected value, standard deviation, and 0.025 and 97.5 quantiles. ayes Estimation/Informative Cluster Sampling 21 l llll

PopPseudoPseudo.wFULL.yFULL.both 0.0025 0.0050 0.0075 0.0100 σ η y F I G U R E 2

Violin plots of posterior samples of PSU RE, ( σ η y ) along with central 95% credible interval (blackvertical line) and mean (dot) for non frequentist methods. | Kilocalorie Consumption

In this second example, we estimate the daily kilocalories consumed in each one of the gender, age and ethnicitygroupings as used in the previous application with age now having two additional categories: 0-8,9-17,18-29,30-49and 50-80. In this application, u = v is of dimension p + 1 = 10 ; as in (15) after removing its second entry / BMI, andadding two more age groups. The reference categories are White, male and 0-8 age group.We use dietary data from 2015-2016 NHANES cycle. Each participant answers a 24-hour dietary recall interviewin two days: Day 1 and Day 2. The Day 1 recall interview takes place when the participant visits the Mobile ExamCenter (MEC) unit where other NHANES measurements are taken. The Day 2 recall interview is collected by telephoneand it is scheduled for 3 to 10 days later (See CDC (2015) for more details). Based on theses interviews NHANESprovides datasets with estimates of kilocalories (and many other nutrients) ingested by the participant 24 hours beforethe interview along with their dietary sampling weight. In this analysis, we consider the Day 1 dataset and the samplingweights that come in it. See Subsection A.2 for more details.There are 8,506 participants who completed the Day 1 dietary recall, of which this analysis considers the n = 8 , with positive sampling weights or, equivalently, with recall status labeled “complete and reliable” by NHANES CDC(2015). We ﬁtted the same multiple linear regression models as in the previous example with response y = log ( kcal +1 ) with kcal the NHANES estimate of kilocalories consumption based on Day 1 recall interview.Figure 3 depicts violin plots of the estimated mean daily kcal consumption for White males in age groups 0-8 (left)and 30-49 (right). More speciﬁcally, the left panel depicts violin plots of exp ( β ) − with the posterior mean pointand distribution estimates for β drawn from the posterior distribution under the (full and plug-in pseudo) Bayesianmethods. The frequentist methods estimate the the distribution of exp ( β ) − using ˆ β + t × SE ( ˆ β ) with t ∼ Student-twith

P SU − St r at a − p degrees of freedom (=20 under Freq and 6 under Freq.strata). The right panel constructs thefrequentist methods using the vector ( β , β ) to be drawn from the distribution ( ˆ β , ˆ β ) t + ˆΣ / , t where the randomvector t = ( t , t ) has entries t , t iid ∼ Student-t, with the same degrees of freedom as above, and ˆΣ / , ˆΣ / , = ˆΣ , with ˆΣ , the estimated variance-covariance matrix of ( ˆ β , ˆ β ) t .Figure 3 shows that for these groups, inference under FULL.both and Full.y are similar but, in contrast to theﬁrst application, diﬀerent from Pop. The central 95% credible interval for κ y , (-0.110,-0.048), does not contain zero,indicating that the sampling design is informative and both FULL.both, and also (not shown) Full.y, corrects for this.The point estimates under Pseudo.w, Pseudo, Freq and Freq.strata are close to one another, but diﬀerent from thefully Bayesian methods, indicating that the weight smoothing provided by the full Bayesian methods is more robustto noise present in the weights that may injure point estimation.Figure 4 depicts the violin plots under Pop, the Bayesian and pseudolikelihhod methods of the standard deviationof the PSU-RE, σ η y (Freq and Freq.strata do not model PSU-RE). Inference for this parameter under the plug-in,pseudo posterior methods diﬀer from the fully Bayesian estimates. Figure 5 shows that the posterior distributionof individual PSU random eﬀects in the marginal response model (PSU-RE) also diﬀers. The ﬁgure focuses on twoparticular random eﬀects, η y and η y , that are coherent with the general pattern we see over the random eﬀects;in particular, the fully Bayes methods express less estimation uncertainty than do the pseudo Bayesian methods,indicating a greater estimation eﬃciency by jointly modeling the response and inclusion probabilities. | DISCUSSION

We have extended our work in LS2019 to include PSU information in a model-based, fully Bayesian analysis of in-formative samples. The extension consists of replacing the ﬁxed eﬀect model by a mixed eﬀect model that includes ayes Estimation/Informative Cluster Sampling 23 l llllll

PopFreq.strataFreqPseudoPseudo.wFULL.yFULL.both1200 1300 1400 1500 1600 exp ( b ) - lllll ll exp ( b + b ) - F I G U R E 3

Violin plots, under all methods, for average kcal consumption for people in the reference groupWhite males 8 year old or younger (left), and White males in the age group 29-48 (right) along with point estimate(dot) central 95% credible, or conﬁdence, interval (horizontal line within violin plot). l llll

PopPseudoPseudo.wFULL.yFULL.both 0.00 0.02 0.04 0.06 σ η y F I G U R E 4

Violin plots, of the estimate of the standard deviation of the PSU-RE, σ η y , under all non-frequentistmethods. lll ll PopPseudoPseudo.wFULL.yFULL.both −0.10 −0.05 0.00 0.05 h l llll −0.05 0.00 0.05 h F I G U R E 5

Violin plots, under all non-frequentist methods, for PSU-RE, η y and η y . PSUs 15 and 17 herecorrespond to PSUs 1 in strata 8 and 14, respectively, in the 2015-2016 demographic NHANES dataset. ayes Estimation/Informative Cluster Sampling 25 PSU/cluster-indexed random eﬀects in the marginal model for y and the conditional model for π | y to capture de-pendence induced by the clustering structure. We have shown via simulation that our fully Bayesian approach yieldscorrect uncertainty quantiﬁcation, or equivalently CIs, with coverage close to their nominal level, including for therandom eﬀects variances. Competing methods fail to do so in at least one simulation scenario. In particular, Full.bothis the only appropriate method, of all here considered, when the sample design is informative for the selection ofPSUs. The results in simulation scenario S , where the design is not informative, revealed that the method is alsorobust to noise in weights.Our fully Bayesian methods proposed here are mixed eﬀect linear models that not only take into account the pos-sible association of individuals within the same cluster but also, in contrast to the design-based frequentist methods,quantify this association; that is, the within PSU correlation can be estimated.We demonstrated our method with two applications analyzing NHANES data. In the ﬁrst application, the com-parative methods behave as in simulation scenario S ; that is, the Bayesian methods adjust the inference for nonin-formative design yielding similar results to an analysis assuming cSRS where sampling weights do not play a role. Bycontrast, inference under competing methods is aﬀected by the noise induced from these sampling weights. In thesecond application the design is informative and inference under Bayesian methods diﬀers from other methods.In our two applications, inclusion of the strata information into the frequentist analysis did not yield shorter con-ﬁdence intervals as we expected (Freq.strata not better than Freq), indicating that perhaps strata are homogeneous(with respect to the response/ predictor relationship considered). The next natural step of the method is to includestrata information into our Bayesian analysis. The second application only analyzed data from Day 1 dietary question-naire. To analyze Day 1 and Day 2 data with one model we need to adapt our approach to repeated measures. This isanother current line of research.To implement our Bayesian method, we derived an exact likelihood for the observed sample. In principle, thislikelihood can also be used for maximum likelihood estimation opening the door for model-based frequentist inference.Our approach requires the modeler to specify a distribution for π i | y i , · · · . Estimation requires the computationof an expected value, the denominator in (7). We assume a lognormal conditional likelihood for the marginal inclusionprobability, given the response, where the mean function is linear in the response, both of which facilitate use ofTheorem 2 to obtain a closed form for this expected value. Our simulation study showed that the Bayesian methodis robust against misspeciﬁcation of these assumptions. Future work is needed to ease conditions in Theorem 2.To sum up, we have presented the ﬁrst model-based Bayesian estimation approach that accounts for both informa-tive sampling within the individuals in the same PSU and when the PSU is informative to produce correct uncertaintyquantiﬁcation. A | APPENDIXA.1 | Application Details for PBF and BMI analysisA.1.1 | Data used

Three publicly available datatset were downloaded from the NHANES website.• Demographic Variables & Sample Weights: DEMO_D.XPT (CDC (2007), ). Used columns: – sdmvstra: Stratum to which the participant belongs – sdmvpsu: PSU indicator – wtmec2yr: Full Sample 2 Year MEC Exam Weight – riagendr: Sex – ridageyr: age in years – ridreth1:Race/ethnicity• Body Measures: BMX_D.XPT( ).Used columns: – bmxbmi: BMI ( k g / m )• Dual-Energy X-ray Absorptiometry - Whole Body: dxx_d.XPT ( ) – dxdtopf: Total percent body fat (%) – _MULT_: Imputation Version in , . . . , . For Figures 1 and 2 and Table 5 we use the ﬁrst set of imputations.This is, rows with _MULT_=1. A.2 | Application Details for Daily Kilocalories AnalysisA.2.1 | Dataset

Two publicly available datatset were downloaded from the NHANES website.• Demographic Variables and Sample Weights: DR1TOT_I.XPT( ).Columns used – sdmvstra: Stratum to which the participant belongs – sdmvpsu: PSU indicator – riagendr: Sex – ridageyr: age in years – ridreth1:Race/ethnicity• Dietary Interview - Total Nutrient Intakes, First Day: DR1TOT_I.XPT( ).Columns used – dr1tkcal: Energy (kcal) – wtdrd1: Dietary day one sample weightNotice that, in this example, following the NHANES guidelines for the analisys of dietary data, we are not using theFull Sample 2 Year MEC Exam Weights “wtmec2yr” included in the demographic dataset DR1TOT_I.XPT but insteaddietary day one sample weights “wtdrd1” constructed based on “MEC sample weights and further adjusting for (a) theadditional non-response and (b) the diﬀerential allocation by weekdays (Monday through Thursday), Fridays, Saturdaysand Sundays for the dietary intake data collection” (CDC, 2015). ayes Estimation/Informative Cluster Sampling 27 A.3 | Quantity between Brackets the in Augmented Pseudolikelihood in (13)

MatchesLikelihood under Weighted Linear Regression

The weighted linear regression model is y ij | u ij , θ , η yj ∼ normal (cid:16) u tij β + η yj , σ y / w ij (cid:17) with w ij > known.Using the fact that (cid:104) normal ( y | µ , σ ) (cid:105) w = 1 w / ( π ) ( w − )/ ( σ ) ( w − )/ × normal ( y | µ , σ / w ) we obtain that the expression between brackets in (13), (cid:206) Jj =1 (cid:206) n j i =1 p ( y ( s ) ij | θ ) w ( s ) ij , equals J (cid:214) j =1 n j (cid:214) i =1 (cid:104) normal (cid:16) y ( s ) ij | u tij β + η yj , σ y (cid:17)(cid:105) w ( s ) ij ∝ ( σ ) (cid:16)(cid:205) j , i [ w ( s ) ij − ] (cid:17) /  J (cid:214) j =1 n j (cid:214) i =1 normal (cid:16) y ( s ) ij | u tij β + η yj , σ y / w ( s ) ij (cid:17) but, by construction, (cid:205) Jj =1 (cid:205) n j i =1 w ( s ) ij = n and therefore the expontent of σ in the denominator in the expressionabove is zero, and the quantity between brackets is the likelihood of the weighted linear regression model. references Carpenter, B., Gelman, A., Hoﬀman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P. and Riddell, A.(2016) Stan: A probabilistic programming language.

Journal of Statistical Software , , 1–37.CDC (2007) Centers for Disease Control (CDC). National Health and Nutrition Examination Survey, 2005-2006 data docu-mentation, codebook, and frequencies, demographic variables and sample weights (demo_d). URL - /DEMO_D.htm . Accessed: March 31, 2020.— (2011) Centers for Disease Control (CDC). Continuous NHANES web tutorial: Key concepts about NHANES survey design.URL .htm . Accessed: May 21, 2018.— (2015) Centers for Disease Control (CDC). NHANES: Measuring Guides for the Dietary Recall Interview. URL . Page last reviewed: November 6, 2015.Accessed: May 14, 2020.— (2017) Centers for Disease Control (CDC). NHANES: About the National Health and Nutrition Examination Survey.URL . Page last reviewed: September 15, 2017. Accessed: May14, 2020.Heeringa, S., West, B. and Berglund, P. (2017) Applied survey data analysis. second edi.Heo, M., Faith, M. S., Pietrobelli, A. and Heymsﬁeld, S. B. (2012) Percentage of body fat cutoﬀs by sex, age, and race-ethnicity inthe US adult population from NHANES 1999–2004. The American Journal of Clinical Nutrition , , 594–602. URL https://doi.org/ . /ajcn. . .León-Novelo, L. G. and Savitsky, T. D. (2019) Fully bayesian estimation under informative sampling. Electronic Journal ofStatistics , , 1608–1645.Lohr, S. L. (2019) Sampling: Design and Analysis: Design And Analysis . CRC Press.Lumley, T. (2019) survey: analysis of complex survey samples. R package version 3.35-1.Pfeﬀermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H. and Rasbash, J. (1998) Weighting for unequal selection probabilitiesin multilevel models.

Journal of the Royal Statistical Society: series B (statistical methodology) , , 23–40.Rabe-Hesketh, S. and Skrondal, A. (2006) Multilevel modelling of complex survey data. Journal of the Royal Statistical Society:Series A (Statistics in Society) , , 805–827.Savitsky, T. D. and Williams, M. R. (2019) Bayesian mixed eﬀects model estimation under informative sampling. Williams, M. R. and Savitsky, T. D. (2020) Bayesian estimation under informative sampling with unattenuated dependence.

Bayesian Analysis , , 57–77. URL http://dx.doi.org/ . / -BA1143