Reducing Seed Bias in Respondent-Driven Sampling by Estimating Block Transition Probabilities
Submitted to the Annals of Statistics
By Yilin Zhang*, Karl Rohe*, and Sebastien Roch†
University of Wisconsin-Madison, Department of Statistics and Department of Mathematics
Respondent-driven sampling (RDS) is a popular approach to study marginalized or hard-to-reach populations. It collects samples from a networked population by incentivizing participants to refer their friends into the study. One major challenge in analyzing RDS samples is seed bias. Seed bias refers to the fact that when the social network is divided into multiple communities (or blocks), the RDS sample might not provide a balanced representation of the different communities in the population, and such imbalance is correlated with the initial participant (or the seed). In this case, the distributions of estimators are typically non-trivial mixtures, which are determined (1) by the seed and (2) by how the referrals transition from one block to another. This paper shows that (1) block-transition probabilities are easy to estimate with high accuracy, and (2) we can use these estimated block-transition probabilities to estimate the stationary distribution over blocks and thus an estimate of the block proportions. This stationary distribution on blocks has previously been used in the RDS literature to evaluate whether the sampling process appears to have "mixed". We use these estimated block proportions in a simple post-stratified (PS) estimator that greatly diminishes seed bias. By aggregating over the blocks/strata in this way, we prove that the PS estimator is √n-consistent under a Markov model, even when other estimators are not. Simulations show that the PS estimator has smaller root mean square error (RMSE) compared to state-of-the-art estimators.
1. Introduction.
Respondent-driven sampling (RDS) is one of the most popular network-based approaches to sample marginalized and hard-to-reach populations, such as drug users, sex workers, and the homeless [1]. RDS has been widely used, for instance, to quantify HIV prevalence in at-risk populations [2, 3]. According to a recent literature review [4], RDS has been used in over 460 studies from 69 countries.

RDS collects samples through peer referral on a social network. It starts from some initial participant as the seed, which forms wave zero.

* These authors gratefully acknowledge support from NSF grant DMS-1612456 and ARO grant W911NF-15-1-0423.
† This author gratefully acknowledges support from NSF grants DMS-1149312 (CAREER), DMS-1614242 and CCF-1740707 (TRIPODS), and a Simons Fellowship.
Keywords and phrases: respondent-driven sampling, post-stratification, social network, Stochastic Blockmodel, Markov process

FIG 1. This figure from [5] illustrates the RDS sampling process.

In the process, we incentivize each participant to pass some (usually three to five) referral coupons to their friends. Those who return to the study site with a referral coupon form the next wave of samples. We repeat this process until we get enough samples or the participants stop referring. Figure 1 from [5] gives an illustration of the RDS sampling process. There are three components in RDS sampling: (1) the social network, (2) the sampling tree, and (3) the variable of interest (denoted by color in Figure 1). The underlying social network is the target population to study, which is unobserved. For each sampled node, we observe their HIV status (black or grey in Figure 1) and which node referred them into the sample. We aim to estimate the proportion of people with a certain trait, such as being HIV positive (the grey nodes in Figure 1), in the population.

The link-tracing sampling procedure of RDS enables us to reach hard-to-reach populations. However, RDS samples are dependent. This dependence is particularly bad when there are multiple communities in the target population and people form most of their friendships within their own communities (i.e., blocks). For example, people from the east side of town might only know a few people from the west side of town, and thus they are much more likely to refer people from the east side of town. This is referred to as a "bottleneck" and it leads to a sample that is unbalanced between the different communities. If the HIV prevalence is higher on one side of the town, then this bottleneck creates dependence between observations in an RDS sample. If the initial participant is from the east side, then the sample may underrepresent people from the west side.
This creates "seed bias." In statistical models which presume that the seed node is randomized, this "seed bias" appears as additional variance in the final estimator. When some participants refer too many contacts, the variance of the traditional RDS estimator, the Volz-Heckathorn (VH) estimator [1], decays at a rate slower than O(n^{-1}) [5]. We provide an example in Appendix B.3. To address this issue, recent work [6] has derived an idealized generalized least squares (GLS) estimator for which the
[Figure 2 reproduces page 182 of Heckathorn (1997). Its recoverable content follows.]

Table 1c: Recruitment by Drug Preference
                                    Drug Preference of Recruit
  Drug Preference of Recruiter      Heroin      Other      Total
  Heroin                            87.7%       12.3%      100% (81)
  Other                             67.6%       32.4%      100% (34)
  Total Distribution of Recruits    80% (84)    20% (21)   100% (105)
  Equilibrium                       84.6%       15.4%      100%
  Mean Discrepancy, Distribution of Recruits and Equilibrium = 2.86%

Table 1d: Recruitment by Location
                                    Location of Recruit
  Location of Recruiter             In Town (Town 1)   In Area     Out of Area   Total
  In Town (Town 1)                  84.3%              8.4%        7.2%          100% (83)
  In Area                           60%                10%         30%           100% (10)
  Out of Area                       50%                8.8%        41.2%         100% (34)
  Total Distribution of Recruits    73.2% (93)         8.7% (11)   18.1% (23)    100% (127)
  Equilibrium                       77.5%              8.6%        13.9%         100%
  Mean Discrepancy, Distribution of Recruits and Equilibrium = 2.86% (r = .997)
FIG 2. Heckathorn (1997) first proposed RDS and illustrated the RDS technique with a sample of drug users. This is Table 1c from that paper. It summarizes the sample by computing the empirical transition matrix between two strata of drug users: those who prefer heroin and those who prefer some other drug. Other empirical transition matrices in that paper stratify based upon ethnicity, gender, and location of recruitment.

standard error decays at rate O(n^{-1/2}) with growing sample size n under a fixed social network. The practical implementation of the estimator, called the feasible GLS (fGLS) estimator, requires solving an n × n system of equations and comes with no theoretical guarantees.

This paper provides an estimator that is easy to compute and has root mean squared error that decays at rate Θ(n^{-1/2}) up to log factors, by implicitly adjusting for bottlenecks between different communities. While this estimator is new, its essential components are well known and reported in the RDS literature. This new estimator assumes that we have collected the "bottlenecked" community memberships of the sampled individuals. With this data, a key summary is the empirical transition matrix between communities, in which element (u, v) is the proportion of referrals from participants in community u to participants in community v. In the RDS literature, this matrix is a common way to summarize the sampling procedure and understand the underlying social network. For example, the original RDS paper [1] reports on a sample of drug users. Table 1c from that paper (reprinted as Figure 2 herein) gives the empirical transition matrix between communities defined by drug preference.
This empirical transition matrix is also a key piece of the feasible GLS estimator [6].

Interestingly, an estimate of the proportion of nodes in each community can be derived from the empirical transition matrix. Notice in Figure 2 that [1] reports the equilibrium distribution on the different strata/communities. This takes the empirical transition matrix as a Markov transition matrix on the different communities and computes the stationary (i.e., equilibrium) distribution of this Markov process (i.e., the leading left eigenvector of the transition matrix). In Figure 2, the equilibrium distribution is close to the total distribution of recruits. When there is a bottleneck, this paper shows that the equilibrium distribution is a better estimator than the total distribution of recruits. The basic reason is that, even when there is a bottleneck, each row of the empirical transition matrix is composed of O(n) nearly independent multinomial samples. There is one caveat; our estimator does not use the actual equilibrium distribution of the empirical transition matrix (i.e., the quantity reported in Figure 2). Instead, we use a simple approximation of the equilibrium which is easier to compute and thus simplifies the proof.

The final estimator is a post-stratified estimator where the strata are the community memberships and the estimated proportion of nodes in each stratum is derived from the estimated equilibrium distribution. We call this the PS estimator. The PS estimator has three major advantages: (1) computational efficiency, (2) smaller variation (squared bias, variance, and RMSE), and (3) block-wise byproducts. We show in Theorem 4.1 that both the bias and the standard deviation of our PS estimator decay at rate Θ(n^{-1/2}) up to log factors, which does not hold for the popular Volz-Heckathorn (VH) estimator [1] and has not been shown for the GLS estimator [6].
The simulation studies also show that our PS estimator has smaller variation (squared bias, variance, and RMSE) compared to the VH estimator and the fGLS estimator. The improvement is especially significant when there exist bottlenecks in social networks.

The paper is organized as follows. Section 2 defines the Markov model, the quantity to estimate, and the traditional RDS estimators. Section 3 introduces the PS estimator. Section 4 shows that the PS estimator is √n-consistent under the Degree-Corrected Stochastic Blockmodel (DC-SBM). In Section 5, we show by simulations that the PS estimator has smaller variation than the state-of-the-art estimators, especially when there exist bottlenecks in social networks. We summarize with a discussion in Section 6.
2. Preliminaries.
We model referrals using a Markov process similar to the ones previously considered in the RDS literature [7, 1, 8, 9, 5, 6].

2.1. Markov process on a social network.
A social network G consists of a node set V = {1, . . . , N} of individuals and an undirected edge set E = {{i, j} : i and j can refer one another}. We use i ∈ V and i ∈ G interchangeably. We assume that G is connected. Let w_ij = w_ji > 0 be the weight of edge {i, j} ∈ E, which models recruitment preference (more details in Section 4). For any {i, j} ∉ E, we let w_ij = w_ji = 0 by convention. If the graph is unweighted, then w_ij = 1 for all {i, j} ∈ E. For each node i ∈ V, we denote its neighborhood in the network G by N(i) = {j ∈ V : {i, j} ∈ E}. We denote the degree of node i as d_i = Σ_j w_ij and the mean degree of graph G as d̄ = Σ_i d_i / N.

We model the collection of samples in RDS with a Markov process on the social network G indexed by a tree. It starts with an initial participant as seed, which we index as vertex 0, and develops into a rooted tree T (a connected graph with n nodes, no cycles, and a root vertex 0). We use τ ∈ T to denote that node τ belongs to the samples indexed by T. For each node τ ∈ T, we denote the parent of τ as τ′ (the node that refers τ into the sample). Formally, an RDS sample is an indexed collection of random nodes (X_τ ∈ G : τ ∈ T), where each referral X_τ′ → X_τ has probability

    P(X_τ = j | X_τ′ = i) = P_ij,  for all i, j ∈ G,

where the transition matrix P ∈ R^{N×N} has elements P_ij = w_ij / d_i. Since the graph G is undirected and connected, P is a reversible Markov transition matrix with unique stationary distribution π = (π_i)_{i∈G} ∈ R^N with

    π_i = d_i / (N d̄).

While the referrals are random, we think of T itself as deterministic. Following [10], we refer to this Markov process as a (T, P)-walk on G. Note that G and T are two distinct graphs: the node set in G indexes the population, which is a social network, and the node set in T indexes the samples, which is a sampling tree.
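To make the (T, P)-walk concrete, here is a minimal sketch on a tiny hypothetical weighted graph (the weight matrix below is invented for illustration, not from the paper). It builds P_ij = w_ij/d_i and checks that π_i = d_i/(N d̄) is indeed stationary:

```python
# Hypothetical 4-node symmetric weight matrix w (w[i][j] = 0 means no edge {i, j}).
w = [
    [0, 2, 1, 0],
    [2, 0, 1, 1],
    [1, 1, 0, 3],
    [0, 1, 3, 0],
]
N = len(w)
deg = [sum(row) for row in w]               # d_i = sum_j w_ij
dbar = sum(deg) / N                         # mean degree

# referral chain: move from i to j with probability P_ij = w_ij / d_i
P = [[w[i][j] / deg[i] for j in range(N)] for i in range(N)]
pi = [deg[i] / (N * dbar) for i in range(N)]  # stationary distribution

# check stationarity: sum_i pi_i P_ij = pi_j for every j
for j in range(N):
    assert abs(sum(pi[i] * P[i][j] for i in range(N)) - pi[j]) < 1e-12
print(pi)
```

The check succeeds for any connected symmetric w, since Σ_i (d_i/Σd)(w_ij/d_i) = d_j/Σd by symmetry of w.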
We say that the (T, P)-walk is stationary if the seed is chosen according to the stationary distribution.

2.2. Quantity to estimate and the Volz-Heckathorn estimator.
For each node i ∈ G, we denote the variable of interest (e.g., the indicator of HIV status) as y(i). We wish to estimate the population mean of the variable of interest

    µ_true = (1/N) Σ_{i∈G} y(i).

For each sample X_τ, we observe Y_τ = y(X_τ), for all τ ∈ T. The sample average

    µ̂ = (1/n) Σ_{τ∈T} Y_τ

is generally biased, since nodes with larger degrees are more likely to be sampled in the Markov process. Specifically, under the stationary (T, P)-walk on G, it has expectation

    E[µ̂] = µ = E[Y] = Σ_{i∈G} y(i) π_i.

In general, µ ≠ µ_true. To obtain an unbiased estimator of µ_true, the sample average must be adjusted. Using π_i = d_i/(N d̄), the inverse probability weighted (IPW) estimator,

    µ̂_IPW = (1/n) Σ_{τ∈T} Y_τ / (π_{X_τ} N) = (d̄/n) Σ_{τ∈T} Y_τ / d_{X_τ},

is an unbiased estimator of µ_true [11]. Additionally estimating d̄ with the harmonic mean of the observed node degrees,

    Ĥ = ( (1/n) Σ_{τ∈T} 1/d_{X_τ} )^{-1},

leads to the popular Volz-Heckathorn (VH) estimator [9],

    µ̂_VH = (Ĥ/n) Σ_{τ∈T} Y_τ / d_{X_τ}.

The VH estimator has been extensively used in the study of marginalized populations [2, 3, 4], but it is highly variable. The variance of the VH estimator may in general decay at a rate slower than O(n^{-1}) [5], implying that many more samples are required to reduce the standard error. See Section B.3. We address this issue by introducing a post-stratification approach to RDS in the following section.
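The VH adjustment amounts to inverse-degree weighting. A small sketch with hypothetical observations (the outcomes Y and degrees d below are made up for illustration):

```python
# Hypothetical RDS sample: observed outcomes y(X_tau) and degrees d_{X_tau}.
Y = [1, 0, 1, 1, 0]
d = [4, 2, 8, 4, 2]
n = len(Y)

H = n / sum(1 / di for di in d)             # harmonic mean of sampled degrees
mu_vh = (H / n) * sum(yi / di for yi, di in zip(Y, d))

# equivalently, a ratio of inverse-degree-weighted sums
mu_vh_ratio = sum(yi / di for yi, di in zip(Y, d)) / sum(1 / di for di in d)
assert abs(mu_vh - mu_vh_ratio) < 1e-12
print(mu_vh)
```

Note how the low-degree observations (d = 2) receive the largest weight, compensating for high-degree nodes being oversampled.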
3. A new estimator.
A post-stratification approach to RDS.

3.1. Stratification.
Stratification has been extensively used in traditional random sampling to reduce variance. The key idea of stratified sampling is as follows. Assume that the overall population can be divided into (ideally homogeneous) sub-groups (which we refer to as blocks) based on some variable, such as gender, race, etc. Then the sample mean and sample variance of the total population can be calculated using block-wise sample means and variances. Specifically, suppose there are K blocks in a population with N individuals. For each block k, we denote the block size as N_k, the block-wise population mean as µ_k, the sample size as n_k, and the block-wise sample average as µ̂_k. The sample average µ̂ and sample variance s² for the total population can be derived from the block-wise quantities by

(3.1)    µ̂ = Σ_{k=1}^K (N_k/N) µ̂_k,   and   s² = Σ_{k=1}^K (N_k/N)² · ((N_k − n_k)/N_k) · (s_k²/n_k).

Stratified sampling by proportionate allocation randomly selects individuals proportionally to the sizes of the different blocks, with the goal of improving accuracy by reducing sampling error. Post-stratified sampling, on the other hand, performs stratification after sampling and calculates µ̂ and s² as above. Post-stratification is useful when the samples constitute an unbalanced representation of the full population. Block proportions are unobserved in marginalized populations.
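Equation (3.1) can be sketched directly; all block summaries below (sizes, means, variances, sample sizes) are hypothetical numbers chosen for illustration:

```python
# Hypothetical block-level summaries for K = 2 blocks.
N_k = [600, 400]                 # block sizes; N = 1000
mu_k = [0.30, 0.70]              # block-wise sample means mu_k-hat
s2_k = [0.21, 0.21]              # block-wise sample variances s_k^2
n_k = [60, 40]                   # block-wise sample sizes
N = sum(N_k)

# stratified mean: weight each block mean by its population share N_k / N
mu_hat = sum((Nk / N) * muk for Nk, muk in zip(N_k, mu_k))

# stratified variance with finite-population correction (N_k - n_k) / N_k
s2 = sum((Nk / N) ** 2 * (Nk - nk) / Nk * s2k / nk
         for Nk, nk, s2k in zip(N_k, n_k, s2_k))
print(mu_hat, s2)
```

Here the sample is balanced across blocks in proportion to their sizes, so post-stratification simply re-weights by the known (or, below, estimated) N_k/N.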
We seek to apply this last approach to RDS in order to deal with seed bias. An important issue arises, however. Per (3.1), traditional post-stratification requires knowledge of the block proportions N_k/N. These are typically unknown in marginalized populations. Hence, we need to estimate the block proportions from the samples. In the next section, we describe how we do this and we formally define a novel post-stratified estimator for RDS.

3.2. Block-wise quantities.
For a set V′, denote its cardinality by |V′|. Suppose there are K blocks in the social network G. For each node i ∈ G, denote its block membership as z(i), i.e., z(i) = k if i belongs to block k ∈ {1, . . . , K}. To simplify notation, we write i ∈ V_k to mean z(i) = k. For each block k, we denote the block size as N_k = |V_k| and the block-wise mean as µ_k = N_k^{-1} Σ_{i∈V_k} y(i). For each sample τ ∈ T, we let its block membership be Z_τ = z(X_τ) and we write τ ∈ T_k to mean Z_τ = k. We define for each block k the sample size as n_k, the block-wise harmonic average degree as

(3.2)    Ĥ_k = ( (1/n_k) Σ_{τ∈T_k} 1/d_{X_τ} )^{-1},

and the block-wise sample average weighted by degree, i.e., the VH estimator for µ_k, as

(3.3)    µ̂_k^VH = (Ĥ_k/n_k) Σ_{τ∈T_k} Y_τ / d_{X_τ}.
Suppose that we observe the block membership of each sample, i.e., we observe Z_τ = z(X_τ) for all τ ∈ T. We define the matrix Q̂ ∈ R^{K×K} such that, for any two blocks u, v ∈ {1, . . . , K},

    Q̂_uv = (1/n) × (number of referrals from block u to block v),

and the row-normalized matrix P̂_B ∈ R^{K×K} whose (u, v)-entry is

(3.4)    p̂_uv = Q̂_uv / Q̂_u∗.

Here, for a matrix A, we let A_u∗ = Σ_v A_uv, and 1{E} denotes the indicator of event E. Finally, we define the vector π̂_B = (π̂_Bk)_k with entries

(3.5)    π̂_Bk = [ Σ_v p̂_kv / p̂_vk ]^{-1}.
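A minimal sketch of (3.4) and (3.5) computed from a list of observed referrals; the referral pairs and block labels below are hypothetical:

```python
# Hypothetical referrals as (block of recruiter, block of recruit), K = 2 blocks.
referrals = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 0)]
K = 2
n = len(referrals)

# Q-hat_uv = (1/n) * number of referrals from block u to block v
Q = [[0.0] * K for _ in range(K)]
for u, v in referrals:
    Q[u][v] += 1 / n

row = [sum(Q[u]) for u in range(K)]                              # Q-hat_{u*}
P_B = [[Q[u][v] / row[u] for v in range(K)] for u in range(K)]   # (3.4)

# pi-hat_B_k = [ sum_v p-hat_kv / p-hat_vk ]^{-1}, per (3.5)
# (assumes every p-hat_vk > 0, i.e. each ordered block pair was observed)
pi_B = [1 / sum(P_B[k][v] / P_B[v][k] for v in range(K)) for k in range(K)]
print(pi_B)
```

With these counts Q̂ happens to be symmetric, so π̂_B coincides with the row sums Q̂_{k∗}, mirroring the population-level identity derived in Section 3.4.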
3.3. The post-stratified estimator.

We define our new estimator next.

DEFINITION. Given a social network G with K blocks, the post-stratified (PS) estimator is

(3.6)    µ̂_PS = Σ_k α̂_k µ̂_k^VH,   with   α̂_k = (π̂_Bk / Ĥ_k) / ( Σ_ℓ π̂_Bℓ / Ĥ_ℓ ),

where Ĥ_k, µ̂_k^VH, and π̂_Bk are defined in (3.2), (3.3), and (3.5), respectively.

Comparing (3.6) with (3.1), the estimator µ̂_PS can indeed be seen as a post-stratified estimator. In Section 3.4, we argue that α̂_k is an estimator of the block proportion of block k. Note that we also use the VH estimator µ̂_k^VH on each block k, instead of the block-wise sample average, to adjust for the bias induced by node degrees.

3.4. Motivation for the PS estimator.
To motivate our new estimator, we analyze its behavior under a standard model of random social networks with community structure, the degree-corrected stochastic blockmodel (DC-SBM) [12].
DEFINITION (DC-SBM). Let B ∈ R_+^{K×K} be a positive, symmetric matrix and let θ ∈ R_+^N be a positive vector. Under the DC-SBM, a social network G = (V, E) with V = {1, . . . , N} is drawn randomly as follows. Assume that we have a partition V_1, . . . , V_K of V into K blocks labeled {1, . . . , K}. Let N_1, . . . , N_K be the respective sizes of the blocks. For a node i ∈ V, let Z_i be its block. Each possible edge {i, j} is present independently from all other edges with probability

(3.7)    P[{i, j} ∈ E] = θ_i θ_j B_{Z_i, Z_j}.

By convention, we assume Σ_{i∈V_k} θ_i = 1 for all blocks k.

REMARK. We allow self-loops {i, i} in the DC-SBM, each of which will contribute 1 to degree counts (instead of the standard convention of 2). Note that, in a dense graph, such self-loops will play a negligible role.

To justify our PS estimator under the DC-SBM, we make three observations:

1. Define the matrices Q = B/m, where m = 1^T B 1, and P_B = (p_uv)_{u,v}, where

(3.8)    p_uv = B_uv / B_u∗ = Q_uv / Q_u∗,

for any two blocks u, v. Since P_B is the positive, row-normalized version of the symmetric matrix Q, it has a unique stationary distribution π_B = (π_Bk)_k, where

    π_Bk = Q_k∗ = [ Σ_v Q_v∗ / Q_k∗ ]^{-1} = [ Σ_v (Q_kv/Q_k∗) / (Q_vk/Q_v∗) ]^{-1} = [ Σ_v p_kv / p_vk ]^{-1}.

Indeed,

    Σ_k Q_k∗ p_kv = Σ_k Q_k∗ (Q_kv / Q_k∗) = Σ_k Q_kv = Q_v∗.
2. The expected degree of node i in block k is

    E[d_i] = Σ_j θ_i θ_j B_{Z_i, Z_j} = θ_i Σ_ℓ Σ_{j∈V_ℓ} θ_j B_{kℓ} = θ_i B_k∗.

Hence the block-wise mean expected degree over block k is

    δ_Bk = (1/N_k) Σ_{i∈V_k} θ_i B_k∗ = B_k∗ / N_k.
3. Combining the two observations above, we get

    π_Bk / δ_Bk = N_k / Σ_k B_k∗.

Because the denominator is constant in k, we finally have

    α_k = (π_Bk / δ_Bk) / ( Σ_ℓ π_Bℓ / δ_Bℓ ) = N_k / N.
Therefore, by (3.1), the population mean µ_true can be re-written as

    µ_true = Σ_k α_k µ_k.

From this it follows that, to estimate µ_true, it suffices to estimate the block-wise mean µ_k, the block-wise mean expected degree δ_Bk, and the stationary distribution π_Bk of P_B, for each block k. We estimate them with µ̂_k^VH, Ĥ_k, and π̂_Bk, respectively, leading to the PS estimator in (3.6). In the proof of Theorem 4.1 below, we analyze the accuracy of these estimators (see Claims B.6, B.9 and B.7).
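The three observations, and the way their empirical counterparts plug into (3.6), can be checked numerically. A minimal sketch, where the block matrix B, block sizes, and block-level sample summaries are all hypothetical values chosen for illustration:

```python
# Observation check for a hypothetical positive symmetric block matrix B.
B = [[6.0, 1.0], [1.0, 4.0]]
N_k = [30, 20]                        # block sizes; theta_i = 1/N_k within block k
K, N = len(B), sum(N_k)

m = sum(sum(row) for row in B)        # m = 1^T B 1
Q = [[B[u][v] / m for v in range(K)] for u in range(K)]
pi_B = [sum(Q[k]) for k in range(K)]  # observation 1: pi_B_k = Q_{k*}
P_B = [[Q[u][v] / pi_B[u] for v in range(K)] for u in range(K)]
for v in range(K):                    # pi_B is stationary for P_B
    assert abs(sum(pi_B[u] * P_B[u][v] for u in range(K)) - pi_B[v]) < 1e-12

delta_B = [sum(B[k]) / N_k[k] for k in range(K)]   # observation 2: B_{k*} / N_k
ratio = [pi_B[k] / delta_B[k] for k in range(K)]
alpha = [r / sum(ratio) for r in ratio]            # observation 3: alpha_k = N_k / N
assert all(abs(alpha[k] - N_k[k] / N) < 1e-12 for k in range(K))

# PS estimator (3.6): replace (pi_B, delta_B) by estimates; here we plug in
# hypothetical block summaries H_k-hat and mu_k-VH-hat for illustration.
H_hat = [10.0, 5.0]
mu_vh_hat = [0.2, 0.8]
r_hat = [pi_B[k] / H_hat[k] for k in range(K)]
alpha_hat = [r / sum(r_hat) for r in r_hat]
mu_ps = sum(a * mu for a, mu in zip(alpha_hat, mu_vh_hat))
print(alpha, mu_ps)
```

The population-level check recovers α = (0.6, 0.4) = (N_1/N, N_2/N) exactly; the last few lines show how the same formula is assembled from plug-in estimates.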
4. Main theoretical result.
In this section, we show that the PS estimator defined in (3.6) has error O(√(log n / n)) with high probability when the social network is distributed according to a dense DC-SBM.

THEOREM 4.1. Suppose the social network G = (V, E) of size N is distributed according to the DC-SBM with K blocks of respective sizes N_1, . . . , N_K and parameters B ∈ R_+^{K×K} and θ ∈ R_+^N. Suppose T is a sampling tree of size n ≤ N. Let y ∈ R_+^N be the variable of interest. Assume that there are universal constants 0 < c_− < c_+ < +∞ and 0 < c_y, c_d < +∞, independent of N and n, such that the following assumptions hold:

(a) [Linear-sized blocks] c_− N ≤ N_k ≤ c_+ N for all k;
(b) [Dense graph] c_− N ≤ B_uv ≤ c_+ N for all blocks u, v;
(c) [Degree homogeneity] c_− N^{-1} ≤ θ_i ≤ c_+ N^{-1} for all nodes i ∈ G;
(d) [Bounded variables] 0 ≤ y(i) ≤ c_y for all nodes i ∈ G;
(e) [Limited referrals] The maximum degree of T is less than or equal to c_d.

Then, for any ε, ε′ > 0, there exists a constant c > 0 (not depending on n, N) such that, with probability 1 − ε over the choice of G, the following holds. For any (T, P)-walk on G, the PS estimator defined in (3.6) satisfies

    |µ̂_PS − µ_true| ≤ c √(log n / n),

with probability 1 − ε′.

A direct consequence of Theorem 4.1 is that the bias and standard deviation decay at rate O(n^{-1/2}) up to log factors. This does not hold for the traditional VH estimator [1], since its standard deviation decays at a rate slower than O(n^{-1/2}) [5], which we also show by example in Appendix B.3. For the recent GLS-based estimators proposed in [6], it is shown that their standard deviation decays at rate O(n^{-1/2}) as n goes to infinity for a fixed network size, but no finite-size guarantees are provided.

Assumptions (b) and (c) require the graph to be dense. In the following section, we show through simulations that the PS estimator also works well on sparse graphs.
5. Simulations.
This section compares the PS estimator to the VH and fGLS estimators on simulated networks (in Section 5.1) as well as on social networks collected by the National Longitudinal Study of Adolescent Health (Add Health networks) (in Section 5.2), both with simulated RDS samples. In both cases, the PS estimator has smaller variation than the VH and fGLS estimators.

5.1. Simulated Networks.
We simulated 100 random social networks from the DC-SBM with K = 2 blocks of equal size. The stochastic matrix B was chosen proportional to

    ( 0.95  0.05 )
    ( 0.05  0.95 ).

We simulated the binary outcomes to be perfectly aligned with one of the block labels. On each social network, we generated RDS samples by link tracing without replacement. We randomly sampled the seed proportionally to the node degree. Then, for each participant τ in the sample, we recruited R_τ ∈ N friends, where R_τ ~iid Poi(2). The recruiting process stopped when there were 1000 participants in the RDS sample. If it terminated before recruiting 1000 participants, then we re-started the recruiting process. We generated 200 different RDS samples on each network. For each RDS sample, we computed the VH, fGLS, and PS estimators. On each network, we computed the absolute bias, standard deviation, and RMSE of the 200 estimators of each type. In the simulations, we computed the fGLS estimator as in [6], which re-weights the outcome Y to adjust for the sampling bias. Figure 3 shows that the PS estimator has smaller variation than the VH and fGLS estimators in terms of absolute bias, standard deviation, and RMSE.
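The sampling scheme just described can be sketched as follows. The tiny two-block network is hypothetical, and the restart rule for early termination is omitted for brevity; the stdlib has no Poisson sampler, so one is included (Knuth's method):

```python
import math
import random

random.seed(0)

# Adjacency list of a tiny hypothetical network (block 0: nodes 0-3, block 1: 4-7),
# with a single bridge edge {2, 4} acting as a bottleneck.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3, 4], 3: [0, 2],
       4: [2, 5, 6], 5: [4, 6, 7], 6: [4, 5, 7], 7: [5, 6]}

def poisson(lam):
    # Knuth's method, since the stdlib has no Poisson sampler
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def rds_sample(adj, target_n):
    nodes = list(adj)
    # seed drawn proportionally to node degree
    seed = random.choices(nodes, weights=[len(adj[v]) for v in nodes])[0]
    sampled, wave, referrals = [seed], [seed], []
    while wave and len(sampled) < target_n:
        nxt = []
        for v in wave:
            # recruit R_tau ~ Poi(2) not-yet-recruited friends (without replacement)
            free = [u for u in adj[v] if u not in sampled and u not in nxt]
            random.shuffle(free)
            for u in free[:poisson(2)]:
                nxt.append(u)
                referrals.append((v, u))
        sampled += nxt
        wave = nxt
    return sampled[:target_n], referrals

sampled, referrals = rds_sample(adj, 6)
assert len(set(sampled)) == len(sampled)   # sampling without replacement
print(sampled, referrals)
```

Counting the `referrals` pairs by block label then yields the empirical Q̂ and P̂_B of Section 3.2.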
FIG 3. Comparisons on the Simulated Networks. The figure presents the variations of the VH, fGLS, and PS estimators on each of the 100 simulated networks. Each panel corresponds to a different variation measure: absolute bias, standard deviation, and RMSE. Each data point represents a variation value for one type of estimator on one network. In each panel, for each type of estimator, a box plot shows the distribution of the variation values over the 100 simulated networks.
Note that several factors may affect the performance of the estimators, such as (1) bottlenecks in the social network, (2) the alignment of the block labels z(i) with the variable of interest y(i), and (3) the network density. We explored these factors and how they affect the performance of the estimators. Further explorations of other factors, including network sizes and sample sizes, are in Section A of the appendix. The simulations in Figures 4, 5 and 6 below have the same setting as in Figure 3, except that the values of the corresponding factor are made to vary.

Bottleneck.
Bottlenecks exist when there are much fewer connections across different blocks than within blocks. Recall that, in the DC-SBM, the stochastic block matrix B determines the expected number of links between any two blocks. We simulated the stochastic block matrix such that

    B ∝ ( p  q )
        ( q  p ),

with p + q = 1 for identification. We refer to the difference p − q as the bottleneck strength. With a larger bottleneck strength, there are more connections within blocks and fewer connections across blocks. When there is no bottleneck (strength is zero), there is effectively only one block in the network. Figure 4 shows that the PS estimator has smaller variation than the fGLS and VH estimators, especially when there exists a bottleneck. In particular, the PS estimator appears to reduce the seed bias and standard deviation caused by bottlenecks much better than the fGLS and VH estimators.

FIG 4. Comparisons on the simulated networks with different bottleneck strengths (0.01 to 1, log scale).
Alignment.
We capture the alignment of the block labels and the variable of interest by the difference of the block-wise means of the variable of interest, i.e., |µ_1 − µ_2| with K = 2 blocks. Figure 5 shows that the fGLS and PS estimators exhibit the largest improvement over the VH estimator when the block label perfectly aligns with the variable of interest (i.e., the alignment is 1). The three estimators perform equally well when the block-wise means of the variable of interest are equal (i.e., the alignment is 0). When the block label partially aligns with the variable of interest (i.e., the alignment is strictly between 0 and 1), the fGLS and VH estimators exhibit similar variation, but the PS estimator has smaller variation when the block-wise difference is over 0.4.
FIG 5. Comparisons on the simulated networks with different alignments of the block labels (difference of the block-wise means, 0.0 to 1.0).
Network density.
We use the expected average degree of the network to quantify the network density. Though Theorem 4.1 requires the networks to be dense enough, Figure 6 shows that the estimators perform similarly on sparse networks.
FIG 6. Comparisons on the simulated networks with different densities (average degrees 25, 50, 100, 200, 400).
5.2. Add Health Networks.
In this section, we consider RDS simulations obtained by tracing contacts in social networks collected by the National Longitudinal Study of Adolescent Health (Add Health networks). This study collected a nationally representative sample of adolescents in grades 7 to 12 in the United States in the 1994-1995 school year. The sample covers 84 pairs of middle and high schools, in which students nominated up to five male and five female friends in their middle or high school network [13]. In this analysis, we symmetrized all contacts to create a social network, and we restricted each network to its largest connected component. These networks were previously studied in [14], [15], and [6].

We restricted our analysis to the 25 Add Health networks with over 1000 nodes. On each network, we simulated 200 different RDS samples, each with 500 participants. On each RDS sample, we computed the VH, fGLS, and PS estimators. In the simulation, we randomly sampled seed nodes proportionally to node degrees. We computed the absolute bias, RMSE, and standard deviation of the estimators on each network. In the analysis, we used the school label (middle school or high school) as the outcome and the grade label (7-12) as the block labels. The recruitment process was similar to that in Section 5.1, but without replacement: each person could be recruited no more than once. For each participant τ, if they had fewer unrecruited friends than R_τ, then we recruited all of their unrecruited friends.

Figure 7 shows the variation of the estimators. Overall, the PS estimator has substantially smaller variation than the fGLS and VH estimators.
FIG 7. Comparisons on the Add Health Networks. The figure presents the variations of the VH/fGLS and PS estimators on each of the 25 Add Health networks. Each panel corresponds to a different variation measure: absolute bias, RMSE, and standard deviation. In each panel, the horizontal axis corresponds to the variation value and the vertical axis corresponds to the different networks, ordered by the variation value of the baseline (VH/fGLS) estimator. The baseline estimator is whichever of the VH and fGLS estimators has the smaller variation value, i.e., the better of the two. Each line connects the variation value of the baseline estimator to the variation value of the PS estimator. If the line is red, then the PS estimator has the smaller variation.
6. Discussion.
RDS has been widely used in studying marginalized populations, but the estimators derived from RDS samples suffer from high variance. This is due to two related issues: (1) the complicated network dependence of the RDS samples, and (2) seed bias caused by bottlenecks. In this paper, we introduced post-stratification to RDS and provided a novel estimator. Our easy-to-compute PS estimator reduces seed bias. We derived theoretical results for the PS estimator, showing that its bias and standard deviation decay at $O(n^{-1/2})$ (up to log factors) under the degree-corrected stochastic block model. This is the first estimator with such guarantees. Though we require the networks to be dense in theory, we showed through simulations that the estimator performs similarly on sparse networks.

One future direction is how to select the block labels in practice. In [6], an approach for selecting block labels using the eigenvalues of the block-wise transition matrix $\hat Q$ is proposed. Further discussion of this issue would be helpful for applying the PS (and fGLS) estimators.

References.

[1] Douglas D. Heckathorn. Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44(2):174-199, 1997.
[2] Mohsen Malekinejad, Lisa Grazina Johnston, Carl Kendall, Ligia Regina Franco Sansigolo Kerr, Marina Raven Rifkin, and George W. Rutherford. Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: A systematic review. AIDS and Behavior, 12(1):105-130, 2008.
[3] L. G. Johnston. Introduction to HIV/AIDS and sexually transmitted infection surveillance: Module 4: Introduction to respondent driven sampling. World Health Organization, 2013.
[4] Richard G. White, Avi J. Hakim, Matthew J. Salganik, Michael W. Spiller, Lisa G. Johnston, Ligia Kerr, Carl Kendall, Amy Drake, David Wilson, Kate Orroth, et al. Strengthening the reporting of observational studies in epidemiology for respondent-driven sampling studies: STROBE-RDS statement. Journal of Clinical Epidemiology, 68(12):1463-1471, 2015.
[5] Karl Rohe. Network driven sampling; a critical threshold for design effects. arXiv preprint arXiv:1505.05461, 2015.
[6] Sebastien Roch and Karl Rohe. Generalized least squares can overcome the critical threshold in respondent-driven sampling. arXiv preprint arXiv:1708.04999, 2017.
[7] Sharad Goel and Matthew J. Salganik. Respondent-driven sampling as Markov chain Monte Carlo. Statistics in Medicine, 28(17):2202-2229, 2009.
[8] Matthew J. Salganik and Douglas D. Heckathorn. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology, 34(1):193-240, 2004.
[9] Erik Volz and Douglas D. Heckathorn. Probability based estimation theory for respondent driven sampling. Journal of Official Statistics, 24(1):79, 2008.
[10] Itai Benjamini and Yuval Peres. Markov chains indexed by trees. The Annals of Probability, pages 219-243, 1994.
[11] Daniel G. Horvitz and Donovan J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663-685, 1952.
[12] Brian Karrer and Mark E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.
[13] Kathleen Mullan Harris. The National Longitudinal Study of Adolescent Health: Research design. 2011.
[14] Sharad Goel and Matthew J. Salganik. Assessing respondent-driven sampling. Proceedings of the National Academy of Sciences, 107(15):6743-6747, 2010.
[15] Aaron J. Baraff, Tyler H. McCormick, and Adrian E. Raftery. Estimating uncertainty in respondent-driven sampling using a tree bootstrap method. Proceedings of the National Academy of Sciences, page 201617258, 2016.
[16] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
[17] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, Cambridge, 1995. ISBN 0-521-47465-5.
APPENDIX A: MORE SIMULATIONS

In this section, we explore how network sizes and sample sizes affect the performance of RDS estimators. The simulation settings are the same as in Section 5.1. Figure 8 shows that the estimators perform similarly across different network sizes. Figure 9 shows that the RDS estimators have smaller variation with larger sample sizes.
FIG 8. Comparisons on the simulated networks with different network sizes. Each panel shows one variation measure (absolute bias, standard deviation, RMSE) of the VH, fGLS, and PS estimators as the network size ranges from 20000 to 80000. In the simulations, we control the densities of the networks to be similar: for each network of size $N$, we set the expected average degree proportional to $\sqrt{N}$ (namely $\lfloor \sqrt{N}/c \rfloor$ for a fixed constant, where $\lfloor x \rfloor$ denotes the integer part of $x \in \mathbb{R}$).
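For illustration, density-controlled networks of this kind can be generated as in the following sketch (ours, not the paper's code; it uses a plain SBM rather than the DC-SBM, and `target_degree` and `ratio` are hypothetical parameters):

```python
import numpy as np

def sample_sbm(N, K, target_degree, ratio, rng):
    """Sample an SBM adjacency matrix with K equal-sized blocks whose
    expected average degree is close to target_degree.

    ratio : between-block probability divided by within-block probability.
    """
    z = np.repeat(np.arange(K), N // K)          # block labels
    # Solve for the within-block probability p_in so the expected degree
    # matches: each node has ~N/K within-block and N(K-1)/K between-block
    # potential neighbors.
    p_in = target_degree / (N / K + ratio * N * (K - 1) / K)
    P = np.where(z[:, None] == z[None, :], p_in, ratio * p_in)
    A = (rng.random((N, N)) < P).astype(int)
    A = np.triu(A, 1)                            # keep one draw per pair
    return A + A.T, z

rng = np.random.default_rng(1)
N = 2000
# Expected average degree proportional to sqrt(N), as in the caption above.
A, z = sample_sbm(N, K=2, target_degree=int(np.sqrt(N)), ratio=0.3, rng=rng)
```

Scaling the target degree like $\sqrt{N}$ keeps the networks comparably dense as $N$ grows, which is the point of the density control described in the caption.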
FIG 9. Comparisons on the simulated networks with different sample sizes. Each panel shows one variation measure (absolute bias, standard deviation, RMSE) of the VH, fGLS, and PS estimators as the sample size ranges from 500 to 1000.
APPENDIX B: PROOF OF THE MAIN THEOREM
B.1. Notation.
For each node $i \in V$, we denote its neighborhood in the social network $G$ by $\mathcal{N}(i) = \{j \in V : \{i,j\} \in E\}$ and its neighborhood within block $k$ by $\mathcal{N}(i;k) = \{j \in V_k : \{i,j\} \in E\}$. We denote by $d(i;k) = |\mathcal{N}(i;k)|$ the size of the latter. The degree of node $i$ is denoted $d_i = |\mathcal{N}(i)|$, and we have $d_i = \sum_k d(i;k)$.

While the RDS sampling procedure is a random walk on the social network $G$, under a dense DC-SBM our analysis relies on establishing an approximation of the process by a "population-level" random walk on blocks. We define the block transition probability at node $i \in V$ by
(B.1)
$$p_{uv}(i) = P\left[Z_\tau = v \mid X_{\tau'} = i,\ z(i) = u\right] = \frac{d(i;v)}{\sum_k d(i;k)}\,\mathbf{1}\{z(i) = u\},$$
for any blocks $u, v$ and any sample $\tau \in T$. Recall that, for any sample $\tau \in T$, we denote its parent by $\tau'$.

Under our assumptions, $B_{uv}$ is the expected number of edges between blocks $u \ne v$; indeed,
$$\sum_{i \in V_u,\, j \in V_v} P\left[\{i,j\} \in E\right] = \sum_{i \in V_u,\, j \in V_v} \theta_i \theta_j B_{uv} = B_{uv} \sum_{i \in V_u} \theta_i \sum_{j \in V_v} \theta_j = B_{uv}.$$
Recalling the matrix $Q = B/m$, where $B$ is the matrix in the definition of the DC-SBM model (3.7) and $m = \mathbf{1}^T B \mathbf{1}$, the population block transition probability is given by
(B.2)
$$p_{uv} = \frac{B_{uv}}{B_{u*}} = \frac{Q_{uv}}{Q_{u*}},$$
for any two blocks $u, v$. We refer to
(B.3)
$$P^B = (p_{uv})_{uv}$$
as the population transition matrix on blocks. Recall that its unique stationary distribution is $\pi^B = (\pi^B_k)_k$.

For each block $k \in \{1, \ldots, K\}$, $n_k = |T_k|$. We also define the number of referrals from block $k$ to be
(B.4)
$$n_{k'} = \sum_{\tau \in T} \mathbf{1}\{Z_{\tau'} = k\}.$$
For any two blocks $u, v \in \{1, \ldots, K\}$, we define the number of referrals from block $u$ to block $v$ as
(B.5)
$$n_{u'v} = \sum_{\tau \in T} \mathbf{1}\{Z_{\tau'} = u,\ Z_\tau = v\}.$$
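To make the counting concrete, here is a minimal illustration (ours, not the authors' code) of how the referral counts $n_{u'v}$ and $n_{u'}$ yield the plug-in transition estimates $n_{u'v}/n_{u'}$ and, through the formula used for $\hat\pi^B$, an estimated stationary distribution over blocks. It assumes 0-indexed block labels and that every ordered block pair is observed at least once:

```python
import numpy as np

def estimate_block_chain(referrals, K):
    """Estimate the block transition matrix and its stationary distribution
    from observed referral pairs.

    referrals : list of (u, v) pairs, where u is the recruiter's block and
                v is the recruit's block (0-indexed)
    K         : number of blocks
    """
    counts = np.zeros((K, K))               # counts[u, v] = n_{u'v}
    for u, v in referrals:
        counts[u, v] += 1
    row_sums = counts.sum(axis=1)           # n_{u'} = referrals out of block u
    P_hat = counts / row_sums[:, None]      # plug-in estimate n_{u'v} / n_{u'}
    # Stationary distribution via pi_k = [sum_v p_kv / p_vk]^{-1}, which is
    # valid when the underlying B is symmetric (reversible block chain);
    # we renormalize since the estimates need not sum exactly to one.
    pi_hat = 1.0 / (P_hat / P_hat.T).sum(axis=1)
    return P_hat, pi_hat / pi_hat.sum()
```

For a two-block chain with observed transition frequencies $0.6/0.4$ out of block 0 and $0.8/0.2$ out of block 1, this returns the stationary distribution $(2/3, 1/3)$.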
Note that $\hat Q_{uv} = n_{u'v}/n$, and the elements of the estimated block transition matrix $\hat P^B$ in (3.4) can be rewritten as $\hat P^B_{uv} = n_{u'v}/n_{u'}$. We use $\hat p_{uv}$ to denote these quantities, i.e.,
$$\hat p_{uv} := \frac{n_{u'v}}{n_{u'}}.$$
To summarize, for any blocks $u, v$, the quantities $p_{uv}(i)$, $p_{uv}$, and $\hat p_{uv}$ represent, respectively, the block transition probability at node $i \in G$, the population block transition probability, and the estimated block transition probability.

B.2. Proof.
The proof of Theorem 4.1 follows from a series of claims. We begin with a sketch of the proof in this section.

1. Under the dense DC-SBM, the random walk mixes fast within each block (Claims B.2 and B.5). This plays a key role in estimating block-wise means, for which we use the VH estimator (Claims B.7, B.8, and B.10).

2. To estimate block proportions, we use the stationary distribution of the block-wise transition matrix, which is the main, non-trivial contribution of this work. Indeed, the standard empirical frequency gives an estimate with much larger variance (see Section B.3). Instead, we estimate the transition matrix between blocks, which is a "more local" quantity in the sense that it is not affected strongly by the seed, and compute its stationary distribution. As a result, the block-wise transition probabilities are highly concentrated around their true values under the Markov chain on the blocks; their stationary distributions are also close to each other (Claims B.3 and B.6).
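The two steps above — block-wise VH means reweighted by estimated block proportions — can be sketched as follows (our illustration, not the authors' implementation; `pi_hat` would come from the estimated block transition matrix):

```python
import numpy as np

def ps_estimate(y, d, z, pi_hat):
    """Post-stratified estimator sketch.

    y      : outcomes of the RDS samples
    d      : self-reported degrees of the samples
    z      : block label of each sample (0-indexed)
    pi_hat : estimated stationary distribution over blocks
    """
    y, d, z = map(np.asarray, (y, d, z))
    pi_hat = np.asarray(pi_hat, dtype=float)
    K = len(pi_hat)
    mu_vh = np.empty(K)   # block-wise VH estimates
    H = np.empty(K)       # block-wise harmonic mean degrees
    for k in range(K):
        w = 1.0 / d[z == k]                  # inverse-degree weights
        H[k] = len(w) / w.sum()              # harmonic mean degree in block k
        mu_vh[k] = (y[z == k] * w).sum() / w.sum()
    # pi_hat[k] / H[k] is proportional to the estimated block proportion.
    weights = pi_hat / H
    return (weights * mu_vh).sum() / weights.sum()
```

With equal degrees, the estimate reduces to a weighted average of the block means with weights given by the estimated block proportions.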
Note that there are two sources of randomness: the social network $G$ and the $T$-indexed random walk. Claims B.1-B.5 are concerned with the randomness of $G$, while Claims B.6-B.10 deal with the random walk. The argument combines three comparisons: $\hat\mu^{VH}_k$ is close to $\mu_k$ (Claim B.10), $\hat\pi^B_k$ is close to $\pi^B_k$ (Claim B.6), and $\hat H_k$ is close to $\delta^B_k = N_k^{-1} B_{k*}$ (Claim B.9); combining the above yields that $\hat\mu_{PS}$ is close to $\mu_{true}$. Throughout, $\varepsilon > 0$ is as in the statement of the theorem.

High-probability properties of the social network.
We first use standard concentration inequalities to control the degrees of $G$. Recall that under the DC-SBM the expectation of $d(i;v)$, for $i \in V_u$, is $\theta_i B_{uv}$.

CLAIM B.1 (Degrees are concentrated). Under the DC-SBM, there exists $c > 0$ (depending on $\varepsilon$ but not on $N$) such that, with probability $1 - \varepsilon/2$ over the choice of $G$, the following event holds: simultaneously for all pairs of blocks $u, v$ and all nodes $i \in V_u$,
$$\left|\frac{d(i;v)}{\theta_i B_{uv}} - 1\right| \le c\sqrt{\frac{\log N}{N}}.$$
We let $E_D$ be the event in the claim.

PROOF OF CLAIM B.1. Fix blocks $u, v$ and $i \in V_u$. Under the DC-SBM, each node $j$ in block $v$ connects with node $i$ independently with probability $\theta_i \theta_j B_{uv}$. Hence we can write $d(i;v)$ as a sum of $N_v$ independent indicators, whose overall expectation is $\theta_i B_{uv}$, where we used that $\sum_{j \in V_v} \theta_j = 1$. By Hoeffding's inequality [16], for any constant $c' > 0$, by choosing $c > 0$ large enough,
(B.6)
$$P\left[\left|d(i;v) - \theta_i B_{uv}\right| > c\sqrt{N\log N}\right] \le 2\exp\left(-\frac{2\left[c\sqrt{N\log N}\right]^2}{N_v}\right) \le N^{-c'},$$
where we used that $N_v = \Theta(N)$ in the second inequality. Taking a union bound over $u$, $v$ and $i$ gives
$$\left|d(i;v) - \theta_i B_{uv}\right| \le c\sqrt{N\log N},$$
simultaneously for all $u, v$ and all $i \in V_u$, with probability at least $1 - K^2 N \cdot N^{-c'}$. Dividing by $\theta_i B_{uv}$ and using $\theta_i = \Theta(N^{-1})$ for any node $i$ and $B_{uw} = \Theta(N^2)$ for any blocks $u, w$ gives the result for appropriately chosen $c, c' > 0$.

The following claim will be useful to control the mixing rate within a block. For any blocks $u, w, v$ and two distinct nodes $i \in V_u$, $j \in V_v$, we consider the number of two-edge paths from $i$ to $j$ in $G$ whose middle vertex is in block $w$, weighted by a quantity related to the expected degree of the middle vertex under the DC-SBM:
$$d^{(2)}_\theta(i,j;w) = \sum_{k \in V_w} \frac{1}{N\theta_k}\,\mathbf{1}\{k \in \mathcal{N}(i) \cap \mathcal{N}(j)\}.$$

CLAIM B.2 (Two-edge paths). There exists $c > 0$ such that, with probability $1 - \varepsilon/2$ over the choice of $G$, the following holds: simultaneously for all blocks $u, w, v$, and all $i \in V_u$, $j \in V_v$ with $i \ne j$,
$$\left|\frac{d^{(2)}_\theta(i,j;w)}{N^{-1}\theta_i\theta_j B_{uw} B_{wv}} - 1\right| \le c\sqrt{\frac{\log N}{N}}.$$
We let $E_{D,2}$ be the event in the claim.

PROOF OF CLAIM B.2. Fix blocks $u, w, v$, and nodes $i \ne j$. By Claim B.1, we can choose $c''$ large enough such that
(B.7)
$$P\left[\left|d(i;w) - \theta_i B_{uw}\right| > c''\sqrt{N\log N}\right] \le N^{-c'''},$$
for some $c''' > 0$. We treat the case where all blocks are distinct; the other cases are similar. Let $E_{i,w}$ be the event that $|d(i;w) - \theta_i B_{uw}| \le c''\sqrt{N\log N}$, and note that $j \notin \mathcal{N}(i;w)$. Conditioned on $E_{i,w}$, each of the $d(i;w)$ edges incident to $i$ and block $w$ has a corresponding endpoint $k \in V_w$ which itself connects to $j$ — independently of all other such endpoints — with probability $\theta_k \theta_j B_{wv}$. Since $d^{(2)}_\theta(i,j;w)$ weighs this last edge by $(N\theta_k)^{-1}$, its expected contribution is $N^{-1}\theta_j B_{wv}$. Moreover, the $d(i;w)$ possibly non-zero terms in the sum defining $d^{(2)}_\theta(i,j;w)$ are uniformly bounded by a constant, by the assumption that $\theta_i = \Theta(N^{-1})$. Hence, we can apply Hoeffding's inequality again, and by choosing $c' > 0$ large enough we have
(B.8)
$$P\left[\left|d^{(2)}_\theta(i,j;w) - N^{-1}\theta_j B_{wv}\, d(i;w)\right| > c'\sqrt{N\log N}\ \Big|\ E_{i,w}\right] \le 2\exp\left(-\frac{2\left[c'\sqrt{N\log N}\right]^2}{\theta_i B_{uw} + c''\sqrt{N\log N}}\right) \le N^{-c''''},$$
for some $c'''' > 0$, where we used (B.7) in the second inequality together with $\theta_i = \Theta(N^{-1})$ and $B_{uw} = \Theta(N^2)$ in the last inequality.

Combining (B.7) and (B.8), and taking a union bound over $u$, $w$, $v$, $i$ and $j$, gives
$$\left|d^{(2)}_\theta(i,j;w) - N^{-1}\theta_i\theta_j B_{uw} B_{wv}\right| \le c'''\sqrt{N\log N},$$
for a constant $c''' > 0$ chosen large enough. Dividing by $N^{-1}\theta_i\theta_j B_{uw} B_{wv}$ and using again that $\theta_i = \Theta(N^{-1})$ and $B_{uv} = \Theta(N^2)$ gives the result, for an appropriately chosen constant $c > 0$.

Properties of the walk.
Before proving our main theorem, we will also need some results about the behavior of the simple random walk on the network. We first show that, from any $i \in V_u$, the probability of jumping to a vertex in block $v$ is close to the population-level probability $p_{uv}$.

CLAIM B.3 (Transitions between blocks). There exists $c > 0$ such that, conditioned on $E_D$, for any blocks $u, v$ and any $i \in V_u$,
$$\left|\frac{p_{uv}(i)}{p_{uv}} - 1\right| \le c\sqrt{\frac{\log N}{N}}.$$

PROOF. Fix $u, v$ and $i \in V_u$. Recall
$$p_{uv}(i) = \frac{d(i;v)}{\sum_k d(i;k)} \quad\text{and}\quad p_{uv} = \frac{B_{uv}}{B_{u*}}.$$
Under $E_D$,
$$p_{uv}(i) \le \frac{\theta_i B_{uv}\left(1 + c_1\sqrt{\frac{\log N}{N}}\right)}{\sum_k \theta_i B_{uk}\left(1 - c_1\sqrt{\frac{\log N}{N}}\right)} \le p_{uv}\left(1 + c\sqrt{\frac{\log N}{N}}\right),$$
for a constant $c > 0$ large enough. A similar inequality holds in the opposite direction.

The previous claim also implies that any step has a probability bounded away from $0$ of landing in any block.

CLAIM B.4 (Landing in a block). There is $p_* \in (0,1)$ such that, conditioned on $E_D$, for any blocks $u, v$ and any $i \in V_u$, we have $p_{uv}(i) \ge p_*$, provided $N$ is larger than a sufficiently large constant.

PROOF. Let $0 < p_* < \min_{u,v} p_{uv}$. The result then follows from Claim B.3.

We next show that two steps of the walk are enough to mix within a block.

CLAIM B.5 (Two steps suffice for within-block mixing).
For each sample $\tau \in T$, we denote its grandchildren by $C^{(2)}(\tau)$. For a $(T,P)$-walk on $G$, there exists $c > 0$ such that, on $E_D$ and $E_{D,2}$, for all $\tau$ and $\tau^{**} \in C^{(2)}(\tau)$,
$$\left|\frac{P\left[X_{\tau^{**}} = j \mid X_\tau = i, G\right]}{\theta_j \sum_k p_{uk}\, p_{kv}} - 1\right| \le c\sqrt{\frac{\log N}{N}},$$
for all blocks $u, v$ and nodes $i \in V_u$, $j \in V_v$ with $i \ne j$.

PROOF. To simplify notation, the conditioning on $G$ is implicit throughout the proof. Assume $E_D$ and $E_{D,2}$ hold. Fix blocks $u, v$ as well as nodes $i \in V_u$ and $j \in V_v$ with $i \ne j$. Let $\tau^{**} \in C^{(2)}(\tau)$ and let $\tau^*$ be the ancestor of $\tau^{**}$ on $T$, which is necessarily a child of $\tau$. Then, for some constants $c', c'' > 0$, using $E_D$ and $E_{D,2}$,
$$P\left[X_{\tau^{**}} = j \mid X_\tau = i\right] = \sum_{t \in V} P\left[X_{\tau^{**}} = j \mid X_{\tau^*} = t\right] P\left[X_{\tau^*} = t \mid X_\tau = i\right] = \sum_{t \in \mathcal{N}(i)\cap\mathcal{N}(j)} \frac{1}{d_i}\cdot\frac{1}{d_t}$$
$$\le \sum_{t \in \mathcal{N}(i)\cap\mathcal{N}(j)} \frac{1}{\theta_t B_{z(t)*}\left(1 - c_1\sqrt{\frac{\log N}{N}}\right)}\cdot\frac{1}{\theta_i B_{u*}\left(1 - c_1\sqrt{\frac{\log N}{N}}\right)} \le \left(1 + c'\sqrt{\frac{\log N}{N}}\right)\sum_k \frac{1}{\theta_i B_{k*} B_{u*}} \sum_{t \in \mathcal{N}(i;k)\cap\mathcal{N}(j;k)} \frac{1}{\theta_t}$$
$$= \left(1 + c'\sqrt{\frac{\log N}{N}}\right)\sum_k \frac{N\, d^{(2)}_\theta(i,j;k)}{\theta_i B_{k*} B_{u*}} \le \left(1 + c'\sqrt{\frac{\log N}{N}}\right)\sum_k \frac{\theta_i \theta_j B_{uk} B_{kv}\left(1 + c_2\sqrt{\frac{\log N}{N}}\right)}{\theta_i B_{k*} B_{u*}} \le \left(1 + c''\sqrt{\frac{\log N}{N}}\right)\theta_j \sum_k p_{uk}\, p_{kv},$$
where recall that $p_{uv} = B_{uv}/B_{u*}$. A similar inequality holds in the other direction, which implies the claim.

Concentration of key estimates.
The PS estimator defined in (3.1) relies on three key estimates, whose concentration we establish now. We begin with the concentration of $\hat\pi^B_k$, by showing that our estimates of the block transition probabilities are concentrated, which boils down to proving that the $\hat p_{uv}$'s are concentrated. Recall that Claim B.3 implies that the block transition probabilities are concentrated at each $i$, i.e., the $p_{uv}(i)$'s are concentrated. Proving that the estimate $\hat p_{uv} = n_{u'v}/n_{u'}$ itself is concentrated requires an argument. Indeed, as shown in Section B.3 below, both the numerator and denominator of this estimator may in general have variance asymptotically much greater than $1/n$. Instead, we use the Markovian structure of the model to control the deviation of $\hat p_{uv}$.

CLAIM B.6 (Concentration of block-wise steady-state probability estimates).
Conditioned on $G$ and $E_D$, there exists $c > 0$ such that, for any block $k$, with probability at least $1 - \varepsilon'/5$,
$$\left|\frac{\hat\pi^B_k}{\pi^B_k} - 1\right| \le c\sqrt{\frac{\log n}{n}}.$$
Recall that $\pi^B$ was defined in (B.3).

PROOF. Throughout this proof, we implicitly condition on $G$ and assume that $E_D$ (from Claim B.1) holds. We let $\tau_0, \ldots, \tau_{n-1}$ be a topological ordering of the vertices of $T$, i.e., an ordering such that if $\tau_i$ is an ancestor of $\tau_j$, then $i < j$. For a fixed $G$, we let $\mathcal{F}_0, \ldots, \mathcal{F}_{n-1}$ be the corresponding filtration, i.e.,
$$\mathcal{F}_j = \sigma\left(X_{\tau_0}, \ldots, X_{\tau_j}\right).$$
Recall that $\tau'$ is the parent of $\tau \ne \tau_0$. The proof relies on three sub-claims.

1. Deviation of $n_{u'v}$: For $u, v$ and $j = 1, \ldots, n-1$, let
$$I_j = \mathbf{1}\{z(X_{\tau'_j}) = u,\ z(X_{\tau_j}) = v\},$$
where recall that $z(i)$ is the block of $i$. Note that $\sum_{j=1}^{n-1} I_j = n_{u'v}$. We consider the process
$$W_t = \sum_{j=1}^{t}\left\{I_j - E[I_j \mid \mathcal{F}_{j-1}]\right\}, \qquad t = 1, \ldots, n-1,$$
with $W_0 = 0$. We claim that $\{W_t\}_t$ is a martingale with bounded increments. Indeed, by the ordering of the samples, $X_{\tau'_j} \in \mathcal{F}_{j-1}$ since $\tau'_j = \tau_s$ for some $s < j$. Hence $I_j - E[I_j \mid \mathcal{F}_{j-1}] \in \mathcal{F}_t$ for all $j \le t$, so $W_t \in \mathcal{F}_t$. Moreover, following a standard calculation,
$$E[W_t - W_{t-1} \mid \mathcal{F}_{t-1}] = E[I_t \mid \mathcal{F}_{t-1}] - E\left[E[I_t \mid \mathcal{F}_{t-1}] \mid \mathcal{F}_{t-1}\right] = 0.$$
Finally, observe that by definition $|W_t - W_{t-1}| = |I_t - E[I_t \mid \mathcal{F}_{t-1}]| \le 1$.
By the Azuma-Hoeffding inequality (see e.g. [17]), for a constant $c > 0$ large enough,
(B.9)
$$P\left[\left|n_{u'v} - \sum_{j=1}^{n-1} E[I_j \mid \mathcal{F}_{j-1}]\right| \ge c\sqrt{n\log n}\right] = P\left[|W_{n-1} - W_0| \ge c\sqrt{n\log n}\right] \le 2\exp\left(-\frac{\left[c\sqrt{n\log n}\right]^2}{2(n-1)}\right) \le \frac{\varepsilon'}{15K^2}.$$

2. Deviation of $\sum_{j=1}^{n-1} E[I_j \mid \mathcal{F}_{j-1}]$: Next, we bound
(B.10)
$$\sum_{j=1}^{n-1} E[I_j \mid \mathcal{F}_{j-1}] = \sum_{j=1}^{n-1} P\left[z(X_{\tau'_j}) = u,\ z(X_{\tau_j}) = v \mid \mathcal{F}_{j-1}\right] = \sum_{j=1}^{n-1} p_{uv}(X_{\tau'_j}),$$
where we use the Markov property of the walk indexed by $T$. By Claim B.3, for all $i \in V_u$,
(B.11)
$$\left|\frac{p_{uv}(i)}{p_{uv}} - 1\right| \le c'\sqrt{\frac{\log N}{N}}.$$
Combining (B.10) and (B.11), we get
(B.12)
$$\left|\sum_{j=1}^{n-1} E[I_j \mid \mathcal{F}_{j-1}] - p_{uv}\, n_{u'}\right| \le c'\sqrt{\frac{\log N}{N}}\, n_{u'} \le c'\sqrt{\frac{\log N}{N}}\, n = c'\sqrt{\frac{\log N}{N}}\sqrt{\frac{n}{\log n}}\sqrt{n\log n} \le c'\sqrt{n\log n},$$
where we used that $n \le N$ and that $x/\ln x$ is non-decreasing for $x \ge e$.

3. Lower bound on $n_{u'}$: Let $n_{\mathrm{in}}$ be the number of internal vertices in $T$. Because each leaf has a parent that is an internal vertex and $T$ has maximum degree $d_{\max} \le c_d$ for some constant $c_d > 0$, it follows that $n_{\mathrm{in}} = \Theta(n)$. Moreover, by Claim B.4, the state of each internal vertex of $T$ (except the root) has probability at least $p_*$ of coming from block $u$, independently of all other $X_\tau$'s. As a result, $n_{u'}$ stochastically dominates a binomial random variable with $n_{\mathrm{in}} - 1$ trials and success probability $p_*$. By Hoeffding's inequality we therefore have, for a constant $c > 0$ large enough, that
$$P\left[n_{u'} - p_*(n_{\mathrm{in}} - 1) < -c\sqrt{n\log n}\right] \le \exp\left(-\frac{2\left[c\sqrt{n\log n}\right]^2}{n_{\mathrm{in}} - 1}\right) \le \frac{\varepsilon'}{15K^2}.$$
Together with $n_{\mathrm{in}} = \Theta(n)$, this implies that for some constant $c' > 0$,
(B.13)
$$P\left[n_{u'} \ge c' n\right] \ge 1 - \frac{\varepsilon'}{15K^2}.$$
Combining (B.9), (B.12), and (B.13), with probability at least $1 - \varepsilon'/5$, simultaneously for all blocks $u, v$, there exists some constant $c'' > 0$ such that
(B.14)
$$\left|\frac{\hat p_{uv}}{p_{uv}} - 1\right| = \left|\frac{n_{u'v}}{n_{u'}\, p_{uv}} - 1\right| = \left|\frac{n_{u'v} - n_{u'}\, p_{uv}}{n_{u'}\, p_{uv}}\right| \le \frac{\left|n_{u'v} - \sum_{j=1}^{n-1} E[I_j \mid \mathcal{F}_{j-1}]\right| + \left|\sum_{j=1}^{n-1} E[I_j \mid \mathcal{F}_{j-1}] - n_{u'}\, p_{uv}\right|}{n_{u'}\, p_{uv}} \le \frac{(c + c')\sqrt{n\log n}}{c' n\, p_{uv}} \le c''\sqrt{\frac{\log n}{n}}.$$
Recall that the stationary distribution of $P^B$ is
(B.15)
$$\pi^B_k = \left[\sum_v \frac{p_{kv}}{p_{vk}}\right]^{-1},$$
for any $k \in \{1, \ldots, K\}$, and that
(B.16)
$$\hat\pi^B_k = \left[\sum_v \frac{\hat p_{kv}}{\hat p_{vk}}\right]^{-1}.$$
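As a quick numerical sanity check (ours, not part of the proof), formula (B.15) can be verified against direct stationarity of $P^B$ for a random symmetric block matrix $B$:

```python
import numpy as np

# For a symmetric B, the vector pi_k = [sum_v p_kv / p_vk]^{-1} in (B.15)
# equals B_{k*} / sum(B) and is stationary for P^B with p_uv = B_uv / B_{u*}.
rng = np.random.default_rng(0)
K = 4
B = rng.uniform(1.0, 5.0, size=(K, K))
B = (B + B.T) / 2                          # B is symmetric under the model
P = B / B.sum(axis=1, keepdims=True)       # population transitions (B.2)
pi = 1.0 / (P / P.T).sum(axis=1)           # formula (B.15)
```

The identity holds because, for symmetric $B$, $p_{kv}/p_{vk} = B_{v*}/B_{k*}$, so the bracketed sum in (B.15) equals $\sum_v B_{v*} / B_{k*}$.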
Then, there exists some constant $c''' > 0$ such that
$$\left|\frac{\hat\pi^B_k}{\pi^B_k} - 1\right| \le c'''\sqrt{\frac{\log n}{n}}.$$
Indeed,
$$\frac{\pi^B_k}{\hat\pi^B_k} - 1 = \frac{\sum_v \hat p_{kv}/\hat p_{vk}}{\sum_v p_{kv}/p_{vk}} - 1 = \frac{\sum_v (p_{kv}/p_{vk})\,(\hat p_{kv}/p_{kv})\,(p_{vk}/\hat p_{vk})}{\sum_v p_{kv}/p_{vk}} - 1 \le \frac{1 + c''\sqrt{\frac{\log n}{n}}}{1 - c''\sqrt{\frac{\log n}{n}}} - 1 \le c'''\sqrt{\frac{\log n}{n}},$$
for large enough $c'''$, and similarly in the other direction. The second equality follows from (B.15) and (B.16), while the final bounds use (B.14).

We then evaluate the deviation of
$$\hat\mu^{VH}_k = \frac{\hat H_k}{n_k}\sum_{\tau \in T_k}\frac{Y_\tau}{d_{X_\tau}}.$$
Recall that, for any block $k$, the population block-wise average is
$$\mu_k = \frac{1}{N_k}\sum_{i \in V_k} y_i.$$
Before showing that our block-wise estimator $\hat\mu^{VH}_k$ is close to $\mu_k$, we first look at a related quantity, $\hat\mu_{k,w}$ below, which serves as a "bridge." We define the weighted block-wise average as
(B.17)
$$\hat\mu_{k,w} = \frac{1}{n_k}\sum_{\tau \in T_k}\frac{Y_\tau}{N_k\,\theta_{X_\tau}}.$$
Using an argument similar to that in Claim B.6, we show in Claim B.7 that $\hat\mu_{k,w}$ is concentrated for each block $k$. We then show in Claim B.10 that $\hat\mu_{k,w}$ is close to $\hat\mu^{VH}_k$. As a result, we will have established that $\hat\mu^{VH}_k$ is close to $\mu_k$.
B.7 (Concentration of block-wise sample averages weighted by de-grees).
Conditioned on G , E D and E D , , there exists c > such that, with prob-ability at least − ε (cid:48) / , for any block k (cid:12)(cid:12)(cid:12)(cid:12) ˆ µ k, w µ k − (cid:12)(cid:12)(cid:12)(cid:12) ≤ c (cid:114) log nn . P ROOF . Because the structure of the proof is similar to that of Claim B.6, weonly sketch it here. We also make use of Claim B.5, which shows that simplerandom walk on G mixes well within blocks in two steps. Because of the latter,we control separately the odd and even levels of T . Let ν , ν , . . . , ν n (e) be thevertices of T whose graph distance to the root is even, including the root ν = τ ,in a topological ordering. Let C (2) ( ν ) be the grand-children of ν in T . Let G = σ ( X ν ) = σ ( X τ ) and for j ≥ G j = G ∪ σ ( X ν : ν ∈ C (2) ( ν (cid:96) ) , (cid:96) ≤ j ) . For each node X ν ∈ V k , define y θ ( X ν ) = y ( X ν ) N Z ν θ X ν . Fix block k and let I j = (cid:88) ν ∈C (2) ( ν j ) { z ( X ν ) = k } y θ ( X ν ) , and n (e) k := n (e) (cid:88) i =2 { z ( X ν i ) = k } , where note that the last sum excludes the root. Following the proof of Claim B.6,we note that the partial sums J (cid:88) j =1 { I j − E [ I j | G j − ] } , J = 1 , . . . , n (e) , form a martingale indexed by J with increments satisfying | I j − E [ I j | G j − ] | ≤ ( c d − c y , ZHANG ET AL. where we used that T has maximum degree ≤ c d and ≤ y ( x ) ≤ c y by assump-tion. Hence, arguing as in Step 1 of Claim B.6, we get that with probability at least − ε (cid:48) / K for all k (B.18) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n (e) (cid:88) i =2 { z ( X ν i ) = k } y θ ( X ν i ) − n (e) (cid:88) j =1 E [ I j | G j − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ c (cid:48) (cid:112) n log n. Moreover, let ν ∈ C (2) ( ν j ) and notice that by construction X ν j ∈ G j − . 
Hence, by Claim B.5,
$$E\left[\mathbf{1}\{z(X_\nu) = k\}\, y_\theta(X_\nu) \mid \mathcal{G}_{j-1}\right] \le \sum_{x \in V_k} \theta_x \sum_u p_{z(X_{\nu_j}),u}\, p_{uk}\left[1 + c\sqrt{\frac{\log N}{N}}\right] y_\theta(x) = \sum_{x \in V_k} N_k^{-1} \sum_u p_{z(X_{\nu_j}),u}\, p_{uk}\left[1 + c\sqrt{\frac{\log N}{N}}\right] y(x) = \mu_k \sum_u p_{z(X_{\nu_j}),u}\, p_{uk}\left[1 + c\sqrt{\frac{\log N}{N}}\right],$$
where recall that we condition on $G$. A similar bound holds in the opposite direction. So, arguing as in Step 2 of Claim B.6, for some large enough $c'' > 0$,
(B.19)
$$\left|\sum_{j=1}^{n^{(e)}} E[I_j \mid \mathcal{G}_{j-1}] - \mu_k \sum_v \sum_u p_{vu}\, p_{uk} \sum_{j=1}^{n^{(e)}} \mathbf{1}\{z(X_{\nu_j}) = v\}\left|C^{(2)}(\nu_j)\right|\right| \le c\sqrt{\frac{\log N}{N}}\, n\, c_d\, c_y \le c\, c_d\, c_y \sqrt{\frac{\log N}{N}}\sqrt{\frac{n}{\log n}}\sqrt{n\log n} \le c''\sqrt{n\log n},$$
where we used $n \le N$.

In addition, we argue as in Step 3 of Claim B.6. Because each node with odd distance to the root has a parent with even distance to the root, and $T$ has maximum degree $\le c_d$, it follows that $n^{(e)} = \Theta(n)$. Moreover, by Claim B.4, the state of each internal vertex of $T$ (except the root) has probability at least $p_*$ of coming from block $k$, independently of all other $X_\tau$'s. As a result, $n^{(e)}_k$ stochastically dominates a binomial random variable with $n^{(e)} - 1$ trials and success probability $p_*$. By Hoeffding's inequality we therefore have, for a constant $c > 0$ large enough, that
$$P\left[n^{(e)}_k - p_*\left(n^{(e)} - 1\right) < -c\sqrt{n\log n}\right] \le \exp\left(-\frac{2\left[c\sqrt{n\log n}\right]^2}{n^{(e)} - 1}\right) \le \frac{\varepsilon'}{15K}.$$
Together with $n^{(e)} = \Theta(n)$, this implies that with probability at least $1 - \varepsilon'/15$, for all blocks $k$ and some constant $c' > 0$,
(B.20)
$$n^{(e)}_k \ge c' n.$$
Finally, following the proof of Claim B.6 once again, we also get that with probability at least $1 - \varepsilon'/15$, for all $k$,
(B.21)
$$\left|\sum_u p_{vu}\, p_{uk} \sum_{j=1}^{n^{(e)}} \mathbf{1}\{z(X_{\nu_j}) = v\}\left|C^{(2)}(\nu_j)\right| - \sum_{j=1}^{n^{(e)}} \sum_{\nu \in C^{(2)}(\nu_j)} \mathbf{1}\{z(X_{\nu_j}) = v,\ z(X_\nu) = k\}\right| \le c''\sqrt{n\log n},$$
for some constant $c'' > 0$. Combining (B.18), (B.19), and (B.21), with probability at least $1 - \varepsilon'/5$,
$$\left|\sum_{i=2}^{n^{(e)}} \mathbf{1}\{z(X_{\nu_i}) = k\}\, y_\theta(X_{\nu_i}) - \mu_k \sum_{i=2}^{n^{(e)}} \mathbf{1}\{z(X_{\nu_i}) = k\}\right|$$
$$\le \left|\sum_{i=2}^{n^{(e)}} \mathbf{1}\{z(X_{\nu_i}) = k\}\, y_\theta(X_{\nu_i}) - \sum_{j=1}^{n^{(e)}} E[I_j \mid \mathcal{G}_{j-1}]\right| + \left|\sum_{j=1}^{n^{(e)}} E[I_j \mid \mathcal{G}_{j-1}] - \mu_k \sum_v \sum_u p_{vu}\, p_{uk} \sum_{j=1}^{n^{(e)}} \mathbf{1}\{z(X_{\nu_j}) = v\}\left|C^{(2)}(\nu_j)\right|\right|$$
$$\quad + \left|\mu_k \sum_v \sum_u p_{vu}\, p_{uk} \sum_{j=1}^{n^{(e)}} \mathbf{1}\{z(X_{\nu_j}) = v\}\left|C^{(2)}(\nu_j)\right| - \mu_k \sum_v \sum_{j=1}^{n^{(e)}} \sum_{\nu \in C^{(2)}(\nu_j)} \mathbf{1}\{z(X_{\nu_j}) = v,\ z(X_\nu) = k\}\right| \le c'''\sqrt{n\log n},$$
for some constant $c''' > 0$.
The same holds for the odd levels. Together with (B.20) and a similar inequality for the odd levels (and the fact that the first two levels of $T$ have a negligible effect asymptotically), we get the claim.

By replacing $y(X_\tau)$ with $1$ in the proof of Claim B.7, we can also derive the following.

CLAIM B.8.
Conditioned on $G$, $E_D$ and $E_{D,2}$, there exists $c > 0$ such that, with probability at least $1 - \varepsilon'/5$, for any block $k$,
$$\left|\frac{1}{n_k}\sum_{\tau \in T_k}\frac{1}{N_k\,\theta_{X_\tau}} - 1\right| \le c\sqrt{\frac{\log n}{n}}.$$

Using Claims B.1 and B.8, we derive the deviation of the block-wise harmonic average degrees. Recall that, for any block $k$, the block population mean degree is
$$\delta^B_k = \frac{B_{k*}}{N_k},$$
and the block-wise harmonic average degree is
$$\hat H_k = \left[\frac{1}{n_k}\sum_{\tau \in T_k}\frac{1}{d_{X_\tau}}\right]^{-1}.$$

CLAIM B.9 (Concentration of block-wise harmonic average of degrees). Conditioned on $G$, $E_D$ and $E_{D,2}$, there exists $c > 0$ such that, with probability at least $1 - \varepsilon'/5$, for any block $k$,
$$\left|\frac{\hat H_k}{\delta^B_k} - 1\right| \le c\sqrt{\frac{\log n}{n}}.$$

PROOF. Conditioned on $G$, $E_D$ and $E_{D,2}$, under the DC-SBM,
$$(\hat H_k)^{-1} = \frac{1}{n_k}\sum_{\tau \in T_k}\frac{1}{d_{X_\tau}} \le \frac{1}{n_k}\sum_{\tau \in T_k}\left[\theta_{X_\tau} B_{k*}\left(1 - Kc\sqrt{\frac{\log N}{N}}\right)\right]^{-1} = \left(1 - Kc\sqrt{\frac{\log N}{N}}\right)^{-1}\frac{1}{n_k}\sum_{\tau \in T_k}\frac{1}{\theta_{X_\tau} B_{k*}}$$
$$= \left(1 - Kc\sqrt{\frac{\log N}{N}}\right)^{-1}\frac{N_k}{B_{k*}}\cdot\frac{1}{n_k}\sum_{\tau \in T_k}\frac{1}{N_k\,\theta_{X_\tau}} \le \left(1 - Kc\sqrt{\frac{\log N}{N}}\right)^{-1}\left(1 + c\sqrt{\frac{\log n}{n}}\right)\frac{N_k}{B_{k*}} \le \left(1 + c'\sqrt{\frac{\log n}{n}}\right)\frac{N_k}{B_{k*}},$$
for some large enough constant $c' > 0$. The first inequality is from Claim B.1 (i.e., from $E_D$); the second inequality is from Claim B.8. A similar inequality holds in the opposite direction. Thus
$$\left|\frac{N_k^{-1} B_{k*}}{\hat H_k} - 1\right| \le c'\sqrt{\frac{\log n}{n}}, \quad\text{and hence}\quad \left|\frac{\hat H_k}{N_k^{-1} B_{k*}} - 1\right| \le c\sqrt{\frac{\log n}{n}},$$
for some large enough constant $c > 0$. By the definition of $\delta^B_k$, we are done.

Directly from Claim B.9, we show that $\hat\mu_{k,w}$ is close to $\hat\mu^{VH}_k$ for each block $k$ in the following claim.

CLAIM
B.10.
Conditioned on $G$, $E_D$ and $E_{D,2}$, there exists $c > 0$ such that, with probability at least $1 - \varepsilon'/5$, for any block $k$,
$$\left|\frac{\hat\mu^{VH}_k}{\mu_k} - 1\right| \le c\sqrt{\frac{\log n}{n}}.$$

PROOF. Conditioned on $G$, $E_D$ and $E_{D,2}$, under the DC-SBM, Claims B.9 and B.7 hold simultaneously with probability at least $1 - 2\varepsilon'/5$. Then
$$\hat\mu^{VH}_k = \frac{1}{n_k}\sum_{\tau \in T_k}\frac{Y_\tau \hat H_k}{d_{X_\tau}} = \frac{\hat H_k}{\delta^B_k}\cdot\frac{1}{n_k}\sum_{\tau \in T_k}\frac{Y_\tau B_{k*}}{d_{X_\tau} N_k} \le \left(1 + c\sqrt{\frac{\log n}{n}}\right)\frac{1}{n_k}\sum_{\tau \in T_k}\frac{Y_\tau B_{k*}}{d_{X_\tau} N_k}$$
$$\le \left(1 + c\sqrt{\frac{\log n}{n}}\right)\frac{1}{n_k}\sum_{\tau \in T_k}\frac{Y_\tau B_{k*}}{\theta_{X_\tau} B_{k*}\left(1 - c\sqrt{\frac{\log N}{N}}\right) N_k} = \left(1 + c\sqrt{\frac{\log n}{n}}\right)\left(1 - c\sqrt{\frac{\log N}{N}}\right)^{-1}\frac{1}{n_k}\sum_{\tau \in T_k}\frac{Y_\tau}{N_k\,\theta_{X_\tau}}$$
$$= \left(1 + c\sqrt{\frac{\log n}{n}}\right)\left(1 - c\sqrt{\frac{\log N}{N}}\right)^{-1}\hat\mu_{k,w} \le \left(1 + c'\sqrt{\frac{\log n}{n}}\right)\hat\mu_{k,w},$$
for some large enough constant $c' > 0$. The first inequality is from Claim B.9, while the second inequality is from Claim B.1. A similar bound holds in the opposite direction. Combining with Claim B.7,
$$\left|\frac{\hat\mu^{VH}_k}{\mu_k} - 1\right| \le c\sqrt{\frac{\log n}{n}},$$
for some large enough constant $c > 0$.

Putting everything together.
Finally, we prove the main result.

PROOF OF THE MAIN THEOREM. Conditioned on $G$, the events $E_D$ and $E_{D,2}$ hold with probability at least $1 - \varepsilon$. Under those events, Claims B.6 and B.9 hold with probability $1 - \varepsilon'$, so
\[
\frac{\hat{\pi}_B^k}{\hat{H}_k} \le \frac{\pi_B^k \left( 1 + c \sqrt{\frac{\log n}{n}} \right)}{\delta_B^k \left( 1 - c \sqrt{\frac{\log n}{n}} \right)} = \frac{\pi_B^k}{\delta_B^k} \left( 1 + c' \sqrt{\frac{\log n}{n}} \right),
\]
for some large enough $c' > 0$; the other direction is similar. Then, using Claim B.10,
\begin{align*}
\hat{\mu}_{PS} = \frac{\sum_k [\hat{\pi}_B^k / \hat{H}_k]\, \hat{\mu}_k^{VH}}{\sum_k \hat{\pi}_B^k / \hat{H}_k}
&\le \frac{\sum_k \left[ \pi_B^k / \delta_B^k \right] \left( 1 + c' \sqrt{\frac{\log n}{n}} \right) \mu_k \left( 1 + c \sqrt{\frac{\log n}{n}} \right)}{\sum_k \left[ \pi_B^k / \delta_B^k \right] \left( 1 - c' \sqrt{\frac{\log n}{n}} \right)} \\
&\le \frac{\sum_k \left[ \pi_B^k / \delta_B^k \right] \mu_k}{\sum_k \left[ \pi_B^k / \delta_B^k \right]} \left( 1 + c'' \sqrt{\frac{\log n}{n}} \right)
= \mu_{\mathrm{true}} \left( 1 + c'' \sqrt{\frac{\log n}{n}} \right),
\end{align*}
for some constant $c'' > 0$. Similarly for the other direction. Thus, there exists a constant $c > 0$ such that
\[
\left| \hat{\mu}_{PS} - \mu_{\mathrm{true}} \right| \le c \sqrt{\frac{\log n}{n}}.
\]

B.3. A simple instance showing that the variance of the VH estimator converges slower than $O(n^{-1})$. The following example shows that, in general, the Volz-Heckathorn estimator, i.e.,
\[
\hat{\mu}_{VH} = \frac{\sum_{\tau \in T} y(X_\tau)/d_{X_\tau}}{\sum_{\tau \in T} 1/d_{X_\tau}},
\]
has a variance asymptotically worse than $1/n$ on a two-block stochastic blockmodel. Recall that $z(x)$ is the block of $x$.

THEOREM B.11 (Negative example). Let $K = 2$ and denote the blocks by $\{0, 1\}$. Let $N_0 = N_1 = N/2$, let
\[
B_{01} = B_{10} = 1 - B_{00} = 1 - B_{11} = p,
\]
where $p \in (0, 1/2)$, and let $y(x) = z(x)$ for all $x \in V$. Let the seed $X_{\tau_0} \in V$ be chosen uniformly at random. Let $T$ be a complete $(\alpha - 1)$-ary tree. Assume that $N \gg n^\gamma$ for some $\gamma > 2$ and that
\[
(\text{B.22}) \qquad (\alpha - 1)(1 - 2p)^2 > 1.
\]
Then, with probability at least $1/2$ over the network,
\[
\mathrm{Var}[\hat{\mu}_{VH} \mid G] \gg n^{-(1 - \zeta)},
\]
for some $\zeta > 0$.

PROOF. By Claim B.1, the event $E_D$ occurs with probability at least $1/2$. Therefore, by the conditional variance formula,
\[
(\text{B.23}) \qquad \mathrm{Var}[\hat{\mu}_{VH} \mid G] \ge \frac{1}{2} \mathrm{Var}[\hat{\mu}_{VH} \mid G, E_D].
\]
By symmetry, $\delta_B^0 = \delta_B^1 = N/2$. Hence, on $E_D$, we have further that
\begin{align*}
\mathrm{Var}[\hat{\mu}_{VH} \mid G, E_D]
&= E\left[ \left( \frac{\sum_{\tau \in T} y(X_\tau)/d_{X_\tau}}{\sum_{\tau \in T} 1/d_{X_\tau}} \right)^2 \,\middle|\, G, E_D \right] - \left( E\left[ \frac{\sum_{\tau \in T} y(X_\tau)/d_{X_\tau}}{\sum_{\tau \in T} 1/d_{X_\tau}} \,\middle|\, G, E_D \right] \right)^2 \\
&\ge E\left[ \left( \frac{\sum_{\tau \in T} y(X_\tau)}{n} \right)^2 \,\middle|\, G, E_D \right] - \left( E\left[ \frac{\sum_{\tau \in T} y(X_\tau)}{n} \,\middle|\, G, E_D \right] \right)^2 - O\left( \sqrt{\frac{\log N}{N}} \right) \\
(\text{B.24}) \qquad &= \mathrm{Var}\left[ \frac{1}{n} \sum_{\tau \in T} y(X_\tau) \,\middle|\, G, E_D \right] - o(n^{-1}),
\end{align*}
by our assumption on $N$, where we used that $y(x) \in [0, 1]$ for all $x$. To simplify notation, in the rest of the proof, we implicitly condition on $G$ and $E_D$.

The population-level chain satisfies
\[
p_{00} = p_{11} = 1 - p, \qquad p_{01} = p_{10} = p, \qquad \pi_B^0 = \pi_B^1 = \frac{1}{2}.
\]
Let $(\tilde{f}_\tau)_{\tau \in T}$ be a Markov chain on $\{0, 1\}$ indexed by $T$ with transition probabilities $(p_{bu})_{b,u \in \{0,1\}}$. By Claim B.3, on $E_D$, we can couple $(y(X_\tau))_\tau$ and $(\tilde{f}_\tau)_\tau$ except with probability $O(n \sqrt{\log N / N}) = o(1)$, an event we denote by $\tilde{E}$. This is because, for each of the $n - 1$ transitions, there can only be a difference in probability of $O(\sqrt{\log N / N})$. Hence, by the conditional variance formula again,
\[
(\text{B.25}) \qquad \mathrm{Var}\left[ \frac{1}{n} \sum_{\tau \in T} y(X_\tau) \right] \ge (1 - o(1))\, \mathrm{Var}\left[ \frac{1}{n} \sum_{\tau \in T} y(X_\tau) \,\middle|\, \tilde{E} \right] = (1 - o(1))\, \mathrm{Var}\left[ \frac{1}{n} \sum_{\tau \in T} \tilde{f}_\tau \,\middle|\, \tilde{E} \right].
\]
To simplify notation, in the rest of the proof, we implicitly condition on $\tilde{E}$. Define $\tilde{g}_\tau := 1 - 2\tilde{f}_\tau \in \{-1, +1\}$, and notice that, by translation,
\[
(\text{B.26}) \qquad \mathrm{Var}\left[ \frac{1}{n} \sum_{\tau \in T} \tilde{f}_\tau \right] = \frac{1}{4} \mathrm{Var}\left[ \frac{1}{n} \sum_{\tau \in T} \tilde{g}_\tau \right]
\]
and that $\tilde{g}_\tau$ is centered under $\pi_B$. Under $(p_{bu})_{b,u \in \{0,1\}}$, the function $(-1, +1)$ is a right-eigenvector with eigenvalue $\theta := 1 - 2p \in (0, 1)$. Hence, for any $\tau, \tau' \in T$ at graph distance $\eta$, it holds that $E[\tilde{g}_{\tau'} \mid \tilde{g}_\tau] = \theta^\eta \tilde{g}_\tau$, and
\[
\mathrm{Cov}[\tilde{g}_{\tau'}, \tilde{g}_\tau] = E[\tilde{g}_{\tau'} \tilde{g}_\tau] = E[E[\tilde{g}_{\tau'} \tilde{g}_\tau \mid \tilde{g}_\tau]] = E[\theta^\eta \tilde{g}_\tau^2] = \theta^\eta,
\]
where we used that $\tilde{g}_\tau^2 = 1$. Let $\mathcal{L}$ be the leaves of $T$. Because the samples $(\tilde{g}_\tau)_{\tau \in T}$ are positively correlated by the above calculation and $|\mathcal{L}| = \Omega(n)$, we have further that
\[
(\text{B.27}) \qquad \mathrm{Var}\left[ \frac{1}{n} \sum_{\tau \in T} \tilde{g}_\tau \right] = \Omega\left( \mathrm{Var}\left[ \frac{1}{|\mathcal{L}|} \sum_{\tau \in \mathcal{L}} \tilde{g}_\tau \right] \right).
\]
Finally, by symmetry and the conditional variance formula once more, recalling that $\tau_0$ is the root of $T$, we have
\begin{align*}
\mathrm{Var}\left[ \frac{1}{|\mathcal{L}|} \sum_{\tau \in \mathcal{L}} \tilde{g}_\tau \right]
&\ge \mathrm{Var}\left[ E\left[ \frac{1}{|\mathcal{L}|} \sum_{\tau \in \mathcal{L}} \tilde{g}_\tau \,\middle|\, \tilde{g}_{\tau_0} \right] \right]
= \mathrm{Var}\left[ \theta^{\log_{\alpha - 1} n}\, \tilde{g}_{\tau_0} \right] \\
(\text{B.28}) \qquad &= \theta^{2 \log_{\alpha - 1} n} = n^{2 \log_{\alpha - 1} \theta} = \frac{1}{n^{1 - \zeta}},
\end{align*}
with $\zeta = 1 + 2 \log_{\alpha - 1} \theta = \log_{\alpha - 1}\left( (\alpha - 1) \theta^2 \right) > 0$ by (B.22). Combining the latter with (B.23), (B.24), (B.25), (B.26), and (B.27) gives the result.

Yilin Zhang, Karl Rohe
Department of Statistics
University of Wisconsin-Madison
University Ave
Madison, WI 53706, USA
E-mail: [email protected]@stat.wisc.edu

Sebastien Roch
Department of Mathematics
University of Wisconsin-Madison
480 Lincoln Drive
Madison, WI 53706, USA
E-mail: