[PDF] Adaptive Randomization in Network Data

Abstract

Network data have appeared frequently in recent research. For example, in comparing the effects of different types of treatment, network models have been proposed to improve the quality of estimation and hypothesis testing. In this paper, we focus on efficiently estimating the average treatment effect using an adaptive randomization procedure in networks. We work on models of causal frameworks, for which the treatment outcome of a subject is affected by its own covariate as well as those of its neighbors. Moreover, we consider the case in which, when we assign treatments to the current subject, only the subnetwork of existing subjects is revealed. New randomized procedures are proposed to minimize the mean squared error of the estimated differences between treatment effects. In network data, it is usually difficult to obtain theoretical properties because the numbers of nodes and connections increase simultaneously. Under mild assumptions, our proposed procedure is closely related to a time-varying inhomogeneous Markov chain. We then use Lyapunov functions to derive the theoretical properties of the proposed procedures. The advantages of the proposed procedures are also demonstrated by extensive simulations and experiments on real network data.

Full PDF

aa r X i v : . [ s t a t . M E ] S e p ADAPTIVE RANDOMIZATION IN NETWORK DATA B Y Z HIXIN Z HOU , P ING L I AND F EIFANG H U Department of Management Sciences, City University of Hong Kong, [email protected] Baidu Research, USA, [email protected] Department of Statistics, George Washington University, [email protected]

Network data have appeared frequently in recent research. For example,in comparing the effects of different types of treatment, network models havebeen proposed to improve the quality of estimation and hypothesis testing.In this paper, we focus on efﬁciently estimating the average treatment effectusing an adaptive randomization procedure in networks. We work on modelsof causal frameworks, for which the treatment outcome of a subject is af-fected by its own covariate as well as those of its neighbors. Moreover, weconsider the case in which, when we assign treatments to the current sub-ject, only the subnetwork of existing subjects is revealed. New randomizedprocedures are proposed to minimize the mean squared error of the estimateddifferences between treatment effects. In network data, it is usually difﬁcult toobtain theoretical properties because the numbers of nodes and connectionsincrease simultaneously. Under mild assumptions, our proposed procedure isclosely related to a time-varying inhomogeneous Markov chain. We then useLyapunov functions to derive the theoretical properties of the proposed pro-cedures. The advantages of the proposed procedures are also demonstratedby extensive simulations and experiments on real network data.

1. Introduction.

Evaluation of the effects of different types of treatment is gaining sig-niﬁcant attention in social media development, online advertising and clinical testing. Theoutcome for each subject may depend not only on the treatment allocation, but also the sub-jects’ covariates and the connections between subjects. Random treatment assignment meth-ods often generate unbalanced prognostic factors. In the situation where the covariates arethe observed categorical or numerical variables in ﬁxed dimensions, sequential treatment as-signment is introduced in [30] to address the issue of unbalancedness. In [36], the authorgeneralizes the idea of sequential design by proposing a marginal urn model. Adaptive ran-domization methods are studied in [13, 14], and show promising performance in categoricalcovariate balance with theoretical guarantees. Pairwise sequential randomization is investi-gated in [31] to reduce the Mahalanobis distance of continuous variables.In the past decade, the presence of networks in social media, clinical tests and biolog-ical experiments has received attention from statisticians [35, 6, 7, 5]. In causal inferencestudies, the behavior of one individual may be correlated with the behaviors of other indi-viduals, namely peer effects or social interaction [21, 2, 9]. In online social media networks,the behavior of a given user may be similar to his or her friends, as they might share corre-lated factors. Hence, in causal inference and clinical studies, we assume that if two subjectsare connected in the network, then their hidden covariates affect each other’s outcomes. Tobe more precise, we consider network-correlated outcomes , where the network informs thecorrelations among potential outcomes because the potential outcomes of subject i depend Keywords and phrases:

Adaptive randomization design, Lamperti process, Time-varying Markov process,Lyapunov function, Network-correlated outcomes.The work of Zhixin Zhou was conducted while he was a Postdoctoral Researcher at Baidu Research - Bellevue.The work of Feifang Hu was conducted while he was a consulting researcher at Baidu Research. on both its own covariates and those of its neighbors in the network [22, 4]. Furthermore,we assume the potential outcome of a certain subject is not affected by the assignment oftreatments to other subjects [8]. That is, there is no interference between subjects [1]. In ad-dition, we consider another realistic assumption, which is similar to that proposed in [36]:we assume subjects appear singly and must be treated immediately. In other words, whenwe decide the treatment assigned to the current subject, only the connections between thissubject and the previous subjects are observed; we observe only the sub-adjacency matrixfor those subjects observed in the current stage. Rerandomization is proposed in [24] andgeneralized to network data by [4]; however, their approach requires the whole network tobe revealed before deciding the treatment of the ﬁrst subject. To resolve this issue, here, wegeneralize adaptive design methods [13, 14] to decide treatment allocation sequentially. It isworth noting that the adaptive design method has not previously been considered in networkmodels. Moreover, the performance analysis of the existing adaptive randomization methodcannot be applied to the model considered in this paper.Assuming the observations are network-correlated and sequentially obtained, this paperfocuses on improving the estimation of treatment effects by reducing the imbalance mea-surement . We still aim to reduce the effect of prognostic factors by the pairwise sequentialrandomization method proposed in [31]. Under the assumption of network-correlated out-comes, and supposing the network is observed sequentially, we ﬁrst derive the formula forvariance of treatment effects under certain statistical assumptions, then we show that our ap-proach reduces the imbalance measurement empirically and theoretically under some reason-able assumptions on the network. Despite the popularity of the model in [22, 4], no previouswork has analytically evaluated the variance of the estimator in this model with mathematicalveriﬁcation. To the best of our knowledge, this paper is the ﬁrst work to provide a theoreticalveriﬁcation for the performance of randomization procedures on models assuming network-correlated outcomes.In the literature, it is assumed that covariates are identically and independently distributed(i.i.d.), and the number of covariates is ﬁxed, hence turning the imbalance measurement of theadaptive randomization procedure into a Markov process. It is shown in [13] that the Markovprocess is recurrent when the covariates are categorical variables. To formulate a theoreticalanalysis of the proposed procedure of this paper, we assume the observed network followsthe Erd˝os-R´enyi random graph model. The analysis does not follow from previous workon adaptive design, in the following sense. As we observe a network with extra nodes, thenumber of possible neighbors of each individual increases simultaneously. Moreover, becausethe Erd˝os-R´enyi random graph is a probabilistic model for undirected graphs, the entries ofthe adjacency matrix are not independent. To overcome these difﬁculties, we analyze thisstochastic process as a Lamperti problem [16] and further derive the upper bound of theexpectation of imbalance measurement by computing certain Lyapunov functions [23]. In ourmodel, as more and more subjects join the experiments, the dimension of states changes overtime progresses. Thus, this process can be approximated as a time-varying Markov process.The generalization from ﬁxed dimension to increasing dimension is a novel extension inMarkov models.This article is organized as follows. We introduce the network-correlated outcome modeland our proposed procedure in Section 2. Theoretical properties under the Erd˝os-R´enyi ran-dom graph model are presented in Section 3. In Section 4, we discuss the theoretical proper-ties that arise when we replace the random graph model with a Gaussian orthogonal ensem-ble. Experiments on simulated and real network data are presented in Section 5. We concludein Section 6, where possible future works are also discussed. Proofs of the main theoremsand auxiliary lemmas appear in Section 7. DAPTIVE RANDOMIZATION IN NETWORK DATA Here, we brieﬂy introduce the notation used in this paper. X n is the set of vectors withentries belonging to X , where X can be any subset of real numbers. Similarly, X m × n is theset of m × n matrices with entries belonging to X . For A ∈ S m × n , A i ∗ ∈ X n is the i -th rowof matrix A . For vector a , k a k represents the ℓ -norm of vector a . a i : j = ( a i , a i +1 , . . . , a j ) for i < j . Similarly, for matrix A , A i : j,k : l is the submatrix formed by rows i, i + 1 , . . . , j andcolumns k, k + 1 , . . . , l . In particular, we write A ( i ) = A i, i as the upper-left submatrix.

2. Model Assumptions.

We focus on two treatment groups (treatment 0 and treatment1) assigned to a ﬁnite population of n subjects. Let T ∈ { , } n be the treatment assignmentvector. T i records the assignment of the i -th subject, that is, T i = 0 for treatment and T i = 1 for treatment . The relationship between nodes is recorded by an undirected network, orequivalently, a symmetric binary adjacency matrix A ∈ { , } n × n . We assume self loopsalways exist, i.e., A ii = 1 for i ∈ [ n ] . We recall that A i ∗ is the i -th row of adjacency matrix A . Given the treatment assignment T i , the observed outcome of the i -th subject follows thedistribution X i = µ (1 − T i ) + µ T i + A i ∗ Z + ε i where Z ∼ N (0 , σ Z I n ) and ε i ∼ N (0 , σ ε ) . (2.1)We assume ε i are i.i.d. for i ∈ [ n ] . The observation is the summation of three parts.1. µ (1 − T i ) + µ T i is the treatment effect, where µ and µ are the effect sizes of thecorresponding treatments. We note that the outcome has the expectation E [ X i ] = µ if T i = 0 , otherwise its expectation is µ .2. The outcome of the i -th observation is also affected by its unknown covariate Z i and thecovariates of its neighbors in the network. To be precise, let N i be the set of neighbors of i , and recall that A ii = 1 , then A i ∗ Z = Z i + P j : j ∈ N i Z j . We assume the covariates Z havezero mean, so the outcome can be positively or negatively inﬂuenced by the covariates.3. ε i is random noise in each observation. We also write ε := ( ε , . . . , ε n ) ⊤ , which followsthe distribution N (0 , σ ε I n ) .Following previous studies, to ensure the treatment groups are unbiased, we restrict T m − + T m = 1 . That is, ( T m − , T m ) is either (0 , or (1 , . For notational conve-nience, we assume the total number of subjects n is even. Hence we have an estimator of µ − µ , deﬁned as W := 2 n n X i =1 (1 − T i ) X i − T i X i = µ − µ + 2 n n X i =1 (1 − T i )( A i ∗ Z + ε i ) − T i ( A i ∗ Z + ε i )= µ − µ + 2 n ( n − T ) ⊤ ( AZ + ε ) . For a ﬁxed adjacency matrix A and an allocation vector T , it is not difﬁcult to check that theestimator is unbiased, as Z and ε have zero means: E [ W ] = µ − µ + 2 n ( n − T ) ⊤ E [ AZ + ε ] = µ − µ + 2 n ( n − T ) ⊤ A ( E [ Z ] + E [ ε ]) = µ − µ . We can also compute the variance of W :var [ W ] = 4 n var [( n − T ) ⊤ ( AZ + ε )] = 4 n k A ( n − T ) k σ Z + 4 n σ ε , where k · k denotes the ℓ -norm throughout this paper. We note that W is an unbiased estima-tion and the term σ ε /n converges to 0 as n → ∞ , so the best strategy in this experiment isto reduce the term k A ( n − T ) k by assigning an appropriate treatment to each pair of sub-jects. As the variance of estimator W decreases, the hypothesis testing on the effectiveness of the treatment becomes more powerful. We assume each pair of subjects joins the experimentsequentially, and we need to decide their treatment assignment soon after they join. In pair-wise sequential randomization [31, 20], we assign different treatments to each pair of subjectssimultaneously. In the m -th stage, we determine the treatment assignments to the (2 m − -thand the m -th subjects, which may depend on two factors. First, after the ﬁrst m subjectsjoin the experiment, we only observe the connection between these subjects, while all otherconnections are concealed. In other words, we observe the (upper-left) sub-adjacency matrix A (2 m ) := ( A ij ) ≤ i,j ≤ m . Second, when we determine the assignment to the (2 m − -th and m -th subjects, we have the record of the assignments to the ﬁrst (2 m − subjects, althoughwe cannot update them. Therefore, given the submatrix A (2 m ) and T , . . . T m − , we need todetermine T m − and T m to reduce the imbalance measurement , deﬁned as I m = k A (2 m ) ( m − T m ) k , (2.2)where m ∈ R m with all entries equal to 1 and T m consists of the ﬁrst m entries of T .To reduce the imbalance measurement, we propose the following procedure:1. The ﬁrst two subjects are randomly assigned to different treatments.2. Suppose m − patients have been assigned to treatments, we deﬁne the imbalance mea-surement when ( T m − , T m ) = (0 , I (0 , m = k A (2 m ) ( m − T ⊤ m − , , ⊤ ) k , and in the same manner, when ( T m − , T m ) = (1 , , we have I (1 , m = k A (2 m ) ( m − T ⊤ m − , , ⊤ ) k .

3. We decide ( T m − , T m ) according to the following probabilities, P (( T m − , T m ) = (0 ,  b, if I (0 , m < I (1 , m ;1 − b, if I (0 , m > I (1 , m ;0 . , otherwise.Here b ∈ (1 / , is a ﬁxed biasing probability.4. We repeat steps 2 and 3 until m ≥ n − . If m = n − , we arbitrarily assign a treatmentto subject n .The general idea of this procedure can be summarized as follows. In each stage, we con-sider two possible assignments to ( T m − , T m ) and compute which assignment minimizesthe imbalance measurement. In pairwise sequential randomization, the assignments are either ( T m − , T m ) = (0 , , or (1 , . We use the assignment that results in the smallest imbalancemeasurement with the biasing probability b ∈ (1 / , . It is clear that letting b = 1 would re-duce the expected imbalance measurement as far as possible, but we allow randomness inthe procedure for several practical reasons. We further discuss this biasing probability in Re-mark 1. Notably, the proposed procedure does not require any information on subjects joiningthe experiment in the future. To be more speciﬁc, the choice of treatment for subjects m − and m depends only on their connection with previous subjects and the current imbalancemeasurement. The procedure can be applied to the case when n is odd, as long as we assigna random treatment to the last subject. If b is a constant greater than / , the adaptive pro-cedure can signiﬁcantly reduce the imbalance measurement under mild assumptions on thenetwork. DAPTIVE RANDOMIZATION IN NETWORK DATA R EMARK

Suppose we let b = 1 in our proposed procedure, theneach pair of assignments in the procedure reduces the imbalance measurement as far as pos-sible, and treatment allocation is completely determined by the network. However, determin-istic treatment assignment is not desirable from the standpoint of (un)predictability and theprinciple of randomness [18], so an appropriate allocation probability ∈ (1 / , should beselected. The idea of biased coin design is introduced in [10] for balancing the total numberof different treatments. For the purpose of balancing prognostic factors between treatmentgroups, the authors of [12] suggest an allocation probability between 0.70 and 0.95 accord-ing to the sample size. In [32], the authors simulate the effects of allocation probability. Inthis paper, we assume that b can be any constant greater than 0.5 and no more than 1. R EMARK

Suppose the whole network is observed, thegoal of reducing the imbalance measurement I = k A ( − T ) k with unbiased treatmentgroups is equivalent to the following optimization problem: min x ∈{− , } , ⊤ x =0 k Ax k = min x ∈{− , } , ⊤ x =0 x ⊤ Hx, where H = A ⊤ A = A . It is not difﬁcult to observe that H ij counts the number of commonneighbors of node i and j in the adjacency matrix A . The constrained ⊤ x = 0 can beconverted to a penalty function: min x ∈{− , } x ⊤ Hx + λ ( ⊤ x ) = min x ∈{− , } x ⊤ ( H + λ ⊤ ) x. This formulation is summarized as an unconstrained binary programming problem (UBQP)in [15]. The authors of that survey also mention that the UBQP is an NP-hard problem,whose proof is provided in [27], except for some special cases with very strong assumptionson H [29, 3, 26]. H in these special cases is restricted to be an adjacency matrix with certainregularization conditions, so their results cannot apply to our case H = A . In the generalcase, heuristic methods such as the continuous approach [28, 25], tabu search algorithms[19, 34], and semi-deﬁnite relaxation [33] have been proposed for ﬁnding inexact but high-quality solutions. However, it is worth noting that the setting we consider here is very differentfrom a UBQP problem. We have to determine x i when only the upper-left i × i submatrix of A is observed.

3. Theoretical Properties of the Proposed Design.

In this section, we study the asymp-totic property of the imbalance measurement quantity of (2.2) under the following stochasticassumption on the symmetric adjacency matrix A . We assume for some p ∈ (0 , , A − I ∼ G ( n, p ) , where G ( n, p ) represents the Erd ˝os-R´enyi random graph model.(3.1)In other words, on the diagonal of A , we have determinant entries A ii = 1 for i ∈ [ n ] , and A ij = A ji ∼ Bernoulli ( p ) independently for ≤ i < j ≤ n. In the graph sense, the Erd˝os-R´enyi random graph model indicates that an edge between dis-tinct nodes exists with probability p [11]. Under this assumption on A , we aim to analyze theasymptotic behavior of the imbalance measurement I m = k A (2 m ) ( m − T m ) k deﬁnedin (2.2). Let us also deﬁne the state after the m -th iteration of the procedure: S m = A m ( m − T m ) so that I m = k S m k . For convenience of notation, we let I m +1 = I m for m ∈ N , (3.2) so the imbalance measurement I i can be deﬁned for all positive integers i . Suppose A werenot symmetric, i.e., a ij and a ji were i.i.d., then { S i } i ∈ N would be a time-varying Markovchain , where the randomness comes from entrywise Bernoulli distribution and random as-signments in step 3 of the procedure. In the symmetric case, we still approximately have thefollowing Markov property: P ( S i = x | S , . . . , S i − ) ≈ P ( S i = x | S i − ) We will show that the imbalance measurement I n is signiﬁcantly reduced compared withrandom design if we apply our proposed procedure.A random design indicates that we assign ( T m − , T m ) = (0 , or (1 , with probability / . In other words, we implement step 3 of our proposed design with b = 1 / . We denote theresulting assignments by the vector T random , then for ﬁxed p ∈ (0 , , we have the followingtheorem about random assignment.T HEOREM Suppose the n × n network follows the Erd˝os-R´enyi random graph modelin (3.1) with Bernoulli parameter p , then using random assignment, the imbalance measure-ment satisﬁes the following limit lim n →∞ E [ k A (1 − T random ) k ] n = p (1 − p ) . (3.3)The next theorem shows that our proposed design can signiﬁcantly reduce the imbalancemeasurement.T HEOREM Suppose the n × n network follows the Erd˝os-R´enyi random graph modelin (3.1) with Bernoulli parameter p , then using our proposed design, the imbalance measure-ment I n satisﬁes the following upper bound: lim sup n →∞ E [ I n ] n ≤ p (1 − p ) −

18 (2 b − − √ b − / p / (1 − p ) / . (3.4)R EMARK Theorem 2 provides the upper bound of the fourth moments of the imbalancemeasurement. Because E [ I n ] ≥ E [ I n ] , we immediately obtain an upper bound of the secondmoment E [ I n ] . For ﬁxed p ∈ (0 , and b ∈ (1 / , , lim sup n →∞ E [ I n ] n < p (1 − p ) . Hence the proposed procedure provides a strictly smaller imbalance measurement than ran-dom design in expectation. Suppose b = 1 / , i.e., b − , then the proposed method isidentical to random design. As a result, the second term of (3.4) vanishes. Meanwhile, sup-pose the network is very sparse, that is, p is very small, then the reduction of imbalancemeasurement by the proposed design is not very great, because p / is much smaller than p .

4. Discussion on the Gaussian Case.

In previous sections, we assumed the networkfollowed the Erd ˝os-R´enyi random graph model. As discussed in Remark 3, Theorem 2 hasnot shown that the reduction of imbalance measurement is asymptotically smaller than theimbalance measurement itself if p → . To discuss whether the reduction is rate optimal, weinvestigate a weighted adjacency matrix with Gaussian entries. Speciﬁcally, in Wigner matrix A , we have determinant entries A ii = 1 for i ∈ [ n ] , and A ij = A ji ∼ N (0 , σ ) independently for ≤ i < j ≤ n. DAPTIVE RANDOMIZATION IN NETWORK DATA In other words, we consider the Gaussian orthogonal ensemble (GOE) instead of the Erd ˝os-R´enyi random graph model. This assumption corresponds to the following scenario. We ob-serve a weighted network in which the weights can be either positive or negative. Underassumption (2.1), the observation X i is still well deﬁned. In this case, the unknown covariate Z j can affect the i -th observation X i positively or negatively, depending on the weight A ij .Under this assumption, the proposed procedure in Section 2 is still valid. If we adopt thedeﬁnition of imbalance measurement I n , we have the following asymptotic upper bound.T HEOREM Suppose the n × n weighted network follows the GOE with variance σ where σ depends on n . Assuming σ = O (1) and nσ → ∞ , then using our proposed design,the imbalance measurement I n satisﬁes the following upper bound: lim sup n →∞ E [ I n ] n σ ≤ −

14 (2 b − p /π (4 − p /π (2 b − / . In the Erd˝os-R´enyi random graph model, the entrywise variance of the adjacency matrixis p (1 − p ) . This quantity is comparable to σ in the GOE. When σ → , the reduction ofthe imbalance measurement is still signiﬁcantly large. This is a stronger result than that inthe Erd ˝os-R´enyi random graph model. An essential technical reason is the lower bound of E [ | x ⊤ Y | ] for ﬁxed subject vector x and centered random vector Y ∈ R m . If we assume only Y is a sub-Gaussian vector, then for general x , we obtain the best lower bound by Khinchin-Kahane inequality, see Lemma 1. If we further assume Y i ∼ N (0 , σ ) independently, then | x ⊤ Y | is a folded normal random variable and E [ | x ⊤ Y | ] = σ p n/π for all subject vectors x . It is still an open problem whether the term p / can be improved to p . An empiricalcomparison of these two cases can be found in the next section.

5. Experiments.

In this section, we empirically study the behavior of imbalance mea-surement in (2.2). The experiments demonstrate that our proposed algorithm improves theestimation of treatment effects for both simulated data and real network data. number of nodes s t anda r d de v i a t i on randomadaptive Fig 1: Left: the standard deviation of W in Section 2 for different n . Right: the histogram of W when µ = µ and σ ε = 1 Experiments on Simulated Network Data.

The plots in Figure 1 show the result ofthe Erd ˝os-R´enyi random graph model in (3.1). We ﬁx p = 0 . and simulate different sizesof random graphs. We consider random assignment and our proposed adaptive design al-gorithm with b = 0 . . On the left plot, the shaded region is the conﬁdence interval for 100 iterations. All other plots with shaded regions in this section have conﬁdence in-tervals with the same conﬁdence coefﬁcient. The plot of random assignment shows that theimbalance measurement concentrates around 0.8. This coincides with the theoretical limit p p (1 − p ) = 0 . suggested by Theorem 1. Applying the proposed algorithm, the imbalancemeasurement decreases to approximately 0.6. The right plot shows the bias of estimation of µ − µ when n = 100 . This experiment is repeated 20000 times. number of nodes s t anda r d de v i a t i on randomadaptive

250 500 750 1000 number of nodes s t anda r d de v i a t i on randomadaptive Fig 2: Left: the result when the graph is sparse. Right: the simulation on the Gaussian orthogonalensemble.

The left plot in Figure 2 considers the sparse Erd ˝os-R´enyi random graph model. For an n × n network, we consider the density regime log nn . In particular, we generate random networks G (cid:0) n, log n n (cid:1) . The shaded region is the interquartile range over 100 iterations. The imbalancemeasurement of random design monotonically decreases because its maximum expectationis p p (1 − p ) , which converges to 0 as the network becomes more sparse. The right plotin Figure 2 considers the GOE instead. The entrywise variance remains the same. In otherwords, for the network with n nodes, p n = log nn , the corresponding variance of the GOE is σ n = p n (1 − p n ) . The results of this simulation show that the imbalance measures have verysimilar asymptotic behavior for both models. number of nodes s t anda r d de v i a t i on randomadaptive number of nodes s t anda r d de v i a t i on randomadaptive number of nodes s t anda r d de v i a t i on randomadaptive Fig 3: Comparison of the Erd˝os-R´enyi random graph model with different density.

The plots in Figure 3 compare the performance of our proposed method on the Erd˝os-R´enyi random graph model. In the plots, we let p = 0 . , . , and . and plot the im-balance measurement on different sizes of random graphs. The result of this experiment isidentical to that shown in the left plot in Figure 1, but in different densities. DAPTIVE RANDOMIZATION IN NETWORK DATA number of nodes s t anda r d de v i a t i on randomadaptive number of nodes s t anda r d de v i a t i on randomadaptive number of nodes s t anda r d de v i a t i on randomadaptive Fig 4: Comparison of Gaussian orthogonal ensemble with different variance.

In Figure 4, we consider the GOE with different levels of variance. For the left, center, andright plots, we let σ = p (1 − p ) for p = 0 . , . , . respectively, so that the entrywisevariance is the same as that in the Erd ˝os-R´enyi random graph model. As we can see in theplots, the empirical performances are very similar for these two models. number of nodes s t anda r d de v i a t i on randomadaptive number of nodes s t anda r d de v i a t i on randomadaptive number of nodes s t anda r d de v i a t i on randomadaptive Fig 5: Comparison of stochastic block models with different density.

Figure 5 considers the stochastic block model. The subjects are randomly divided intotwo groups. In our setting, if two subjects belong to the same group, then the probability ofconnection between them is p , and the between-group probability is p . The plots consider p = 0 . , . , . and p = 0 . , . , . from left to right respectively. The overalldensity is the same as the previous experiment on the Erd ˝os-R´enyi random graph in Figure 3,and the empirical performances of the two experiments are very similar. However, on theleft plot, we can observe that the conﬁdence intervals are wider than those in Figure 3 andFigure 4.5.2. Experiments on Real Network Data.

We implement our proposed algorithm on 11real undirected network datasets from SNAP [17]. For each network, we randomly samplea subnetwork with 10000 nodes, then we apply both adaptive design and random designto the subnetwork. We compare the imbalance measurement in (2.2) in each dataset. Be-cause theoretical analysis shows that the network density plays an important role in imbal-ance measurement, the densities are recorded in the last column. In our model, we alwaysassume the existence of self loops. When we compute the density of the subnetwork, we con-sider only the connections between different nodes. For example, the density of the network com-youtube.ungraph is approximately 0.00076. In other words, the average degree ofthe graph is approximately 8.6, including self loops.The edges in these networks might have different meanings. We now explain how ourmodel and algorithm are applied on the network com-youtube.ungraph . The nodes ofthis network represent users on YouTube. Two nodes are connected in the network if they are T ABLE Comparison of adaptive random design applied to real network data from SNAP

Dataset Adaptive Random Reduction Density

Email-Enron . × − com-youtube.ungraph . × − HR_edges . × − HU_edges . × − RO_edges . × − CA-GrQc . × − CA-HepPh . × − CA-AstroPh . × − CA-CondMat . × − CA-HepTh . × − friends on YouTube. Under the assumption of a network-correlated outcome, friends sharecommon unknown factors that affect observations. To reduce the effect of a factor, we shouldpropose treatment allocation such that friends sharing the corresponding factor are dividedinto two treatment groups. Suppose we apply our proposed adaptive design on this networkwith b = 0 . (see Remark 1 for the choice of b ), then the imbalance measurement is reducedby .We also implement the proposed adaptive design algorithm in the other real networks. Theresults are presented in Table 1. We observe that the percentage of imbalance measurementreduction depends on the density of the network. For instance, the network of CA-HepTh hasa low density of . × − , hence, our method can reduce the imbalance measurementby only . According to Remark 3, there is no evidence that our proposed method canreduce the imbalance measurement signiﬁcantly if the network is very sparse. number of nodes s t anda r d de v i a t i on randomadaptive Fig 6: These plots repeat the experiments in Figure 1 on the real dataset from YouTube.

In Figure 6, we implement both random and adaptive designs on different sizes of thenetwork com-youtube.ungraph . To repeat the experiments, we keep the random graphsgenerated in the previous experiment. In this real dataset, to obtain the conﬁdence interval,we repeatedly sample subgraphs from the network and apply the proposed algorithm on eachsubgraph. The empirical results again show that our proposed method signiﬁcantly reducesthe imbalance measurement and improves the accuracy of estimation of treatment effects.

DAPTIVE RANDOMIZATION IN NETWORK DATA

6. Conclusion.

In this paper, we consider the problem of estimating treatment effectsunder the assumption that the outcomes are network-correlated. We propose an adaptive ran-domization procedure to reduce the variance of the estimation. The algorithm assigns dif-ferent treatments to each pair of subjects sequentially. The biased coin design enforces theassignments, with the result that a smaller imbalance measurement will be chosen with higherprobability. For theoretical analysis, we assume the network is generated by the Erd˝os-R´enyirandom graph model. As the number of subjects increases, the states of the Markov processhave different dimensions as time progresses. We provide a novel mathematical proof thatour adaptive randomization algorithm signiﬁcantly reduces the imbalance measurement. Ourempirical results also show that this proposed algorithm reduces the variance of the unbiasedestimator in both simulated and real data.The new procedure can still be generalized in several ways. To guarantee a balanced treat-ment allocation, we consider pairwise sequential design, which determines treatments to twosubjects simultaneously. Conventional adaptive design [13] is still applicable in network data.The empirical results can be expected to be similar to the proposed method, but the theoreti-cal analysis will be different, and this is an interesting topic for future work. In the methodsdescribed so far, uniform weights are assigned to subjects. If different weights were allowedfor different subjects, it might be possible to further reduce the variance of estimation. If asubject has a high degree in the network, its outcome is affected by many other subjects. As aresult, the outcome of this node has high variance. If we can reduce the weight of such nodes,the performance of the algorithm will be further improved. If we assume there is interferencebetween subjects, i.e., the outcome of a certain subject might be affected by the treatmentof its neighbors, then the analysis in this paper is no longer strictly applicable. However, thetools introduced in this paper could still powerfully reduce the variance under such an inter-ference assumption. As long as we can deﬁne the variance after each step sequentially, thenwe assign the desired assignment to the current subject with a probability greater than 0.5.We believe this procedure at least performs better than random assignment. We leave theseas future research topics.Last but not least, it is possible that the theoretical analysis in Theorem 2 can be furtherimproved. As mentioned in Section 3, the result in Theorem 3 allows σ → . If p → (sparse)in Theorem 2, the current analysis does not show that the proposed design still achievessigniﬁcant improvement. This could be a very interesting problem for further research. Ingeneral, it has proven difﬁcult for researchers to obtain theoretical results on the designs ofnetwork data, due to the complexity of the problem and the lack of technical tools. In thispaper, we introduce the technique of Lyapunov functions. This technique could provide afeasible way of studying the properties of general designs in network data.

7. Appendix: Proofs.

Proof of Theorem 1.

In this proof, we brieﬂy denote T := T random . We have k A (1 − T ) k = n X i =1 ( A i ∗ (1 − T )) and observe that the distributions of A i ∗ (1 − T random ) are identical for all i ∈ [ n ] . Withoutloss of generality, we consider i = 1 . By the deﬁnition of random design, E [1 − T ] = 0 .Furthermore, A and T are independent in the random design, so E [ A ∗ (1 − T )] = 0 . Hencewe have E [( A ∗ (1 − T )) ] = var [ A ∗ (1 − T )] . We recall that ( T m − , T m ) = (0 , or (1 , with equiprobability. By independence, wehave var [ A ∗ (1 − T )] = n/ X m =1 var [ A , m − (1 − T m − ) + A , m (1 − T m )] For m ∈ [[2 , n/ , the distributions of A , m − (1 − T m − ) + A , m (1 − T m ) are iden-tical. When m = 1 , we are in the special case that A = 1 . Hence, it sufﬁces to consider thecases when m = 1 and m = 2 . When m = 1 , we havevar [ A (1 − T ) + A (1 − T )] = E [(1 − A ) ] = 1 − p. When m = 2 , we havevar [ A (1 − T ) + A (1 − T )] = E [( A − A ) ] = 2 p (1 − p ) . Hence E [( A ∗ (1 − T )) ] = np (1 − p ) + (1 − p )(1 − p ) , and E [ k A (1 − T ) k ] = n p (1 − p ) + n (1 − p )(1 − p ) . Taking the limit, we have lim n →∞ E [ k A (1 − T random ) k ] n = p (1 − p ) as desired.7.2. Proof of Theorem 2. P ROOF . For i ∈ [2 m ] and j ∈ m , let us deﬁne Y ij = A i, j − A i, j − If i = 2 j − and i = 2 j , i.e., neither A i, j nor A i, j − is on the diagonal, we have Y ij =  − , with probability p (1 − p );0 , with probability p + (1 − p ) ;1 , with probability p (1 − p ) (7.1)We recall that A (2 m ) is the m × m submatrix of A , ˜ T m = m − T m , and deﬁne Y m = Y m,m +1 ∈ R m . In this section, we use the notations ˜ S m := S m = A (2 m ) ˜ T m , ˜ I m := I m = k A (2 m ) ˜ T m k . DAPTIVE RANDOMIZATION IN NETWORK DATA Now, as we deﬁne I m +1 = I m = ˜ I m in (3.2), it sufﬁces to show lim sup n →∞ E [ ˜ I n ]16 n ≤ p (1 − p ) −

18 (2 b − − √ b − / p / (1 − p ) / , (7.2)which is equivalent to (3.4). As the entries of Y m follow the distribution of (7.1) indepen-dently, we have E [ k Y m k ] = 2 m var [ Y ,m +1 ] = 2 m E [ Y ,m +1 ] = 4 mp (1 − p ) . We have E [ k Y m k ] = 4 mp (1 − p ) . We also deﬁne Z m +1 = m X i =1 A m +1 ,i ˜ T i and Z m +2 = m X i =1 A m +2 ,i ˜ T i . By deﬁnition, we have Z m +2 − Z m +1 = ˜ T ⊤ m Y m . As ( T i − , T i ) = (1 , or (0 , , we canwrite Z m +1 = m X j =1 ( A m +1 , j − A m +1 , j − ) ˜ T i = m X j =1 Y m +1 ,j ˜ T i . By symmetry of Y ij ˜ T i , we have that Z m +1 shares the same distribution as P mj =1 Y m +1 ,j .It is also clear that Z m +2 shares that same distribution. Hence we have E [ Z m +1 ] = E [ Z m +2 ] = 0 and E [ Z m +1 ] = E [ Z m +2 ] = E h(cid:16) m X j =1 Y m +1 ,j (cid:17) i = m X j =1 var [ Y m +1 ,j ] = m E [ Y m +1 , ] = 2 mp (1 − p ) . In the m + 1 step of our proposed procedure, we observe two new columns and new rowsof the adjacency matrix, which will change the imbalance measurement. The square of theimbalance measurement in the m + 1 step will be either U m = k ˜ S m + Y m k + ( Z m +1 − A m +1 , m +2 ) + ( Z m +2 + 1 − A m +1 , m +2 ) or V m = k ˜ S m − Y m k + ( Z m +1 + 1 − A m +1 , m +2 ) + ( Z m +2 − A m +1 , m +2 ) . Step 3 of the procedure indicates that our new design will pick the smaller one of the abovetwo with probability b > / , and choose the larger one otherwise. By the symmetry of thedistributions of Y m , one can observe that this these two terms have the same expectation, andby direct calculation, we obtain E [( Z m +1 − A m +1 , m +2 ) ] = E [( Z m +2 + 1 − A m +1 , m +2 ) ] = 2 mp (1 − p ) + 1 − p.

1. Upper bound of E [ ˜ I n ] . we We have the conditional expectation of ˜ I m +1 , E [ ˜ I m +1 | ˜ S m ] ≤ E [ k ˜ S m + Y m k + ( Z m +1 − A m +1 , m +2 ) + ( Z m +2 + 1 − A m +1 , m +2 ) | ˜ S m ]= E [ k ˜ S m k + 2 ˜ S ⊤ m Y m + k Y m k + ( Z m +2 + 1 − A m +1 , m +2 ) + ( Z m +2 + 1 − A m +1 , m +2 ) | ˜ S m ]= k ˜ S m k + 4 mp (1 − p ) + 2 mp (1 − p ) + 1 − p + 2 mp (1 − p ) + 1 − p = k ˜ S m k + 8 mp (1 − p ) + 2(1 − p ) . Therefore, E [ ˜ I m +1 − ˜ I m | ˜ I m ] = 8 mp (1 − p ) + 2(1 − p ) . Hence E [ ˜ I m +1 − ˜ I m ] = E [ E [ ˜ I m +1 − ˜ I m | ˜ I m ]] ≤ mp (1 − p ) + 2(1 − p ) . In the ﬁrst stage, the imbalance measurement E [ ˜ I ] = (1 − A ) + ( A − = 2(1 − p ) .Thus E [ ˜ I n ] ≤ n − X m =0 mp (1 − p ) + 2(1 − p ) = 4 n ( n − p (1 − p ) + 2 n (1 − p ) ≤ (cid:16) n p p (1 − p ) + r − p p (cid:17) . By Jensen’s inequality, E [ ˜ I n ] ≤ q E [ ˜ I n ] ≤ n p p (1 − p ) + r − p p .

2. Lower bound of E [ ˜ I n ] . With the upper bound of the ﬁrst moment, we can derive the lowerbound of the second moment. E [ ˜ I m +1 | ˜ S m ] = E [ B min( U m , V m ) + (1 − B ) max( U m , V m ) | ˜ S m ]= E [ U m | ˜ S m ] − (2 b − E [ | U m − V m || ˜ S m ]= k ˜ S m k + 8 mp (1 − p ) + 2(1 − p ) − (2 b − E [ | S ⊤ m Y m + 2(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) || ˜ S m ] Now we aim to ﬁnd an upper bound of E [ | S ⊤ m Y m + 2(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) || ˜ S m ] . By Jensen’s inequality, ( E [ | S ⊤ m Y m + 2(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) || ˜ S m ]) ≤ E [( ˜ S ⊤ m Y m + (1 − A m +1 , m +2 )( Z m +1 − Z m +2 )) | ˜ S m ]= 4 E [( ˜ S ⊤ m Y m ) | ˜ S m ] + 8 E [(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) ˜ S ⊤ m Y m | ˜ S m ]+ 4 E [(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) | ˜ S m ] Now we will ﬁnd the condition expectation of these three terms. As the entries of Y m arei.i.d. with distribution (7.1) we have E [( ˜ S ⊤ m Y m ) | ˜ S m ] = k ˜ S m k E [ Y ] = 2 k ˜ S m k p (1 − p ) . For the second term, by the deﬁnition of Y m , Z m +1 and Z m +2 , we have Z m +2 − Z m +1 = Y ⊤ m ˜ T m . Hence E [(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) ˜ S ⊤ m Y m | ˜ S m ] = (1 − p ) E [ − Y ⊤ m ˜ T m ˜ S ⊤ m Y m | ˜ S m ] . As the distributions of the (2 i − -th and i -th rows are identical, we have P ( ˜ T i = − | ˜ S m ) = P ( ˜ T i = 1 | ˜ S m ) = 0 . . Hence for all i and m , T i and ˜ S m are independent. T i and ˜ S m dependonly on the submatrix A (2 m ) , so they are independent of Y m . Thus, E [ Y ⊤ m ˜ T m ˜ S ⊤ m Y m | ˜ S m ] = 0 , which implies the second term vanishes. Using Z m +2 − Z m +1 = ˜ T ⊤ m Y m again, we have E [(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) | ˜ S m ] = (1 − p ) E [( ˜ T ⊤ m Y m ) ]= (1 − p ) k ˜ T m k E [ Y ]= 4 mp (1 − p ) . DAPTIVE RANDOMIZATION IN NETWORK DATA Therefore, using the fact that √ x + y ≤ √ x + √ y for x, y ≥ , we have E [ | S ⊤ m Y m + 2(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) || ˜ S m ] ≤ q k ˜ S m k p (1 − p ) + 2 mp (1 − p ) ≤ k ˜ S m k p p (1 − p ) + 4 √ mp (1 − p ) . As k ˜ S m k = ˜ I m , we have E [ ˜ I m +1 − ˜ I m ] = E [ E [ ˜ I m +1 − ˜ I m | ˜ S m ]] ≥ mp (1 − p ) + 2(1 − p ) − b − E [ ˜ I m ] p p (1 − p ) + 2 p mp (1 − p ))= 8 mp (1 − p ) + 2(1 − p ) − √ b − mp (1 − p ) − √ b − p (1 − p ) − √ mp (2 b − − p )= (8 − √ b − mp (1 − p ) − (4 √ mp (2 b −

1) + 2 − √ b − p )(1 − p ) . Recalling that E [ ˜ I ] = 2(1 − p ) , and using P n − m =1 √ m ≥ ( n − / , we have E [ ˜ I n ] = 2(1 − p ) + n − X m =1 (8 − √ b − mp (1 − p ) − (4 √ mp (2 b −

1) + 2 − √ b − p )(1 − p ) ≥ (4 − √ b − n ( n − p (1 − p ) −

83 (2 b − √ p (1 − p )( n − / − (2 − √ b − p )( n − − p ) .

3. Lower bound of E [ ˜ I n ] . Combining with Jensen’s inequality, we have E [ ˜ I n ] ≥ E [ ˜ I n ] / ≥ (4 − √ b − n p (1 − p ) + O ( n / ) √ p (1 − p )) / . Since ( x + y ) / ≥ x / + y / for x, y ≥ , we have E [ ˜ I n ] ≥ (4 − √ b − / n p / (1 − p ) / + O ( n / p / (1 − p ) / ) .

4. Upper bound of E [ ˜ I n ] . Now we are ready to establish the upper bound of the fourthmoment of ˜ I n . We have E [ ˜ I m +1 ] = E [ B max( U m , V m ) + (1 − B ) min( U m , V m )]= E [ U m ] − (2 b − E [ | U m − V m | ] . (7.3)where the ﬁrst term E [ U m ] = E [( k ˜ S m + Y m k + ( Z m +1 − A m +1 , m +2 ) + ( Z m +2 + 1 − A m +1 , m +2 ) ) ] . As E [ ˜ S m ] = 0 and ˜ S m is independent of Y m , Z m +1 and Z m +2 , all of the cross termscontaining S ⊤ m Y m have expectation 0. Using E [ ˜ S ⊤ m Y m ] = 0 , it is easy to check E [( ˜ S ⊤ m Y m ) ] = var ( ˜ S ⊤ m Y m ) = 2 p (1 − p ) E [ k ˜ S m k ] . By independence again, we have E [2 k ˜ S m k k Y m k ] = 2 E [ k ˜ S m k ] E [ k Y m k ] = 8 mp (1 − p ) E [ k ˜ S m k ] , and recalling that E [( Z m +1 − A m +1 , m +2 ) ] = E [( Z m +2 + 1 − A m +1 , m +2 ) ] =2 mp (1 − p ) + 1 − p , we have E [2 k ˜ S m k ( Z m +1 − A m +1 , m +2 ) ] = E [ k ˜ S m k ( Z m +2 + A m +2 , m +1 − ]= 2(2 mp + 1)(1 − p ) E [ k ˜ S m k ] . The other terms do not contain ˜ S m . We ﬁrst compute the fourth moments: E [ k Y m k ] = E h(cid:16) m X i =1 Y i,m +1 (cid:17) i = m X i =1 E [ Y i,m +1 ] + X ≤ i

The outline of the proof is very similar to that of Theorem 2. Weadopt the deﬁnitions of ˜ I m , ˜ S m , Y ij , Y m , Z m +1 and Z m +2 , U m and V m . As we replace theErd˝os-R´enyi random graph model by the GOE, we have A ij ∼ (0 , σ ) for ≤ i ≤ j ≤ n and A ij = A ji if ≤ i < j ≤ n . Then Y ij ∼ N (0 , σ ) , Z m +1 , Z m +2 ∼ N (0 , mσ ) . Hence k Y ij k ∼ σ χ m and Z m +1 , Z m +2 ∼ mσ χ . We need to replace the moments of these variables in the proof of Theorem 2.

1. Upper bound of E [ ˜ I n ] . The conditional expectation of ˜ I m +1 is bounded by E [ ˜ I m +1 | ˜ S m ] ≤ E [ k ˜ S m + Y m k + ( Z m +1 − A m +1 , m +2 ) + ( Z m +2 + 1 − A m +1 , m +2 ) | ˜ S m ]= E [ k ˜ S m k + 2 ˜ S ⊤ m Y m + k Y m k + ( Z m +2 + 1 − A m +1 , m +2 ) + ( Z m +2 + 1 − A m +1 , m +2 ) | ˜ S m ]= k ˜ S m k + 4 mσ + (2 m + 1) σ + (2 m + 1) σ + 2= k ˜ S m k + (8 m + 2) σ + 2 . Therefore, E [ ˜ I m +1 − ˜ I m | ˜ I m ] ≤ (8 m + 2) σ + 2 . Hence E [ ˜ I m +1 − ˜ I m ] = E [ E [ ˜ I m +1 − ˜ I m | ˜ I m ]] ≤ (8 m + 2) σ + 2 . In the ﬁrst stage, E [ ˜ I ] ≤ σ . Thus, E [ ˜ I n ] ≤ σ + n − X m =1 (8 m + 2) σ + 2 ≤ n σ + 2 n ≤ (cid:16) nσ + 12 σ (cid:17) . By Jensen’s inequality, E [ ˜ I n ] ≤ q E [ ˜ I n ] ≤ nσ + 12 σ . Lower bound of E [ ˜ I n ] . As we did in the proof of Theorem 2, to ﬁnd the lower bound of E [ ˜ I m +1 − ˜ I m ] , we need to ﬁnd the upper bound for ( E [ | S ⊤ m Y m + 2(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) || ˜ S m ]) ≤ E [4( ˜ S ⊤ m Y m ) + 8(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) ˜ S ⊤ m Y m + 4(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) | ˜ S m ] . The second term has expectation 0 for the same reason as in the proof of Theorem 2. Since Y m ∼ N (0 , σ ˜I m ) , we have E [( ˜ S ⊤ m Y m ) | ˜ S m ] = 2 σ k ˜ S m k . We recall that Z m +1 , Z m +2 ∼N (0 , mσ ) , so E [(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) | ˜ S m ] = 4 mσ . Applying Z m +1 − Z m +2 = ⊤ m Y m , we have E [ | S ⊤ m Y m + 2(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) || ˜ S m ]= 2 E [( ˜ S m + (1 − A m +1 , m +2 ) m ) ⊤ Y m | ˜ S m ]= 2 p /π E [ k ˜ S m + (1 − A m +1 , m +2 ) m k| ˜ S m ] ≥ p /π ( k ˜ S m k + p m (1 + σ )) σ. where in the last equality, we use if X ∼ N (0 , σ ) , then E [ | X | ] = σ p /π . Using the samedeﬁnition of U m and V m from the proof of Theorem 2, we have E [ ˜ I m +1 | ˜ S m ] = E [ B min( U m , V m ) + (1 − B ) max( U m , V m ) | ˜ S m ]= E [ U m | ˜ S m ] − (2 b − E [ | U m − V m || ˜ S m ]= k ˜ S m k + 8 mσ + 2 − (2 b − E [ | S ⊤ m Y m + 2(1 − A m +1 , m +2 )( Z m +1 − Z m +2 ) || ˜ S m ] ≥ k ˜ S m k + 8 mσ + 2 − b − p /π ( k ˜ S m k + p m (1 + σ )) σ ≥ k ˜ S m k + 8 mσ + 2 − b − p /π (2 mσ + p m (1 + σ ) σ + 0 . k ˜ S m k + (8 − b − p /π ) mσ + O ( √ mσ ) . Using P n − m =1 √ m ≥ ( n − / , we have E [ ˜ I n ] = E [ ˜ I ] + n − X i =1 E [ E [ ˜ I m +1 − ˜ I m | ˜ S m ]] ≥ σ + n X m =1 (8 − b − p /π ) mσ + O ( √ mσ ) ≥ (4 − p /π (2 b − n σ + O ( n / σ ) . DAPTIVE RANDOMIZATION IN NETWORK DATA Lower bound of E [ ˜ I n ] . By Jensen’s inequality and the fact that ( x + y ) / ≥ x / + y / ,we have E [ ˜ I n ] ≥ E [ ˜ I n ] / ≥ (4 − p /π (2 b − / n σ + O ( n / σ / ) . Upper bound of E [ ˜ I n ] . We have E [ ˜ I m +1 ] = E [ B max( U m , V m ) + (1 − B ) min( U m , V m )]= E [ U m ] − (2 b − E [ | U m − V m | ] . from (7.3). Using the same arguments in the proof of Theorem 2, we have E [ U m ] = E [ k ˜ S m k ] + 16 mσ E [ k ˜ S m k ] + O ( m σ ) , and E [ | U m − V m | ] = 4(2 b − E [ k ˜ S m k | ˜ S ⊤ m Y m | ] + O ( m σ ) . Using the expectation of a folded normal random variable again, we have E [ k ˜ S m k | ˜ S ⊤ m Y m | ] = E [ k ˜ S m k | ˜ S ⊤ m Y m | ]= E [ E [ k ˜ S m k | ˜ S ⊤ m Y m | ] | ˜ S m ]= p /πσ E [ ˜ I m ] . We apply the upper bound of E [ ˜ I m ] and lower bound of E [ ˜ I m ] , and have E [ ˜ I m +1 ] ≤ E [ ˜ I m ] + 64 m σ − b − p /π (4 − p /π (2 b − / m σ + O ( m / σ / + m σ ) . Therefore, we have E [ ˜ I n ] = E [ ˜ I ] + n − X m =1 E [ ˜ I m +1 − ˜ I m ] ≤ (16 − b − p /π (4 − p /π (2 b − / ) m σ + O ( m / σ / + m σ ) Assuming nσ → ∞ , we have O (cid:16) m / σ / + m σ m σ (cid:17) = O (( nσ ) − / + ( nσ ) − ) → . Hence lim sup n →∞ E [ ˜ I n ] m σ ≤ (16 − b − p /π (4 − p /π (2 b − / ) , as desired.7.4. Auxiliary Lemmas. L EMMA

For i ∈ [ n ] , let Y i = − , , with probability p i , − p i , p i identically and independently distributed for p i ∈ (0 , / . Then we have min i ∈ [ n ] p i ≤ inf x ∈ S n − E [ | x ⊤ Y | ] ≤ sup x ∈ S n − E [ | x ⊤ Y | ] ≤ max i ∈ [ n ] p p i , where S n − = { x ∈ R n : k x k = 1 } . P ROOF . Upper bound.

For k x k = 1 , we have E [ | x ⊤ Y | ] ≤ E [ | x ⊤ Y | ] = n X i =1 x i E [ | Y i | ] = k x k max i ∈ [ n ] E [ Y i ] = max i ∈ [ n ] E [ Y i ] . Hence E [ | x ⊤ Y | ] ≤ max i ∈ [ n ] p E [ Y i ] = max i ∈ [ n ] √ p i . Lower Bound.

Let us deﬁne S n − = { x ∈ S n − : ∀ i ∈ [ n ] , x i ≥ } . By symmetry of Y i ,if sufﬁces to consider the inﬁmum for x over S n − to avoid loss of generality. We claimthat if j = arg min i ∈ [ n ] p i , then the minimum is achieved at x = e j . We will prove this byinduction. The claim is clearly correct when n = 1 . Now let us consider the case n + 1 , giventhe statement is true for n . In other words, it sufﬁces to show that inf x ∈ S n E [ | x ⊤ Y | ] = inf x ∈ S n − inf θ ∈ [0 ,π/ E [ | x ⊤ Y cos θ + Y n +1 sin θ | ] ≥ min i ∈ [ n +1] p i . given inf x ∈ S n − E [ | x ⊤ Y | ] ≥ min i ∈ [ n ] p i . We note that for x ∈ S n − , k ( x cos θ, sin θ ) k = k x k cos θ + sin θ = 1 . We have E [ | x ⊤ Y cos θ + Y n +1 sin θ | ] = X y = − E [ | x ⊤ Y cos θ + Y n +1 sin θ || Y n +1 = y ] P ( Y n +1 = y )=(1 − p n +1 ) cos θ E [ | x ⊤ Y | ] + p n +1 E [ | x ⊤ Y cos θ + sin θ | ]+ p n +1 E [ | x ⊤ Y cos θ − sin θ | ] . By symmetry of x ⊤ Y , we have E [ | x ⊤ Y cos θ − sin θ | ] = E [ | − x ⊤ Y cos θ − sin θ | ] = E [ | x ⊤ Y cos θ + sin θ | ] . By Lemma 2, we have E [ | x ⊤ Y cos θ + sin θ | ] = E [max {| x ⊤ Y | cos θ, sin θ } ] ≥ max { E [ | x ⊤ Y | ] cos θ, sin θ } . Therefore, E [ | x ⊤ Y cos θ + Y n +1 sin θ | ] = (1 − p n +1 ) E [ | x ⊤ Y | ] cos θ + 2 p n +1 E [ | x ⊤ Y cos θ + sin θ | ] ≥ (1 − p n +1 ) E [ | x ⊤ Y | ] cos θ + 2 p n +1 max { E [ | x ⊤ Y ] cos θ, sin θ } = max { E [ | x ⊤ Y | ] cos θ, (1 − p n +1 ) E [ | x ⊤ Y | ] cos θ + 2 p n +1 sin θ } . The last line is a concave function corresponding to the variable θ ∈ [0 , π/ . For every x ∈ S n − , it achieves the minimum when either θ = 0 or θ = 1 . Thus for every x ∈ S n and θ ∈ [0 , π/ , E [ | x ⊤ Y cos θ + Y n +1 sin θ | ] ≥ min { max { E [ | x ⊤ Y | ] , (1 − p n +1 ) E [ | x ⊤ Y | ] } , max { , p n +1 }} = min { E [ | x ⊤ Y | ] , p n +1 } . By the inductive assumption, E [ | x ⊤ Y | ] ≥ min i ∈ [ n ] p i , so inf x ∈ S n E [ | x ⊤ Y | ] = inf x ∈ S n − inf θ ∈ [0 ,π/ E [ | x ⊤ Y cos θ + Y n +1 sin θ | ] ≥ min i ∈ [ n +1] p i , which ﬁnishes the proof. DAPTIVE RANDOMIZATION IN NETWORK DATA L EMMA Suppose Y is a symmetric random variable and x ≥ is ﬁxed, then E [ | Y + x | ] ≥ max { E [ | Y | ] , x } . P ROOF . We assume Y is discrete. Other cases simply follow from the arguments below. E [ | Y + x | ] = X y | y + x | P ( Y = y ) = x P ( Y = 0) + X y> ( y + x + | − y + x | ) P ( Y = y ) ≥ x P ( Y = 0) + X y> ( y + x + y − x ) P ( Y = y ) ≥ X y> y P ( Y = y ) = E [ | Y | ] . Additionally, E [ | Y + x | ] = x P ( Y = 0) + X y> ( y + x + | − y + x | ) P ( Y = y ) ≥ x P ( Y = 0) + X y> ( y + x − x + y ) P ( Y = y ) = x. The proof is complete. REFERENCES [1] A

RAL , S. (2016).

Networked experiments . Oxford, UK: Oxford University Press.[2] A

RAL , S. and W

ALKER , D. (2011). Creating social contagion through viral product design: A randomizedtrial of peer inﬂuence in networks.

Management science ARAHONA , F. (1986). A solvable case of quadratic 0–1 programming.

Discrete Applied Mathematics ASSE , G. W. and A

IROLDI , E. M. (2018). Model-assisted design of experiments in the presence ofnetwork-correlated outcomes.

Biometrika

ICKEL , P. J. and C

HEN , A. (2009). A nonparametric view of network models and Newman–Girvan andother modularities.

Proceedings of the National Academy of Sciences

ORGATTI , S. P., M

EHRA , A., B

RASS , D. J. and L

ABIANCA , G. (2009). Network analysis in the socialsciences. science

ARRINGTON , P. J., S

COTT , J. and W

ASSERMAN , S. (2005).

Models and methods in social network anal-ysis . Cambridge university press.[8] C OX , D. R. and C OX , D. R. (1958). Planning of experiments . Wiley New York.[9] E CKLES , D., K

ARRER , B. and U

GANDER , J. (2017). Design and analysis of experiments in networks:Reducing bias from interference.

Journal of Causal Inference .[10] E FRON , B. (1971). Forcing a sequential experiment to be balanced.

Biometrika RD ˝ OS , P. and R ´ ENYI , A. (1960). On the evolution of random graphs.

Publ. Math. Inst. Hung. Acad. Sci AGINO , A., H

AMADA , C., Y

OSHIMURA , I., O

HASHI , Y., S

AKAMOTO , J. and N

AKAZATO , H. (2004).Statistical comparison of random allocation methods in cancer clinical trials.

Controlled Clinical Trials U , Y. and H U , F. (2012). Asymptotic properties of covariate-adaptive randomization. The Annals of Statis-tics U , Y. and H U , F. (2012). Balancing treatment allocation over continuous covariates: a new imbalancemeasure for minimization. Journal of Probability and Statistics .[15] K

OCHENBERGER , G., H AO , J.-K., G LOVER , F., L

EWIS , M., L ¨ U , Z., W ANG , H. and W

ANG , Y. (2014).The unconstrained binary quadratic programming problem: a survey.

Journal of Combinatorial Optimiza-tion AMPERTI , J. (1960). Criteria for the recurrence or transience of stochastic process. I.

Journal of Mathe-matical Analysis and applications ESKOVEC , J. and K

REVL , A. (2014). SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data .[18] L

EWIS , J. A. (1999). Statistical principles for clinical trials (ICH E9): an introductory note on an interna-tional guideline.

Statistics in medicine [19] L ¨ U , Z., H AO , J.-K. and G LOVER , F. (2011). Neighborhood analysis: a case study on curriculum-basedcourse timetabling.

Journal of Heuristics A , W., Q IN , Y., L I , Y. and H U , F. (2019). Statistical Inference for Covariate-Adaptive RandomizationProcedures. Journal of the American Statistical Association

ANSKI , C. F. (2000). Economic analysis of social interactions.

Journal of economic perspectives ANSKI , C. F. (2013). Identiﬁcation of treatment response with social interactions.

The EconometricsJournal S1–S23.[23] M

ENSHIKOV , M., P

OPOV , S. and W

ADE , A. (2016).

Non-homogeneous Random Walks: Lyapunov FunctionMethods for Near-Critical Stochastic Systems . Cambridge University Press.[24] M

ORGAN , K. L., R

UBIN , D. B. et al. (2012). Rerandomization to improve covariate balance in experi-ments.

The Annals of Statistics AN , S., T AN , T. and J IANG , Y. (2008). A global continuation algorithm for solving binary quadraticprogramming problems.

Computational Optimization and Applications ARDALOS , P. M. and J HA , S. (1991). Graph separation techniques for quadratic zero-one programming. Computers & Mathematics with Applications ARDALOS , P. M. and J HA , S. (1992). Complexity of uniqueness and local search in quadratic 0–1 pro-gramming. Operations research letters ARDALOS , P. M., P

ROKOPYEV , O. A. and B

USYGIN , S. (2006). Continuous approaches for solvingdiscrete optimization problems. In

Handbook on modelling for discrete optimization

ICARD , J.-C. (1976). Maximal closure of a graph and applications to combinatorial problems.

Manage-ment science OCOCK , S. J. and S

IMON , R. (1975). Sequential treatment assignment with balancing for prognosticfactors in the controlled clinical trial.

Biometrics IN , Y., L I , Y., M A , W. and H U , F. (2016). Pairwise sequential randomization and its properties. arXivpreprint arXiv:1611.02802 .[32] T OORAWA , R., A

DENA , M., D

ONOVAN , M., J

ONES , S. and C

ONLON , J. (2009). Use of simulation tocompare the performance of minimization with stratiﬁed blocked randomization.

Pharmaceutical Statistics:The Journal of Applied Statistics in the Pharmaceutical Industry ANG , P., S

HEN , C. and V AN D EN H ENGEL , A. (2013). A fast semideﬁnite approach to solving binaryquadratic problems. In

Proceedings of the IEEE conference on computer vision and pattern recognition

ANG , Y., L ¨ U , Z., G LOVER , F. and H AO , J.-K. (2013). Probabilistic GRASP-tabu search algorithms forthe UBQP problem. Computers & Operations Research ASSERMAN , S., F

AUST , K. et al. (1994).

Social network analysis: Methods and applications . Cam-bridge university press.[36] W EI , L. (1978). An application of an urn model to the design of sequential controlled clinical trials. Journalof the American Statistical Association73