Design and Analysis of Switchback Experiments
Iavor Bojinov
Technology and Operations Management Unit, Harvard Business School, Boston, MA 02163 [email protected]
David Simchi-Levi
Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering, and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected]
Jinglong Zhao
Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected]
In switchback experiments, a firm sequentially exposes an experimental unit to a random treatment, measures its response, and repeats the procedure for several periods to determine which treatment leads to the best outcome. Although practitioners have widely adopted this experimental design technique, the development of its theoretical properties and the derivation of optimal design procedures have been, to the best of our knowledge, elusive. In this paper, we address these limitations by establishing the necessary results to ensure that practitioners can apply this powerful class of experiments with minimal assumptions. Our main result is the derivation of the optimal design of switchback experiments under a range of different assumptions on the order of carryover effect, that is, the length of time a treatment persists in impacting the outcome. We cast the experimental design problem as a minimax discrete robust optimization problem, identify the worst-case adversarial strategy, establish structural results for the optimal design, and finally solve the problem via a continuous relaxation. For the optimal design, we derive two approaches for performing inference after running the experiment. The first provides exact randomization-based p-values, and the second uses a finite population central limit theorem to conduct conservative hypothesis tests and build confidence intervals. We further provide theoretical results for our inferential procedures when the order of carryover effect is misspecified. For firms that possess the capability to run multiple switchback experiments, we also provide a data-driven strategy to identify the likely order of carryover effect. To study the empirical properties of our results, we conduct extensive simulations. We conclude the paper by providing some practical suggestions.
1. Introduction
Academic scholars have appreciated the benefits that experimentation brings to firms for many decades (March 1991, Sitkin 1992, Sarasvathy 2001, Thomke 2001, Kohavi and Thomke 2017). However, widespread adoption of the practice has only taken off in the last decade, partly fueled by the rapid cost reductions achieved by firms in the technology sector (Kohavi et al. 2007, 2009, Azevedo et al. 2019, Kohavi et al. 2020). Most large firms now possess internal tools for experimentation, and a growing number of smaller and more conventional companies are purchasing the capabilities from third-party sellers that offer full-stack integration (Thomke 2020). These tools typically allow simple "A/B" tests that compare the standard offering "A" to a new or improved version "B". The comparisons are made across a range of different business outcomes, and the tests are usually conducted for at least a week. This simple practice has provided tremendous value to firms (Koning et al. 2019). Some firms and authors, however, have recognized the limitations of these simple A/B tests (Bojinov et al. 2020b). Principal among these are adequately handling interference (the scenario where the assignment of one subject impacts another) and estimating heterogeneous (or personalized) effects.

In this paper, we simultaneously tackle both of these challenges by developing a theoretical framework for the optimal design and analysis of switchback (or time series) experiments. In switchback experiments, we sequentially expose a unit to a random treatment, measure its response, and repeat the procedure for a fixed period of time (Robins 1986, Bojinov and Shephard 2019).
By administering alternate treatments to the same unit, we can directly estimate an individual level causal effect and alleviate the challenges posed by interference.

Footnote on interference: Many online platforms and retail marketplaces have observed different levels of interference when the assignment of one subject impacts another. See Kastelman and Ramesh (2018), Farronato et al. (2018), Glynn et al. (2020) for online platforms (e.g., DoorDash, Lyft, Uber), and Caro and Gallien (2012), Ferreira et al. (2016), Ma et al. (2020) for retail markets (e.g., AB InBev, Rue la la, Zara).

Footnote on heterogeneous effects: See Nie et al. (2018), Deshpande et al. (2018), Hadad et al. (2019) for estimating heterogeneous effects.

There are two classes of applications where switchback experiments are widely used in practice. The first arises when units interfere with each other either through a network or some more complicated unknown structure. For example, consider a ride-hailing platform that wants to test a new fare pricing algorithm's effectiveness in a large city (Farronato et al. 2018). Administering the test version to a subset of drivers can impact their behavior, which, in turn, could change the behavior of drivers that are receiving the old version. Directly comparing the revenue generated by the drivers across the two groups will likely provide a biased estimate of what would happen if everyone were assigned to the new version compared to the old. Instead, practitioners consider the city a single aggregated unit and use a switchback experiment to estimate the intervention's effectiveness, thereby alleviating the problem caused by interference. A similar issue often arises in marketing when, for example, a retailer wants to test the effectiveness of a new promotion planning algorithm (Ferreira et al. 2016). Administering the new version to a subset of stock keeping units (SKUs) cannibalizes the sales from the other SKUs. Again, comparing the generated revenue across the two groups is unlikely to provide an accurate measure of the promotion's effectiveness. Instead, practitioners can treat all the SKUs as a single aggregated unit and use a switchback experiment to obtain accurate estimates of the promotion's effectiveness.

The second application arises when we have a limited number of experimental units, and we believe the effects are likely to be heterogeneous. For example, Bojinov and Shephard (2019) used switchback experiments to make causal claims about the relative effectiveness of algorithms compared with humans at executing large financial trades across a range of financial markets. More generally, psychologists and biostatisticians regularly use switchback experiments whenever studying the effectiveness of an intervention on a single unit (e.g., Lillie et al. (2011) and Boruvka et al. (2018)).

There are three significant challenges to using switchback experiments. The first is that causal estimators from switchback experiments have large variances, as the precision is a function of the total number of assignments. The second is that past interventions are likely to impact future outcomes; this is often referred to as a carryover effect. Many authors assume that there are no carryover effects, e.g., Chamberlain (1982), Athey and Imbens (2018), and Imai and Kim (2019), although some recent work has relaxed this assumption (Sobel 2012, Bojinov et al. 2020a). The third is that standard super population inference, where we either assume a model for the outcome or that the units are sampled from an infinitely large population, requires unrealistic assumptions that fail to capture the problem's personalized nature (Bojinov and Shephard 2019).

This paper's main contributions are to address these three challenges and present a framework that allows firms and researchers to run reliable switchback experiments. First, we derive optimal designs for switchback experiments, ensuring that we can select a design that leads to the lowest variance among the most popular class of assignment mechanisms.
Second, we assume the presence of a carryover effect and show that our estimation and inference are valid both when the order of carryover effect is correctly specified and when it is misspecified, the latter leading to a minor increase in the variance. For practitioners and managers, we also propose a method to identify the order of carryover effect by running a series of carefully designed switchback experiments. Finally, we take a purely design-based perspective on uncertainty; that is, we treat the outcomes as unknown but fixed (or equivalently, we condition on the set of potential outcomes) and assume that the assignment mechanism is the only source of randomness (Abadie et al. 2020). The main benefit of a design-based perspective is that the inference, and in turn the causal conclusions, do not depend on our ability to correctly specify a model describing the phenomena we are studying, ensuring that our findings are wholly non-parametric and robust to model misspecification (Imbens and Rubin 2015, Chapter 5).

Footnote on outcome models: For example, Wager and Xu (2019), Johari et al. (2020) assume that the market they are experimenting on is in an equilibrium state, Glynn et al. (2020) assumes an underlying Markovian model for the outcomes, and Athey et al. (2018), Eckles et al. (2016), Sussman and Airoldi (2017), Puelz et al. (2019) make assumptions on the structure of the interference.

The paper is structured as follows. In Section 2 we define the notation, the assumptions, and the assignment mechanism that we focus on, which we refer to as regular switchback experiments.
In Section 3, we discuss how to design an effective regular switchback experiment under the minimax rule. We cast the design problem as a minimax robust optimization problem. We identify the worst-case adversarial strategy, establish structural results, and then explicitly find the optimal design. In Section 4, we discuss how to perform inference and conduct statistical testing based on the results obtained from an optimally designed switchback experiment. We propose an exact test for sharp null hypotheses, and an asymptotic test for the average treatment effect. We provide p-values and construct confidence intervals based on these two hypothesis tests. In Section 5, we discuss cases when carryover effects are misspecified. We show that our estimation and inference remain valid, with only a small increase in variance. In Section 6, we run simulations to test the correctness and effectiveness of our proposed experiments under various simulation setups. In Section 7, we discuss how to conduct hypothesis testing to identify the true order of carryover effects. We give empirical illustrations on how to conduct a switchback experiment in practice. All the proofs can be found in the Appendix.
2. Notations, Assumptions, and Regular Switchback Experiments
We focus our discussion on a single experimental unit. For example, this unit could be a ride-hailing platform testing the effectiveness of a new fare pricing algorithm in a large city, or a retailer testing the effectiveness of a new promotion planning algorithm over all its SKUs. At each time point $t \in [T] = \{1, 2, \ldots, T\}$, we assign the unit to receive an intervention $W_t \in \{0, 1\}$. For example, one experimental period could be several minutes to one hour for a ride-hailing platform or one to two days for a retailer; the intervention could be a new pricing or promotion planning algorithm. In some applications, external factors determine the time horizon $T$; however, when $T$ is not fixed, Section 7.2 provides details on how the manager can determine an appropriate $T$.

Following convention, we say that the unit is assigned to treatment if $W_t = 1$ and to control if $W_t = 0$; in A/B testing terminology, "A" is control and "B" is treatment. The assignment path is then the collection of assignments and is denoted using a vector notation whose dimensions are specified in the subscript, $W_{1:T} = (W_1, W_2, \ldots, W_T) \in \{0, 1\}^T$. We adopt the convention that $W_{1:t}$ stands for a random assignment path, while $w_{1:t}$ stands for one realization of the random assignment path. Though we focus on binary assignments, our results easily extend to more complex settings.

After administering the assigned intervention, we observe a corresponding outcome. For example, this could be total traffic or total revenue generated during each experimental period. Let $\{t : t'\} = \{t, t+1, \ldots, t'\}$. Following the extended potential outcomes framework, at time $t \in [T]$, we posit that for each possible assignment path $w_{1:T}$ there exists a corresponding potential outcome denoted by $Y_t(w_{1:T})$. The set of all potential outcomes will then be written as $\mathbb{Y} = \{Y_t(w_{1:T})\}_{t \in [T],\, w_{1:T} \in \{0,1\}^T}$.

Figure 1 Illustration of assignment paths and potential outcomes when $T = 4$.
The green path stands for one assignment path $w_{1:4} = (1, 1, 0, 0)$. The two red dots stand for two potential outcomes that are equal in Example 3.

Example 1. When $T = 4$, there are 16 assignment paths as shown in Figure 1. Associated with each assignment path $w_{1:4}$ are potential outcomes $Y_1(w_{1:4}), Y_2(w_{1:4}), Y_3(w_{1:4}), Y_4(w_{1:4})$. □

Throughout this paper, we do not directly model the potential outcomes or impose a parametric relationship with the assignment path; instead, we treat them as unknown but fixed quantities, or, equivalently, we implicitly condition on $\mathbb{Y}$ (Imbens and Rubin 2015, Chapter 5). The benefit of this approach is that we can be completely agnostic to the outcome process, allowing us to make nonparametric causal claims. To make inference possible, we rely on the variation introduced by the random assignment path; this is commonly referred to as the finite-sample or design-based perspective. Unlike traditional sampling-based inference, our approach does not require a hypothetical population from which we sampled our experimental units (Abadie et al. 2020).

Since the potential outcomes are fixed but unknown, we can assume that their absolute values are bounded from above (Robins et al. 1999, Bai 2019, Li et al. 2020). Assumption 1 is almost always satisfied, since it only assumes that the potential outcomes are bounded by the same constant $B$; e.g., the total traffic or revenue generated from each experimental period is some finite amount.

Assumption 1 (Bounded Potential Outcomes). Assume that the potential outcomes are bounded by some constant, i.e., $\exists B > 0$ s.t. $\forall t \in [T]$, $\forall w_{1:T} \in \{0,1\}^T$, $|Y_t(w_{1:T})| \le B$. Denote by $\mathcal{Y}$ the set of all such bounded potential outcomes, so that $\mathbb{Y} \in \mathcal{Y}$. In particular, knowledge about the magnitude of $B$ is not required.

We further make the following two assumptions that limit the dependence of the potential outcomes on assignment paths.
Assumption 2 (Non-anticipating Potential Outcomes). Assume for any $t \in [T]$, $w_{1:t} \in \{0,1\}^t$, and for any $w'_{t+1:T}, w''_{t+1:T} \in \{0,1\}^{T-t}$,
$$Y_t(w_{1:t}, w'_{t+1:T}) = Y_t(w_{1:t}, w''_{t+1:T}).$$

Assumption 2 states that the potential outcomes at time $t$ do not depend on future treatments (Bojinov and Shephard 2019, Basse et al. 2019, Rambachan and Shephard 2019). Since we control the assignment mechanism, the design ensures that this assumption is satisfied.

Example 2 (Example 1 Continued). Under Assumption 2, $Y_3(1, 1, 1, 1) = Y_3(1, 1, 1, 0)$. For notational simplicity, we write $Y_3(1, 1, 1)$ to stand for both $Y_3(1, 1, 1, 1)$ and $Y_3(1, 1, 1, 0)$. □

Assumption 3 (No $m$-Carryover Effects). Assume there exists a fixed and given $m$, such that for any $t \in \{m+1, m+2, \ldots, T\}$, $w_{t-m:T} \in \{0,1\}^{T-t+m+1}$, and for any $w'_{1:t-m-1}, w''_{1:t-m-1} \in \{0,1\}^{t-m-1}$,
$$Y_t(w'_{1:t-m-1}, w_{t-m:T}) = Y_t(w''_{1:t-m-1}, w_{t-m:T}).$$

Assumption 3 restricts the order of carryover effect (Laird et al. 1992, Senn and Lambrou 1998, Bojinov and Shephard 2019, Basse et al. 2019). In many applications, Assumption 3 is satisfied; however, practitioners must rely on their domain knowledge to choose an appropriate $m$. For example, surge pricing on a ride-hailing platform is typically known not to carry over for more than 1 ∼ [...]. A misspecified $m$ will not invalidate the subsequent inference, but will lead to a small increase in variance. See Sections 5 and 6.5 for discussions. Moreover, we can always correctly identify $m$ with a little more experimental budget. See Section 7.1 for discussions.

Under both Assumptions 2 and 3, we simplify notation as follows. For any $t \in \{m+1, \ldots, T\}$, $w_{t-m:t} \in \{0,1\}^{m+1}$, and for any $w'_{1:t-m-1}, w''_{1:t-m-1} \in \{0,1\}^{t-m-1}$ and any $w'_{t+1:T}, w''_{t+1:T} \in \{0,1\}^{T-t}$,
$$Y_t(w'_{1:t-m-1}, w_{t-m:t}, w'_{t+1:T}) = Y_t(w''_{1:t-m-1}, w_{t-m:t}, w''_{t+1:T}).$$
In the remainder of this paper, we will write $Y_t(w_{1:t-m-1}, w_{t-m:t}, w_{t+1:T}) = Y_t(w_{t-m:t})$.

Example 3 (Example 2 Continued).
Suppose $m = 1$. Under Assumptions 2 and 3, the potential outcomes at the 2 red dots in Figure 1 are equal, i.e., $Y_3(1, 1, 1) = Y_3(0, 1, 1)$. □

In the potential outcomes approach to causal inference, any comparison of potential outcomes has a causal interpretation. For any $p \in \mathbb{N}$, let $\mathbf{1}_{p+1} = (1, 1, \ldots, 1)$ be a vector of $p+1$ ones; let $\mathbf{0}_{p+1} = (0, 0, \ldots, 0)$ be a vector of $p+1$ zeros. In this paper, we focus on the average lag-$p$ causal effect of consecutive treatments on the outcome, defined for any $p \in [T-1]$ as
$$\tau_p(\mathbb{Y}) = \frac{1}{T-p} \sum_{t=p+1}^{T} \left[ Y_t(\mathbf{1}_{p+1}) - Y_t(\mathbf{0}_{p+1}) \right]. \tag{1}$$
This estimand captures the effects of permanently deploying a new policy, and has been widely studied in longitudinal experiments since the early work of Robins (1986).

Although we focus on an average causal effect, all of our results and analysis trivially extend to the total causal effect, which does not divide the sum by the number of estimands, i.e., $(T-p)\,\tau_p(\mathbb{Y})$. The optimal design, as we will show in Section 3, will not be changed. We mainly focus on the $p = m$ case in this section. We refer to Section 5 for the case when $m$ is misspecified, and to Section 7.1 for identifying $m$.

The challenge of causal inference on switchback experiments is that we only observe one assignment path. So in each period $t$, we observe at most one of $Y_t(\mathbf{1}_{p+1})$ and $Y_t(\mathbf{0}_{p+1})$ (and sometimes neither). To link the observed and potential outcomes, we assume there is only one version of the treatment, and there is no non-compliance. Let $w^{obs}_{1:T}$ be the realized assignment path. Let $Y^{obs}_t = Y_t(w^{obs}_{1:T})$ be the observed outcome at time $t$, under the realized assignment path $w^{obs}_{1:T}$.

It is the manager's decision to choose the design of switchback experiments. In this paper, we narrow our scope to the family of regular switchback experiments. This family of experiments is parameterized by
$$\mathbb{T} = \{t_0 = 1 < t_1 < t_2 < \ldots < t_K\} \subseteq [T],$$
where $K \in \mathbb{N}$ belongs to the set of all positive integers, and $\mathbb{T}$ contains a total of $K+1$ integers. For ease of notation, also let $t_{K+1} = T + 1$.

Definition 1 (Regular Switchback Experiments).
For any $K \in \mathbb{N}$ and any $\mathbb{T} = \{t_0 = 1 < t_1 < \ldots < t_K\} \subseteq [T]$, the regular switchback experiment $\mathbb{T}$ assigns, for every $k \in \{0, 1, \ldots, K\}$,
$$\Pr(W_{t_k} = 1) = \Pr(W_{t_k} = 0) = 1/2, \qquad W_{t_k} = W_{t_k+1} = \ldots = W_{t_{k+1}-1}, \tag{2}$$
with the coin flips at different randomization points independent of each other.

Figure 2 Two designs. The blue lines stand for the possible treatment assignments that a design could administer. Left: regular switchback experiment (Example 4); Right: irregular switchback experiment (Example 5).

In words, the manager decides on a collection of randomization points, and flips a fair coin at each period $t \in \{t_0, \ldots, t_K\}$. If the resulting flip at period $t_k$ is heads, then the manager assigns the unit to treatment during periods $(t_k, t_k + 1, \ldots, t_{k+1} - 1)$; if it is tails, then the manager assigns the unit to control during periods $(t_k, t_k + 1, \ldots, t_{k+1} - 1)$. For each $t \in \mathbb{T}$, the treatment probability that leads to the smallest variance is $1/2$.

Example 4. When $T = 4$, $\mathbb{T} = \{t_0 = 1, t_1 = 3\}$ corresponds to the following design: with probability $1/4$, $W_{1:4} = (1, 1, 1, 1)$; with probability $1/4$, $W_{1:4} = (1, 1, 0, 0)$; with probability $1/4$, $W_{1:4} = (0, 0, 1, 1)$; with probability $1/4$, $W_{1:4} = (0, 0, 0, 0)$. □

Example 5. Not all switchback experiments are regular. For example, when $T = 4$, a design can place probability $1/4$ on each of four assignment paths that cannot be generated by independent fair coin flips at a fixed set of randomization points; see the right panel of Figure 2. □

Any design of switchback experiment induces a probability distribution over assignment paths $w_{1:T} \in \{0,1\}^T$. Define a design of switchback experiment to be any $\eta : \{0,1\}^T \to [0, 1]$ such that
$$\sum_{w_{1:T} \in \{0,1\}^T} \eta(w_{1:T}) = 1, \qquad \eta(w_{1:T}) \ge 0, \ \forall w_{1:T} \in \{0,1\}^T.$$
Explicitly, $\eta(\cdot)$ is the underlying discrete distribution of the random assignment path $W_{1:T}$. For any regular switchback experiment $\mathbb{T}$, we refer to the probability distribution using $\eta_{\mathbb{T}}(\cdot)$.

For any $\mathbb{T}$, there are a total of $2^{K+1}$ assignment paths, which are uniquely determined by the values of $W_{t_0}, W_{t_1}, \ldots, W_{t_K}$. The assignment path is random, and follows the probability distribution $\eta_{\mathbb{T}}(\cdot)$:
$$\eta_{\mathbb{T}}(w_{1:T}) = \begin{cases} 1/2^{K+1}, & \text{if } \forall k \in \{0, 1, \ldots, K\},\ w_{t_k} = w_{t_k+1} = \ldots = w_{t_{k+1}-1}, \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$
In the remainder of this paper, unless explicitly noted, all probabilities and expectations are taken with respect to this probability distribution $\eta_{\mathbb{T}}(\cdot)$.
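The assignment mechanism of a regular switchback experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `epoch_lengths`, `sample_path`, and `support` are hypothetical and do not come from the paper, and periods are 1-indexed as in the text.

```python
import random
from itertools import product

def epoch_lengths(rand_points, T):
    """Lengths of the epochs {t_k, ..., t_{k+1} - 1}, with t_{K+1} = T + 1."""
    if not rand_points or rand_points[0] != 1:
        raise ValueError("the first randomization point must be period 1")
    bounds = list(rand_points) + [T + 1]
    lengths = [b - a for a, b in zip(bounds[:-1], bounds[1:])]
    if any(n <= 0 for n in lengths):
        raise ValueError("randomization points must be increasing and at most T")
    return lengths

def sample_path(rand_points, T, rng=random):
    """Draw one assignment path: a fair coin at each randomization point,
    held constant until the next randomization point."""
    path = []
    for n in epoch_lengths(rand_points, T):
        coin = rng.randint(0, 1)          # 1 = treatment, 0 = control
        path.extend([coin] * n)
    return tuple(path)

def support(rand_points, T):
    """All 2^(K+1) paths with positive probability; each has probability 1/2^(K+1)."""
    lengths = epoch_lengths(rand_points, T)
    paths = []
    for coins in product((1, 0), repeat=len(lengths)):
        path = []
        for coin, n in zip(coins, lengths):
            path.extend([coin] * n)
        paths.append(tuple(path))
    return paths

print(support([1, 3], 4))                 # the four paths of Example 4
print(sample_path([1, 3], 4, random.Random(0)))
```

Enumerating the support for randomization points {1, 3} and T = 4 gives the four equally likely paths listed in Example 4.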
Now that $\eta_{\mathbb{T}}(\cdot)$ is determined, following any realization of the assignment path $W_{1:T} = w_{1:T}$, we use the Horvitz-Thompson estimator to estimate the lag-$p$ effect:
$$\hat\tau_p(\eta_{\mathbb{T}}, w_{1:T}, \mathbb{Y}) = \frac{1}{T-p} \sum_{t=p+1}^{T} \left\{ Y^{obs}_t \frac{\mathbb{1}\{w_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} - Y^{obs}_t \frac{\mathbb{1}\{w_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} \right\} \tag{4}$$
Since the assignment path $W_{1:T}$ is random, this Horvitz-Thompson estimator is random as well. We emphasize that the estimator depends on (i) the probability distribution behind the random assignment path, (ii) the realization of the assignment path, and (iii) the potential outcomes.

Example 6. Suppose $T = 4$, $p = m = 1$. See Figure 1. Suppose the assignments are probabilistic and $\Pr(W_t = 1) = \Pr(W_t = 0) = 1/2$, $\forall t \in [4]$. With probability $1/16$ the green assignment path is administered, $W_{1:4} = (1, 1, 0, 0)$. Since each window $W_{t-1:t}$ equals $(1,1)$ or $(0,0)$ with probability $1/4$, the estimator is $\hat\tau_1 = \frac{1}{3}\{4 Y_2(1, 1) + 0 - 4 Y_4(0, 0)\}$. □

It is well known that the Horvitz-Thompson estimator is unbiased under the probabilistic treatment assignment assumption, which is satisfied by regular switchback experiments.

Assumption 4 (Probabilistic Treatment Assignment). Assume for any $t \in \{p+1 : T\}$, both $0 < \Pr(W_{t-p:t} = \mathbf{1}_{p+1}) < 1$ and $0 < \Pr(W_{t-p:t} = \mathbf{0}_{p+1}) < 1$.

It is easy to verify that regular switchback experiments satisfy Assumption 4, since the assignment in the first period is random, i.e., $\Pr(W_1 = 1) = 1/2$.

Theorem 1 (Unbiasedness of the Horvitz-Thompson Estimator). In a regular switchback experiment, under Assumptions 2 and 3, the Horvitz-Thompson estimator is unbiased for the average lag-$p$ causal effect of consecutive treatments on the outcome, i.e.,
$$\mathbb{E}_{W_{1:T} \sim \eta_{\mathbb{T}}} \left[ \hat\tau_p(\eta_{\mathbb{T}}, W_{1:T}, \mathbb{Y}) \right] = \tau_p(\mathbb{Y}).$$

The proof of Theorem 1 is standard and proceeds by checking the expectations. We defer it to Section EC.3 in the Appendix. When $m$ is misspecified, the above estimator is still meaningful and retains a causal interpretation. See Section 5 for a discussion.
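To make estimator (4) and Theorem 1 concrete, the following Python sketch computes the Horvitz-Thompson estimate for a regular switchback experiment and checks unbiasedness by enumerating every assignment path in a small case with $p = m$. The helper names and the `potential_outcome` function are hypothetical, made up purely for illustration; the window probability uses the fact that a window of $p+1$ periods touching $c+1$ epochs is all ones (or all zeros) with probability $2^{-(c+1)}$.

```python
from fractions import Fraction
from itertools import product

def design_paths(rand_points, T):
    """All equally likely assignment paths of a regular switchback design."""
    bounds = list(rand_points) + [T + 1]
    lengths = [b - a for a, b in zip(bounds[:-1], bounds[1:])]
    paths = []
    for coins in product((1, 0), repeat=len(lengths)):
        path = []
        for coin, n in zip(coins, lengths):
            path.extend([coin] * n)
        paths.append(tuple(path))
    return paths

def window_prob(rand_points, t, p):
    """Pr(W_{t-p:t} = all ones) = Pr(W_{t-p:t} = all zeros)."""
    # Each randomization point in {t-p+1, ..., t} starts a new epoch in the window.
    crossings = sum(1 for r in rand_points if t - p < r <= t)
    return Fraction(1, 2 ** (crossings + 1))

def ht_estimate(rand_points, path, y_obs, p):
    """Horvitz-Thompson estimate as in (4); path and y_obs cover periods 1..T."""
    T = len(path)
    total = Fraction(0)
    for t in range(p + 1, T + 1):              # 1-indexed periods p+1, ..., T
        window = path[t - p - 1:t]             # (w_{t-p}, ..., w_t)
        prob = window_prob(rand_points, t, p)
        if all(w == 1 for w in window):
            total += Fraction(y_obs[t - 1]) / prob
        elif all(w == 0 for w in window):
            total -= Fraction(y_obs[t - 1]) / prob
    return total / (T - p)

def potential_outcome(t, window):
    # Hypothetical outcomes: depend only on the last p+1 assignments (Assumptions 2-3).
    return 10 * t + 3 * sum(window) + (2 if window[-1] else 0)

T, p, rand_points = 4, 1, [1, 3]
paths = design_paths(rand_points, T)
estimates = []
for path in paths:
    y_obs = [potential_outcome(t, path[max(t - p - 1, 0):t]) for t in range(1, T + 1)]
    estimates.append(ht_estimate(rand_points, path, y_obs, p))

mean_estimate = sum(estimates) / len(paths)    # expectation under the design
tau = Fraction(sum(potential_outcome(t, (1,) * (p + 1)) - potential_outcome(t, (0,) * (p + 1))
                   for t in range(p + 1, T + 1)), T - p)
print(mean_estimate, tau)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality between the design expectation and the estimand can be checked without floating-point tolerance.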
To evaluate the quality of a design of experiment, we adopt the decision-theoretic framework (Berger 2013, Bickel and Doksum 2015). When the random design is $\eta_{\mathbb{T}}(\cdot)$, the assignment path $W_{1:T}$ is random. For any realization of the assignment path $w_{1:T}$ and any set of potential outcomes $\mathbb{Y}$, we define the loss function
$$L(\eta_{\mathbb{T}}, w_{1:T}, \mathbb{Y}) = \left( \hat\tau_p(\eta_{\mathbb{T}}, w_{1:T}, \mathbb{Y}) - \tau_p(\mathbb{Y}) \right)^2$$
and the risk function
$$r(\eta_{\mathbb{T}}, \mathbb{Y}) = \mathbb{E}_{W_{1:T} \sim \eta_{\mathbb{T}}(\cdot)} \left[ L(\eta_{\mathbb{T}}, W_{1:T}, \mathbb{Y}) \right] = \sum_{w_{1:T} \in \{0,1\}^T} \eta_{\mathbb{T}}(w_{1:T}) \cdot \left( \hat\tau_p(\eta_{\mathbb{T}}, w_{1:T}, \mathbb{Y}) - \tau_p(\mathbb{Y}) \right)^2. \tag{5}$$
Such a risk function quantifies the expected loss incurred by one design of experiment. Since the estimator is unbiased, the risk function also has a second interpretation: the variance of the estimator. A design with lower risk is also a design whose estimator has a lower variance.

Example 7 (Examples 4 and 6 Revisited). Suppose $T = 4$ and $p = m = 1$. As in Example 4, the experiment is $\mathbb{T} = \{1, 3\}$. With probability $1/4$, $W_{1:4} = (1, 1, 0, 0)$ and $\hat\tau_1(\eta_{\mathbb{T}}, w_{1:4}, \mathbb{Y}) = \frac{1}{3}\{2 Y_2(1,1) - 2 Y_4(0,0)\}$. So
$$L(\eta_{\mathbb{T}}, w_{1:4}, \mathbb{Y}) = \tfrac{1}{9} \left( Y_2(1,1) + Y_2(0,0) - Y_3(1,1) + Y_3(0,0) - Y_4(1,1) - Y_4(0,0) \right)^2.$$
As in Example 6, $\tilde{\mathbb{T}} = \{1, 2, 3, 4\}$. With probability $1/16$, $W_{1:4} = (1, 1, 0, 0)$ and $\hat\tau_1(\eta_{\tilde{\mathbb{T}}}, w_{1:4}, \mathbb{Y}) = \frac{1}{3}\{4 Y_2(1,1) - 4 Y_4(0,0)\}$. So
$$L(\eta_{\tilde{\mathbb{T}}}, w_{1:4}, \mathbb{Y}) = \tfrac{1}{9} \left( 3 Y_2(1,1) + Y_2(0,0) - Y_3(1,1) + Y_3(0,0) - Y_4(1,1) - 3 Y_4(0,0) \right)^2.$$ □

Example 7 suggests that, even if the two realizations of the assignment path are the same and the potential outcomes are the same, the corresponding loss functions can differ because the probability distributions $\eta_{\mathbb{T}}$ and $\eta_{\tilde{\mathbb{T}}}$ are different. This motivates us to find a probability distribution with a smaller expected loss against some $\mathbb{Y}$, which we will detail in Section 3.

3. Design of Regular Switchback Experiments under Minimax Rule

The minimax decision rule (Berger 2013, Wu 1981) finds an optimal design of experiment, such that the worst-case risk against an adversarial selection of potential outcomes is minimized,
$$\min_{\mathbb{T} \subseteq [T]} \max_{\mathbb{Y} \in \mathcal{Y}} r(\eta_{\mathbb{T}}, \mathbb{Y}) = \min_{\mathbb{T} \subseteq [T]} \max_{\mathbb{Y} \in \mathcal{Y}} \sum_{w_{1:T} \in \{0,1\}^T} \eta_{\mathbb{T}}(w_{1:T}) \cdot \left( \hat\tau_p(\eta_{\mathbb{T}}, w_{1:T}, \mathbb{Y}) - \tau_p(\mathbb{Y}) \right)^2. \tag{6}$$
The goal of this section is to find the optimal $\mathbb{T}^* \subseteq [T]$. Throughout this section we assume perfect knowledge of $m$ and assume $p = m$.

In practice, there is a trade-off between having too few randomization points (corresponding to small $K$) and too many (corresponding to large $K$). Intuitively, too many decreases the probability of observing an assignment path $\mathbf{1}_{m+1}$ or $\mathbf{0}_{m+1}$, which, in turn, decreases the amount of useful data. On the other hand, too few decreases the number of independent observations and reduces our ability to produce reliable results. Both of these scenarios reduce our ability to draw valid causal claims. To make switchback experiments useful in practice, we need to find the optimal number of randomization points that allows us to draw valid inference while minimizing the variance. We formalize this goal through the minimax framework, where we try to derive the best possible design for the worst possible set of potential outcomes.

To solve the minimax problem, we start by focusing on the inner maximization part of (6). We characterize the worst-case potential outcomes by identifying two dominating strategies for the adversarial selection of potential outcomes. Denote
$$\mathbb{Y}^+ = \{Y_t(\mathbf{1}_{m+1}) = Y_t(\mathbf{0}_{m+1}) = B\}_{t \in \{m+1:T\}} \quad \text{and} \quad \mathbb{Y}^- = \{Y_t(\mathbf{1}_{m+1}) = Y_t(\mathbf{0}_{m+1}) = -B\}_{t \in \{m+1:T\}}.$$

Lemma 1. Under Assumptions 1–3, $\mathbb{Y}^+$ and $\mathbb{Y}^-$ are the only two dominating strategies for the adversarial selection of potential outcomes. That is, for any $\mathbb{T} \subseteq [T]$ and for any $\mathbb{Y} \in \mathcal{Y}$,
$$r(\eta_{\mathbb{T}}, \mathbb{Y}^+) \ge r(\eta_{\mathbb{T}}, \mathbb{Y}); \qquad r(\eta_{\mathbb{T}}, \mathbb{Y}^-) \ge r(\eta_{\mathbb{T}}, \mathbb{Y}).$$
Moreover, for any $\mathbb{Y} \in \mathcal{Y}$ such that $\mathbb{Y} \notin \{\mathbb{Y}^+, \mathbb{Y}^-\}$, the above two inequalities are strict.

The proof of Lemma 1 and an implication of Lemma 1 can be found in Sections EC.4.2.2 and EC.4.2.3, respectively.

Example 8 (Example 4 Continued). Suppose $T = 4$, $p = m = 1$, and $\mathbb{T} = \{1, 3\}$. The risk function can be calculated by
$$r(\eta_{\mathbb{T}}, \mathbb{Y}) = \tfrac{1}{9} \Big\{ \sum_{t=2}^{4} \left( Y_t(1,1) + Y_t(0,0) \right)^2 + 2 Y_3(1,1)^2 + 2 Y_3(0,0)^2 + 2 \sum_{t=2}^{3} \left( Y_t(1,1) + Y_t(0,0) \right) \left( Y_{t+1}(1,1) + Y_{t+1}(0,0) \right) \Big\},$$
which is maximized over $\mathbb{Y}$ (only) at $Y_t(1,1) = Y_t(0,0) = \pm B$, $\forall t \in \{2, 3, 4\}$. This is what Lemma 1 says. □

Lemma 1 simplifies the minimax problem in (6). Instead of directly solving the minimax problem, we can now replace $\mathbb{Y}$ by either $\mathbb{Y}^+$ or $\mathbb{Y}^-$, and solve only a minimization problem.

Using Lemma 1, we now establish two structural results that limit the class of optimal designs of regular switchback experiments. Lemma 2 states the optimal starting and ending structure; Lemma 3 states the optimal middle-case structure. The proofs of Lemma 2 and Lemma 3 are deferred to Sections EC.4.3.1 and EC.4.3.2, respectively.

Lemma 2. When $\mathbb{Y} = \mathbb{Y}^+$ or $\mathbb{Y} = \mathbb{Y}^-$, under Assumptions 1–3, any optimal design of experiment $\mathbb{T}$ must satisfy
$$t_1 \ge m + 2, \quad \text{and} \quad t_K \le T - m.$$

Lemma 2 suggests that the first coin flip in period 1 should be followed by at least $m$ periods that do not flip a coin, and that the last coin flip should be followed by at least $m$ periods that do not flip a coin. It guarantees that the assignment paths during $\{1 : m+1\}$ and during $\{T-m : T\}$ are both useful, i.e., equal to $\mathbf{1}_{m+1}$ or $\mathbf{0}_{m+1}$.

Lemma 3. When $\mathbb{Y} = \mathbb{Y}^+$ or $\mathbb{Y} = \mathbb{Y}^-$, under Assumptions 1–3, any optimal design of experiment $\mathbb{T}$ must satisfy
$$t_{k+1} - t_{k-1} \ge m, \quad \forall k \in [K].$$

Lemma 3 suggests that in every $m + 1$ consecutive periods, there can be at most 3 randomization points. This is because too many randomization points in every $m + 1$ consecutive periods decreases the chance of observing a useful assignment path of $\mathbf{1}_{m+1}$ or $\mathbf{0}_{m+1}$.
Lemma 3 formalizes this intuition, and suggests that when $m$ grows large, the optimal design randomizes less often.

Lemmas 2 and 3 restrict the space of possible optimal regular switchback experiments to a smaller class of switchback experiments, which we define below.

Definition 2 (Persistent Switchback Experiments). We say a regular switchback experiment $\mathbb{T}$ is persistent if it satisfies the following three conditions:
$$t_1 \ge m + 2; \qquad t_K \le T - m; \qquad t_{k+1} - t_{k-1} \ge m, \ \forall k \in [K].$$

For persistent switchback experiments, we can explicitly calculate the risk function $r(\eta_{\mathbb{T}}, \mathbb{Y})$.

Theorem 2 (Risk Function). When $\mathbb{Y} = \mathbb{Y}^+$ or $\mathbb{Y} = \mathbb{Y}^-$, under Assumptions 1–3, the risk function for any persistent switchback experiment is given by
$$r(\eta_{\mathbb{T}}, \mathbb{Y}) = \frac{1}{(T-m)^2} \left\{ 4 \sum_{k=1}^{K+1} (t_k - t_{k-1})^2 + 8m(t_K - t_1) + 4m^2 K - 4m^2 + 4 \sum_{k=2}^{K} \left[ (m - t_k + t_{k-1})^+ \right]^2 \right\} B^2. \tag{7}$$

Theorem 2 explicitly describes the risk function of any optimal design of regular switchback experiments, which lies in the class of persistent switchback experiments. The proof of Theorem 2 is deferred to Section EC.4.4 in the appendix.

To understand the risk function in (7), note that the first summation of squares, $\sum_{k=1}^{K+1} (t_k - t_{k-1})^2$, suggests that the gap between two consecutive randomization points should not be too big, while the last summation of squares, $\sum_{k=2}^{K} [(m - t_k + t_{k-1})^+]^2$, suggests that the gap should not be too small. This contrast formalizes the trade-off that we described earlier in this section.

Based on the risk function in (7), we are able to describe the optimal design, as we state in the next theorem.

Theorem 3 (Optimal Design). Under Assumptions 1–3, the optimal solution to the design of regular switchback experiments introduced in (6) is equivalent to the optimal solution to the following subset selection problem.
$$\min_{\mathbb{T} \subset [T]} \left\{ 4 \sum_{k=0}^{K} (t_{k+1} - t_k)^2 + 8m(t_K - t_1) + 4m^2 K - 4m^2 + 4 \sum_{k=1}^{K-1} \left[ (m - t_{k+1} + t_k)^+ \right]^2 \right\} \tag{8}$$
In particular, when $m = 0$ then $\mathbb{T}^* = \{1, 2, 3, \ldots, T\}$; when $m > 0$, and if there exists $n \ge 4$, $n \in \mathbb{N}$, s.t. $T = nm$, then $\mathbb{T}^* = \{1, 2m+1, 3m+1, \ldots, (n-2)m+1\}$.

The optimal designs under two notable special cases are: when $m = 0$, $\mathbb{T}^* = \{1, 2, 3, \ldots, T\}$; and when $m = 1$, $\mathbb{T}^* = \{1, 3, 4, \ldots, T-1\}$. When managers believe there to be very little carryover effect, the optimal designs are almost the same. Moreover, Theorem 3 presents the optimal design in a class of perfect cases in which the time horizon splits evenly into several epochs. In practice, selecting $T$ is part of the design of the experiment; our recommendation is to pick a $T$ that satisfies the conditions in Theorem 3. See Section 7.2.

We can also find the optimal design for other imperfect cases by solving (8); however, since there are integrality issues in the subset selection problem, the discussion of the optimal design in such imperfect cases is rather technical. We defer these details to Section EC.4.5 in the appendix. The proof of Theorem 3 is deferred to Section EC.4.5.1 in the appendix.

Example 9 (An Optimal Design). When $T = 12$, $p = m = 2$, the optimal design of regular switchback experiment is $\mathbb{T}^* = \{1, 5, 7, 9\}$. See Table 1. □

Table 1 An example of the optimal design $\mathbb{T}^*$ versus an arbitrary design $\tilde{\mathbb{T}}$ when $T = 12$ and $p = m = 2$.

t     1   2   3   4   5   6   7   8   9   10  11  12
T*    ✓   −   −   −   ✓   −   ✓   −   ✓   −   −   −
T̃     ✓   −   −   ✓   −   −   ✓   −   −   ✓   −   −

Each checkmark beneath a time period $t$ indicates that $t$ is a randomization point.

It is worth noting that both the causal estimand and the Horvitz-Thompson estimator involve consecutive treatments or controls for $m + 1$ periods. By contrast, Theorem 3 suggests that the optimal design has epochs of equal length $m$ (ignoring the first and last epoch). At first sight this is counter-intuitive.
Intuitively, each epoch should contain at least $m + 1$ periods so that there exist periods that always have consecutive treatments $\mathbf{1}_{m+1}$ or consecutive controls $\mathbf{0}_{m+1}$ and always generate useful data; e.g., periods $\tilde{t} = 4, 7, 10$ in the third row of Table 1. However, even if each epoch had $m + 1$ periods, there would still be many periods that do not always generate useful data (e.g., periods $\tilde{t} = 5, \ldots$).

4. Inference and Statistical Testing

After designing and running the experiment, we obtain two time series. The first is the observed assignment path $w^{\mathrm{obs}}_{1:T}$, and the second is the corresponding observed outcomes $Y^{\mathrm{obs}}_{p+1:T}$. See Figure 3. To draw inference from these data we propose two methods, exact inference and asymptotic inference, as we detail below.

Figure 3. Illustration of the observed assignment path $w^{\mathrm{obs}}_{1:T}$ (blue and red dots) and the observed outcomes $Y^{\mathrm{obs}}_{p+1:T}$ (black curve). The dashed lines are the potential outcomes under consecutive treatments / controls.

Throughout this section we assume perfect knowledge of $m$, i.e., $p = m$, and we write $\tau_m$ and $\hat{\tau}_m$ for $\tau_p$ and $\hat{\tau}_p$, respectively. When $m \neq p$, our inference methods are still valid. See Section 5 for a discussion, and Section 6.5 for a numerical example.

We propose an exact non-parametric test for the sharp null of no causal effect at every time point,

$$H_0: Y_t(\mathbf{1}_{m+1}) - Y_t(\mathbf{0}_{m+1}) = 0 \quad \text{for all } t = 1, 2, \ldots, T. \tag{9}$$

This will be tested against a portmanteau alternative. The sharp null hypothesis implies that $Y_t(w^{\mathrm{obs}}_{t-m:t}) = Y_t(w'_{t-m:t})$ for all $w'_{t-m:t} \in \{0, 1\}^{m+1}$; that is, regardless of the assignment path $w'_{t-m:t}$, we would have observed the same outcomes.

We can conduct exact tests by using the known assignment mechanism to simulate new assignment paths. Algorithm 1 provides the details of how to implement it.
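The randomization test just described can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the design is restricted to the closed-form cases of Theorem 3 ($m = 0$, or $T = nm$ with $n \ge 4$), and each randomization point is assumed to be a fair coin flip, so that $\Pr(W_{t-m:t} = \mathbf{1}_{m+1})$ equals $1/2$ raised to the number of epochs that the window $t-m, \ldots, t$ touches.

```python
import numpy as np


def optimal_randomization_points(T, m):
    """Randomization points (1-indexed) for the closed-form cases of Theorem 3."""
    if m == 0:
        return list(range(1, T + 1))
    n, rem = divmod(T, m)
    if rem != 0 or n < 4:
        raise ValueError("this sketch only covers T = n*m with n >= 4")
    return [1] + [k * m + 1 for k in range(2, n - 1)]


def sample_path(points, T, rng):
    """Draw W_1..W_T: one fair coin per randomization point, held until the next."""
    bounds = list(points) + [T + 1]
    w = np.empty(T, dtype=int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        w[start - 1:stop - 1] = rng.integers(0, 2)
    return w


def horvitz_thompson(y, w, points, m):
    """HT estimate of the lag-m effect; y[t-1] and w[t-1] refer to period t."""
    T = len(w)
    # epoch[i] = number of randomization points at or before period i+1
    epoch = np.searchsorted(np.asarray(points), np.arange(1, T + 1), side="right")
    total = 0.0
    for t in range(m, T):  # 0-based indices of periods m+1..T
        window = w[t - m:t + 1]
        prob = 0.5 ** (epoch[t] - epoch[t - m] + 1)  # assumes fair coins
        if window.all():
            total += y[t] / prob
        elif not window.any():
            total -= y[t] / prob
    return total / (T - m)


def exact_p_value(y, w_obs, points, m, num_draws=2000, seed=0):
    """Monte Carlo p-value for the sharp null, holding the outcomes fixed."""
    rng = np.random.default_rng(seed)
    tau_hat = horvitz_thompson(y, w_obs, points, m)
    draws = np.array([
        horvitz_thompson(y, sample_path(points, len(w_obs), rng), points, m)
        for _ in range(num_draws)
    ])
    return float(np.mean(np.abs(draws) > abs(tau_hat)))
```

For instance, `optimal_randomization_points(12, 2)` reproduces the design $\{1, 5, 7, 9\}$ of Example 9, and `exact_p_value` compares the observed estimate against estimates recomputed under freshly drawn assignment paths.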
In particular, under the sharp null hypothesis of no treatment effect (9), any assignment path $w^{[i]}_{1:T}$ leads to the same observed outcomes. So in Step 3, we always plug in the same observed outcomes $Y^{\mathrm{obs}}_{p+1:T}$. To obtain a confidence interval, we propose inverting a sequence of exact hypothesis tests to identify the region outside of which (9) is violated at the prespecified nominal level (Imbens and Rubin 2015, Chapter 5).

Algorithm 1. Algorithm for performing a sharp-null hypothesis test
Require: Fix $I$, the total number of samples drawn.
1: for $i$ in $1 : I$ do
2:   Sample a new assignment path $w^{[i]}_{1:T}$ according to the assignment mechanism.
3:   Hold $Y^{\mathrm{obs}}_{p+1:T}$ unchanged. Compute $\hat{\tau}^{[i]}$ according to (4),
     $$\hat{\tau}^{[i]} = \frac{1}{T-m} \sum_{t=m+1}^{T} \left\{ Y^{\mathrm{obs}}_t \frac{\mathbb{1}\{w^{[i]}_{t-m:t} = \mathbf{1}_{m+1}\}}{\Pr(W_{t-m:t} = \mathbf{1}_{m+1})} - Y^{\mathrm{obs}}_t \frac{\mathbb{1}\{w^{[i]}_{t-m:t} = \mathbf{0}_{m+1}\}}{\Pr(W_{t-m:t} = \mathbf{0}_{m+1})} \right\}.$$
4: end for
5: Compute $\hat{p}_F = I^{-1} \sum_{i=1}^{I} \mathbb{1}\{ |\hat{\tau}^{[i]}| > |\hat{\tau}| \}$.
6: return $\hat{p}_F$, the estimated $p$-value. For large $I$, this is exact.

Now we introduce the asymptotic inference method, which tests the following null hypothesis:

$$H_0: \tau_m = \frac{1}{T-m} \sum_{t=m+1}^{T} \left[ Y_t(\mathbf{1}_{m+1}) - Y_t(\mathbf{0}_{m+1}) \right] = 0. \tag{10}$$

The test of this null hypothesis relies on the Horvitz-Thompson estimator in (4) being asymptotically Gaussian. Below we describe how to conduct asymptotic inference.

Assume $n = T/m \ge 4$ is an integer, so that under the optimal design of Theorem 3 the assignment path is determined by $W_1, W_{2m+1}, \ldots, W_{(n-2)m+1}$. To make the dependence on randomization clear, we introduce the following notation. Let $\mathbb{N}$ be the set of all non-negative integers. For any $k \in \mathbb{N}$, let $\bar{Y}_k(\mathbf{1}_{m+1}) = \sum_{t=(k+1)m+1}^{(k+2)m} Y_t(\mathbf{1}_{m+1})$ and $\bar{Y}_k(\mathbf{0}_{m+1}) = \sum_{t=(k+1)m+1}^{(k+2)m} Y_t(\mathbf{0}_{m+1})$. Moreover, let $\bar{Y}^{\mathrm{obs}}_k = \sum_{t=(k+1)m+1}^{(k+2)m} Y^{\mathrm{obs}}_t$ be the sum of the observed outcomes.

Theorem 4 (Variance of the Horvitz-Thompson Estimator).
Under Assumptions 1–3, if $n = T/m \ge 4$ is an integer, then under the optimal design shown in Theorem 3, the variance of the Horvitz-Thompson estimator, $\mathrm{Var}(\hat{\tau}_m)$, is

$$\begin{aligned}
\mathrm{Var}(\hat{\tau}_m) = \frac{1}{(T-m)^2} \Bigg\{ & \bar{Y}_0(\mathbf{1}_{m+1})^2 + \bar{Y}_0(\mathbf{0}_{m+1})^2 + 2 \bar{Y}_0(\mathbf{1}_{m+1}) \bar{Y}_0(\mathbf{0}_{m+1}) \\
& + \sum_{k=1}^{n-3} \left[ 3 \bar{Y}_k(\mathbf{1}_{m+1})^2 + 3 \bar{Y}_k(\mathbf{0}_{m+1})^2 + 2 \bar{Y}_k(\mathbf{1}_{m+1}) \bar{Y}_k(\mathbf{0}_{m+1}) \right] \\
& + \bar{Y}_{n-2}(\mathbf{1}_{m+1})^2 + \bar{Y}_{n-2}(\mathbf{0}_{m+1})^2 + 2 \bar{Y}_{n-2}(\mathbf{1}_{m+1}) \bar{Y}_{n-2}(\mathbf{0}_{m+1}) \\
& + 2 \sum_{k=0}^{n-3} \left[ \bar{Y}_k(\mathbf{1}_{m+1}) + \bar{Y}_k(\mathbf{0}_{m+1}) \right] \cdot \left[ \bar{Y}_{k+1}(\mathbf{1}_{m+1}) + \bar{Y}_{k+1}(\mathbf{0}_{m+1}) \right] \Bigg\}
\end{aligned} \tag{11}$$

Theorem 4 provides the variance of the estimator. Since we never observe all the potential outcomes, many of the cross-product terms in the variance can never be estimated. As an alternative, we provide the following two upper bounds, and detail an unbiased estimator for each of them.

Corollary 1. Under the conditions in Theorem 4, there exist two upper bounds for the variance of the Horvitz-Thompson estimator, $\mathrm{Var}(\hat{\tau}_m) \le \mathrm{Var}^{U1}(\hat{\tau}_m) \le \mathrm{Var}^{U2}(\hat{\tau}_m)$. These two upper bounds $\mathrm{Var}^{U1}(\hat{\tau}_m)$ and $\mathrm{Var}^{U2}(\hat{\tau}_m)$ can be estimated by $\hat{\sigma}^2_{U1}$ and $\hat{\sigma}^2_{U2}$, respectively, where

$$\begin{aligned}
\hat{\sigma}^2_{U1} = \frac{1}{(T-m)^2} \Bigg\{ & 6 (\bar{Y}^{\mathrm{obs}}_0)^2 + \sum_{k=1}^{n-3} 24 (\bar{Y}^{\mathrm{obs}}_k)^2 \, \mathbb{1}\{W_{km+1} = W_{(k+1)m+1}\} + 6 (\bar{Y}^{\mathrm{obs}}_{n-2})^2 \\
& + \sum_{k=0}^{n-3} 16 \, \bar{Y}^{\mathrm{obs}}_k \bar{Y}^{\mathrm{obs}}_{k+1} \, \mathbb{1}\{W_{km+1} = W_{(k+1)m+1} = W_{(k+2)m+1}\} \Bigg\},
\end{aligned}$$

and

$$\hat{\sigma}^2_{U2} = \frac{1}{(T-m)^2} \left\{ 8 (\bar{Y}^{\mathrm{obs}}_0)^2 + \sum_{k=1}^{n-3} 32 (\bar{Y}^{\mathrm{obs}}_k)^2 \, \mathbb{1}\{W_{km+1} = W_{(k+1)m+1}\} + 8 (\bar{Y}^{\mathrm{obs}}_{n-2})^2 \right\}.$$

Moreover, $\hat{\sigma}^2_{U1}$ and $\hat{\sigma}^2_{U2}$ are unbiased, i.e., $E[\hat{\sigma}^2_{U1}] = \mathrm{Var}^{U1}(\hat{\tau}_m)$ and $E[\hat{\sigma}^2_{U2}] = \mathrm{Var}^{U2}(\hat{\tau}_m)$.

Corollary 1 provides the foundation for conservative inference. We make the following technical assumption for the asymptotic normal distribution to hold.

Assumption 5 (Non-negligible Variance).
Assume that the randomization distribution has a non-negligible variance, i.e.,

$$\mathrm{Var}(\hat{\tau}_m) \ge \Omega(n^{-1}). \tag{12}$$

In particular, one sufficient condition for (12) is that all the potential outcomes are positive and bounded away from zero, i.e., there exists some constant $b > 0$ such that $\forall t \in [T]$, $\forall w_{1:T} \in \{0, 1\}^T$, $Y_t(w_{1:T}) \ge b$.

Intuitively, the key to the Central Limit Theorem is that all the variables have roughly the same order of variance, i.e., there cannot be a small number of variables whose variances make up the majority of the sum. Assumption 5 requires the variance to be large enough that it cannot come from only a few of the time periods.

Theorem 5 (Asymptotic Normality). Let $m$ be fixed. For any $n \ge 4 \in \mathbb{N}$, define an $n$-replica experiment such that there are $T = nm$ time periods. We take the optimal design as in Theorem 3, whose randomization points are at $\mathbb{T}^* = \{1, 2m+1, 3m+1, \ldots, (n-2)m+1\}$. Under Assumptions 2–3 and Assumption 5, the Horvitz-Thompson estimator in the $n$-replica experiment is asymptotically normal. That is, let $\mathrm{Var}(\hat{\tau}_m)$ be as defined in Theorem 4. As $n \to +\infty$,

$$\frac{\hat{\tau}_m - \tau_m}{\sqrt{\mathrm{Var}(\hat{\tau}_m)}} \xrightarrow{D} \mathcal{N}(0, 1).$$

In particular, Theorem 5 does not require $\mathrm{Var}(\hat{\tau}_m)$ to converge as $n \to +\infty$.

To conduct inference, we replace $\mathrm{Var}(\hat{\tau}_m)$ by $\hat{\sigma}^2$, one of the two bounds provided in Corollary 1. Define the test statistic to be $z = |\hat{\tau}_m| / \sqrt{\hat{\sigma}^2}$. When the alternative hypothesis is two-sided, the estimated $p$-value is given by $\hat{p}_N = 2 - 2\Phi(z)$, where $\Phi$ is the CDF of a standard normal distribution.

The proofs of Theorem 4, Corollary 1, and Theorem 5 are deferred to Sections EC.5.2, EC.5.3, and EC.5.4 in the Appendix, respectively.

5. A Discussion about Misspecified m

So far we have discussed cases when $m$ is correctly specified. When $m$ is misspecified, the estimation and inference are still valid and meaningful. We detail below the cases when $m$ is either overestimated ($m < p$) or underestimated ($m > p$).
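Since the same conservative test is applied whether or not $m$ is correctly specified, it may help to see the asymptotic procedure of Section 4 in code before turning to misspecification. The sketch below is illustrative only: the function names are hypothetical, it assumes $T = nm$ with $n \ge 4$ and the optimal design of Theorem 3, and it implements the second bound $\hat{\sigma}^2_{U2}$ with the coefficients and index ranges as they are written in Corollary 1 above.

```python
import math

import numpy as np


def sigma2_u2(y, w, m):
    """Conservative variance estimate (second bound of Corollary 1).

    y[t-1] and w[t-1] hold the observed outcome and assignment in period t.
    Assumes T = n*m with n >= 4 and the optimal design of Theorem 3.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    T = len(y)
    n = T // m
    if n * m != T or n < 4:
        raise ValueError("this sketch only covers T = n*m with n >= 4")
    # Block k (k = 0..n-2) sums the outcomes of periods (k+1)m+1 .. (k+2)m.
    ybar = y[m:].reshape(n - 1, m).sum(axis=1)
    total = 8.0 * ybar[0] ** 2 + 8.0 * ybar[n - 2] ** 2
    for k in range(1, n - 2):  # k = 1..n-3
        # Indicator that W_{km+1} equals W_{(k+1)m+1} (0-based indices k*m, (k+1)*m).
        if w[k * m] == w[(k + 1) * m]:
            total += 32.0 * ybar[k] ** 2
    return total / (T - m) ** 2


def asymptotic_p_value(tau_hat, sigma2):
    """Two-sided p-value 2 - 2*Phi(z) with z = |tau_hat| / sqrt(sigma2)."""
    z = abs(tau_hat) / math.sqrt(sigma2)
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 - 2.0 * phi
```

Because $\hat{\sigma}^2_{U2}$ estimates an upper bound on the variance rather than the variance itself, the resulting $p$-value is conservative by construction.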
When $m$ is overestimated, $m < p$. Due to Assumption 3, $Y_t(\mathbf{1}_{p+1}) = Y_t(\mathbf{1}_{m+1})$, $\forall t \in \{p+1 : T\}$, so the lag-$p$ causal effect is essentially the lag-$m$ causal effect.

When $m$ is underestimated, $m > p$. The lag-$p$ effect in (1) is not well defined. Instead, we define the $m$-misspecified lag-$p$ causal effect, which pads the $p + 1$ assignments with the earlier observed treatments:

$$\tau^{(m)}_p(Y) = \frac{1}{T-p} \left\{ \sum_{t=p+1}^{m} \left[ Y_t(w^{\mathrm{obs}}_{1:t-p-1}, \mathbf{1}_{p+1}) - Y_t(w^{\mathrm{obs}}_{1:t-p-1}, \mathbf{0}_{p+1}) \right] + \sum_{t=m+1}^{T} \left[ Y_t(w^{\mathrm{obs}}_{t-m:t-p-1}, \mathbf{1}_{p+1}) - Y_t(w^{\mathrm{obs}}_{t-m:t-p-1}, \mathbf{0}_{p+1}) \right] \right\}. \tag{13}$$

This is a special case of the weighted lag-$p$ causal effect introduced in Bojinov and Shephard (2019). Similarly to the average lag-$p$ causal effect, $\tau^{(m)}_p(Y)$ captures how administering $p + 1$ consecutive treatments, as opposed to $p + 1$ consecutive controls, impacts the outcomes at time $t$, conditional on the observed assignment path up to time $t - p - 1$. See Section 6.5 for numerical results.

When $m$ is overestimated ($m < p$), Theorem 1 still holds, i.e., $E[\hat{\tau}_p] = \tau_p(Y) = \tau_m(Y)$. When $m$ is underestimated ($m > p$), we sometimes have to slightly augment the results and study the conditional expectation. Define $f_{\mathbb{T}}: [T] \to \mathbb{T}$ to be the "determining randomization point of period $t$,"

$$f_{\mathbb{T}}(t) = \max\{ j \mid j \in \mathbb{T}, \, j \le t \},$$

so that the realization at time $f_{\mathbb{T}}(t)$ uniquely determines the assignment at time $t$, i.e., $W_t = W_{f_{\mathbb{T}}(t)}$, $\forall t \in [T]$. See Example 10 for an illustration of $f_{\mathbb{T}}(\cdot)$. When $\mathbb{T}$ is clear from the context, we drop the subscript and write $f(\cdot) = f_{\mathbb{T}}(\cdot)$. Depending on whether $f(t - p) \le t - m$, we establish an analogue of Theorem 1 for the $m$-underestimated case.

Theorem 6 (Unbiasedness of the Estimator when m is Misspecified).
Under Assumptions 2 and 3, for $m > p$, at each time $t \ge m + 1$, the Horvitz-Thompson estimator is either unbiased for the lag-$m$ causal effect when $f(t - p) \le t - m$, or conditionally unbiased for the $m$-misspecified lag-$p$ causal effect when $f(t - p) > t - m$. When $p + 1 \le t \le m$, the Horvitz-Thompson estimator is either unbiased for the lag-$t$ causal effect when $f(t - p) = 1$, or conditionally unbiased for the $m$-misspecified lag-$t$ causal effect when $f(t - p) > 1$.

To remove the conditional expectation, we can further take an outer expectation, averaging over the past assignment paths. Although this is somewhat different from the average lag-$p$ effect introduced earlier in (1), it does capture the impact of a sequence of treatments relative to a sequence of controls.

The mathematical expressions of Theorem 6, as well as its proof, are stated in Section EC.6.1 in the Appendix. See Example 10 below for a specific illustration of Theorem 6. For a numerical illustration of the estimand and estimator in more general setups, see Section 6.5.

Example 10 (Misspecified m). Suppose $T = 4$, $m = 2$, $p = 1$, $\mathbb{T} = \{1, 3\}$. Then the determining randomization points are $f_{\mathbb{T}}(1) = 1$, $f_{\mathbb{T}}(2) = 1$, $f_{\mathbb{T}}(3) = 3$, $f_{\mathbb{T}}(4) = 3$, and

$$E\left[ Y^{\mathrm{obs}}_2 \frac{\mathbb{1}\{W_{1:2} = (1,1)\}}{\Pr(W_{1:2} = (1,1))} - Y^{\mathrm{obs}}_2 \frac{\mathbb{1}\{W_{1:2} = (0,0)\}}{\Pr(W_{1:2} = (0,0))} \right] = Y_2(1,1) - Y_2(0,0),$$

$$E\left[ Y^{\mathrm{obs}}_3 \frac{\mathbb{1}\{W_{2:3} = (1,1)\}}{\Pr(W_{2:3} = (1,1))} - Y^{\mathrm{obs}}_3 \frac{\mathbb{1}\{W_{2:3} = (0,0)\}}{\Pr(W_{2:3} = (0,0))} \right] = Y_3(1,1,1) - Y_3(0,0,0),$$

$$E\left[ Y^{\mathrm{obs}}_4 \frac{\mathbb{1}\{W_{3:4} = (1,1)\}}{\Pr(W_{3:4} = (1,1))} - Y^{\mathrm{obs}}_4 \frac{\mathbb{1}\{W_{3:4} = (0,0)\}}{\Pr(W_{3:4} = (0,0))} \right] = \frac{1}{2} \left[ Y_4(1,1,1) + Y_4(0,1,1) - Y_4(0,0,0) - Y_4(1,0,0) \right].$$

□

The exact inference procedure of Section 4.1 remains valid when $m$ is misspecified. For the asymptotic inference procedure of Section 4.2, Theorem 5 still holds when $m$ is misspecified, as we state in Corollary 2. The proof is deferred to Section EC.6.2 in the Appendix.

Corollary 2 (Asymptotic Normality when m is Misspecified).
For any $n \ge 4 \in \mathbb{N}$, define an $n$-replica experiment such that there are $T = np$ time periods. Take the optimal design as in Theorem 3, whose randomization points are at $\mathbb{T}^* = \{1, 2p+1, 3p+1, \ldots, (n-2)p+1\}$. We have the following two observations.
i. When $m < p$, under Assumptions 2–3, the variance of the Horvitz-Thompson estimator, $\mathrm{Var}(\hat{\tau}_p)$, is explicitly given by (11).
ii. Furthermore, whether $m < p$ or $m > p$, under Assumptions 1–3 and assuming $\mathrm{Var}(\hat{\tau}_p) \ge \Omega(n^{-1})$, the Horvitz-Thompson estimator in the $n$-replica experiment is asymptotically normal. That is, as $n \to +\infty$,

$$\frac{\hat{\tau}_p - \tau_p}{\sqrt{\mathrm{Var}(\hat{\tau}_p)}} \xrightarrow{D} \mathcal{N}(0, 1).$$

Corollary 2, together with Theorem 5, is the key to identifying $m$, the order of the carryover effect. In Section 7.1, we provide methods to identify $m$.

So far we have discussed estimation and inference when the order of the carryover effect is misspecified. We conclude with a short discussion of the robustness of our method.

First, the optimal designs for $m = 0$ and $m = 1$ are almost the same. This suggests that when there is very little carryover effect, our proposed optimal design is robust. Second, as we will see in Section 6, when the order of the carryover effect is slightly overestimated, the variance is only a little larger. This adds an extra layer of robustness: a slight misspecification is often acceptable.

6. Simulation Study and Empirical Illustration

This simulation study has five goals. First, to illustrate how to conduct a switchback experiment for various outcome models. Second, to show that our proposed optimal design has the smallest risk compared with two benchmarks. There are two dimensions for our comparison: the worst-case risk and the risk under a specific outcome model. Third, to verify the asymptotic normal distribution under a non-asymptotic setup, and to study the quality of the upper bound proposed in Corollary 1.
Fourth, to understand the rejection rate and its dependence on the length of the time horizon. Fifth, under randomly generated cases, to study the performance of the optimal design under a misspecified $m$, and to compare the two inference methods proposed in Section 4.

The potential outcome framework is flexible. As we will see below, it is easy to use the potential outcome framework to describe many complex relationships between assignments and outcomes.

We start with a simple model that originates from Oman and Seiden (1988):

$$Y_t(w_{1:t}) = \mu + \alpha_t + \delta w_t + \gamma w_{t-1} + \epsilon_t \tag{14}$$

where $\mu$ is a fixed effect; $\alpha_t$ is a fixed effect associated with period $t$; $\delta w_t$ is the contemporaneous effect; $\gamma w_{t-1}$ is the carryover effect from period $t - 1$; and $\epsilon_t$ is the random noise in period $t$. This model, as well as a few very similar ones, is widely used in the literature (Hedayat et al. 1978, Jones and Kenward 2014).

A more general variant of the above model allows carryover effects of arbitrary order:

$$Y_t(w_{1:t}) = \mu + \alpha_t + \delta^{(1)} w_t + \delta^{(2)} w_{t-1} + \ldots + \delta^{(t)} w_1 + \epsilon_t \tag{15}$$

where $\delta^{(1)}, \delta^{(2)}, \ldots, \delta^{(t)}$ are non-stochastic coefficients. The dotted terms are carryover effects of higher orders, and all the other parameters are as defined in (14). We run simulations based on this more general model, which enables us to test the performance of our proposed optimal design under a misspecified $m$.

The autoregressive model (Arellano 2003) is even more general: $Y_1(w_1) = \delta_{1,1} w_1 + \epsilon_1$ and, $\forall t > 1$,

$$Y_t(w_{1:t}) = \phi_{t,t-1} Y_{t-1}(w_{1:t-1}) + \phi_{t,t-2} Y_{t-2}(w_{1:t-2}) + \ldots + \phi_{t,1} Y_1(w_1) + \delta_{t,t} w_t + \delta_{t,t-1} w_{t-1} + \ldots + \delta_{t,1} w_1 + \epsilon_t \tag{16}$$

where $\phi_{t,\tilde{t}}$ and $\delta_{t,\tilde{t}}$ are non-stochastic coefficients; the dotted terms are carryover effects of higher orders; and $\epsilon_t$ is the random noise in period $t$. We can iteratively replace $Y_t(w_{1:t})$ using a linear combination of $w_t, w_{t-1}, \ldots, w_1$.
So the autoregressive model in (16) can be written in a form similar to (15). The only difference is that the coefficients are different and depend on $t$.

For all these models, we first decide the order of the carryover effect, namely $m$. Then we use Theorem 3 to find the optimal design of the experiment. Finally, we use the exact randomization test in Section 4 to conduct hypothesis tests.

We consider two setups. The first setup is for the worst-case risk. We consider $T = 120$, $p = m = 2$, where $m$ is correctly identified, and $Y_t(\mathbf{1}_{m+1}) = Y_t(\mathbf{0}_{m+1}) = 10$. We compare three different designs of switchback experiments. The first is our proposed optimal design as in Theorem 3, with $\mathbb{T}^* = \{1, 5, 7, \ldots, 117\}$. The second is the most common and naive switchback experiment, which independently assigns treatment or control in every period with equal probability. It is parameterized by $\mathbb{T}^{H1} = \{1, 2, 3, \ldots, 120\}$. The third is the "intuitive" experiment discussed in Example 9, which divides the time horizon into epochs each of length $m + 1 = 3$. It is parameterized by $\mathbb{T}^{H2} = \{1, 4, 7, \ldots, 118\}$.

Second, we run simulations based on the outcome model in (15). We consider $T = 120$, $p = m = 2$, where $m$ is correctly identified. For the outcome model, we consider $\mu = 0$, $\alpha_t = \log(t)$, and $\epsilon_t \sim \mathcal{N}(0, 1)$ i.i.d. standard normal. For any $t > 3$, let $\delta^{(t)} = 0$. We vary the values of $\delta^{(1)}, \delta^{(2)}, \delta^{(3)} \in \{1, 2\}$ and conduct experiments under $2^3 = 8$ different scenarios. Again we compare the same three designs of switchback experiments: $\mathbb{T}^* = \{1, 5, 7, \ldots, 117\}$, $\mathbb{T}^{H1} = \{1, 2, 3, \ldots, 120\}$, $\mathbb{T}^{H2} = \{1, 4, 7, \ldots, 118\}$.

We simulate one assignment path at a time, and conduct the experiment following this assignment path. Since the outcome model is prescribed, we can calculate both the causal estimand and the observed outcomes (along the simulated assignment path).
Then we calculate the Horvitz-Thompson estimator based on the simulated assignment path and the simulated observed outcomes. With both the estimand and the estimator, we can calculate the loss function. By repeating this procedure enough times (100,000 in this simulation), we approximate the risk function.

We calculate the worst-case risk functions via simulation. Even though we could calculate the worst-case risk function explicitly via Theorem 2, we still run the simulation to confirm this result. See Table 2 for results.

The causal effect is $\tau = 0$ because $Y_t(\mathbf{1}_{m+1}) = Y_t(\mathbf{0}_{m+1}) = 10$. The simulated means of the estimators, $E[\hat{\tau}^*] = -0.\ldots$ for our proposed optimal design and $E[\hat{\tau}^{H1}] = 0.\ldots$ and $E[\hat{\tau}^{H2}] = -0.\ldots$ for the two benchmarks, are all very close to zero. The risk function is $r(\eta_{\mathbb{T}^*}) = 26.78$ for our proposed optimal design, and $r(\eta_{\mathbb{T}^{H1}}) = 33.67$ and $r(\eta_{\mathbb{T}^{H2}}) = 27.85$ for the two benchmarks, respectively. These simulation results suggest that our proposed optimal design has the smallest risk under the worst-case outcome model.

Table 2. Simulation results for the worst-case risk function.

τ    E[τ̂*]    E[τ̂^H1]   E[τ̂^H2]   r(η_T*)   r(η_T^H1)   r(η_T^H2)
0    −0.…     0.…       −0.…      26.78     33.67       27.85

The optimal design $\mathbb{T}^*$ as suggested in Theorem 3 yields the smallest risk.

We also calculate the risk functions based on the outcome model in (15). See Table 3. As we vary the values of $\delta^{(1)}$, $\delta^{(2)}$ and $\delta^{(3)}$, the total lag-2 causal effect changes. All three estimators reflect the change in the estimand. The simulated risk function associated with the first benchmark $\mathbb{T}^{H1}$ is 28% to 32% larger than that of the optimal design, and that of the second benchmark $\mathbb{T}^{H2}$ is 1% to 2% larger. These simulation results suggest again that our proposed optimal design has the smallest risk. Moreover, since $r(\eta_{\mathbb{T}^{H2}})$ is rather close to $r(\eta_{\mathbb{T}^*})$ and much smaller than $r(\eta_{\mathbb{T}^{H1}})$, we suggest that a slight overestimate of $m$ is more desirable than an underestimate.

Table 3. Simulation results for the risk function based on the outcome model in (15).

δ(1)  δ(2)  δ(3)  τ    E[τ̂*]   E[τ̂^H1]  E[τ̂^H2]  r(η_T*)  r(η_T^H1)  r(η_T^H2)
1     1     1     3    3.016   3.012    3.002    7.96     10.22      8.11
1     1     2     4    4.018   4.013    4.002    9.57     12.39      9.74
1     2     1     4    4.018   4.013    4.002    9.57     12.39      9.74
2     1     1     4    4.018   4.013    4.002    9.57     12.39      9.74
1     2     2     5    5.020   5.015    5.003    11.34    14.81      11.52
2     1     2     5    5.020   5.015    5.003    11.34    14.81      11.52
2     2     1     5    5.020   5.015    5.003    11.34    14.81      11.52
2     2     2     6    6.022   6.016    6.003    13.28    17.48      13.47

For each row, the random seed that generates the simulation setup is fixed. The optimal design $\mathbb{T}^*$ as suggested in Theorem 3, though solved from a minimax program, still yields the smallest risk for the outcome model in (15).

As the magnitude of the treatment effects increases, the associated risk functions also increase. The relative difference between $r(\eta_{\mathbb{T}^{H1}})$ and $r(\eta_{\mathbb{T}^*})$ increases, while the relative difference between $r(\eta_{\mathbb{T}^{H2}})$ and $r(\eta_{\mathbb{T}^*})$ decreases. This coincides with the intuition discussed in Section 3.

We run simulations based on the outcome model in (15). We consider $T = 120$, $m = 2$. We consider three cases: (i) $m$ is correctly specified, so $p = 2$; (ii) $m$ is overestimated to be 3, so $p = 3$, and we estimate the lag-3 causal estimand as in (1); (iii) $m$ is underestimated to be 1, so $p = 1$, and we proceed as if we were estimating the lag-1 causal estimand. However, the lag-1 causal estimand is not well defined, and we instead estimate the 2-misspecified lag-1 causal estimand as in (13).

For the outcome model, we consider $\mu = 0$, $\alpha_t = \log(t)$, and $\epsilon_t \sim \mathcal{N}(0, 1)$ i.i.d. standard normal. For any $t > 3$, let $\delta^{(t)} = 0$. For simplicity, let $\delta^{(1)} = \delta^{(2)} = \delta^{(3)} = \delta$.
We vary $\delta \in \{1, 2, 3\}$ and conduct experiments under 3 different scenarios.

We simulate one assignment path at a time, and conduct the experiment following this assignment path. Since the outcome model is prescribed, we calculate the observed outcomes based on the simulated assignment path. Then we calculate the Horvitz-Thompson estimator and the two conservative estimators of the randomization variance (Corollary 1), all based on the simulated assignment path and the simulated observed outcomes. The lag-$p$ causal estimand is easy to calculate once the outcome model is prescribed, whereas the $m$-misspecified lag-$p$ causal estimand has to be calculated in conjunction with the simulated assignment path. By repeating this procedure enough times (100,000 in this simulation), we obtain a distribution of the estimator, and we calculate the mean value of the estimator (and of the $m$-misspecified lag-$p$ causal estimand).

Figure 4 shows approximate normality of the randomization distribution under all 9 cases. There are three specifications of $m$, which lead to three different designs: correctly specified when $p = 2$; overestimated when $p = 3$; underestimated when $p = 1$. There are three specifications of $\delta$: $\delta = 1, 2, 3$.

Table 4. Simulation results for the randomization distribution.

                           τ_p   τ_p^[m]   E[τ̂_p]   Var(τ̂_p)   E[σ̂²_U1]   E[σ̂²_U2]
correct m         δ = 1    3     −         …        …          …          …
                  δ = 2    6     −         …        …          …          …
                  δ = 3    9     −         …        …          …          …
overestimated m   δ = 1    3     −         …        …          …          …
                  δ = 2    6     −         …        …          …          …
                  δ = 3    9     −         …        …          …          …
underestimated m  δ = 1    −     …         …        …          …          …
                  δ = 2    −     …         …        …          …          …
                  δ = 3    −     …         …        …          …          …

The randomization distributions in all 9 cases are unbiased. The conservative estimates of the variance upper bounds from Corollary 1 are close to the true variance.

From Table 4, we make the following two observations.

(i) Unbiasedness of the Horvitz-Thompson estimator. When $m$ is correctly specified, $E[\hat{\tau}_p]$ is very close to $\tau_p$, verifying the unbiasedness of the estimator. When $m$ is overestimated, the estimand remains unchanged, and the estimator remains unbiased. But the variance of the estimator is larger.
When $m$ is underestimated, the estimand is the $m$-misspecified estimand, and the estimator is unbiased for this $m$-misspecified estimand.

(ii) Quality of Corollary 1. As we increase $\delta$, the variance of the randomization distribution also increases. The two conservative estimators of the randomization variance are very close to the true variance, which suggests that Corollary 1 approximates the true variance quite well. Even though the second upper bound $\mathrm{Var}^{U2}(\hat{\tau}_p)$ is larger than the first one $\mathrm{Var}^{U1}(\hat{\tau}_p)$, its estimator $\hat{\sigma}^2_{U2}$ turns out to be smaller than $\hat{\sigma}^2_{U1}$ in most cases.

Figure 4. Approximate normality of the randomization distributions in all 9 cases. Panels: (a) $m$ correctly specified, $\delta = 1$; (b) $m$ correctly specified, $\delta = 2$; (c) $m$ correctly specified, $\delta = 3$; (d) $m$ overestimated, $\delta = 1$; (e) $m$ overestimated, $\delta = 2$; (f) $m$ overestimated, $\delta = 3$; (g) $m$ underestimated, $\delta = 1$; (h) $m$ underestimated, $\delta = 2$; (i) $m$ underestimated, $\delta = 3$. The red vertical lines are the expected values of the randomization distributions.

We run simulations based on the outcome model in (15). We vary $T$ over a range of horizon lengths. We consider $p = m = 2$, where $m$ is correctly specified. Similar to Section 6.3, we consider the same parameterization and conduct experiments under 3 different scenarios $\delta \in \{1, 2, 3\}$.

We simulate one assignment path at a time, and conduct the experiment following this assignment path. We first calculate the observed outcomes and the Horvitz-Thompson estimator. Then we apply the two inference methods proposed in Section 4 (for the asymptotic inference method, we plug in the second upper bound $\hat{\sigma}^2_{U2}$) and obtain two estimated $p$-values. We reject the corresponding null hypothesis when the $p$-value is smaller than 0.1. By repeating this procedure enough times (1,000 in this simulation), we obtain the frequency with which a null hypothesis is rejected, which we refer to as the rejection rate.
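The rejection-rate procedure just described can be sketched as follows for the asymptotic test. This is an illustrative sketch under stated assumptions, not the code behind the reported results: the function name and default arguments are hypothetical, each randomization point is taken to be a fair coin flip, the outcome model is (15) with $\mu = 0$, $\alpha_t = \log(t)$ and $\delta^{(1)} = \delta^{(2)} = \delta^{(3)} = \delta$, the design is the optimal one for $T = nm$, and the variance is replaced by $\hat{\sigma}^2_{U2}$ as written in Corollary 1.

```python
import math

import numpy as np


def rejection_rate(T=120, m=2, delta=1.0, reps=200, level=0.1, seed=0):
    """Share of simulated experiments in which the asymptotic test rejects."""
    rng = np.random.default_rng(seed)
    n = T // m
    if n * m != T or n < 4:
        raise ValueError("this sketch only covers T = n*m with n >= 4")
    points = np.array([1] + [k * m + 1 for k in range(2, n - 1)])
    lengths = np.diff(np.append(points, T + 1))  # epoch lengths
    periods = np.arange(1, T + 1)
    epoch = np.searchsorted(points, periods, side="right")
    rejections = 0
    for _ in range(reps):
        # One fair coin per epoch, held for the whole epoch.
        w = np.repeat(rng.integers(0, 2, size=len(points)), lengths)
        # Outcome model (15): effects of w_t, w_{t-1}, w_{t-2} plus noise.
        carry = sum(np.concatenate([np.zeros(j, dtype=int), w[:T - j]])
                    for j in range(3))
        y = np.log(periods) + delta * carry + rng.standard_normal(T)
        # Horvitz-Thompson estimate of the lag-m effect.
        tau_hat = 0.0
        for t in range(m, T):
            prob = 0.5 ** (epoch[t] - epoch[t - m] + 1)
            window = w[t - m:t + 1]
            if window.all():
                tau_hat += y[t] / prob
            elif not window.any():
                tau_hat -= y[t] / prob
        tau_hat /= T - m
        # Conservative variance estimate (second bound of Corollary 1).
        ybar = y[m:].reshape(n - 1, m).sum(axis=1)
        same = w[m:(n - 2) * m:m] == w[2 * m:(n - 1) * m:m]
        sigma2 = (8.0 * ybar[0] ** 2 + 32.0 * np.sum(ybar[1:n - 2] ** 2 * same)
                  + 8.0 * ybar[n - 2] ** 2) / (T - m) ** 2
        # Two-sided p-value 2 - 2*Phi(z).
        z = abs(tau_hat) / math.sqrt(sigma2)
        p_value = 2.0 - 2.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        rejections += p_value < level
    return rejections / reps
```

Calling `rejection_rate` for several values of `T` and `delta` traces out curves of the kind described for this experiment; the exact test could be substituted at the $p$-value step at a higher computational cost.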
We calculate the rejection rates via simulation, and plot them in Figure 5. In all the simulations, $\delta \neq 0$ and $\tau_p \neq 0$, so ideally we would wish to reject the null hypothesis (whether it is (9) or (10)).

Figure 5. Rejection rates and their dependence on $T/m$. The blue dots are rejection rates under exact inference; the red dots are under asymptotic inference. Left: $\delta = 1$; Middle: $\delta = 2$; Right: $\delta = 3$.

From Figure 5 we make the following three observations.

(i) Dependence on $T/m$. The rejection rates increase as the length of the horizon increases or, more specifically, as $T/m$, the total number of epochs, increases. In practice, when firms choose the length $T$ and decide how much experimental budget to allocate, they can refer to Figure 5 to choose $T$ properly. Also see the discussion in Section 7.2.

(ii) Between two inference methods. In all three cases, the rejection rate from testing the sharp null hypothesis (9) is slightly higher than that from testing Neyman's null (10). This coincides with our intuition that a sharp null is more likely to be rejected. We discuss this in Section 6.5.2 together with the associated $p$-values.

(iii) Dependence on the signal-to-noise ratio. The rejection rates all increase as $\delta$ increases from 1 to 3 (while holding the noise in the model fixed). This suggests that when the treatment effect is relatively larger, we do not require a long experimental horizon to achieve a desired rejection rate.

We run simulations based on the outcome model in (15). We consider $T = 120$, $m = 2$. We consider three cases: (i) $m$ is correctly specified, $p = 2$; (ii) $m$ is overestimated, $p = 3$, and we estimate the lag-3 causal estimand as in (1); (iii) $m$ is underestimated, $p = 1$, and we proceed as if we were estimating the lag-1 causal estimand. However, the lag-1 causal estimand is not well defined.
Instead, we estimate the 2-misspecified lag-1 causal estimand as in (13).

For the outcome model, we consider the same parameterization as in Section 6.3, and conduct experiments under 3 different scenarios $\delta \in \{1, 2, 3\}$.

We simulate only one assignment path. Since the outcome model is prescribed, we calculate the observed outcomes. There is only one time series of such observed outcomes. We calculate the Horvitz-Thompson estimator based on the simulated assignment path and the simulated observed outcomes. We calculate the lag-$p$ causal estimand directly, and the $m$-misspecified lag-$p$ causal estimand in conjunction with the simulated assignment path. Finally, we perform the two inference methods from Section 4, and report their associated estimated $p$-values. The conservative sampling variance we use is $\hat{\sigma}^2_{U2}$. We choose $I = 100000$ as the number of samples drawn in the exact inference method shown in Algorithm 1.

Notice that this is only one experiment under one simulated experimental setup from one simulated assignment path. So the estimates $\hat{\tau}_p$ we obtain differ from $\tau_p$, but they still roughly follow the true causal effects that they estimate. See Table 5.

Table 5. Simulation results for correctly specified $m$, overestimated $m$, and underestimated $m$.

                           τ_p   τ_p^[m]   τ̂_p   σ̂²_U2   p̂_F   p̂_N
correct m         δ = 1    3     −         …     …       …     …
                  δ = 2    6     −         …     …       …     …
                  δ = 3    9     −         …     …       …     …
overestimated m   δ = 1    3     −         …     …       …     …
                  δ = 2    6     −         …     …       …     …
                  δ = 3    9     −         …     …       …     …
underestimated m  δ = 1    −     …         …     …       …     …
                  δ = 2    −     …         …     …       …     …
                  δ = 3    −     …         …     …       …     …

The simulation setup for the three $\delta = 1$ cases is the same; so are the $\delta = 2$ cases and the $\delta = 3$ cases. The estimated $p$-values derived from the exact inference are slightly smaller than the $p$-values derived from the asymptotic inference.

From Table 5 we see that both our estimator and the estimated variance are well defined in all cases, whether $m$ is correctly specified, overestimated, or underestimated. In each case, as $\delta$ increases from 1 to 3, the associated $p$-values exhibit decreasing trends, suggesting stronger evidence against the null hypothesis.
Moreover, the $p$-values from the exact inference are always slightly smaller than the $p$-values from the asymptotic inference. This coincides with our intuition that: (i) the exact inference method tests a stronger null hypothesis (9), which implies the null hypothesis (10); and (ii) in the asymptotic inference we replace the true randomization variance by its conservative upper bound, which further leads to a larger $p$-value.

7. Practical Implications

We recap what a manager would do to run a switchback experiment in practice. First, the granularity and the length of the horizon are sometimes given to the manager; when the manager has more flexibility to choose them, see the discussion in Section 7.2. Second, the manager either consults domain knowledge to decide the order of the carryover effect or, when such knowledge is imperfect, runs a first-phase experiment to identify the order. See the discussion in Section 7.1. Third, the manager decides on a collection of randomization points, and draws one sample from the randomization distribution to be the assignment path. We recommend the optimal design discussed in Section 3. Finally, the manager collects data from the experiment, and draws causal conclusions using the methods in Section 4.

We borrow Theorem 5 and Corollary 2 to define a sub-routine which, combined with a search method, identifies the order of the carryover effect.

Suppose we have access to two i.i.d. experimental units. These could be two identical units. They could also be two time spans on one single experimental unit, such that the two spans are well separated and the carryover effect from one does not affect the outcomes of the other.

On the first experimental unit, we design an optimal experiment under $p = p_1$; on the second unit, under $p = p_2$. Without loss of generality, let $p_1 < p_2$. We observe the outcomes, collect the data, and find the following statistics from the two experiments.
In the first one, calculate $\hat{\tau}_{p_1}$, the sample average, and $\hat{\sigma}^2_{p_1}$, the conservative sampling variance as suggested by Corollary 1. In the second one, calculate $\hat{\tau}_{p_2}$ and $\hat{\sigma}^2_{p_2}$.

Define a sub-routine that tests the following null hypothesis:

$$H_0: m \le p_1 \tag{17}$$

Under the null hypothesis (17), $\tau_{p_1} = \tau_{p_2}$. Furthermore, given that the two experimental units are independent, the difference between the two sample means should be approximately normal and centered around zero, i.e.,

$$\frac{\hat{\tau}_{p_1} - \hat{\tau}_{p_2}}{\sqrt{\mathrm{Var}(\hat{\tau}_{p_1}) + \mathrm{Var}(\hat{\tau}_{p_2})}} \xrightarrow{D} \mathcal{N}(0, 1).$$

Define the test statistic to be $z = |\hat{\tau}_{p_1} - \hat{\tau}_{p_2}| / \sqrt{\hat{\sigma}^2_{p_1} + \hat{\sigma}^2_{p_2}}$. The estimated $p$-value is given by $\hat{p} = 2 - 2\Phi(z)$, where $\Phi$ is the CDF of a standard normal distribution.

This sub-routine enables us to test the null hypothesis (17). We can combine it with any search method to identify $m$.

As an example, suppose we are running an experiment whose setup is as in Section 6.5, with $\delta = 3$, and suppose we have already narrowed down the range of the order of the carryover effect. In the first round, we use the sub-routine to test the null hypothesis $m \le 2$. We would observe rows 3 and 6 of Table 5, with $\hat{\tau}_2 = 7.\ldots$, $\hat{\sigma}^2_2 = 23.88$; $\hat{\tau}_3 = 8.\ldots$, $\hat{\sigma}^2_3 = 39.00$. So the estimated $p$-value for the null hypothesis $m \le 2$ is $\hat{p} = 0.\ldots$. In the second round, we test the null hypothesis $m \le 1$, with $\hat{\tau}_1 = 1.\ldots$, $\hat{\sigma}^2_1 = 9.47$; $\hat{\tau}_2 = 7.\ldots$, $\hat{\sigma}^2_2 = 23.\ldots$. The estimated $p$-value for the null hypothesis $m \le 1$ is $\hat{p} = 0.\ldots$.

The selection of $T$ depends on the following two components: first, the granularity of one single time period; and second, the total number of epochs $T/p$ included in the time horizon.

The granularity of each time period refers to how long a single time period lasts in the physical world. As long as each time period is granular enough that its length is smaller than the length of the carryover effect, and the length of the carryover effect is an integer multiple of the length of one time period, the choice of granularity makes no difference to the optimal design and analysis of switchback experiments.
See Example 11.

Example 11 (Two Granularity Levels). In the ride-sharing application, suppose the firm has two options: to treat a single time period as either 15 minutes or 30 minutes. See Figure 6. Each smallest time period stands for 15 minutes, and the carryover effect lasts for 1 hour. In the granularity shown in blue, each time period lasts for 15 minutes, and the carryover effect lasts for m = 4 time periods. In the granularity shown in red, each time period lasts for 30 minutes, and the carryover effect lasts for m = 2 time periods.

Figure 6 Illustration of two granularity levels. Blue: each period is 15 minutes; Red: each period is 30 minutes.

From Theorem 3, the optimal design randomizes once every m time periods (except for the first and last epochs, which last for 2m time periods each). In both cases, the optimal design would randomize once every hour. Furthermore, from Theorem 1 we know that in both cases the mean of the Horvitz-Thompson estimator remains unchanged. From Theorem 4, the variance consists of terms like $\bar{Y}_k(\mathbf{1}_{m+1}) = \sum_{t=(k+1)m+1}^{(k+2)m} Y_t(\mathbf{1}_{m+1})$, which are sums of all the outcomes within 1 hour. So in both cases the variance of the estimator remains unchanged. □

On the other hand, when the length of each time period is longer than the length of the carryover effect, the length of the carryover effect is unnecessarily overestimated. In an extreme case, the carryover effect lasts 1 minute while each period is chosen to be 1 hour. Then the minimum order of carryover effect would be 1, which overestimates the length of the carryover effect as 1 hour. So we suggest that the length of each time period be smaller than the length of the carryover effect.

Second, after we have decided on the granularity of each time period, we can consult the procedures from Section 7.1 to identify m.
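The sub-routine of Section 7.1 referred to here can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the two point estimates and their conservative variance estimates have already been computed from the two experiments, it uses the two-sided normal p-value $2 - 2\Phi(z)$, and the function names and numeric inputs are hypothetical rather than taken from the paper or its tables.

```python
import math


def standard_normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def order_pvalue(tau_hat_p1, var_hat_p1, tau_hat_p2, var_hat_p2):
    """Test H0: m <= p1 by comparing the estimates from two independent experiments.

    tau_hat_*: point estimates from the designs built under p1 and p2.
    var_hat_*: the corresponding conservative variance estimates.
    Returns the test statistic z and the two-sided p-value 2 - 2*Phi(z).
    """
    total_var = var_hat_p1 + var_hat_p2
    if total_var <= 0:
        raise ValueError("the summed variance estimate must be positive")
    z = abs(tau_hat_p1 - tau_hat_p2) / math.sqrt(total_var)
    p_value = 2.0 - 2.0 * standard_normal_cdf(z)
    return z, p_value


# Hypothetical inputs, chosen only to exercise the function.
z, p_value = order_pvalue(1.0, 9.0, 7.0, 16.0)
print(round(z, 3), round(p_value, 3))
```

A search over candidate orders would call `order_pvalue` once per pair of candidates; a small p-value is evidence against the hypothesis that the smaller candidate already covers the carryover effect.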
The next choice is how long the experiment should last, n = T/m. We choose n by referring to the rejection rate curves shown in Section 6.4. We first decide which inference method to use (exact inference or asymptotic inference). Then we use domain knowledge to get a sense of the signal-to-noise ratio. Finally, we choose a desired rejection rate and read off the length of horizon required.

8. Concluding Remarks, Limitations and Future Work

We studied the design and analysis of switchback experiments. We formulated and solved a minimax problem for the design of optimal switchback experiments. We then analyzed our proposed optimal design and proposed two inferential methods. In particular, we showed asymptotic normality of the randomization distribution. We discussed cases when the order of carryover effect, m, is misspecified, and detailed a method to identify the order of carryover effect. We gave empirical suggestions that a slight overestimate of m is acceptable and better than an underestimate.

We point out two limitations of our paper. First, when the order of carryover effect m is comparable to the length of horizon T, our method, though still unbiased in theory, incurs a huge variance that typically prevents the experimental designer from making any inference. This is because our method is general and requires a minimal amount of modeling assumptions. When the outcome model has some structure, for example the equilibrium effects in Wager and Xu (2019), exploiting such structure will lead to different designs. Second, our method does not take advantage of covariate information to further reduce the randomization variance. This could be addressed in future work.

Finally, we encourage empirical researchers who apply our method to use domain knowledge to narrow down m first, before using the procedure in Section 7.1 to identify m.
This is because, empirically, each run of the sub-routine that tests (17) consumes experimental resources on the scale of T/p ≈ 100 to distinguish two candidate values well, which can be costly when resources are scarce.

References

Abadie A, Athey S, Imbens GW, Wooldridge JM (2020) Sampling-based versus design-based uncertainty in regression analysis. Econometrica Journal of Marketing Research Panel data econometrics (Oxford university press). Athey S, Eckles D, Imbens GW (2018) Exact p-values for network interference. Journal of the American Statistical Association Available at SSRN 3171224. Bai Y (2019) Optimality of matched-pair designs in randomized control trials. Available at SSRN 3483834. Basse G, Ding Y, Toulis P (2019) Minimax crossover designs. arXiv preprint arXiv:1908.03531. Bell DR, Chiang J, Padmanabhan V (1999) The decomposition of promotional response: An empirical generalization. Marketing Science Statistical decision theory and Bayesian analysis (Springer Science & Business Media). Bickel PJ, Doksum KA (2015) Mathematical statistics: basic ideas and selected topics, volume I, volume 117 (CRC Press). Bojinov I, Rambachan A, Shephard N (2020a) Panel experiments and dynamic causal effects: A finite population perspective. arXiv preprint arXiv:2003.09915. Bojinov I, Saint-Jacques G, Tingley M (2020b) Avoid the pitfalls of a/b testing make sure your experiments recognize customers’ varying needs. Harvard Business Review Journal of the American Statistical Association Journal of the American Statistical Association Operations Research Journal of econometrics International Conference on Machine Learning, 1194–1203 (PMLR). Eckles D, Karrer B, Ugander J (2016) Design and analysis of experiments in networks: Reducing bias from interference. 
Journal of Causal Inference HarvardBusiness School Case Manufacturing & Service Operations Management arXiv preprint arXiv:1905.07544 .Glynn P, Johari R, Rasouli M (2020) Adaptive experimental design with temporal interference: A maximumlikelihood approach. arXiv preprint arXiv:2006.05591 .Hadad V, Hirshberg DA, Zhan R, Wager S, Athey S (2019) Confidence intervals for policy evaluation inadaptive experiments. arXiv preprint arXiv:1911.02768 .Hedayat A, Afsarinejad K, et al. (1978) Repeated measurements designs, ii. The Annals of Statistics Duke Mathe-matical Journal American Journal of Political Science http://dx.doi.org/10.1111/ajps.12417 .Imbens GW, Rubin DB (2015) Causal Inference for Statistics, Social, and Biomedical Sciences: An Intro-duction (Cambridge University Press), URL http://dx.doi.org/10.1017/CBO9781139025751 .Johari R, Li H, Weintraub G (2020) Experimental design in two-sided platforms: An analysis of bias. arXivpreprint arXiv:2002.05670 .Jones B, Kenward MG (2014) Design and analysis of cross-over trials (CRC press).Kastelman D, Ramesh R (2018) Switchback tests and randomized experimentation under network effects atdoordash. URL: https://medium.com/@DoorDash/switchback-tests-and-randomized-experimentation-under-network-effects-at-doordash-f1d938ab7c2a .Kohavi R, Crook T, Longbotham R, Frasca B, Henne R, Ferres JL, Melamed T (2009) Online experimentationat microsoft. Data Mining Case Studies Proceedings of the 13th ACM SIGKDD international conference onKnowledge discovery and data mining , 959–967.Kohavi R, Tang D, Xu Y (2020) Trustworthy Online Controlled Experiments: A Practical Guide to A/BTesting (Cambridge University Press), URL http://dx.doi.org/10.1017/9781108653985 .Kohavi R, Thomke S (2017) The surprising power of online experiments. Harvard Business Review Statistics in Medicine k factorial experiments. 
The Annals of Statistics Personalized medicine Forthcoming at ManagementScience .March JG (1991) Exploration and exploitation in organizational learning. Organization science International Conference on Artificial Intelligence and Statistics , 1261–1269.Oman SD, Seiden E (1988) Switch-back designs. Biometrika arXiv preprint arXiv:1910.10862 .Rambachan A, Shephard N (2019) Econometric analysis of potential outcomes time series: instruments,shocks, linearity and the causal response function. arXiv preprint arXiv:1903.01637 .Robins J (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling Journal of the American Statistical Association Statistics & probability letters Academy of management Review Statistics in Medicine Research in organizational behavior Journal of the American Statistical Association http://dx.doi.org/10.1080/01621459.2011.646917 .Sussman DL, Airoldi EM (2017) Elements of estimation theory for causal effects in the presence of networkinterference. arXiv preprint arXiv:1702.03578 .Thomke S (2001) Enlightened experimentation. the new imperative for innovation. Harvard Business Review Experimentation Works: The Surprising Power of Business Experiments (Harvard Busi-ness Press).Wager S, Xu K (2019) Experimenting in equilibrium. arXiv preprint arXiv:1903.02124 .Wu CF (1981) On the robustness and efficiency of some randomized designs. The Annals of Statistics -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec1 Online Appendix EC.1. Recap of Notations Within this paper, let N , N be the set of positive integers and non-negative integers, respectively.For any T ∈ N , let [ T ] = { , ..., T } be the set of positive integers no larger than T . For any t < t (cid:48) ∈ N ,let { t : t (cid:48) } = { t, t + 1 , ..., t (cid:48) } be the set of integers between (including) t and t (cid:48) . 
For any $m \in \mathbb{N}$, let $\mathbf{1}_m = (1, 1, ..., 1)$ be a vector of m ones; let $\mathbf{0}_m = (0, 0, ..., 0)$ be a vector of m zeros. We use parentheses for probabilities, i.e., $\Pr(\cdot)$; brackets for expectations, i.e., $E[\cdot]$; and curly brackets for indicators, i.e., $\mathbb{1}\{\cdot\}$. For any $a \in \mathbb{R}$, let $(a)^+ = \max\{a, 0\}$.

EC.2. Theorems Used

We summarize here the preliminaries that we use directly in our proofs.

Definition EC.1 (φ-Dependent Random Variables, Hoeffding and Robbins (1948)). For any sequence $\{X_1, X_2, ...\}$, if there exists φ such that for any $s - r > \varphi$, the two sets $(X_1, X_2, ..., X_r)$, $(X_s, X_{s+1}, ..., X_n)$ are independent, then the sequence is said to be φ-dependent.

Lemma EC.1 (Romano and Wolf (2000), Theorem 2.1). Let $\{X_{n,i}\}$ be a triangular array of zero-mean random variables. Let $\varphi \in \mathbb{N}$ be a fixed constant. For each $n = 1, 2, ...$, let $d = d_n$, and suppose that $X_{n,1}, X_{n,2}, ..., X_{n,d}$ is a φ-dependent sequence of random variables. Define

$$B^2_{n,k,a} = \mathrm{Var}\left(\sum_{i=a}^{a+k-1} X_{n,i}\right), \qquad B^2_n = B^2_{n,d,1} = \mathrm{Var}\left(\sum_{i=1}^{d} X_{n,i}\right).$$

For some $\delta > 0$ and $-1 \le \gamma \le 1$, if the following conditions hold:
1. $E|X_{n,i}|^{2+\delta} \le \Delta_n$, for all i;
2. $B^2_{n,k,a} / k^{1+\gamma} \le K_n$, for all a and $k \ge \varphi$;
3. $B^2_n / (d \varphi^{\gamma}) \ge L_n$;
4. $K_n / L_n = O(1)$;
5. $\Delta_n / L_n^{(2+\delta)/2} = O(1)$,
then

$$\frac{\sum_{i=1}^{d} X_{n,i}}{B_n} \xrightarrow{D} N(0, 1).$$

We explain Lemma EC.1. The $\xrightarrow{D}$ notation stands for convergence in distribution. The definition of a sequence of φ-dependent random variables is given in Definition EC.1. To check whether the conditions in Lemma EC.1 hold, we will first calculate $B^2_{n,k,a}$ for any k and a, and then construct suitable $\Delta_n$, $K_n$, and $L_n$.

EC.3. Proof of Unbiasedness of the Horvitz-Thompson Estimator

Proof of Theorem 1. First observe that for regular switchback experiments, both $0 < \Pr(W_{t-p:t} = \mathbf{1}_{p+1}), \Pr(W_{t-p:t} = \mathbf{0}_{p+1}) < 1$. 
So Assumption 4 is naturally satisfied.So for any t ∈ { p + 1 : T } , with probability Pr( W t − p : t = p +1 ) (cid:54) = 0, { W t − p : t = p +1 } = 1, and Y obs t = Y t ( p +1 ). So E (cid:104) Y obs t { W t − p : t = p +1 } Pr( W t − p : t = p +1 ) (cid:105) = Y t ( m +1 ). Similarly E (cid:104) Y obs t { W t − p : t = p +1 } Pr( W t − p : t = p +1 ) (cid:105) = Y t ( p +1 ).Sum them up for any t ∈ { p + 1 : T } we finish the proof. (cid:3) EC.4. Proofs and Discussions from Section 3 In Section 3 we focus on the case when p = m . Throughout this section in the appendix, we useonly m instead of p . EC.4.1. Extra Notations Used in the Proofs from Section 3 For any t ∈ { m + 1 : T } , denote t ( T , Y ) = Y t ( m +1 ) (cid:34) { W t − m : t = ( m +1 ) } · · m (cid:89) l =1 { t − l +1 ∈ T } − (cid:35) − Y t ( m +1 ) (cid:34) { W t − m : t = ( m +1 ) } · · m (cid:89) l =1 { t − l +1 ∈ T } − (cid:35) (EC.1)where we use 2 (cid:81) ml =1 { t − l +1 ∈ T } to calculate the inverse propensity score. When T and Y are clearfrom the context we omit them and use t for t ( T , Y ).Using the above notation, we could re-writeˆ τ m − τ m = 1 T − m T (cid:88) t = m +1 t ( T , Y )Note that ∀ t ∈ { m + 1 , m + 2 , ..., T } , E [ t ( T , Y )] = 0 . (EC.2)Recall that any regular switchback experiment can be represented by T = { t , t , ..., t K } ⊆ [ T ].Define f T : [ T ] → T to be the “determining randomization point of period t ”, i.e. f T ( t ) = max { j | j ∈ T , j ≤ t } such that W f T ( t ) uniquely determines the distribution of W t , i.e. W t = W f T ( t ) . When T is clear fromthe context we also omit the subscript and use f ( t ) for f T ( t ).Similarly, we define f m T ( t ) : [ T ] → { , } T , which maps a time period to a subset of the set T , tobe the “determining randomization points of periods { t − m, t − m + 1 , ..., t } ”, i.e. 
f m T ( t ) = { j |∃ i ∈ { t − m, ..., t } , s.t.j = f T ( i ) } -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec3 such that f m T ( t ) ⊆ T ⊆ [ T ]. And f m T ( t ) contains all the time periods whose coin flips determine thedistributions of W t − m , W t − m +1 , ..., W t . Denote | f m T ( t ) | = J . We keep in mind that J depends on m, t and T , yet they are all omitted for brevity. Since W t − m , W t − m +1 , ..., W t are determined by at leastone randomization point f ( t − m ), we know that f m T ( t ) (cid:54) = ∅ is non-empty, i.e., | f m T ( t ) | = J ≥ . (EC.3)Finally, define “overlapping randomization points of periods { t − m, t − m + 1 , ..., t } and { t (cid:48) − m, t (cid:48) − m + 1 , ..., t (cid:48) } ” to be O T ( t, t (cid:48) ) = f m T ( t ) ∩ f m T ( t (cid:48) )Denote | O T ( t, t (cid:48) ) | = J o . We keep in mind that J o depends on m, t, t (cid:48) and T , yet they are all omittedfor brevity. EC.4.2. Lemma 1: Adversarial Selection of Potential OutcomesEC.4.2.1. Preliminaries. We first introduce two Lemmas for the proof of Lemma 1. Lemma EC.2. Under Assumptions 2–3, for any t ∈ [ T ] , let | f m T ( t ) | = J . E [ t ] =(2 J − Y t ( m +1 ) + 2 Y t ( m +1 ) Y t ( m +1 ) + (2 J − Y t ( m +1 ) . (EC.4) Proof of Lemma EC.2. Denote | f m T ( t ) | = J . Let the elements be f m T ( t ) = { u , u , ..., u J } . Let u < u < ... 
< u J .Using the notations defined earlier in Section EC.4, E [ t ] = Pr ( W t − m : t = m +1 ) · (cid:8) Y t ( m +1 )(2 J · − − Y t ( m +1 )(2 J · − (cid:9) + Pr ( W t − m : t = m +1 ) · (cid:8) Y t ( m +1 )(2 J · − − Y t ( m +1 )(2 J · − (cid:9) + Pr ( W t − m : t (cid:54) = J or J ) · (cid:8) Y t ( m +1 )(2 J · − − Y t ( m +1 )(2 J · − (cid:9) = Pr (( W u , ..., W u J ) = J ) · (cid:8) (2 J − Y t ( m +1 ) + Y t ( m +1 ) (cid:9) + Pr (( W u , ..., W u J ) = J ) · (cid:8) − Y t ( m +1 ) − (2 J − Y t ( m +1 ) (cid:9) + Pr (( W u , ..., W u J ) (cid:54) = J or J ) · {− Y t ( m +1 ) + Y t ( m +1 ) } =(2 J − Y t ( m +1 ) + 2 Y t ( m +1 ) Y t ( m +1 ) + (2 J − Y t ( m +1 ) which finishes the proof. (cid:3) Lemma EC.3. Under Assumptions 2–3, for any t < t (cid:48) ∈ [ T ] , when | O T ( t, t (cid:48) ) | = J o = 0 , E [ t t (cid:48) ] =0 . (EC.5) When | O T ( t, t (cid:48) ) | = J o ≥ , E [ t t (cid:48) ] =(2 J o − Y t ( m +1 ) Y t (cid:48) ( m +1 ) + Y t ( m +1 ) Y t (cid:48) ( m +1 )+ Y t ( m +1 ) Y t (cid:48) ( m +1 ) + (2 J o − Y t ( m +1 ) Y t (cid:48) ( m +1 ) . (EC.6) c4 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments Proof of Lemma EC.3. Denote | f m T ( t ) | = J , | f m T ( t (cid:48) ) | = J (cid:48) , and | O T ( t, t (cid:48) ) | = J o . Let the elementsbe f m T ( t ) = { u , u , ..., u J } , f m T ( t (cid:48) ) = { u (cid:48) , u (cid:48) , ..., u (cid:48) J (cid:48) } , and O T ( t, t (cid:48) ) = { u o , u o , ..., u o J o } . Let u < u <... < u J , u (cid:48) < u (cid:48) < ... < u (cid:48) J (cid:48) , and u o < u o < ... < u o J o .One time period could have different numberings in f m T ( t ), f m T ( t (cid:48) ), and O T ( t, t (cid:48) ). For example, u J − J o +1 = u (cid:48) = u o , and u J = u (cid:48) J o = u o J o . See Table EC.1 for an illustrator of the determining ran-domization points and the overlapping randomization points. 
Table EC.1 Illustrator of the determining randomization points and the overlapping randomization points. u u ... u J − J o +1 ... u J u o ... u o J o u (cid:48) ... u (cid:48) J o u (cid:48) J o +1 ... u (cid:48) J (cid:48) Each columns stands for one time period. The first row stands for the determining randomization points of f m T ( t ); the second row for the overlapping randomization points of O T ( t, t (cid:48) ); and the third row for the determiningrandomization points of f m T ( t (cid:48) ). First, when J o = 0, this implies that t and t (cid:48) are independent. Then E [ t t (cid:48) ] = E [ t ] E [ t (cid:48) ] = 0,where the second equality is due to (EC.2).When J o ≥ 1, this implies that t and t (cid:48) are correlated. Using the notations defined above, E [ t t (cid:48) ] = E W u o ,...,W u o J o (cid:104) E (cid:104) t t (cid:48) (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105)(cid:105) (EC.7)= Pr (cid:16) ( W u o , ..., W u o J o ) = J o (cid:17) E (cid:104) t t (cid:48) (cid:12)(cid:12)(cid:12) ( W u o , ..., W u o J o ) = J o (cid:105) + Pr (cid:16) ( W u o , ..., W u o J o ) = J o (cid:17) E (cid:104) t t (cid:48) (cid:12)(cid:12)(cid:12) ( W u o , ..., W u o J o ) = J o (cid:105) + Pr (cid:16) ( W u o , ..., W u o J o ) (cid:54) = J o or J o (cid:17) E (cid:104) t t (cid:48) (cid:12)(cid:12)(cid:12) ( W u o , ..., W u o J o ) (cid:54) = J o or J o (cid:105) Next we go over the three cases of ( W u o , ..., W u o J o ) as decomposed above. 
Note that conditionalon ( W u o , ..., W u o J o ), t and t (cid:48) are independent, i.e., E (cid:104) t t (cid:48) (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) = E (cid:104) t (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) E (cid:104) t (cid:48) (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) (1) With probability 1 / J o , ( W u o , ..., W u o J o ) = J o , in which case E (cid:104) t (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) = Pr ( W t − m : t = m +1 ) · (cid:8) Y t ( m +1 )(2 J · − 1) + Y t ( m +1 ) (cid:9) + Pr ( W t − m : t (cid:54) = m +1 ) · (cid:8) Y t ( m +1 )(2 J · − 1) + Y t ( m +1 ) (cid:9) = Pr (cid:0) ( W u , W u , ..., W u J − J o ) = J − J o (cid:1) · (cid:8) Y t ( m +1 )(2 J − 1) + Y t ( m +1 ) (cid:9) + Pr (cid:0) ( W u , W u , ..., W u J − J o ) (cid:54) = J − J o (cid:1) · {− Y t ( m +1 ) + Y t ( m +1 ) } = 12 J − J o · (cid:8) Y t ( m +1 )(2 J − 1) + Y t ( m +1 ) (cid:9) + (cid:18) − J − J o (cid:19) · {− Y t ( m +1 ) + Y t ( m +1 ) } =(2 J o − Y t ( m +1 ) + Y t ( m +1 ) -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec5 where the third equality is due to (2). 
Similarly, E (cid:104) t (cid:48) (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) = Pr ( W t (cid:48) − m : t (cid:48) = m +1 ) · (cid:110) Y t (cid:48) ( m +1 )(2 J (cid:48) · − 1) + Y t (cid:48) ( m +1 ) (cid:111) + Pr ( W t (cid:48) − m : t (cid:48) (cid:54) = m +1 ) · (cid:110) Y t (cid:48) ( m +1 )(2 J (cid:48) · − 1) + Y t (cid:48) ( m +1 ) (cid:111) = Pr (cid:16) ( W u (cid:48) J o +1 , W u (cid:48) J o +2 , ..., W u (cid:48) J (cid:48) ) = J (cid:48) − J o (cid:17) · (cid:110) Y t (cid:48) ( m +1 )(2 J (cid:48) − 1) + Y t (cid:48) ( m +1 ) (cid:111) + Pr (cid:16) ( W u (cid:48) J o +1 , W u (cid:48) J o +2 , ..., W u (cid:48) J (cid:48) ) (cid:54) = J (cid:48) − J o (cid:17) · {− Y t (cid:48) ( m +1 ) + Y t (cid:48) ( m +1 ) } = 12 J (cid:48) − J o · (cid:110) Y t (cid:48) ( m +1 )(2 J (cid:48) − 1) + Y t (cid:48) ( m +1 ) (cid:111) + (cid:18) − J (cid:48) − J o (cid:19) · {− Y t (cid:48) ( m +1 ) + Y t (cid:48) ( m +1 ) } =(2 J o − Y t (cid:48) ( m +1 ) + Y t (cid:48) ( m +1 ) (2) With probability 1 / J o , ( W u o , ..., W u o J o ) = J o , in which case E (cid:104) t (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) = Pr ( W t − m : t = m +1 ) · (cid:8) − Y t ( m +1 ) − Y t ( m +1 )(2 J · − (cid:9) + Pr ( W t − m : t (cid:54) = m +1 ) · (cid:8) − Y t ( m +1 ) − Y t ( m +1 )(2 J · − (cid:9) = 12 J − J o · (cid:8) − Y t ( m +1 ) − Y t ( m +1 )(2 J − (cid:9) + (cid:18) − J − J o (cid:19) · {− Y t ( m +1 ) + Y t ( m +1 ) } = − Y t ( m +1 ) − (2 J o − Y t ( m +1 ) E (cid:104) t (cid:48) (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) = Pr ( W t (cid:48) − m : t (cid:48) = m +1 ) · (cid:110) − Y t (cid:48) ( m +1 ) − Y t (cid:48) ( m +1 )(2 J (cid:48) · − (cid:111) + Pr ( W t (cid:48) − m : t (cid:48) (cid:54) = m +1 ) · (cid:110) − Y t (cid:48) ( m +1 ) − Y t (cid:48) ( m +1 )(2 J (cid:48) · − (cid:111) = 12 J (cid:48) − J o · (cid:110) − Y t (cid:48) ( m +1 ) − Y t (cid:48) ( m +1 )(2 J (cid:48) − (cid:111) + (cid:18) − J (cid:48) 
− J o (cid:19) · {− Y t (cid:48) ( m +1 ) + Y t (cid:48) ( m +1 ) } = − Y t (cid:48) ( m +1 ) − (2 J o − Y t (cid:48) ( m +1 ) (3) With probability 1 − · (1 / J o ), ( W u o , ..., W u o J o ) (cid:54) = J o or J o , in which case E (cid:104) t (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) = − Y t ( m +1 ) + Y t ( m +1 ) E (cid:104) t (cid:48) (cid:12)(cid:12)(cid:12) W u o , ..., W u o J o (cid:105) = − Y t (cid:48) ( m +1 ) + Y t (cid:48) ( m +1 )Finally, putting all above together into (EC.7), we have E [ t t (cid:48) ] = 12 J o · (cid:110) (2 J o − Y t ( m +1 ) + Y t ( m +1 ) (cid:111) · (cid:110) (2 J o − Y t (cid:48) ( m +1 ) + Y t (cid:48) ( m +1 ) (cid:111) + 12 J o · (cid:110) − Y t ( m +1 ) − (2 J o − Y t ( m +1 ) (cid:111) · (cid:110) − Y t (cid:48) ( m +1 ) − (2 J o − Y t (cid:48) ( m +1 ) (cid:111) + (cid:26) − J o (cid:27) · {− Y t ( m +1 ) + Y t ( m +1 ) } · {− Y t (cid:48) ( m +1 ) + Y t (cid:48) ( m +1 ) } =(2 J o − Y t ( m +1 ) Y t (cid:48) ( m +1 ) + Y t ( m +1 ) Y t (cid:48) ( m +1 ) + Y t ( m +1 ) Y t (cid:48) ( m +1 ) + (2 J o − Y t ( m +1 ) Y t (cid:48) ( m +1 )which finishes the proof. (cid:3) c6 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments EC.4.2.2. Proof of Lemma 1. The proof of Lemma 1 is through careful expansion of therisk function, the expected square loss. Proof of Lemma 1. From Lemma EC.2 and Lemma EC.3, all the terms are quadratic, andall the coefficients are non-negative. That is, after multiplying the constant ( T − m ) , for any T ∈ [ T ] , Y ∈ Y we can express in canonical form the following:( T − m ) · E W T ∼ η ( · ) (cid:104) (ˆ τ m ( W T , Y ) − τ m ( Y )) (cid:105) = T (cid:88) t = m +1 (cid:8) a t (11) Y t ( m +1 ) + a t (10) Y t ( m +1 ) Y t ( m +1 ) + a t (00) Y t ( m +1 ) (cid:9) + (cid:88) m +1 ≤ t 0, so the inequality is strict. Similarly, if ∃ t ∈ { m +1 , ..., T } such that − B < Y t ( m +1 ) < B , then combine a t (00) > 0, so the inequality is strict. (cid:3) EC.4.2.3. 
Implications of Lemma 1. Lemma 1 simplifies the minimax problem in (6).Instead of thinking it as a minimax problem, we can now replace Y by either Y + or Y − , and solveonly a minimization problem.Here we state Lemma EC.4 that is a direct implication of Lemma 1. It will be frequently usedlater on. Lemma EC.4. When Y = Y + or Y = Y − , under Assumptions 1–3, for any t ∈ [ T ] , E [ t ] =2 J +1 B . For any t < t (cid:48) ∈ [ T ] , when | O T ( t, t (cid:48) ) | = J o = 0 , E [ t t (cid:48) ] =0 When | O T ( t, t (cid:48) ) | = J o ≥ , E [ t t (cid:48) ] =2 J o +1 B Proof of Lemma EC.4. Replace Y t ( m +1 ) = Y t ( m +1 ) by B or − B , into the expressions in Lem-mas EC.2 and EC.3. (cid:3) -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec7 EC.4.3. Structural Results of the Optimal Design First note that when we focus on the optimal design, we treat T and m both as constants. So theconstant of 1 / ( T − m ) in the expression of the risk function does not affect the optimal design. EC.4.3.1. Proof of Lemma 2. Proof of Lemma 2. We prove the two parts separately, both by contradiction. (1) Suppose there exists an optimal design T = { t = 1 , t , t , ..., t K } such that t ≤ m + 1. Thenwe try to construct another design ˜ T , such that (cid:12)(cid:12)(cid:12) ˜ T (cid:12)(cid:12)(cid:12) = K = | T | − 1. And the K elements are ˜ T = { ˜ t = 1 , ˜ t = t , ˜ t = t , ..., ˜ t K − = t K } . Table EC.2 An example of two regular switchback experiments T and ˜ T when m = 4 and t = 3 . T (cid:88) − (cid:88) − − (cid:88) ...˜ T (cid:88) − − − − (cid:88) ... Each checkmark beneath a number indicates that this number is within that set; and each dashbeneath a number indicates that this number is not within that set. For example, the checkmark (cid:88) beneath number 3 indicates that 3 ∈ T ; and the dash − beneath number 3 indicates that 3 (cid:54) = ˜ T . 
Next we argue that when Y = Y + or Y = Y − , r ( T , Y ) > r (˜ T , Y ) , which suggests that T is not the optimal design.First, focus on the squared terms. For any m + 1 ≤ t ≤ t + m − t ∈ f m T ( t ) , t (cid:54) = f m ˜ T ( t ). Moreover, t − m ≤ t − 1, so that t ∈ f m ˜ T ( t ). So f m T ( t ) − { t } = f m ˜ T ( t ), and (cid:12)(cid:12) f m ˜ T ( t ) (cid:12)(cid:12) ≥ 1. As a result, E [ t ( T ) ] − E [ t (˜ T ) ] ≥ (2 − ) B = 4 B . For any t ≥ t + m , either (i) f T ( t − m ) = t , in which case f ˜ T ( t − m ) = t . This is the only differencebetween f m T ( t ) and f m ˜ T ( t ), i.e., f m T ( t ) − { t } = f m ˜ T ( t ) − { t } . So | f m T ( t ) | = (cid:12)(cid:12) f m ˜ T ( t ) (cid:12)(cid:12) . The second caseis (ii) f T ( t − m ) ≥ t , in which case f m T ( t ) = f m ˜ T ( t ). Both cases suggest that E [ t ( T ) ] − E [ t (˜ T ) ] = 0 . So we have T (cid:88) t = m +1 E (cid:2) t ( T ) (cid:3) − T (cid:88) t = m +1 E (cid:104) t (˜ T ) (cid:105) = t + m − (cid:88) t = m +1 (cid:16) E (cid:2) t ( T ) (cid:3) − E (cid:104) t (˜ T ) (cid:105)(cid:17) + T (cid:88) t = t + m (cid:16) E (cid:2) t ( T ) (cid:3) − E (cid:104) t (˜ T ) (cid:105)(cid:17) ≥ t + m − (cid:88) t = m +1 (4 B ) + 0=4( t − B > c8 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments Second, focus on the cross product terms. For any t and t (cid:48) such that m + 1 ≤ t < t (cid:48) ≤ t + m − t ∈ O T ( t, t (cid:48) ) , t (cid:54) = O ˜ T ( t, t (cid:48) ). Moreover, t − m ≤ t − 1, so that t ∈ O T ( t, t (cid:48) ). So O T ( t, t (cid:48) ) − { t } = O ˜ T ( t, t (cid:48) ), and | O ˜ T ( t, t (cid:48) ) | ≥ 1. As a result, E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] ≥ (2 − ) B = 4 B > . For any m + 1 ≤ t < t (cid:48) ≤ T such that t (cid:48) ≥ t + m , either (i) f T ( t (cid:48) − m ) = t , in which case f ˜ T ( t (cid:48) − m ) = t . So O T ( t, t (cid:48) ) − { t } = O ˜ T ( t, t (cid:48) ) − { t } . 
So | O T ( t, t (cid:48) ) | = | O ˜ T ( t, t (cid:48) ) | . The second case is (ii) f T ( t (cid:48) − m ) ≥ t , in which case O T ( t, t (cid:48) ) = O ˜ T ( t, t (cid:48) ). Both cases suggest that E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] = 0 . So we have (cid:88) m +1 ≤ t 1. And the K elements are˜ T = { ˜ t = 1 , ˜ t = t , ˜ t = t , ..., ˜ t K − = t K − } . Table EC.3 An example of two regular switchback experiments T and ˜ T when m = 4 and t K = T − . ... T − T − T − T − T − T T ... (cid:88) − (cid:88) (cid:88) − − ˜ T ... (cid:88) − (cid:88) − − − Each checkmark beneath a number indicates that this number is within that set; and each dashbeneath a number indicates that this number is not within that set. For example, the checkmark (cid:88) beneath number T − T − ∈ T ; and the dash − beneath number T − T − (cid:54) = ˜ T . Next we argue that when Y = Y + or Y = Y − , r ( T , Y ) > r (˜ T , Y ) , which suggests that T is not the optimal design. -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec9 First focus on the squared terms. For any m + 1 ≤ t ≤ t K − f m T ( t ) = f m ˜ T ( t ) is totally unchanged. E [ t ( T ) ] − E [ t (˜ T ) ] = 0 . For any t K ≤ t ≤ T , t K / ∈ f m ˜ T ( t ) , t K ∈ f m T ( t ). And all the other determining randomization pointsare unchanged. So f m ˜ T ( t ) ⊂ f m T ( t ) and f m T ( t ) − { t K } = f m ˜ T ( t ) and (cid:12)(cid:12) f m ˜ T ( t ) (cid:12)(cid:12) ≥ E [ t ( T ) ] − E [ t (˜ T ) ] ≥ (2 − ) B = 4 B . So we have T (cid:88) t = m +1 E (cid:2) t ( T ) (cid:3) − T (cid:88) t = m +1 E (cid:104) t (˜ T ) (cid:105) = t K − (cid:88) t = m +1 (cid:16) E (cid:2) t ( T ) (cid:3) − E (cid:104) t (˜ T ) (cid:105)(cid:17) + T (cid:88) t = t K (cid:16) E (cid:2) t ( T ) (cid:3) − E (cid:104) t (˜ T ) (cid:105)(cid:17) ≥ T (cid:88) t = t K (4 B ) + 0=4( T − t K + 1) B > m + 1 ≤ t < t (cid:48) ≤ T such that t ≤ t K − O T ( t, t (cid:48) ) = O ˜ T ( t, t (cid:48) ) is totally unchanged. 
E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] = 0 . For any t K ≤ t < t (cid:48) ≤ T , since t (cid:48) − m ≤ T − m ≤ t K − 1, so f ˜ T ( t (cid:48) − m ) < t K and | O ˜ T ( t, t (cid:48) ) | ≥ O ˜ T ( t, t (cid:48) ) ⊂ O T ( t, t (cid:48) ). So E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] ≥ (2 − ) B ≥ B > . So we have (cid:88) m +1 ≤ t Switchback Experiments EC.4.3.2. Proof of Lemma 3. Proof of Lemma 3. Recall that we denote t = 1 and t K +1 = T + 1. First, from Lemma 2, t ≥ m + 2 , t K ≤ T − m . So k = 1 and k = K cases both hold. Next, when 2 ≤ k ≤ K − 1, we prove bycontradiction.Suppose there exists some optimal design T , such that ∃ ≤ k ≤ K − , s.t. t k +1 − t k − ≤ m − . Denote K = { k ∈ { K − } | t k +1 − t k − ≤ m − } . Since K (cid:54) = ∅ , pick j = max K to be the largest element in K . Apparently j ≤ K − j ∈ { K − } . We also know that t j +2 ≥ t j + m, because otherwise j + 1 ∈ K , which contradicts themaximality of j .We now construct another design ˜ T such that (cid:12)(cid:12)(cid:12) ˜ T (cid:12)(cid:12)(cid:12) = K = | T | − 1, and the K elements are ˜ T = { ˜ t = 1 , ˜ t = t , ..., ˜ t j − = t j − , ˜ t j = t j +1 , ..., ˜ t K − = t K } . Table EC.4 An example of two regular switchback experiments T and ˜ T when m = 4 and t j = t j +1 − t j − + 2 . ... t j − t j − + 1 t j t j +1 t j +1 + 1 t j +1 + 2 t j +2 ... T ... (cid:88) − (cid:88) (cid:88) − − (cid:88) ...˜ T ... (cid:88) − − (cid:88) − − (cid:88) ... Each checkmark beneath a number indicates that this number is within that set; and each dash beneath a number indicatesthat this number is not within that set. For example, the checkmark (cid:88) beneath number t j indicates that t j ∈ T ; and thedash − beneath number t j indicates that t j (cid:54) = ˜ T . Next we argue that when Y = Y + or Y = Y − , r ( T , Y ) > r (˜ T , Y ) , which suggests that T is not the optimal design.First focus on the squared terms. 
When t ≤ t j − f m T ( t ) = f m ˜ T ( t ) is totally unchanged. E [ t ( T ) ] − E [ t (˜ T ) ] = 0 . When t j ≤ t ≤ t j + m − 1, this suggests that t − m ≤ t J − f ˜ T ≤ t j − 1. So t j / ∈ f m ˜ T ( t ) , t j ∈ f m T ( t ). And all the other determining randomization points are unchanged. So f m ˜ T ( t ) ⊂ f m T ( t ) and f m T ( t ) − { t j } = f m ˜ T ( t ) and (cid:12)(cid:12) f m ˜ T ( t ) (cid:12)(cid:12) ≥ E [ t ( T ) ] − E [ t (˜ T ) ] ≥ (2 − ) B = 4 B . When t j + m ≤ t ≤ T , either (i) f T ( t − m ) = t j , in which case f ˜ T ( t − m ) = t j − . This is the onlydifference between f m T ( t ) and f m ˜ T ( t ), i.e., f m T ( t ) − { t j } = f m ˜ T ( t ) − { t j − } . So | f m T ( t ) | = (cid:12)(cid:12) f m ˜ T ( t ) (cid:12)(cid:12) . Thesecond case is (ii) f T ( t − m ) ≥ t j +1 , in which case f m T ( t ) = f m ˜ T ( t ). Both cases suggest that E [ t ( T ) ] − E [ t (˜ T ) ] = 0 . -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec11 So we have T (cid:88) t = m +1 E (cid:2) t ( T ) (cid:3) − T (cid:88) t = m +1 E (cid:104) t (˜ T ) (cid:105) = t j − (cid:88) t = m +1 (cid:16) E (cid:2) t ( T ) (cid:3) − E (cid:104) t (˜ T ) (cid:105)(cid:17) + t j + m − (cid:88) t = t j (cid:16) E (cid:2) t ( T ) (cid:3) − E (cid:104) t (˜ T ) (cid:105)(cid:17) + T (cid:88) t = t j + m (cid:16) E (cid:2) t ( T ) (cid:3) − E (cid:104) t (˜ T ) (cid:105)(cid:17) ≥ t j + m − (cid:88) t = t j (4 B ) + 0=4( m − B > m + 1 ≤ t < t (cid:48) ≤ T . There are many cases whichwe summarize in Table EC.5 Table EC.5 Summary of the differences between cross-product terms under two regularswitchback experiments T and ˜ T . 
T ˜ T m + 1 ≤ t ≤ t j − , t < t (cid:48) ≤ T unchanged t j − ≤ t ≤ t j − , t < t (cid:48) ≤ t j + m − t j − ≤ t ≤ t j − , t j + m ≤ t (cid:48) ≤ t j +1 + m − B t j − ≤ t ≤ t j − , t j +1 + m ≤ t (cid:48) ≤ T unchanged t j ≤ t < t (cid:48) ≤ t j + m − | O T ( t,t (cid:48) ) | +1 B | O ˜ T ( t,t (cid:48) ) | +1 B t j ≤ t ≤ t j + m − , t j + m ≤ t (cid:48) ≤ T unchanged t j + m ≤ t < t (cid:48) ≤ T unchanged We explain Table EC.5.When m + 1 ≤ t ≤ t j − , t < t (cid:48) ≤ T , all the overlapping randomization points are earlier than t j − − 1, i.e., ∀ a ∈ O T ( t, t (cid:48) ) , a ≤ t j − − ∀ a ∈ O ˜ T ( t, t (cid:48) ) , a ≤ t j − − 1. So t j / ∈ O T ( t, t (cid:48) ), and the overlappingrandomization points are unchanged, i.e., O T ( t, t (cid:48) ) = O ˜ T ( t, t (cid:48) ).When t j − ≤ t ≤ t j − , t < t (cid:48) ≤ t j + m − 1, all the overlapping randomization points are earlierthan t j − , i.e., ∀ a ∈ O T ( t, t (cid:48) ) , a ≤ t j − ; ∀ a ∈ O ˜ T ( t, t (cid:48) ) , a ≤ t j − . So t j / ∈ O T ( t, t (cid:48) ), and the overlappingrandomization points are unchanged, i.e., O T ( t, t (cid:48) ) = O ˜ T ( t, t (cid:48) ).When t j − ≤ t ≤ t j − , t j + m ≤ t (cid:48) ≤ t j +1 + m − 1, changing from T to ˜ T increases the expectedvalues. This is because t (cid:48) − m ≥ t j > t . So first, O T ( t, t (cid:48) ) = ∅ . But f ˜ T ( t (cid:48) − m ) = t j − and t j − ∈ f m ˜ T ( t ),which suggests that t j − ∈ O ˜ T ( t, t (cid:48) ) . Also, ∀ a ∈ f m ˜ T ( t (cid:48) ) , a ≥ t j − ; ∀ a ∈ f m T ( t ) , a ≤ t j − , which suggeststhat t j − is the only overlapping element. So, O ˜ T ( t, t (cid:48) ) = { t j − } . In this case, E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] = (0 − ) B = − B . When t j − ≤ t ≤ t j − , t j +1 + m ≤ t (cid:48) ≤ T , since t (cid:48) − m ≥ t j +1 > t j > t , O T ( t, t (cid:48) ) = O ˜ T ( t, t (cid:48) ) = ∅ . 
c12 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments When t j ≤ t < t (cid:48) ≤ t j + m − t j ∈ O T ( t, t (cid:48) ) and t j / ∈ O ˜ T ( t, t (cid:48) ). And all the other overlappingrandomization points are unchanged, so O T ( t, t (cid:48) ) − { t j } = O ˜ T ( t, t (cid:48) ) and | O ˜ T ( t, t (cid:48) ) | ≥ . In this case, E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] ≥ (2 − ) B = 4 B . When t j ≤ t ≤ t j + m − , t j + m ≤ t (cid:48) ≤ T , either (i) f m T ( t (cid:48) − m ) = t j , in which case f ˜ T ( t (cid:48) − m ) = t j − .This is the only difference between O T ( t, t (cid:48) ) and O ˜ T ( t, t (cid:48) ), i.e., O T ( t, t (cid:48) ) − { t j } = O ˜ T ( t, t (cid:48) ) − { t j − } . | O T ( t, t (cid:48) ) | = | O ˜ T ( t, t (cid:48) ) | . The second case is (ii) f T ( t (cid:48) − m ) ≥ t j +1 , in which case O T ( t, t (cid:48) ) = O ˜ T ( t, t (cid:48) ) isunchanged. Both cases suggest that E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] = 0.When t j + m ≤ t < t (cid:48) ≤ T , either (i) f m T ( t (cid:48) − m ) = t j , in which case f ˜ T ( t (cid:48) − m ) = t j − . This is theonly difference between O T ( t, t (cid:48) ) and O ˜ T ( t, t (cid:48) ), i.e., O T ( t, t (cid:48) ) − { t j } = O ˜ T ( t, t (cid:48) ) − { t j − } . | O T ( t, t (cid:48) ) | = | O ˜ T ( t, t (cid:48) ) | . The second case is (ii) f T ( t (cid:48) − m ) ≥ t j +1 , in which case O T ( t, t (cid:48) ) = O ˜ T ( t, t (cid:48) ) is unchanged.Both cases suggest that E [ t ( T ) t (cid:48) ( T )] − E [ t (˜ T ) t (cid:48) (˜ T )] = 0.So we have (cid:88) m +1 ≤ t 1, so ( t j − t j − )( t j +1 − t j ) ≤ ( m − ≤ m ( m − .Combine both square terms and cross-product terms we know that r ( T , Y ) > r (˜ T , Y ) . (cid:3) EC.4.4. Proof of Theorem 2 Proof of Theorem 2. 
Think of E [ t ] as E [ t t ], so that r ( η T , Y ) = (cid:80) Tt = m +1 (cid:80) Tt (cid:48) = m +1 E [ t t (cid:48) ].Then we can decompose the risk function to be( T − m ) · r ( η T , Y ) = (cid:88) m +1 ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t − E [ t t (cid:48) ] + K − (cid:88) k =1 (cid:88) t k ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t k +1 − E [ t t (cid:48) ] + (cid:88) t K ≤ t,t (cid:48) ≤ T E [ t t (cid:48) ](EC.8) -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec13 The core of this proof is to carefully count how many values can each E [ t t (cid:48) ] , ∀ t, t (cid:48) ∈ { m + 1 : T } take. See Table EC.6 for an illustration. Table EC.6 Illustrator of the different values of E [ t t ] , when T = 17 , m = 4 , T = { , , , } . (1 2 3 4) 5 6 7 8 9 10 11 12 13 14 15 16 17( (cid:88) − − − ) − (cid:88) − (cid:88) − − − − (cid:88) − − − −− (cid:88) − (cid:88) − − − − (cid:88) − − − − In the second line, each checkmark beneath number t indicates that period t ∈ T , i.e. there is a random-ization point at period t . This table illustrates different values of E [ t t (cid:48) ] when t, t (cid:48) ∈ { m + 1 , T } , wherethe zero values are omitted. The B magnitudes are also omitted. First we calculate the first block from equation (EC.8). Because t ≥ m + 2, for any t, t (cid:48) suchthat m + 1 ≤ min { t, t (cid:48) } ≤ t − m + 1 ≤ max { t, t (cid:48) } ≤ t + m − 1, we know that the only overlappingrandomization point is t . So E [ t t (cid:48) ] = 4 B . For any t, t (cid:48) such that m + 1 ≤ min { t, t (cid:48) } ≤ t − t + m ≤ max { t, t (cid:48) } ≤ T , there is no overlapping randomization point so E [ t t (cid:48) ] = 0 . (cid:88) m +1 ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t − E [ t t (cid:48) ] = B (cid:0) · (( t − − m ) (cid:1) Then we calculate the second block from equation (EC.8). 
For any k ∈ [ K − t k − t k − and t k +1 − t k , which jointly determine the values of E [ t t (cid:48) ] for any t, t (cid:48) , such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k ≤ max { t, t (cid:48) } ≤ T . We will go over each of the four cases below. (1) When t k − t k − ≥ m, t k +1 − t k ≥ m . Due to Lemma EC.4, for all t, t (cid:48) ∈ { t k : t k + m − } , E [ t t (cid:48) ] = 8 B , because both t k − ≤ t − m ≤ t k − t k − ≤ t (cid:48) − m ≤ t k − 1, and both t k − and t k are overlapping randomization points. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k + m ≤ max { t, t (cid:48) } ≤ t k +1 + m − E [ t t (cid:48) ] = 4 B , because t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k ≤ max { t, t (cid:48) } − m ≤ t k +1 − t k is the overlapping randomization point. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k +1 + m ≤ max { t, t (cid:48) } ≤ T , E [ t t (cid:48) ] = 0. c14 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments In this case, (cid:88) t k ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t k +1 − E [ t t (cid:48) ] = B (cid:0) · m + 4 · (( m + t k +1 − t k ) − m ) (cid:1) (2) When t k − t k − ≥ m, t k +1 − t k < m . Due to Lemma EC.4, for all t, t’ such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k ≤ max { t, t (cid:48) } ≤ t k + m − E [ t t (cid:48) ] = 8 B , because both t, t (cid:48) ≤ t k + m − 1, so t k − ≤ t − m ≤ t k − t k − ≤ t (cid:48) − m ≤ t k − 1, and both t k − and t k are overlapping randomization points. Forall t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k + m ≤ max { t, t (cid:48) } ≤ t k +1 + m − E [ t t (cid:48) ] = 4 B ,because t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k ≤ max { t, t (cid:48) } − m ≤ t k +1 − t k is the overlappingrandomization point. 
For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − t k +1 + m ≤ max { t, t (cid:48) } ≤ T , E [ t t (cid:48) ] = 0.In this case, (cid:88) t k ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t k +1 − E [ t t (cid:48) ] = B (cid:0) · ( m − ( m − t k +1 + t k ) ) + 4 · (( m + t k +1 − t k ) − m + ( m − t k +1 − t k ) ) (cid:1) (3) When t k − t k − < m, t k +1 − t k ≥ m . Due to Lemma EC.4, for all t, t (cid:48) ∈ { t k : t k − + m − } , E [ t t (cid:48) ] = 16 B , because t − m ≤ t k − − ≤ t k ≤ t and t (cid:48) − m ≤ t k − − ≤ t k ≤ t (cid:48) so t k − , t k − , t k arethree determining randomization points. Also t k − t k − ≥ m so t k − ≤ min { t, t (cid:48) } − m and t k − is nota determining randomization point. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k + m − , t k − + m ≤ max { t, t (cid:48) } ≤ t k + m − E [ t t (cid:48) ] = 8 B , because min { t, t (cid:48) } − m ≤ t k − t k − ≤ max { t, t (cid:48) } − m ≤ t k − t k − and t k are two determining randomization point. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − , t k + m ≤ max { t, t (cid:48) } ≤ t k +1 + m − E [ t t (cid:48) ] = 4 B , because t k ≤ max { t, t (cid:48) } − m so t k is theonly determining randomization point. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − , t k +1 + m ≤ max { t, t (cid:48) } ≤ T , E [ t t (cid:48) ] = 0 . In this case, (cid:88) t k ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t k +1 − E [ t t (cid:48) ] = B (cid:0) · ( m − t k + t k − ) + 8 · ( m − ( m − t k + t k − ) )+4 · (( m + t k +1 − t k ) − m ) (cid:1) (4) When t k − t k − < m, t k +1 − t k < m . Due to Lemma EC.4, for all t, t (cid:48) ∈ { t k : t k − + m − } , E [ t t (cid:48) ] = 16 B , because t − m ≤ t k − − ≤ t k ≤ t and t (cid:48) − m ≤ t k − − ≤ t k ≤ t (cid:48) so t k − , t k − , t k are three determining randomization points. 
Also t k − t k − ≥ m so t k − ≤ min { t, t (cid:48) } − m and t k − isnot a determining randomization point. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − , t k − + m ≤ max { t, t (cid:48) } ≤ t k + m − E [ t t (cid:48) ] = 8 B , because min { t, t (cid:48) } − m < t k − t k − ≤ max { t, t (cid:48) } − m ≤ -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec15 t k − t k − and t k are two determining randomization points. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − , t k + m ≤ max { t, t (cid:48) } ≤ t k +1 + m − E [ t t (cid:48) ] = 4 B , because t k ≤ max { t, t (cid:48) } − m so t k is theonly determining randomization point. For all t, t (cid:48) such that t k ≤ min { t, t (cid:48) } ≤ t k +1 − , t k +1 + m ≤ max { t, t (cid:48) } ≤ T , E [ t t (cid:48) ] = 0 . In this case, (cid:88) t k ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t k +1 − E [ t t (cid:48) ] = B (cid:0) · ( m − t k + t k − ) + 8 · ( m − ( m − t k + t k − ) − ( m − t k +1 + t k ) )+4 · (( m + t k +1 − t k ) − m + ( m − t k +1 + t k ) ) (cid:1) Finally we calculate the third block from equation (EC.8). Observe that T − t K ≥ m . (1) When t K − t K − ≥ m . Due to Lemma EC.4, for all t, t (cid:48) ∈ { t K : t K + m − } , E [ t t (cid:48) ] = 8 B , becauseboth t K − ≤ t − m ≤ t K − t K − ≤ t (cid:48) − m ≤ t K − 1, and both t K − and t K are overlappingrandomization points. For all t, t (cid:48) such that t K ≤ min { t, t (cid:48) } ≤ T, t K + m ≤ max { t, t (cid:48) } ≤ T , E [ t t (cid:48) ] =4 B , because t K ≤ max { t, t (cid:48) } − m so t K is the only determining randomization point.In this case, (cid:88) t K ≤ t,t (cid:48) ≤ T E [ t t (cid:48) ] = B (cid:0) · m + 4 · (( T + 1 − t K ) − m ) (cid:1) (2) When t K − t K − < m . 
Due to Lemma EC.4, for all t, t (cid:48) ∈ { t K : t K − + m − } , E [ t t (cid:48) ] = 16 B ,because t − m ≤ t K − − ≤ t K ≤ t and t (cid:48) − m ≤ t K − − ≤ t K ≤ t (cid:48) so t K − , t K − , t K are threedetermining randomization points. Also t K − t K − ≥ m so t K − ≤ min { t, t (cid:48) } − m and t K − is not adetermining randomization point. For all t, t (cid:48) such that t K ≤ min { t, t (cid:48) } ≤ t K + m − , t K − + m ≤ max { t, t (cid:48) } ≤ t K + m − E [ t t (cid:48) ] = 8 B , because min { t, t (cid:48) } − m ≤ t K − t K − ≤ max { t, t (cid:48) } − m ≤ t K − t K − and t K are two determining randomization points. For all t, t (cid:48) such that t K ≤ min { t, t (cid:48) } ≤ T, t K + m ≤ max { t, t (cid:48) } ≤ T , E [ t t (cid:48) ] = 4 B , because t K ≤ max { t, t (cid:48) } − m so t K is theonly determining randomization point.In this case, (cid:88) t K ≤ t,t (cid:48) ≤ T E [ t t (cid:48) ] = B (cid:0) · ( m − t K + t K − ) + 8 · ( m − ( m − t K + t K − ) ) + 4 · (( T + 1 − t K ) − m ) (cid:1) Now we combine all above together.Note that whenever there exists k ∈ { K } such that ( t k − t k − ) < m , this suggests that in (cid:80) t k ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t k +1 − E [ t t (cid:48) ] there is a 16( m − t k + t k − ) ; but in (cid:80) t k − ≤ t,t (cid:48) ≤ T min { t,t (cid:48) }≤ t k − E [ t t (cid:48) ] there is a c16 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments − ( m − t k + t k − ) ). So when we sum them up, we break 16( m − t k + t k − ) into two 8( m − t k + t k − ) , which cancels in two sumations. 
By telescoping, the three blocks in (EC.8) sum to
\[
(T-m)^2 \cdot r(\eta_{\mathbb{T}}, \mathbb{Y}) = 4B^2\big((t_1-1)^2 - m^2\big) + \sum_{k=1}^{K-1} B^2\Big(8m^2 + 4\big((m+t_{k+1}-t_k)^2 - 2m^2 + [(m-t_{k+1}+t_k)^+]^2\big)\Big) + B^2\Big(8m^2 + 4\big((T+1-t_K)^2 - m^2\big)\Big)
\]
\[
= B^2 \cdot \Big\{ 4\sum_{k=0}^{K}(t_{k+1}-t_k)^2 + 8m(t_K-t_1) + 4m^2K - 4m^2 + 4\sum_{k=1}^{K-1}\big[(m-t_{k+1}+t_k)^+\big]^2 \Big\},
\]
which finishes the proof. □

EC.4.5. Optimal Solutions to the Subset Selection Problem in Theorem 3

EC.4.5.1. Proof of Theorem 3.

Proof of Theorem 3. Consider the problem introduced in (6). Due to Lemma 1, $\mathbb{Y}^+ = \{Y_t(\mathbf{1}_{m+1}) = Y_t(\mathbf{0}_{m+1}) = B\}_{t \in \{m+1:T\}}$ and $\mathbb{Y}^- = \{Y_t(\mathbf{1}_{m+1}) = Y_t(\mathbf{0}_{m+1}) = -B\}_{t \in \{m+1:T\}}$ are the only two dominating strategies for the adversarial selection of potential outcomes.

Then, due to Lemma 2 and Lemma 3, the optimal design of switchback experiment must satisfy the following three conditions:
\[
t_1 \ge m+2, \qquad t_K \le T-m, \qquad t_{k+1} - t_{k-1} \ge m, \ \forall k \in [K].
\]
Due to Theorem 2, the risk function of the optimal design of experiment is given by
\[
r(\eta_{\mathbb{T}}, \mathbb{Y}) = \frac{1}{(T-m)^2} \Big\{ 4\sum_{k=1}^{K+1}(t_k-t_{k-1})^2 + 8m(t_K-t_1) + 4m^2K - 4m^2 + 4\sum_{k=2}^{K}\big[(m-t_k+t_{k-1})^+\big]^2 \Big\} B^2 .
\]
So if we further minimize the above risk function over $\mathbb{T} \subset [T]$, we find the optimal solution to the original problem introduced in (6). Note that $B$ is a constant that does not depend on our decisions, and that $T$ and $m$ are inputs.
So we solve
\[
\min_{\mathbb{T} \subset [T]} \Big\{ 4\sum_{k=0}^{K}(t_{k+1}-t_k)^2 + 8m(t_K-t_1) + 4m^2K - 4m^2 + 4\sum_{k=1}^{K-1}\big[(m-t_{k+1}+t_k)^+\big]^2 \Big\}
\]
as stated in (8).

In particular, if there exists some constant $n \in \mathbb{N}$, $n \ge 4$, such that $T = nm$, we can explicitly find the optimal design of experiment. Take the continuous relaxation of this problem, such that for any $K$, $\{1 < t_1 < t_2 < \dots < t_K < T+1\} \in [1, T+1]^K$:
\[
\min_{K \in \mathbb{N},\ \{1 < t_1 < \dots < t_K < T+1\}} \Big\{ 4\sum_{k=0}^{K}(t_{k+1}-t_k)^2 + 8m(t_K-t_1) + 4m^2K - 4m^2 + 4\sum_{k=1}^{K-1}\big[(m-t_{k+1}+t_k)^+\big]^2 \Big\}.
\]
The relaxed problem provides a lower bound to the original subset selection problem as stated in (8). We will argue later that the optimal solution to this relaxed problem happens to be an integer solution as well.

First we argue that $t_1 - t_0 = t_{K+1} - t_K$. Otherwise, if $t_1 - t_0 \ne t_{K+1} - t_K$, denote $a = t_1 - t_0 + t_{K+1} - t_K$. We could always pick, for any $k \in \{1:K\}$, $\tilde t_k = t_k + a/2 - t_1 + 1$, such that $t_{k+1} - t_k$ is unchanged for any $k \in \{1:K-1\}$. The only change in the objective value comes from
\[
2\Big(\frac{a}{2}\Big)^2 - \Big((t_1-t_0)^2 + (t_{K+1}-t_K)^2\Big) < 0,
\]
which suggests that $t_1 - t_0 \ne t_{K+1} - t_K$ is not optimal.

Second, similarly, we argue that for any $k' < k'' \in [K-1]$, $t_{k'+1} - t_{k'} = t_{k''+1} - t_{k''}$. Otherwise, if $t_{k'+1} - t_{k'} \ne t_{k''+1} - t_{k''}$, denote $b = t_{k'+1} - t_{k'} + t_{k''+1} - t_{k''}$.
We could always pick, for any $k \in \{k'+1 : k''\}$, $\tilde t_k = t_k + b/2 - (t_{k'+1} - t_{k'})$, such that $t_{k+1} - t_k$ is unchanged for any $k \in \{k'+1 : k''-1\}$. The only change in the objective value comes from
\[
\Big(2\Big(\frac{b}{2}\Big)^2 + 2\Big(\Big(m - \frac{b}{2}\Big)^+\Big)^2\Big) - \Big((t_{k'+1}-t_{k'})^2 + (t_{k''+1}-t_{k''})^2 + \big((m-t_{k'+1}+t_{k'})^+\big)^2 + \big((m-t_{k''+1}+t_{k''})^+\big)^2\Big) < 0,
\]
where $x^2 + ((m-x)^+)^2$ is convex and the inequality holds due to Jensen's inequality. This inequality suggests that $t_{k'+1} - t_{k'} \ne t_{k''+1} - t_{k''}$ is not optimal.

With the above two structural results, we can assume that there exist $a, b > 0$ such that $t_1 - t_0 = t_{K+1} - t_K = a$ and $t_{k+1} - t_k = b$, $\forall k \in [K-1]$. Also, it must be satisfied that $2a + (K-1)b = T$. Next we substitute $K - 1 = \frac{T-2a}{b}$ into the relaxed problem, to have
\[
\min_{a,b>0} \Big\{ 8a^2 + 4(K-1)b^2 + 8m(K-1)b + 4m^2(K-1) + 4(K-1)\big((m-b)^+\big)^2 \Big\}
\]
\[
= \min_{a,b>0} \Big\{ 8a^2 + 4(T-2a)b + 8m(T-2a) + 4m^2\frac{T-2a}{b} + 4\frac{T-2a}{b}\big((m-b)^+\big)^2 \Big\}.
\]
Either $b \ge m$, in which case the problem is to minimize
\[
\min_{a,b>0} \Big\{ 8a^2 + 4(T-2a)b + 8m(T-2a) + 4m^2\frac{T-2a}{b} \Big\}.
\]
Note that
\[
8a^2 + 4(T-2a)b + 8m(T-2a) + 4m^2\frac{T-2a}{b} = 8a^2 + 8m(T-2a) + 4(T-2a)\Big(b + \frac{m^2}{b}\Big) \ge 8a^2 + 16m(T-2a) = 8(a-2m)^2 + 16mT - 32m^2 \ge 16mT - 32m^2,
\]
where the first inequality holds with equality if and only if $b = \frac{m^2}{b}$, which suggests $b = m$; the second inequality holds with equality if and only if $a = 2m$.
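As a numerical sanity check on the bound above, the following Python sketch evaluates the relaxed objective on a grid of end-gap lengths $a$ and interior-gap lengths $b$. The function names, the grid step, and the instance $T = 12$, $m = 2$ are illustrative assumptions rather than part of the original analysis; the coefficients are read off the relaxed objective in $a$ and $b$.

```python
def relaxed_objective(a, b, T, m):
    """Relaxed objective with two end gaps of length a and interior gaps of length b.

    Uses K - 1 = (T - 2a) / b, so (T - 2a) is the total length of the interior gaps.
    """
    interior = T - 2 * a
    shortfall = max(m - b, 0.0)  # (m - b)^+
    return (8 * a**2
            + 4 * interior * b
            + 8 * m * interior
            + 4 * m**2 * interior / b
            + 4 * interior / b * shortfall**2)


def grid_minimum(T, m, step=0.25):
    """Return (value, a, b) minimizing the objective over a grid with 0 < a < T/2, 0 < b <= T."""
    best = None
    n_a = int(T / 2 / step)
    n_b = int(T / step)
    for i in range(1, n_a):          # a = step, 2*step, ..., T/2 - step
        a = i * step
        for j in range(1, n_b + 1):  # b = step, 2*step, ..., T
            b = j * step
            value = relaxed_objective(a, b, T, m)
            if best is None or value < best[0]:
                best = (value, a, b)
    return best


T, m = 12, 2                         # hypothetical instance with T = n * m, n = 6
value, a, b = grid_minimum(T, m)
print(value, a, b)                   # grid minimizer
print(16 * m * T - 32 * m**2)        # closed-form lower bound 16mT - 32m^2
```

On this grid the minimizer is $a = 2m = 4$ and $b = m = 2$, and the minimum equals $16mT - 32m^2 = 256$, in line with the equality conditions stated above. A grid search is only a check, not a proof: it covers both the $b \ge m$ and $b \le m$ regimes through the $(m-b)^+$ term.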
In the other case, when $b \le m$, the problem is to minimize
\[
\min_{a,b>0} \Big\{ 8a^2 + 4(T-2a)b + 8m(T-2a) + 4m^2\frac{T-2a}{b} + 4\frac{T-2a}{b}(m-b)^2 \Big\}.
\]
Note that
\[
8a^2 + 4(T-2a)b + 8m(T-2a) + 4m^2\frac{T-2a}{b} + 4\frac{T-2a}{b}(m-b)^2 = 8a^2 + 8(T-2a)\Big(b + \frac{m^2}{b}\Big) \ge 8a^2 + 16m(T-2a) = 8(a-2m)^2 + 16mT - 32m^2 \ge 16mT - 32m^2,
\]
where the first inequality holds with equality if and only if $b = \frac{m^2}{b}$, which suggests $b = m$; the second inequality holds with equality if and only if $a = 2m$.

Combining both cases, the optimal solution is $a = 2m$ and $b = m$, which happens to be an integer solution and is thus optimal for the subset selection problem. Translating into $t_1, \dots, t_K$, this suggests that $t_1 = 2m+1$, $t_2 = 3m+1$, ..., $t_K = (n-2)m+1$. □

EC.4.5.2. Solutions in the Imperfect Cases.

It is worth noting that we are taking a design of experiments perspective. So when in practice we have control of $T$, we can pick $T$ to be a multiple of $m$, which fits Theorem 3 perfectly. If we do not have control of $T$, we can always pick a smaller $T'$ such that $T' = \lfloor T/m \rfloor \cdot m$ is a multiple of $m$.

Nonetheless, from an optimization perspective, we establish the following optimal structures for the subset selection problem as in (8). Recall that $t_{K+1} = T+1$.

Lemma EC.5. Under Assumptions 1–3, the optimal design of regular switchback experiment must satisfy the following two conditions:
\[
\big|(t_1-t_0) - (t_{K+1}-t_K)\big| \le 1, \qquad \big|(t_{j+1}-t_j) - (t_{j'+1}-t_{j'})\big| \le 1, \ \forall\, 1 \le j, j' \le K-1.
\]

Proof of Lemma EC.5. We prove by contradiction.

Case 1. Suppose there exists some optimal design $\mathbb{T}$ such that $(t_1-t_0) - (t_{K+1}-t_K) \ge 2$. We now construct another design $\tilde{\mathbb{T}}$, such that $|\tilde{\mathbb{T}}| = |\mathbb{T}|$, and the elements are $\tilde{\mathbb{T}} = \{\tilde t_0 = 1, \tilde t_1 = t_1 - 1, \tilde t_2 = t_2 - 1, \dots, \tilde t_K = t_K - 1\}$. Now check the expression as in (8).
Note that $\tilde t_{k+1} - \tilde t_k = t_{k+1} - t_k$ is unchanged for any $k \in [K-1]$; $\tilde t_K - \tilde t_1 = t_K - t_1$ is unchanged; and $m - \tilde t_{k+1} + \tilde t_k = m - t_{k+1} + t_k$ is unchanged for any $k \in [K-1]$. The only change is in the two end gaps:
\[
(\tilde t_1 - \tilde t_0)^2 + (\tilde t_{K+1} - \tilde t_K)^2 = (t_1 - t_0 - 1)^2 + (t_{K+1} - t_K + 1)^2 \le (t_1-t_0)^2 + (t_{K+1}-t_K)^2,
\]
because $(t_1-t_0) - (t_{K+1}-t_K) \ge 2$.

Similarly, if there exists some optimal design $\mathbb{T}$ such that $(t_{K+1}-t_K) - (t_1-t_0) \ge 2$, then construct another design $\tilde{\mathbb{T}} = \{\tilde t_0 = 1, \tilde t_1 = t_1 + 1, \tilde t_2 = t_2 + 1, \dots, \tilde t_K = t_K + 1\}$.

Case 2. Suppose there exists some optimal design $\mathbb{T}$, and there exist $1 \le j < j' \le K-1$ such that $(t_{j+1}-t_j) - (t_{j'+1}-t_{j'}) \ge 2$. We now construct another design $\tilde{\mathbb{T}}$, such that $|\tilde{\mathbb{T}}| = |\mathbb{T}|$, and the elements are $\tilde{\mathbb{T}} = \{\tilde t_0 = 1, \tilde t_1 = t_1, \dots, \tilde t_j = t_j, \tilde t_{j+1} = t_{j+1} - 1, \dots, \tilde t_{j'} = t_{j'} - 1, \tilde t_{j'+1} = t_{j'+1}, \dots, \tilde t_K = t_K\}$.

Now check the expression as in (8). Note that $\tilde t_{k+1} - \tilde t_k = t_{k+1} - t_k$ is unchanged for any $k \in \{0:K\}$ except $j$ and $j'$; $\tilde t_K - \tilde t_1 = t_K - t_1$ is unchanged; and $m - \tilde t_{k+1} + \tilde t_k = m - t_{k+1} + t_k$ is unchanged for any $k \in [K-1]$ except $j$ and $j'$. Now focus on $j$ and $j'$:
\[
(\tilde t_{j+1} - \tilde t_j)^2 + (\tilde t_{j'+1} - \tilde t_{j'})^2 + \big[(m - \tilde t_{j+1} + \tilde t_j)^+\big]^2 + \big[(m - \tilde t_{j'+1} + \tilde t_{j'})^+\big]^2
\]
\[
= (t_{j+1} - t_j - 1)^2 + (t_{j'+1} - t_{j'} + 1)^2 + \big[(m - t_{j+1} + t_j + 1)^+\big]^2 + \big[(m - t_{j'+1} + t_{j'} - 1)^+\big]^2
\]
\[
\le (t_{j+1} - t_j)^2 + (t_{j'+1} - t_{j'})^2 + \big[(m - t_{j+1} + t_j)^+\big]^2 + \big[(m - t_{j'+1} + t_{j'})^+\big]^2 .
\]
To see why this inequality holds, define $g(x) = x^2 + [(m-x)^+]^2$ and note that $g(x)$ is a univariate convex function.
The inequality holds due to ( t j +1 − t j ) − ( t j (cid:48) +1 − t j (cid:48) ) ≥ T , and there exists 1 ≤ j < j (cid:48) ≤ K − t j (cid:48) +1 − t j (cid:48) ) − ( t j +1 − t j ) ≥ 2. Then construct another design ˜ T = { ˜ t = 1 , ˜ t = t , ..., ˜ t j = t j , ˜ t j +1 = t j +1 + 1 , ..., ˜ t j (cid:48) = t j (cid:48) + 1 , ˜ t j (cid:48) +1 = t j (cid:48) +1 , ..., ˜ t K = t K } .Combine both cases we finish the proof. (cid:3) EC.5. Proofs and Discussions from Section 4 In Section 4 we focus on the case when p = m . Throughout this section in the appendix, we useonly m instead of p . EC.5.1. Extra Notations Used in the Proofs from Section 4 For any t ∈ { m + 1 : T } , we use the notations of t as defined in (EC.1). Denote¯ = m (cid:88) t = m +1 t ¯ k = ( k +2) m (cid:88) t =( k +1) m +1 t , ∀ k ∈ [ K ]¯ K +1 = ( K +3) m (cid:88) t =( K +2) m +1 t It is worth noting that under the optimal design as suggested by Theorem 3, when T /m = n ∈ N is an integer, we have K = n − 3. So ( K + 3) m = T . See Example EC.1 below. Example EC.1 (An Optimal Design and Its ¯ k Notations). When T = 12, p = m = 2,the optimal design of regular switchback experiment is T ∗ = { , , , } , and K = 3. The ¯ k notationsare defined below. Each ¯ k spans m = 2 periods. See Table EC.7. (cid:3) c20 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments Table EC.7 An example of the optimal design T ∗ and its ¯ k notations when T = 12 and p = m = 2 . T ∗ (cid:88) − − − (cid:88) − (cid:88) − (cid:88) − − −−{ ¯ k } K +1 k =0 ¯ ¯ ¯ ¯ ¯ Using the above notation, we could writeˆ τ m − τ m = 1 T − m K +1 (cid:88) k =0 ¯ k , and so Var (ˆ τ m ) = 1( T − m ) Var (cid:32) K +1 (cid:88) k =0 ¯ k (cid:33) . EC.5.2. Proof of Theorem 4 The proof of Theorem 4 resembles the proof of Lemmas EC.2 and EC.3. 
The trick here is to observethat for any k ∈ [ K ], the values of all the variables t , where ( k + 1) m + 1 ≤ t ≤ ( k + 2) m , are alldetermined by the randomization at time km + 1 and ( k + 1) m + 1. Since they are all correlated,we can use ¯ k to stand for (cid:80) ( k +2) mt =( k +1) m +1 t for short. Proof of Theorem 4. First observe that ¯ k has zero mean for each k ∈ { K + 1 } . So we candecompose the variance into squared terms and cross-product terms,( T − m ) Var (ˆ τ m ) = Var (cid:32) K +1 (cid:88) k =0 ¯ k (cid:33) = K +1 (cid:88) k =0 E (cid:2) ¯ k (cid:3) + (cid:88) ≤ k 2, ¯ k = ¯ Y ( m +1 ) + ¯ Y ( m +1 );with probability 1 / 2, ¯ k = − ¯ Y ( m +1 ) − ¯ Y ( m +1 ). When k ∈ [ K ], with probability 1 / 4, ¯ k =3 ¯ Y ( m +1 ) + ¯ Y ( m +1 ); with probability 1 / 2, ¯ k = − ¯ Y ( m +1 ) + ¯ Y ( m +1 ); with probability 1 / k = − ¯ Y ( m +1 ) − Y ( m +1 ).Then for the cross-product terms, if k (cid:48) − k ≥ 2, then ¯ k and ¯ k (cid:48) are independent, i.e., E [¯ k ¯ k (cid:48) ] = 0.If k (cid:48) − k = 1, then E [¯ k ¯ k +1 ] = ( ¯ Y k ( m +1 ) + ¯ Y k ( m +1 )) · ( ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 ))This is because the values of ¯ k and ¯ k +1 are determined by the realization at 3 ran-domization points, W km +1 , W ( k +1) m +1 , W ( k +2) m +1 . 
With probability 1 / 8, ¯ k ¯ k +1 = (3 ¯ Y k ( m +1 ) + -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec21 ¯ Y k ( m +1 )) · (3 ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 )); with probability 1 / 8, ¯ k ¯ k +1 = (3 ¯ Y k ( m +1 ) + ¯ Y k ( m +1 )) · ( − ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 )); with probability 1 / 8, ¯ k ¯ k +1 = ( − ¯ Y k ( m +1 ) + ¯ Y k ( m +1 )) · (3 ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 )); with probability 1 / 8, ¯ k ¯ k +1 = ( − ¯ Y k ( m +1 ) + ¯ Y k ( m +1 )) · ( − ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 )); with probability 1 / 8, ¯ k ¯ k +1 = ( − ¯ Y k ( m +1 ) + ¯ Y k ( m +1 )) · ( − ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 )); with probability 1 / 8, ¯ k ¯ k +1 = ( − ¯ Y k ( m +1 ) + ¯ Y k ( m +1 )) · ( − ¯ Y k +1 ( m +1 ) − Y k +1 ( m +1 )); with probability 1 / 8, ¯ k ¯ k +1 = ( − ¯ Y k ( m +1 ) − Y k ( m +1 )) · ( − ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 )); with probability 1 / 8, ¯ k ¯ k +1 = ( − ¯ Y k ( m +1 ) − Y k ( m +1 )) · ( − ¯ Y k +1 ( m +1 ) − Y k +1 ( m +1 )).Combining the squared terms and the cross-product terms we finish the proof. (cid:3) EC.5.3. Discssions and proof of Corollary 1 We first provide the details of the two variance upper bounds here. Var U1 (ˆ τ m ) = 1( T − m ) (cid:40) (cid:2) ¯ Y ( m +1 ) + ¯ Y ( m +1 ) (cid:3) + n − (cid:88) k =1 (cid:2) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) (cid:3) + 4 (cid:2) ¯ Y n − ( m +1 ) + ¯ Y n − ( m +1 ) (cid:3) + n − (cid:88) k =0 (cid:2) ¯ Y k ( m +1 ) · ¯ Y k +1 ( m +1 ) + ¯ Y k ( m +1 ) · ¯ Y k +1 ( m +1 ) (cid:3)(cid:41) , and Var U2 (ˆ τ m ) = 1( T − m ) (cid:40) (cid:2) ¯ Y ( m +1 ) + ¯ Y ( m +1 ) (cid:3) + n − (cid:88) k =1 (cid:2) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) (cid:3) + 4 (cid:2) ¯ Y n − ( m +1 ) + ¯ Y n − ( m +1 ) (cid:3) (cid:41) . We prove Corollary 1 using the basic inequality that 2 xy ≤ x + y . Such an inequality is com-monly used to find a conservative upper bound of the variance. Proof of Corollary 1. 
From Theorem 4, the variance of the estimator is given by( T − m ) Var (ˆ τ m ) ≤ (cid:8) ¯ Y ( m +1 ) + ¯ Y ( m +1 ) (cid:9) + n − (cid:88) k =1 (cid:8) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) (cid:9) + 2 (cid:8) ¯ Y n − ( m +1 ) + ¯ Y n − ( m +1 ) (cid:9) + n − (cid:88) k =0 (cid:2) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) (cid:3) · (cid:2) ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 ) (cid:3) ≤ (cid:8) ¯ Y ( m +1 ) + ¯ Y ( m +1 ) (cid:9) + n − (cid:88) k =1 (cid:8) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) (cid:9) + 2 (cid:8) ¯ Y n − ( m +1 ) + ¯ Y n − ( m +1 ) (cid:9) + n − (cid:88) k =0 (cid:8) Y k ( m +1 ) ¯ Y k +1 ( m +1 ) + 2 ¯ Y k ( m +1 ) ¯ Y k +1 ( m +1 ) + ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) + ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 ) (cid:9) ≤ (cid:8) ¯ Y ( m +1 ) + ¯ Y ( m +1 ) (cid:9) + n − (cid:88) k =1 (cid:8) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) (cid:9) + 3 (cid:8) ¯ Y n − ( m +1 ) + ¯ Y n − ( m +1 ) (cid:9) c22 e-companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments + n − (cid:88) k =0 (cid:8) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) + ¯ Y k +1 ( m +1 ) + ¯ Y k +1 ( m +1 ) (cid:9) =4 (cid:8) ¯ Y ( m +1 ) + ¯ Y ( m +1 ) (cid:9) + n − (cid:88) k =1 (cid:8) ¯ Y k ( m +1 ) + ¯ Y k ( m +1 ) (cid:9) + 4 (cid:8) ¯ Y n − ( m +1 ) + ¯ Y n − ( m +1 ) (cid:9) where the first inequality suggests Var (ˆ τ m ) ≤ Var U1 (ˆ τ m ), and the last inequality suggests Var U1 (ˆ τ m ) ≤ Var U2 (ˆ τ m ).The unbiasedness part is due to the estimator of the variances being Horvitz-Thompson typeestimators, and that regular switchback experiments naturally satisfy Assumption 4. (cid:3) EC.5.4. Proof of Theorem 5 We prove Theorem 5 by using Lemma EC.1. In particular, we derive B n,k,a , and then constructsome proper ∆ n , K n , and L n . Proof of Theorem 5. In the n -replica experiment, ˆ τ m − τ m = n − m (cid:80) n − k =0 ¯ k , and Var (ˆ τ m ) = n − m Var (cid:16)(cid:80) n − k =0 ¯ k (cid:17) . To use the language from Lemma EC.1, denote d = n − 1. 
Denote forany i ∈ [ n − X n,i = n − m ¯ i − so we know that φ = 1, i.e., { X n, , X n, , ... } is a sequence of1-dependent random variables.First note that B n = Var (ˆ τ m ), and we calculate B n,k,a as follows. B n,k,a = 1( n − m Var (cid:32) a + k − (cid:88) i = a ¯ i − (cid:33) ≤ n − m (cid:40) a + k − (cid:88) i = a (cid:2) Y i − ( m +1 ) + 3 ¯ Y i − ( m +1 ) + 2 ¯ Y i − ( m +1 ) ¯ Y i − ( m +1 ) (cid:3) + a + k − (cid:88) i = a 2[ ¯ Y i − ( m +1 ) + ¯ Y i − ( m +1 )] · [ ¯ Y i ( m +1 ) + ¯ Y i ( m +1 )] (cid:41) ≤ km B + 8( k − m B ( n − m ≤ kB ( n − Pick γ = 0 , δ = 1, then ∆ n = B / ( n − , K n = 16 B / ( n − , and L n = Var (ˆ τ m ) / ( n − E | X n,i | ≤ ∆ n = B / ( n − , because all the potential outcomes are bounded by B , so that X n,i ≤ B/ ( n − B n,k,a /k ≤ K n = 16 B / ( n − .3. B n / ( n − ≥ L n = Var (ˆ τ m ) / ( n − K n /L n = 16 B / ( n − Var (ˆ τ m ) = O (1), where the last equality is due to Assumption 5.5. ∆ n /L / n = B / ( n − / Var (ˆ τ m ) / = O (1), where the last equality is due to Assumption 5. -companion to Bojinov, Simchi-Levi, and Zhao: Switchback Experiments ec23 Due to Lemma EC.1, ˆ τ m − τ m (cid:112) Var (ˆ τ m ) D −→ N (0 , . (cid:3) EC.6. Proofs and Discussions from Section 5 In Section 5 we discuss the cases when m is misspecified. Throughout this section in the appendix,we use both p and m . Recall that m is the order of carryover effect, and p is the experimenter’sknowledge of m . EC.6.1. 
Unbiasedness of the Horvitz-Thompson Estimator when m is Misspecified

We state here the mathematics omitted from Theorem 6.

Under Assumptions 2 and 3, for $m > p$, at each time $t \ge m+1$, the Horvitz-Thompson estimator is either unbiased for the lag-$m$ causal effect when $f(t-p) \le t-m$, i.e.,
\[
E_{W_{1:T} \sim \eta_{\mathbb{T}}}\left[ Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} - Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} \right] = Y_t(\mathbf{1}_{m+1}) - Y_t(\mathbf{0}_{m+1}),
\]
or conditionally unbiased for the $m$-misspecified lag-$p$ causal effect when $f(t-p) > t-m$, i.e.,
\[
E_{W_{1:T} \sim \eta_{\mathbb{T}}}\Bigg[ \left\{ Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} - Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} \right\}
- \Big\{ Y_t\big(w^{obs}_{t-m:f(t-p)-1}, \mathbf{1}_{t-f(t-p)+1}\big) - Y_t\big(w^{obs}_{t-m:f(t-p)-1}, \mathbf{0}_{t-f(t-p)+1}\big) \Big\} \,\Bigg|\, W_{t-m:f(t-p)-1} = w^{obs}_{t-m:f(t-p)-1} \Bigg] = 0 .
\]
When $p+1 \le t \le m$, the Horvitz-Thompson estimator is either unbiased for the lag-$t$ causal effect when $f(t-p) = 1$, i.e.,
\[
E_{W_{1:T} \sim \eta_{\mathbb{T}}}\left[ Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} - Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} \right] = Y_t(\mathbf{1}_{t}) - Y_t(\mathbf{0}_{t}),
\]
or conditionally unbiased for the $m$-misspecified lag-$t$ causal effect when $f(t-p) > 1$, i.e.,
\[
E_{W_{1:T} \sim \eta_{\mathbb{T}}}\Bigg[ \left\{ Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} - Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} \right\}
- \Big\{ Y_t\big(w^{obs}_{1:f(t-p)-1}, \mathbf{1}_{t-f(t-p)+1}\big) - Y_t\big(w^{obs}_{1:f(t-p)-1}, \mathbf{0}_{t-f(t-p)+1}\big) \Big\} \,\Bigg|\, W_{1:f(t-p)-1} = w^{obs}_{1:f(t-p)-1} \Bigg] = 0 .
\]
To remove the conditional expectation, we can further take an outer expectation, averaging over the past assignment paths.
So the estimator is estimating a weighted average of lag-$p$ effects. When $t \ge m+1$, this weighted average is
\[
\sum_{w_{t-m:f(t-p)-1}} \Pr\big(W_{t-m:f(t-p)-1} = w_{t-m:f(t-p)-1}\big) \Big( Y_t\big(w_{t-m:f(t-p)-1}, \mathbf{1}_{t-f(t-p)+1}\big) - Y_t\big(w_{t-m:f(t-p)-1}, \mathbf{0}_{t-f(t-p)+1}\big) \Big),
\]
and when $p+1 \le t \le m$, it is
\[
\sum_{w_{1:f(t-p)-1}} \Pr\big(W_{1:f(t-p)-1} = w_{1:f(t-p)-1}\big) \Big( Y_t\big(w_{1:f(t-p)-1}, \mathbf{1}_{t-f(t-p)+1}\big) - Y_t\big(w_{1:f(t-p)-1}, \mathbf{0}_{t-f(t-p)+1}\big) \Big).
\]
We prove Theorem 6 as follows.

Proof of Theorem 6. Focus on any specific $t \in \{m+1 : T\}$.

When $f(t-p) \le t-m$, both $0 < \Pr(W_{t-p:t} = \mathbf{1}_{p+1}) < 1$ and $0 < \Pr(W_{t-p:t} = \mathbf{0}_{p+1}) < 1$. With probability $\Pr(W_{t-p:t} = \mathbf{1}_{p+1}) \ne 0$, $\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\} = 1$ and $Y_t^{obs} = Y_t(\mathbf{1}_{m+1})$. So
\[
E\left[ Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} \right] = Y_t(\mathbf{1}_{m+1}).
\]
Similarly,
\[
E\left[ Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} \right] = Y_t(\mathbf{0}_{m+1}).
\]
So
\[
E_{W_{1:T} \sim \eta_{\mathbb{T}}}\left[ \left\{ Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} - Y_t^{obs} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} \right\} \right] = Y_t(\mathbf{1}_{m+1}) - Y_t(\mathbf{0}_{m+1}).
\]
When $f(t-p) > t-m$, both
\[
0 < \Pr\big(W_{t-p:t} = \mathbf{1}_{p+1} \,\big|\, W_{t-m:f(t-p)-1} = w^{obs}_{t-m:f(t-p)-1}\big) < 1 \quad \text{and} \quad 0 < \Pr\big(W_{t-p:t} = \mathbf{0}_{p+1} \,\big|\, W_{t-m:f(t-p)-1} = w^{obs}_{t-m:f(t-p)-1}\big) < 1 .
\]
Conditional on $W_{t-m:f(t-p)-1} = w^{obs}_{t-m:f(t-p)-1}$, we know that with probability $\Pr\big(W_{t-p:t} = \mathbf{1}_{p+1} \,\big|\, W_{t-m:f(t-p)-1} = w^{obs}_{t-m:f(t-p)-1}\big) \ne 0$, $\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\} = 1$ and $Y_t^{obs} = Y_t\big(w^{obs}_{t-m:f(t-p)-1}, \mathbf{1}_{t-f(t-p)+1}\big)$. So
So
\[
\mathbb{E}_{W_{1:T} \sim \eta_{\mathbb{T}}}\left[ Y_t^{\mathrm{obs}} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{1}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{1}_{p+1})} - Y_t\big(w^{\mathrm{obs}}_{t-m:f(t-p)-1}, \mathbf{1}_{t-f(t-p)+1}\big) \,\Big|\, W_{t-m:f(t-p)-1} = w^{\mathrm{obs}}_{t-m:f(t-p)-1} \right] = 0.
\]
Similarly, we have
\[
\mathbb{E}_{W_{1:T} \sim \eta_{\mathbb{T}}}\left[ Y_t^{\mathrm{obs}} \frac{\mathbb{1}\{W_{t-p:t} = \mathbf{0}_{p+1}\}}{\Pr(W_{t-p:t} = \mathbf{0}_{p+1})} - Y_t\big(w^{\mathrm{obs}}_{t-m:f(t-p)-1}, \mathbf{0}_{t-f(t-p)+1}\big) \,\Big|\, W_{t-m:f(t-p)-1} = w^{\mathrm{obs}}_{t-m:f(t-p)-1} \right] = 0,
\]
which finishes the proof. □

EC.6.2. Asymptotic Normality when m is Misspecified

The proof of Corollary 2 consists of two parts: $m < p$ and $m > p$. When $m < p$ we appeal to Theorems 4 and 5. When $m > p$ we prove Corollary 2 by using Lemma EC.1. In particular, we derive $B_{n,k,a}$, and then construct suitable $\Delta_n$, $K_n$, and $L_n$.

Proof of Corollary 2. The proof consists of two parts: $m < p$ and $m > p$.

First, when $m < p$, we know that $\hat{\tau}_p = \hat{\tau}_m$, $\tau_p = \tau_m$, and $\mathrm{Var}(\hat{\tau}_p) = \mathrm{Var}(\hat{\tau}_m)$. By Theorem 4, we obtain part (i), the expression in (11). By Theorem 5, we know that
\[
\frac{\hat{\tau}_p - \tau_p}{\sqrt{\mathrm{Var}(\hat{\tau}_p)}} = \frac{\hat{\tau}_m - \tau_m}{\sqrt{\mathrm{Var}(\hat{\tau}_m)}} \xrightarrow{D} \mathcal{N}(0, 1).
\]

Second, when $m > p$, we follow the same argument as in Theorem 5. In the $n$-replica experiment,
\[
\hat{\tau}_p - \mathbb{E}\big[\tau_p^{[m]}\big] = \frac{1}{(n-1)p} \sum_{k=0}^{n-2} \bar{\,\cdot\,}_{k},
\qquad
\mathrm{Var}(\hat{\tau}_p) = \frac{1}{(n-1)^2 p^2} \mathrm{Var}\Big( \sum_{k=0}^{n-2} \bar{\,\cdot\,}_{k} \Big).
\]
To use the language of Lemma EC.1, denote $d = n-1$. For any $i \in [n-1]$, denote $X_{n,i} = \frac{1}{(n-1)p} \bar{\,\cdot\,}_{i-1}$.

Table EC.8. An illustration of $\phi$ when $m = 5$, $p = 3$.

Carryover effect: an arrow runs from period 17 to period 22.
Period:           ...  13  14  15  16  17  18  19  20  21  22  23  24  ...
$\mathbb{T}^*$:         ✓   -   -   ✓   -   -   ✓   -   -   ✓   -   -
$\{\bar{\,\cdot\,}_{k}\}_{k=0}^{K+1}$: one term under each three-period block (periods 13 to 15, 16 to 18, 19 to 21, 22 to 24).

In this example $\phi = \lceil m/p \rceil = 2$. The arrow above periods 17 through 22 means that the assignment in period 17 affects the outcome in period 22. Hence the block terms containing periods 17 and 22 are correlated, whereas block terms more than $\phi$ blocks apart are independent.

We know that
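$\phi = \lceil m/p \rceil$, so that $\{X_{n,1}, X_{n,2}, \ldots\}$ is a sequence of $\phi$-dependent random variables. See Table EC.8 for an illustration of $\phi$.

First note that $B_n = \mathrm{Var}(\hat{\tau}_p)$, and we calculate $B_{n,k,a}$ as follows. Note that $k \ge \phi + 1$.
\[
\begin{aligned}
B_{n,k,a} &= \frac{1}{(n-1)^2 p^2} \mathrm{Var}\Big( \sum_{i=a}^{a+k-1} \bar{\,\cdot\,}_{i-1} \Big) \\
&\le \frac{1}{(n-1)^2 p^2} \Big( \sum_{i=a}^{a+k-1} \mathbb{E}\big[\bar{\,\cdot\,}_{i-1}^{\,2}\big] + 2 \sum_{i=a}^{a+k-2} \mathbb{E}\big[\bar{\,\cdot\,}_{i-1} \bar{\,\cdot\,}_{i}\big] + \cdots + 2 \sum_{i=a}^{a+k-\phi-1} \mathbb{E}\big[\bar{\,\cdot\,}_{i-1} \bar{\,\cdot\,}_{i-1+\phi}\big] \Big) \\
&\le \frac{C p^2 B^2}{(n-1)^2 p^2} \cdot \big( k + (k-1) + \cdots + (k-\phi) \big) \\
&\le \frac{(\phi+1)\, C k B^2}{(n-1)^2},
\end{aligned}
\]
where $C$ is some constant bounding the number of terms in each cross-product expectation $2\mathbb{E}[\bar{\,\cdot\,}_{i-1} \bar{\,\cdot\,}_{i}], \ldots, 2\mathbb{E}[\bar{\,\cdot\,}_{i-1} \bar{\,\cdot\,}_{i-1+\phi}]$, and $\phi + 1$ is a constant as well.

Pick $\gamma = 0$ and $\delta = 1$. Then $\Delta_n = B^3/(n-1)^3$, $K_n = (\phi+1) C B^2/(n-1)^2$, and $L_n = \mathrm{Var}(\hat{\tau}_m)/(n-1)$. We check the conditions of Lemma EC.1 in turn.

1. $\mathbb{E}|X_{n,i}|^3 \le \Delta_n = B^3/(n-1)^3$, because all the potential outcomes are bounded by $B$, so that $X_{n,i} \le B/(n-1)$.
2. $B_{n,k,a}/k \le K_n = (\phi+1) C B^2/(n-1)^2$.
3. $B_n/(n-1) \ge L_n = \mathrm{Var}(\hat{\tau}_m)/(n-1)$.
4. $K_n/L_n = (\phi+1) C B^2 / \big((n-1)\,\mathrm{Var}(\hat{\tau}_m)\big) = O(1)$, where the last equality is due to Assumption 5.
5. $\Delta_n/L_n^{3/2} = B^3 / \big((n-1)^{3/2}\, \mathrm{Var}(\hat{\tau}_m)^{3/2}\big) = O(1)$, where the last equality is due to Assumption 5.

Due to Lemma EC.1,
\[
\frac{\hat{\tau}_p - \tau_p}{\sqrt{\mathrm{Var}(\hat{\tau}_p)}} \xrightarrow{D} \mathcal{N}(0, 1).
\]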
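The $\phi$-dependence used in this proof can be checked numerically in a simplified setting. The sketch below is only an illustration under stated assumptions, not the construction used in the paper: it assumes randomization points on a regular grid $1, p+1, 2p+1, \ldots$, groups outcome periods into consecutive blocks of length $p$ indexed from 0, and treats two blocks as potentially dependent exactly when their outcomes are driven by at least one common randomization draw. The function names `randomization_index`, `block_sources`, and `max_dependence_gap` are hypothetical.

```python
import math


def randomization_index(s, p):
    """Index j of the randomization point 1 + j*p that sets the assignment in period s.

    Assumes (for illustration) a regular grid of randomization points 1, p+1, 2p+1, ...
    """
    return (s - 1) // p


def block_sources(k, m, p):
    """Randomization indices that can affect outcomes in block k.

    Block k (0-based, an assumed indexing) covers periods k*p+1 .. (k+1)*p.
    With true carryover order m, the outcome at period t depends on the
    assignments in periods t-m .. t (truncated at period 1).
    """
    first, last = k * p + 1, (k + 1) * p
    lo = max(1, first - m)
    return {randomization_index(s, p) for s in range(lo, last + 1)}


def max_dependence_gap(m, p, n_blocks):
    """Largest index gap between two blocks that share a randomization draw."""
    sources = [block_sources(k, m, p) for k in range(n_blocks)]
    gaps = [
        b - a
        for a in range(n_blocks)
        for b in range(a + 1, n_blocks)
        if sources[a] & sources[b]
    ]
    return max(gaps, default=0)


m, p = 5, 3
phi = math.ceil(m / p)
print(phi, max_dependence_gap(m, p, n_blocks=10))  # prints "2 2"
```

With $m = 5$ and $p = 3$, the block covering periods 22 to 24 draws on the same randomization point as the block covering periods 16 to 18 (the one set in period 16), but shares none with the block covering periods 13 to 15, which matches the pattern illustrated in Table EC.8.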