[PDF] The Micro-Randomized Trial for Developing Digital Interventions: Data Analysis Methods

Abstract

Although there is much excitement surrounding the use of mobile and wearable technology for the purposes of delivering interventions as people go through their day-to-day lives, data analysis methods for constructing and optimizing digital interventions lag behind. Here, we elucidate data analysis methods for primary and secondary analyses of micro-randomized trials (MRTs), an experimental design to optimize digital just-in-time adaptive interventions. We provide a definition of causal "excursion" effects suitable for use in digital intervention development. We introduce the weighted and centered least-squares (WCLS) estimator which provides consistent causal excursion effect estimators for digital interventions from MRT data. We describe how the WCLS estimator along with associated test statistics can be obtained using standard statistical software such as SAS or R. Throughout we use HeartSteps, an MRT designed to increase physical activity among sedentary individuals, to illustrate potential primary and secondary analyses.

Full PDF

MMRT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS The Micro-Randomized Trial for Developing Digital Interventions: Data Analysis Methods

Tianchen Qian Harvard University Michael A. Russell Pennsylvania State University Linda M. Collins Pennsylvania State University Predrag Klasnja University of Michigan Stephanie T. Lanza Pennsylvania State University Hyesun Yoo University of Michigan Susan A. Murphy Harvard University RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Author Note

The HeartSteps micro-randomized trial presented here have been presented in papers with primary analysis results (Klasnja et al., 2015, 2018). Tianchen Qian, Department of Statistics, Harvard University. Dr. Qian was supported by National Institutes of Health grants P50DA039838, R01AA023187, U01CA229437, and U54EB020404. Michael A. Russell, The Methodology Center and Department of Biobehavioral Health, The Pennsylvania State University. Dr. Russell was supported by National Institutes of Health grant P50DA039838. Linda M. Collins, The Methodology Center and Department of Human Development & Family Studies, The Pennsylvania State University. Dr. Collins was supported by National Institutes of Health grants P50DA039838, R01AA022931, P01CA180945, and R01DA040480. Predrag Klasnja, School of Information, University of Michigan, & Kaiser Permanente Washington Health Research Institute. Dr. Klasnja was supported by National Institutes of Health grants R01HL125440, U01CA229445, and R01LM013107. Stephanie T. Lanza, Edna Bennett Pierce Prevention Research Center and Department of Biobehavioral Health, The Pennsylvania State University. Dr. Lanza was supported by National Institutes of Health grant P50DA039838. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Hyesun Yoo, Department of Statistics, University of Michigan. Dr. Yoo was funded by National Institutes of Health grant R01AA023187. Susan A. Murphy, Departments of Statistics and Computer Science, Harvard University. Dr. Murphy was supported by National Institutes of Health grants P50DA039838, R01AA023187, U01CA229437, UG3DE028723, and U54EB020404. The corresponding author is Susan A. Murphy, Science Center 400 Suite, Harvard University, One Oxford Street, Cambridge, MA 02138-2901 The authors thank Amanda Applegate for helpful comments. Draft version 4/19/2020. This paper has not been peer reviewed. Please do not copy or cite without author's permission. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Abstract

Although there is much excitement surrounding the use of mobile and wearable technology for the purposes of delivering interventions as people go through their day-to-day lives, data analysis methods for constructing and optimizing digital interventions lag behind. Here, we elucidate data analysis methods for primary and secondary analyses of micro-randomized trials (MRTs), an experimental design to optimize digital just-in-time adaptive interventions. We provide a definition of causal “excursion” effects suitable for use in digital intervention development. We introduce the weighted and centered least-squares (WCLS) estimator which provides consistent causal excursion effect estimators for digital interventions from MRT data. We describe how the WCLS estimator along with associated test statistics can be obtained using standard statistical software such as SAS (SAS Institute Inc., 2019) or R (R Core Team, 2019). Throughout we use HeartSteps, an MRT designed to increase physical activity among sedentary individuals, to illustrate potential primary and secondary analyses.

Keywords : Micro-randomized trial (MRT); digital interventions; just-in-time adaptive intervention (JITAI); intensive longitudinal data; causal inference RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS The Micro-Randomized Trial for Developing Digital Interventions: Data Analysis Methods

Mobile technologies—including tablets, smartphones, and wearable sensors—have become ubiquitous in daily life. Because of their ability to engage users “in the moment,” they provide unprecedented opportunity to deliver interventions at the times and in the contexts when individuals are most likely to benefit. Just-in-time adaptive interventions (JITAIs; Nahum-Shani et al., 2018) are intervention designs that take advantage of these opportunities in order to intensively adapt and deliver interventions to each individual in the flow of their life. JITAI design involves the construction of decision rules that pinpoint the best intervention option for the individual’s current context. However, research methods useful in optimizing JITAI decision rules lag behind technology’s current capabilities to reach individuals, with the result that scientific knowledge is lacking about what intervention content should be delivered when, how often, and in which context, without overburdening the individual. The micro-randomized trial (MRT) is an experimental trial for obtaining this knowledge. In an MRT, intervention component options are randomized to each participant many times during the trial (see the companion article, Walton et al. (under review), for more details and examples of MRTs). As discussed in Walton et al. (under review), the MRT was developed for use in constructing and optimizing JITAI decision rules in digital interventions (Klasnja et al., 2015; Nahum-Shani et al., 2018). A natural primary analysis of data from an MRT would focus on whether there is a marginal causal effect of an intervention component. Possible secondary analyses might include moderation analyses aimed at informing JITAI development. As will be shown below, the meaning of marginal and moderation causal effects requires careful thought. The purpose of this RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS article is to describe these causal effects and discuss ways to analyze the data from an MRT in order to optimize a JITAI. We begin by providing a brief review of the MRT, along with an overview of the HeartSteps MRT (Klasnja et al., 2018) for use in clarifying ideas. We then offer precise definitions of marginal and moderation effects via the concept of causal “excursion” effects, using the potential outcomes framework (Robins, 1986; Rubin, 1978). Next we review and explain the weighted and centered least-squares (WCLS) estimator (Boruvka et al., 2018), which provides estimators and test statistics for conducting primary and secondary analyses using data from MRTs. Data from the HeartSteps MRT is used to illustrate these analyses. Brief Review of MRTs

MRTs are conducted with the goal of providing intensive longitudinal data that can be used to develop one or more JITAI components. Each component is associated with different intervention options that might be provided at any of multiple decision points during an individual’s day-to-day life. For example, one of the components examined in the HeartSteps MRT was the activity suggestions component, for which the intervention options were a contextually tailored activity suggestion or no suggestion. The activity suggestions component was randomly assigned at 5 decision points per day. In an MRT, each participant is repeatedly randomized to different options of an intervention component, with known probability at each decision point. This repeated, intensive randomization means that over the course of an MRT, a participant may be randomized hundreds or even thousands of times. As discussed in the companion article (Walton et al., under review), The contextually tailored activity suggestion at each time is delivered in one of two forms (with equal probability): either a suggestion with a walking activity that took 2-5 minutes to complete or an anti-sedentary suggestion that instructs brief movements such as to stand up and roll one’s arms; see Walton et al. (under review). For expositional simplicity we group all activity suggestions together for most parts of this article.

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS intervention components are often designed to have greatest impact on a near-term proximal outcome . The proximal outcome is observed after each randomization. Prior to each decision point, observations of context such as sensor data (e.g., heart rate, step count, weather, current location), as well as self-report data (perceptions of loneliness, perceptions of social isolation, enjoyment of physical activity), may be observed. As discussed in the companion article (Walton et al., under review), observations of context may be used to inform the content of a message in an intervention option and to define availability conditions (defined in the next paragraph). As will be shown below, some observations might be collected to serve as controls in data analyses or to inform moderation analyses. As an individual goes through everyday life, there may be times when only the intervention option of “no treatment” is appropriate. This is formalized in the notion of availability . For example, often the delivery of the intervention involves an audible and visual cue. If sensors on the phone and/or wearables indicate that the individual might be operating a moving vehicle at a decision point, then to avoid potentially dangerous distractions the individual might be deemed unavailable for an intervention. Individuals also may be deemed unavailable if an intervention was delivered within the prior x minutes (to reduce burden). At times of unavailability, the only appropriate intervention is the “no treatment” option. Availability is one of the time-varying variables observed prior to each decision point. HeartSteps

The HeartSteps study involves a 42-day MRT designed to inform the optimization of the HeartSteps digital intervention to increase physical activity (Klasnja et al., 2018; Walton et al., under review). HeartSteps combines a wristband activity tracker that monitors participants’ steps throughout the day in concert with a mobile phone application. Two intervention components RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS were investigated in the HeartSteps MRT: a planning support component and the contextually tailored activity suggestions component described above. The planning support component consisted of support for planning in the evening for activity on the next day. The intervention options were planning and no planning . For simplicity, much of the exposition below focuses on the activity suggestions component. Activity suggestions were randomized at 5 decision points each day: morning commute, lunch time, mid-afternoon, evening commute, and after dinner. The exact time of day of the five decision points was specified by each participant at the beginning of the study and could differ between participants. At each decision point, if the participant was available, the probability of delivering a contextually tailored activity suggestion (as opposed to no suggestion) was .6. The proximal outcome for the activity suggestion component was the step count of the participant in the 30-minute window following a decision point. In the case of the activity suggestions component, as mentioned above, a participant was deemed unavailable at a decision point either if their smart device indicated that the participant might be driving, or if the participant had turned off intervention delivery (participants could turn off delivery over the subsequent 1, 2, 4 or 8 hours). Furthermore, because the content involved suggestions for new activities, a participant was deemed unavailable if they were currently walking or running or they had just finished an activity bout in the previous 90 seconds prior to the decision point. A natural primary research question motivating the HeartSteps MRT is whether there is a causal effect of delivering an activity suggestion versus not delivering any suggestion on the An individual is sent a planning prompt with probability .5 every day. If a planning prompt is sent, it is delivered in one of two forms (with equal probability): structured planning (where the individual is prompted to select a plan from a list of their own past activity plans) or unstructured planning (where the individual is prompted to type their plan into a text box). For expositional simplicity we group both forms of planning together in this paper.

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS proximal outcome—i.e., the subsequent 30-minute step count. This question can be addressed by estimating a causal effect that is conceptually similar to the main effect of a factor (Collins et al., 2009) because, as is explained below, it is marginal over the other intervention components. Additional important research questions in the development of the HeartSteps activity suggestion component concern the potential effects of habituation (Rankin et al., 2009) and/or treatment burden (Clawson et al., 2015; Eysenbach, 2005; Ho & Intille, 2005; Klasnja et al., 2008; Shaw et al., 2013; Yardley et al., 2016). If individuals habituate to the activity suggestions or find the intervention burdensome, the causal effect would be expected to deteriorate over time. Thus, a natural secondary or exploratory analysis is to assess whether day in study moderates the effect of delivering an activity suggestion. Two additional examples of secondary research questions are whether the effect of delivering an activity suggestion depends on the individual’s current location and whether the effect of the activity suggestion depends on whether the individual was prompted to plan an activity for that day. All of these research questions as we have stated them are imprecise about what is meant by a causal effect. In the following section we offer a more precise definition of a causal effect as estimated in an MRT. The Causal Excursion Effect

In this section, we define causal effects using the potential outcomes framework (Robins, 1986, 1987; Rubin, 1978). For expositional clarity, throughout the paper we consider the setting in which there are only two intervention options, denoted by treatment 1 and treatment 0. For the activity suggestions component in the HeartSteps MRT, this would be delivering activity suggestion (treatment 1) and not delivering activity suggestion (treatment 0). First, we briefly review the definition of a causal effect using a hypothetical setting with a single time point treatment. Then we define the causal excursion effect of a time-varying intervention component RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS on a time-varying outcome. Throughout, upper case letters denote random variables and lower case letters denote particular values of the random variables. In the classical potential outcomes framework, where there is only a single time point for possible treatment (see review by Rubin (2005)), the ideal but usually unattainable goal is to determine the individual-level causal effect, or the difference between the outcome under treatment 1 [denoted by 𝑌(1) ] and the outcome under treatment 0 [denoted by

𝑌(0) ] at the same time for each individual. Consider the first decision point in the HeartSteps MRT. At this decision point individuals are randomly assigned to receive an activity suggestion or no suggestion. The step count in the 30-minute window following this decision point is the outcome. For each individual, the treatment effect at this decision point is the difference between (a) the 30-minute step count had treatment been assigned to the individual (

𝑌(1) ) and (b) the 30-minute step count had the treatment not been assigned to the individual (

𝑌(0) ). 𝑌(1) and

𝑌(0) are called potential outcomes , because in reality only one of the potential outcomes can be observed on each individual, as both treatment and no treatment cannot be assigned to an individual at the same time—this is the “fundamental problem of causal inference” (Holland, 1986). Let 𝐴 denote the treatment assignment ( 𝐴 = 1 if treatment 1;

𝐴 = 0 if treatment 0). Only

𝑌 = 𝐴𝑌(1) + (1 − 𝐴)𝑌(0) is observed. A widely adopted solution to this problem is to estimate a marginal causal effect, or to estimate an effect closer to the individual-level causal effect, the causal effect conditional on a pre-treatment variable 𝑆 ; that is, the difference between the This equality holds under the consistency assumption often made in causal inference literature, which essentially requires that there are no two “versions” of the same treatment. In the example of activity suggestions, in order to properly define “delivering an activity suggestion” as treatment 1 and “not delivering an activity suggestion” as treatment 0, one would consider various framings and various contents of the suggestions as a “compound treatment.” However, if one wishes to distinguish between the effect of different versions of the suggestions in the analysis, then instead it would be necessary to define the treatment to have multiple levels.

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS expected outcome had everyone with 𝑆 = 𝑠 received the treatment (

𝐸[𝑌(1)|𝑆 = 𝑠] ) and the expected outcome had everyone with

𝑆 = 𝑠 not received the treatment (

𝐸[𝑌(0)|𝑆 = 𝑠] ). In the example for the first decision point in the HeartSteps MRT, 𝑆 might be the individual’s current location (home, work or other), current weather, gender, and baseline activity level. An interesting scientific question would be whether the value of 𝑆 modifies the treatment effect. Inference for the difference, 𝐸[𝑌(1)|𝑆 = 𝑠] − 𝐸[𝑌(0)|𝑆 = 𝑠] , answers this question. If 𝐴 is randomized, then the above difference in terms of potential outcomes can be written in terms of expectations with respect to the distribution of the observations ( 𝑆, 𝐴, 𝑌 ). In particular, if treatment is randomly assigned, the causal effect,

𝐸[𝑌(1)|𝑆 = 𝑠] − 𝐸[𝑌(0)|𝑆 = 𝑠] , is equal to

𝐸[𝑌|𝐴 = 1, 𝑆 = 𝑠] − 𝐸[𝑌|𝐴 = 0, 𝑆 = 𝑠] (see Rubin (2005)). To define the causal excursion effect of a time-varying intervention component on a time-varying outcome, notation is needed to accommodate time. Consider the HeartSteps MRT, in which one goal is to estimate the effect of the activity suggestion on the subsequent 30-minute step count. Recall that the HeartSteps MRT is a 42-day study and there were 5 decision points per day for the activity suggestion component; thus, there are

𝑇 = 210 decision points overall. Let 𝑋 represent the data available from decision point 𝑡 − 1 up to and including decision point 𝑡 . 𝑋 contains the availability indicator, 𝐼 , with 𝐼 = 1 meaning that the individual is available at decision point t and 𝐼 = 0 otherwise . In the HeartSteps example, 𝑋 also includes current weather, location, time of day, 30-minute step count prior to decision point 𝑡 , and whether planning support was provided on the prior evening. Let 𝐴 represent the treatment indicator at decision point 𝑡 , where 𝐴 = 1 means treatment is delivered and 𝐴 = 0 means treatment is not For simplicity, we omit the subscript i for the i th individual in 𝑋 and in all other variables unless necessary. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS delivered. W e use an overbar 𝐴̅ to denote all prior and present treatments 𝐴̅ = (𝐴 , … , 𝐴 ). Let 𝑌 represent the proximal outcome—here, the number of steps in the 30 minutes after decision point 𝑡 . Denote by 𝐻 the individual’s history of data observed up to decision point 𝑡 : 𝐻 =(𝑋 , 𝐴 , 𝑌 > , … , 𝑋 , 𝐴 , 𝑌 , 𝑋 ) . 𝐻 includes potential moderators and control variables, some of which may be composites made up of other variables in H t . We denote potential moderators by 𝑆 . Recall that potential moderators of the effect of the activity suggestions in HeartSteps, 𝑆 , include number of days in treatment, the individual’s current location (home, work or other), current weather, whether planning was prompted the prior evening, number of activity suggestions delivered on the prior day, gender, and baseline activity level. As in the single time point setting, the inclusion of potential moderators, 𝑆 , means that the desired causal excursion effect is conditional on these variables. Further discussion regarding this point is included towards the end of this section after the formal definition of the causal excursion effect. To define the causal excursion effect, we use an extension of the potential outcomes framework to the setting of intensive longitudinal data (Robins, 1986, 1987). Recall that lower case letters such as 𝑎 represent particular values of a random variable, here a possible value of the treatment 𝐴 . Recall we use the overbar to represent present and past values, that is, 𝑎A ={𝑎 , 𝑎 > , … , 𝑎 } . The potential outcomes for 𝑌 , 𝑋 , 𝐻 , 𝑆 are 𝑌 (𝑎A ), 𝑋 (𝑎A ), 𝐻 (𝑎A ), 𝑆 (𝑎A ) , respectively. For example, 𝑌 (𝑎A ) is the 30-minute step count outcome after decision point 𝑡 that would have been observed if the individual had been assigned treatment sequence 𝐴̅ = 𝑎A . (For binary treatments, there could be different potential outcomes, 𝑌 (𝑎A ) .) This notation encodes the reality that an individual’s 30-minute Note that the overbar in 𝐴̅ is not an abbreviation for the average; rather, it stands for the entire vector of treatment assignment (𝐴 , … , 𝐴 ) and similarly for 𝑎A . This notation is common in causal inference literature (e.g., Robins, 1986, 1987). RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS step count outcome after decision point 𝑡 may be impacted by all prior treatments, as well as the current treatment. Note that unlike the potential proximal outcome 𝑌 (𝑎A ) , potential outcomes for 𝑋 , 𝑆 , as well as availability, 𝐼 , are indexed only by treatments, 𝑎A , prior to decision point 𝑡 —namely, 𝑋 (𝑎A ) , 𝑆 (𝑎A ) , and 𝐼 (𝑎A ) . This is because 𝑋 , 𝑆 , and 𝐼 occur prior to treatment at 𝑡. The causal excursion effect of activity suggestions at decision point 𝑡 on subsequent 30-minute step count for available individuals with 𝑆 = 𝑠 is defined as (Boruvka et al., 2018; Liao et al., 2016) 𝛽(𝑡, 𝑠) = 𝐸[𝑌 (𝐴̅ , 1) − 𝑌 (𝐴̅ , 0) | 𝐼 (𝐴̅ ) = 1, 𝑆 (𝐴̅ ) = 𝑠]. (1) This formula contains the following information. 1.

The effect, 𝛽(𝑡, 𝑠) , is causal because it is the expected value of the contrast in step counts in the 30 minutes following a decision point 𝑡 if the treatment were delivered at 𝑡 [potential outcome 𝑌 (𝐴̅ , 1) ] versus if treatment were not delivered at 𝑡 [potential outcome 𝑌 (𝐴̅ , 0) ]; that is, 𝑌 (𝐴̅ , 1) − 𝑌 (𝐴̅ , 0) . 2. The effect, 𝛽(𝑡, 𝑠) , is conditional . This effect is only among individuals who are available ( 𝐼 (𝐴̅ ) = 1 ) and for whom the potential moderators take on the value of (𝑆 (𝐴̅ ) = 𝑠) . 3. The effect, 𝛽(𝑡, 𝑠) , is marginal . In HeartSteps, the effect of the activity suggestions component at decision point 𝑡 is marginal over potential moderators not contained in 𝑠 , the effects of interventions from prior decision points, and the planning support component. In fact, 𝛽(𝑡, 𝑠) is replaced by 𝛽(𝑡) if, for example, estimation of the marginal effect of delivering an activity suggestion compared to no activity suggestion is desired; in this case 𝑆 (𝐴̅ ) is omitted from (1). RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS The effect, 𝛽(𝑡, 𝑠) , is an excursion from the “treatment schedule” prior to 𝑡 . In an MRT the treatment schedule prior to 𝑡 is a set of probabilistic decision rules for treatment assignment at all decision points from the beginning of the intervention up to the previous decision point; that is, for assignment of 𝐴 , … , 𝐴 . In the case of an MRT, the treatment schedule will always involve some randomization, but may include non-random assignment as well. For example, in the HeartSteps MRT the treatment schedule included, at five decision points per day, the following: if available deliver an activity suggestion with probability .6 and no suggestion with probability .4; if not available do not deliver an activity suggestion. Suppose the HeartSteps intervention included a component that was not examined in the MRT—for example, a brief motivational video sent to all participants every Monday morning at 8 am. Although there is no experimentation on this component, it would be included in the treatment schedule. The causal excursion effect concerns what would happen if an individual followed the current treatment schedule up to time 𝑡 − 1 and then deviated from the schedule to assign treatment 1 at decision point 𝑡 , versus deviated from the schedule to assign treatment 0 at decision point 𝑡 . In other words, the definition of 𝛽(𝑡, 𝑠) implicitly depends on the schedule for assigning 𝐴 , … , 𝐴 . Technically this excursion can be seen from (1), in that the expectation, 𝐸 , is marginal over all prior treatments not contained in 𝑆 . For example, if 𝑆 contains only current weather, then the excursion effect is marginal over all the variables other than current weather, including the schedule for assigning all of the prior treatments, 𝐴 , … , 𝐴 , as well as all prior treatments for other components such as the planning component. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS To understand the excursion effect better, consider two very different treatment schedules. In the first schedule, the treatment is provided on average once every other day; in the second schedule, the treatment is provided on average 4 times per day. Excursions from these two rather different schedules could result in very different effects, 𝛽(𝑡, 𝑠) . Indeed, in the latter schedule individuals may experience a great deal of burden and disengage with the result that 𝛽(𝑡, 𝑠) would be close to 0, whereas in the former schedule individuals may still be very engaged, resulting in a larger 𝛽(𝑡, 𝑠) . In other words, the definition of the causal excursion effect, 𝛽(𝑡, 𝑠) , is dependent on the schedule for treatment assignment; this is different from the types of effects typically discussed in the causal inference literature (e.g., Robins (1994); Robins, Hernán, & Brumback (2000)). A primary hypothesis test might focus on inference about the marginal excursion effect, that is, (1) with 𝑆 (𝐴̅ ) equal to an empty set. A secondary analysis might consider treatment effect moderation by including in 𝑆 (𝐴̅ ) potential moderators, such as location, or number of days in the intervention. Note that 𝑆 (𝐴̅ ) does not need to include all true moderators for (1) to be a scientifically meaningful causal effect; instead, it is appropriate to choose any (or no) 𝑆 (𝐴̅ ) , provided that (1) is interpreted as the causal excursion effect marginal over all variables in 𝐻 (𝐴̅ ) that are not included in 𝑆 (𝐴̅ ) . Under the assumptions that (i) the treatment is sequentially randomized (which is guaranteed by MRT design) and (ii) the treatment delivered to one individual does not affect another individual’s outcome , the causal excursion effect 𝛽(𝑡, 𝑠) in (1) can be written in terms of expectations over the distribution of the data as This assumption is called “non-interference” in causal inference. If there are social network components in the digital intervention, this assumption may be violated and an extension of the potential outcomes framework to incorporate interference is needed (Hong & Raudenbush, 2006; Hudgens & Halloran, 2008).

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS 𝛽(𝑡, 𝑠)= 𝐸[𝐸( 𝑌 ∣∣ 𝐴 = 1, 𝐻 , 𝐼 = 1 ) − 𝐸( 𝑌 ∣∣ 𝐴 = 0, 𝐻 , 𝐼 = 1 )|𝐼 = 1, 𝑆 = 𝑠]. (2) Thus the excursion effect 𝛽(𝑡, 𝑠) can be estimated using the observed MRT data.

A Primary Research Question for HeartSteps

As discussed above, a natural primary research question that can be addressed in the HeartSteps MRT is whether there is a marginal causal excursion effect of delivering an activity suggestion on the subsequent 30-minute step count of the user, compared to not delivering any message. To operationalize this marginal causal excursion effect, let 𝑆 be an empty set in (2) so that 𝛽(𝑡)= 𝐸[𝐸( 𝑌 ∣∣ 𝐴 = 1, 𝐻 , 𝐼 = 1 ) − 𝐸( 𝑌 ∣∣ 𝐴 = 0, 𝐻 , 𝐼 = 1 )|𝐼 = 1]. (3) The outer expectation on the right-hand side in (3) represents an average across all possible values of 𝐻 across individuals. For example, 𝛽(𝑡) is averaged over weather on that day and on previous days, over previous treatment assignment of the activity suggestions (i.e., 𝐴 G for 𝑠 < 𝑡 ), and also over previous assignment of the planning support prompts. The marginal excursion effect is the average of 𝛽(𝑡) over 𝑡 with weights equal to 𝐸[𝐼 ] to obtain 𝛽 I = ∑ 𝐸[𝐼 ]𝛽(𝑡) K4L9 ∑ 𝐸[𝐼 ] K4L9 . (4) Here

𝐸[𝐼 ] denotes the probability of an individual being available at decision point 𝑡 . Thus 𝛽 I is a weighted average of the marginal effects, 𝛽(𝑡) , in which the weights are the availability probabilities. In the section “Analysis Using Data from HeartSteps MRT,” we will conduct inference about a variety of excursion effects including this marginal excursion effect, 𝛽 I (Klasnja et al., 2018). A Selection of Secondary Research Questions for HeartSteps

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Consider secondary research questions that concern moderation of the causal excursion effect by a non-empty 𝑆 . For example, one question might be whether the causal excursion effect deteriorates with day under treatment. In this case 𝑆 would include day , the number of days in treatment prior to the decision point 𝑡 . Suppose the causal excursion effect can be represented by the linear model: 𝛽(𝑡, day ) = 𝐸[𝐸( 𝑌 ∣∣ 𝐴 = 1, 𝐻 ) − 𝐸( 𝑌 ∣∣ 𝐴 = 0, 𝐻 )|𝐼 = 1, 𝑑𝑎𝑦 ] = 𝛽 I + 𝛽 day . Note that day = 0 for all decision points 𝑡 on the first day of treatment; thus 𝛽 I represents the causal excursion effect on the first day; 𝛽 represents the change in the causal excursion effect with each additional day. Other secondary research questions concern whether there is effect moderation by other time-varying observations, such as the current location of the user, or by another intervention component being examined in the MRT. Consider the planning support component in the HeartSteps MRT. Let 𝑆 denote the indicator of whether a planning support prompt was delivered on the evening prior to decision point 𝑡 ( 𝑆 = 1 if delivered, 𝑆 = 0 if not). A model that includes effect moderation by a planning support prompt on the previous evening can be expressed as 𝛽(𝑡, 𝐴 ) = 𝐸[𝐸( 𝑌 ∣∣ 𝐴 = 1, 𝐻 ) − 𝐸( 𝑌 ∣∣ 𝐴 = 0, 𝐻 )|𝐼 = 1, 𝑆 ] = 𝛽 I + 𝛽 𝑆 . Here 𝛽 I represents the causal excursion effect when the individual did not receive planning support on the prior evening, and 𝛽 I + 𝛽 represents the causal excursion effect when the individual received planning support on the prior evening. See the section “Analysis Using Data from HeartSteps MRT” for results of the secondary analyses concerning these questions. Methods for Estimating Causal Excursion Effects from MRT Data

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS We present a WCLS estimator, developed by Boruvka et al. (2018), that provides a consistent estimator for the causal excursion effect, 𝛽(𝑡, 𝑠) . Here for clarity we provide an overview of the estimation method that can be used when the randomization probabilities are constant, as is the case in HeartSteps. Recall that in HeartSteps a primary analysis might be an assessment of the marginal causal excursion effect of the activity suggestions on the subsequent 30-minute step count. We use the superscript 𝑇 to denote the transpose of a vector or a matrix. Suppose the causal excursion effect is linear: 𝛽(𝑡, 𝑠) = 𝑠 K 𝛽 with 𝛽(𝑡, 𝑠) defined in (2), and the goal is to make inference about 𝛽 . Note that the model for 𝛽(𝑡, 𝑠) characterizes only the treatment effect (as a contrast between the two treatments with the proximal outcomes as dependent variables). The WCLS requires a working model for the conditional mean of 𝑌 given no treatment at decision point 𝑡 and history 𝐻 [i.e., 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) ]. One working model might be 𝑍 𝛼 , where 𝑍 is a vector of summaries of the observations made prior to decision point 𝑡 (i.e., summaries constructed from 𝐻 ). These summaries are often called control variables. For example, in HeartSteps a natural control variable is the step count in the 30 minutes prior to the decision point. This prior 30-minute step count variable is likely strongly correlated with the proximal outcome, the 30-minute step count following the decision point. As will be seen shortly, consistency of the WCLS estimator for 𝛽 does not require 𝑍 𝛼 to be a correct model for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) . The role of 𝑍 𝛼 is to reduce noise in the analysis: Inclusion of prognostic control variables (those variables in 𝐻 that are correlated with 𝑌 ) in 𝑍 will usually reduce the variance of the estimator of 𝛽 . A simulation study that illustrates this point is presented in Appendix C. In summary, the inclusion of the step count in the 30 minutes prior to the decision point serves to reduce variance and increase the power to detect a nonzero excursion effect, 𝛽 . RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS The WCLS estimator for 𝛽 is calculated as follows. Suppose (𝛼T, 𝛽U ) is the (𝛼, 𝛽) value that solves the following estimating equation (Diggle et al., 2002) V V 𝐼 W𝑌 − 𝑍 𝛼 − (𝐴 − 𝑝)𝑆 𝛽Y Z 𝑍 (𝐴 − 𝑝)𝑆 [ K4L9\7L9 = 0, (5) where is the randomization probability and 𝑖 is the index for the 𝑖 -th participant. Note that the indicator 𝐴 is centered by subtracting p. Recall that in HeartSteps, at each decision point there is .6 probability to deliver an activity suggestion (if the participant is available), so 𝑝 = .6 . The 𝛽U that solves (5) is the WCLS estimator for 𝛽 . Remarks . 1.

WCLS does not assume a model for the proximal outcome such as 𝑌 ~𝑍 𝛼 +( 𝐴 𝑖𝑡 − 𝑝 )𝑆 𝛽 . The primary assumption (Boruvka et al., 2018) that ensures that 𝛽U is consistent is 𝐸[𝐸( 𝑌 ∣∣ 𝐴 = 1, 𝐻 ) − 𝐸( 𝑌 ∣∣ 𝐴 = 0, 𝐻 )|𝐼 = 1, 𝑆 ] = 𝑆 𝛽; (6) that is, 𝛽(𝑡, 𝑠) = 𝑠 K 𝛽 is a correct model for the causal excursion effect conditional on 𝐼 = 1, 𝑆 = 𝑠 . 2. The centering of the treatment indicator, ( 𝐴 𝑖𝑡 − 𝑝 ) , in (5) creates orthogonality between the columns of the design matrix for the causal excursion effect (i.e., ( 𝐴 𝑖𝑡 − 𝑝 )𝑆 ) and the columns of the design matrix involved in the working model, 𝑍 𝛼 , for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) (i.e., 𝑍 ). This centering provides robustness in the estimation of 𝛽 ; in particular, robustness against a mis-specified working model for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) . In other words, the data analyst can use a possibly incorrect working model 𝑍 𝛼 , and 𝛽U will still be a consistent estimator of 𝛽 . In digital interventions this robustness property is of practical importance because vast amounts of RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS data (i.e., high-dimensional 𝐻 ) on the participant have usually been collected prior to decision point 𝑡 . As a result, it is virtually impossible to correctly model 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) . For example, in HeartSteps there are 210 decision points (210 = 42 days × 𝐻 can include the outcome, treatment, and covariates from all the past 𝑡 − 1 decision points, which means hundreds of variables at a later decision point 𝑡 . In addition, 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) may depend on variables in 𝐻 in a nonlinear way, which adds to the difficulty of correctly modeling 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) and thus makes the robustness property desirable. 3. While the choice of 𝑍 doesn’t affect the consistency of 𝛽U , a better working model for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) has the potential to decrease the variance of 𝛽U . Because 𝐻 is usually high-dimensional, choosing 𝑍 can be done by hand-picking a subset of 𝐻 (e.g., those covariates and outcomes at recent decision points). As discussed above, in HeartSteps a natural control variable is the step count in the 30 minutes prior to the decision point, as this variable is likely highly correlated with the proximal outcome of step count in the 30 minutes after the decision point. In Appendix C we illustrate through a simulation study how inclusion of control variables that are correlated with the proximal outcome in 𝑍 can reduce the variance of 𝛽U . 4. Although software based on the estimating equation (5) also outputs an estimator 𝛼T and its standard error, we recommend not interpreting them, unless it is safe to assume that 𝑍 𝛼 is a correct model for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) . For simplicity in presentation of the estimator, so far we have considered the setting where the randomization probability, 𝑝 , is constant. There are also practical settings where the randomization probability may change over time; for example, in the stratified micro-RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS randomized trial, different micro-randomization probabilities are used depending on a time-varying variable such as prediction of risk. For example, a higher randomization probability may be used when the individual is categorized as high-risk, and a lower randomization probability may be used when the individual is categorized as low-risk. The rationale for such risk stratification is to ensure that there are adequate numbers of treatments delivered both at risk times and at non-risk times. See Dempsey, Liao, Kumar, & Murphy (2019) for details. The WCLS estimator presented above can be generalized to this setting (Boruvka et al., 2018); see Appendix B for a general WCLS estimator that allows randomization probability to depend on the individual’s history, 𝐻 . Estimating the WCLS 𝜷c Using Standard Statistical Software

When the randomization probabilities are constant, standard statistical software that implements GEE (Liang & Zeger, 1986), such as SAS (SAS Institute Inc., 2019), Stata (StataCorp, 2019), and SPSS (IBM Corp., 2019), can be “tricked” into providing the WCLS estimator 𝛽U and its standard error. Consider SAS PROC GENMOD (SAS Institute Inc., 2019) and suppose the assumed causal excursion effect model is (6) and the working model for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) is 𝑍 𝛼 . Then the WCLS estimator 𝛽U and its standard error can be obtained by the following steps: (i) incorporate 𝐼 as the “prior weights,” (ii) choose a working independence correlation structure, and (iii) fit GEE with dependent variable 𝑌 and independent variables 𝑍 and ( 𝐴 𝑡 − 𝑝 )𝑆 . Then the estimated coefficient for ( 𝐴 𝑡 − 𝑝 )𝑆 is the WCLS estimate 𝛽U . Note that in choosing the control variables, 𝑍 needs to contain at least 𝑆 . This approach does not actually fit a GEE model for the conditional mean of 𝑌 . Instead, this estimation method merely uses the GEE software as a means to output the WCLS estimator. Technically, this can be done because the estimating equation of a GEE with the RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS above specification is algebraically equivalent to (5), the estimating equation of WCLS. To obtain appropriate standard errors for the estimator 𝛽U through the above GEE fit, one needs to use the robust standard error [SAS calls this the “empirical standard error” (SAS Institute Inc., 2019)]. The robust standard error accounts for the correlations among the proximal outcomes, 𝑌 > , 𝑌 d , … , 𝑌 K<9 . When the sample size is small (e.g., 𝑛 < 50 ), we recommend use of further small sample corrections for both the standard error and the degrees of freedom in the critical value for constructing confidence intervals (Boruvka et al., 2018). R code (R Core Team, 2019) for the implementation with the small sample correction is available at https://github.com/StatisticalReinforcementLearningLab/HeartstepsV1Code/blob/master/xgeepack.R.

Analysis Using Data from HeartSteps MRT

Recall that HeartSteps is a 6-week MRT for optimizing JITAI components of a digital intervention to promote physical activity with 37 sedentary participants (Klasnja et al., 2018). In the illustrative analysis below, we focus on the activity suggestion component, which was randomized at 5 decision points each day. At each decision point, if the participant was available, an activity suggestion was delivered with randomization probability .6. We first address the primary research question by estimating the marginal excursion effect of an activity suggestion versus no suggestion using data from the HeartSteps MRT. The primary analysis for HeartSteps is published in Klasnja et al., (2018); for completeness we include this analysis as well as results of additional secondary analyses.

As discussed before, secondary research questions might include how the excursion effect changes over time, whether this effect is moderated by current location, and whether this effect is moderated by delivery of a planning support prompt on the RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS evening prior to the decision point. All analyses are conducted using the R programming language (R Core Team, 2019). We use the following variables in the analysis: • 𝑌 : log-transformed 30-minute step count following decision point 𝑡 . This is the proximal outcome of interest. • 𝐴 : indicator of whether an activity suggestion is delivered at decision point 𝑡 . The randomization probability is .6 at available decision points. • 𝑋 : log-transformed 30-minute step count preceding decision point 𝑡 . Because this variable is expected to be correlated with 𝑌 , we will include 𝑋 as a control variable in the analysis to reduce noise. • 𝑋 : day in the study, coded as 0, 1, 2, …, 41. • 𝑋 : participant’s location at decision point t ; coded as 1 if at home or at work, and 0 if at any other location. • 𝑋 : indicator of a planning support prompt delivered on evening prior to decision point 𝑡 ; coded as 1 if delivered and 0 if not. • 𝐼 : availability status at decision point 𝑡 . Recall that randomization can occur only if the participant is available. Step count data are highly skewed; the log-transformation is used to make its distribution more symmetric (and we added .5 to the step count before taking log to avoid log(0)). Although the consistency of the WCLS estimator does not require 𝑌 to be symmetrically distributed, symmetry improves the accuracy of the approximation to the distribution of the test statistic in small samples. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Question 1: On average, is there an effect of delivering an activity suggestion on subsequent 30-minute step count, compared to no suggestion?

As discussed above a natural primary research question concerns the excursion effect marginal over all decision points and all covariates. We address this question using the WCLS estimator with 𝑆 equal to the empty set, 𝛽(𝑡, 𝑠) = 𝛽 I , working model 𝛼 I + 𝛼 𝑋 , and weight 𝐼 . Table 1 lists the results. The causal effect of delivering an activity suggestion versus no suggestion on the log-transformed subsequent 30-min step count, averaged over all decision points and all covariates, is 𝛽U I = 0.131 ( p = 0.060, 95% CI = -0.006 to 0.268). This corresponds roughly to a 14% ( = 𝑒 I.9d9 − 1 ) increase in the average 30-minute step count (on its original scale), comparing decision points when an activity suggestion was sent with decision points when an activity suggestion was not sent.

Question 2: Does the effect of activity suggestions change with each additional day in the study?

This question is motivated by the hypothesis that the longer a person participates in the study, the more they may habituate to the suggestions or become overburdened, leading them to become less responsive. We address this question via the WCLS estimator with 𝑆 = 𝑋 (day in study), 𝛽(𝑡, 𝑠) = 𝛽 I + 𝛽 𝑋 , working model 𝛼 I + 𝛼 𝑋 + 𝛼 > 𝑋 , and weight 𝐼 . 𝑋 is included in 𝑆 to assess the effect moderation by 𝑋 , day in the study. Because 𝑋 is coded to start from 0, 𝛽 I represents the causal excursion effect on the first day. Table 2 lists the results. There is a significant interaction between the activity suggestion and day in the study: the causal effect of the activity suggestion changes by 𝛽U = −0.018 with each additional day in the study ( p = 0.005, 95% CI = -0.031 to -0.006). Combining this with 𝛽U I = 0.507 , the analysis indicates that sending an activity suggestion results in about 66% ( = 𝑒 I.lIm − 1 ) increase in the 30-minute RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS step count on the first day of the study, about 16% ( = 𝑒 I.lIm?I.I9n×>I − 1 ) increase on the 21st day of the study, and about 21% ( = 1 − 𝑒

I.lIm?I.I9n×g9 ) decrease on the 42nd day of the study. A sensitivity analysis to the linearity assumption (that the causal excursion effect changes linearly by day in the study) is provided in Appendix D.

Question 3: Does the effect of delivering each type of activity suggestion versus no suggestion depend on the individual’s current location (home/work, or other)?

The activity suggestion involves suggestions for new physical activities; therefore, it is of interest to examine whether its effect depends on the individual’s location, which might be a proxy for interruptibility. If an activity suggestion is sent, then ½ of the time the suggestion is a walking suggestion (instructing a walking activity that took 2-5 minutes to complete) and the remaining ½ of the time the activity suggestion is an anti-sedentary suggestion (instructing brief movements, such as stretching one’s arms). The investigators conjectured that effect moderation by location may differ between walking suggestions and anti-sedentary suggestions. Therefore, here we assess whether the effect of delivering each type of activity suggestion versus no suggestion is modified by the individual’s current location (home/work or other). We address this question by using the WCLS estimator with 𝑆 = 𝑋 (indicator of being at home or work), working model 𝛼 I + 𝛼 𝑋 + 𝛼 > 𝑋 , and weight 𝐼 . 𝑋 is included in 𝑆 to assess the effect moderation by 𝑋 , location of the individual. Because here we have two treatment indicators (indicator of whether a walking suggestion is delivered, and indicator of whether an anti-sedentary suggestion is delivered), the causal excursion effect for the walking suggestion is modeled as 𝛽 I + 𝛽 𝑋 , and the causal excursion effect for the anti-sedentary suggestion is modeled as 𝛽 > + 𝛽 d 𝑋 . Table 3 lists the result. The causal excursion effect moderation by location (home/work or other) is statistically significant for walking suggestions ( 𝛽U = 0.377 , p RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS = 0.049, 95% CI = 0.001 to 0.753). The effect moderation is not statistically significant for anti-sedentary suggestions ( 𝛽U d = −0.142 , p = 0.472, 95% CI = -0.540 to 0.256). Question 4: Does the effect of activity suggestions depend on whether planning support was delivered on the previous evening?

Whether planning support was delivered on the previous evening may impact the effectiveness of the activity suggestion. To assess this moderation effect, we use the WCLS estimator with 𝑆 = 𝑋 , 𝛽(𝑡, 𝑠) = 𝛽 I + 𝛽 𝑋 , working model 𝛼 I + 𝛼 𝑋 + 𝛼 > 𝑋 , and weight 𝐼 . 𝑋 is included in 𝑆 to assess the effect moderation by 𝑋 , whether the individual received planning support on the previous evening. Table 4 lists the results. There is no evidence of effect moderation by the planning support prompt on the previous day ( 𝛽U = 0.046 , p = 0.734, 95% CI = -0.228 to 0.320). Discussion

In this article we define the causal excursion effect of a digital intervention component in an MRT using the potential outcomes framework. We illustrate how primary and secondary analyses concerning causal excursion effects can be formulated for an MRT, using the HeartSteps MRT as an example. We introduce WCLS as a data analysis method for MRTs, which results in consistent estimators for the causal excursion effect, and describe how to obtain the WCLS estimator through standard statistical software. We illustrate WCLS by using it to analyze the marginal and moderated causal excursion effects using data from the HeartSteps MRT.

Using Moderation Effect Analysis to Inform JITAI Development

Conducting moderation analyses, as well as exit interviews with participants, can be useful both in formulating decision rules and in generating hypotheses to be tested in subsequent

FIGURE 2

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS optimization trials. For example, exit interviews might reveal that participants found that the activity suggestions begin to appear similar as the trial progressed. This combined with the evidence of moderation by day in study might motivate the development of different types of activity suggestions that could be introduced after, say, intervention week 3. The moderating effect of location is an early indication that the decision rules might specify no delivery of activity suggestions when an individual is at the “other” location. In the case of HeartSteps, findings from analyses such as those above, along with other moderation analyses and exit interviews, were used to inform a second MRT currently underway in which a personalization algorithm is being used to reduce the probability of receiving an activity suggestion when there is evidence of a decreasing effect. The conjecture is that intervention effects will stop decaying if the probability of delivering an activity suggestion to an individual is decreased whenever this individual is showing evidence of a decreasing effect. This algorithm is also using location as a moderator. Internal and External Validity in MRTs

Internal validity concerns the ability of the MRT to provide evidence for attributing the estimated effects to the manipulation of the intervention component and not some systematic error (Jüni et al., 2001). It is well known that in a two-arm randomized controlled trial, internal validity is harmed if the randomization, by chance, did not achieve balance in baseline covariates between the two arms. One way to check for deviations that indicate a lack of internal validity is to check whether the distribution of the baseline variables is dissimilar across the two arms. For the MRT, because the randomization occurs sequentially over time, to check internal validity one can check for balance in any covariates occurring prior to each decision point. In the HeartSteps example, one can check whether, for available participants at decision point 𝑡 , the fraction of RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS participants who are at home ( 𝑆 = 1 ) is roughly the same among those randomized to an activity suggestion ( 𝐴 = 1 ), compared to those randomized to no activity suggestion ( 𝐴 = 0 ). Other time-varying variables observed prior to decision point 𝑡 besides location might be considered as well. Because the causal excursion effect is defined only for individuals who are currently available—that is, one aims to estimate causal effects only among those who are available at decision point 𝑡 —these checks concern only available individuals . External validity concerns the extent to which the estimated causal excursion effect in the MRT provides a basis for generalization to a target population (Jüni et al., 2001). As is well known, in randomized controlled trials external validity is enhanced by striving to enroll participants who are representative of the target population. The same considerations hold in an MRT. One way to assess the extent of external validity (to a defined target population) is to check whether the distribution of the baseline variables is similar or dissimilar to that in the target population. If some baseline variables are likely prognostic for the outcome or predictive for the causal excursion effect, then a distributional imbalance in these variables between the target population and the MRT sample raises concerns that the causal excursion effect estimated from the MRT might not generalize to the target population. If such imbalances are not evident, then greater confidence in the generalizability of the estimated causal excursion effect is justified. In addition to the proximal outcome, 𝑌 , an MRT can involve other outcomes, such as availability, 𝐼 , and the potential moderators, 𝑆 . Therefore, any baseline variable that might be related to any of the outcomes should be considered in checking for the aforementioned imbalance. Note that this only applies to MRTs with constant randomization probability (such as the HeartSteps MRT). For MRTs where the randomization probability may change depending on the individual’s history information, the aforementioned covariate imbalance may no longer be indicative of lack of internal validity.

RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS The “excursion” aspect of the causal excursion effect is also important when considering generalizability of the findings. The excursion aspect explicitly acknowledges that, prior to decision point 𝑡 , the individual was provided a particular treatment schedule as used in the MRT (rather than some other fixed treatment assignment); the interpretation of the causal excursion effect is the causal effect of excursions from the existing treatment schedule. In the case of the HeartSteps MRT, the existing treatment schedule is “deliver activity suggestion with probability 0.6, if user is available at the decision point” and the excursion effect is a contrast between sending activity message now and not sending activity message now, assuming the user had experienced the existing treatment schedule up to now. The excursion aspect makes it overt that the comparison of two excursions at time 𝑡 might depend on how treatments were assigned prior to that time, which, in turn, depends on the treatment schedule of the particular MRT. Therefore, the causal excursion effect estimated from an MRT with a particular treatment schedule may differ from the causal excursion effect estimated from an MRT with a different treatment schedule. Recall that the main goal of an MRT is to inform intervention development by identifying ways to improve the existing treatment schedule (see the subsection “Using moderation effect analysis to inform intervention development”), and focusing on the causal excursion effect allows the investigator to do exactly that. Connection to the Multiphase Optimization Strategy (MOST)

As discussed in the companion paper (Walton, et al., under review), the MRT fits naturally within the multiphase optimization strategy (MOST; e.g. Collins, 2018; Collins & Kugler, 2018). MOST is an engineering-inspired approach for optimization of behavioral, biobehavioral, and biomedical interventions. In the optimization phase of MOST, the investigator conducts one or more randomized experiments, called optimization trials, aimed at RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS gathering scientific information about the causal effects of individual intervention components, which are needed to construct a new optimized intervention or to optimize an existing intervention. Typically, this includes inference for causal effects of individual intervention components, as well as moderation analyses. The MRT is one design that can be used for an optimization trial, along with factorial and fractional factorial designs (Collins et al., 2009), the sequential, multiple assignment, randomized trial (SMART; Collins, Nahum-Shani, & Almirall, 2014), and other experimental designs. This paper defines inferential methods for the causal effects in an MRT in the optimization phase of MOST. Inference concerning causal excursion effects fits naturally within the overall conceptual framework of MOST. In this framework optimization is an ongoing process of intervention improvement, in which each optimization trial provides information useful in generating hypotheses about how to improve the intervention further and, therefore, informs the design of the next optimization trial. For example, the following question can be characterized by the causal excursion effect: If the treatment schedule for the activity suggestions based on knowledge of the current location were altered, would this improve subsequent 30-minute step count? In digital interventions this inferential goal makes sense even in implementation as the team must continually monitor and update the digital application software. Similarly, continually monitoring performance and assessing how to best improve the current schedule for assigning treatments is natural. The causal excursion effect is useful for this purpose. Sample Size Considerations

Liao et al. (2016) provides theory for determining the sample size for the setting in which the moderator 𝑆 is exogenous (for example, time since the individual started the intervention). A web applet can be found at https://methodologycenter.shinyapps.io/mrt_ss/. The applet takes as RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS input duration of the study, number of decision points per day, expected availability pattern, randomization probability (which can be time-varying), target proximal treatment effect to be detected, and the desired power, and it outputs the required sample size (or vice versa; input sample size and output power). Alternately, R code is freely available at https://cran.r-project.org/web/packages/MRTSampleSize/index.html. Additional Types of Causal Effects

This paper focuses on the immediate causal excursion effect (“immediate” in the sense that there is no other treatment between the decision point 𝑡 and the proximal outcome 𝑌 ) of a time-varying digital intervention. One may also be interested in inference about a delayed causal excursion effect. For example, when assessing the effect of the planning support component, it may be of interest to assess the effect of a planning support prompt on the total step count over the next x days, as it would be desirable for the delivery of a planning prompt to have a longer-term effect, such as forming a habit. The generalization of WCLS to assess such delayed effects is given in Boruvka et al. (2018). Here we note only that the interpretation of such delayed causal excursion effects averages over, in addition to the history information observed up to that decision point, future treatments and future covariates. Suppose researchers are interested in the effect of a planning support prompt on the total step count over the next x days. This effect would be marginal with respect to the schedule for delivering planning support during the subsequent x days. Other more familiar causal effects might be estimated, but additional assumptions are necessary. For example, suppose it can be safely assumed that the treatments prior to the current decision point will not impact subsequent outcomes (i.e., these prior treatments do not have delayed positive or negative effects). Then the potential outcomes such as RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS p𝑌 (𝑎A ), 𝐼 (𝑎A ), 𝑆 (𝑎A )q are actually p𝑌 (𝑎 ), 𝐼 (𝑎 ), 𝑆 (𝑎 )q, making it reasonable to focus on inference for the effect, 𝐸[𝑌 (1) − 𝑌 (0)| 𝐼 (𝐴 ) = 1, 𝑆 (𝐴 ) = 𝑠] . In terms of the primary analysis of data from an MRT, we opt to make inference about causal excursion effects due to both its interpretation in the above continual learning paradigm and the minimal causal inference assumptions it requires. Of course, in secondary and hypothesis-generating analyses, a variety of statistical assumptions would be made to draw inferences about other causal effects. Other Types of Analyses

Generalized estimating equations (GEE; Liang & Zeger, 1986) and multi-level models (MLM; Laird & Ware, 1982; Raudenbush & Bryk, 2002) have been used with great success to analyze data from intensive longitudinal studies; at first glance they appear to be a natural choice for conducting primary and secondary data analysis for MRTs. However, these methods can result in biased causal effect estimates for MRTs when there are endogenous time-varying covariates—covariates that can depend on previous outcomes or previous treatments. For example, in HeartSteps, the prior 30-minute step count is likely impacted by prior treatment and is thus endogenous. We illustrate this bias in Appendix A.

Limitations and Future Directions

Limitations of the WCLS method presented in this paper include the following. (i) The method presented here is applicable only to the case where the proximal outcome is continuous. Qian, Yoo, Klasnja, Almirall, & Murphy (2019) provide an alternative method for analyzing MRT data with a binary outcome that yields causal effect estimates on the relative risk scale. (ii) The method presented here provides estimates of marginal effects; thus it does not provide person-specific predictions of treatment effect except as explained by observed covariates. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Future directions include the development of (i) related methods for other types of proximal outcomes (e.g., zero-inflated, categorical, ordinal, and longitudinal proximal outcomes), and (ii) multi-level models to model person-specific causal excursion effects that are appropriate for MRTs. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Appendix A GEE and MLM Can Be Biased When Estimating Causal Excursion Effects in MRTs

MRTs produce intensive longitudinal data (Schafer, 2006), as individuals are randomized among intervention options repeatedly during the MRT, and outcomes and covariates are assessed in tandem with randomization. Repeated measurement of the same individuals over time means that the repeated observations are likely dependent.

Generalized estimating equations (GEE; Liang & Zeger, 1986) and multi-level models (MLM; Laird & Ware, 1982; Raudenbush & Bryk, 2002), the latter also known as mixed models or random effects models, have been used widely in analyzing longitudinal data. However, as we illustrate below, inappropriate application of them to MRT data may result in biased estimates of the causal excursion effects when endogenous time-varying covariates are included in the model. A time-varying covariate is endogenous if it can depend on previous outcomes or previous treatments, which commonly occurs in MRTs. For example, in analyzing the effect of activity suggestion in the subsequent 30-minute step count in HeartSteps, one may want to control for the 30-minute step count prior to each decision point to reduce noise. Because the 30-minute step count prior to a decision point can be correlated with past step counts (i.e., past outcomes), it is endogenous. When a time-varying covariate is not endogenous, it is called exogenous . Examples of exogenous time-varying covariates include time, weather, and anything that cannot be impacted by previous treatments or previous outcomes.

Inappropriate Use of GEE and MLM Can Result in Biased Causal Excursion Effect Estimates in the Presence of Endogenous Time-Varying Covariates

Pepe & Anderson (1994) demonstrated that, in the presence of endogenous time-varying covariates, parameter estimates from GEE may be biased unless certain conditions, described below, are met. Such bias is also shown in subsequent research through simulation studies and RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS analytic calculations (Diggle et al., 2002; Pan et al., 2000; Schildcrout & Heagerty, 2005; Tchetgen et al., 2012; Vansteelandt, 2007). For completeness we provide a brief explanation of the bias here. Consider a simplified version of the HeartSteps MRT, where there are two decision points for each individual and individuals are always available. Suppose the observed data for individual i is (𝑋 , 𝐴 , 𝑌 , 𝑋 , 𝐴 , 𝑌 ) , where 𝑋 denotes the 30-minute step count prior to decision point 𝑡 (an endogenous time-varying covariate), 𝐴 is the indicator of whether an activity suggestion is delivered at decision point 𝑡 (so 𝐴 has .6 probability to be 1), and 𝑌 is the 30-minute step count following decision point t . The researcher chooses 𝑆 = 𝑋 in equation (2): they want to assess whether the effect of the activity suggestion is moderated by the prior 30-minute step count. The researcher may then choose to impose the following linear model on the mean of the proximal outcome given the treatment and the covariate at decision point 𝑡 : 𝐸p𝑌 r𝐴 , 𝑋 q = 𝛼 I + 𝛼 𝑋 + 𝐴 (𝛽 I + 𝛽 𝑋 ), (7) and use GEE to estimate the coefficients 𝛼 I , 𝛼 , 𝛽 I , 𝛽 . Often a non-independent working correlation structure is used in GEE, aiming for efficiency gain (i.e., smaller standard error of the estimated coefficients compared to GEE with working independence correlation structure). It is well known that GEE produces consistent estimates regardless of the choice of the working correlation structure, as long as equation ( ) holds; however, this is only true when all covariates are exogenous. In this above example with two decision points, Pepe & Anderson In this particular example in which the randomization probability is constant, the 𝛽 I , 𝛽 in the proximal treatment effect term in ( ) equals the 𝛽 I ′ , 𝛽 ′ in 𝐸[𝐸( 𝑌 ∣∣ 𝐴 = 1, 𝐻 ) − 𝐸( 𝑌 ∣∣ 𝐴 = 0, 𝐻 )|𝑆 ] = 𝛽 I ′ + 𝛽 ′ 𝑆 . Therefore, if one can obtain consistent estimates for 𝛽 I , 𝛽 , one obtains consistent estimate for the proximal treatment effect defined in ( ). In general, however, when the randomization probability can depend on 𝐻 , the 𝛽 I , 𝛽 in ( ) no longer equals the 𝛽 in ( ) due to the marginalization over 𝑆 . This is another reason, in addition to the reason that will be presented in the next paragraph of the paper, why inappropriate use of GEE results in biased proximal treatment effect estimates. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS (1994) demonstrated that to guarantee the consistency of the GEE estimates, one of the following conditions needs to hold: (i) 𝐸p𝑌 r𝐴 , 𝑋 q = 𝐸p𝑌 r𝐴 , 𝑋 , 𝐴 , 𝑋 q for 𝑡 = 1,2 ; or (ii) a working independence correlation structure is used. Condition (i) is usually violated when 𝑋 is endogenous: In this particular example, 𝑋 can be correlated with 𝑌 , so that 𝐸(𝑌 |𝐴 , 𝑋 ) ≠ 𝐸(𝑌 |𝐴 , 𝑋 , 𝐴 , 𝑋 ) . This means that unless the independent working correlation structure is used, GEE can produce biased estimates even if equation (7) holds. The same bias can occur when MLM is used instead of GEE. In general, for each MLM there is a corresponding GEE with a non-independent correlation structure that produces the same estimated coefficients. For example, an MLM resembling equation (7) is 𝑌 = 𝛼 I +𝛼 𝑋 + 𝐴 (𝛽 I + 𝛽 𝑋 ) + 𝑢 + 𝜖 , where 𝑢 ∼ Normal(0, 𝜎 }> ) is a random intercept and 𝜖 ∼Normal(0, 𝜎 ~> ) is the error term. This corresponds to a GEE with compound symmetric (also called exchangeable) working correlation structure. Given this equivalency, MLM can produce biased estimates if the covariate 𝑋 is endogenous. A Few Scenarios Where GEE or MLM Provides Consistent Causal Excursion Effect Estimates From MRT Data

GEE builds upon a marginal mean model (i.e., the relationship between the mean of the proximal outcome, the covariates, and the treatment assignments, such as (7)). If no endogenous time-varying covariates are included in the model, the individuals are always available, and the randomization probability is constant, GEE with any working correlation structure gives consistent estimates as long as the marginal mean model is correct. If there are endogenous time-varying covariates in the model, the individuals are always available, and the randomization RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS probability is constant, GEE with independent working correlation structure still gives consistent estimates as long as the marginal mean model is correct, but GEE with other working correlation structure does not. Because an MLM always corresponds to a GEE with some non-independent working correlation structure, MLM provides consistent causal excursion effect estimates if no endogenous time-varying covariates are included in the model, the individuals are always available, and the randomization probability is constant. However, although the estimated coefficients from an MLM will generally be biased for the causal excursion effect when there are endogenous time-varying covariates, those estimated coefficients can have a different, individual-specific interpretation under a rather strong assumption. As shown in Qian, Klasnja, & Murphy (2019), if the endogenous time-varying covariates can be safely assumed to only depend on the random effect through the observed previous outcomes and previous covariates, then the fitted results from standard linear mixed models can be interpreted as a causal effect that is conditional on the random effect (i.e., individual-specific rather than population-average) and conditional on the entire history 𝐻 (rather than conditional only on 𝑆 ). An example where this strong assumption holds is when the endogenous time-varying covariates are previous proximal outcomes (e.g., the endogenous time-varying covariate at decision point 𝑡 is the proximal outcome at decision point 𝑡 − 1 ). A Mathematical Demonstration of the Bias From Inappropriate Application of GEE When There are Endogenous Time-Varying Covariates

For clarity we consider the case where each participant is in the MRT for two decision points. The data for the i -th participant is (𝑋 , 𝐴 , 𝑌 , 𝑋 , 𝐴 , 𝑌 ) , where 𝑋 is the covariate, 𝐴 is the treatment assignment, and 𝑌 is the continuous outcome. The covariate 𝑋 is RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS endogenous time-varying, in the sense that it can depend on previous treatment and previous outcome. The model on the marginal mean of 𝑌 is 𝐸(𝑌 |𝐴 , 𝑋 ) = 𝛼 I + 𝛼 𝑋 +𝐴 (𝛽 I + 𝛽 𝑋 ) . The corresponding GEE solves the following estimating equation: V (cid:127) 1 1𝑋 𝑋 𝐴 𝐴 𝐴 𝑋 𝐴 𝑋 (cid:128) 𝑉 ?9 Z𝑌 − 𝛼 I − 𝛼 𝑋 − 𝐴 (𝛽 I + 𝛽 𝑋 )𝑌 − 𝛼 I − 𝛼 𝑋 − 𝐴 (𝛽 I + 𝛽 𝑋 )[ ni=1 = 0. (8) Here, 𝑛 denotes the number of participants, and 𝑉 is a working covariance matrix. Examples of 𝑉 include the following: • Working independence:

𝑉 = (cid:132)𝜎 >

00 𝜎 > (cid:133) • Compound symmetry:

𝑉 = Z 𝜎 > 𝜌𝜎 > 𝜌𝜎 > 𝜎 > [ • Autoregressive (in the special case of two decision points, autoregressive is the same as compound symmetry):

𝑉 = Z 𝜎 > 𝜌𝜎 > 𝜌𝜎 > 𝜎 > [. In this setting, the result in Pepe & Anderson (1994) implies that GEE is guaranteed to produce consistent 𝛼 I , 𝛼 , 𝛽 I , 𝛽 if either (i) 𝐸(𝑌 |𝐴 , 𝑋 ) = 𝐸(𝑌 |𝐴 , 𝑋 , 𝐴 > , 𝑋 > ) for 𝑡 = 1,2 , or (ii) a working independence correlation structure is used, and they provided simulation results to show that GEE can produce biased estimates when neither condition holds. In the following, we rephrase the intuitive argument given in Pepe and Anderson (1994) in this particular setting to show why GEE can be biased if neither condition holds. We write 𝑉 ?9 = (cid:132)𝑤 𝑤 𝑤 >9 𝑤 >> (cid:133) and write the residual 𝑟 = 𝑌 − 𝛼 I − 𝛼 𝑋 −𝐴 (𝛽 I + 𝛽 𝑋 ) . A summand (for fixed 𝑖 ) in equation (8) becomes RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS (cid:127) 1 1𝑋 𝑋 𝐴 𝐴 𝐴 𝑋 𝐴 𝑋 (cid:128) (cid:132)𝑤 𝑤 𝑤 >9 𝑤 >> (cid:133) (cid:132)𝑟 𝑟 (cid:133)= ⎣⎢⎢⎡ (𝑤 + 𝑤 >9 )𝑟 + (𝑤 + 𝑤 >> )𝑟 (𝑤 𝑋 + 𝑤 >9 𝑋 )𝑟 + (𝑤 𝑋 + 𝑤 >> 𝑋 )𝑟 (𝑤 𝐴 + 𝑤 >9 𝐴 )𝑟 + (𝑤 𝐴 + 𝑤 >> 𝐴 )𝑟 (𝑤 𝐴 𝑋 + 𝑤 >9 𝐴 𝑋 )𝑟 + (𝑤 𝐴 𝑋 + 𝑤 >> 𝐴 𝑋 )𝑟 ⎦⎥⎥⎤. (9) Because

𝐸(𝑌 |𝐴 , 𝑋 ) = 𝛼 I + 𝛼 𝑋 + 𝐴 (𝛽 I + 𝛽 𝑋 ) , we have 𝐸[𝑟 ] = 𝐸[𝑋 𝑟 ] = 𝐸[𝐴 𝑟 ] = 𝐸[𝐴 𝑋 𝑟 ] = 0. Therefore, all the terms with 𝑤 𝑟 and 𝑤 >> 𝑟 (such as 𝑤 𝑟 𝑋 ; i.e., terms that are multiplied with the diagonal elements of 𝑉 ?9 ) in (9) have expectation zero, and what is left are the terms with 𝑤 >9 𝑟 and 𝑤 𝑟 (i.e., terms that are multiplied with the off-diagonal elements of 𝑉 ?9 ). In other words, the expectation of ( ) equals (cid:127) 0𝑤 >9 𝑋 𝑟 + 𝑤 𝑋 𝑟 𝑤 >9 𝐴 𝑟 + 𝑤 𝐴 𝑟 𝑤 𝐴 𝑋 𝑟 + 𝑤 𝐴 𝑋 𝑟 (cid:128). (10) Mathematical theory for GEE tells us that GEE outputs consistent 𝛼 I , 𝛼 , 𝛽 I , 𝛽 when (9) has expectation zero; i.e., when (10) equals zero. If condition (i) holds, we have 𝐸[𝑋 𝑟 ] = 𝐸[𝐴 𝑟 ] = 𝐸[𝐴 𝑋 𝑟 ] = 0 , and similarly 𝐸[𝑋 𝑟 ] = 𝐸[𝐴 𝑟 ] = 𝐸[𝐴 𝑋 𝑟 ] = 0 . Therefore, (10) equals 0 with any choice of 𝑉 ?9 , and GEE estimators are consistent. If condition (ii) holds, we have 𝑤 >9 = 𝑤 = 0 . Hence (10) equals 0 and GEE estimators are consistent. When neither condition holds, it’s likely that (10) does not equal zero. For example, suppose 𝑋 = 𝑌 . Then the term 𝑋 𝑟 equals 𝑌 {𝑌 − 𝛼 I + 𝛼 𝑋 + 𝐴 (𝛽 I + 𝛽 𝑋 )}, (11) RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS which is the residual multiplied with the outcome itself. Because the residual and the outcome at the same time point are correlated, (11) likely does not equal zero. Therefore, (10) likely does not equal zero. This means GEE can be biased when neither conditions hold, i.e., when endogenous time-varying covariates are included and non-independent working correlation structure is used. Appendix B A General Form of the WCLS Estimator for the Causal Excursion Effect That Allows the Randomization Probability to Vary Over Time

We assume a linear model for the causal excursion effect: 𝛽(𝑡, 𝑠) = 𝑠 K 𝛽 . Suppose 𝑍 𝛼 is a working model for the conditional mean of 𝑌 given no treatment at decision point t and history 𝐻 , 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) . Note that the consistency of the estimator for 𝛽 does not require 𝑍 𝛼 to be a correct model for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) . We use 𝑝 (𝐻 ) to denote the randomization probability at decision point 𝑡 , which may possibly depend on 𝐻 . The WCLS estimator for 𝛽 is calculated as follows. Suppose (𝛼T, 𝛽U ) is the (𝛼, 𝛽) value that solves the following estimating equation: V V 𝐼 𝑊 W𝑌 − 𝑍 𝛼 − {𝐴 − 𝑝(cid:144) (𝑆 )}𝑆 𝛽Y Z 𝑍 {𝐴 − 𝑝(cid:144) (𝑆 )}𝑆 [ K4L9\7L9 = 0; (12) then 𝛽U is the WCLS estimator for 𝛽 . 𝑝(cid:144) (𝑆 ) is an arbitrary probability as long as it depends on 𝐻 through at most 𝑆 and it is bounded away from 0 and 1; 𝑖 is the index for the i th individual, and 𝑊 is defined as 𝑊 = (cid:145) 𝑝(cid:144) (𝑆 )𝑝 (𝐻 )(cid:146) (cid:147) (cid:148)(cid:149) (cid:145)1 − 𝑝(cid:144) (𝑆 )1 − 𝑝 (𝐻 )(cid:146) (cid:148)(cid:149) . (13) RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS 𝑊 , the ratio of two probabilities, serves as a change of probability: It makes it as if the treatment 𝐴 is randomized with probability 𝑝(cid:144) (𝑆 ) . It is used to marginalize the causal excursion effect over variables in 𝐻 but not in 𝑆 . As long as 𝑝(cid:144) (𝑆 ) depends on 𝐻 through at most 𝑆 and it is bounded away from 0 and 1, the particular choice of 𝑝(cid:144) (𝑆 ) doesn’t affect the consistency of 𝛽U . For instance, one can set it to be 0.5 (or any constant between 0 and 1) for all individuals and all decision points, or set it to be the predicted value from a logistic regression fit of 𝐴 ~𝑆 . If the true randomization probability 𝑝 (𝐻 ) depends at most on 𝑆 , then one can also set 𝑝(cid:144) (𝑆 ) to be equal to the true randomization probability, in which case (12) is mathematically equivalent to (5). 𝑝(cid:144) (𝑆 ) can impact the standard error of 𝛽U . In addition, when the causal excursion effect model 𝛽(𝑡, 𝑠) = 𝑠 K 𝛽 is misspecified, 𝑝(cid:144) (𝑆 ) impacts the limit of 𝛽U . See the Appendix of Boruvka et al. (2018) for more technical details on how the limit of 𝛽U is impacted by 𝑝(cid:144) (𝑆 ) in this case. Now we present a way to obtain the general WCLS estimator for time-varying randomization probability through standard statistical software that implements GEE. Suppose the assumed causal excursion effect model is (6) and the working model for 𝐸(𝑌 |𝐴 = 0, 𝐼 = 1, 𝐻 ) is 𝑍 𝛼 ; then the WCLS estimator 𝛽U and its standard error can be obtained by (i) incorporating 𝐼 𝑊 as the “prior weights”, (ii) chooseing a working independence correlation structure, and (iii) fitting GEE with dependent variable 𝑌 and independent variables 𝑍 and p𝐴 − 𝑝(cid:144) (𝑆 )q𝑆 . Then the estimated coefficient for p𝐴 − 𝑝(cid:144) (𝑆 )q𝑆 is the WCLS estimate 𝛽U . Standard Error Formula for WCLS.

Below we provide the formula for the standard error of the WCLS estimator 𝛽U . For (𝛼T, 𝛽U ) that solves estimating equation (12), variance can be estimated by RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Var (cid:150) (cid:151)Z𝛼T𝛽U [(cid:152) = 𝑀 \?9 𝛴 \ (𝑀 \?9 ) K , where 𝑀 \ = −ℙ \ ∑ 𝑊 Z 𝑍 𝑍 {𝐴 − 𝑝(cid:144) (𝑆 )}𝑍 𝑆 {𝐴 − 𝑝(cid:144) (𝑆 )}𝑆 𝑍 {𝐴 − 𝑝(cid:144) (𝑆 )} > 𝑆 𝑆 [ and Σ \ = ℙ \ V(cid:157)𝑌 − 𝑍 𝛼 − p𝐴 − 𝑝(cid:144) (𝑆 )q𝑆 𝛽(cid:158) > 𝑊 Z 𝑍 𝑍 {𝐴 − 𝑝(cid:144) (𝑆 )}𝑍 𝑆 {𝐴 − 𝑝(cid:144) (𝑆 )}𝑆 𝑍 {𝐴 − 𝑝(cid:144) (𝑆 )} > 𝑆 𝑆 [. Here, ℙ \ denotes sample average over n individuals. The standard error formula can be modified for the setting in which the randomization probability is constant over time (i.e., the setting in the main paper) by letting 𝑝(cid:144) (𝑆 ) = 𝑝 and 𝑊 = 1 . Appendix C

We conduct a simulation study to illustrate the claim that including variables that are correlated with 𝑌 in 𝑍 may reduce the variance of the WCLS estimator. The generative model mimics features of the HeartSteps data and is set up as follows. For simplicity we assume users are always available. At decision point 𝑡 , the covariate 𝑋 is drawn from the empirical distribution of the log-transformed 30-minute step count preceding a decision point in the HeartSteps data. For simplicity 𝑋 is generated independently of previous outcomes and treatments. The treatment 𝐴 is generated from a Bernoulli distribution with .6 success probability; this mimics the .6 randomization probability of activity suggestions in HeartSteps. The proximal outcome 𝑌 is generated from a Gaussian distribution with mean + 0.0655 × 𝑌 + 0.1229 × (𝐴 − 0.6) RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS and standard deviation 2.716. The coefficients in the above display are the estimated coefficients from a WCLS fit on the HeartSteps data with the same control variables (1, 𝑋 , 𝑌 ) and constant treatment effect model. The standard deviation is the empirical standard deviation of the residual in 𝑌 from the above WCLS fit. As in the HeartSteps data set, for each simulated trial we generate 37 individuals, each with 210 decision points. For each data set generated from the above generative model, we consider four WCLS fits for the true treatment effect 0.1229 and compare their performance. All four WCLS assume the constant treatment effect model, and they differ in the choice of the working model. The first WCLS fit (WCLS-1) includes control variables (1, 𝑋 , 𝑌 ) ; the second WCLS fit (WCLS-2) includes control variables (1, 𝑋 ) ; the third WCLS fit (WCLS-3) includes control variables (1, 𝑌 ) ; and the fourth WCLS fit (WCLS-4) includes only the intercept. The bias, standard deviation (SD), and coverage probability (CP) of 95% confidence interval are listed in Supplementary Table 1. All four WCLS estimators are consistent with nominal confidence interval coverage because their assumed constant treatment effect model holds under this generative model. (This again illustrates that the consistency of the WCLS estimator does not require the control part of the model to be correct.) On the other hand, the choice of working model affects the efficiency of the estimator. In particular, WCLS-1 and WCLS-2 have smaller standard errors than WCLS-3 and WCLS-4 because the former two include 𝑋 , a covariate that is highly correlated with the proximal outcome 𝑌 . Appendix D

To assess sensitivity of the result to potential non-linearity in Question 2 of Section “Analysis Using Data from HeartSteps MRT,” we fit a local 2-degree polynomial regression RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS with smoothing span 2/3 and tricubic weighting to estimate the causal excursion effect over time (the default setting for many local regression software, such as the lowess function in R (R Core Team, 2019)). The estimated effect from local regression is presented in Supplementary Figure 1 (black curve). Comparing this estimated effect with the estimated effect based on the linear model (blue curve in Figure 1, with blue shaded area being the pointwise 95% confidence interval), we see that the two estimates are relatively close to each other, indicating that the linear model fits well. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS References

Boruvka, A., Almirall, D., Witkiewitz, K., & Murphy, S. A. (2018). Assessing time-varying causal effect moderation in mobile health.

Journal of the American Statistical Association , (523), 1112–1121. Clawson, J., Pater, J. A., Miller, A. D., Mynatt, E. D., & Mamykina, L. (2015). No longer wearing: Investigating the abandonment of personal health-tracking technologies on craigslist. UbiComp 2015 - Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing , 647–658. https://doi.org/10.1145/2750858.2807554 Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs.

Psychological Methods , (3), 202–224. https://doi.org/10.1037/a0015826 Dempsey, W., Liao, P., Kumar, S., & Murphy, S. A. (2019). The stratified micro-randomized trial design: Sample size considerations for testing nested causal effects of time-varying treatments. ArXiv: 1711.03587 . Diggle, P., Heagerty, P., Liang, K.-Y., & Zeger, S. (2002).

Analysis of longitudinal data . Oxford University Press. Eysenbach, G. (2005). The law of attrition.

Journal of Medical Internet Research , (1), 1–11. https://doi.org/10.2196/jmir.7.1.e11 Ho, J., & Intille, S. S. (2005). Using context-aware computing to reduce the perceived burden of interruptions from mobile devices. CHI 2005: Technology, Safety, Community: Conference Proceedings - Conference on Human Factors in Computing Systems , 909–918. Holland, P. W. (1986). Statistics and causal inference.

Journal of the American Statistical Association , (396), 945–960. Hong, G., & Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. Journal of the American Statistical Association , (475), 901–910. https://doi.org/10.1198/016214506000000447 Hudgens, M. G., & Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association , (482), 832–842. https://doi.org/10.1198/016214508000000292 IBM Corp. (2019). IBM SPSS Statistics for Windows . Jüni, P., Altman, D., & Egger, M. (2001). Assessing the quality of controlled clinical trials.

BMJ , (7303), 42–46. Klasnja, P., Harrison, B. L., Legrand, L., Lamarca, A., Froehlich, J., & Hudson, S. E. (2008). Using wearable sensors and real time inference to understand human recall of routine activities. UbiComp 2008 - Proceedings of the 10th International Conference on Ubiquitous Computing , 154–163. https://doi.org/10.1145/1409635.1409656 Klasnja, P., Hekler, E. B., Shiffman, S., Boruvka, A., Almirall, D., Tewari, A., & Murphy, S. A. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS (2015). Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychology , (S), 1220. Klasnja, P., Smith, S., Seewald, N. J., Lee, A., Hall, K., Luers, B., Hekler, E. B., & Murphy, S. A. (2018). Efficacy of contextually tailored suggestions for physical activity: A micro-randomized optimization trial of HeartSteps. Annals of Behavioral Medicine : A Publication of the Society of Behavioral Medicine , (6), 573–582. https://doi.org/10.1093/abm/kay067 Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics , (4), 963–974. Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika , (1), 13–22. Liao, P., Klasnja, P., Tewari, A., & Murphy, S. A. (2016). Sample size calculations for micro-randomized trials in mHealth. Statistics in Medicine , (12), 1944–1971. Mancl, L. A., & DeRouen, T. A. (2001). A covariance estimator for GEE with improved small-sample properties. Biometrics , (1), 126–134. https://doi.org/10.1111/j.0006-341X.2001.00126.x Nahum-Shani, I., Smith, S. N., Spring, B. J., Collins, L. M., Witkiewitz, K., Tewari, A., & Murphy, S. A. (2018). Just-in-time adaptive interventions (JITAIs) in mobile health: Key components and design principles for ongoing health behavior support. Annals of Behavioral Medicine , (6), 446–462. Pan, W., Louis, T. A., & Connett, J. E. (2000). A note on marginal linear regression with correlated response data. American Statistician , (3), 191–195. https://doi.org/10.1080/00031305.2000.10474544 Pepe, M. S., & Anderson, G. L. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics-Simulation and Computation , (4), 939–951. Qian, T., Klasnja, P., & Murphy, S. A. (2019). Linear mixed models under endogeneity: Modeling sequential treatment effects with application to a mobile health study. Statistical Science . Qian, T., Yoo, H., Klasnja, P., Almirall, D., & Murphy, S. A. (2019). Estimating time-varying causal excursion effect in mobile health with binary outcomes.

ArXiv: 1906.00528 . R Core Team. (2019).

R: A language and environment for statistical computing

Neurobiology of Learning and Memory , (2), 135–138. https://doi.org/10.1016/j.nlm.2008.09.012 Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). Sage. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Robins, J. M. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect.

Mathematical Modelling , (9–12), 1393–1512. Robins, J. M. (1987). Addendum to “a new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect.” Computers & Mathematics with Applications , (9–12), 923–945. Robins, J. M. (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics - Theory and Methods , (8), 2417–2421. https://doi.org/10.1080/03610929408831394 Robins, J. M., Hernán, M. Á., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology , (5), 550–560. https://doi.org/10.1097/00001648-200009000-00011 Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics , 34–58. Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions.

Journal of the American Statistical Association , (469), 322–331. SAS Institute Inc. (2019). SAS/STAT Software, Version 9.4

Biostatistics , (4), 633–652. https://doi.org/10.1093/biostatistics/kxi033 Shaw, R. J., Bosworth, H. B., Hess, J. C., Silva, S. G., Lipkus, I. M., Davis, L. L., & Johnson, C. M. (2013). Development of a theoretically driven mHealth text messaging application for sustaining recent weight loss. Journal of Medical Internet Research , (5). https://doi.org/10.2196/mhealth.2343 StataCorp. (2019). Stata Statistical Software: Release 16 . Tchetgen, E. J. T., Glymour, M. M., Weuve, J., & Robins, J. (2012). Specifying the correlation structure in inverse-probability-weighting estimation for repeated measures.

Epidemiology , (4), 644–646. https://doi.org/10.1097/EDE.0b013e31825727b5 Vansteelandt, S. (2007). On confounding, prediction and efficiency in the analysis of longitudinal and cross-sectional clustered data. Scandinavian Journal of Statistics , (3), 478–498. https://doi.org/10.1111/j.1467-9469.2006.00555.x Walls, T. A., & Schafer, J. L. (2006). Models for intensive longitudinal data . Oxford University Press. Walton, A., Collins, L. M., Corsby, L., Klasnja, P., Nahum-Shani, I., Rabbi, M., Walton, M., & Murphy, S. A. The micro-randomized trial for developing mobile health interventions.

Under Review . Yardley, L., Spring, B. J., Riper, H., Morrison, L. G., Crane, D. H., Curtis, K., Merchant, G. C., Naughton, F., & Blandford, A. (2016). Understanding and promoting effective engagement RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS with digital behavior change interventions. American Journal of Preventive Medicine , (5), 833–842. https://doi.org/10.1016/j.amepre.2016.06.015 RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Table 1.

Estimated main effect of activity suggestions on proximal outcome

Variable Estimate 95% LCL 95% UCL SE Hotelling t p Intercept 𝛼 I 𝛼 𝛽 I Note.

LCL (UCL) represents lower (upper) confidence limit. SE represents standard error. LCL, UCL, SE, and p are corrected for small sample size using method in (Liao et al., 2016; Mancl & DeRouen, 2001). The degrees of freedom for the Hotelling t test is (1, 34). RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Table 2.

Estimated effect of activity suggestion on proximal outcome as a linear function of time in study

Variable Estimate 95% LCL 95% UCL SE Hotelling t p

Intercept 𝛼 I 𝛼 𝛼 > -0.011 -0.020 -0.001 0.005 5.09 0.031 Activity suggestion 𝛽 I 𝛽 -0.018 -0.031 -0.006 0.006 9.19 0.005 Note.

LCL (UCL) represents lower (upper) confidence limit. SE represents standard error. LCL, UCL, SE, and p are corrected for small sample size using method in (Liao et al., 2016; Mancl & DeRouen, 2001). The degrees of freedom for the Hotelling t test is (1, 32). RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Table 3.

Estimated effect of walking suggestion / anti-sedentary suggestion on proximal outcome, moderated by location (home/work or other) during decision point

Variable Estimate 95% LCL 95% UCL SE Hotelling t p Intercept 𝛼 I 𝛼 𝛼 > 𝛽 I 𝛽 𝛽 > 𝛽 d -0.142 -0.540 0.256 0.195 0.53 0.472 Note.

LCL (UCL) represents lower (upper) confidence limit. SE represents standard error. LCL, UCL, SE, and p are corrected for small sample size using method in (Liao et al., 2016; Mancl & DeRouen, 2001). The degrees-of-freedom for the Hotelling t test is (1, 30). RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Table 4.

Estimated effect of activity suggestion on proximal outcome, moderated by whether activity planning support was received on previous night

Variable Estimate 95% LCL 95% UCL SE Hotelling t p Intercept 𝛼 I 𝛼 𝛼 > 𝛽 I 𝛽 Note.

LCL (UCL) represents lower (upper) confidence limit. SE represents standard error. LCL, UCL, SE, and p are corrected for small sample size using method in (Liao et al., 2016; Mancl & DeRouen, 2001). The degrees-of-freedom for the Hotelling t test is (1, 32). RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Supplementary Figure 1.

Estimated effect of activity suggestion on proximal outcome as a linear function of days in study, and corresponding 95% pointwise confidence intervals

Note.

Figure for the sensitivity analysis in Appendix D regarding “Question 2: Does the effect of the activity suggestions change with each additional day in the study?” in section “Analysis Using Data from HeartSteps MRT.” The black curve is the estimated effect using local 2-degree polynomial regression with smoothing span 2/3 and tricubic weighting. The blue line represents the estimated causal excursion effect across the 42 study days, assuming a linear time trend, and the shaded blue area is the pointwise 95% confidence interval. RT FOR DEVELOPING DIGITAL INTERVENTIONS: DATA ANALYSIS Supplementary Table 1.

Simulation results for Appendix C: Efficiency gain from including prognostic variable in the working model bias standard deviation 95% coverage probability WCLS-1 -0.001 0.067 96.7% WCLS-2 -0.001 0.067 96.9% WCLS-3 -0.001 0.074 95.8% WCLS-4 -0.001 0.074 95.7%

Note.

All four WCLS assumes the constant treatment effect model, and they differ in the choice of the working model. WCLS-1 includes control variables (1, 𝑋 , 𝑌 ) ; WCLS-2 includes control variables (1, 𝑋 ) ; WCLS-3 includes control variables (1, 𝑌 ))