Assessing Vaccine Durability in Randomized Trials Following Placebo Crossover

Abstract

Randomized vaccine trials are used to assess vaccine efficacy and to characterize the durability of vaccine-induced protection. If efficacy is demonstrated, the treatment of placebo volunteers becomes an issue. For COVID-19 vaccine trials, there is broad consensus that placebo volunteers should be offered a vaccine once efficacy has been established. This will likely lead to most placebo volunteers crossing over to the vaccine arm, thus complicating the assessment of long-term durability. We show how to analyze durability following placebo crossover and demonstrate that the vaccine efficacy profile that would be observed in a placebo-controlled trial is recoverable in a trial with placebo crossover. This result holds no matter when the crossover occurs and with no assumptions about the form of the efficacy profile. We only require that the vaccine efficacy profile applies to the newly vaccinated irrespective of the timing of vaccination. We develop different methods to estimate efficacy within the context of a proportional hazards regression model and explore via simulation the implications of placebo crossover for estimation of vaccine efficacy under different efficacy dynamics and study designs. We apply our methods to simulated COVID-19 vaccine trials with durable and waning vaccine efficacy and a total follow-up of two years.



Jonathan Fintzi and Dean Follmann

Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Rockville, Maryland, U.S.A. Email addresses: [email protected] and [email protected]


Keywords: COVID-19; Proportional hazards regression; Vaccine efficacy; Vaccine trial design.

Introduction

Randomized phase III clinical trials are used to definitively demonstrate the efficacy of candidate vaccines. Volunteers are randomized to receive vaccine or a placebo and followed for a period of time to assess whether the vaccine reduces the rate of disease acquisition. An important question in vaccine development is whether vaccine-induced protection is durable. For COVID-19 vaccines, questions surrounding vaccine durability are important, as acquired immunity against seasonal and other coronaviruses ranges from 6 months to 2 years (Poland et al., 2020; Choe et al., 2020). Clinical trials for vaccines against COVID-19 plan to follow participants for up to two years (Moderna, 2020).

To assess long-term safety and durability, long-term blinded follow-up of the original placebo and vaccine arms is ideal (World Health Organization, 2020). From an ethical perspective, placebo volunteers should be offered a vaccine once efficacy is established (Wendler et al., 2020). However, vaccination of placebo volunteers may occur before it is known whether vaccine-induced protection is durable. Besides waning of efficacy, there is concern that the vaccine might eventually cause harm, i.e., negative vaccine efficacy, in subgroups. Such harm is known as vaccine-associated enhanced disease (VAED) and has been observed in other contexts, such as the Dengvaxia vaccine in seronegative individuals (Sridhar et al., 2018).

It might seem that the ability to assess vaccine durability following placebo crossover is completely lost once there is no longer an unvaccinated control group (World Health Organization, 2020). However, at the point of crossover the study remains a randomized trial, albeit of immediate vs. deferred vaccination.
This contrast allows the vaccine efficacy (VE) profile for a standard non-crossover trial to be recovered with placebo crossover (Follmann et al., 2020). The only additional assumption that is required is that the same VE profile applies to the newly vaccinated irrespective of the timing of vaccination, e.g., June or December.

Crossover trials for absorbing endpoints, such as infection or death, have been discussed in the literature (Nason and Follmann, 2010; Makubate and Senn, 2010). However, these methods apply to estimation of a fixed intervention effect, which stops once the intervention is removed. Vaccination is quite different, as the benefit lingers and our goal is to see how the intervention effect varies with time. Crossover has been discussed for vaccine trials, but only for the placebo arm and only to measure immune response (Follmann, 2006). Delayed vaccination has been used in an Ebola vaccine trial, but to serve as a control group prior to deferred vaccination (Henao-Restrepo et al., 2017).

In this work, we establish that vaccine durability can be accurately assessed following placebo crossover under fairly mild assumptions. We demonstrate how to estimate VE as a function of time since vaccination under placebo crossover using proportional hazards (PH) regression (Cox, 1972). We specify VE as 1 minus the hazard ratio and allow this hazard ratio to depend on time through use of time-varying covariates (Therneau and Grambsch, 2013). We specify log-linear and P-spline functions to allow for a variety of shapes and provide a justification for using calendar time as the natural timescale in vaccine trials where risk can vary substantially with calendar time. We discuss and evaluate different approaches for crossing over and consider how the timing and pace of crossover and unspecified heterogeneity in risk affect estimation. We evaluate our methods by simulation and analyze two simulated COVID-19 vaccine trials that vaccinate placebo volunteers after efficacy is established.

Vaccine Efficacy Under Placebo Crossover

Consider a vaccine trial where volunteers are randomized to receive vaccine or placebo. For now, assume that everyone is enrolled at the same time, so calendar time and time since randomization are aligned. All participants are followed over the period $[0, \tau_2]$, and a blinded crossover occurs at time $\tau_1$. At this point, the volunteers randomized to vaccine receive placebo, and the volunteers randomized to placebo receive vaccine. Following crossover, both arms are vaccinated, and comparative efficacy might seem lost as there is no control group. However, a randomized trial remains, though now as a trial of immediate vs. deferred vaccination; these assignments correspond to the original vaccine and placebo arms. This 'rebranded' randomized trial can still provide information about vaccine durability, even after the point of crossover.

To illustrate, suppose we have case counts for the two randomization arms over the two periods, $(0, \tau_1]$ and $(\tau_1, \tau_2]$. Suppose that the vaccine:placebo case split is 20:100 in period one, and in period two we observe a case split of 20:12 in the original vaccine arm:deferred vaccination arm. Using a person-time analysis, and imagining the denominators are so large that they cancel out, we obtain a simple estimate of the period one VE as $\widehat{VE}_1 = 1 - 20/100 = 0.80$. Assume now that this VE applies to the newly vaccinated participants in the second period, who had 12 cases. With this assumption we can estimate $N_{plac}$, the number of cases for a counterfactual placebo group in period 2, since we are assuming $0.80 = 1 - 12/\hat{N}_{plac}$, which yields $\hat{N}_{plac} = 12/0.20 = 60$. We then contrast the counterfactual placebo case count of 60 with the 20 observed cases in the original vaccine arm to obtain an estimate of placebo-controlled VE in period two, $\widehat{VE}_2 = 1 - 20/60 = 0.67$. Based on these crude estimates, we conclude that VE has waned, as efficacy has dropped from 80% in period one to 66.7% in period two.

The crux of this example is that the VE for the newly vaccinated is portable across periods. That is, the original placebo arm receives the same immediate benefit from vaccination that the original vaccine group received, regardless of changes in the population attack rate. Additional considerations for a period-focused approach are discussed in Follmann et al. (2020). While a period-focused approach is simple and clear, a more natural development is to allow vaccine efficacy to vary smoothly with time. The Cox proportional hazards (PH) model allows this to be easily accomplished. Under the PH model, the baseline placebo hazard function for the time to disease is arbitrary, and the effect of vaccination induces a hazard proportional to the baseline hazard. We can formulate the above example for a standard trial with no crossover as

$$ h(t) = h_0(t)\exp\{Z I_1 \theta_1 + Z I_2 \theta_2\}, \tag{1} $$

where $t$ is time since randomization, $h_0(t)$ is the placebo hazard function, $Z$ is the vaccine assignment indicator, and $I_1 = I(t \le \tau_1)$ and $I_2 = I(t > \tau_1)$ indicate periods 1 and 2. Vaccine efficacy is defined as the relative change in the instantaneous risk of acquiring disease, and is given by $1 - \exp(\theta_1)$ for period 1 and $1 - \exp(\theta_2)$ for period 2.

Suppose placebo volunteers are vaccinated at the end of period 1. Assuming the vaccine effect for the newly vaccinated applies in period 2, we can write

$$ h(t) = h_0(t)\exp\{Z I_1 \theta_1 + Z I_2 (\theta_2 - \theta_1) + I_2 \theta_1\} = \lambda_0(t)\exp\{Z I_1 \theta_1 + Z I_2 \theta_2^*\}, $$

where $\theta_2^* = \theta_2 - \theta_1$ and the baseline hazard $\lambda_0(t) = h_0(t)\exp\{I_2 \theta_1\}$ applies to the original placebo arm for both periods. The first expression parameterizes the placebo-controlled VE for the newly vaccinated in period 2 as $1 - \exp(\theta_1)$ and the VE for the original vaccinees in period 2 as $1 - \exp(\theta_2)$.
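The crude two-period calculation in the worked example can be written as a short function. This is a minimal sketch that assumes, as the example does, equal and very large person-time in both arms within each period so that denominators cancel; the function name `crude_period_ve` is hypothetical.

```python
def crude_period_ve(vax_p1, plac_p1, vax_p2, deferred_p2):
    """Crude period-specific VE under placebo crossover at the end of period 1.

    Assumes equal person-time in both arms within each period (denominators
    cancel) and that VE for the newly vaccinated equals the period-1 VE.
    """
    ve1 = 1.0 - vax_p1 / plac_p1              # placebo-controlled VE, period 1
    # Portability: deferred_p2 = (1 - ve1) * N_plac, solved for N_plac.
    n_plac_p2 = deferred_p2 / (1.0 - ve1)     # counterfactual placebo cases, period 2
    ve2 = 1.0 - vax_p2 / n_plac_p2            # VE for original vaccinees, period 2
    return ve1, n_plac_p2, ve2


ve1, n_plac_p2, ve2 = crude_period_ve(20, 100, 20, 12)
print(f"VE1 = {ve1:.2f}, N_plac = {n_plac_p2:.0f}, VE2 = {ve2:.3f}")
# VE1 = 0.80, N_plac = 60, VE2 = 0.667
```

The only modeling step is the inversion of $1 - \widehat{VE}_1 = 12/\hat{N}_{plac}$; everything else is a ratio of case counts.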
Analogous to the simple example, where we recovered the placebo case count $N_{plac}$ after time $\tau_1$, here we can recover the counterfactual placebo hazard function $h_0(t)$ for $t \ge \tau_1$ as $h_0(t) = \lambda_0(t)\exp(-\theta_1)$.

Equation (1) is deceptively simple. Generalizations of this idea allow us to recover an arbitrary placebo-controlled vaccine efficacy curve long after the placebo group has been completely vaccinated, with no assumptions about the baseline hazard function for an actual or counterfactual placebo arm. To illustrate this generality, suppose that the placebo-controlled hazard is given by a log-linear function of time:

$$ h(t) = h_0(t)\exp\{Z(\theta_0 + \theta_1 t)\}. \tag{2} $$

If crossover to vaccine occurs at time $\tau$ in the placebo arm, the resultant hazard is

$$ \begin{aligned} h(t) &= h_0(t)\exp\{Z(\theta_0 + \theta_1 t) + (1 - Z) I_\tau(t)[\theta_0 + \theta_1(t - \tau)]\} \\ &= \lambda_0(t)\exp\{Z[1 - I_\tau(t)]\theta_0 + Z\theta_1 t + (1 - Z) I_\tau(t)\theta_1(t - \tau)\}, \end{aligned} \tag{3} $$

where $I_\tau(t) = I(t > \tau)$ and $\lambda_0(t) = h_0(t)\exp\{I_\tau(t)\theta_0\}$. A visualization of this hazard function is given in Figure 1.

To better understand what happens in terms of estimation, suppose that an event occurs in the original placebo arm at time $t = s < \tau$, which is post-randomization but before crossover. This scenario is illustrated in Figure 1. The partial likelihood contribution for this event reduces to

$$ \frac{1}{\sum_{i \in R(s)} \exp\{Z_i(\theta_0 + \theta_1 s)\}}, $$

where $R(s)$ is the set of indices for volunteers who remain event free at time $s$. Thus events prior to $\tau$ allow estimation of $\theta_1$ and, crucially, $\theta_0$. Next suppose an event occurs at time $t = \tau + s$ post-randomization to an individual in the original placebo arm who was vaccinated at time $\tau$ and who has now been vaccinated for $s$ days (Figure 1). The partial likelihood contribution for this event is

$$ \frac{\exp(\theta_1 s)}{\sum_{i \in R(\tau + s)} \exp\{Z_i(\tau + s)\theta_1 + (1 - Z_i)s\theta_1\}}. $$

We see that $\theta_0$ is gone, as the baseline hazard, $\lambda_0(\tau + s) = h_0(\tau + s)\exp(\theta_0)$, cancels out of the numerator and denominator.
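The cancellation of $\theta_0$ can be checked numerically. The sketch below evaluates the partial likelihood contribution of an original-placebo event under model (3) for two values of $\theta_0$. The toy risk set (two original vaccinees, three original placebo volunteers), $\tau = 0.5$ and $\theta_1 = 0.5$ are illustrative assumptions, and the common factor $h_0(t)$ is omitted because it cancels in every contribution.

```python
import math


def log_rel_hazard(z, t, tau, theta0, theta1):
    """log{h(t)/h0(t)} under model (3): z = 1 vaccinated at time 0,
    z = 0 given placebo and vaccinated at the crossover time tau."""
    if z == 1:
        return theta0 + theta1 * t
    if t > tau:                                  # crossed over: vaccinated for t - tau
        return theta0 + theta1 * (t - tau)
    return 0.0                                   # still-unvaccinated placebo volunteer


def pl_contribution(event_z, t, risk_set, tau, theta0, theta1):
    """Cox partial likelihood contribution of an event at time t.
    `risk_set` lists the arm indicators of everyone at risk, including the case."""
    num = math.exp(log_rel_hazard(event_z, t, tau, theta0, theta1))
    den = sum(math.exp(log_rel_hazard(z, t, tau, theta0, theta1)) for z in risk_set)
    return num / den


risk_set = [1, 1, 0, 0, 0]
for theta0 in (math.log(0.2), math.log(0.5)):
    pre = pl_contribution(0, 0.3, risk_set, 0.5, theta0, 0.5)   # event at s < tau
    post = pl_contribution(0, 0.8, risk_set, 0.5, theta0, 0.5)  # event at tau + s
    print(f"theta0={theta0:.3f}: pre={pre:.4f}, post={post:.4f}")
```

The `pre` value changes with $\theta_0$, while the `post` value is identical for both choices, because every at-risk volunteer carries the same factor $\exp(\theta_0)$ after crossover.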
Thus the pre-crossover period completely determines the reliability of the estimate of $\theta_0$. As a result, longer pre-crossover periods with more events are desirable to better estimate $\theta_0$. Additionally, as $\tau$ approaches zero, the covariate value for the original vaccine arm, $\tau + s$, is very close to the covariate value for the original placebo arm, $s$. Little variation in covariate values makes estimation of the associated regression slope more difficult. Thus the benefit of a longer pre-crossover period persists in estimation of $\theta_1$, in addition to helping to estimate $\theta_0$. We note that there is nothing really special about this kind of cancellation. Consider a Cox regression with two covariates: a treatment indicator and a gender indicator. Even if all the women eventually drop out or have events, we continue to accrue information about the effect of treatment, provided we still have men at risk.

[Figure 1 image. Axes: log hazard vs. time; marked: placebo crossover at $\tau$, times $s$ and $\tau + s$, and $VE(s)$ for participant $i$ (placebo arm) and participant $j$ (vaccine arm).]

Figure 1: Log hazard for two study participants: $i$, who is initially given placebo (orange line), and $j$, who is initially given vaccine (green line). Vaccine efficacy wanes (i.e., log hazard increases) as a function of time since vaccination. At time $\tau$, participant $i$ is given vaccine and follows the same efficacy profile as participant $j$. The hazard function for original placebo participant $i$ is $h_0(t)$ prior to crossover and $h_0(t)\exp\{\theta_0 + \theta_1(t - \tau)\}$ after crossover. In this figure, $h_0(t)$ is constant, but this is not required.

We now develop this approach for the more realistic setting of a staggered entry trial and consider more general models for vaccine efficacy over time. Let $t \ge 0$ denote calendar time. For participant $i$, $i \in 1, \dots, N$, the data, $(\tau^{(e)}_i, \tau^{(v)}_i, T_i, C_i, Z_i, X_i)$, consist of the times of study entry and vaccination, $\tau^{(e)}_i$ and $\tau^{(v)}_i$, with $\tau^{(v)}_i \ge \tau^{(e)}_i > 0$; the time to symptomatic COVID-19 or end of follow-up, $T_i = \min(Y_i, C_i)$, with $Y_i$ the true but possibly unobserved event time; the treatment assignment $Z_i$; and baseline covariates $X_i$. By convention, we take $\tau^{(v)}_i$ to be greater than the study duration if disease occurs prior to vaccination. We also define a time-dependent vaccination indicator, $Z_i(t) = I(t \ge \tau^{(v)}_i)$. The hazard for participant $i$ is

$$ h_i(t) = \begin{cases} 0, & t \le \tau^{(e)}_i, \\ h_0(t)\exp\left[Z_i(t) f\left(t - \tau^{(v)}_i; \theta\right) + X_i'\beta\right], & t > \tau^{(e)}_i, \end{cases} \tag{4} $$

where $h_0(t)$ is the arbitrary 'reference' hazard for a placebo group, $\theta$ a vector of parameters governing vaccine efficacy over time, $X_i$ a vector of baseline covariates, and $\beta$ a vector of parameters. We calculate vaccine efficacy at time $s$ post-vaccination as one minus the ratio of vaccine to placebo hazards, i.e., $VE(s) = 1 - \exp[f(s; \theta)]$.

The model encompasses standard trials with parallel arms, in which case $\tau^{(v)}_i = \infty$ for placebo volunteers, and placebo-crossover trials, in which case $\tau^{(v)}_i > \tau^{(e)}_i$ for participants on the placebo arm. Following crossover of all placebo subjects, $Z_i(t) = 1$ for all study participants; hence the hazard ratio for any pair of subjects with the same $X$ is only a function of their times since vaccination:

$$ \frac{h_i(t)}{h_j(t)} = \frac{\exp\{f(t - \tau^{(v)}_i; \theta)\}}{\exp\{f(t - \tau^{(v)}_j; \theta)\}}. $$

This ratio is 1 for a constant VE model, and so following crossover there is no additional information about a constant VE, just as there is no additional information about $\theta_0$ in the log-linear decay model.
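The post-crossover hazard ratio under model (4) can be evaluated directly. In this sketch, `f_constant` and `f_loglinear` are hypothetical helper names for a constant and a log-linear VE profile, covariates are omitted ($X_i'\beta = 0$), and $Z_i(t) = I(t \ge \tau^{(v)}_i)$; the entry and vaccination times are made-up values.

```python
import math


def f_constant(s, theta):
    """Constant VE profile: log hazard ratio theta[0] at every s."""
    return theta[0]


def f_loglinear(s, theta):
    """Log-linear VE profile: log hazard ratio theta[0] + theta[1] * s."""
    return theta[0] + theta[1] * s


def log_hazard_ratio_vs_h0(t, entry, vacc, f, theta):
    """log{h_i(t)/h0(t)} under model (4) with X'beta = 0.
    Returns None before study entry, where the hazard is zero."""
    if t <= entry:
        return None
    z_t = 1 if t >= vacc else 0          # time-dependent vaccination indicator
    return z_t * f(t - vacc, theta)


def pairwise_hr(t, part_i, part_j, f, theta):
    """h_i(t)/h_j(t) for participants given as (entry, vaccination) times."""
    li = log_hazard_ratio_vs_h0(t, *part_i, f, theta)
    lj = log_hazard_ratio_vs_h0(t, *part_j, f, theta)
    return math.exp(li - lj)


original_vaccinee = (0.05, 0.05)   # vaccinated at entry
crossed_over = (0.10, 1.00)        # placebo at entry, vaccinated at 1 year
t = 1.5                            # calendar time after all crossovers
print(pairwise_hr(t, original_vaccinee, crossed_over, f_constant, [math.log(0.25)]))
print(pairwise_hr(t, original_vaccinee, crossed_over, f_loglinear, [math.log(0.15), 1.0]))
```

The constant profile gives a ratio of exactly 1, whatever the value of `theta[0]`, whereas the log-linear profile gives $\exp\{\theta_1 \times 0.95\}$, which depends only on the two times since vaccination and not on $\theta_0$.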
The absence of post-crossover information about a constant VE differs from the standard parallel arms trial, where such information accrues throughout follow-up.

While it is standard to have the time index for the Cox model be time since randomization, for trials with infectious diseases, where risk can seasonally wax and wane or explode during an outbreak, calendar time is a more natural index for the Cox model, as specified in (4). It is more natural because aligning the data on study entry distorts the risk set in the Cox partial likelihood at each event time. This is diagrammed in Figure 2, where participant $k$, who is still at risk at $t_i$, falls out of the risk set after we align the data on study entry. By the same token, participant $j$ is erroneously introduced into the risk set.

Suppose participant $i$ acquires disease at calendar time $t_i$ after being on study for a period $s_i = t_i - \tau^{(e)}_i$ (Panels A and B), and we use model (4) with a calendar time index (Panel C). Setting aside baseline covariates, the partial likelihood contribution at calendar time $t_i$ is

$$ \frac{h_0(t_i)\exp\{Z_i(t_i) f(t_i - \tau^{(v)}_i; \theta)\}}{\sum_{j \in R(t_i)} h_0(t_i)\exp\{Z_j(t_i) f(t_i - \tau^{(v)}_j; \theta)\}}, $$

and the baseline hazards cancel out as they should.

Now, suppose we align participants on their times of study entry. The event of person $i$ at calendar time $t_i$ is at study time $s_i$, and the associated study time risk set is $\tilde{R}(s_i)$. The calendar time for participant $i$ is $t_i = s_i + \tau^{(e)}_i$, whereas for a generic participant $j$ it is a different calendar time, $t_j = s_i + \tau^{(e)}_j$. Since the hazard truly depends on calendar time, the partial likelihood contribution at study time $s_i$ is

$$ \frac{h_0(\tau^{(e)}_i + s_i)\exp\{Z_i(\tau^{(e)}_i + s_i) f(s^*_i(s_i); \theta)\}}{\sum_{j \in \tilde{R}(s_i)} h_0(\tau^{(e)}_j + s_i)\exp\{Z_j(\tau^{(e)}_j + s_i) f(s^*_j(s_i); \theta)\}}, $$

where $s^*_j(s_i)$ is the time since vaccination at study time $s_i$ for person $j$. With alignment on time since study entry, the baseline hazards do not cancel out, and a partial likelihood contribution which assumes they do will be misspecified.

[Figure 2 image. Panels: (A) participant histories for $i$, $j$, $k$ in calendar time, event at $t_i$; (B) participant histories aligned on time on study, event at $s_i$; (C) baseline hazard vs. calendar time; (D) baseline hazard vs. time on study.]

Figure 2: Participant histories and baseline hazards when the data are indexed in calendar time or aligned on times of study entry. The true data generating mechanism is indexed in calendar time. (A) vs. (B): Aligning the data on study entry changes the risk set, as $k$ falls out of the risk set at $i$'s event time and $j$ is incorrectly introduced into the risk set. (C) vs. (D): Baseline hazards are no longer proportional after the data are aligned on study entry.

The log-linear and piecewise-constant forms of $VE(s)$ discussed above are simple and useful for understanding the behavior of the model and estimation. However, the form of waning vaccine efficacy can be hard to anticipate for new vaccines; high constant efficacy followed by a quick or smooth decay is possible, as are other shapes. It is thus appealing to model $f(\cdot)$ semi-parametrically, e.g., using penalized cubic P-splines (Eilers and Marx, 1996; Wood, 2017; Perperoglou et al., 2019). Let $P_L(t; \mathbf{k}, \delta)$ denote a P-spline basis of degree $\delta = 3$ with $L$ basis terms and vector of knot locations $\mathbf{k}$, and let $\gamma$ be a vector of coefficients, with $\gamma_0$ reserved for the log-hazard ratio immediately following vaccination.
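The P-spline ingredients can be sketched as follows: an equally spaced cubic B-spline basis and the second-order difference penalty of Eilers and Marx (1996). This is not the SAS or R implementation used for the paper; the knot layout, the number of segments `nseg`, and the anchoring of the spline term at $s = 0$ (our guess at one way to keep $\gamma_0$ separate from the spline) are assumptions.

```python
import numpy as np


def bspline_basis(x, lo, hi, nseg, degree=3):
    """B-spline basis on [lo, hi] with nseg equally spaced segments, built with
    the Cox-de Boor recursion. Returns a (len(x), nseg + degree) matrix."""
    x = np.asarray(x, dtype=float)
    dx = (hi - lo) / nseg
    # Equally spaced knots, extended by `degree` knots beyond each end.
    knots = lo + dx * np.arange(-degree, nseg + degree + 1)
    # Degree-0 basis: indicators of the half-open intervals [knot_k, knot_{k+1}).
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for p in range(1, degree + 1):
        n = B.shape[1] - 1
        left = (x[:, None] - knots[:n]) / (knots[p:p + n] - knots[:n])
        right = (knots[p + 1:p + 1 + n] - x[:, None]) / (knots[p + 1:p + 1 + n] - knots[1:n + 1])
        B = left * B[:, :n] + right * B[:, 1:n + 1]
    return B


def difference_penalty(n_basis, order=2):
    """P-spline penalty D'D built from order-th differences of adjacent coefficients."""
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D


def anchored_basis(s, lo, hi, nseg, degree=3):
    """Basis whose spline term is exactly 0 at s = 0, so gamma_0 alone is the log
    hazard ratio right after vaccination. Subtracting the row at s = 0 makes the
    columns sum to zero, so the first column is dropped to keep full rank."""
    B = bspline_basis(s, lo, hi, nseg, degree)
    B0 = bspline_basis([0.0], lo, hi, nseg, degree)
    return (B - B0)[:, 1:]


s = np.linspace(0.0, 2.0, 9)            # years since vaccination
B = bspline_basis(s, 0.0, 2.0, nseg=5)  # 8 cubic basis functions
print(B.shape, np.allclose(B.sum(axis=1), 1.0))   # (9, 8) True
print(anchored_basis(s, 0.0, 2.0, nseg=5)[0])     # all zeros at s = 0
```

The penalty matrix shrinks the fitted decay toward a straight line on the log-hazard scale, since second differences of a linear coefficient sequence are zero; the strength of that shrinkage is a tuning choice not shown here.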
The hazard for participant $i$ is

$$ h_i(t) = \begin{cases} 0, & t \le \tau^{(e)}_i, \\ h_0(t)\exp\left(Z_i(t)\left\{\gamma_0 + \sum_{\ell=1}^{L}\gamma_\ell P_\ell\left(t - \tau^{(v)}_i; \mathbf{k}, \delta\right)\right\}\right), & t > \tau^{(e)}_i. \end{cases} \tag{5} $$

In practice, we center the decay component estimated by the P-spline at zero to ensure identifiability of $\gamma_0$. Note that we need to evaluate the hazard for each participant at every event time, not merely at the time when a person experiences their own event (Therneau et al., 2017). Splines can be implemented in the SAS procedure PROC PHREG and in R using the survival package (Therneau, 2020). The latter provides users with a convenient summary method for the linear and non-linear spline effects, which is useful for testing for non-linearity in the decay profile.

Our development up to now has implicitly started counting cases immediately after the first dose of vaccine. In practice, vaccine trials often use a per-protocol primary analysis that forgoes counting disease cases until after the immunization schedule is complete, e.g., seven days post last dose. Such an analysis better evaluates the full benefit of immunization. For such analyses, we need to symmetrically avoid counting cases in both arms during the second immunization period, even if it is counterfactual, that is, if volunteers randomized to vaccine are unblinded and not immunized. To achieve this symmetry, a 'blackout' period of length $\Delta$ can be defined by the hazard function $h(t)\{1 - I(t \in [\tau^{(x)}, \tau^{(x)} + \Delta])\}$, where $\tau^{(x)}$ is the start of the crossover or unblinding for an individual, and $\Delta$ the time from first dose to when cases are counted. The consequence is to define discontinuous pre- and post-crossover risk intervals for volunteers who complete crossover without having an event. Volunteers who have an event before or during crossover have a single risk interval, which ends in an event or censoring, respectively.

Placebo crossover might happen in a blinded or unblinded (open label) manner. Blinded crossover is preferred to avoid potentially differential risk behavior, as the recently unblinded volunteers originally randomized to vaccine, who now know they are protected, might forgo risk avoidance behavior (Follmann et al., 2020). With open label crossover, such differential behavior could cause a spurious waning of efficacy in the period immediately following unblinding. One complex approach to address potential bias from unblinded crossover would be to use covariate adjustment and stratification.
Let $W$ denote a vector of covariates measured at or post unblinding that predict or describe risk behavior. While clinical trials typically avoid the use of post-baseline variables for adjustment, in the open label setting such adjustment may ameliorate bias. Once an individual is unblinded, a new hazard function applies. We illustrate using log-linear decay:

$$ h^*(t) = \lambda^*_0(t)\exp\{\theta_1(t - \tau^{(v)}) + W'\beta\}, \tag{6} $$

where $\lambda^*_0(t)$ is the new baseline hazard for the original placebo arm in this new open label milieu. Because crossover of all subjects cannot occur at the same time, there will be a crossover interlude during which the placebo volunteers become vaccinated. Thus at calendar time $t$ during the crossover interlude, the expanding unblinded cohort would use hazard $h^*(t)$ given by (6), while the dwindling blinded cohort would use hazard $h(t)$ given by (2). Following the crossover interlude, (6) would apply to all, and at some point the term $\exp(W'\beta)$ might not be needed if the volunteers in the two arms behave similarly. This construction is a form of time-dependent stratification.

A simpler way to address open label crossover bias is to define a blackout period of length $\Delta^*$ such that behavior is presumed to be similar after the end of the period. Similarity should happen at some point, as all trial volunteers will know they are vaccinated and protected. As above, time-dependent stratification would make sense, with different baseline hazards before unblinding and after the crossover blackout period. To be specific, for the log-linear decay model we would have $h(t) = \lambda_0(t)\exp\{Z(t)[\theta_0 + \theta_1(t - \tau^{(v)})]\}$ prior to unblinding and $h^*(t) = \lambda^*_0(t)\exp\{\theta_1(t - \tau^{(v)})\}$ from time $\Delta^*$ post unblinding onward.

If vaccine efficacy dropped substantially during a blackout period, later estimates of vaccine efficacy might be compromised.
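The blackout constructions (length $\Delta$ for per-protocol case counting, or $\Delta^*$ for open label crossover) amount to splitting each volunteer's follow-up into counting-process intervals. A minimal sketch, assuming a single crossover/unblinding time `tau_x` per volunteer, half-open intervals, and hypothetical stratum labels; it is not the authors' code.

```python
def risk_intervals(entry, exit_time, event, tau_x, delta):
    """Split follow-up (entry, exit_time] around a blackout (tau_x, tau_x + delta].

    Returns (start, stop, status, stratum) rows in counting-process form; status = 1
    marks a counted event at `stop`. The stratum label can serve as a
    time-dependent stratum (separate baseline hazards before and after blackout).
    """
    rows = []
    pre_stop = min(exit_time, tau_x)
    if pre_stop > entry:
        # An event during the blackout is not counted: the pre interval is censored.
        status = 1 if (event and exit_time <= tau_x) else 0
        rows.append((entry, pre_stop, status, "pre-unblinding"))
    post_start = tau_x + delta
    if exit_time > post_start:
        rows.append((max(entry, post_start), exit_time, 1 if event else 0, "post-blackout"))
    return rows


# Entry at 0, unblinding/crossover at 1.0 year, blackout of 0.25 years.
print(risk_intervals(0.0, 2.0, False, 1.0, 0.25))  # two discontinuous intervals
print(risk_intervals(0.0, 0.5, True, 1.0, 0.25))   # event before crossover
print(risk_intervals(0.0, 1.1, True, 1.0, 0.25))   # event during blackout: censored
```

Volunteers who complete crossover event-free get two disjoint rows, while an event before or during crossover yields a single row ending in an event or in censoring, matching the description of per-protocol risk intervals.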
As an extreme example, suppose all volunteers enroll at the same time, and all are blacked out during $[\tau, \tau + \Delta^*]$, which is exactly when vaccine efficacy drops. Then the estimated vaccine efficacy curve pre- and post-crossover would incorrectly appear constant. In practice, this scenario can be avoided with a staggered entry trial by exploiting the induced variation in time since vaccination at any calendar time. To illustrate what not to do, suppose that enrollment took 2 months, crossover took 2 months, and the crossover order was in exact sync with the enrollment order. Then all would be crossed over at the same time $\tau$ since randomization, and all blacked out for the period $[\tau, \tau + \Delta^*]$. To minimize the problem, crossover could occur in reverse order, with the first enrollees being crossed over last. Logistical considerations and placebo volunteers' sense of fairness could also come into play.

Recovering a vaccine efficacy curve under a standard trial with no crossover requires that the volunteers in each arm remain similar over time and that the external environment remain similar over time.

• Volunteers in each arm remain similar over time.

This can be violated if there is differential dropout in the two arms and dropout is related to the underlying risk of disease. Relatedly, unobserved heterogeneity in risk can result in differential culling by infection of the vaccine and placebo groups. Thus after a while, the remaining placebo arm volunteers tend to be a less risky group than the remaining vaccine arm volunteers, and the vaccine efficacy can appear to decrease; see Lipsitch (2019), Durham et al. (1998), and Aalen et al. (2015). For COVID-19 trials with 30,000 or more enrolled and perhaps 200-1000 cases over follow-up, such bias may be small. Of course, one can explicitly model the heterogeneity (Kanaan and Farrington, 2002). Such methods are beyond the scope of this paper.

• Study environment remains similar over time.

The proportional hazards model allows for the attack rate to change with time. But if the pathogen mutates to a form that is resistant to vaccine effects, efficacy may appear to wane. Another possibility is that human behavior changes in such a way that the vaccine is less effective. For example, if there is less mask wearing in the community over the study, the viral inoculum at infection may increase over the study and overwhelm the immune response for later cases. Vaccines may work less well against larger inoculums, and thus vaccine efficacy might appear to wane. For viral mutation, analyses could be run separately for different major strains, provided they occur both prior to and post crossover. More elaborate methods could also be developed to address viral mutation, but are beyond the scope of this paper.

The only additional assumption that is required to recover the vaccine efficacy profile under placebo crossover is that the effect of vaccination be the same no matter when the vaccine is given. Interestingly, this is a common assumption; for vaccine trials with staggered entry, it is implicitly assumed that the vaccine efficacy for early enrollees is the same as for late enrollees.
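The culling effect described under the first assumption can be quantified in a simple special case. As an illustration only, assume a gamma frailty with mean 1 and variance $v$ that multiplies each volunteer's hazard, a constant individual-level hazard ratio $r = 1 - VE$, and no crossover. The standard gamma-frailty result gives a population-level hazard of $r h_0(t)/\{1 + v r H_0(t)\}$ in the vaccine arm, where $H_0$ is the cumulative placebo hazard for a volunteer with frailty 1, so the marginal hazard ratio is $r(1 + vH_0)/(1 + v r H_0)$. The function name `apparent_ve` is hypothetical.

```python
def apparent_ve(true_ve, cum_hazard, frailty_var):
    """Population-level VE implied by a constant individual-level VE when
    unobserved risk follows a gamma frailty with mean 1 and variance frailty_var.

    cum_hazard is H0(t), the cumulative placebo hazard for a volunteer with frailty 1.
    """
    r = 1.0 - true_ve                                    # individual hazard ratio
    marginal_hr = r * (1.0 + frailty_var * cum_hazard) / (1.0 + frailty_var * r * cum_hazard)
    return 1.0 - marginal_hr


for H in (0.0, 0.01, 0.05, 0.2):
    print(H, round(apparent_ve(0.75, H, 1.0), 4), round(apparent_ve(0.75, H, 4.0), 4))
```

The apparent VE starts at the true value and drifts downward as the cumulative hazard and frailty variance grow; with cumulative attack rates of a few percent, the drift is small, in line with the remark that such bias may be small in large COVID-19 trials.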

In this section, we explore how placebo crossover, the dynamics of durability, and the baseline hazard affect our estimates of vaccine efficacy and durability. Since several COVID-19 vaccine trials are powered to accrue 150 cases and follow all volunteers for 2 years (Moderna, 2020), we evaluate three different designs: i) crossover at 150 cases, ii) crossover at 1 year, and iii) a standard parallel arm trial. We consider two settings for vaccine dynamics: constant VE of 75%, and VE waning linearly on the log-hazard scale from 85% to 35% over 1.5 years. In the crossover scenarios, placebo arm volunteers cross over during a four-week interlude. For each of the 6 settings, we simulated 10,000 trials. Each trial enrolled 3,000 participants in a 1:1 randomization with linear accrual of participants over an initial three-month period and followed participants for two years post enrollment. While COVID-19 trials are larger, we evaluated 3,000 participants to lessen our computational burden. The baseline hazard was piecewise-constant and calibrated to yield an average of 50, 75, 50, and 25 cases per three-month period in the placebo arm in the first year, and either the same or half the year one case rates in the second study year. The data were analyzed in each simulation using the log-linear decay model, (2), and the P-spline model, (5).

The simulations demonstrate that we can accurately estimate $VE(s)$ and the change in VE in all simulation settings using both the log-linear and P-spline models (Tables 1 and 4). Coverage probabilities of 95% confidence intervals were near their nominal levels or somewhat conservative. The P-spline model performs similarly to the log-linear model except for the estimates at year 2, where the variance becomes notably larger. Initiating crossover at one year resulted in an average accrual of 44% more cases prior to crossover compared with trials that initiated crossover at 150 cases.
We found that this improved the precision of our estimates for all quantities of interest. One way to quantify the relative performance of placebo crossover and parallel arm trials is by the ratio of empirical variances. We focus on the empirical variances of estimates obtained with the log-linear model in the constant $VE(s)$ setting in Table 1. The cross at 150:cross at 1 year variance ratios for $\widehat{VE}(s)$ are 0.051/0.029 = 1.8, 2.4, 2.5, and 2.4 at 0.5, 1.0, 1.5, and 2.0 years. This underscores the potential benefit of additional case accrual during the pre-crossover period leading into the second year, when the baseline hazard was halved. We next compare the crossover at one year design to a standard parallel trial using the log-linear model. This comparison is more of a benchmark, as a standard trial may not be ethically possible following vaccine approval. The analogous empirical variance ratios for cross at 1 year compared to a standard trial are 0.029/0.022 = 1.3, 2.5, 2.3, and 2.0, respectively. Results are broadly similar for the P-spline model and for waning vaccine efficacy.

In Tables 2 and 5 we provide estimates of the intercept and linear trend of the vaccine efficacy profile for the scenarios where the baseline hazard was the same or halved in year 2, respectively. All estimates have negligible bias and good coverage. For the constant VE scenario and log-linear model, the variance ratios for cross at 150 vs. cross at 1 year are 0.051/0.040 = 1.3 and 0.192/0.084 = 2.3 for the intercept and slope, respectively. Thus there is a big advantage in slope estimation with delayed crossover. When we compare crossover at 1 year to a standard trial, the variance ratios are 0.040/0.052 = 0.8 and 0.087/0.065 = 1.3, respectively. Interestingly, crossover improves the intercept estimate, as during the crossover interlude the newly vaccinated placebo volunteers contribute additional information about the intercept. The P-spline and log-linear models have similar empirical variances for the slope, but the log-linear model has about half the empirical variance of the P-spline model for the intercept. Finally, for the constant vaccine efficacy scenario, we compared the intercept estimates to a constant VE model (top half of Table 2). Under crossover, the empirical variance was modestly improved under this model compared to the log-linear model. Conclusions are broadly similar for the waning vaccine efficacy scenario.
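The variance ratios for $\widehat{VE}(s)$ quoted above can be reproduced from the empirical variances of the log-linear model in the constant-VE block of Table 1. This is plain arithmetic on the tabulated values; the dictionary keys and the helper `variance_ratios` are hypothetical names.

```python
# Empirical variances of the estimated log(VE(s)) from Table 1 (constant VE,
# log-linear model) at s = 0.5, 1.0, 1.5 and 2.0 years.
emp_var = {
    "cross_150": [0.051, 0.146, 0.338, 0.626],
    "cross_1yr": [0.029, 0.061, 0.137, 0.256],
    "parallel": [0.022, 0.024, 0.059, 0.125],
}


def variance_ratios(numerator, denominator):
    """Elementwise ratios of empirical variances, rounded to one decimal."""
    return [round(a / b, 1) for a, b in zip(emp_var[numerator], emp_var[denominator])]


print(variance_ratios("cross_150", "cross_1yr"))  # [1.8, 2.4, 2.5, 2.4]
print(variance_ratios("cross_1yr", "parallel"))   # [1.3, 2.5, 2.3, 2.0]
```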
A design question is how estimation efficiency varies with the length of the crossover interlude. To explore this question, we ran additional simulations evaluating a standard parallel trial of 2 years, a trial where all placebo participants are crossed over at 1 year, and a trial where the times of vaccination for all volunteers were uniformly distributed over 2 years. The baseline hazard was constant over the 2-year period. Under the constant $VE(s)$ scenario, the empirical variances for the intercept term were 0.051, 0.035, and 0.031, respectively, while the variances for the slope were 0.039, 0.039, and 0.034, respectively (Table 6). This suggests that a longer crossover interlude is somewhat better for estimation of the intercept and the slope.
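The vaccine-dynamics settings used in these simulations can be encoded in a few lines. The sketch below derives the log-linear parameters for the waning scenario (85% at vaccination to 35% at 1.5 years) and a rough quarterly placebo hazard for the constant-rate first year. The hazard calculation is our simplifying assumption (it ignores staggered accrual and depletion of the risk set) and is not the calibration procedure used for the reported simulations.

```python
import math

# Waning scenario: VE falls linearly on the log-hazard scale from 85% at
# vaccination to 35% at 1.5 years.
theta0 = math.log(1.0 - 0.85)
theta1 = (math.log(1.0 - 0.35) - theta0) / 1.5


def ve(s):
    """VE(s) = 1 - exp(theta0 + theta1 * s) for the log-linear profile."""
    return 1.0 - math.exp(theta0 + theta1 * s)


# Crude quarterly placebo hazards (per person-year) targeting 50, 75, 50, 25
# placebo cases per quarter among 1,500 placebo volunteers. This ignores the
# three-month staggered accrual and depletion of the risk set, so it is only a
# first approximation to the calibration described in the text.
n_placebo = 1500
target_cases = [50, 75, 50, 25]
quarter_rates = [c / (n_placebo * 0.25) for c in target_cases]

print(round(theta0, 3), round(theta1, 3))
print([round(ve(s), 3) for s in (0.0, 0.5, 1.0, 1.5)])
print([round(r, 3) for r in quarter_rates])
```

A full simulation would additionally draw event times from the product of this baseline hazard and $\exp\{Z_i(t) f(t - \tau^{(v)}_i; \theta)\}$ for each volunteer, which is omitted here.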

Design | Model | Time | log(VE(s)): Bias | Emp. Var. | Covg. | log(VE(s)) − log(VE(0)): Bias | Emp. Var. | Covg.
True vaccine efficacy constant at 75%
Cross at 150 cases | log-linear | 0.5 | -0.013 | 0.051 | 0.948 | 0.002 | 0.048 | 0.952
Cross at 150 cases | log-linear | 1.0 | -0.011 | 0.146 | 0.951 | 0.004 | 0.192 | 0.952
Cross at 150 cases | log-linear | 1.5 | -0.009 | 0.338 | 0.953 | 0.006 | 0.433 | 0.952
Cross at 150 cases | log-linear | 2.0 | -0.007 | 0.626 | 0.953 | 0.008 | 0.769 | 0.952
Cross at 150 cases | P-spline | 0.5 | -0.016 | 0.067 | 0.967 | 0.006 | 0.142 | 0.971
Cross at 150 cases | P-spline | 1.0 | -0.017 | 0.194 | 0.965 | 0.004 | 0.288 | 0.959
Cross at 150 cases | P-spline | 1.5 | -0.014 | 0.369 | 0.958 | 0.008 | 0.460 | 0.956
Cross at 150 cases | P-spline | 2.0 | -0.015 | 0.951 | 0.974 | 0.006 | 1.043 | 0.972
Cross at 1 year | log-linear | 0.5 | -0.009 | 0.029 | 0.949 | 0.001 | 0.022 | 0.953
Cross at 1 year | log-linear | 1.0 | -0.009 | 0.061 | 0.953 | 0.001 | 0.087 | 0.953
Cross at 1 year | log-linear | 1.5 | -0.008 | 0.137 | 0.952 | 0.002 | 0.195 | 0.953
Cross at 1 year | log-linear | 2.0 | -0.007 | 0.256 | 0.953 | 0.002 | 0.347 | 0.953
Cross at 1 year | P-spline | 0.5 | -0.013 | 0.041 | 0.982 | 0.005 | 0.146 | 0.978
Cross at 1 year | P-spline | 1.0 | -0.014 | 0.084 | 0.976 | 0.004 | 0.178 | 0.969
Cross at 1 year | P-spline | 1.5 | -0.010 | 0.160 | 0.969 | 0.008 | 0.240 | 0.972
Cross at 1 year | P-spline | 2.0 | -0.024 | 0.671 | 0.987 | -0.006 | 0.786 | 0.983
Parallel trial | log-linear | 0.5 | -0.010 | 0.022 | 0.948 | -0.001 | 0.016 | 0.952
Parallel trial | log-linear | 1.0 | -0.010 | 0.024 | 0.951 | -0.001 | 0.065 | 0.952
Parallel trial | log-linear | 1.5 | -0.011 | 0.059 | 0.949 | -0.002 | 0.145 | 0.952
Parallel trial | log-linear | 2.0 | -0.011 | 0.125 | 0.949 | -0.002 | 0.258 | 0.952
Parallel trial | P-spline | 0.5 | -0.012 | 0.039 | 0.981 | 0.014 | 0.173 | 0.979
Parallel trial | P-spline | 1.0 | -0.013 | 0.056 | 0.981 | 0.012 | 0.208 | 0.967
Parallel trial | P-spline | 1.5 | -0.024 | 0.073 | 0.980 | 0.002 | 0.194 | 0.974
Parallel trial | P-spline | 2.0 | -0.071 | 0.447 | 0.981 | -0.045 | 0.560 | 0.981
Vaccine efficacy wanes from 85% to 35% over 1.5 years
Cross at 150 cases | log-linear | 0.5 | -0.014 | 0.051 | 0.951 | 0.003 | 0.031 | 0.953
Cross at 150 cases | log-linear | 1.0 | -0.010 | 0.108 | 0.951 | 0.007 | 0.123 | 0.953
Cross at 150 cases | log-linear | 1.5 | -0.007 | 0.225 | 0.951 | 0.010 | 0.276 | 0.953
Cross at 150 cases | log-linear | 2.0 | -0.004 | 0.405 | 0.950 | 0.014 | 0.491 | 0.953
Cross at 150 cases | P-spline | 0.5 | -0.015 | 0.067 | 0.964 | 0.011 | 0.143 | 0.977
Cross at 150 cases | P-spline | 1.0 | -0.008 | 0.162 | 0.965 | 0.018 | 0.277 | 0.964
Cross at 150 cases | P-spline | 1.5 | 0.000 | 0.247 | 0.960 | 0.027 | 0.359 | 0.960
Cross at 150 cases | P-spline | 2.0 | 0.005 | 0.475 | 0.962 | 0.031 | 0.587 | 0.962
Cross at 1 year | log-linear | 0.5 | -0.010 | 0.031 | 0.950 | 0.004 | 0.017 | 0.949
Cross at 1 year | log-linear | 1.0 | -0.006 | 0.053 | 0.952 | 0.008 | 0.066 | 0.949
Cross at 1 year | log-linear | 1.5 | -0.001 | 0.107 | 0.950 | 0.013 | 0.149 | 0.949
Cross at 1 year | log-linear | 2.0 | 0.003 | 0.195 | 0.951 | 0.017 | 0.265 | 0.949
Cross at 1 year | P-spline | 0.5 | -0.013 | 0.039 | 0.976 | 0.014 | 0.142 | 0.983
Cross at 1 year | P-spline | 1.0 | -0.005 | 0.072 | 0.973 | 0.022 | 0.182 | 0.970
Cross at 1 year | P-spline | 1.5 | 0.008 | 0.119 | 0.966 | 0.035 | 0.208 | 0.973
Cross at 1 year | P-spline | 2.0 | 0.016 | 0.352 | 0.977 | 0.043 | 0.469 | 0.976
Parallel trial | log-linear | 0.5 | -0.010 | 0.024 | 0.948 | 0.004 | 0.012 | 0.951
Parallel trial | log-linear | 1.0 | -0.006 | 0.016 | 0.946 | 0.008 | 0.048 | 0.951
Parallel trial | log-linear | 1.5 | -0.002 | 0.031 | 0.951 | 0.012 | 0.107 | 0.951
Parallel trial | log-linear | 2.0 | 0.002 | 0.071 | 0.950 | 0.015 | 0.191 | 0.951
Parallel trial | P-spline | 0.5 | -0.012 | 0.036 | 0.982 | 0.022 | 0.164 | 0.984
Parallel trial | P-spline | 1.0 | -0.007 | 0.038 | 0.983 | 0.027 | 0.213 | 0.970
Parallel trial | P-spline | 1.5 | -0.004 | 0.039 | 0.980 | 0.030 | 0.182 | 0.975
Parallel trial | P-spline | 2.0 | -0.006 | 0.203 | 0.979 | 0.028 | 0.337 | 0.976

Table 1: Summary statistics for estimates of vaccine efficacy and change in vaccine efficacy for simulated trials where the baseline hazard in year two was half the baseline hazard in year one. The log-linear and P-spline models correspond to (2) and (5), respectively. For the cross at 150 cases design, the average time of crossover (in years) was $\tau_x$ = 0.… ± 0.05; for the cross at 1 year design, the average number of events at crossover was $N_x$ = 216 ± 13 under constant VE and 211 ± 13 under waning VE (means ± standard deviations). Time is given in years since study initiation.

Design | Model | Intercept: Bias | Emp. Var. | Covg. | Linear trend: Bias | Emp. Var. | Covg.
Vaccine efficacy constant at 75%
Cross at 150 cases | Constant VE | -0.012 | 0.039 | 0.952 | n/a | n/a | n/a
Cross at 150 cases | log-linear | -0.015 | 0.051 | 0.951 | 0.004 | 0.192 | 0.952
Cross at 150 cases | P-spline | -0.022 | 0.086 | 0.973 | 0.005 | 0.188 | 0.956
Cross at 1 year | Constant VE | -0.008 | 0.028 | 0.950 | n/a | n/a | n/a
Cross at 1 year | log-linear | -0.010 | 0.040 | 0.950 | 0.001 | 0.087 | 0.953
Cross at 1 year | P-spline | -0.018 | 0.104 | 0.978 | 0.001 | 0.084 | 0.959
Parallel trial | Constant VE | -0.005 | 0.018 | 0.949 | n/a | n/a | n/a
Parallel trial | log-linear | -0.009 | 0.052 | 0.951 | -0.001 | 0.065 | 0.952
Parallel trial | P-spline | -0.026 | 0.126 | 0.974 | 0.006 | 0.063 | 0.955
Vaccine efficacy wanes from 85% to 35% over 1.5 years
Cross at 150 cases | log-linear | -0.017 | 0.056 | 0.951 | 0.007 | 0.123 | 0.953
Cross at 150 cases | P-spline | -0.027 | 0.102 | 0.975 | -0.002 | 0.121 | 0.955
Cross at 1 year | log-linear | -0.014 | 0.043 | 0.952 | 0.008 | 0.066 | 0.949
Cross at 1 year | P-spline | -0.027 | 0.116 | 0.981 | -0.003 | 0.065 | 0.951
Parallel trial | log-linear | -0.014 | 0.057 | 0.950 | 0.008 | 0.048 | 0.951
Parallel trial | P-spline | -0.034 | 0.145 | 0.978 | 0.003 | 0.047 | 0.952

Table 2: Bias, empirical variance, and coverage for estimates of the intercept and linear trend in vaccine efficacy under the log-linear model, (2), and semi-parametric model, (5). Here, the time-varying baseline hazard in year two was half the baseline hazard in year one.

Unobserved heterogeneity in the risk of disease can lead to bias in estimates of VE(s) and complicate the task of separating time-varying efficacy from increased removal of the riskier individuals from the placebo arm (Balan and Putter, 2020). We simulated placebo crossover and parallel arm trials with 30,000 participants and gamma distributed frailties with mean one and variance equal to either one or four. Crossover trials initiated vaccination of the placebo arm at one year. The baseline hazard was constant and calibrated to yield either 50 or 300 cases per six month period on the placebo arm, and VE(s) was either constant or waned linearly on the log hazard scale, as before. The frailty distributions in the original placebo and vaccine arms at the end of follow-up were more similar in the placebo crossover trials than in the standard parallel trials (Table 8). In the low baseline hazard scenario, where the dominant contribution to a participant's propensity for disease was their underlying frailty, placebo crossover trials yielded less biased estimates of VE(s) relative to the standard parallel design (Tables 9 and 10). In the high baseline hazard scenario, the common baseline hazard dominated heterogeneity in the frailty distribution, and in this setting the bias in VE(s) estimates under placebo crossover was comparable to the bias observed with parallel trials. Higher baseline hazards resulted in more differential culling of the risk set and increased bias in estimates of VE(s). In practice, we could mitigate biases resulting from heterogeneity in the frailty distribution by adjusting for known risk factors of disease and stratifying our analyses by site or geographic region.
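The differential culling described above can be illustrated with a standard gamma-frailty identity: if $U$ has mean one and variance $v$, participants still at risk after accumulating cumulative hazard $H$ have frailty distributed as Gamma(shape $1/v$, rate $1/v + H$), so their mean frailty is $1/(1 + vH)$. The Python sketch below applies this with a constant baseline rate, a constant hazard ratio of 0.25 (VE of 75%), and instantaneous crossover at one year. The baseline rate of 0.04 per year is an illustrative assumption (roughly 600 placebo cases per year, assuming 15,000 participants per arm), and the helper names are hypothetical.

```python
import math


def survivor_frailty_summary(cum_hazard, frailty_var):
    """Mean and SD of a gamma frailty among participants still at risk.

    Assumes U ~ Gamma(shape=1/v, rate=1/v) (mean 1, variance v) and an individual
    hazard U * h(t). Conditional on survival, U ~ Gamma(1/v, 1/v + H), where H is
    the cumulative hazard of h accumulated so far.
    """
    k = 1.0 / frailty_var
    rate = k + cum_hazard
    return k / rate, math.sqrt(k) / rate


def cumulative_hazard(baseline_rate, hazard_ratio, follow_up, vax_time):
    """Cumulative hazard with a constant baseline rate and a constant vaccine hazard ratio.

    vax_time=None means the participant is never vaccinated during follow-up.
    """
    if vax_time is None or vax_time >= follow_up:
        return baseline_rate * follow_up
    return baseline_rate * (vax_time + hazard_ratio * (follow_up - vax_time))


BASELINE_RATE = 0.04  # illustrative assumption, events per person-year
HAZARD_RATIO = 0.25   # constant VE of 75%
FOLLOW_UP = 2.0       # years

arms = {
    "vaccine": 0.0,
    "placebo, crossover at 1 year": 1.0,
    "placebo, parallel trial": None,
}
for v in (1.0, 4.0):
    for label, vax_time in arms.items():
        H = cumulative_hazard(BASELINE_RATE, HAZARD_RATIO, FOLLOW_UP, vax_time)
        mean, sd = survivor_frailty_summary(H, v)
        print(f"variance {v}: {label:30s} mean {mean:.3f}  sd {sd:.3f}")
```

With these inputs, surviving placebo participants in the crossover design sit between the parallel-trial placebo arm and the vaccine arm, and the means are roughly in line with the high baseline hazard rows of Table 8, which come from full simulations with a non-instantaneous crossover.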

In this section, we present detailed analyses of two simulated COVID-19 vaccine trials where the true vaccine efficacy profile was either constant at 75% or waned linearly on the log-hazard scale from 85% to 35% over 1.5 years. Each trial enrolled 30,000 participants with linear accrual over three months in a 1:1 randomization to vaccine or placebo. The baseline attack rate was piecewise constant with changepoints every three months, and was calibrated to yield 50, 75, 50, and 25 cases on the placebo arm in each period in the first year, and half the expected number of cases per period on the placebo arm in year two. In this example, interim analyses are planned at 150 cases, which ultimately result in crossover at the end of year one following evaluation and vetting of the efficacy by a regulatory agency. Placebo crossover occurs over a four week period. Each volunteer was followed for a total of two years.

The two simulated trials are summarized in Table 3. In the constant VE scenario, the trial reached 150 cases in 222 days, and recorded 223 events by the one year crossover time-point and 273 events overall. In the waning VE scenario, the trial reached 150 cases in 242 days, and recorded 199 events by the one year crossover time-point and 292 events overall. The share of cases on the original placebo arm declined from roughly 83% at the 150 case interim look to 76.2% at the completion of the study in the constant VE scenario, and from roughly 87% to 65.4% in the waning VE scenario.
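In the log-linear model as fitted in the appendix code (a `vac` term plus a `vactime` term), the log hazard ratio is $\log \mathrm{HR}(s) = \theta_0 + \theta_1 s$ with $\mathrm{VE}(s) = 1 - \mathrm{HR}(s)$, so the two efficacy profiles above pin down $\theta_0$ and $\theta_1$ directly. The Python sketch below derives them and recomputes the placebo share of cases from the Table 3 counts. It assumes that the "geometric mean VE(s)" quoted in the next paragraph means one minus the exponentiated average log hazard ratio over the first year; under that reading the unrounded parameters give about 75.5%, while the rounded values $\theta_0 = -1.90$ and $\theta_1 = 0.98$ give 75.6%. Function and variable names are illustrative.

```python
import math


def loglinear_ve_parameters(ve_at_0, ve_at_end, years):
    """(theta0, theta1) for log HR(s) = theta0 + theta1 * s, with VE(s) = 1 - HR(s)."""
    theta0 = math.log(1.0 - ve_at_0)
    theta1 = (math.log(1.0 - ve_at_end) - theta0) / years
    return theta0, theta1


# Constant scenario: VE(s) = 75% for every s, so theta1 = 0.
theta0_const, theta1_const = loglinear_ve_parameters(0.75, 0.75, 1.5)
# Waning scenario: VE(s) falls from 85% to 35% over 1.5 years, linearly in log HR.
theta0_wane, theta1_wane = loglinear_ve_parameters(0.85, 0.35, 1.5)

# Assumed reading of "geometric mean VE(s)" over the first year: one minus the
# exponentiated average of log HR(s) on [0, 1], which equals theta0 + theta1 / 2.
gm_ve_year1 = 1.0 - math.exp(theta0_wane + theta1_wane / 2.0)

# Cases on the original placebo and vaccine arms, copied from Table 3.
case_counts = {
    "constant VE, interim look": (124, 26),
    "constant VE, 2-year follow-up": (208, 65),
    "waning VE, interim look": (131, 19),
    "waning VE, 2-year follow-up": (191, 101),
}
placebo_share = {
    label: round(100.0 * placebo / (placebo + vaccine), 1)
    for label, (placebo, vaccine) in case_counts.items()
}

print(f"constant VE: theta0 = {theta0_const:.2f}, theta1 = {theta1_const:.2f}")
print(f"waning VE:   theta0 = {theta0_wane:.2f}, theta1 = {theta1_wane:.2f}")
print(f"geometric-mean VE over year 1 (waning): {100 * gm_ve_year1:.1f}%")
for label, share in placebo_share.items():
    print(f"{label}: {share}% of cases on the original placebo arm")
```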
The overall VE estimate at the one year crossover, estimated using a proportional hazards model without adjustment for time since vaccination, was 76.6% (95% CI: 67.2%, 83.3%) in the constant VE(s) scenario and 80.1% (95% CI: 71.0%, 86.3%) in the waning VE(s) scenario (in the waning scenario, the true geometric mean VE(s) over the first year post-vaccination is 75.6%). Point estimates for VE(0) and the linear trend in log VE(s) from the log-linear and P-spline models were close in both scenarios, although confidence intervals from the P-spline models were wider. The estimated efficacy profiles obtained with both methods were in agreement and recovered the true VE profile (Figure 3). The P-spline estimates had wider pointwise confidence intervals, but the inflation in the variance appears to be fairly modest for the period spanning the end of study enrollment through roughly year 1.5 post-vaccination. In practice, both the log-linear decay model and the P-spline model could be used to test a hypothesis of time-varying VE(s). This is straightforwardly carried out for the log-linear model via a likelihood ratio test (LRT) for the slope parameter in (2), where the test statistic is compared to a chi-square distribution with one degree of freedom. For the P-spline models, we perform a likelihood ratio test of whether all of the P-spline basis coefficients are jointly equal to zero, and compare the test statistic to a chi-square distribution with 3.1 degrees of freedom (the effective degrees of freedom for the P-splines in our models). At the end of 2 years of follow-up, we resoundingly reject the null hypothesis of time-homogeneous VE(s) in the waning VE(s) scenario and fail to reject the null in the constant VE(s) scenario (Table 3). The benefit of an additional year of follow-up past crossover is substantial in terms of evaluating the long term durability of the vaccine.
Under the waning VE scenario, the p-value for testing the null hypothesis of constant VE at the one year crossover is close to 0.05 for both the log-linear and P-spline models (0.027 and 0.054, respectively), but the evidence is convincing at 2 years. These simulated examples show that for both the waning and constant VE scenarios, accurate inference about the behavior of vaccine efficacy over time can be recovered.
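The likelihood ratio tests described above refer $2(\ell_1 - \ell_0)$ to a chi-square distribution with 1 degree of freedom (log-linear slope) or 3.1 degrees of freedom (effective degrees of freedom of the P-spline term). Because 3.1 is not an integer, the Python sketch below computes the upper tail from the regularized lower incomplete gamma function using only the standard library. It is an illustration of the stated reference distributions rather than the authors' code, and the statistic of 5.0 in the usage lines is hypothetical. In R, `pchisq(stat, df, lower.tail = FALSE)` accepts a non-integer `df` directly.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-15, max_terms=10_000):
    """P(a, x) = gamma(a, x) / Gamma(a) from its power series (fine for moderate x)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; df need not be an integer (e.g. 3.1)."""
    return 1.0 - regularized_lower_gamma(df / 2.0, stat / 2.0)


def lrt_pvalue(loglik_null, loglik_alt, df):
    """Compare 2 * (loglik_alt - loglik_null) with a chi-square(df) reference."""
    return chi2_sf(2.0 * (loglik_alt - loglik_null), df)


# Hypothetical LRT statistic, referred to both reference distributions.
stat = 5.0
p_loglinear = chi2_sf(stat, 1.0)  # one slope parameter in the log-linear model
p_pspline = chi2_sf(stat, 3.1)    # effective df of the P-spline term, as in the text
print(f"df = 1:   p = {p_loglinear:.4f}")
print(f"df = 3.1: p = {p_pspline:.4f}")
```

For a fixed statistic, the 3.1 df reference yields a larger p-value than the 1 df reference, which is the price paid for the P-spline's flexibility.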

Quantity | True VE constant at 75% | True VE wanes from 85% to 35%
Time of 150 case interim look | Day 222 | Day 242
Case split by original arm at interim look | Placebo = 124, Vaccine = 26 | Placebo = 131, Vaccine = 19
Case split by original arm at 1 year crossover | Placebo = 181, Vaccine = 42 | Placebo = 166, Vaccine = 33
Case split by original arm at 2 year follow-up | Placebo = 208, Vaccine = 65 | Placebo = 191, Vaccine = 101
Estimates at interim look, log-linear model
Intercept | -0.84 (95% CI: -1.6, -0.09) | -2.16 (95% CI: -3.17, -1.16)
Linear trend | -3.06 (95% CI: -6.05, -0.07) | 0.81 (95% CI: -2.13, 3.75)
LRT for time-varying VE | 0.039 | 0.589
Estimates at interim look, P-spline model
Intercept | -1.41 (95% CI: -2.77, -0.05) | -2.43 (95% CI: -4.23, -0.62)
Linear trend | -3.02 (95% CI: -6.36, 0.32) | 0.8 (95% CI: -2.13, 3.73)
LRT for time-varying VE | 0.037 | 0.605
Estimates at 1 year crossover, log-linear model
Intercept | -1.34 (95% CI: -1.98, -0.7) | -2.36 (95% CI: -3.17, -1.55)
Linear trend | -0.29 (95% CI: -1.74, 1.17) | 1.8 (95% CI: 0.2, 3.4)
LRT for time-varying VE | 0.698 | 0.027
Estimates at 1 year crossover, P-spline model
Intercept | -1.14 (95% CI: -2.17, -0.1) | -2.26 (95% CI: -3.68, -0.83)
Linear trend | -0.28 (95% CI: -1.66, 1.1) | 1.8 (95% CI: 0.22, 3.37)
LRT for time-varying VE | 0.133 | 0.054
Estimates at 2 year follow-up, log-linear model
Intercept | -1.37 (95% CI: -1.77, -0.97) | -2.19 (95% CI: -2.62, -1.75)
Linear trend | -0.13 (95% CI: -0.7, 0.43) | 1.33 (95% CI: 0.82, 1.83)
LRT for time-varying VE | 0.641 | <0.001
Estimates at 2 year follow-up, P-spline model
Intercept | -1.33 (95% CI: -2.09, -0.58) | -2.26 (95% CI: -3.15, -1.36)
Linear trend | -0.13 (95% CI: -0.7, 0.44) | 1.28 (95% CI: 0.77, 1.8)
LRT for time-varying VE | 0.178 | <0.001

Table 3: Summary of example trials simulated under constant and waning vaccine efficacy (VE) at times of interim analysis and placebo crossover. The intercept and linear trend correspond to the immediate effect of vaccination and the time-trend for VE(s) under model (2), and the true values were set to $\theta_0 = -1.39$ and $\theta_1 = 0$ in the constant VE scenario, and $\theta_0 = -1.90$ and $\theta_1 = 0.98$ in the waning VE setting. The likelihood ratio test (LRT) for waning VE compares models (2) and (5) to a PH model without adjustment for time since vaccination; LRT entries are p-values.

[Figure 3 panels. Top: number of events by treatment arm (Placebo, Delayed vaccination, Immediate vaccination) in quarter-year bins from (0, 0.25] to (1.75, 2], for the constant VE and waning VE scenarios. Bottom: vaccine efficacy vs. years since vaccination under the log-linear and P-spline models, for the constant VE and waning VE scenarios.]

Figure 3: (Top) Number of events per quarter by arm. The delayed vaccination arm consists of the original placebo participants after they have been crossed over. (Bottom) Vaccine efficacy (VE) as a function of time since vaccination. Dashed lines are the true VE(s); solid curves and ribbons are pointwise means and 95% confidence intervals.

Knowing the durability of vaccine-induced protection is a key question in vaccine development, especially for COVID-19 vaccines. With placebo volunteers being offered vaccine before long term follow-up has completed, it may seem that the ability to assess durability is lost. In this paper we demonstrated that placebo-controlled vaccine efficacy can be accurately assessed long after the placebo group has disappeared. Our method is the familiar Cox proportional hazards model. To reflect seasonal or outbreak variation in the attack rate, we used calendar time as the time index. To recover different vaccine efficacy curves, we specified flexible models for vaccine efficacy decay. If crossover occurs quickly, the early VE(s) will remain poorly estimated no matter how many post-crossover cases occur, which will impact later estimates of VE(s). Our results point out the advantages of delaying crossover and of longer crossover interludes for improving estimation. We also provide suggestions on how to manage the crossover interlude, discuss how to perform per-protocol analyses, and discuss solutions for open label crossover, where risk behavior might increase among recently unblinded vaccinees.

Future work could develop random effects or frailty-type models. Such models seem especially suited for settings with a relatively high attack rate. Our work focused on the setting where the disease event was continuously monitored. An important endpoint in vaccine trials is seroconversion, or the development of antibodies to the pathogen of interest. Seroconversion is typically measured infrequently, which results in an interval-censored endpoint. The extension of these methods to interval-censored data will be important. Finally, an emerging issue in the context of COVID-19 is how to estimate vaccine durability in the presence of emerging strains. This could be addressed within a competing risks framework in which times to first acquiring disease due to different strains are treated as competing events. The framework developed in this paper is easily extended to this setting, since our models could straightforwardly be applied to the subdistribution hazards in a competing risks model.

Acknowledgments

This work utilized the computational resources of the NIH HPC Biowulf computing cluster (http://hpc.nih.gov). The authors would like to thank Keith Lumbard for help with simulations, as well as Michael Fay, Anastasios Tsiatis, and Larry Molton for helpful discussions regarding this work.

Supplementary Materials

Code demonstrating how to simulate data and reproduce the results presented in this manuscript is available in the following GitHub repository: https://github.com/fintzij/ve_placebo_crossover. A minimal working example with R and SAS code is provided in the appendix.

References

Odd O Aalen, Richard J Cook, and Kjetil Røysland. Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Analysis, 21:579–593, 2015.
Theodor A Balan and Hein Putter. A tutorial on frailty models. Statistical Methods in Medical Research, 29:3424–3454, 2020.
Pyoeng Gyun Choe, Chang Kyung Kang, Hyeon Jeong Suh, Jongtak Jung, Kyoung-Ho Song, Ji Hwan Bang, Eu Suk Kim, Hong Bin Kim, Sang Won Park, Nam Joong Kim, et al. Waning antibody responses in asymptomatic and symptomatic SARS-CoV-2 infection. Emerging Infectious Diseases, 27:327–329, 2020.
David R Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202, 1972.
Kathryn Durham, Ira M Longini Jr, M Elizabeth Halloran, John D Clemens, Nizam Azhar, and Malla Rao. Estimation of vaccine efficacy in the presence of waning: application to cholera vaccines. American Journal of Epidemiology, 147(10):948–959, 1998.
Paul HC Eilers and Brian D Marx. Flexible smoothing with B-splines and penalties. Statistical Science, pages 89–102, 1996.
DA Follmann, J Fintzi, et al. Assessing durability of vaccine effect following blinded crossover in COVID-19 vaccine efficacy trials. medRxiv, 2020.
Dean Follmann. Augmented designs to assess immune response in vaccine trials. Biometrics, 62(4):1161–1169, 2006.
Ana Maria Henao-Restrepo, Anton Camacho, Ira M Longini, Conall H Watson, W John Edmunds, Matthias Egger, Miles W Carroll, Natalie E Dean, Ibrahima Diatta, Moussa Doumbia, et al. Efficacy and effectiveness of an rVSV-vectored vaccine in preventing Ebola virus disease: final results from the Guinea ring vaccination, open-label, cluster-randomised trial (Ebola Ça Suffit!). The Lancet, 389(10068):505–518, 2017.
Mona N Kanaan and C Paddy Farrington. Estimation of waning vaccine efficacy. Journal of the American Statistical Association, 97(458):389–397, 2002.
Marc Lipsitch. Challenges of vaccine effectiveness and waning studies. Clinical Infectious Diseases, 68:1631–1633, 2019.
Boikanyo Makubate and Stephen Senn. Planning and analysis of cross-over trials in infertility. Statistics in Medicine, 29:3203–3210, 2010.
Moderna. A phase 3, randomized, stratified, observer-blind, placebo-controlled study to evaluate the efficacy, safety, and immunogenicity of mRNA-1273 SARS-CoV-2 vaccine in adults aged 18 years and older, 2020.
Martha Nason and Dean Follmann. Design and analysis of crossover trials for absorbing binary endpoints. Biometrics, 66(3):958–965, 2010.
Aris Perperoglou, Willi Sauerbrei, Michal Abrahamowicz, and Matthias Schmid. A review of spline function procedures in R. BMC Medical Research Methodology, 19, 2019.
Gregory A Poland, Inna G Ovsyannikova, and Richard B Kennedy. SARS-CoV-2 immunity: Review and applications to phase 3 vaccine candidates. The Lancet, 2020.
Saranya Sridhar, Alexander Luedtke, Edith Langevin, Ming Zhu, Matthew Bonaparte, Tifany Machabert, Stephen Savarino, Betzana Zambrano, Annick Moureau, Alena Khromava, et al. Effect of dengue serostatus on dengue vaccine safety and efficacy. New England Journal of Medicine, 379(4):327–340, 2018.
Terry Therneau, Cindy Crowson, and Elizabeth Atkinson. Using time dependent covariates and time dependent coefficients in the Cox model. Survival Vignettes, 2017.
Terry M Therneau. A Package for Survival Analysis in R, 2020. URL https://CRAN.R-project.org/package=survival. R package version 3.2-7.
Terry M Therneau and Patricia M Grambsch. Modeling survival data: Extending the Cox model. Springer Science & Business Media, 2013.
David Wendler, Jorge Ochoa, Joseph Millum, Christine Grady, and Holly A Taylor. COVID-19 vaccine trial ethics once we have efficacious vaccines. Science, 2020.
Simon N Wood. Generalized additive models: an introduction with R. CRC Press, 2017.
World Health Organization. Placebo-controlled trials of COVID-19 vaccines: why we still need them. New England Journal of Medicine, 2020.

Illustrative Computer Code

In this section we present a minimal example with SAS and R code to estimate a log-linear waning efficacy curve. In this trial, a per-protocol analysis is used and disease cases are counted starting 30 days after the first dose. Calendar time is relative to 1 January 2021, so the volunteer depicted in the first row (entry on day 35) was dosed on 5 January 2021. Thus, during the crossover period, cases are not counted for 30 days. A blinded crossover is assumed, so the same placebo baseline hazard applies throughout the study without time-dependent stratification. If an open label crossover were pursued, an additional 'stratum' variable could be created that identified whether a risk interval was blinded or open label. The 'stratum' variable would then be used as a stratification variable in the proportional hazards model. The variables below are: id = subject identifier; arm = original randomization arm (1 = vaccine, 0 = placebo); entry = …

A.1 SAS code

```sas
DATA new;
  INPUT id arm entry Xstart Xend eventtime status;
  CARDS;
1 0 35 65 95 370 0
2 1 45 80 110 400 0
3 0 55 150 . 150 0
4 1 60 170 200 310 1
5 0 65 . . 80 1
6 1 80 190 210 410 0
7 0 85 215 245 420 0
8 1 70 . . 90 1
;
RUN;

/* did not start crossover: 1 line for initial period: event in period 1 */
DATA data1; SET new;
  IF Xstart=. AND Xend=.;
  period=1; start=entry; stop=eventtime; event=status;
RUN;

/* did not finish crossover: 1 line for initial period: censor at Xstart */
DATA data2; SET new;
  IF Xstart^=. AND Xend=.;
  period=1; start=entry; stop=Xstart; event=0;
RUN;

/* did pass crossover, so output the initial period */
DATA data3; SET new;
  IF Xend^=.;
  period=1; start=entry; stop=Xstart; event=0;
RUN;

/* did pass crossover, so output the post-crossover period */
DATA data4; SET new;
  IF Xend^=.;
  period=2; start=Xend; stop=eventtime; event=status;
RUN;

/* Merge the 4 datasets and mark the vaccination time and status */
DATA newest;
  SET data1 data2 data3 data4;
  IF arm=0 THEN DO; timevact=Xend; IF period=1 THEN vac=0; ELSE vac=1; END;
  IF arm=1 THEN DO; timevact=entry; vac=1; END;
RUN;

/* Fit the log-linear VE decay model; vactime (time since vaccination)
   is set to 0 for unvaccinated risk intervals so it is never missing */
PROC PHREG DATA=newest;
  MODEL (start, stop)*event(0) = vac vactime / itprint rl;
  vactime=vac*(stop-timevact);
  IF vac=0 THEN vactime=0;
RUN;
```

The parameter estimates are $(\hat\theta_0, \hat\theta_1)$ = (-0.90472, 0.02288).

A.2 R code

```r
library(survival)
library(tidyverse)

# ...
# Each case_when() below returns two values per participant:
# one for the pre-crossover row and one for the post-crossover row.
    case_when(status == 0 ~ rep(0, 2),
              status == 1 & (is.na(Xstart) | eventtime <= Xstart) ~ c(1, NA),
              status == 1 & eventtime >= Xend ~ c(0, 1)),
    vacc_status =
      case_when(arm == 1 ~ rep(1, 2),
                arm == 0 ~ c(0, 1)),
    vacc_time =
      case_when(arm == 1 ~ rep(entry, 2),
                arm == 0 & (is.na(Xstart) | is.na(Xend)) ~ Inf,
                arm == 0 & !is.na(Xstart) & !is.na(Xend) ~ rep(Xend, 2))) %>%
  # drop rows with missing values, e.g. the post-crossover row of a
  # participant whose event occurred before crossover
  drop_na() %>%
  as.data.frame()
```

The parameter estimates are $(\hat\theta_0, \hat\theta_1)$ = (-0.90473, 0.02288).

B Additional Simulation Results

B.1 Trials with Year Two Baseline Hazard Equal to Year One Baseline Hazard

In this section we provide simulation results where the baseline hazard function in year 2 is thesame as in year 1. Table 4 is the analogue to Table 1 and Table 5 is the analogue to Table 2.

B.2 Comparing Uniform Crossover, Crossover at One Year, and Parallel Trials

This section contains results for a set of idealized trials with constant baseline hazard, instantaneous enrollment and crossover, and constant VE. Trials either crossed placebo participants to the vaccine arm at one year, uniformly over the two year study period, or never (corresponding to a standard parallel arm design). Table 6 provides the results for the vaccine efficacy over time, while Table 7 provides the parameter estimates.

B.3 Frailty simulation results

This section presents results from simulated trials in which participants were heterogeneous in their baseline hazards. Simulation was analogous to trials simulated elsewhere in this manuscript, except that each trial consisted of 30,000 participants and the participant-level hazard was $\tilde{h}_i(t) = U_i h_i(t)$, with $h_i(t)$ corresponding to either a constant VE or log-linear VE model. In both cases the baseline hazard was constant. We considered two settings for the baseline hazard: a low event rate scenario calibrated to yield 100 cases per year on placebo, or a high event rate scenario calibrated to yield 600 cases per year on placebo. Participant frailties were drawn from a gamma distribution with mean one and a variance of either one (low frailty variance scenario) or four (high frailty variance scenario). Tables 8, 9, and 10 present the results.

Design | Model | Time | log(VE(s)): Bias | Emp. Var. | Covg. | log(VE(s)) − log(VE(0)): Bias | Emp. Var. | Covg.
True vaccine efficacy constant at 75%
Cross at 150 cases | log-linear | 0.5 | -0.012 | 0.046 | 0.950 | 0.001 | 0.028 | 0.952
Cross at 150 cases | log-linear | 1.0 | -0.011 | 0.102 | 0.951 | 0.002 | 0.111 | 0.952
Cross at 150 cases | log-linear | 1.5 | -0.010 | 0.213 | 0.950 | 0.003 | 0.249 | 0.952
Cross at 150 cases | log-linear | 2.0 | -0.010 | 0.379 | 0.951 | 0.003 | 0.443 | 0.952
Cross at 150 cases | P-spline | 0.5 | -0.015 | 0.064 | 0.962 | 0.003 | 0.118 | 0.975
Cross at 150 cases | P-spline | 1.0 | -0.014 | 0.148 | 0.960 | 0.004 | 0.229 | 0.960
Cross at 150 cases | P-spline | 1.5 | -0.013 | 0.226 | 0.957 | 0.005 | 0.303 | 0.957
Cross at 150 cases | P-spline | 2.0 | -0.019 | 0.461 | 0.966 | -0.001 | 0.536 | 0.963
Cross at 1 year | log-linear | 0.5 | -0.008 | 0.027 | 0.950 | -0.001 | 0.011 | 0.953
Cross at 1 year | log-linear | 1.0 | -0.009 | 0.043 | 0.951 | -0.002 | 0.042 | 0.953
Cross at 1 year | log-linear | 1.5 | -0.010 | 0.080 | 0.953 | -0.002 | 0.095 | 0.953
Cross at 1 year | log-linear | 2.0 | -0.011 | 0.138 | 0.953 | -0.003 | 0.170 | 0.953
Cross at 1 year | P-spline | 0.5 | -0.010 | 0.035 | 0.976 | 0.005 | 0.101 | 0.981
Cross at 1 year | P-spline | 1.0 | -0.010 | 0.056 | 0.971 | 0.005 | 0.117 | 0.965
Cross at 1 year | P-spline | 1.5 | -0.013 | 0.087 | 0.969 | 0.002 | 0.132 | 0.977
Cross at 1 year | P-spline | 2.0 | -0.021 | 0.271 | 0.977 | -0.005 | 0.336 | 0.977
Parallel trial | log-linear | 0.5 | -0.008 | 0.020 | 0.948 | 0.000 | 0.010 | 0.950
Parallel trial | log-linear | 1.0 | -0.008 | 0.013 | 0.950 | 0.000 | 0.040 | 0.950
Parallel trial | log-linear | 1.5 | -0.008 | 0.027 | 0.949 | 0.000 | 0.089 | 0.950
Parallel trial | log-linear | 2.0 | -0.008 | 0.060 | 0.949 | 0.000 | 0.158 | 0.950
Parallel trial | P-spline | 0.5 | -0.010 | 0.033 | 0.981 | 0.012 | 0.134 | 0.983
Parallel trial | P-spline | 1.0 | -0.007 | 0.033 | 0.983 | 0.016 | 0.164 | 0.968
Parallel trial | P-spline | 1.5 | -0.012 | 0.035 | 0.981 | 0.011 | 0.142 | 0.975
Parallel trial | P-spline | 2.0 | -0.039 | 0.171 | 0.977 | -0.016 | 0.274 | 0.976
Vaccine efficacy wanes from 85% to 35% over 1.5 years
Cross at 150 cases | log-linear | 0.5 | -0.015 | 0.048 | 0.952 | 0.001 | 0.016 | 0.952
Cross at 150 cases | log-linear | 1.0 | -0.014 | 0.078 | 0.949 | 0.002 | 0.064 | 0.952
Cross at 150 cases | log-linear | 1.5 | -0.013 | 0.140 | 0.950 | 0.004 | 0.144 | 0.952
Cross at 150 cases | log-linear | 2.0 | -0.012 | 0.234 | 0.949 | 0.005 | 0.255 | 0.952
Cross at 150 cases | P-spline | 0.5 | -0.015 | 0.063 | 0.963 | 0.009 | 0.108 | 0.981
Cross at 150 cases | P-spline | 1.0 | -0.010 | 0.127 | 0.962 | 0.015 | 0.208 | 0.968
Cross at 150 cases | P-spline | 1.5 | -0.008 | 0.167 | 0.953 | 0.017 | 0.243 | 0.965
Cross at 150 cases | P-spline | 2.0 | -0.007 | 0.254 | 0.959 | 0.018 | 0.329 | 0.962
Cross at 1 year | log-linear | 0.5 | -0.009 | 0.030 | 0.948 | 0.001 | 0.008 | 0.953
Cross at 1 year | log-linear | 1.0 | -0.008 | 0.039 | 0.952 | 0.003 | 0.031 | 0.953
Cross at 1 year | log-linear | 1.5 | -0.006 | 0.065 | 0.952 | 0.004 | 0.071 | 0.953
Cross at 1 year | log-linear | 2.0 | -0.005 | 0.106 | 0.954 | 0.006 | 0.126 | 0.953
Cross at 1 year | P-spline | 0.5 | -0.010 | 0.034 | 0.972 | 0.012 | 0.089 | 0.987
Cross at 1 year | P-spline | 1.0 | -0.006 | 0.051 | 0.968 | 0.016 | 0.116 | 0.969
Cross at 1 year | P-spline | 1.5 | -0.002 | 0.070 | 0.966 | 0.020 | 0.118 | 0.980
Cross at 1 year | P-spline | 2.0 | 0.001 | 0.154 | 0.973 | 0.023 | 0.218 | 0.974
Parallel trial | log-linear | 0.5 | -0.008 | 0.022 | 0.952 | 0.004 | 0.008 | 0.948
Parallel trial | log-linear | 1.0 | -0.004 | 0.010 | 0.947 | 0.007 | 0.031 | 0.948
Parallel trial | log-linear | 1.5 | -0.001 | 0.014 | 0.947 | 0.011 | 0.071 | 0.948
Parallel trial | log-linear | 2.0 | 0.003 | 0.034 | 0.949 | 0.014 | 0.125 | 0.948
Parallel trial | P-spline | 0.5 | -0.011 | 0.030 | 0.980 | 0.019 | 0.119 | 0.990
Parallel trial | P-spline | 1.0 | -0.004 | 0.023 | 0.983 | 0.026 | 0.166 | 0.973
Parallel trial | P-spline | 1.5 | -0.001 | 0.020 | 0.978 | 0.029 | 0.144 | 0.976
Parallel trial | P-spline | 2.0 | -0.003 | 0.076 | 0.975 | 0.027 | 0.197 | 0.977

Table 4: Summary statistics for estimates of vaccine efficacy (VE) and change in VE for simulated trials where the baseline hazard in year two was the same as the baseline hazard in year one. The log-linear and P-spline models correspond to (2) and (5), respectively. For the cross at 150 cases design, the average time of crossover (in years) was $\tau_x$ = 0.… ± 0.05; for the cross at 1 year design, the average number of events at crossover was $N_x$ = 216 ± 13 under constant VE and 211 ± 13 under waning VE (means ± standard deviations). Time is given in years since study initiation.

Design | Model | Intercept: Bias | Emp. Var. | Covg. | Linear trend: Bias | Emp. Var. | Covg.
Vaccine efficacy constant at 75%
Cross at 150 cases | Constant VE | -0.012 | 0.039 | 0.951 | n/a | n/a | n/a
Cross at 150 cases | log-linear | -0.013 | 0.046 | 0.953 | 0.002 | 0.111 | 0.952
Cross at 150 cases | P-spline | -0.018 | 0.076 | 0.970 | 0.002 | 0.109 | 0.954
Cross at 1 year | Constant VE | -0.007 | 0.027 | 0.949 | n/a | n/a | n/a
Cross at 1 year | log-linear | -0.008 | 0.033 | 0.952 | -0.002 | 0.042 | 0.953
Cross at 1 year | P-spline | -0.015 | 0.079 | 0.977 | -0.002 | 0.042 | 0.956
Parallel trial | Constant VE | -0.004 | 0.013 | 0.951 | n/a | n/a | n/a
Parallel trial | log-linear | -0.007 | 0.046 | 0.950 | 0.000 | 0.040 | 0.950
Parallel trial | P-spline | -0.023 | 0.110 | 0.977 | 0.002 | 0.039 | 0.952
Vaccine efficacy wanes from 85% to 35% over 1.5 years
Cross at 150 cases | log-linear | -0.016 | 0.050 | 0.951 | 0.002 | 0.064 | 0.952
Cross at 150 cases | P-spline | -0.025 | 0.086 | 0.975 | -0.002 | 0.063 | 0.953
Cross at 1 year | log-linear | -0.010 | 0.036 | 0.952 | 0.003 | 0.031 | 0.953
Cross at 1 year | P-spline | -0.022 | 0.085 | 0.979 | -0.004 | 0.031 | 0.955
Parallel trial | log-linear | -0.011 | 0.049 | 0.950 | 0.007 | 0.031 | 0.948
Parallel trial | P-spline | -0.030 | 0.124 | 0.980 | 0.003 | 0.031 | 0.949

Table 5: Bias, empirical variance, and coverage for estimates of the intercept and linear trend in vaccine efficacy under the log-linear model, (2), and semi-parametric model, (5). Here, the time-varying baseline hazard in year two was the same as the baseline hazard in year one.

Model | Time | VE(s): Bias | Empir. Var. | Coverage | ΔVE(s): Bias | Empir. Var. | Coverage
Continuous uniform crossover

Crossover at one year

Parallel trial

Table 6: Summary statistics for estimates of vaccine efficacy (VE) and change in VE for simulated trials in an idealized scenario with constant baseline hazards and either continuous crossover, instantaneous crossover at one year, or a standard trial. The log-linear and P-spline models correspond to (2) and (5), respectively.

Design | Model | Intercept: Bias | Emp. Var. | Covg. | Linear trend: Bias | Emp. Var. | Covg.
Continuous uniform crossover | log-linear | -0.008 | 0.031 | 0.950 | 0.007 | 0.034 | 0.950
Continuous uniform crossover | P-spline | -0.016 | 0.071 | 0.973 | 0.003 | 0.034 | 0.952
Crossover at one year | log-linear | -0.008 | 0.035 | 0.953 | 0.000 | 0.039 | 0.947
Crossover at one year | P-spline | -0.018 | 0.099 | 0.978 | -0.002 | 0.038 | 0.952
Parallel trial | log-linear | -0.006 | 0.051 | 0.950 | -0.001 | 0.039 | 0.951
Parallel trial | P-spline | -0.021 | 0.130 | 0.977 | -0.001 | 0.039 | 0.952

Table 7: Bias, empirical variance, and coverage for estimates of the intercept and linear trend in vaccine efficacy under the log-linear model, (2), and semi-parametric model, (5), in an idealized scenario with constant baseline hazards and continuous crossover, instantaneous crossover at one year, or a standard trial.

Frailty distribution summary statistics
Baseline hazard | Frailty variance | Design | Original arm | Mean | SD | 25%ile | 50%ile | 75%ile
VE constant at 75%

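Low | Low | Cross at 1 year | Placebo | 0.992 | 0.992 | 0.285 | 0.688 | 1.375
Low | Low | Cross at 1 year | Vaccine | 0.997 | 0.996 | 0.287 | 0.691 | 1.382
Low | Low | Parallel trial | Placebo | 0.987 | 0.987 | 0.284 | 0.684 | 1.368
Low | Low | Parallel trial | Vaccine | 0.997 | 0.996 | 0.287 | 0.691 | 1.382
Low | High | Cross at 1 year | Placebo | 0.969 | 1.937 | 0.010 | 0.169 | 1.010
Low | High | Cross at 1 year | Vaccine | 0.987 | 1.973 | 0.010 | 0.172 | 1.028
Low | High | Parallel trial | Placebo | 0.949 | 1.897 | 0.010 | 0.166 | 0.989
Low | High | Parallel trial | Vaccine | 0.987 | 1.973 | 0.010 | 0.172 | 1.028
High | Low | Cross at 1 year | Placebo | 0.955 | 0.955 | 0.275 | 0.662 | 1.323
High | Low | Cross at 1 year | Vaccine | 0.980 | 0.980 | 0.282 | 0.680 | 1.359
High | Low | Parallel trial | Placebo | 0.926 | 0.926 | 0.266 | 0.642 | 1.283
High | Low | Parallel trial | Vaccine | 0.980 | 0.980 | 0.282 | 0.680 | 1.359
High | High | Cross at 1 year | Placebo | 0.841 | 1.681 | 0.009 | 0.147 | 0.876
High | High | Cross at 1 year | Vaccine | 0.926 | 1.851 | 0.010 | 0.162 | 0.965
High | High | Parallel trial | Placebo | 0.757 | 1.514 | 0.008 | 0.132 | 0.790
High | High | Parallel trial | Vaccine | 0.926 | 1.851 | 0.010 | 0.162 | 0.965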

VE wanes from 85% to 35% over 1.5 years

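Low | Low | Cross at 1 year | Placebo | 0.992 | 0.992 | 0.285 | 0.688 | 1.375
Low | Low | Cross at 1 year | Vaccine | 0.994 | 0.994 | 0.286 | 0.689 | 1.378
Low | Low | Parallel trial | Placebo | 0.987 | 0.987 | 0.284 | 0.684 | 1.368
Low | Low | Parallel trial | Vaccine | 0.994 | 0.994 | 0.286 | 0.689 | 1.378
Low | High | Cross at 1 year | Placebo | 0.968 | 1.936 | 0.010 | 0.169 | 1.009
Low | High | Cross at 1 year | Vaccine | 0.975 | 1.950 | 0.010 | 0.170 | 1.017
Low | High | Parallel trial | Placebo | 0.949 | 1.897 | 0.010 | 0.166 | 0.989
Low | High | Parallel trial | Vaccine | 0.975 | 1.950 | 0.010 | 0.170 | 1.017
High | Low | Cross at 1 year | Placebo | 0.954 | 0.954 | 0.274 | 0.661 | 1.322
High | Low | Cross at 1 year | Vaccine | 0.964 | 0.964 | 0.277 | 0.668 | 1.337
High | Low | Parallel trial | Placebo | 0.926 | 0.926 | 0.266 | 0.642 | 1.283
High | Low | Parallel trial | Vaccine | 0.964 | 0.964 | 0.277 | 0.668 | 1.337
High | High | Cross at 1 year | Placebo | 0.838 | 1.676 | 0.009 | 0.146 | 0.874
High | High | Cross at 1 year | Vaccine | 0.870 | 1.740 | 0.009 | 0.152 | 0.907
High | High | Parallel trial | Placebo | 0.757 | 1.514 | 0.008 | 0.132 | 0.790
High | High | Parallel trial | Vaccine | 0.870 | 1.740 | 0.009 | 0.152 | 0.907

Table 8: Summary statistics of the frailty distribution of participants still in the risk set at the end of two years of follow-up. We report geometric means of summary statistics of each frailty distribution from 10,000 simulated trials.

Frailty variance | Model | Time | Placebo crossover: VE(s) | Placebo crossover: ΔVE(s) | Parallel trial: VE(s) | Parallel trial: ΔVE(s)
Low baseline hazard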

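Low | log-linear | 0.5 | -0.014 | 0.000 | -0.013 | 0.001
Low | log-linear | 1.0 | -0.015 | 0.000 | -0.012 | 0.002
Low | log-linear | 1.5 | -0.015 | -0.001 | -0.011 | 0.003
Low | log-linear | 2.0 | -0.015 | -0.001 | -0.010 | 0.005
Low | P-spline | 0.5 | -0.017 | 0.022 | -0.017 | 0.043
Low | P-spline | 1.0 | -0.022 | 0.016 | -0.017 | 0.043
Low | P-spline | 1.5 | -0.020 | 0.018 | -0.019 | 0.040
Low | P-spline | 2.0 | -0.032 | 0.006 | -0.046 | 0.014
High | log-linear | 0.5 | -0.009 | 0.009 | -0.007 | 0.011
High | log-linear | 1.0 | 0.000 | 0.019 | 0.004 | 0.022
High | log-linear | 1.5 | 0.010 | 0.028 | 0.015 | 0.032
High | log-linear | 2.0 | 0.019 | 0.038 | 0.026 | 0.043
High | P-spline | 0.5 | -0.014 | 0.021 | -0.014 | 0.041
High | P-spline | 1.0 | -0.007 | 0.028 | 0.001 | 0.057
High | P-spline | 1.5 | 0.003 | 0.038 | 0.008 | 0.063
High | P-spline | 2.0 | 0.005 | 0.041 | -0.017 | 0.039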

High baseline hazard

Low | log-linear | 0.5 | 0.011 | 0.016 | 0.012 | 0.015
Low | log-linear | 1.0 | 0.026 | 0.031 | 0.027 | 0.029
Low | log-linear | 1.5 | 0.042 | 0.047 | 0.041 | 0.044
Low | log-linear | 2.0 | 0.058 | 0.063 | 0.056 | 0.059
Low | P-spline | 0.5 | 0.010 | 0.017 | 0.010 | 0.018
Low | P-spline | 1.0 | 0.026 | 0.033 | 0.027 | 0.036
Low | P-spline | 1.5 | 0.042 | 0.050 | 0.042 | 0.050
Low | P-spline | 2.0 | 0.052 | 0.059 | 0.046 | 0.054
High | log-linear | 0.5 | 0.054 | 0.054 | 0.055 | 0.050
High | log-linear | 1.0 | 0.108 | 0.108 | 0.105 | 0.100
High | log-linear | 1.5 | 0.162 | 0.162 | 0.154 | 0.149
High | log-linear | 2.0 | 0.216 | 0.216 | 0.204 | 0.199
High | P-spline | 0.5 | 0.052 | 0.054 | 0.053 | 0.057
High | P-spline | 1.0 | 0.108 | 0.109 | 0.108 | 0.112
High | P-spline | 1.5 | 0.162 | 0.164 | 0.154 | 0.159
High | P-spline | 2.0 | 0.210 | 0.211 | 0.190 | 0.194

Table 9: Bias of estimates of VE and the decay in VE for trials simulated with constant VE at 75% and gamma distributed frailties. The low baseline hazard scenario was calibrated to yield an average of 50 cases per six month period on the placebo arm, while the high baseline hazard scenario was calibrated to yield 300 cases per six month period. The frailty distribution had mean one and a variance of either one (low variance) or four (high variance).

Frailty variance | Model | Time | Placebo crossover: VE(s) | Placebo crossover: ΔVE(s) | Parallel trial: VE(s) | Parallel trial: ΔVE(s)
Low baseline hazard

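Low | log-linear | 0.5 | -0.016 | 0.005 | -0.013 | 0.008
Low | log-linear | 1.0 | -0.010 | 0.011 | -0.005 | 0.015
Low | log-linear | 1.5 | -0.005 | 0.016 | 0.002 | 0.023
Low | log-linear | 2.0 | 0.001 | 0.022 | 0.010 | 0.030
Low | P-spline | 0.5 | -0.019 | 0.036 | -0.020 | 0.058
Low | P-spline | 1.0 | -0.010 | 0.045 | -0.006 | 0.071
Low | P-spline | 1.5 | -0.001 | 0.054 | 0.000 | 0.077
Low | P-spline | 2.0 | 0.009 | 0.064 | 0.005 | 0.083
High | log-linear | 0.5 | -0.008 | 0.011 | -0.004 | 0.013
High | log-linear | 1.0 | 0.003 | 0.022 | 0.009 | 0.025
High | log-linear | 1.5 | 0.014 | 0.033 | 0.022 | 0.038
High | log-linear | 2.0 | 0.026 | 0.044 | 0.034 | 0.051
High | P-spline | 0.5 | -0.012 | 0.037 | -0.014 | 0.059
High | P-spline | 1.0 | 0.005 | 0.054 | 0.011 | 0.083
High | P-spline | 1.5 | 0.020 | 0.069 | 0.022 | 0.095
High | P-spline | 2.0 | 0.030 | 0.079 | 0.022 | 0.095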

High baseline hazard