Survival analysis for AdVerse events with VarYing follow-up times (SAVVY) -- estimation of adverse event risks
Regina Stegherr, Claudia Schmoor, Jan Beyersmann, Kaspar Rufibach, Valentine Jehl, Andreas Brückner, Lewin Eisele, Thomas Künzel, Katrin Kupas, Frank Langer, Friedhelm Leverkus, Anja Loos, Christiane Norenberg, Florian Voss, Tim Friede
August 13, 2020

Institute of Statistics, Ulm University, Ulm, Germany
Clinical Trials Unit, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
F. Hoffmann-La Roche, Basel, Switzerland
Novartis Pharma AG, Basel, Switzerland
Janssen-Cilag GmbH, Neuss, Germany
Bristol-Myers-Squibb GmbH & Co. KGaA, München, Germany
Lilly Deutschland GmbH, Bad Homburg, Germany
Pfizer, Berlin, Germany
Merck KGaA, Darmstadt, Germany
Bayer AG, Wuppertal, Germany
Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim, Germany
Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany

Corresponding author: Jan Beyersmann
Abstract
Background:
The SAVVY project aims to improve the analyses of adverse event (AE) data in clinical trials through the use of survival techniques that appropriately deal with varying follow-up times and competing events. Although statistical methodologies have advanced, AE analyses often use the incidence proportion, the incidence density, or a non-parametric Kaplan-Meier estimator, which ignore either censoring or competing events. In an empirical study including randomized clinical trials from several sponsor organisations, these potential sources of bias are investigated. The main purpose of the empirical study is to compare the estimators that are typically used in AE analysis to the non-parametric Aalen-Johansen estimator as the gold standard. The present paper reports on one-sample findings, while a companion paper considers consequences when comparing safety between treatment groups.
Methods:
Estimators are compared with descriptive statistics, graphical displays and a more formal assessment using a random effects meta-analysis. The influence of different factors on the size of the bias is investigated in a meta-regression. Comparisons are conducted at the maximum follow-up time and at earlier evaluation time points. The definition of competing events includes not only death before AE but also end of follow-up for AEs due to events possibly related to the disease course or the safety of the treatment.
Results:
Ten sponsor organisations provided 17 clinical trials including 186 types of investigated AEs. The one minus Kaplan-Meier estimator was on average about 1.2-fold larger than the Aalen-Johansen estimator, and the probability transform of the incidence density ignoring competing events overestimated the AE probability even more. The leading forces influencing bias were the amount of censoring and of competing events. The presence of many competing events in our study decreased the amount of censoring. As a consequence, the average bias using the incidence proportion was less than 5%. Assuming constant hazards using incidence densities was hardly an issue provided that competing events were accounted for.
Conclusions:
Both the choice of the estimator of the cumulative AE probability and a careful definition of competing events are crucial. There is an urgent need to improve the guidelines for reporting risks of AEs so that the Kaplan-Meier estimator and the incidence proportion are finally replaced by the Aalen-Johansen estimator with an appropriate definition of competing events.
Keywords:
Aalen-Johansen estimator, adverse events, competing events, drug safety, incidence proportion, incidence density, Kaplan-Meier estimator
Time-to-event or survival endpoints are common in clinical research [1, 2]. The observation of the event times is typically incomplete as a consequence of censoring, and the statistical analysis therefore requires specialized techniques. This requirement holds for the evaluation of both efficacy and safety. An important aim of the latter is the estimation of the probability of an adverse event (AE) of a specific type within a specific time interval. In a time-to-first-event analysis, this is often done with the incidence proportion, i.e., the number of patients with an observed AE (of a certain type) in a specific time period divided by group size, or with the (exposure-adjusted) incidence density, which divides by the cumulative patient-time at risk instead. The worry is that the incidence proportion underestimates the cumulative AE probability because it does not account for censoring [3, 4, 5, 6]. One minus a Kaplan-Meier estimator counting AEs as the event would account for censoring, but not for competing events (CEs) such as death without prior AE. When CEs are present, Kaplan-Meier is commonly used [7, 8] but is bound to overestimate the cumulative AE probability, as the methodology implicitly assumes that every patient experiences the AE under consideration, possibly after a CE such as death. The incidence density also accounts for censoring but does not estimate a probability. Rather, it estimates the AE hazard, assuming it to be time-constant. However, this assumption is not realistic for many drug-related adverse events [9, 10]. The interpretation of the incidence density as an estimator of a hazard is challenging, but the incidence density may be transformed onto the probability scale; typically, such transformations do not consider CEs [11], although extensions are available [12].

The concerns above are qualitative.
However, the amount of bias when comparing, e.g., the incidence proportion or one minus Kaplan-Meier with the non-parametric gold standard, the Aalen-Johansen estimator [4] accounting for both CEs and censoring, will depend on the specific trial setting. In particular, the relative frequencies of observed AEs, observed CEs and observed censorings add up to 100% at any point in time. The latter two are leading forces influencing bias, and, e.g., the presence of many CEs in a time-to-first-event analysis will impact the amount of censoring.

The SAVVY project group (Survival analysis for AdVerse events with VarYing follow-up times) is a collaborative effort from academia and the pharmaceutical industry with the aim to improve the analyses of AE data in clinical trials through the use of survival techniques that account for varying follow-up times, censoring and CEs. Here, we report one-sample results from an empirical study of an opportunistic sample of randomized clinical trials from several sponsor companies. The aim is to illustrate possible biases when quantifying absolute AE risk in single samples, including the categorization into AE frequency categories. Results when comparing safety between treatment groups in the two-sample case are reported in a companion paper [13]. Individual trial data analyses were run within the sponsor organisations using SAS and R software provided by the academic project group members. Only aggregated data necessary for meta-analyses were shared, and the meta-analyses were run centrally at the academic institutions.
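The remark that the relative frequencies of observed AEs, observed CEs and observed censorings add up to 100% can be made concrete with a small Python sketch. It assumes a hypothetical per-patient record of the time and type of the first observed outcome (the labels "AE", "death", "other CE" and "censored" are illustrative, not the SAVVY data format) and restricts follow-up to an evaluation time τ, as is done for the estimators below; an outcome observed after τ counts as censoring at τ.

```python
from collections import Counter

# Hypothetical labels for each patient's first observed outcome.
OUTCOMES = ("AE", "death", "other CE", "censored")


def relative_frequencies(times, outcomes, tau=float("inf")):
    """Relative frequency of each observed outcome type on [0, tau].

    A patient whose first outcome occurs after tau is still under observation
    at tau and therefore counts as censored at tau.
    """
    restricted = [o if t <= tau else "censored" for t, o in zip(times, outcomes)]
    counts = Counter(restricted)  # missing labels count as 0
    n = len(restricted)
    return {o: counts[o] / n for o in OUTCOMES}


# Toy data (invented for illustration), times in days.
times = [30, 45, 60, 90, 120, 200, 250, 300]
outcomes = ["AE", "death", "other CE", "AE", "censored", "other CE", "censored", "death"]

for tau in (100, float("inf")):
    freqs = relative_frequencies(times, outcomes, tau)
    print(tau, freqs, "sum =", sum(freqs.values()))
```

Since every patient contributes exactly one outcome, the four shares always sum to one; a trial in which many patients stop AE recording because of a CE therefore necessarily shows fewer censorings.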
A detailed Statistical Analysis Plan is available elsewhere [14]. Here, we briefly summarize the one-sample estimators and the methods of meta-analysis. Properties and estimands of the estimators are discussed elsewhere [14, 6]. We describe in more detail the definition of CEs, which has immediate consequences for the estimation procedures.
We will consider the following estimators of the cumulative AE probability or ‘AE risk’ in a time-to-first-event analysis. Since both probabilities and the amount of censoring [15] are time-dependent, we will allow for different evaluation times called τ. These evaluation times either imposed no restriction, i.e., evaluated the estimators until the maximum follow-up time, or considered the minimum of quantiles of observed times in the two treatment groups; the quantiles were 100%, 90%, 60% and 30%. We will report results from ‘Arm E’, denoting the experimental treatment groups. The incidence proportion is

$$\mathrm{IP}_E(\tau) = \frac{\text{no. of patients w. observed AE on } [0,\tau] \text{ in } E}{n_E}, \tag{1}$$

where $n_E$ denotes the sample size in group E. This estimator will be called incidence proportion in the following. The AE incidence density is

$$\mathrm{ID}_E(\tau) = \frac{\text{no. of patients w. observed AE on } [0,\tau] \text{ in } E}{\text{patient-time at risk in } E \text{ restricted by } \tau}, \tag{2}$$

which we transform onto the probability scale using

$$1 - \exp\left(-\mathrm{ID}_E(\tau) \cdot \tau\right), \tag{3}$$

called probability transform incidence density ignoring CE in the following. The one minus Kaplan-Meier estimator only codes observed AEs as an event and censors anything else on [0, τ].

An incidence density analysis accounting for CEs uses the competing incidence density

$$\overline{\mathrm{ID}}_E(\tau) = \frac{\text{no. of patients w. observed CE on } [0,\tau] \text{ in } E}{\text{patient-time at risk in } E \text{ restricted by } \tau}, \tag{4}$$

such that we get the following AE probability estimator

$$\frac{\mathrm{ID}_E(\tau)}{\mathrm{ID}_E(\tau) + \overline{\mathrm{ID}}_E(\tau)} \left(1 - \exp\left(-\tau \cdot \left[\mathrm{ID}_E(\tau) + \overline{\mathrm{ID}}_E(\tau)\right]\right)\right), \tag{5}$$

called probability transform incidence density accounting for CE in the following. Finally, the Aalen-Johansen estimator generalizes (5) to a fully non-parametric procedure and decomposes the usual one minus Kaplan-Meier estimator of the time to any first event (AE or competing) into estimators of the cumulative AE probability plus the cumulative CE probability [4].
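As an illustration of equations (1) to (5), the following Python sketch computes the incidence proportion, both incidence densities and the two probability transforms from per-patient data. The status coding (0 = censored, 1 = AE, 2 = CE), the function name and the toy data are hypothetical; this is a sketch of the formulas, not the project's SAS or R implementation.

```python
import math


def one_sample_estimators(time, status, tau):
    """Incidence-proportion and incidence-density based AE-risk estimators at tau.

    time   : observed time to first event (AE or CE) or to censoring
    status : 0 = censored, 1 = AE, 2 = competing event (hypothetical coding)
    tau    : evaluation time
    """
    n = len(time)
    n_ae = sum(1 for t, s in zip(time, status) if s == 1 and t <= tau)
    n_ce = sum(1 for t, s in zip(time, status) if s == 2 and t <= tau)
    # Patient-time at risk restricted by tau: each patient contributes follow-up
    # until the first event or censoring, but at most tau.
    person_time = sum(min(t, tau) for t in time)

    id_ae = n_ae / person_time          # AE incidence density, eq. (2)
    id_ce = n_ce / person_time          # competing incidence density, eq. (4)
    id_all = id_ae + id_ce
    return {
        "IP": n_ae / n,                                          # eq. (1)
        "ID_AE": id_ae,
        "ID_CE": id_ce,
        "PT-ID ignoring CE": 1.0 - math.exp(-id_ae * tau),       # eq. (3)
        "PT-ID accounting for CE": (                             # eq. (5)
            id_ae / id_all * (1.0 - math.exp(-tau * id_all)) if id_all > 0 else 0.0
        ),
    }


# Toy data (invented): ten patients, maximum follow-up 25 time units.
time = [2, 3, 5, 7, 8, 10, 12, 15, 20, 25]
status = [1, 2, 1, 0, 2, 1, 0, 2, 1, 0]
for tau in (10, 25):
    print(tau, one_sample_estimators(time, status, tau))
```

At τ = 25 the toy data give an incidence proportion of 0.40, while the transform ignoring CEs gives about 0.61 and the transform accounting for CEs about 0.46.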
The definition of events as ‘competing’ is essential to both the Aalen-Johansen estimator and the competing incidence density. CEs (or ‘competing risks’) are events that preclude the occurrence or recording of the AE under consideration in a time-to-first-event analysis. One important competing event is death before AE. In addition, any event that would both be viewed from a patient perspective as an event of his/her course of disease or treatment and would stop the recording of the AE of interest will be viewed as a CE. To illustrate, premature discontinuation of study treatment which leads to the end of AE recording will be handled as a CE [16]. Consequently, possibly disease- or safety-related loss to follow-up, withdrawal of consent and discontinuation are handled as competing events, as these are typically related to an event associated with the disease course or therapy.

In order to investigate the impact of the definition of CEs, we also investigated a ‘death only’ scenario, which only treated death before AE as competing, but not the other CEs. This estimator will be called Aalen-Johansen (death only) in the following.

The data generation mechanism underlying the clinical trials is based on the hazard of the AE, the hazard of the CE, and the distribution of the censoring times, where the hazards are not restricted to be constant [14]. But not all estimators suggested for analysing AEs can adequately deal with all three processes. Table 1 gives an overview of whether the estimators account for the three sources of bias, i.e., censoring, non-constant hazards, and CEs. The incidence proportion treats CEs and censoring in the same way: the respective patients are counted in the denominator as if they had been followed for the entire study period. This is a proper handling of the CEs, as it correctly takes into account that an AE cannot occur after the patient has experienced a CE. It is an improper handling of censoring, as it incorrectly implies that an AE could have been observed over the entire follow-up period, which is not true due to censoring.

Table 1: Overview of whether the estimators deal with the possible sources of bias.
Estimator | Accounts for censoring | Makes no constant hazard assumption | Accounts for CEs
Incidence proportion | No | Yes | Yes
Probability transform incidence density ignoring CEs | Yes | No (AE hazard) | No
1-Kaplan-Meier | Yes | Yes | No
Probability transform incidence density accounting for CEs | Yes | No (AE and CE hazard) | Yes
Death only Aalen-Johansen estimator | Yes | Yes | Yes (death only)
Gold-standard Aalen-Johansen estimator | Yes | Yes | Yes
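The two competing-event definitions can be expressed as a recoding step applied before estimation. In the hypothetical sketch below, raw first-outcome labels are mapped to the status codes 0 (censored), 1 (AE) and 2 (CE); under the ‘death only’ definition, other CEs such as a treatment discontinuation that ends AE recording are recoded as censorings. The labels and the function name are assumptions for illustration.

```python
def recode(outcomes, definition="all"):
    """Map first-outcome labels to status codes 0 = censored, 1 = AE,
    2 = competing event, under the 'all' or the 'death only' CE definition."""
    if definition not in ("all", "death only"):
        raise ValueError(f"unknown definition: {definition!r}")
    codes = []
    for o in outcomes:
        if o == "AE":
            codes.append(1)
        elif o == "death":
            codes.append(2)  # death before AE is competing under both definitions
        elif o == "other CE":
            # e.g. discontinuation ending AE recording: competing only under 'all'
            codes.append(2 if definition == "all" else 0)
        elif o == "censored":
            codes.append(0)
        else:
            raise ValueError(f"unknown outcome label: {o!r}")
    return codes


labels = ["AE", "death", "other CE", "censored"]
print(recode(labels), recode(labels, "death only"))
```

Feeding the ‘death only’ coding into an Aalen-Johansen estimator treats patients with another CE as if they could still experience the AE later, although AE recording has stopped, which is one way to see why Aalen-Johansen (death only) overestimates the AE risk relative to the gold standard.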
The Aalen-Johansen estimator is the only estimator that is able to deal with all three potential sources of bias; it is therefore considered the gold standard estimator and will serve as a benchmark for the comparison of results. In the following, we will use the term bias for deviations of the estimators from this benchmark estimator and not for the difference to the true value. This is considered appropriate as the differences of the estimators to the Aalen-Johansen estimator converge in probability to the asymptotic bias.
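To make the benchmark concrete, here is a minimal pure-Python sketch of the Aalen-Johansen estimator of the cumulative AE probability next to one minus Kaplan-Meier, with a hypothetical status coding (0 = censored, 1 = AE, 2 = CE) and invented toy data. The Aalen-Johansen increment at each event time is the probability of being free of any first event just before that time, multiplied by the fraction of patients at risk who experience the AE; one minus Kaplan-Meier instead treats CEs as censorings. Patients censored at an event time are assumed to be still at risk.

```python
def aalen_johansen_and_one_minus_km(time, status, tau):
    """Cumulative AE probability at tau: (Aalen-Johansen, 1 - Kaplan-Meier).

    status: 0 = censored, 1 = AE, 2 = competing event (hypothetical coding).
    Ties: patients censored at an event time are counted as still at risk.
    """
    event_times = sorted({t for t, s in zip(time, status) if s != 0 and t <= tau})
    surv_any = 1.0  # Kaplan-Meier for the time to any first event (AE or CE)
    km_ae = 1.0     # Kaplan-Meier that treats CEs as censorings
    aj_ae = 0.0
    for u in event_times:
        at_risk = sum(1 for t in time if t >= u)
        d_ae = sum(1 for t, s in zip(time, status) if t == u and s == 1)
        d_ce = sum(1 for t, s in zip(time, status) if t == u and s == 2)
        # Aalen-Johansen increment: event-free just before u, then AE at u.
        aj_ae += surv_any * d_ae / at_risk
        surv_any *= 1.0 - (d_ae + d_ce) / at_risk
        km_ae *= 1.0 - d_ae / at_risk
    return aj_ae, 1.0 - km_ae


# Toy data (invented): time to first event or censoring, with status codes.
time = [2, 3, 5, 7, 8, 10, 12, 15, 20, 25]
status = [1, 2, 1, 0, 2, 1, 0, 2, 1, 0]
aj, one_minus_km = aalen_johansen_and_one_minus_km(time, status, tau=25)
ip = sum(1 for s in status if s == 1) / len(status)
print(f"AJ = {aj:.3f}, 1-KM = {one_minus_km:.3f}, IP = {ip:.3f}")
print(f"ratios to AJ: 1-KM {one_minus_km / aj:.2f}, IP {ip / aj:.2f}")
```

For the toy data the sketch gives an Aalen-Johansen estimate of 0.472, one minus Kaplan-Meier of 0.685 and an incidence proportion of 0.400, i.e., ratios to the benchmark of about 1.45 and 0.85. Recoding all CEs as censorings makes the Aalen-Johansen estimate coincide with one minus Kaplan-Meier, and removing all censoring makes it coincide with the incidence proportion.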
According to the European Commission’s guideline on the summary of product characteristics (SmPC) [17] and based on the recommendations of the CIOMS Working Groups III and V [18], the frequency categories of AE risk in the most representative exposure period are classified as ‘very rare’, ‘rare’, ‘uncommon’, ‘common’ and ‘very common’ when found to be < 0.01%, < 0.1%, < 1%, < 10% and ≥ 10%, respectively.

In the meta-analysis and meta-regression, the ratios of the AE probability estimates obtained with the different estimators divided by the AE probability obtained with the gold-standard Aalen-Johansen estimator are considered on the log scale. The standard errors of these log-ratios are calculated with a bootstrap to account for within-trial dependencies. Then, a normal-normal hierarchical model is fitted, and the exponential of the resulting estimate can be interpreted as the average ratio of the two estimators.

In a meta-regression, it is further investigated which variables impact this average ratio. To this end, the proportion of censoring, the evaluation time point τ, i.e., the maximal time to event in years (AE, CE or censoring) observed under the given evaluation time, and the size of the AE probability estimated by the gold-standard Aalen-Johansen estimator are included as covariates in univariable and multivariable meta-regressions. The covariates are centered in the meta-regression.

[Figure 1: Relative frequencies (0.00 to 1.00) of observed AE, observed death before AE, observed other competing event and observed censoring.]

Figure 1: Relative frequencies of observed events
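The pooling step can be sketched as follows. The text specifies a normal-normal hierarchical model for the log-ratios with bootstrap standard errors but not how the between-trial variance is estimated; the sketch uses the DerSimonian-Laird moment estimator as a stand-in and treats the log-ratios as independent given their standard errors, which is a simplification. Function and variable names and the input numbers are hypothetical.

```python
import math


def dersimonian_laird(y, se):
    """Random-effects (normal-normal) meta-analysis with the DerSimonian-Laird
    moment estimator of the between-study variance tau^2.

    y  : log-ratios, e.g. log(IP / AJ) per AE type
    se : their standard errors (e.g. from a bootstrap)
    Returns (pooled mean, its standard error, tau^2), all on the log scale.
    """
    k = len(y)
    w = [1.0 / s**2 for s in se]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (s**2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return mu, math.sqrt(1.0 / sum(w_star)), tau2


# Hypothetical ratios IP / AJ for five AE types and bootstrap SEs of the log-ratios.
ratios = [0.95, 1.00, 0.90, 0.98, 0.60]
ses = [0.03, 0.02, 0.05, 0.04, 0.20]
mu, se_mu, tau2 = dersimonian_laird([math.log(r) for r in ratios], ses)
lo, hi = math.exp(mu - 1.96 * se_mu), math.exp(mu + 1.96 * se_mu)
print(f"average ratio {math.exp(mu):.3f} [{lo:.3f}; {hi:.3f}], tau^2 = {tau2:.4f}")
```

Exponentiating the pooled log-ratio gives the average ratio of the two estimators, as described above.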
Ten organisations provided 17 trials including 186 types of AEs (median 8; interquartile range [3, …]; …).

Panel A of Figure 2 shows box plots of the ratio of the one-sample estimators defined earlier divided by the gold-standard Aalen-Johansen estimator for the maximum follow-up time and for one earlier evaluation time, chosen as the 90% quantile. As the incidence proportion implicitly accounts for CEs (but not for censoring) as explained above, the small amount of censoring, which is a consequence of the high amount of other CEs, explains why the incidence proportion and the Aalen-Johansen estimator are of similar size in many situations. But it has to be emphasized that in extreme cases an underestimation of up to seventy percent was present.

Both one minus Kaplan-Meier and the probability transform incidence density ignoring CE overestimate the AE probability, and this is also true for the Aalen-Johansen estimator that only considers death before AE as competing. Interestingly, the probability transform incidence density ignoring CE appears to be worst, while the probability transform incidence density accounting for CE performs much better than the other three procedures, which are clearly biased, resulting in extreme overestimation in many situations, up to a factor of five. These biases become less pronounced at earlier evaluation times, because CEs and censorings occurring after the respective evaluation time do not enter the calculations.
The impact on frequency categories is illustrated in Table 2, where we have chosen, by way of example, the maximum follow-up time as the most representative exposure period. Some switches to neighboring categories are detected. The probability transform of the incidence density ignoring CEs derives a higher AE frequency category for 38 types of AEs, and the one minus Kaplan-Meier estimator for 16 types of AEs. The probability transform of the incidence density accounting for CE obtains a higher category for nine types of AEs but also a lower category for one type of AE. Here, the definition of the competing event is again of importance. The death only Aalen-Johansen estimator categorizes 14 types of AEs to a higher category than the gold-standard Aalen-Johansen estimator. The incidence proportion derives a different AE frequency category than the gold-standard Aalen-Johansen estimator only twice. The good performance of the incidence proportion is closely connected to the CE definition, i.e., the maturity of data at the time of the analysis. If the Aalen-Johansen (death only) estimator is used instead of the gold standard in the comparison with the incidence proportion, the incidence proportion yields the category common instead of very common for 15 types of AEs, and one type of AE is categorized as uncommon using the incidence proportion but as common using the Aalen-Johansen estimator that only considers death as a competing event (see last five rows of Table 2).
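The categorization can be sketched as a mapping from a probability to the SmPC category, followed by a cross-tabulation of the categories obtained with two estimators, as in Table 2. The function names are hypothetical; the inputs are the incidence proportion and Aalen-Johansen estimates of the two single AE types discussed later in this section (0.011 vs. 0.037 and 0.059 vs. 0.109).

```python
from collections import Counter

CATEGORIES = ("very rare", "rare", "uncommon", "common", "very common")


def smpc_category(p):
    """SmPC frequency category of an AE probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Exclusive upper bounds of the first four categories: 0.01%, 0.1%, 1%, 10%.
    for bound, label in zip((1e-4, 1e-3, 1e-2, 1e-1), CATEGORIES):
        if p < bound:
            return label
    return "very common"


def category_crosstab(estimates, gold):
    """Count (category by estimator, category by gold standard) pairs;
    off-diagonal pairs are AE types whose frequency category changes."""
    return Counter((smpc_category(e), smpc_category(g)) for e, g in zip(estimates, gold))


# Incidence proportion vs. Aalen-Johansen for the two examples in the text.
print(category_crosstab([0.011, 0.059], [0.037, 0.109]))
```

The first AE type stays ‘common’ under both estimators despite a ratio of about 0.3, whereas the second moves from ‘common’ to ‘very common’: a large relative bias need not change the category, while a moderate one near a category boundary can.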
In a meta-analysis of the log-ratio of the incidence proportion divided by the Aalen-Johansen estimator evaluated at the maximum follow-up time, the average ratio was found to be 0.972 with a 95%-confidence interval of [0.…, …]. … .097 [1.…, …]. … .214 [1.…, …]. … .130 [1.…, …]. … .170 [1.…, …].
[Figure 2: Panel A, box plots of the ratios (log scale) for IP, prob trans incid dens ignoring CE, 1-KM, prob trans incid dens acc for CE and AJE death only, at the 90% quantile and at the maximum follow-up time; Panel B, kernel density plots of the same ratios at both evaluation times.]

Figure 2: Panel A: Accepting the all-events definition of competing events as gold standard, the ratios of one-sample estimator divided by gold-standard Aalen-Johansen estimator are displayed. Two different evaluation times are displayed. The left box plots are the results for the estimators being evaluated at the 90% quantile, and the right box plots are the results of the evaluation time with no restriction, i.e., at the end of follow-up. The following abbreviations are used for the estimators: incidence proportion (IP), probability transform of the incidence density ignoring CE (prob trans incid dens ignoring CE), one minus Kaplan-Meier (1-KM), probability transform of the incidence density accounting for CE (prob trans incid dens acc for CE), death only Aalen-Johansen estimator (AJE death only). Panel B: Plots of the kernel density estimates of the ratios of the AE probability of the estimators divided by the gold-standard Aalen-Johansen estimator.

Table 2: The impact of the choice of one-sample estimator on AE frequency categories for the maximal follow-up time. Deviations from the Aalen-Johansen estimator are the non-diagonal entries. The first rows consider the gold-standard Aalen-Johansen estimator and the last five rows the comparison of the incidence proportion and the Aalen-Johansen (death only) estimator. Diagonal entries are set in bold face. Non-diagonal zeros are omitted from the display.
[Table 2 body: cross-tabulations of the frequency categories (very rare, rare, uncommon, common, very common) obtained with the incidence proportion, the probability transform incidence density ignoring CE, 1-Kaplan-Meier, the probability transform incidence density accounting for CE and the Aalen-Johansen (death only) estimator against those of the gold-standard Aalen-Johansen estimator, and of the incidence proportion against the Aalen-Johansen (death only) estimator.]

The influence of different factors on the size of the bias was investigated in univariable and multivariable meta-regression. The percentage of censoring, the size of the AE probability estimated by the gold-standard Aalen-Johansen estimator, and the evaluation time point were considered and included as covariates in the meta-regression models. Table 3 displays, by way of example, the results when evaluating the estimators using the maximum follow-up time as evaluation time.

Covariates were centered, i.e., the row ‘average risk ratio’ contains the average ratio of the estimator of interest and the Aalen-Johansen estimator if the covariate takes its mean. Those means were 31.5% censoring, 52.6% competing events, 971 days maximum follow-up time, and an AE probability of 0.165 as estimated by the Aalen-Johansen estimator. For example, for the comparison of the incidence proportion and the Aalen-Johansen estimator, the estimated average ratio of the two estimators in a trial with 31.5% censoring is 0.974. Furthermore, in a trial with 10% more censoring, the estimated average ratio is multiplied by the factor 0.999, but the unit value is contained in the corresponding confidence interval.
So, the amount of underestimation by the incidence proportion, which does not account for censoring, slightly increases with an increasing amount of censoring. Considering the estimators that either do not (probability transform incidence density ignoring CE, one minus Kaplan-Meier) or only partially (Aalen-Johansen (death only)) account for CEs, one finds that both a higher amount of censoring and a higher AE probability decrease the amount of overestimation. The explanation goes hand in hand with the increased average ratios for higher amounts of CEs: these estimators do account for censoring, and increased censoring will, in general, lead to a smaller amount of observed competing events. Likewise, a higher AE probability will, in general, lead to a smaller probability of CEs. These results are confirmed by the multivariable meta-regression. The amount of CEs is not included in the multivariable meta-regression, as it depends strongly on the amount of censoring and on the size of the AE probability estimated by the gold-standard Aalen-Johansen estimator.
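A univariable meta-regression with a centered covariate can be sketched with weighted least squares. This is a simplification under stated assumptions: the between-trial variance tau2 is taken as given rather than estimated jointly, and a ‘10% increase’ is read as 10 percentage points. The exponentiated intercept is the average ratio at the mean covariate value, and exp(10 × slope) is the multiplicative change per 10 percentage points of censoring. Names and data are hypothetical.

```python
import math


def centered_meta_regression(log_ratio, se, covariate, tau2=0.0):
    """Univariable weighted least-squares meta-regression with a centered covariate.

    Returns (intercept, slope) on the log scale; the intercept refers to the
    (unweighted) mean covariate value. tau2 is treated as known (assumption).
    """
    x_bar = sum(covariate) / len(covariate)
    xc = [x - x_bar for x in covariate]          # centering
    w = [1.0 / (s**2 + tau2) for s in se]        # inverse-variance weights
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xc)) / sw
    my = sum(wi * yi for wi, yi in zip(w, log_ratio)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xc, log_ratio))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xc))
    slope = sxy / sxx
    intercept = my - slope * mx                  # fitted value at xc = 0
    return intercept, slope


# Synthetic example: log-ratios that depend exactly linearly on % censoring.
censoring = [10.0, 20.0, 30.0, 40.0, 50.0]
mean_cens = sum(censoring) / len(censoring)
log_ratio = [math.log(0.97) - 0.002 * (c - mean_cens) for c in censoring]
se = [0.05, 0.04, 0.06, 0.05, 0.03]
b0, b1 = centered_meta_regression(log_ratio, se, censoring)
print(f"average ratio at mean censoring: {math.exp(b0):.3f}")
print(f"multiplicative change per 10 percentage points: {math.exp(10 * b1):.3f}")
```

Because the covariate is centered, the intercept row has the same interpretation as the ‘average risk ratio’ rows of Table 3.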
Even though on average the incidence proportion does well in this sample of selected AEs, the possible variability must not be neglected. Considering the plots of the kernel density estimates of the ratios of the different estimators of the AE probability in Panel B of Figure 2, the ratio of the incidence proportion and the gold standard is most often close to one. But there are also peaks of the estimated kernel density at smaller ratios, indicating that the estimators are not always comparable. For the ratio of the probability transform of the incidence density accounting for CEs and the gold standard, most values are slightly larger than one at the maximum follow-up time. At the earlier evaluation time according to the 90% quantile, the peak is closer to one with less variability present. The ratios of the one minus Kaplan-Meier and death only Aalen-Johansen estimators to the gold standard have few values close to one. For the majority of AE types, these two estimators largely overestimate the AE probability. Both plots illustrate pronounced variability for the probability transform of the incidence density ignoring CE.

Table 3: Univariable and multivariable meta-regression. Average risk ratio and multiplicative change by 10% increase in censoring, 10% increase in CEs, one additional year of observation or a 0.1 greater AE probability. Thereby, the size of the AE probability is estimated by the gold-standard Aalen-Johansen estimator. Column abbreviations as in Figure 2.

Univariable meta-regression
Covariate | Quantity | IP | prob trans incid dens ignoring CE | 1-KM | prob trans incid dens acc for CE | AJE death only
% censoring | average risk ratio | 0.974 [0.964; 0.983] | 2.308 [2.217; 2.403] | 1.257 [1.226; 1.288] | 1.101 [1.086; 1.116] | 1.201 [1.175; 1.228]
% censoring | 10% increase | 0.999 [0.996; 1.002] | 0.916 [0.903; 0.929] | 0.973 [0.965; 0.980] | 1.026 [1.021; 1.031] | 0.979 [0.972; 0.986]
% CEs | average risk ratio | 0.976 [0.969; 0.984] | 2.191 [2.141; 2.243] | 1.240 [1.214; 1.267] | 1.124 [1.109; 1.140] | 1.190 [1.168; 1.213]
% CEs | 10% increase | 1.003 [1.000; 1.006] | 1.127 [1.117; 1.138] | 1.036 [1.028; 1.045] | 0.977 [0.971; 0.982] | 1.029 [1.021; 1.036]
size of AE probability | average risk ratio | 0.973 [0.966; 0.980] | 2.105 [2.005; 2.210] | 1.215 [1.185; 1.246] | 1.131 [1.112; 1.150] | 1.171 [1.146; 1.197]
size of AE probability | increase of 0.1 | 0.996 [0.992; 1.000] | 0.954 [0.930; 0.980] | 0.995 [0.982; 1.008] | 0.993 [0.984; 1.003] | 0.993 [0.982; 1.004]
evaluation time | average risk ratio | 0.972 [0.964; 0.980] | 2.094 [1.994; 2.199] | 1.214 [1.184; 1.244] | 1.131 [1.112; 1.150] | 1.170 [1.145; 1.195]
evaluation time | one additional year | 0.993 [0.987; 1.000] | 1.054 [1.021; 1.087] | 1.015 [0.999; 1.033] | 0.996 [0.986; 1.007] | 1.013 [0.998; 1.027]

Multivariable meta-regression
Covariate | Quantity | IP | prob trans incid dens ignoring CE | 1-KM | prob trans incid dens acc for CE | AJE death only
 | average risk ratio | 0.976 [0.966; 0.985] | 2.407 [2.348; 2.468] | 1.277 [1.246; 1.308] | 1.097 [1.082; 1.113] | 1.218 [1.192; 1.245]
% censoring | 10% increase | 0.997 [0.994; 1.000] | 0.890 [0.882; 0.899] | 0.965 [0.957; 0.973] | 1.028 [1.023; 1.034] | 0.972 [0.965; 0.979]
size of AE probability | increase of 0.1 | 0.995 [0.991; 0.999] | 0.893 [0.882; 0.904] | 0.972 [0.961; 0.983] | 1.008 [1.000; 1.016] | 0.975 [0.965; 0.985]
evaluation time | one additional year | 0.994 [0.988; 1.000] | 1.036 [1.021; 1.051] | 1.014 [1.000; 1.027] | 1.003 [0.995; 1.011] | 1.011 [0.999; 1.024]

A closer look is taken at single AE types in trials for which extreme under- or overestimation is present, i.e., extreme values in the right-hand box plots in Panel A of Figure 2. For example, the largest underestimation of the incidence proportion is for an AE which is only observed for three out of 274 patients. This corresponds to an incidence proportion of 0.011. However, an Aalen-Johansen estimate of 0.037 is obtained. This corresponds to a ratio of 0.294 with a 95% confidence interval of [0.084; 1.025], where the confidence interval has been obtained using the bootstrap. With 27.0% of the observations for this type of AE censored, the amount of censoring is below the mean censoring rate of all types of AEs. Moreover, for this type of AE, 17 deaths (6.2%) and 180 other CEs (65.7%) are observed. This type of AE not only contributes the largest underestimation of the incidence proportion but also that of the probability transform of the incidence density accounting for CEs, for which an estimate of 0.012 is obtained (ratio of 0.329 with 95% CI [0.094; 1.148]). Furthermore, for this type of AE, the largest overestimation of the one minus Kaplan-Meier estimator (estimate of 0.208 and ratio of 5.575 [1.813; 17.147]) and of the Aalen-Johansen (death only) estimator (estimate of 0.190 and ratio of 5.090 [1.815; 14.276]) is calculated.
These impressive ratios are partly due to the small value of the gold-standard Aalen-Johansen estimate, but we stress that the difference between one minus Kaplan-Meier and the gold standard is also quite pronounced (0.208 vs. 0.037).

In another extreme example with a higher AE probability, the obtained incidence proportion is 0.059 and the Aalen-Johansen estimate is 0.109 (ratio 0.534 [0.529; 0.540]). For this type of AE, many censored observations are present (63.3% of 752 patients). Moreover, 44 AEs, 137 deaths (18.2%) and 95 other CEs (12.6%) are observed. Here, due to the high amount of censoring, one can expect in advance that the incidence proportion does not do well.
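The bootstrap interval quoted for the first example can be mimicked as follows. The reported interval [0.084; 1.025] is approximately symmetric around 0.294 on the log scale, so the sketch assumes a Wald-type interval, exp(log ratio ± 1.96 × SE), with the SE of the log-ratio taken from a patient-level bootstrap; a percentile interval could not exceed one here, because in any single sample the incidence proportion never exceeds the Aalen-Johansen estimate. Skipping replicates without any AE is a further assumption. Data, seed and function names are hypothetical.

```python
import math
import random
import statistics


def aalen_johansen_ae(time, status, tau):
    """Aalen-Johansen cumulative AE probability at tau
    (status: 0 = censored, 1 = AE, 2 = competing event; hypothetical coding)."""
    surv_any, cif = 1.0, 0.0
    for u in sorted({t for t, s in zip(time, status) if s != 0 and t <= tau}):
        at_risk = sum(1 for t in time if t >= u)
        d_ae = sum(1 for t, s in zip(time, status) if t == u and s == 1)
        d_any = sum(1 for t, s in zip(time, status) if t == u and s != 0)
        cif += surv_any * d_ae / at_risk
        surv_any *= 1.0 - d_any / at_risk
    return cif


def incidence_proportion(time, status, tau):
    return sum(1 for t, s in zip(time, status) if s == 1 and t <= tau) / len(time)


def ratio_with_bootstrap_ci(time, status, tau, n_boot=500, seed=2020):
    """Ratio IP / AJ with a Wald-type 95% CI on the log scale, using a
    patient-level bootstrap standard error of the log-ratio."""
    point = math.log(incidence_proportion(time, status, tau) / aalen_johansen_ae(time, status, tau))
    rng = random.Random(seed)
    n = len(time)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        t_b = [time[i] for i in idx]
        s_b = [status[i] for i in idx]
        ip_b = incidence_proportion(t_b, s_b, tau)
        if ip_b > 0:  # without any AE the log-ratio is undefined
            reps.append(math.log(ip_b / aalen_johansen_ae(t_b, s_b, tau)))
    se = statistics.stdev(reps)
    return math.exp(point), math.exp(point - 1.96 * se), math.exp(point + 1.96 * se), se


time = [2, 3, 5, 7, 8, 10, 12, 15, 20, 25]
status = [1, 2, 1, 0, 2, 1, 0, 2, 1, 0]
ratio, lo, hi, se = ratio_with_bootstrap_ci(time, status, tau=25)
print(f"IP/AJ = {ratio:.3f} [{lo:.3f}; {hi:.3f}]")
```

With only three AEs among 274 patients, as in the first example, many bootstrap samples contain very few AEs, which is consistent with the wide interval reported there.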
To explicitly investigate the role of censoring without the methodological complication of CEs, the composite endpoint combining AEs and CEs is considered, which results in a single-endpoint survival setting. As a consequence, the gold standard in this setting is the one minus Kaplan-Meier estimator, which is compared to the incidence proportion (see Figure 3).

In the composite endpoint analysis, the underestimation of the incidence proportion is more pronounced than in the analyses of the AE probability presented above. One reason is that, even in the presence of censoring, the type of the last event is most important for the one minus Kaplan-Meier estimator: if the last event is an AE or CE, the one minus Kaplan-Meier estimator is equal to one, even though censoring has been observed at earlier follow-up times. The incidence proportion is only equal to one if no censoring is observed.

[Figure 3: ratios (log scale) at the 30%, 60% and 90% quantiles and at the end of follow-up, each taken as the minimum of both groups, and at the end of follow-up.]

Figure 3: Ratios of incidence proportion of the composite endpoint combining AE and CE divided by composite 1-Kaplan-Meier estimator
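The role of the last observed time in the composite analysis can be checked with a short sketch: one minus Kaplan-Meier for the composite ‘AE or CE’ reaches one whenever the latest observed time is an event, whatever censoring occurred earlier, whereas the incidence proportion stays below one as soon as anyone is censored. The status coding, function names and data are hypothetical.

```python
def one_minus_km_composite(time, status):
    """1 - Kaplan-Meier at the end of follow-up for the composite 'AE or CE'
    (status: 0 = censored, 1 = AE, 2 = competing event; hypothetical coding)."""
    surv = 1.0
    for u in sorted({t for t, s in zip(time, status) if s != 0}):
        at_risk = sum(1 for t in time if t >= u)
        d = sum(1 for t, s in zip(time, status) if t == u and s != 0)
        surv *= 1.0 - d / at_risk
    return 1.0 - surv


def incidence_proportion_composite(status):
    return sum(1 for s in status if s != 0) / len(status)


time = [1, 2, 3, 4, 5]
last_is_event = [1, 0, 2, 0, 1]     # latest observed time is an AE
last_is_censored = [1, 0, 2, 1, 0]  # latest observed time is a censoring
for st in (last_is_event, last_is_censored):
    print(one_minus_km_composite(time, st), incidence_proportion_composite(st))
```

In both toy samples the incidence proportion is 0.6, while one minus Kaplan-Meier is exactly one in the first and about 0.73 in the second.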
The starting point of the present investigation was that AE analyses in terms of AE probabilities, an important aspect of drug safety evaluations, should account for the time under observation and censoring if the latter is imposed by the data at hand. However, while primary efficacy endpoints often are time-to-event composites such as progression-free and overall survival, which every patient experiences, although possibly after study closure, the occurrence of an AE (of a certain type) usually is subject to CEs such as death before AE. Survival analysis accounting for CEs is methodologically well established, but practical use lags behind [7, 19]. Failure to account for censoring (e.g., incidence proportion) or CEs (e.g., one minus Kaplan-Meier) will generally lead to a biased quantification of absolute AE risk, but the amount of bias has been unclear.

In this study, we confirmed that one minus Kaplan-Meier should not be used to estimate the cumulative AE probability, as it is bound to overestimate as a consequence of ignoring CEs. Interestingly, we found that the incidence proportion performed well when compared to the gold-standard Aalen-Johansen estimator. One reason may be a high amount of CEs before possible censoring. But not only the proportion of censoring but also the timing of the censoring is relevant, as the first example of the single trials described in detail showed. This example led to the largest bias although the proportion of censoring was below average. The observed proportion and timing of censoring in this project are a consequence of twelve out of 17 trials being from oncology, in which, compared to other therapeutic areas, AEs and CEs are often observed early during follow-up and censoring occurs much later. We also note that the observed constellation of CEs and censoring results from a sample of completed trials after the final analysis had been performed.
The proportion of censoring may be different at the time point of a safety interim analysis, as typically presented to data safety monitoring boards. For this situation, the different estimators may behave differently [20].

This finding must not be interpreted as a carte blanche to use AE incidence proportions based on censored data. In fact, the comparable performance of the incidence proportion and Aalen-Johansen did not only rely on a high amount of CEs, but in particular on a careful definition of what kind of events constitute a competing event, as outlined earlier. In other words, use of the incidence proportion implicitly assumes events to be competing as defined in the methods section. This aspect is somewhat subtle, but nicely highlighted by the fact that an analysis accounting for censoring and for death as the only CE (Aalen-Johansen (death only)) also led to overestimating AE risk, although the bias was not as pronounced as for one minus Kaplan-Meier.

We also found that previous worries about the constant hazard assumption underlying incidence densities were justified, in that a simple transformation of the AE incidence density onto probabilities (probability transform incidence density ignoring CE) performed worst. However, accounting for competing events in an analysis that parametrically mimicked the non-parametric Aalen-Johansen performed better than both one minus Kaplan-Meier and Aalen-Johansen (death only); in this sense, ignoring CEs appeared to be worse than assuming constant hazards in our empirical study.

Most of the results were shown for the situation where the maximum follow-up time was chosen as evaluation time. When looking at earlier evaluation times defined by quantiles of the observed times, the resulting bias was, in general, less pronounced, due to a reduced relative frequency of competing events and of censoring (see Figure 1).
We regarded the situation of including all data up to the maximum follow-up time as the most relevant, as this is the usual practice.

Our empirical study does have shortcomings. Using an opportunistic sample of randomized clinical trials from several sponsor companies, we have been able to illustrate possible consequences of quantifying AE risk in a manner that ignores censoring or CEs. However, being opportunistic, the sample does not lend itself to straightforward generalizations. More than two thirds of the trials were from oncology. These came with a high amount of CEs, which, in turn, led to comparable performances of incidence proportion and Aalen-Johansen. The vast majority of AEs were classified as ‘common’ or ‘very common’. The AEs were also heterogeneous: they came from different therapeutic areas and were not necessarily treatment-related. These shortcomings were to be anticipated from an opportunistic sample, but our aim in this ‘real-world’ setting was to investigate and demonstrate which biases can occur in practice. These shortcomings also impact the comparison of AE risks between treatment groups [13]. The observed results motivate future empirical investigations on how to quantify AE risk with the aim of better generalizability. Furthermore, the aim of this investigation was not to accurately estimate AE probabilities, but to compare different estimators. Our present study does not allow for a meaningful comparison of results across diseases. Follow-up investigations concentrating on trials in specific disease areas are planned.

A methodological restriction is that our investigation focused on an analysis which mostly does not consider AEs after treatment discontinuation due to, e.g., disease progression in oncology. This restriction is mainly due to trial design, when treatment discontinuation leads to stopping AE recording after a pre-specified time period.
In addition, in oncology, it is not uncommon that patients enter a different clinical trial after progression, which further complicates matters. Another methodological restriction is that we did not consider recurrent AEs, but only first events. It is desirable to consider more complex event histories, also beyond time-to-first-event. However, any such consideration will need to account for CEs (and censoring), and our investigation therefore also informs methodological considerations for analysing such more complex event histories. In other words, both AEs after treatment discontinuation and recurrent AEs will still be subject to competing events.

Our recommendation is to ‘play it safe’ when analysing AE risk in a time-to-first-event analysis. Analysts should not count on a small amount of CEs, a large amount of CEs, or a favorable interplay of the distributions of the times of AEs, CEs and censorings. In the first case, one minus Kaplan-Meier might work well, while in the latter two cases the incidence proportion might do so. Playing it safe, we recommend using the Aalen-Johansen estimator. It equals one minus Kaplan-Meier in the absence of CEs and the incidence proportion in the absence of censoring, and it remains appropriate when both CEs and censoring are present. Guidelines for reporting AEs should, therefore, advocate the Aalen-Johansen estimator instead of incidence proportion, incidence density and one minus Kaplan-Meier.
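The limiting cases behind this recommendation can be checked numerically. This self-contained Python sketch is not the project's code; its data, status coding and function names are hypothetical. It implements the Aalen-Johansen estimator, one minus Kaplan-Meier and the incidence proportion as separate functions. It then evaluates them at the maximum follow-up time for two artificial data sets. The first data set has censoring but no CEs, and the second has CEs but no censoring.

```python
# Hypothetical status coding for this sketch.
CENSORED, AE, CE = 0, 1, 2


def aalen_johansen(times, status, tau, ce_codes=(CE,)):
    """AJ estimate of P(first AE by tau); non-AE codes outside ce_codes are censored."""
    event_free, cum_inc = 1.0, 0.0
    for t in sorted(set(times)):
        if t > tau:
            break
        at_risk = sum(u >= t for u in times)
        d_ae = sum(u == t and s == AE for u, s in zip(times, status))
        d_ce = sum(u == t and s in ce_codes for u, s in zip(times, status))
        cum_inc += event_free * d_ae / at_risk  # uses P(event-free) just before t
        event_free *= 1 - (d_ae + d_ce) / at_risk
    return cum_inc


def one_minus_kaplan_meier(times, status, tau):
    """1 - KM for the time to first AE, treating every non-AE code as censoring."""
    surv = 1.0
    for t in sorted(set(times)):
        if t > tau:
            break
        at_risk = sum(u >= t for u in times)
        d_ae = sum(u == t and s == AE for u, s in zip(times, status))
        surv *= 1 - d_ae / at_risk
    return 1 - surv


def incidence_proportion(status):
    """Patients with an observed AE divided by all patients."""
    return sum(s == AE for s in status) / len(status)


times = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
tau = max(times)

# Case 1: censoring but no competing events, so AJ equals 1 - KM.
no_ce = [AE, CENSORED, AE, CENSORED, CENSORED, AE, CENSORED, CENSORED, AE, CENSORED]
print(round(aalen_johansen(times, no_ce, tau), 6),
      round(one_minus_kaplan_meier(times, no_ce, tau), 6))  # prints 0.68 0.68

# Case 2: competing events but no censoring, so AJ equals the incidence proportion.
no_cens = [AE, CE, AE, CE, CE, AE, CE, CE, AE, CE]
print(round(aalen_johansen(times, no_cens, tau), 6),
      round(incidence_proportion(no_cens), 6))  # prints 0.4 0.4
```

When both censoring and CEs are present, as in most trials, neither simpler estimator is guaranteed to agree with the Aalen-Johansen estimate.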
Data and code
Individual trial data analyses were run within the sponsor organizations using SAS and R software provided by the academic project group members. Only aggregated data necessary for meta-analyses were shared, and meta-analyses were run centrally at the academic institutions. A markdown file providing exemplary code to compute all the estimators discussed in this paper for a given dataset is available on GitHub (https://github.com/numbersman77/AEprobs). The corresponding output is available as an HTML file (https://numbersman77.github.io/AEprobs/SAVVY_AEprobs.html).
Funding
Not applicable.
Declaration of conflicting interests
KR and TK are employees of F. Hoffmann-La Roche (Basel, Switzerland). VJ and AB are employees of Novartis Pharma AG (Basel, Switzerland). LE, KK, FLa, FLe, AL, CN and VF are employees of Janssen-Cilag GmbH (Neuss, Germany), Bristol-Myers-Squibb GmbH & Co. KGaA (München, Germany), Lilly Deutschland GmbH (Bad Homburg, Germany), Pfizer Deutschland (Berlin, Germany), Merck KGaA (Darmstadt, Germany), Bayer AG (Wuppertal, Germany) and Boehringer Ingelheim Pharma GmbH & Co. KG (Ingelheim, Germany), respectively. TF has received personal fees for consultancies (including data monitoring committees) from Bayer, Boehringer Ingelheim, Janssen, Novartis and Roche, all outside the submitted work. JB has received personal fees for consultancy from Pfizer, all outside the submitted work. CS has received personal fees for consultancies (including data monitoring committees) from Novartis and Roche, all outside the submitted work. The companies mentioned contributed data to the empirical study. RS has declared no conflict of interest.
References
[1] Horton N and Switzer S. Statistical methods in the journal.
N Engl J Med
N Engl J Med
Drug Inf J
Pharm Stat
Pharm Stat
Pharm Stat
J Clin Epidemiol
J Clin Epidemiol
Stat Med
Trials
Analysis of Incidence Rates. Chapman and Hall/CRC, 2019.
[12] Bonofiglio F, Beyersmann J, Schumacher M et al. Meta-analysis for aggregated survival data with competing risks: a parametric approach using cumulative incidence functions.
Res Synth Methods
Biometrical Journal
The Lancet
Textbook of Clinical Trials in Oncology: A Statistical Perspective (eds Halabi S, Michiels S), chapter The analysis of adverse events in randomized clinical trials. Chapman and Hall/CRC, 2019.
[17] EMA. A guideline on summary of product characteristics (SmPC), 2009. URL https://ec.europa.eu/health/sites/health/files/files/eudralex/vol-2/c/smpc_guideline_rev2_en.pdf
[18] CIOMS Working Groups III and V. Guidelines for preparing core clinical-safety information on drugs. Geneva: Council for International Organizations of Medical Sciences, 1999.
[19] Phillips R and Cornelius V. Understanding current practice, identifying barriers and exploring priorities for adverse event analysis in randomised controlled trials: an online, cross-sectional survey of statisticians from academia and industry.
BMJ Open