[PDF] Human mortality at extreme age

Abstract

We use a combination of extreme value theory, survival analysis and computer-intensive methods to analyze the mortality of Italian and French semi-supercentenarians for whom there are validated records. After accounting for the effects of the sampling frame, there appears to be a constant rate of mortality beyond age 108 years and no difference between countries and cohorts. These findings are consistent with previous work based on the International Database on Longevity and suggest that any physical upper bound for humans is so large that it is unlikely to be approached. There is no evidence of differences in survival between women and men after age 108 in the Italian data and the International Database on Longevity; however survival is lower for men in the French data.

Full PDF

HHuman mortality at extreme age

Léo R. Belzile * , Anthony C. Davison † , Holger Rootzén and Dmitrii Zholud ‡ Abstract

We use a combination of extreme value statistics, survival analysis and computer-intensive methods toanalyze the mortality of Italian and French semi-supercentenarians. After accounting for the effects of thesampling frame, we conclude that constant force of mortality beyond 108 years describes the data well andthere is no evidence of differences between countries and cohorts. These ﬁndings are consistent with use ofa Gompertz model and with previous analysis of the International Database on Longevity and suggest thatany physical upper bound for the human lifespan is so large that it is unlikely to be approached. Power cal-culations suggest that it is unlikely that there could be an upper bound below 130 years. There is no evidenceof differences in survival between women and men after age 108 in the Italian data and the InternationalDatabase on Longevity, but survival is lower for men in the French data.

Solid empirical understanding of human mortality at extreme age is important as one basis for researchaimed at ﬁnding a cure for ageing (described, e.g., in Vijg & Campisi, 2008), and is also an element in thehotly-debated and societally-important question whether the current increase in expected lifespan in developedcountries, of about three months per year since at least 1840 (Oeppen & Vaupel, 2002), can continue. The limitto human lifespan, if any, also attracts considerable media attention (e.g. Guarino, 2018; Saplakoglu, 2018;Hoad, 2019).Einmahl et al. Einmahl et al. (2019) analyse data on mortality in the Netherlands and conclude that “thereindeed is a ﬁnite upper limit to the lifespan” for both men and women. Their dataset, provided by StatisticsNetherlands and consisting of about 285,000 “Dutch residents, born in the Netherlands, who died in the years1986–2015 at a minimum age of 92 years”, had not undergone any validation procedure. As might be expected,the vast majority died before their 100th birthdays: 99.5% lived 107 or fewer years, and 97% died at age 101 oryounger. The cohorts for analysis were taken to be the calendar years of death, and truncation of lifetimes wasnot taken into account. Hanayama & Sibuya Hanayama & Sibuya (2016) also ﬁnd an upper lifespan limit of 123years for Japanese persons aged 100 or more, by ﬁtting a generalized Pareto distribution to one-year and four-year birth cohorts, taking into account the sampling scheme. In both cases, any plateauing of mortality may beconfounded with the increase in hazard between ages 100 and 105, and this would invalidate the extrapolationto extreme age.The validity of conclusions about mortality at extreme age depends crucially on the quality of the dataon which they are based (Gavrilov & Gavrilova, 2019), as age misrepresentation for the very old is commoneven in countries with otherwise reliable statistical data (Poulain, 2010). Motivated by this, demographic re-searchers from 15 countries contribute to the International Database on Longevity (

IDL ), which currently con-tains 860 individually-validated life lengths of supercentenarians who had reached age 110 or more on 28September 2019; the data, which cover different time periods for different countries, can be obtained from * HEC Montréal, Department of Decision Sciences, 3000, chemin de la Côte-Sainte-Catherine, Montréal (QC), Canada H3T 2A7( [email protected] ) † EPFL-SB-MATH-STAT, Station 8, École polytechnique fédérale de Lausanne, 1015 Lausanne, Switzerland ‡ Department of Mathematical Sciences, Chalmers and Gothenburg University, Chalmers Tvärgata 3, 41296 Göteborg, Sweden a r X i v : . [ s t a t . A P ] O c t Belzile et al. . For some countries the

IDL has recently been extended to include dataon semi-supercentenarians, i.e., people who lived to an age of at least 105. As of October 2019, it containednew French data on 9612 semi-supercentenarians and 241 supercentenarians, which we call the

France 2019 data; all these supercentenarians but only some of the semi-supercentenarians were validated.A 2016 version of the

IDL ( IDL 2016 ) was analysed by Gampe (Gampe, 2010) and Rootzén & Zholud(Rootzén & Zholud, 2017), the latter with extensive discussion (Rootzén & Zholud, 2018). Both papers madeallowance for the sampling scheme, and in particular for the truncation of lifetimes that it entails. They con-cluded that there is no indication of an increase in mortality for ages above 110 years, and hence no indicationof a ﬁnite upper limit to the human lifespan. Rootzén & Zholud (Rootzén & Zholud, 2017) also found no dif-ferences in mortality between men and women or between populations from regions and countries as varied asJapan, the USA or Europe. These conclusions are striking, but the small size of the

IDL and the lack of balancebetween the subgroups limit the statistical power available to detect such differences.The Italian National Institute of Statistics (

ISTAT ) has recently produced an important new database (ISTAT,2018) containing individually validated birth dates and survival times in days of all persons in Italy who wereat least 105 years old at some point in the period from 1 January 2009 to 31 December 2015. Using advancedsurvival analysis tools, Barbi et al. (Barbi et al., 2018) found that death rates in this dataset “reach or closelyapproach a plateau after age 105” and found a small but statistically signiﬁcant cohort effect.Data analysis must take into account the sampling scheme underlying such data. The

ISTAT lifetimes areleft-truncated because only individuals who attain an age of at least 105 years during the sampling period areincluded, and they are right-censored because the date of death of persons still alive in 2016 is unknown; seeFigure 1. The right-censored lifetimes are shown by the tick marks at the right side of the panel, and include theoldest person; ignoring either the truncation or the censoring could lead to incorrect conclusions. The

France2019 lifetimes are left- and right-truncated: only individuals who are observed to die during the sampling periodappear in the dataset. The statistical consequences are discussed in the Appendix A.2.In our analysis we take the sampling frame into account and pinpoint the age, if any, at which mortalityattains a plateau, and disentangle the effects of age and of birth cohort. We also compare mortality in the

ISTAT , France 2019 and

IDL 2016 databases, and between men and women.We use the generalized Pareto distribution from extreme value statistics in the main analysis, supplementedby ﬁts of the Gompertz distribution, which is standard in demography. We ﬁrst outline our main results andconclusions; the Appendix gives a more detailed description of our methods.

Results for ISTAT data

Lifetimes beyond 105 years are highly unusual and the application of extreme value models (de Haan & Ferreira,2006) is warranted. We use the generalized Pareto distribution, F ( x ) = (cid:40) − (1 + γx/σ ) − /γ + , x ≥ , γ (cid:54) = 0 , − e − x/σ , x ≥ , γ = 0 , (1)to model x , the excess lifetime above u years. In (1), a + = max( a, and σ > and γ ∈ R are scale and shapeparameters. For negative shape parameter γ the distribution has a ﬁnite upper endpoint at − σ/γ , whereas γ ≥ yields an inﬁnite upper endpoint.The corresponding hazard function, often called the “force of mortality” in demography, is the densityevaluated at excess age x , conditional on survival to then, i.e., h ( x ) = f ( x )1 − F ( x ) = 1( σ + γx ) + , x ≥ , (2) preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age year of death a g e a t d e a t h ( y e a r s ) Figure 1: Lexis diagram of the

ISTAT data, showing age and calendar time at death for men (red crosses) andwomen (grey dot if only one woman, black dot if several). Censored observations are displayed in the rightmargin; the ticks indicate the ages of men (red) and women (black) still alive on 31 December 2015. Thesampling frame includes only persons aged at least 105 years and alive on 1 January 2009, or whose 105thbirthday occurred before 1 January 2016. The death counts for each calendar year are given at the top of thegraph for men (red) and women (black).where f ( x ) = d F ( x ) / d x is the generalized Pareto density function. If γ < , the hazard function tends toinﬁnity at the ﬁnite upper limit for exceedances. When γ = 0 , F is exponential and the hazard function isconstant, meaning that the likelihood that a living individual dies does not depend on age beyond the threshold.In this case, mortality can be said to have plateaued at age u .The choice of a threshold u such that Eq. [1] models exceedances appropriately is a basic problem in extremevalue statistics and is surveyed by Scarrott & MacDonald Scarrott & MacDonald (2012). If u is high enoughfor Eq. [1] to provide an adequate approximation to the distribution of exceedances, then the shape parameter γ is approximately unchanged if a higher threshold u (cid:48) is used, and the scale parameters for u and u (cid:48) have a knownrelationship, so a simple and commonly-used approach to the choice of threshold is to plot the parameters ofthe ﬁtted distributions for a range of thresholds (Davison & Smith, 1990) and to use the lowest threshold abovewhich parameter estimates stabilise. This choice balances the extrapolation bias arising if the threshold is toolow with the increased variance incurred when taking u too high to retain an adequate number of observations.The upper left-hand panel of Figure 2 shows that for age thresholds close to 105 years the estimated shapeparameters for excess life lengths are negative, with 95% conﬁdence intervals barely touching zero, but there isno systematic indication of non-zero shape above 107 years. The upper right-hand panel displays the estimatedscale parameter of the exponential model ﬁtted to life lengths exceeding the threshold. The scale parametersdecrease for ages 105–107 but show no indication of change after age 107, where the scale parameter estimateis 1.45. Parameter stability plots suggest an exponential model and hence a constant hazard after age 107 or so,where a mortality plateau seems to be attained.The upper part of Table 1 shows results from ﬁtting Eq. [1] and the exponential distribution to the ISTAT fora range of thresholds. The exponential model provides an adequate ﬁt to the exceedances over a threshold at 108years, above which the hypothesis that γ = 0 , i.e., the exponential model is an adequate model simpliﬁcation,is not rejected. Author’s Original Version preprint , version of October 5, 2020

Belzile et al.

105 107 109 111 - . . . . threshold (years) γ

105 107 109 111 . . . . threshold (years) σ e

105 107 109 111 - . . . . threshold (years) γ

105 107 109 111 . . . . threshold (years) σ e Figure 2: Parameter stability plots for the

ISTAT data (top) and for the

France 2019 data (bottom), showingthe shape γ of the generalized Pareto distribution (left) and the scale σ e of the exponential distribution (right)based on lifetimes that exceed the age threshold on the x -axis. The plots give maximum likelihood estimateswith 95% conﬁdence intervals derived using a likelihood ratio statistic. The horizontal lines in the right-handpanels correspond to the estimated scale for excess lifetimes over 108 years for the ISTAT data. preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age ISTAT threshold

105 106 107 108 109 110 111 n u σ .

67 (0 .

04) 1 . .

06) 1 .

47 (0 .

08) 1 .

47 (0 .

11) 1 .

33 (0 .

15) 1 .

22 (0 .

23) 1 . . γ − .

05 (0 . − .

07 (0 . − .

02 (0 . − .

01 (0 .

06) 0 .

03 (0 .

09) 0 .

12 (0 .

17) 0 .

06 (0 . σ e .

61 (0 .

03) 1 . .

04) 1 .

45 (0 .

05) 1 .

45 (0 .

08) 1 .

36 (0 .

11) 1 .

35 (0 .

17) 1 .

58 (0 . p -value .

04 0 .

01 0 .

70 0 .

82 0 .

74 0 .

44 0 . p ∞ .

02 0 .

01 0 .

35 0 .

41 0 .

63 0 .

78 0 . France 2019 threshold

105 106 107 108 109 110 111 n u σ .

69 (0 .

02) 1 .

59 (0 .

03) 1 .

54 (0 .

04) 1 .

43 (0 .

06) 1 .

36 (0 .

08) 1 .

34 (0 .

13) 1 .

26 (0 . γ − .

06 (0 . − .

04 (0 . − .

02 (0 .

03) 0 .

02 (0 .

05) 0 .

05 (0 .

07) 0 .

09 (0 . σ e .

62 (0 .

02) 1 .

53 (0 .

03) 1 .

49 (0 .

03) 1 .

41 (0 .

05) 1 .

38 (0 .

07) 1 .

39 (0 .

11) 1 .

36 (0 . p -value × − .

01 0 .

05 0 .

60 0 .

72 0 .

48 0 . p ∞ × − × − .

02 0 .

30 0 .

64 0 .

76 0 . Table 1: Estimates (standard errors) of scale and shape parameters ( σ , γ ) for the generalized Pareto distributionand of the scale parameter ( σ e ) for the exponential model for the ISTAT and

France 2019 datasets as a functionof threshold, with number of threshold exceedances ( n u ), p -value for the likelihood ratio test of γ = 0 andprobability that γ ≥ based on the proﬁle likelihood ratio test under the generalized Pareto model ( p ∞ ). ISTAT France 2019 IDL 2016 n σ e (95% CI) n σ e (95% CI) n σ e (95% CI) women

375 1 .

45 (1 . , .

62) 1116 1 .

46 (1 . , .

56) 507 1 .

39 (1 . , . men

40 1 .

41 (0 . , .

98) 94 0 .

90 (0 . , .

11) 59 1 .

68 (1 . , . All

415 1 .

45 (1 . , .

61) 1210 1 .

41 (1 . , .

51) 566 1 .

42 (1 . , . Table 2: Estimates of the scale, σ e , of the exponential distribution, with 95% conﬁdence intervals (CI). Thisdistribution is ﬁtted to exceedances of 108 years in the ISTAT and

France 2019 data and of 110 years in the

IDL 2016 data analysed in Rootzén & Zholud (2017).The estimated scale parameter obtained by ﬁtting an exponential distribution to the

ISTAT data for peopleolder than 108 is 1.45 (years) with 95% conﬁdence interval (1 . , . . Hence the hazard is estimated to be0.69 (1/years) with 95% conﬁdence interval (0 . , . ; above 108 years the estimated probability of survivingat least one more year at any given age is 0.5 with 95% conﬁdence interval (0 . , . .We investigated birth cohort effects, but found none; see Appendix A.3. Results for

France 2019 data

Estimation for the

France 2019 data was done as described in Rootzén & Zholud (Rootzén & Zholud, 2017),taking into account the left- and right truncation of the lifetimes. Both the parameter stability plots in thelower panels of Figure 2 and the results given in the lower part of Table 1 indicate that the exponential modelis adequate above 108 years. For persons older than 108 the exponential scale parameter is estimated to be1.41 (years) with 95% conﬁdence interval (1 . , . , the hazard is estimated to be 0.71 (years − ) with 95%conﬁdence interval (0 . , . and the estimated probability of surviving at least one more year is 0.49 with95% conﬁdence interval (0 . , . .Table 2 shows that estimates of the scale parameter for the exponential distribution for women and men for Author’s Original Version preprintpreprint

Belzile et al. the

France 2019 data differ. If men are excluded, then the estimated scale parameter increases from 1.41 to1.46 years, and if the oldest woman, Jeanne Calment, is also excluded, the estimate for women drops to 1.44years. Similarly to the

ISTAT data, survival for ages 105–107 was lower in earlier cohorts.

PowerPower

Our analysis above suggests that there is no upper limit to human lifetimes and that constant hazard adequatelymodels excess liftime if one considers only those persons whose lifetimes exceed 108 years: there is no evidencethat the force of mortality above this age is other than constant. One might wonder whether increasing forceof mortality would be detectable, however, as the number of persons attaining such ages is relatively small. Toassess this we performed a simulation study described in the Appendix A.7, mimicking the sampling schemesof the

ISTAT , France 2019 and

IDL 2016 datasets as closely as possible and generating samples from thegeneralized Pareto distribution with − . ≤ γ ≤ . To remove overlap between the last two datasets, wedropped France from IDL 2016 .Any biological limit to their lifespan should be common for all humans, whereas differences in mortalityrates certainly arise due to social and medical environments and can be accommodated by letting hazards varyby factors such as country or sex. With the overlap dropped we can treat the datasets as independent andcompute the power for a combined likelihood ratio test of γ = 0 (inﬁnite lifetime) against alternatives with γ < (ﬁnite lifetime). For concreteness of interpretation we express the results in terms of the implied upperlimit of lifetime ι = u − σ/γ . The left-hand panel of Figure 3 shows the power curves for the three datasetsindividually and pooled. The power of the likelihood ratio test for the alternatives ι ∈ { , , } years,for example, is . / . / . for the ISTAT data above 108, . / . / . for the France 2019 data above108, and . / . / . for the IDL 2016 data above 110. The power for ι = 125 / / years based on allthree datasets is . / . / . , so it appears to be rather unlikely that an upper limit to the human lifespan, ifthere is such a limit, is below 130 years or so.Similar calculations give the power for testing the null hypothesis γ = 0 against alternatives γ < . Forcingall datasets to have the same shape parameter would allow them to have different endpoints so we reject theoverall null hypothesis if we reject the exponential hypothesis any of the three datasets. The power of thisprocedure is shown with a dashed black line in Figure 3. The resulting combined power exceeds . for γ < − . and equals . for the alternative γ = − . , giving strong evidence against a sharp increase in thehazard function after 108 years. Gompertz model

The hazard function of the generalized Pareto distribution cannot model situations in which the hazard increasesto inﬁnity, but the upper limit to lifetimes is inﬁnite. This possibility is encompassed by the Gompertz distribu-tion (Gompertz, 1825), which has long been used for modelling lifetimes and often provides a good ﬁt to dataat lower ages (e.g., Thatcher, 1999). When the Gompertz model is expressed in the form F ( x ) = 1 − exp (cid:110) − ( e βx/σ − /β (cid:111) , x > , σ, β > ,σ is a scale parameter with the dimensions of x , and the dimensionless parameter β controls the shape of thedistribution. Letting β → yields the exponential distribution with mean σ ; small values of β correspond tosmall departures from the exponential model. The fact that β cannot be negative affects statistical comparisonof the Gompertz and exponential models; see Appendix A.8.The Gompertz distribution has inﬁnite upper limit to its support, so it cannot be used to assess whether thereis a ﬁnite upper limit to the human lifespan. Its hazard function, σ − exp( βx/σ ) , is ﬁnite but increasing for all preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age . . .

751 120 130 140 150 160 170 180 upper limit to human lifespan p o w e r . . . − . − . − . − . − .

05 0 γ p o w e r data combined France IDL2016 Istat Figure 3: Power functions based on the

IDL 2016 (excluding French records),

France 2019 and

ISTAT databases and combined datasets, with rugs showing the lifetimes above 115. Left: power for the alterna-tive of a ﬁnite endpoint ι against the null hypothesis of inﬁnite lifetime based on the likelihood ratio statistic.The endpoint cannot be lower than the largest observations in each database. Right: power of the Wald statisticfor the null hypothesis γ = 0 against the one-sided alternative γ < as a function of γ ; the dashed line repre-sents the power obtained by rejecting the null of exponentiality when any of the three one-sided test rejects. Thecurves are obtained by conditioning on the birthdates and left-truncated values in the databases, then simulatinggeneralized Pareto data whose parameters are the partial maximum likelihood estimates ( (cid:98) σ γ , γ ) . The simulatedrecords are censored if they fall outside the sampling frame for the ISTAT data and are simulated from a doublytruncated generalized Pareto distribution for

IDL 2016 and

France 2019 . See Appendix A.7 for more details.

Author’s Original Version preprintpreprint

Author’s Original Version preprintpreprint , version of October 5, 2020

Belzile et al. x ( β > ) or constant ( β = 0 ). The limiting distribution for threshold exceedances of Gompertz variables isexponential, and this limit is attained rather rapidly, so a good ﬁt of the Gompertz distribution for lower x wouldbe compatible with good ﬁt of the exponential distribution for threshold exceedances at higher values of x .Computations summarised in Appendix A.8 show that the exponential model, and hence also the Gompertzmodel with very small β , give equally good ﬁts to the Italian and the French datasets above age 107, and thatthe Gompertz and generalised Pareto models ﬁt equally well above age 105. Conclusions

None of the analyses of the

ISTAT , IDL 2016 or France 2019 data, for women and men separately or combined,indicates any deviation from exponentially distributed residual lifetimes, or equivalently from constant force ofmortality, beyond 108 years.Table 2 shows no differences between survival after age 108 in the

ISTAT data and survival after age 110 inthe

IDL 2016 data for women, for men, or for women and men combined, so we merged these estimates by tak-ing a weighted average with weights inversely proportional to the estimated variances. The resulting estimatesalso show no signiﬁcant differences in survival between men and women, and we conclude that survival timesin years after age 108 in the

ISTAT data and after age 110 in the

IDL 2016 data are exponentially distributedwith estimated scale parameter . and 95% conﬁdence interval (1 . , . . The corresponding estimatedprobability of surviving one more year is . with 95% conﬁdence interval (0 . , . .There was no indication of differences in survival for women or the whole of the France 2019 data and inthe combined

ISTAT and

IDL 2016 data, but survival for men was lower in the

France 2019 data. A weightedaverage of the estimates for the

ISTAT data, the

France 2019 data and the

IDL 2016 data with France removedgives an exponential scale parameter estimate of . years with 95% conﬁdence interval (1 . , . , andestimated probability . . , . of surviving one more year.Deleting the men from the France 2019 data or dropping Jeanne Calment changes estimates and conﬁdenceintervals by at most one unit in the second decimal.There is high power for detection of an upper limit to the human lifespan up to around 130 years, based onﬁts of the generalized Pareto model to the three datatbases. Moreover there is no evidence that the Gompertzmodel, with increasing hazard, ﬁts better than the exponential model, constant hazard, above 108 years.

Discussion

The results of the analysis of the newly-available

ISTAT data agree strikingly well with those for the

IDL supercentenarian database and for the women in the

France 2019 data. Once the effects of the sampling frameare taken into account by allowing for truncation and censoring of the ages at death, a model with constanthazard after age 108 ﬁts all three datasets well; it corresponds to a constant probability of 0.49 that a livingperson will survive for one further year, with 95% conﬁdence interval (0.48, 0.51). The power calculationsmake it implausible that there is an upper limit to the human lifespan of 130 years or below.Although many fewer men than women reach high ages, no difference in survival between the sexes isdiscernible in the

ISTAT and the

IDL 2016 data. Survival of men after age 108 is lower in the

France 2019 data, but it seems unlikely that this reﬂects a real difference between France and Italy and between France andthe other countries in the

IDL . It seems more plausible that this effect is due to some form of age bias or is afalse positive caused by multiple testing.If the

ISTAT and

France 2019 data are split by birth cohort, then we ﬁnd roughly constant mortality fromage 105 for those born before the end of 1905, whereas those born in 1906 and later have lower mortality forages 105–107; this explains the cohort effects detected by Barbi et al. (2018). Possibly the mortality plateau isreached later for later cohorts. The plausibility of this hypothesis could be weighed if further high-quality databecome available. preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age A. Supplementary material

This section summarises the methods used to produce ﬁgures and perform our inferences, and adds goodness ofﬁt diagnostics and hazard estimates.

A.1. Reproducibility

The

ISTAT data can be obtained from the Italian National Institute of Statistics by registering at the ContactCenter (https://contact.istat.it) and mentioning the Semisupercentenarian Survey and Marco Marsili as contactperson.LATool is a

MATLAB toolbox for life length analysis that makes alternative analyses possible. It con-sists of three ﬁles that are available from https://doi.org/10.1007/s10687-017-0305-5 , anda data ﬁle, data.mat , which can be downloaded, together with the full IDL database, by registering at . R R Core Team (2020) was used for many of the analyses and to generate all the ﬁgures as well as Tables 1and 3. Standalone code to reproduce the analyses is provided online.

A.2. Truncation and censoring

Figure 1 shows that many persons in the

ISTAT dataset were alive on 31 December 2016, when samplingﬁnished, so their lifetimes must be treated as right-censored. Lifetimes above 105 years on 1 January 2009 areleft-truncated and individuals not aged at least 105 from 1 January 2009 to 31 December 2016 are not included.Both censoring and truncation must be handled correctly to avoid biased inferences; the effect of the truncationis that inferences must be conditioned on the event that an individual appears in the database. Below we outlinehow this is achieved; see Rootzén & Zholud (2017) and Davison (2018) for more details.Consider a sampling interval C = ( b, e ) of calendar time during which individuals aged over a threshold of u years were observed. Let x = age − u denote the excess age of an individual who dies aged older than u , havingreached age u at calendar time t . Assume that the excess ages x are independent with cumulative distributionfunction F ( x ; θ ) , probability density function f ( x ; θ ) , and survival function S ( x, θ ) = 1 − F ( x, θ ) , where θ is a vector of parameters to be estimated.The likelihood contribution for someone who died in C is then f ( x ) S { ( b − t ) + } , x > , whereas that for someone who is known to be alive at the end of C , and thus whose lifetime is censored, is S ( e − t ) S { ( b − t ) + } , t < e. The likelihood function L ( θ ) is the product of the likelihood contributions from all individuals included in thedata under study. Estimates for the generalized Pareto and Gompertz distributions were found by numericalmaximization of the log likelihood, with standard errors obtained from the inverse observed information matrix.Explicit expressions exist for the maximum likelihood estimator of the exponential distribution scale parameterand its standard error, (cid:98) σ e = (cid:80) i { x i − ( b − t i ) + } deaths , se( (cid:98) σ e ) = (cid:98) σ e √ deaths . (3)The IDL and

France 2019 data were left- and right-truncated, and this was taken into account in our analysis;the likelihood contribution for these individuals is f ( x i ) F ( e − t i ) − F { ( b − t i ) + } , b − t i ≤ x i ≤ e − t i . (4) Author’s Original Version preprint , version of October 5, 2020 et al.

105 107 109 111 . . . . age (years) l o c a l h a z a r d

105 107 109 . . . . age (years) l o c a l h a z a r d Figure 4: Local hazard estimates with 95% pointwise conﬁdence intervals for

ISTAT birth cohorts 1886-1905(left) and 1906–1910 (right), with horizontal lines at 0.7. The rightmost point includes all survivors beyond 111years (respectively 109 years).Inappropriate analysis can lead to misleading results: for example, ﬁtting an exponential distribution tothe

ISTAT individuals who survive beyond age 107 without accounting for truncation or censoring gives theestimate (cid:98) σ e = 1 . with 95% conﬁdence interval (1 . , . , to be compared with (cid:98) σ e = 1 . with 95%conﬁdence interval (1 . , . once the truncation and censoring are accounted for; these conﬁdence intervalsdo not overlap. A.3.

ISTAT cohort effects

The local hazard estimates in Figure 4 are obtained by splitting the likelihood contribution of individuals intoyearly blocks, using disjoint intervals with ( b, e ) = ( a, a + 1) years to avoid using the same individuals severaltimes. For the highest interval we set e = ∞ and include all individuals who survived into that interval.The parameter stability plots in Figure 5 and the estimated hazard plots in Figure 4 show roughly constantmortality for the birth cohort 1886–1905 for the entire age range. Mortality is lower for ages 105 and 106 forthe birth cohort 1906–1910, but equals that for the earlier birth cohort at ages 107 and above. This reductionin mortality for the later birth cohort implies that plateauing for the entire ISTAT dataset does not start untilapproximately the age of 108.

A.4. Graphical diagnostics

A quantile-quantile (or QQ-) plot is a standard diagnostic of the ﬁt of a speciﬁed distribution to data, but itmust be modiﬁed to accommodate censoring or truncation. We plot the ordered ages at death, x i , of non-censored ISTAT individuals against plotting positions from (cid:98) F − { (cid:101) G ( x ) } (Waller & Turnbull, 1992), where inour case (cid:98) F − is the quantile function of the exponential distribution ﬁtted to ages exceeding 108 years and (cid:101) G is the Kaplan–Meier estimate of the distribution function, corrected for censoring and truncation (Tsai et al.,1987). To assess the variability of the plot we simulate new ages at death from the ﬁtted exponential distribution,conditioning on the birth dates, truncation time and censoring indicator; this amounts to simulating new lifetimesfrom a doubly truncated exponential distribution, since individuals whose death is observed during the samplingframe cannot exceed the age they would reach on 31 December 2015. The left-hand panel of Figure 6 shows 100such curves, corresponding to x i ( b ) on the y -axis and (cid:98) F b − { (cid:101) G b ( x i ( b ) ) } ; under the parametric bootstrap, both (cid:98) F and (cid:101) G are re-estimated using the simulated samples. The right-hand panel of Figure 6 shows approximate95% pointwise and simultaneous conﬁdence intervals obtained using a simulation envelope for (cid:98) F − b { (cid:101) G b ( x ) } preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age

105 107 109 111 - . . . . threshold (years) γ

105 107 109 111 . . . . threshold (years) σ e - . . . . threshold (years) γ . . . . threshold (years) σ e Figure 5: Parameter stability plots for the

ISTAT birth cohorts 1896–1905 (top) and 1906–1910 (bottom), forthe parameters γ of the generalized Pareto distribution (left) and for the scale σ e of the exponential distribution(right) obtained from exceedances of the age threshold on the x -axis. The plots give maximum likelihoodestimates with (proﬁle) likelihood 95% conﬁdence intervals. The horizontal line on the right panels correspondsto the estimated scale for exceedances above 108 for the full ISTAT dataset.(Davison & Hinkley, 1997, §4.2.4). Despite the discrepancy for individuals dying shortly after age 108, anexponential distribution adequately captures the observed mortality.

A.5. Differences between men and women

The imbalance in the number of men and women limits our ability to detect any effects of gender on mortalityat great age. To illustrate this, we conducted a simulation study based on the

ISTAT data to assess the powerof a likelihood ratio test for conditional lifetime exceedances above 108 years, for which we have 40 men and375 women; the lifetimes of 94 individuals, 15 of them men, are censored. We condition on the sampling frameand the sex of individuals and simulate new life trajectories for both men and women based on an exponentialdistribution with relative scale differing by a ratio λ > , corresponding to lower hazard for women. Thusindividuals who were older than 108 in 2009 survive at least beyond that age, but a simulated lifetime thatextends beyond 2016 is censored. For each of 10 000 simulated samples, we computed the likelihood ratiostatistic for comparison of ﬁts to men and women separately and a combined ﬁt; the statistic has an asymptotic χ distribution. We also considered conditioning only on the birth dates and the beginning of the samplingperiod to assess whether the right-censoring leads to loss of power, but the difference is negligible. Power of80% is achieved if λ ≈ . , corresponding to an average survival, σ e , of 1.85 years for women and 1.14 yearsfor men, with the corresponding differences for powers 20% and 50% being 0.28 years and 0.50 years. The Author’s Original Version preprint , version of October 5, 2020 et al.

108 112 116 theoretical quantiles o b s e r v e d q u a n t i l e s

108 112 116 theoretical quantiles o b s e r v e d q u a n t i l e s Figure 6: Exponential QQ-plot of the

ISTAT data for individuals who lived beyond 108 years and died before2016, with pointwise (dashed) and simultaneous (dotted) 95% conﬁdence intervals obtained through simulation(right). The left panel shows trajectories for simulated lifetimes drawn from the exponential model for observednon-censored observations. The circles give the ordered ages at death of non-censored

ISTAT lifetimes againstthe plotting positions.power of the corresponding test for the

IDL 2016 data is expected to be similar.For the

France 2019 data a likelihood ratio test strongly rejects the hypothesis of equal hazard for womenand men. The ratio of the estimated hazards for men and women in this dataset is approximately 1.61. Thecombined power of the tests in the

ISTAT and

IDL 2016 data for detecting this ratio is approximately . . A.6. Hazard estimates

To construct a local hazard estimate based on the limiting generalized Pareto distribution, we note that thisdistribution has reciprocal hazard function r ( x ) = ( σ + γx ) + , where a + = max( a, for real a . A moreﬂexible functional form is r ( x ) = { σ + γx + g ( x ) } + , where g ( x ) is a smooth function of x that tends to zeroas x increases. For exploratory purposes we take g ( x ) = (cid:80) Kk =1 β k b k ( x ) , where b k ( x ) = ( κ k − x ) , κ < · · · < κ K ; here the κ k are ﬁxed knots, and b k ( x ) = 0 for x ≥ κ k . The resulting cubic spline function g ( x ) has twocontinuous derivatives. This model has K + 2 parameters and corresponds to the generalized Pareto distributionfor x > κ K , but allows r ( x ) to depart from linearity for x ≤ κ K .Lifetime data are recorded to the nearest day and often there are ties for small x , so we use a discreteversion of the model, with x ∈ { , . . . , x max } days, where x max corresponds to 16 years after age 105. We let h ( x ) = 1 /r ( x ) denote the hazard function, set H ( x ) = (cid:80) xz =1 h ( z ) / for compatibility with the continuouscase, and obtain survivor and probability mass functions Pr(

X > x ) = exp {− H ( x ) } and Pr( X = x ) = h ( x ) exp {− H ( x ) } for x ∈ { , . . . , x max } .Let X denote a survival time (days) beyond 105 years. For each individual the available data are of the form ( s, d, x ) , where s = 0 if observation of X began at 105 years and, if not, s > is the age in days above 105at which observation of X began, d = 1 indicates death, X = x , and d = 0 indicates right-censoring, X > x . preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age

105 110 115 120 . . . . age (years) h a z a r d f u n c t i o n ( / y e a r )

105 110 115 120 . . . . age (years) h a z a r d f u n c t i o n ( / y e a r ) Figure 7: Local hazard estimation for the

ISTAT data using a spline approach with ﬁve random knots centered at , . . . , years. Left: original estimate (heavy solid line) with pointwise (dotted) and overall (dashed) 95%envelopes obtained from bootstrap replicates, of which are displayed (grey solid lines). The meanpositions of the knots are shown by the dashed vertical lines. The black rug shows (jittered) times of deaths,and the red rug shows (jittered) censored survival times; the rugs are suppressed for the lower ages, as theretoo many points to be informative. The right-hand panel shows the same output for the best-ﬁtting exponentialmodel, for which γ = 0 .The likelihood is then a product of terms of the form Pr( X = x ) d × Pr(

X > x ) − d Pr(

X > s ) , x > s, and depends on the parameters σ, γ, β , . . . , β K , estimates of which are readily obtained by numerical maxi-mization of the log likelihood function. The resulting ﬁt depends on the knots κ , . . . , κ K , but to reduce thisdependence we generate knots at random, roughly evenly spaced in an interval (0 , x max ) , where x max is cho-sen large enough that r ( x ) should be linear for x > x max , i.e., the generalized Pareto model is ﬁtted when x > x max .The left-hand panel of Figure 7 shows a local estimate of the hazard function for the ISTAT data, constrainedto have the form of (2) above 110 years. The hazard dips up to 108 years or so, then rises and declines slowly.To assess the signiﬁcance of this decline we performed a bootstrap analysis Efron & Tibshirani (1993); Davison& Hinkley (1997), generating 5000 replicate datasets, the hazard function estimates for 100 of which are shownin the panel. These suggest that the slow decrease after age 110 is not signiﬁcant, and this is conﬁrmed by thepointwise and overall 95% conﬁdence bands. The initial dip seems to be a genuine feature, but above 108 yearsthe conﬁdence bands include a wide range of possible functions, including a constant hazard.

A.7. Power

To assess the statistical power of our procedures, we compute the maximum proﬁle likelihood estimates (cid:98) σ γ for the original data with γ ﬁxed at the values − . , . . . , and simulate new excess lifetimes for each γ ,conditioning on the sampling frame. In the ISTAT data, for example, we calculate the ages of all individualson January 1st, 2009, retain the dates at which they reach 108 years and sample new excess lifetimes from thegeneralized Pareto distribution with parameters ( γ, (cid:98) σ γ ) , right-censoring any simulated lifetimes beyond the end Author’s Original Version preprint , version of October 5, 2020 et al. of 2016. This ensures that the power calculations are as relevant as possible, not only in terms of the samplingscheme but also in terms of the underlying parameters. For each simulated dataset we compute the directedlikelihood ratio statistic r = sign( (cid:98) γ − γ ) { (cid:96) ( (cid:98) γ, (cid:98) σ ) − (cid:96) (0 , (cid:101) σ e ) } / and the Wald statistic w = (cid:98) γ/ se ( (cid:98) γ ) fortesting the null hypothesis γ = 0 against the alternative γ < , which corresponds to testing for exponentialityagainst possible upper bounds to the human lifespan. The asymptotic null distribution of the statistics r and w is standard normal and we can assess the quality of this approximation by calculating the Wald statistics when γ = 0 . These simulations show that the estimator of the shape parameter is unbiased and normally distributed,but the distribution of the Wald statistics is left-skewed, leading to inﬂated Type I error. For the power study,we thus use the simulated null distribution for comparison; tests performed in the paper should be consideredliberal, meaning their level would be higher than that claimed.We proceed similarly for the endpoint ι . In order to simulate from generalized Pareto distributions with ι ﬁxed at a given value, we reparametrise the generalized Pareto log likelihood function in terms of ι and σ andestimate the scale parameters (cid:98) σ ι for each of the original three datasets with ι ﬁxed, then use the relation ι = u − σ/γ to obtain the three implied shape parameters (cid:98) γ ι . We then simulate new datasets with the sampling schemedescribed above, but using the three sets of parameter pairs ( (cid:98) γ ι , (cid:98) σ ι ) . For the joint test of the null hypothesisof an exponential distribution, ι = ∞ , against the alternative of a common but ﬁnite ι , we reparametrize thelikelihood in terms of ι and allow the three datasets to have different values of σ .The left-hand panel of Figure 3 shows how the empirical proportion of rejections for a test of nominal size based on the directed likelihood root statistic r = −{ (cid:96) gp ( (cid:98) ι, (cid:98) σ ) − (cid:96) exp ( (cid:101) σ e ) } / varies with ι = u − σ/γ for γ < , for the ISTAT , France 2019 and the rest of the

IDL 2016 data. Here ( (cid:98) ι, (cid:98) σ ) are the maximumlikelihood estimates for the generalized Pareto distribution parametrized in terms of a common ﬁnite ι and threescale parameters and (cid:101) σ e are the maximum likelihood estimates of the three exponential scale parameters underthe null hypothesis. A.8. Gompertz model

The reciprocal hazard function of the Gompertz distribution r ( x ) = σ exp( − βx/σ ) encodes the speed ofconvergence to the limiting extreme value distribution (Smith, 1987); even if β > , r (cid:48) ( x ) = β exp( − βx/σ ) → exponentially fast as x → ∞ . Any improvement in the ﬁt of the Gompertz model for exceedances over somethreshold would be shown by evidence that β > .Table 3 summarises the ﬁt of this model for various thresholds and the ISTAT data, without sex or cohorteffects. The hypothesis that the Gompertz distribution ( β > ) reduces to an exponential distribution ( β = 0 )is a boundary hypothesis, so the likelihood ratio statistic to test β = 0 does not have the usual approximate χ distribution; its large-sample distribution is a 50:50 mixture of a point mass at 0 and a χ distribution, sometimeswritten as χ + χ (Self & Liang, 1987). Barbi et al. Barbi et al. (2018) do not notice this, leading them tomis-state the signiﬁcance of the difference of log-likelihoods in their Table 2 — the likelihood ratio statistic fortesting β = 0 against β > is w = 2 × .

292 = 0 . , and Pr( χ > . ≈ . , which is the signiﬁcancelevel quoted in Barbi et al. (2018). In fact the correct (asymptotic) level would be Pr( χ > . ≈ . .Here the conclusion does not change, but it might in other contexts.In general, p -values obtained using a parametric bootstrap are likely to be more reliable than asymptoticapproximations of the form χ + χ and are preferable for such comparisons. In the present case thisentails simulating independent datasets like the original data from the boundary exponential distribution (thenull hypothesis, β = 0 ), and estimating the p -value by the empirical proportion of these simulated datasets inwhich the likelihood ratio statistics w ∗ are no smaller than the original value w , i.e., (cid:99) Pr ∗ ( w ∗ ≥ w ) , where theasterisk indicates a parametric bootstrap simulation and the subscript indicates that the simulation is under thenull hypothesis; see Chapter 4 of Davison & Hinkley (1997). This approach was used to obtain the p -values inTable 3, which show that the exponential model is statistically indistinguishable from the Gompertz model from preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age threshold

105 106 107 108 109 110

ISTAT n u σ .

67 (0 .

05) 1 .

71 (0 .

07) 1 .

48 (0 .

08) 1 .

47 (0 .

12) 1 .

36 (0 .

1) 1 .

35 (0 . β .

05 (0 .

02) 0 .

09 (0 .

04) 0 .

02 (0 .

05) 0 .

02 (0 .

07) 0 0 p -value .

02 0 .

01 0 .

37 0 .

45 1 1

France 2019 n u σ .

71 (0 .

03) 1 . .

03) 1 .

55 (0 .

05) 1 .

43 (0 .

06) 1 .

39 (0 . β .

08 (0 .

01) 0 .

05 (0 .

02) 0 .

06 (0 .

03) 0 .

02 (0 .

03) 0 0 p -value .

02 0 .

32 1 1

Table 3: Estimates (standard errors) of Gompertz parameters ( β , σ ) for the Italian ISTAT data (top) and the

France 2019 data (bottom) for different thresholds. The ﬁrst row give the number of threshold exceedances( n u ), and the bootstrap p -values are for the likelihood ratio test of β = 0 against β > . Estimates of β reportedas zero are smaller than − . threshold

105 106 107 108 109 110

Gompertz .

01 1 .

65 2 .

21 0 .

22 0 .

68 1 . GP .

00 2 .

17 2 .

21 0 .

23 0 .

56 1 . exponential .

16 8 .

21 2 .

36 0 .

28 0 .

68 1 . Table 4: Deviances for comparison of extended generalized Pareto (GP) and the exponential, GP and Gompertzsub-models for different thresholds. The rows show the likelihood ratio statistic, i.e., twice the difference inlog-likelihood between the speciﬁed model and an encompassing extended generalized Pareto model.108 years onwards, though the Gompertz model ﬁts better at 106 years and below for the

ISTAT data and at 107years and below for the

France 2019 data.Table 4 compares the ﬁts of the Gompertz, generalized Pareto and exponential models to the

ISTAT data,with the baseline taken to be an extended generalized Pareto distribution that encompasses all three other mod-els; the details will be reported elsewhere. The generalized Pareto and Gompertz models ﬁt equally well forall thresholds, since the differences between their respective deviances are minimal. Both are better than theexponential model below 107 years, but not above, in agreement with Table 3.

Author’s Original Version preprint , version of October 5, 2020 et al.

References

Barbi, E., Lagona, F., Marsili, M., Vaupel, J. W., & Wachter, K. W. (2018). The plateau of human mortality:Demography of longevity pioneers.

Science , 360(6396), 1459–1461.Davison, A. C. (2018). ‘The life of man, solitary, poore, nasty, brutish, and short’: Discussion of the paper byRootzén and Zholud.

Extremes , 21(3), 365–372.Davison, A. C. & Hinkley, D. V. (1997).

Bootstrap Methods and Their Application . Cambridge: CambridgeUniversity Press.Davison, A. C. & Smith, R. L. (1990). Models for exceedances over high thresholds (with Discussion).

Journalof the Royal Statistical Society: Series B (Statistical Methodology) , 52(3), 393–442.de Haan, L. & Ferreira, A. (2006).

Extreme Value Theory: An Introduction . Springer.Efron, B. & Tibshirani, R. J. (1993).

An Introduction to the Bootstrap . New York: Chapman & Hall.Einmahl, J. J., Einmahl, J. H. J., & de Haan, L. (2019). Limits to human life span through extreme value theory.

J. Amer. Statist. Assoc. , 114, 1075–1080.Gampe, J. (2010). Human mortality beyond age 110. In H. Maier & et al. (Eds.),

Supercentenarians chapter 13,(pp. 219–230). Heidelberg: Springer-Verlag.Gavrilov, L. A. & Gavrilova, N. S. (2019). Late-life mortality is underestimated because of data errors.

PLOSBiology , 17(2), 1–7.Gompertz, B. (1825). On the nature of the function expressive of the law of human mortality, and on a new modeof determining the value of life contingencies.

Philosophical Transactions of the Royal Society of London ,115, 513–585.Guarino, B. (2018). Good news for human life spans — at age 105, death rates suddenly stop going up.

Washington Post , (pp. June 28).Hanayama, N. & Sibuya, M. (2016). Estimating the upper limit of lifetime probability distribution, based ondata of Japanese centenarians.

J. Gerontol. A Biol. Sci. Med. Sci. , 71(8), 1014–1021.Hoad, P. (2019). ‘People are caught up in magical thinking’: was the oldest woman in the world a fraud?

TheGuardian , (pp. November 30).ISTAT (2018). Italian database on semisupercentenarians. See paper by Barbi et al. for instructions on how toobtain the data.Oeppen, J. & Vaupel, J. W. (2002). Broken limits to life expectancy.

Science , 296, 1029–1031.Poulain, M. (2010). On the age validation of supercentenarians. In H. Maier & et al. (Eds.),

Supercentenarians chapter 1, (pp. 4–30). Heidelberg: Springer-Verlag.R Core Team (2020).

R: A Language and Environment for Statistical Computing . R Foundation for StatisticalComputing, Vienna, Austria.Rootzén, H. & Zholud, D. (2017). Human life is unlimited — but short (with Discussion).

Extremes , 20(4),713–728.Rootzén, H. & Zholud, D. (2018). Rejoinder to discussion of the paper “Human life is unlimited — but short”.

Extremes , 21, 415–424.Saplakoglu, Y. (2018). Have humans reached their limit on life span? These researchers say no.

Live Science ,(pp. June 28).Scarrott, C. & MacDonald, A. (2012). A review of extreme-value threshold estimation and uncertainty quan-tiﬁcation.

REVSTAT – Statistical Journal , 10, 33–60.Self, S. G. & Liang, K. Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratiotests under nonstandard conditions.

Journal of the American Statistical Association , 82, 605–610.Smith, R. L. (1987). Approximations in extreme value theory. Technical Report 205, Center for StochasticProcesses, University of North Carolina at Chapel Hill. preprint , version of October 5, 2020 Author’s Original Version uman mortality at extreme age

Journal of the Royal Statistical Society, Series A , 162, 5–43.Tsai, W.-Y., Jewell, N. P., & Wang, M.-C. (1987). A note on the product-limit estimator under right censoringand left truncation.

Biometrika , 74(4), 883–886.Vijg, J. & Campisi, J. (2008). Puzzles, promises, and a cure for ageing.

Nature , 544, 1065–1071.Waller, L. A. & Turnbull, B. W. (1992). Probability plotting with censored data.

American Statistician , 46(1),5–12.

Author’s Original Version preprintpreprint