[PDF] Temporal models for demographic and global health outcomes in multiple populations: Introducing a new framework to review and standardize documentation of model assumptions and facilitate model comparison

Abstract

There is growing interest in producing estimates of demographic and global health indicators in populations with limited data. Statistical models are needed to combine data from multiple data sources into estimates and projections with uncertainty. Diverse modeling approaches have been applied to this problem, making comparisons between models difficult. We propose a model class, Temporal Models for Multiple Populations (TMMPs), to facilitate documentation of model assumptions in a standardized way and comparison across models. The class distinguishes between latent trends and the observed data, which may be noisy or exhibit systematic biases. We provide general formulations of the process model, which describes the latent trend of the indicator of interest. We show how existing models for a variety of indicators can be written as TMMPs and how the TMMP-based description can be used to compare and contrast model assumptions. We end with a discussion of outstanding questions and future directions.

Full PDF

TTemporal models for demographic and global health outcomes inmultiple populations: Introducing a new framework to review andstandardize documentation of model assumptions and facilitatemodel comparison

Herbert Susmann , Monica Alexander , and Leontine Alkema ∗ Department of Biostatistics and EpidemiologyUniversity of Massachusetts Amherst715 North Pleasant StreetAmherst, MA 01003 Departments of Statistical Sciences and SociologyUniversity of Toronto700 University AvenueToronto, ON M5G 1Z5February 22, 2021

Abstract

There is growing interest in producing estimates of demographic and global health indicators in populationswith limited data. Statistical models are needed to combine data from multiple data sources into estimatesand projections with uncertainty. Diverse modeling approaches have been applied to this problem, makingcomparisons between models diﬃcult. We propose a model class, Temporal Models for Multiple Populations(TMMPs), to facilitate documentation of model assumptions in a standardized way and comparison acrossmodels. The class distinguishes between latent trends and the observed data, which may be noisy or exhibitsystematic biases. We provide general formulations of the process model, which describes the latent trend ofthe indicator of interest. We show how existing models for a variety of indicators can be written as TMMPsand how the TMMP-based description can be used to compare and contrast model assumptions. We endwith a discussion of outstanding questions and future directions. ∗ The work was supported by the Bill & Melinda Gates Foundation. Contact: Herbert (Herb) Susmann ([email protected]). This work is licensed under CC-BY 4.0. a r X i v : . [ s t a t . M E ] F e b Introduction

Population-level measures of demographic and health indicators over time are essential to identify whichpopulations are the most disadvantaged, where progress has been made or is stalling, and help to informresource allocation. Important indicators include measures of health and mortality at diﬀerent ages, fertility,family planning measures and migration. Assessing changes in quantities over time is useful for both cross-country and within-country comparisons to see how outcomes have changed in the past and how they arelikely to change in future.In practice, producing estimates and projections of demographic and health indicators may not bestraightforward due to the lack of good-quality data. Well-functioning civil registration and vital statis-tics systems (CRVS), which are usually the most reliable source of demographic information, do not existin most countries worldwide, making reliable data on births, deaths, population and other health outcomesdiﬃcult to obtain. This is particularly the case in low-income countries, where the burden of mortality isoften the highest. In many cases, there are missing observations over time, and in some populations we mayhave no observations. Data that do exist are sometimes of very poor quality, and there may be issues rec-onciling diﬀerent observations of the same outcome. For example, in the absence of CRVS data, large-scalenational surveys such as the Demographic and Health Surveys and Multiple Indicator Cluster Survey (Croftet al., 2018; Khan and Hancioglu, 2019) are common sources of information about health and mortality;however, even if the surveys cover an overlapping period, measurements derived from the two sources are notnecessarily in agreement. Figure 1 illustrates this problem by showing the data used for estimating under-ﬁvemortality rates in Senegal, which come from a variety of data sources and are often not in agreement withone another.As such, statistical methods are usually required to model indicators over time because the data weobserve are incomplete or paint an imperfect picture of what we are actually trying to estimate. Statisticalmodels for the temporal estimation of demographic and health indicators have been used for decades, withsome of the earliest eﬀorts producing stochastic forecasts of mortality and fertility in the US (Lee andCarter, 1992; Lee and Tuljapurkar, 1994). As computational speed and power has increased, so too has thecomplexity of models. In the global health context, the era of monitoring progress towards DevelopmentGoals has elicited the creation of statistical models to capture non-linear trends in a wide range of indicatorsacross a variety of data-sparse situations; see for example work in estimating under-5 mortality rates (Alkemaand New, 2014; You et al., 2015; Dicker et al., 2018; Li et al., 2019) and maternal mortality (Alkema et al.,2016, 2017; Kassebaum et al., 2016). In addition, as the focus of studying population-level outcomes hasshifted to understanding inequalities across key sub-groups, the methods used to monitor trends at the macrolevel have also been applied to more ﬁner-grained geographies, particularly where issues of small populationsbecome apparent (Alexander et al., 2017; Burstein et al., 2019; Li et al., 2019). Statistical methods toaccount for and smooth temporal variation used are wide ranging, and include ARIMA models (Alkemaet al., 2017), penalized splines regression (Alexander and Alkema, 2018), and Gaussian process regression(Dicker et al., 2018). Models may be hierarchical, spatial, and include explanatory covariates.On the outset, the statistical techniques used to model important demographic and health indicatorsappear to be diverse and highly dependent on the speciﬁc outcome being estimated, the background of theresearchers, and the intended audience and users of the estimates. However, when viewed in the broadercontext of common modeling goals — to produce reliable estimates and projections over time, accountingfor missing data and multiple data sources, and to give an idea of the level of uncertainty in estimates andprojections — a large set of models can be explained in reference to a general model class.2n this paper, we propose such a model class, “Temporal Models for Multiple Populations” (TMMPs), thatencompasses many existing demographic and health models. The model class has two main components:a process model and a data model. The aim of the process model is to capture trends over time in theunderlying outcome of interest, which is itself unobserved. The process model is composed of a systematicand covariate parts, which may capture the relationship between the outcome and covariates, and a temporalsmoothing component, which allows for data-driven trends over time. The data model relates the observeddata to the underlying latent outcome of interest, taking into account diﬀerent types of error that may bepresent in diﬀerent data sources. Considering models in this class makes clear the distinction between thegoal of capturing systematic, underlying trends in the ‘true’ outcome of interest, and the goal of accountingand adjusting for diﬀerent types of measurement error in observations.After introducing statistical notation for the TMMP model class, we describe several existing modelsof a range of health and demographic indicators to illustrate how they ﬁt into our proposed model class,and how these standardized descriptions can be used for comparison across models. The ﬁrst two modelsdescribed both aim to produce under-ﬁve country-level estimates of child mortality, but employ markedlydiﬀerent strategies (Alkema and New, 2014; Dicker et al., 2018). Other models discussed are: a model ofcontraceptive use rates, which provides an example of a model that captures a transition process (Cahillet al., 2018); a model to estimate neonatal mortality rates globally, which includes a strong associationwith an underlying covariate (Alexander and Alkema, 2018); estimating maternal mortality rates, whichcombines a multilevel mixed model and an ARIMA time series model (Alkema et al., 2017); and a modelof sub-national age-speciﬁc mortality, which shows how a more traditional demographic modeling approachcan be expressed in our model class (Alexander et al., 2017).The rest of this paper is structured as follows. Section 2 formalizes the setting and describes the corestatistical task at hand. Section 3 describes two existing models for the same indicator, illustrating howtheir original notation makes it diﬃcult to directly compare their modeling assumptions and motivating theuse of a unifying modelling class. Section 4 presents an overview of the proposed framework for TMMPs,introducing the distinction between process and data models, before examining the process model in moredetail and reviewing strategies for parameter estimation, including hierarchical modeling. Section 5 returnsto the models of Section 3, casting them in our model class to show how it facilitates their comparison.Additional detailed examples are included in Section 6. We discuss open problems and future directions forresearch in Section 7.

In this section we describe the overarching statistical problem that the models in our proposed class intendto solve. They are designed to produce estimates and future projections of a particular outcome for a setof populations over a certain time period. For example, the modellers may be interested in estimating andprojecting child mortality for every UN-member country over the period 1990 until the current year.In general, the outcome of interest could be any population-level demographic or health indicator. Thepopulations of interest could include national, regional or subnational populations, or population subgroupsbased on some other characteristic (e.g. sex, wealth quintile or education); the only commonality is thatindicators for multiple populations are being modeled within the same framework.The time period of interest can range any possible timespan, but usually includes a period beyond themost recent observation year in which case projections need to be made. Longer term projections may be3f interest as well. For example, a common future year for projections is 2030, which is the target year forreaching the Sustainable Development Goals.In many contexts, there may not be a complete set of observations of the outcome of interest for all timepoints and populations. Additionally, data sources may be of diﬀering quality, with varying levels of randomerror and systematic biases. The overall goal of a model is to use the available data to produce a full timeseries of the outcome for all populations of interest. Models need to be able to account for these errors andbiases when producing estimates for years with observations, and be structured such that estimates can alsobe obtained for years without observations. Figure 1 illustrates the problem by showing the data availablefor estimating the under-ﬁve mortality rate in Senegal from 1950-2019 and compares the resulting estimatesfrom two models.Figure 1: Comparison of estimates of the under-ﬁve mortality rate in Senegal from 1950-2019 from themost recent estimates available from two models: the Global Burden of Disease (GBD) Model from theInstitute of Health Metrics and Evaluation (IHME), and the B3 model from the UN Inter-agency Group forChild Mortality Estimation (IGME). While both models generate estimates of the same indicator, they havediﬀerent data inclusion criteria and modeling assumptions, leading to diverging estimates. (A) Raw dataand model estimates from the IHME Global Burden of Disease Study; source: healthdata.org , accessed12/22/2020. (B) Raw data and model estimates from the UN IGME B3 model; source: childmortality.org , accessed 12/22/2020. (C) Comparison of estimates from each model on the same scale.Formally, let t = 1 , . . . , T index the time period and c = 1 , . . . , C index the populations of interest. Thegoal is to estimate the ‘true’ value of the indicator in each population at every time point, which we denote η c,t . The estimates should be probabilistic and accurately describe the uncertainty in the estimates of the4rue values. To inform the model there are observations y i , i = 1 , . . . , N of the indicator. Each of theseobservations is associated with a time point t [ i ] and population c [ i ]. There may be multiple observations forthe same population and time point, as well as populations with few or no observations.The goal in this paper is to propose a framework that uniﬁes the modeling approaches used to describethe evolution of the underlying outcome of interest over time, η c,t . A number of diverse modeling approacheshave been applied to this problem. While all of the approaches we will consider generate probabilisticestimates, the way they go about it can be quite diﬀerent. Without the additional structure provided by amodel class, it can be diﬃcult to compare across existing approaches. In the next section, we illustrate thisdiﬃculty by describing two existing models for the same indicator using their original notation, in which itis not obvious how to make direct comparisons between models. We ﬁrst motivate the need for a model class to compare TMMPs by illustrating two important modelsthat estimate the under-ﬁve mortality rate (U5MR) at the same geographic level and for the same timeperiod. The U5MR, which is deﬁned as the number of deaths to children under the age of ﬁve per 1,000 livebirths, is one of the most important indicators of child mortality, forming the basis of one of the MillenniumDevelopment Goals (MDGs) and now one of the Sustainable Development Goals (SDGs). In particular,SDG 3.2 stipulates that by 2030, all countries should reduce U5MR to at least as low as 25 per 1,000 livebirths. As a consequence of the focus on U5MR in the global health community, much work has been done onestimating and projecting trends in U5MR over time, as well as trying to understand these trends (Alkemaet al., 2014; Dicker et al., 2018; IGME, 2020; Li et al., 2020; Wang et al., 2017; You et al., 2015).In terms of producing reliable estimates and projections for U5MR, one of the main challenges occurswith trying to reconcile multiple, incomplete data sources in low-income countries, ﬁlling in gaps in thetime series and giving reasonable bounds of uncertainty in estimates and projections. Two main groupshave formulated models to estimate U5MR for all countries. The ﬁrst is a group at the Institute of Healthand Metrics Evaluation (IHME), as part of their work on the Global Burden of Disease (GBD) Study(Dicker et al., 2018). The second is the United Nations Interagency Group on Child Mortality Estimation(UN-IGME), whose model forms the basis of the oﬃcial U5MR estimates produced by UNICEF (Alkemaand New, 2014). The models produce diﬀering estimates of U5MR: Figure 1 compares the most recentmodel estimates covering 1950-2019 in Senegal as an example. Diﬀerences in the estimates could be due todiﬀerences in the data chosen for inclusion, diﬀerences in data processing, or diﬀerences in model assumptions(Alkema and You, 2012).We will brieﬂy describe these models following their original presentation to show how the use of diﬀerentnotational conventions makes them appear superﬁcially diﬀerent. Then, after we introduce our model class,we will return to these examples to show how casting them in our notation facilitates their comparison. InSection 6 we will expand the scope beyond these two models and show how several other existing models ﬁtinto the model class.

The GBD model (Dicker et al., 2018) has three modeling stages. The ﬁrst two stages preprocess the observeddata and the third stage provides the ﬁnal estimates of U5MR. In the ﬁrst stage, a nonlinear mixed eﬀectsregression model is ﬁt to the U5MR observations, using lag-distributed income per capita (LDI), mean years5f education for women of reproductive age (15-49 years), and HIV death rate in ages 0-4, as covariates: y i = exp h ( β + γ ,c [ i ] ) log( x LDIc [ i ] ,t [ i ] ) + ( β + γ ,c [ i ] ) x educc [ i ] ,t [ i ] + γ c [ i ] + γ c [ i ] ,s [ i ] + α type [ i ] i +( β + γ ,c [ i ] ) x HIVc [ i ] ,t [ i ] + (cid:15) c [ i ] ,t [ i ] ,s [ i ] , where c [ i ], t [ i ], and type [ i ] are the country, year, and source type of observation i , respectively, and α type [ i ] refers to a random eﬀect to capture data-type-speciﬁc errors. Bias adjusted data points y adjustedi werederived from the ﬁtted coeﬃcients and mixed eﬀects according to the following formula: y adjustedi = exp h ( β + γ ,c [ i ] ) log( x LDIc [ i ] ,t [ i ] + ( β + γ ,c [ i ] ) x educc [ i ] ,t [ i ] + γ c [ i ] + γ c [ i ] ,ref + α ref,c [ i ] i + ( β + γ ,c [ i ]) x HIVc [ i ] ,t [ i ] + (cid:15) c [ i ] ,t [ i ] ,s [ i ] , where γ ref and α ref are the random eﬀects from one or more reference data sources that are deemed tobe more reliable than other sources by the study team. In the second modeling stage, predicted values arecalculated by the formula y predictedi = exp (cid:16) β log( x LDIc [ i ] ,t [ i ] ) + β x educc [ i ] ,t [ i ] + α (cid:17) + β x HIVc [ i ] ,t [ i ] , which is the same as the formula for the ﬁrst stage, but without any random eﬀects. The diﬀerence between y predicted and y adjusted was smoothed using a tricubic window to yield estimates in every country and year.Then a weighted combination of the smoothed diﬀerences and a local linear ﬁt was used to yield estimatedresiduals (see Dicker et al. (2018) page 16-17 in Supplementary material 1; note that no mathematicaldescription is provided).In the third modeling stage, a Gaussian process is used to provide an additional level of temporal smooth-ing. The observed data are assumed to be normally distributed around the true value of the indicator onthe log10-transformed scale: log y i | η c [ i ] ,t [ i ] ∼ N (log η c [ i ] ,t [ i ] , s i ) . The observation variance s i is ﬁxed depending on the data source. The true U5MR values are smoothed viaa Gaussian Process, again on the log10-transformed scale:log η c | µ c , Σ c ∼ M V N ( µ c , Σ c ) , where the mean of the process µ c are the log10-transformed y predicted plus the smoothed residuals from thesecond modeling stage, and Σ c is a covariance matrix derived from a Mat´ern autocovariance function withﬁxed hyperparameters chosen based on a measure of data availability within each country. The model used by the UN-IGME is referred to the ‘B3’ model, which refers to the fact that the model isa Bayesian, bias-adjusted B-Splines model (Alkema and New (2014)). This model set-up also assumes that The equation supplied by the authors does not include a random intercept γ ,c [ i ] for the HIV covariate; we did not receiveclariﬁcation from the corresponding author, so we took this omission to be an error and included the random eﬀect. y i = log( η c [ i ] ,t [ i ] ) + δ i , where error δ i is normally distributed with mean and variance depending on the data source.For complete vital registration systems, the error term is normally distributed with zero mean and aﬁxed stochastic variance v i : δ i ∼ N (0 , v i ) . For survey observations or data from other sources, the sampling variance is decomposed into a ﬁxed samplingerror and estimated non-sampling error. Systematic biases are also estimated, either as a linear trend fordata sources with multiple observations or as an oﬀset otherwise: δ i | Ψ i , Σ i ∼ N (Ψ i , Ω i + v i ) , where Ψ i is a systematic bias parameter and Ω i is an estimated non-sampling error.The true U5MR values η c,t are smoothed via a cubic B-spline model:log( η c,t ) = K c X k =1 b c,k ( t ) α c,k , where K c are the number of knots in country c , b c,k is the k th spline value in country c , and α c,k is the k th spline coeﬃcient. The spline coeﬃcients during the observation period are parameterized as following alinear trend with random walk deviations away from that trend: α c,k = λ c, + λ c, ( k − K c /

2) + [ D K c ( D K c D K c ) − ε c ] k where λ c, and λ c, are the unknown level and slope parameters for the spline coeﬃcients in country c andparameter vector ε c = ( ε c, , . . . , ε c,Q c ) contains the Q c = K c − ε c,q = ( α c,q +2 − α c,q +1 ) − ( α c,q +1 − α c,q ) for q = 1 , . . . , Q c ; [ D K c ( D K c D K c ) − ε c ] k refers tothe k -th element of vector D K c ( D K c D K c ) − ε c , with known diﬀerence matrix D K c (deﬁned by D K c ,i,i = D K c ,i,i +2 = 1, D K c ,i,i +1 = − D K c ,i,j = 0 otherwise). Vague priors were used for the λ c, ’s and the λ c, ’s. Second-order diﬀerences are penalized by imposing ε c,q | σ c ∼ N (0 , σ c ) , for q = 1 , . . . , Q c , where variance σ c determines the extent of smoothing; a smaller variance corresponds to smoother trajecto-ries. A multilevel model is placed on the smoothing parameters, i.e. the standard deviation of ε c,q , to shareinformation between countries: log( σ c ) | χ, ϕ σ ∼ N ( χ, ϕ σ ) , where χ and ϕ σ refer to the mean and variance of the log-transformed standard deviations.UN IGME uses B3 to produce estimates of U5MR up to the most recent year. A logarithmic poolingapproach was used to produce estimates past the most recent observation year to combine country-speciﬁc7osterior predictive distributions (PPDs) for changes in spline coeﬃcients with a global PPD. This procedurewas applied to modify the PPDs for α c,k for k = K c , K c + 1 , . . . , P c , where K c and P c refer to the indices ofthe most recent splines in the observation and projection periods respectively. The approach is summarized(leaving out indices referring to posterior samples) as follows: ε c,K c + a | Γ c,K c + a , Θ c,K c + a ∼ N (Γ c,K c + a , Θ c,K c + a ) , for a ≥ , where Γ c,k = W · G + (1 − W ) · ε c,k − , Θ c,k = W · V + (1 − W ) · Θ c,k − , with G and V equal to the median and variance of the estimates of past 2nd-order diﬀerences ˆ ε C, K c ’srespectively, and Θ c,K c − = σ c . The overall pooling weight 0 ≤ W ≤ Based on these overviews we see immediately that each of the two models take diverging approaches tomodeling U5MR. They make diﬀerent choices about use of covariates and how to impose smoothness on theﬁnal estimates. For example, while the GBD model uses covariates derived from other data sources, such asLDI and education levels, the B3 model relies solely on data-driven trends that are smoothed using penalizedB-splines. However, it is less clear how to systematically compare the two. The three modeling stages ofthe GBD model make it diﬃcult to compare directly to the integrated modeling approach of the B3 model.We would like to be able to compare the assumptions each models makes about how the true value of theindicator evolves, its relationship to the observed data, and how information is shared between countries.The goal of our proposed model class for TMMPs is to aid in making such direct comparisons.Throughout the paper, in addition to illustrating how the two U5MR models can be re-expressed withinthe TMMP framework, we will also call on other example models from the demographic and health estimationliterature, including models to estimate maternal mortality (Alkema et al., 2017), neonatal mortality (Alexan-der and Alkema, 2018), contraceptive use (Cahill et al., 2018) and age-speciﬁc mortality rates (Alexanderet al., 2017). Each of these examples highlights important aspects of the general TMMP formalization indiﬀerent settings.

In this section we present a general notation for describing a modeling framework for TMMPs. As a generalrule, we use Greek letters for unknown parameters and Latin letters for ﬁxed values. For example, σ for anunknown variance to be estimated, while s for a ﬁxed sampling error variance. Bold face is used to denotematrices.The framework is organized around a distinction between observed data and latent trends. The observeddata are noisy, possibly biased observations of the latent trend. Let η c,t be the latent true value of theindicator in country c at time t , which we emphasise is an unobserved parameter that needs to be estimated.Under this notation, modeling requires specifying a relationship between the observed y i and latent η c,t ,8hich we refer to as the data model (also often referred to in the literature as the likelihood), and describingthe temporal evolution of the latent trend, which we refer to as the process model (Figure 2). η c,t η c,t +1 η c,t +2 η c,t +3 y y ProcessModelData ModelFigure 2: The model class distinguishes between the true values of an indicator η c,t and the noisy observeddata y i . The process model describes the evolution of the true values, and the data model describes how theobserved data are generated from the true values. This structure handles missing data naturally: the latenttrend is modeled for all time points by the process model, and it is possible that only some time points haveobserved values. In this example, observed data only exist for times t and t + 2.Our proposed model class draws on the rich literature on models involving latent structures and mea-surement error. Measurement error models acknowledge uncertainty and bias in observed data Carroll et al.(2006), as we do in our model class. State space models also make a distinction between noisy observationsand an underlying latent process (Chatﬁeld and Xing, 2019). There is a vast literature on ﬂexible anddata-driven modeling techniques, especially for time-series, which we draw on in our framework (West andHarrison, 2006). Our model class tailors these broad existing approaches to the context of demographicand health indicators by providing an overarching structure. This provides a principled way of combiningthe ideas behind measurement error models, state space models, and smoothing models into interpretablemodels suited for the application of interest.The process model describes the evolution of the η s. The process model is separated into covariate,systematic, oﬀset, and smoothing components: g ( η c,t ) = g ( X c,t , β c ) + g ( t, η c,s = t , α c ) + a c,t + (cid:15) c,t , (1)where • g is a transformation of η c,t . • g ( X c,t , β c ) is the covariate component, a function of covariates X c,t and parameter vector β c . • g ( t, η c,s = t , α c ) refers to a systematic temporal component, a mathematical function which describesa parametric trend over time, that may depend on prior or future values η c,s = t . • a c,t is a ﬁxed oﬀset which may be used to incorporate information from models estimated in a separateprocess. • (cid:15) c,t is a deviation term to allow for data-driven deviations from the trends described by g , g , and theﬁxed oﬀset. The vector (cid:15) c is decomposed as (cid:15) c = Bδ c , where B is a full rank matrix that can be used9o transform (cid:15) to a lower-dimensional space, e.g. with a B-spline basis approach. The term satisﬁes E ( r (cid:15) c,t ) = 0 for some r ≥

0, and, if r > X k ∈K d,c d δ c,k = 0 , for 0 ≤ d ≤ ( r − , for set of indices K d,c .The process model controls the behavior of the model particularly in projections and data sparse timeperiods. The chosen form of the process model aﬀects not only the central estimate of projections and periodswith missing data, but also the estimated uncertainty around those point estimates. While the process modelin Eq. 1 is speciﬁed as a marginal model for one indicator η at time t , correlation structures over time, space,or other dimensions can be incorporated through the speciﬁcation of the process model components. We willconsider each component of the process model separately with attention to their contribution to the overallbehavior of a model.The data model connects the latent trends of the process model to the observed data. The data modelshould allow for diﬀerent response distributions, data transformation, and biases depending on the observa-tion. This ﬂexibility would allow us to model diﬀering sampling errors, non-sampling errors, and systematicbiases depending on the data source of each observation. For example, a logit-normal data can be expressedas: logit( y i ) | η c [ i ] ,t [ i ] , σ i ∼ N ( η c [ i ] ,t [ i ] , σ i ) , where σ i is the estimated variance of the observations. Extensions to this simple data model could includeexplicitly incorporating biases and decomposing the variance into sampling and non-sampling error. Weleave developing a notation for data models to future work, focusing instead in this article on the details ofthe process model. In some cases, particularly in data sparse settings, it might make sense to include covariates in the processmodel, such that the indicator of interest can be modeled as a function of related factors which may havemore data available. For example, there is often a strong association observed between health indicators andmeasures of the economy or wealth, such as a country’s gross domestic product (GDP). While information onspeciﬁc health indicators may be sparse, estimates of GDP are widely available. The addition of covariatesand the modeling of such associations can be thought of in the same way as a standard generalized linearmodel framework, with other aspects of the process model adding greater complexity and allowing for non-linear trends to be captured.Covariates can be included in the process model via the regression function g ( X c,t , β c ), where β c arecountry-level parameters. The choice of covariates to include may be informed by exploratory data analysesor substantive knowledge of the covariates that are associated or causally related to the outcome of interest.The covariate-based component is useful for providing expected trends of indicators in projections and data-sparse settings.Table 1 gives examples of regression functions used in existing models of global health indicators. Forexample, a model to estimate maternal mortality ratios (MMRs) discussed in Alkema et al. (2017) uses GrossDomestic Product (GDP), General Fertility Rate (GFR), and percentage of births with a skilled attendant10resent (SAB) as covariates for estimating the log-transformed proportion of non-AIDS maternal deaths.The neonatal model discussed in Alexander and Alkema (2018) uses U5MR as a covariate, where U5MRis used as a predictor for the ratio of NMR to (NMR - U5MR), that is, neonatal to other child mortality.The inclusion of U5MR as a covariate is based on the strong demographic relationship that in general, aschild mortality decreases, the share of deaths that are neonatal increases. The form of this relationship ispiecewise linear, based on the results of an exploratory data analysis. The GBD U5MR model of Dickeret al. (2018) uses a non-linear regression function incorporating lag-distributed income per capita (LDI),mean years of education for women age 15-49 (EDU), and the HIV death rate in ages 0-4 (HIV). Finally, theage-speciﬁc mortality model of Alexander et al. (2017) uses the principle components of mortality schedulesas covariates.Indicator η c,t g ( · ) g ( · ) covariatesMaternal mortalityratio (Alkema et al.,2017) proportion of non-AIDS deaths that arematernal log β c, + P k X c,t,k β k log(GDP), log(GFR),SABNeonatal mortalityrate (Alexander andAlkema, 2018) NMR/(U5MR-NMR) logit β ,c + β · (log( X c,t ) − log( β )) [ X c,t >β ] U5MRSub-nationalmortality (Alexanderet al., 2017) age-speciﬁc mortality log P k X c,t,k β c,t,k principal componentsof mortality scheduleU5MR (Dicker et al.,2018) crisis-free U5MR log exp[ β c, · log( X LDIc,t ) + β ,c · X EDUc,t + β ,c ] + β ,c x HIVc,t

LDI, EDU, HIVTable 1: Selected examples of covariate functions.

We may expect the trend over time of some demographic and health indicators to follow a certain path thatcan be expressed as a parametric function. For example, demographic transition theory suggests that asa country’s fertility rate declines, it will ﬁrst decline rapidly, and then decelerate, and eventually plateauat a certain level (Kirk, 1996). A similar assumption about the shape of the rate of change can be madefor contraceptive prevalence over time. Such assumptions can be incorporated by a mathematical function g ( t, η c,s = t , α c ) which encodes a parametric trend depending on a set of parameters α c . The form of thisfunction may be informed by prior knowledge of the mechanisms that drive the evolution of the indicator ofinterest.Table 2 gives examples of systematic components used in a selection of existing models. For example,the process model of Cahill et al. (2018) describes the total contraceptive use rate in a country over time.Based on a modeling assumption that a country’s total contraceptive use rate will increase slowly, speedup, then taper oﬀ as it moves through the contraceptive use transition, their model includes a systematiccomponent which captures an expected rate of change based on a logistic growth curve. Note that this typeof systematic component, which deﬁnes the level of an indicator by calculating its change from its previousvalue, results in increasing variance in projections. This is discussed further in Section 6.1.11ndicator g ( · )Total contraceptiveuse (Cahill et al.,2018) Logistic curve: g ( · ) = Ω c when t = t ∗ , and for t > t ∗ : g ( · ) = logit ( η c,t − ) + ι c,t = ( logit (cid:16) ˜ P c · logit − (cid:16) logit (cid:16) η c,t − ˜ P c (cid:17) + ω c (cid:17)(cid:17) , when η c,t − < ˜ P c logit ( η c,t − ) , otherwise,where α c = n ˜ P c , ω c , Ω c o .Inﬂation in the sexratio at birth(Chao et al., 2019) Trapezoid function:  ξ c ( t − γ ,c ) /λ ,c , γ ,c < t < γ c, ξ c , γ ,c < t < γ ,c ξ c − ξ c ( t − γ ,c ) /λ ,c , t < γ ,c < t < γ ,c , otherwisewhere α c = { ξ c , γ ,c , λ c, , λ ,c } , γ c, = γ c, + λ c, , γ ,c = γ c, + λ ,c , and γ ,c = γ ,c + λ ,c .Table 2: Selected examples of systematic components. Some models incorporate information from separate modeling stages into their process model. The TMMPmodel class accounts for this through the a c,t term in the process model. Alternatively, the oﬀset could beintegrated into the covariate component, akin to the use of oﬀsets in generalized linear models. Here wechose to separate it into a separate component to make it clearer when models incorporate ﬁxed externalinformation besides covariates. For example, the GBD U5MR model uses an oﬀset which is derived fromthe smoothed residuals from a regression model estimated separately and serves to adjust the output of thecovariate component. The ﬁnal part of the process model, the smoothing component, captures trends that are not explained by thesystematic or covariate components. This component allows for trends over time to be driven by patternsobserved in the data. To prevent overﬁtting, a model is placed on the deviations to enforce some degree ofsmoothness in the model ﬁt. We deﬁne a smoothing model as the distribution placed on the joint vector ofall deviations from a country, (cid:15) c = [ (cid:15) , · · · , (cid:15) T ]. The smoothing model is deﬁned as follows: (cid:15) c = B c δ c , (2)where B c is a T × K c , K c ≤ T full rank matrix and δ c = [ δ c, , δ c, , . . . , δ c,K c ] is a vector of parameters.We deﬁne the parameter vector δ c or its diﬀerenced version to be normally distributed: for r ≥ r δ c | Σ c ∼ N ( , Σ c ) , (3)where r is the diﬀerence operator with δ = [ δ − δ , . . . , δ k − δ K c − ] and r δ = r δ . When r >

0, toensure the model is generative we need to introduce extra model structure to anchor the overall level of the12eviations. Formally, we require X k ∈K d,c d δ c,k = 0 , for 0 ≤ d ≤ ( r − , (4)for set of indexes K d,c .The covariance matrix Σ c is speciﬁed via an autocovariance function s , which may depend on hyperpa-rameters. We restrict the set of autocovariance functions to those that can be expressed as a function onlyof the absolute distance between t and t :Σ c,t ,t = s ( t , t ) = s ∗ ( | t − t | ) . (5)We further require that the covariance kernel goes to zero as | t − t | goes to inﬁnity. Commonly usedautocovariance functions in this situation are the autocovariance associated with an autoregressive processof order 1 (AR(1) process), the squared exponential covariance function, and the Mat´ern covariance function,which have the following forms: s AR1 ( t , t ) = κ ρ | t − t | , (6) s SE ( t , t ) = κ exp (cid:18) − ( t − t ) ‘ (cid:19) , (7) s Mat´ern ( t , t ) = κ − ν Γ( ν ) (cid:18) √ ν | t − t | ‘ (cid:19) ν K ν (cid:18) √ ν | t − t | ‘ (cid:19) . (8)Smoothing models that include a transformation (that is, B c = I ) include B-spline approaches (Eilersand Marx, 1996). The transformation B c can be used to lower the dimension of the smoothing model; forB-splines, for example, knots could be placed on a sparse grid in order to reduce the number of smoothingparameters to estimate. In this setup the smoothing parameters δ c are the coeﬃcients of the spline basisfunctions. Since the smoothing model is the linear combination of diﬀerentiable functions, the smoothingmodel itself will be diﬀerentiable, which may not be the case for other choices of the smoothing model.Diﬀerentiability may be a desirable property because it implies a degree of smoothness on the smoothingmodel.This deﬁnition of smoothing models as summarized in Equations 2, 3, 5, and 4 encapsulates main methodsof smoothing, including ARIMA processes, random walks, Gaussian processes, and B-splines, as illustratedin Table 3.As mentioned earlier, for non-stationary models with r >

0, we require the constraint X t ∈K d,c d δ c,k = 0 , for 0 ≤ d ≤ ( r − , for set of indexes K d,c . This requirement makes the model identiﬁable by anchoring the sum of the diﬀerencedprocess at zero over the set K d,c . If the smoothing model does not perform any dimensionality reduction byits choice of B then K d,c is interpretable as the set of years for which the diﬀerenced process sums to zero.Common choices include K d,c ﬁxed at a speciﬁc reference year, i.e., as used for family planning estimation(Cahill et al., 2018) and maternal mortality (Alkema et al., 2017). K d,c might also be chosen to span theobservation period of the country of interest. For the models used in neonatal and child mortality estimation,the sum of the spline coeﬃcients are constrained to sum to zero, corresponding to K d,c = { , . . . , K c } where13 c are the number of splines in country c (Alexander and Alkema, 2018; Alkema and New, 2014).The behavior of the smoothing model may inﬂuence the trends and uncertainty of projections. Ap-pendix B provides further detail on stationary and non-stationary models, including how they behave inprojections.Indicator B k ( t , t ) r K d,c Maternal mortality ratio(Alkema et al., 2016) B = I ARMA(1, 1) 1 { } U5MR (Alkema and New,2014) B c,t,k = b c,k ( t )=cubic B-splines,knots 2.5 years indep. s ( t , t ) = σ t = t ) 2 K ,c = { k ∗ } , K ,c = { , · · · , K c } U5MR (Dicker et al.,2018) B = I Mat´ern, see Eq.8 0 · Table 3: Selected examples of smoothing processes.

It is commonly of interest to produce estimates beyond the last observed data points. Suppose the last datapoint in a country is observed at T , and we would like to project the value of the indicator to time T ∗ . Usingthe estimation process model for projections is taken as default in the TMMP framework. In particular,based on the estimation model, η c,t can be estimated for 1 < t < = T ∗ as part of the joint estimation processof the entire model. Alternatively, η c,t can be estimated up until T , the last available data point, and theestimation process model can be used afterwards to generate projections from T to T ∗ . We make explicithow projections come about for TMMP models at the end of this section.In some modeling approaches, the projection model used for η c,t>T may diﬀer from the estimation model.For example, in the B3 model, a pooling approach is used to control the variance in the projections. Forsuch models, the projection model used needs to speciﬁed separately, either as a new TMMP model, or interms of how the model diﬀers from the estimation model. Default projections obtained from TMMP models

For TMMP models, the smoothing componentup to year T ∗ is given by (cid:15) c = B c δ c , (9)where (cid:15) c is a T × B c is a T × K c matrix, and δ c is a K c × T ∗ , we deﬁne an extended vector of smoothing terms (cid:15) ∗ c with (cid:15) ∗ c = B ∗ c δ ∗ c , (10)where (cid:15) ∗ c is a T ∗ × B c is a T ∗ × K ∗ c matrix, and δ c is a K ∗ c × B ∗ . When the projection uses the estimation process model, we have that r δ ∗ c | Σ ∗ c ∼ N ( , Σ ∗ c ) . (11)14he distribution of the (diﬀerenced) smoothing terms in the projection period follow from the conditionaldistribution r δ ∗ c,t>T | δ c , Σ c , which has a closed form.For TMPPs with smoothers, the TMMP projection process model defaults to r th-order diﬀerenced es-timation process model. For example, if the smoothing model has r = 0, then the projection model isgiven directly by the estimation model with the extended smoothing term as per Eq. 10. If r = 1, then theprojections are based on the ﬁrst order diﬀerences of the process model. To illustrate, consider a processmodel that includes a smoothing component with r = 1. The process model is written η c,t = g ( X c,t , β c ) + g ( t, η c,s = t, α c ) + (cid:15) c,t , (12)where (cid:15) c = B c δ c . The ﬁrst order diﬀerence of the process model is given by η c,t − η c,t − =( g ( X c,t , β c ) − g ( X c,t − , β c ))+ ( g ( t, η c,s = t , α c ) − g ( t − , η c,s = t − , α c )+ ( (cid:15) c,t − (cid:15) c,t − ) . (13)The conditional distribution of the ﬁrst order diﬀerences ( (cid:15) c,t − (cid:15) c,t − ) in the projection period then followsfrom Equations 10 and 11. The TMMP framework requires estimating many country-level parameters. In the process model the sys-tematic trend depends on α c , the covariate function depends on β c , and the smoothing model may havecountry-speciﬁc hyperparameters. To complete the speciﬁcation of a model, we need to deﬁne how theseparameters will be estimated. In this section we describe several approaches, including ﬁxing parameters,use of informative priors, and hierarchical modeling, and present examples of how they have been used inexisting models. These approaches can be easily combined within a model: for example, informative priorsmight be set for some parameters while hierarchical distributions are used for others. Documentation of theassumptions made is important to make explicit the sharing of information, for example across populations,and the use of external information to inform the estimates. Parameters can be ﬁxed by the modeler to set values based on substantive knowledge, convenience, orresults from separate models. For example, Dicker et al. (2018) set the country level hyperparameters oftheir smoothing model to ﬁxed values depending on a measure of data availability in each country. Theyalso use ﬁxed parameters in the covariate term in the process model. The regression coeﬃcients are obtainedfrom ﬁtting a mixed eﬀects regression model to a global data set. The oﬀset terms a c,t are obtained fromsmoothing the residuals of that regression. Prior distributions can be applied to parameters to inform estimation. When substantive information isavailable for parameters then informative priors can be used. Vague priors can be used when there is littleor no prior knowledge to inform the parameter values. The models considered in this paper as examples use15ague priors for most model hyperparameters. We note that informative priors have been used more oftenfor data bias and measurement error parameters in data models, a discussion of which we leave to futurework.

Hierarchical modeling is a general approach that can be used for parameter estimation when it is desirableto share information between units. These parameters can be estimated in diﬀerent ways according tohow the modeler wants to share information between countries. At one extreme, the parameters could beestimated independently for each country. This encodes an assumption that the parameter in one country isunrelated to the parameter in another country. Another extreme is to have every country share the same setof parameters, which is appropriate for modeling global level patterns that are shared across all countries.In between these two options is an intermediary choice in which information is shared between countriesvia hierarchical modeling. By using hierarchical modeling, the parameter estimates in one country can beinformed by trends in similar countries. Sharing information can be especially useful in cases where countrieshave limited or no data available, but the modeler still wishes to generate estimates.To ease comparison between models we introduce a basic notation for hierarchical models which encodesdistributional assumptions and the levels and groupings of countries. For a country-level parameter γ c ,which may be deﬁned on a transformed scale and can belong to any of the process model components, ahierarchical model can be written as γ c | γ ( region ) r [ c ] , σ ( region )2 γ ∼ π ( γ ( region ) r [ c ] , σ ( region )2 γ ) , where π represents a probability distribution. The parameters γ ( region ) r [ c ] are the group-level means, where r [ c ] indexes the group that country c belongs to. The parameter σ ( region )2 γ is the variance of the countrylevel parameters around their group means. This setup can be extended recursively to allow for deeperhierarchies, where the group r [ c ] can be a member of a higher-level group itself, and so forth. Specifying ahierarchical model therefore requires deﬁning the distribution π , the number of levels in the hierarchy, andthe hierarchical groupings. A common setup is to take π to be the normal distribution and to have only onelevel of hierarchy, in which case r [ c ] = 1 for all c and γ ( region )1 represents a global mean: γ c | γ ( world ) , σ γ ∼ N ( γ ( world ) , σ γ ) . The use of hierarchical modeling in several models is compared in Table 4, giving their choices for distribution π and the number of levels and hierarchical groupings. We recommend that the writing of model assumptions for TMMPs, including those related to parameterestimation, in a standardized way is considered for a future version of GATHER, referring to Guidelines forAccurate and Transparent Health Estimates Reporting (Stevens et al., 2016). We provided a template fordoing so in Appendix A and illustrate its use for the case studies discussed in the remainder of this paper.Ideally it would be possible to express a complete model within the provided template. The overall goal,however, is to facilitate communicating important modeling assumptions so if the model is too complex to16ndicator Hierarchical Models π Levels GroupingsTotal contraceptive use(Cahill et al., 2018) asymptote ˜ P normal 1 countries within worldrate ω c normal 3 countries within sub-region,region, worldtiming Ω c , developingcountries normal 3 countries within sub-region,region, worldtiming Ω c , developedcountries normal 1 countries within worldMaternal mortality(Alkema et al., 2017) smoothing parameters λ c truncatednormal 1 countries within worldAge-speciﬁc mortality(Alexander et al., 2017) regression coeﬃcients β k,a normal 1 counties within stateTable 4: Hierarchical modeling in selected models.be presented clearly in the template than only the parts most relevant for its main assumptions should beincluded. The proposed model class can be used as a tool for interpreting existing models. In this section, we returnto GBD and B3 models of U5MR and show how they can be interpreted within the TMMPs model class.The model descriptions are summarized in the TMMP template in Table 5.

As we saw in Section 3, the GBD model is presented as using a three-stage modeling approach in which datago through two smoothing steps before being used in a spatiotemporal Gaussian Process regression model.

Process Model

To cast the GBD model within our model class, we will start with their third modelingstage, the spatiotemporal Gaussian Process regression model, which can be thought of as a process model.The GBD process model can be understood as having a covariance component, ﬁxed oﬀset, and a GaussianProcess smoothing model component. Deﬁne η c,t to be the true crisis-free U5MR in country c at time t .Then the process model is given by:log ( η c,t ) = g ( X c,t , β c ) + a c,t + (cid:15) c,t . The covariate component is a non-linear function of lag-distributed income per capita (LDI), mean years ofeducation for women of reproductive age (15-49 years), and HIV death rate in ages 0-4 as covariates: g ( X c,t , β c ) = exp (cid:2) β c, · log( X LDIc,t ) + β ,c · X EDUc,t + β ,c (cid:3) + β ,c x HIVc,t . (14)The oﬀset a c,t adjusts the values from the covariate component based on the results of a separate modelingstep. The oﬀset values were estimated separately by smoothing the residuals of a mixed-eﬀect model (the17econd modeling stage, in the original description).Finally, the smoothing model uses a Gaussian Process prior, with no transformation ( B = I , so (cid:15) c = δ c ) δ c | Σ c ∼ N ( , Σ c ) , where Σ c is given by the Mat´ern covariance function. Parameter Estimation

Fixed parameter values for the covariate and systematic components come fromthe ﬁrst two stages of the GBD modeling approach. The regression coeﬃcients in the covariate componentare ﬁxed to values estimated in a separate non-linear mixed eﬀects model (the ﬁrst modeling stage of theoriginal description of the model.) The oﬀsets a c,t are derived by smoothing the residuals from this model.These residuals are smoothed in time and between countries in the same region. As such, the oﬀsets areestimated using a sort of hierarchical structure, in that information is shared between countries in the sameregion. The ﬁnal oﬀsets are calculated as a weighted combination of the smoothed residuals and a locallinear ﬁt.The remaining hyperparameters for the data model and the smoothing component of the process modelare largely set to ﬁxed values. For example, the parameters of the Mat´ern covariance kernel were ﬁxeddepending on a measure of data availability in each country in order that areas of low data availability hadhigher smoothing than areas with more data. Process Model

Deﬁne η c,t as the true crisis-free U5MR in country c at time t . The process model includessystematic and smoothing components:log ( η c,t ) = g ( t, η c,s = t , α c ) + (cid:15) c,t . The systematic component comprises a country-speciﬁc slope and intercept: g ( t, η c,s = t , α c ) = α c, + α c, ( t − t ∗ c ) , where t ∗ c refers approximately to the midpoint of the observation period. The smoothing term is deﬁned as (cid:15) c = B c δ c , where B c contains K c cubic B-spline basis functions placed evenly over the observation period of each country(that is, B c,t,k = b c,k ( t ) where b c,k ( t ) is the k -th spline basis function for country c evaluated at time t ). Thespline coeﬃcients δ c,k for k = 1 , , . . . , K c follow a RW(2) process. As such, after two levels of diﬀerencing,the coeﬃcients are normally distributed with mean zero: γ c,k = δ c,k | σ δ,c ∼ N (0 , σ δ,c ) (15)18ith δ c,k ∗ = 0 , (16) K c X k =2 δ c,k = 0 , (17)with k c ∗ referring approximately to the spline centered around t ∗ c , such that K ,c = { k ∗ } and K ,c = { , . . . , K c } . These constraints imply that δ c can be recovered from the K c − γ c by δ c = (cid:2) D c ( D c D c ) − (cid:3) γ c , (18)where D is a K c × ( K c −

2) second order diﬀerencing matrix.

Projections

The estimates of the B3 model cover the period of observed data in each country. Projectionsafter the last observed data point in each country are generated by projecting the second order diﬀerencesof the process model formula. For this model, the second order diﬀerences are given by(log( η c,t +1 ) − log( η c,t )) − (log( η c,t ) − log( η c,t − )) = ( (cid:15) c,t +1 − (cid:15) c,t ) − ( (cid:15) c,t − (cid:15) c,t − ) , where the terms (cid:15) c,t are obtained by extending the smoothing model to incorporate the projection period: (cid:15) c,t = K c + P c X k =1 b c,k ( t ) δ c,k , where b c,k ( t ) is the k -th spline function in country c evaluated at time t and P c refers to the number of splinefunctions added. The coeﬃcients δ c,k are extrapolated as follows: δ c,k +1 = 2 δ c,k − δ c,k − + γ c,k ,γ c,k | Γ c,k , Θ c,k ∼ N (Γ c,k , Θ c,k ) , where Γ c,k = W · G + (1 − W ) · 4 δ c,k − , Θ c,k = W · V + (1 − W ) · Θ c,k − , with G and V equal to the median and variance of the estimates of past 2nd-order diﬀerences ˆ δ C, K c ’srespectively, and Θ c,K c − = σ τ,c . Parameter Estimation

The variances σ δ,c of the random walk deviations δ c,k are estimated hierarchicallyso that information on the degree of smoothing can be shared between countries. Conversely, the parameters α c, and α c, controlling the intercept and slope of the systematic component are estimated independentlywith vague priors. 19 .3 Comparing the U5MR models With both models expressed using the TMMPs notation it is easier to list the main assumptions made ina standardized way and compare assumptions across models. Table 5 summarizes the process model andparameter estimation strategies of each model. Comparing non-smoothing components, the GBD model usesseveral covariates (LDI, EDU, and HIV), and an oﬀset estimated in a separate regression, as compared tothe use of a linear systematic trend that is non-zero during the observation period in the B3 model. Thetwo models use diﬀerent smoothing approaches, with the GBD model using a Gaussian process and theB3 model using a second order random walk on B-spline coeﬃcients. Finally, the GBD model ﬁxes modelparameters in earlier steps in its estimation procedure, while B3 uses full Bayesian inference to estimateparameter uncertainty during the observation period. B3 uses hierarchical distributions to share informationbetween countries on the variability of the smoothing term only. For the GBD model, the oﬀsets estimatedin a separate step by smoothing residuals serve as a way of sharing information across countries.Using this comparison we can interpret how the models will behave in forward projections. The GBDmodel’s projections are informed by its covariate model and systematic oﬀsets. Given its usage of a stationarysmoothing model, deviations away from these components in recent years will converge back to zero, hencethe estimates of U5MR will convergence back to the covariate-plus-oﬀset-based estimates. The B3 modelshort-term projections, on the other hand, derive entirely from the smoothing component, extrapolatingrecently observed linear trends (in log space) into the future, while - through the pooling component - in B3longer term projections, extreme trends converge to a global distribution of past observed rates of change.

In this section we show how models for four diﬀerent health indicators can be described within our proposedmodel class.

The Family Planning Estimation Model (FPEM) is a country-level model of contraceptive use rates amongmarried women of reproductive age from 1990-2020 (Cahill et al., 2018), and is an updated version of anearlier model (Alkema et al., 2013). The full model breaks down the total use rate into traditional andmodern contraceptive method users, and models in addition the unmet need for contraceptives. For thepurposes of this paper we will only examine how it models the total contraceptive use rate, and refer readersto the paper for additional details.

Process Model

The FPEM process model describes the true trend of total contraceptive use over timein each country. The process model includes a a systematic and smoother component:logit( η c,t ) = g ( t, η c,s = t , α c ) + (cid:15) c,t . Adoption of contraception at the country level is expected to start slowly, speed up, then slow down beforereaching an asymptote. The model incorporates this expectation of an S-shaped trend through the systematiccomponent. The systematic component encodes the rate of change of a logistic curve. A reference year t ∗ isﬁxed, and the systematic trend is propagated forward and backward from the reference year. At t ∗ , we set20 BD B3 η c,t crisis-free U5MR crisis-free U5MR g ( · ) log logProcess model formula g ( η c,t ) = g ( X c,t , β c ) + a c,t + (cid:15) c,t g ( η c,t ) = g ( t, α c ) + (cid:15) c,t Covariate Component g ( · ) non-linear regression for-mula (Equation 14) · Covariates LDI, EDU, HIV · Systematic Component g ( · ) · α c, + α c, ( t − t ∗ c ), with t ∗ c ≈ middle of observationperiod α c · intercept α c, and slope α c, Oﬀsets a c,t oﬀsets obtained fromsmoothed residuals of amixed-eﬀects regressionmodel ﬁt · Smoothing Component (cid:15) c = B c δ c B B = I B c,k = cubic B-splines, knots every 2.5 years s ( t , t ) Mat´ern indep. s ( t , t ) = σ τ,c t = t ) r K d,c · K ,c = { k ∗ } , K ,c = { , · · · , K c } Projections (if not defaulting to estimation model)Projections · logarithmic pooling approach: for projections, δ c,k ∼ N (Γ c,k , Θ c,k ) , Γ c,k = W · G + (1 − W ) · 4 δ c,k − , Θ c,k = W · V + (1 − W ) · Θ c,k − . Parameter Estimation

Fixed Mat´ern covariance hyper-parameters; β c in covariatecomponent For projections, G and V equal to the medianand variance of the estimates of past 2nd-orderdiﬀerences ˆ δ c,k ’s respectively, W ﬁxed throughvalidation exerciseVague Priors · systematic parameters α c, , α c, Informative prior · ·

Hierarchical distribution · smoothing parameters σ τ,c Distribution π · normalNumber of levels in hierarchy · · countries within worldTable 5: Comparison of the U5MR process model and estimation strategies in the IHME GBD model andthe UN-IGME B3 model. 21 ( t ∗ , η c,s = t , α c ) = Ω c , where Ω c is the level in the reference year. When t > t ∗ , the systematic componentpropagates forward: g ( t, η c,s = t , α c ) | t > t ∗ = logit ( η c,t − ) + ι c,t , =  logit (cid:16) ˜ P c · logit − (cid:16) logit (cid:16) η c,t − ˜ P c (cid:17) + ω c (cid:17)(cid:17) , when η c,t − < ˜ P c logit ( η c,t − ) , otherwise,where ˜ P c is an asymptote, ω c a rate parameter, and α c = n ˜ P , ω c , Ω c o . A similar equation is applied when t < t ∗ to extend the systematic trend backwards from the reference year. This component encodes theassumption that contraceptive adoption in each country follows the same shape, but may diﬀer in its timing,rate, peak adoption. The eﬀect of the systematic component can be seen in the model’s projections andﬁts in data-sparse countries. In Cˆote d’Ivoire, the model projects an increase in contraceptive use becausethe country’s systematic trend, with hyperparameters informed by other countries in the region, puts thecountry on the verge of its contraceptive transition (Figure 3).Figure 3: Family Planning Estimation Model (FPEM) estimates for Cˆote d’Ivoire. The systematic componentof the model informs forward projections and estimates in countries with limited data available. CopyrightCC BY 4.0.Deviations from the expected rate of change given by the systematic component are captured via anAR(1) smoothing component. Using the notation of our proposed model class, this can be expressed asplacing a multivariate normal distribution on the vector of deviations: (cid:15) c | Σ c ∼ N ( , Σ c ) , where Σ c is given by the AR(1) autocovariance function. These deviations are added to the systematiccomponent, which for time t > t ∗ depends on η c,t − . As such, the smoothing terms (cid:15) c,t represent deviations inthe rate of change of modern contraceptive use within a country. Because these smoothing terms accumulateover time, variance increases with the projection horizon.22 arameter Estimation FPEM introduces many country-level parameters that need to be estimated.The systematic component for FPEM has parameters for the asymptote, rate, and timing of each country’slogistic curve that describes its adoption of contraceptives. Hierarchical distributions share informationabout each parameter between countries. The asymptote parameters have one level of hierarchy:˜ P c | ˜ P w , σ P ∼ N (cid:16) ˜ P w , σ P (cid:17) , where ˜ P w is the world mean asymptote, and σ P describes the variation around that mean. The rate pa-rameters have three levels of hierarchy, in that they are nested ﬁrst within sub-regions and regions. Thehierarchical model for the rate parameter ω c is given by: ω c | ω s [ c ] , σ ω s ∼ N ( ω s [ c ] , σ s ) ,ω s | ω r [ c ] , σ ω r ∼ N ( ω r [ c ] , σ r ) ,ω r | ω w , σ ω w ∼ N ( ω w , ω w ) , where s [ c ] indexes the sub-region and r [ c ] indexes the region of country c . The hierarchical structure for thetiming parameters Ω c depends on whether the country is classiﬁed as developing or developed. For developingcountries, three levels of hierarchy are used (sub-region, region, and world); for developed countries, onlyone level is used. The diﬀerence in hierarchical structure between ˜ P and Ω c , ω c arises from substantiveknowledge, in that rate is expected to diﬀer regionally, and the timing may vary for developing countries,while the asymptote is expected to be less variable. Alexander and Alkema (2018) model the neonatal mortality rate in 195 countries from at latest 1990 to2015, and was adopted by the United Nations Inter-agency Group for Child Mortality Estimation. Themodel setup incorporates the strong relationship between NMR and U5MR: as U5MR increases, the ratioof neonatal to other child mortality tends to decrease. As such, this model provides an example of howcovariates can be used in our proposed model class.

Process Model

Rather than model the NMR directly, the process model speciﬁes the ratio of the neonatalmortality rate to the rate of non-neonatal deaths. This was motivated by the strong relationship betweenthe ratio and U5MR, and also to constrain the NMR to be smaller than the U5MR.We deﬁne η c,t = N c,t /U c,t , where N c,t and U c,t are the neonatal and non-neonatal death rates (under age5) in country c at time t . The process model includes a covariate and smoothing component:log( η c,t ) = g ( X c,t , β c ) + (cid:15) c,t , where X c,t is the U5MR for country c at time t . The authors found through exploratory data analyses thatthe relationship between U5MR and the log ratio is constant up to a cutoﬀ U5MR value and then decreaseslinearly (Figure 4). As such, they chose a piecewise linear form for the function h : g ( X c,t , β c ) = β c, + β · (log( X c,t ) − log( β )) [ X c,t >β ] , where β c, and β are intercept and slope parameters, and β is the cutoﬀ U5MR value at which point the23elationship becomes constant.Figure 4: The relationship between log(U5MR) and the ratio of neonatal deaths to non-neonatal deathsobserved in the dataset used for the neonatal mortality rate model of Alexander et al. (2018). CopyrightCC BY 3.0 DE.The deviations η c,t capture trends in the data not explained by the covariate component and are modeledwith B-splines, similar to the B3 model. In TMMP class notation we can write (cid:15) c = B c δ c , where B c contains K c cubic B-spline basis functions, placed at evenly spaced knot locations. Suﬃcient knotsare placed in each country to cover the period for which data are available. The spline coeﬃcients δ c aremodeled with an RW(1) process, which after one level of diﬀerencing is multivariate normally distributedwith mean zero. That is, γ c,k = δ c,k | σ γ,c ∼ N (0 , σ γ,c ) , (19)with K c X k =1 δ c,k = 0 , (20)such that K ,c = { , . . . , K c } . Note that the sum-to-zero constraint implies that δ c can be recovered fromthe K c − γ c by δ c = h D c ( D c D c ) − i γ c , (21)where D is a K c × ( K c −

1) ﬁrst-order diﬀerencing matrix.Figure 5 shows model estimates separated into covariate and smoothing components, which illustrateshow projections are inﬂuenced by the relationship between U5MR and covariates. The projection follows24he expected trend based on covariates closely, with a country-speciﬁc oﬀset. The smoothing component hasa larger eﬀect on the model ﬁt where there are data available, showing how the model modiﬁes the expectedtrend from covariates to ﬁt the observed NMR data.Figure 5: Example estimates for neonatal mortality from Alexander and Alkema (2018). Global relation(blue) shows the estimates from the covariate component of the process model, global relation + intercept(green) the covariate estimates plus a country-speciﬁc intercept, and ﬁnal ﬁt (red) shows the ﬁnal estimatesthat include the smoothing model. The smoothing model has the most impact within the data support, asit modiﬁes the expected trend based on covariates to better ﬁt the observed data. Copyright: CC BY 3.0DE.

Projections

The main model produces estimates that cover the period of observed data in each country.As per the TMMP speciﬁcation, for the NMR model with the non-stationary RW(1) smoother with r = 1,projections for the period after the last observed data point in each country to 2015 are obtained by projectingﬁrst order diﬀerences that follow from the process model expression. Here the ﬁrst order diﬀerences are givenby η c,t +1 − η c,t = β · (cid:0) log( X c,t +1 ) [ X c,t +1 >β ] − log( X c,t ) [ X c,t >β ] (cid:1) + (cid:15) c,t +1 − (cid:15) c,t , (22)where the smoothing terms (cid:15) c,t are obtained from extending the spline smoothing model (cid:15) c,t = K c + P c X k =1 b c,k ( t ) δ c,k , (23)25here P c refers to the total number of spline functions added and the δ c,k are projected based on the RW(1)process: δ c,k = δ c,k − + γ c,k , (24) γ c,k | σ γ,c ∼ N (0 , σ γ,c ) . (25) Parameter Estimation

The smoothing model estimates some country-level parameters independentlyand others hierarchically, which encodes assumptions about how under-5 mortality rates are related acrosscountries. The intercepts β c, were estimated hierarchically, as were the variances of the smoothing modeldeviations. All other hyperparameters were assigned vague priors. The Bayesian Maternal Mortality Model, referred to as ‘Bmat’, is used to estimate the maternal mortalityratio (MMR, maternal deaths per 100,000 live births) in countries from 1985 to 2017 (Alkema et al., 2017;UN MMEIG, 2015; UN MMEIG, 2019). The model is an extension of a previous multilevel regressionmodeling approach (Wilmoth et al., 2012) used by the United Nations Maternal Mortality Estimation Inter-agency group (UN MMEIG). Bmat incorporates covariates similarly to the earlier model, while also allowingdata-driven deviations from the expected covariate trend. The Bmat model handles AIDS and non-AIDSmaternal deaths separately; for the purposes of this paper we will focus on the model of non-AIDS maternaldeaths.

Process Model

Let η c,t represent the proportion of non-AIDS deaths to women of reproductive age thatare of a maternal cause in country c at time t . Bmat models the expected trend in η by a combination ofcovariate and smoothing model components:log ( η c,t ) = g ( X c,t , β c ) + (cid:15) c,t , where X c,t is a set of covariates for country c at time t , and β c are associated regression coeﬃcients.The covariate model is based on the earlier UN MMEIG regression model, and is given by g ( X c,t , β c ) = β c, + β log( X GDPc,t ) + β log( X GF Rc,t ) + β X SABc,t . The deviations are smoothed with an ARIMA(1, 1, 1) process, which is a stationary autoregressivemoving average (ARMA) process after diﬀerencing once ( r = 1). In TMMP notation (cid:15) c = δ c ,δ c,t = 0 , for t = 1990 , δ c | Σ c ∼ N ( , Σ c ) , where the covariance matrix Σ c is speciﬁed via an autocovariance function s that captures the autocovari-ance of an ARMA(1,1) process, parametrized with country-speciﬁc stationary variance σ δ,c , autoregressiveparameter 0 ≤ ρ ≤ − ≤ θ ≤ Parameter Estimation

The intercepts β c, are estimated hierarchically with two levels (country withinregion within world). The other regression coeﬃcients are shared between all countries and are given avague prior. The country-speciﬁc variance parameter of the ARIMA(1, 1) smoothing model is estimatedhierarchically in order to share information between countries: σ δ,c = σ δ,w · (1 + λ c ) λ c | σ λ ∼ T N ( − , (0 , σ λ ) . The parameter σ δ,w represents a central value of the variances across all countries, and λ c is a country-speciﬁcmultiplier. A truncated normal prior is placed on λ c , limiting its values to fall between − Alexander et al. (2017) developed a model for estimating age-speciﬁc mortality rates at the subnational level,focusing on obtaining estimates by county in the United States. Unlike the other examples mentioned in thispaper, this model set-up is more of a ’traditional’ demographic model as it deals explicitly with modeling agepatterns in mortality over time, rather than focusing on modeling temporal trends in an aggregate indicator27f mortality. The challenge with estimating mortality age schedules at the subnational level is that, whenpopulations are small, stochastic variation is high, and so the underlying mortality risk curve may be unclearor uncertain from the observed data.In essence, Alexander et al. (2017) build on the traditional demographic method of using model ageschedules of mortality, but place this within a Bayesian framework which allows for stochastic estimatesand forecasts and increased ﬂexibility in estimates. We include it as an example here to show that bothdemographic models and models for global health indicators can be expressed in the same generalized TMMPframework.

Process Model

For the process model, η a,c,t is deﬁned as the mortality rate for age a in county c at time t . The process model incorporates a covariate and smoothing component:log ( η a,c,t ) = g ( X a , β c,t ) + (cid:15) a,c,t , where X a = { X a, , X a, , X a, } are the ﬁrst three principal components of a set of standard mortality curves,and β c,t are the associated regression coeﬃcients which diﬀer by county and time point. The regression isequation is given by: g ( X a , β c,t ) = β c,t, · X a, + β c,t, · X a, + β c,t, · X a, . Whereas previous model examples with covariates have focused on relating trends in country-level indicators,the covariates here capture the main sources of variation in mortality over age, which allows for plausibleestimates over age-speciﬁc mortality to be imputed, and also ’smooths out’ noisy age-speciﬁc mortality curvesthat are commonly observed in small populations.The (cid:15) a,c,t term was included in the model to account for overdisperson commonly seen in observedmortality rates. Unlike previous examples, no temporal model is placed on the deviations (cid:15) a,c,t to smooththe deviations over time. Instead, the deviations are smoothed within age groups: (cid:15) a,c,t | σ a ∼ N (0 , σ a ) , where σ a is the variance for age group a . Parameter Estimation

The process model requires estimating regression coeﬃcients β c,t,p for everyprincipal component, county, and time point. To share information across counties, a hierarchical prior isplaced on the regression coeﬃcients: β p,c,t | µ p,t , σ p,t ∼ N (cid:0) µ p,t , σ p,t (cid:1) . Further structure is placed on the hyperparameters µ p,t in order to smooth over time, with the assumptionbeing that the underlying expected trend in variation in each dimension exhibits some smooth trend overtime. µ p,t | µ p,t − , µ p,t − , σ µ ∼ N (2 µ p,t − − µ p,t − , σ µ ) . In this paper we have introduced a general model class, “Temporal Models for Multiple Populations”(TMMPs), which encompasses many existing demographic and health models. The key structural feature ofthe class is that it makes a distinction between the data and process model, separating the expected trendof an indicator from details of how the observed data were generated. Thus, the process model expressesthe modeler’s assumptions about the dynamics driving the latent value of the indicator, and the data modeldescribes assumptions about noise and error in the observed data. Once a model is shown to fall into thismodel class, it is easier to compare the assumptions made in diﬀerent models.We showed how six existing models of demographic and health indicators ﬁt into the model class. Theexample models were for a diverse set of outcomes: total contraceptive use, under-5 mortality, maternalmortality, neonatal mortality, and sub-national mortality rates. These models incorporated a variety ofmodeling approaches, including Gaussian process regression, ARIMA time series, and penalized spline re-gressions. The Global Burden of Disease Study U5MR model in particular might appear to be the mostdiﬀerent of the example models based on how it was originally presented as a three-stage modeling procedure(Dicker et al., 2018). However, we showed how this model and the other example models can be expressedunder our modeling class, which facilitates documentation of assumptions and across-model comparisons.We recommend that the writing of model assumptions for TMMPs in a standardized way is consideredfor a future version of GATHER, referring to Guidelines for Accurate and Transparent Health Estimates29eporting (Stevens et al., 2016). We provided a template for doing so in Appendix B.The models we included as examples in this paper are by no means an exhaustive list of the modelsthan can be described within our framework. Many spatial models, which directly model spatial correlationbetween units, can be expressed in our class. Spatial smoothing models following the work of Knorr-Held(2000) in particular are an example of a modeling approach that falls into our model class. Additionalresearch is needed to investigate how other modeling approaches relate to TMMPs. The structure of theTMMP model class is ﬂexible and extendable, making it possible to propose new components as necessaryto capture models currently outside of the class.So far we have focused on describing the model class and showing how existing models fall into it, ratherthan evaluating the assumptions made by each model. In future work we plan to compare in more detailthe eﬀects of the particular choice for each component on the resulting estimates. A better understanding ofthe eﬀects of various smoothing models, for example, is helpful both for analyzing the behavior of existingmodels and for developing new models.The number of models developed for providing demographic and health indicator estimates has perhapsgrown faster than tools to interpret their results and how they relate to one another. The model classproposed in this paper is one tool that can be used to understand model assumptions. We hope it willfacilitate interpreting, comparing, and contrasting existing models that fall within the model class as well asthe development of new modeling approaches.

References

M. Alexander and L. Alkema. Global estimation of neonatal mortality using a Bayesian hierarchical splinesregression model.

Demographic Research , 38:335–372, Jan. 2018. ISSN 1435-9871. doi: 10.4054/DemRes.2018.38.15. URL .M. Alexander, E. Zagheni, and M. Barbieri. A Flexible Bayesian Model for Estimating Subnational Mortality.

Demography , 54(6):2025–2041, Dec. 2017. ISSN 0070-3370. doi: 10.1007/s13524-017-0618-7. URL .L. Alkema and J. R. New. Global estimation of child mortality using a Bayesian B-spline Bias-reductionmodel.

The Annals of Applied Statistics , 8(4):2122–2149, Dec. 2014. ISSN 1932-6157. doi: 10.1214/14-AOAS768. URL http://projecteuclid.org/euclid.aoas/1419001737 .L. Alkema and D. You. Child mortality estimation: a comparison of un igme and ihme estimates of levelsand trends in under-ﬁve mortality rates and deaths.

PLoS Med , 9(8):e1001288, 2012.L. Alkema, V. Kantorova, C. Menozzi, and A. Biddlecom. National, regional, and global rates andtrends in contraceptive prevalence and unmet need for family planning between 1990 and 2015: a sys-tematic and comprehensive analysis.

The Lancet , 381(9878):1642–1652, May 2013. ISSN 0140-6736.doi: 10.1016/S0140-6736(12)62204-1. URL .L. Alkema, D. Chou, D. Hogan, S. Zhang, A.-B. Moller, A. Gemmill, D. M. Fat, T. Boerma, M. Temmer-man, C. Mathers, and L. Say. Global, regional, and national levels and trends in maternal mortalitybetween 1990 and 2015, with scenario-based projections to 2030: a systematic analysis by the UN Ma-ternal Mortality Estimation Inter-Agency Group.

The Lancet , 387(10017):462–474, Jan. 2016. ISSN301406736. doi: 10.1016/S0140-6736(15)00838-7. URL https://linkinghub.elsevier.com/retrieve/pii/S0140673615008387 .L. Alkema, S. Zhang, D. Chou, A. Gemmill, A.-B. Moller, D. M. Fat, L. Say, C. Mathers, and D. Hogan.A Bayesian approach to the global estimation of maternal mortality.

The Annals of Applied Statistics , 11(3):1245–1274, Sept. 2017. ISSN 1932-6157. doi: 10.1214/16-AOAS1014. URL https://projecteuclid.org/euclid.aoas/1507168829 .R. Burstein, N. J. Henry, M. L. Collison, L. B. Marczak, A. Sligar, S. Watson, N. Marquez, M. Abbasalizad-Farhangi, M. Abbasi, F. Abd-Allah, et al. Mapping 123 million neonatal, infant and child deaths between2000 and 2017.

Nature , 574(7778):353–358, 2019.N. Cahill, E. Sonneveldt, J. Stover, M. Weinberger, J. Williamson, C. Wei, W. Brown, and L. Alkema.Modern contraceptive use, unmet need, and demand satisﬁed among women of reproductive age who aremarried or in a union in the focus countries of the Family Planning 2020 initiative: a systematic analysisusing the Family Planning Estimation Tool.

The Lancet , 391(10123):870–882, Mar. 2018. ISSN 0140-6736. doi: 10.1016/S0140-6736(17)33104-5. URL .R. J. Carroll, D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu.

Measurement error in nonlinear models:a modern perspective . CRC press, 2006.F. Chao, P. Gerland, A. R. Cook, and L. Alkema. Systematic assessment of the sex ratio at birth for allcountries and estimation of national imbalances and regional reference levels.

Proceedings of the NationalAcademy of Sciences , 116(19):9303–9311, May 2019. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1812593116. URL .C. Chatﬁeld and H. Xing.

The analysis of time series: an introduction with R . CRC press, 2019.T. N. Croft, A. M. Marshall, C. K. Allen, F. Arnold, S. Assaf, S. Balian, et al. Guide to dhs statistics.

Rockville: ICF , 2018.D. Dicker, G. Nguyen, D. Abate, and other GBD 2017 Mortality Collaborators. Global, regional, and nationalage-sex-speciﬁc mortality and life expectancy, 1950–2017: a systematic analysis for the Global Burden ofDisease Study 2017.

The Lancet , 392(10159):1684–1735, Nov. 2018. ISSN 01406736. doi: 10.1016/S0140-6736(18)31891-9. URL https://linkinghub.elsevier.com/retrieve/pii/S0140673618318919 .P. H. Eilers and B. D. Marx. Flexible smoothing with b-splines and penalties.

Statistical science , pages89–102, 1996.N. J. Kassebaum, R. M. Barber, Z. A. Bhutta, L. Dandona, P. W. Gething, S. I. Hay, Y. Kinfu, H. J.Larson, X. Liang, S. S. Lim, et al. Global, regional, and national levels of maternal mortality, 1990–2015:a systematic analysis for the global burden of disease study 2015.

The Lancet , 388(10053):1775–1812,2016.S. Khan and A. Hancioglu. Multiple indicator cluster surveys: delivering robust data on children and womenacross the globe.

Studies in Family Planning , 50(3):279–286, 2019.D. Kirk. Demographic transition theory.

Population Studies , 50(3):361–387, 1996.31. Knorr-Held. Bayesian modelling of inseparable space-time variation in disease risk.

Statistics in medicine ,19(17-18):2555–2567, 2000.R. D. Lee and L. R. Carter. Modeling and forecasting us mortality.

Journal of the American statisticalassociation , 87(419):659–671, 1992.R. D. Lee and S. Tuljapurkar. Stochastic population forecasts for the united states: Beyond high, medium,and low.

Journal of the American Statistical Association , 89(428):1175–1189, 1994.Z. Li, Y. Hsiao, J. Godwin, B. D. Martin, J. Wakeﬁeld, S. J. Clark, and with support from the UnitedNations Inter-agency Group for Child Mortality Estimation and its technical advisory group. Changesin the spatial distribution of the under-ﬁve mortality rate: Small-area analysis of 122 DHS surveys in262 subregions of 35 countries in Africa.

PLOS ONE , 14(1):e0210645, Jan. 2019. ISSN 1932-6203. doi:10.1371/journal.pone.0210645. URL https://dx.plos.org/10.1371/journal.pone.0210645 .G. A. Stevens, L. Alkema, R. E. Black, J. T. Boerma, G. S. Collins, M. Ezzati, J. T. Grove, D. R. Hogan,M. C. Hogan, R. Horton, et al. Guidelines for accurate and transparent health estimates reporting: thegather statement.

PLoS medicine , 13(6):e1002056, 2016.United Nations Inter-agency Group for Child Mortality Estimation (UN IGME). Levels & trends in childmortality: report 2020, estimates developed by the United Nations Inter-agency Group for Child MortalityEstimation, 2020.United Nations Maternal Mortality Estimation Inter-Agency Group (MMEIG). Trends in maternal mortal-ity: 1990 to 2015: estimates by WHO, UNICEF, UNFPA, World Bank Group and the United NationsPopulation Division, 2015.United Nations Maternal Mortality Estimation Inter-Agency Group (MMEIG). Trends in maternal mor-tality 2000 to 2017: estimates by WHO, UNICEF, UNFPA, World Bank Group and the United NationsPopulation Division, 2019.H. Wang, A. A. Abajobir, K. H. Abate, C. Abbafati, K. M. Abbas, F. Abd-Allah, S. F. Abera, H. N.Abraha, L. J. Abu-Raddad, N. M. Abu-Rmeileh, et al. Global, regional, and national under-5 mortality,adult mortality, age-speciﬁc mortality, and life expectancy, 1970–2016: a systematic analysis for the globalburden of disease study 2016.

The Lancet , 390(10100):1084–1150, 2017.M. West and J. Harrison.

Bayesian forecasting and dynamic models . Springer Science & Business Media,2006.J. R. Wilmoth, N. Mizoguchi, M. Z. Oestergaard, L. Say, C. D. Mathers, S. Zureick-Brown, M. Inoue, andD. Chou. A new method for deriving global estimates of maternal mortality.

Statistics, Politics and Policy ,3(2), 2012.D. You, L. Hug, S. Ejdemyr, P. Idele, D. Hogan, C. Mathers, P. Gerland, J. R. New, and L. Alkema. Global,regional, and national levels and trends in under-5 mortality between 1990 and 2015, with scenario-basedprojections to 2030: a systematic analysis by the UN Inter-agency Group for Child Mortality Estimation.

The Lancet , 386(10010):2275–2286, Dec. 2015. ISSN 01406736. doi: 10.1016/S0140-6736(15)00120-8. URL https://linkinghub.elsevier.com/retrieve/pii/S0140673615001208 .32 ppendix A

Properties of stationary smoothing models, with r = 0 When there is no diﬀerencing in the smoothing model ( r = 0) then by construction the distribution of thesmoothing terms is stationary: the unconditional ﬁrst and second moments of (cid:15) c will not depend on time. Aswe will see shortly, this implies that in the absence of data the smoothing term will be centered around zero.This is important for understanding the behavior of the model in projections: the smoothing model willcontribute to uncertainty in the projections, but since it will eventually revert to mean zero the projectionswill be centered around the other process model components. As we will see in the next section, this is nottrue when r > t ∗ , and let (cid:15) t ∗ = [ (cid:15) , · · · , (cid:15) t ∗ ]. We set B = I to simplifythe analysis, which means (cid:15) = δ (the two are interchangeable, and we drop the subscript c for simplicity).Similarly deﬁne δ t ∗ = [ δ , · · · , δ t ∗ ]. We placed the restriction that δ be multivariate normally distributed,which means there is a closed-form solution for the distribution of δ conditional on the observed data. Theconditional mean of δ t given δ t ∗ for t > t ∗ is given byE[ δ t | δ ∗ t ] = L > Σ − δ t ∗ , where L is the Gram matrix derived from the covariances between δ t and every element of δ t ∗ ( L i = s ( t, i )for i = 1 , · · · , t ∗ ), and Σ is the Gram matrix derived from the covariances between every pair in δ t ∗ . Werestricted the covariance function of the smoothing model to depend only diﬀerences in time, and that it goesto zero as the diﬀerences in time grows. This ensures that the conditional mean is guaranteed to convergeto zero as t → ∞ . The manner in which the conditional means converge is determined by the speciﬁccovariance function. For example, the sparsity of Σ − for an AR(1) kernel makes it straightforward to derivethe conditional distribution: E[ δ t | δ t ∗ ] = δ t ∗ · ρ ( t − t ∗ ) . In projections, an AR(1) process depends only on the last observed value ( δ t ∗ ), converging back to zeroas ρ ( t − t ∗ ) converges to zero. This is a consequence of the sparsity of the precision matrix for the AR(1)covariance. Covariance functions that yield non-sparse precision matrices do not lead to conditional meanswith simple forms like the AR(1) process. This leads to more complex behavior: for example, the squaredexponential kernel can project trends in the previously observed data before returning to zero. Figure 1compares the conditional behavior of several covariance functions. Non-stationary smoothing models, with r > When the degree of diﬀerencing is greater than zero ( r >

0) then the resulting smoothing models are notstationary (their unconditional ﬁrst and second moments will depend on time). However, diﬀerencing canyield a stationary process. For example, a RW(1) process, with δ t | δ t − , σ ∼ N ( δ t − , σ ) , after one level of diﬀerencing becomes δ t | σ ∼ N (0 , σ ) . Therefore within our framework the RW(1) model can be expressed as r = 1 (one level of diﬀerencing toyield a stationary process) and Σ c = σ I (the covariance matrix of the diﬀerenced stationary process isdiagonal). Similarly, a RW(2) process after two levels of diﬀerencing is stationary, so we describe it as r = 2and Σ c = σ I . Autoregressive integrated moving average (ARIMA(p,d,q)) models are obtained by setting r = d and using the respective autocovariance function in Σ c . For example, the autoregressive integratedARI(1,1) model is given by r = 1 and the AR(1) covariance function.1 a r X i v : . [ s t a t . M E ] F e b nconstrained non-stationary smoothing processes can be written according to the TMMP process modelspeciﬁcation by a reparametrization of such processes into a zero-constrained process and a structural com-ponent. For example, an unconstrained RW(1) ( d = 1) for years K ,c can be reparametrized into one thatincludes a sum-to-zero constraint by introducing as parameters the mean of the process and the deviationsaway from the mean. This parametrization is most clearly expressed using a ﬁrst order diﬀerencing matrix D with D i,i = − D i,i +1 = 1, and is zero everywhere else. With D and K ,c = { , , . . . , t ∗ } , we can write δ t ∗ = α + [ D ( DD ) − ] γ t ∗ − , where α = 1 /t ∗ P t δ t and γ t = δ t − δ t − for t = 2 , ..., t ∗ . Hence, an unconstrained RW(1) process δ t can berewritten as the combination of a smoothing process γ with P t ∈K ,c d γ c,t = 0 and a systematic component g ( t, α c ) = α for t ∈ K d,c . Moving the intercept from the smoothing component to the systematic componentalso helps to clarify the behavior of the model: an overall level during the period associated with K ,c isestimated as a systematic trend, and deviations are left to the smoothing model. Similarly, an unconstrainedRW(2) process can be reparametrized into the mean and rate of change of the process during sets of years K ,c and K ,c .Non-stationary smoothing models do not necessarily revert to mean zero in the absence of data, unlikethe models we saw in the previous section. As such, non-stationary models can inﬂuence the trend ofprojections. The RW(1) model will extend forward the last observed data point, and the RW(2) model willextend forward a linear trend based on δ t | σ, δ t − ∼ N ( δ t − , σ ). The behavior of projections willtherefore be determined by the interplay between the smoothing and the covariate and systematic processmodel components that extend past K d,c . Appendix B

The following tables are templates for specifying models that fall within the TMMP framework. Fourexample speciﬁcations are provided for each of the additional examples described in the main text.

FPEM

Citation Cahill et al. (2018) η c,t total contraceptive use rate g ( · ) logitProcess model formula logit( η c,t ) = g ( · ) + (cid:15) c,t Covariate Component g ( · ) · Covariates · Systematic Component g ( · ) Logistic curve: g ( · ) = Ω c when t = t ∗ , and for t > t ∗ : g ( · ) = logit ( η c,t − ) + δ c,t = ( logit (cid:16) ˜ P c · logit − (cid:16) logit (cid:16) η c,t − ˜ P c (cid:17) + ω c (cid:17)(cid:17) , when η c,t − < ˜ P c logit ( η c,t − ) , otherwise,where α c = n ˜ P c , ω c , Ω c o . α c ˜ P , ω c , Ω c Oﬀsets c,t · Smoothing Component

B B = I s ( t , t ) AR(1); s ( t , t ) = ρ | t − t | σ (1 − ρ ) r K d,c · Parameter Estimation

Fixed · Vague Priors · Informative Priors · Hierarchical model systematic parameters ˜

P , ω c , Ω c Hierarchical distribution π normalNumber of levels in hierar-chy ˜ P : 1 ω c : 3Ω c , developing countries: 3Ω c , developed countries: 1Hierarchical groupings ˜ P : countries within world ω c : countries within sub-region, region worldΩ c , developing countries: countries within sub-region, region, worldΩ c , developed countries: countries within world Projections

Projections · Table 1: TMMP speciﬁcation for the Family Planning EstimationModel (Cahill et al., 2018).

NMR

Citation Alexander and Alkema (2018) η c,t NMR / (U5MR - NMR) g ( · ) logProcess model formula log( η c,t ) = g ( · ) + (cid:15) c,t Covariate Component g ( · ) β c, + β log( X c,t − log( β )) [ X c,t >β ] Covariates U5MR

Systematic Component g ( · ) · α c · ﬀsets a c,t · Smoothing Component B B c,t,k = b c,k ( t ) = cubic B-splines s ( t , t ) independent k ( t , t ) = σ t = t ) r K d,c { , . . . , K c } Parameter Estimation

Fixed · Vague Priors regression coeﬃcientsInformative Priors · Hierarchical model smoothing parametersHierarchical distribution π normalNumber of levels in hierar-chy 1Hierarchical groupings countries within world Projections

Projections · Table 2: TMMP speciﬁcation for the Neonatal Mortality Ratemodel (Alexander and Alkema, 2018).

Bmat

Citation Alkema et al. (2017) η c,t proportion of non-AIDS deaths that are maternal among women ofreproductive age g ( · ) logProcess model formula log( η c,t ) = g ( · ) + (cid:15) c,t Covariate Component g ( · ) β c, + P k X c,t,k β k Covariates log(GDP), log(GFR), SAB

Systematic Component g ( · ) · α c · Oﬀsets a c,t · moothing Component B B c,k = b c,k ( t ) = cubic B-splines B = I s ( t , t ) ARMA(1,1) r K d,c { } Parameter Estimation

Fixed · Vague Priors regression coeﬃcientsInformative Priors · Hierarchical model regression interceptssmoothing parametersHierarchical distribution π intercept: normalsmoothing: truncated normalNumber of levels in hierar-chy intercept: 2smoothing: 1Hierarchical groupings intercept: countries within region within worldsmoothing: countries within world Projections

Projections · Table 3: TMMP speciﬁcation for Bmat (Alkema et al., 2017).

Mortality

Citation Alexander et al. (2017) η c,t age-speciﬁc mortality g ( · ) logProcess model formula log( η c,t ) = g ( · ) + (cid:15) c,t Covariate Component g ( · ) P k X a,k β c,t,k Covariates X k,a is the k th principal component of the mortality schedule for agegroup a Systematic Component g ( · ) · α c · Oﬀsets a c,t · moothing Component B B = I s ( t , t ) independent r K d,c · Parameter Estimation

Fixed · Vague Priors · Informative Priors · Hierarchical model regression coeﬃcients, smoothing parametersHierarchical distribution π normalNumber of levels in hierar-chy 1Hierarchical groupings counties within state Projections

Projections · Table 4: TMMP speciﬁcation for the age-speciﬁc mortality model(Alexander et al., 2017).

References

M. Alexander and L. Alkema. Global estimation of neonatal mortality using a Bayesian hierarchical splinesregression model.

C) GP, Matern (D) GP, squared exponential(A) AR(1) (B) ARMA(1, 1)0 3 6 9 0 3 6 90 3 6 9 0 3 6 90.00.51.01.5−1.0−0.50.00.51.01.5−0.50.00.51.01.5−0.50.00.51.01.5 x y conditional mean observation conditional 95% interval Figure 1: Comparison of the conditional distributions of four smoothing models based on two observed datapoints. The parameters for the smoothers are: (A) AR(1), ρ = 0 . , κ = 0 .

05. (B) ARMA(1, 1), ρ = 0 . θ = 0 . κ = 0 .

05. (C) GP, Mat´ern, κ = 0 . / (1 − ρ ) , ρ = 0 . , ν = 3 / , ‘ = 3.(D) GP, squared exponential, κ = 0 . / (1 − ρ ) , ρ = 0 . , ‘ = 3.7etween 1990 and 2015, with scenario-based projections to 2030: a systematic analysis by the UN Ma-ternal Mortality Estimation Inter-Agency Group. The Lancet , 387(10017):462–474, Jan. 2016. ISSN01406736. doi: 10.1016/S0140-6736(15)00838-7. URL https://linkinghub.elsevier.com/retrieve/pii/S0140673615008387 .L. Alkema, S. Zhang, D. Chou, A. Gemmill, A.-B. Moller, D. M. Fat, L. Say, C. Mathers, and D. Hogan.A Bayesian approach to the global estimation of maternal mortality.

The Lancet , 391(10123):870–882, Mar. 2018. ISSN 0140-6736. doi: 10.1016/S0140-6736(17)33104-5. URL .R. J. Carroll, D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu.

Proceedings of the NationalAcademy of Sciences , 116(19):9303–9311, May 2019. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1812593116. URL .C. Chatﬁeld and H. Xing.

The analysis of time series: an introduction with R . CRC press, 2019.T. N. Croft, A. M. Marshall, C. K. Allen, F. Arnold, S. Assaf, S. Balian, et al. Guide to dhs statistics.

The Lancet , 388(10053):1775–1812,2016.S. Khan and A. Hancioglu. Multiple indicator cluster surveys: delivering robust data on children and womenacross the globe.

Studies in Family Planning , 50(3):279–286, 2019.D. Kirk. Demographic transition theory.

Population Studies , 50(3):361–387, 1996.L. Knorr-Held. Bayesian modelling of inseparable space-time variation in disease risk.

Statistics in medicine ,19(17-18):2555–2567, 2000.R. D. Lee and L. R. Carter. Modeling and forecasting us mortality.

Journal of the American statisticalassociation , 87(419):659–671, 1992.R. D. Lee and S. Tuljapurkar. Stochastic population forecasts for the united states: Beyond high, medium,and low.

Journal of the American Statistical Association , 89(428):1175–1189, 1994.8. Li, Y. Hsiao, J. Godwin, B. D. Martin, J. Wakeﬁeld, S. J. Clark, and with support from the UnitedNations Inter-agency Group for Child Mortality Estimation and its technical advisory group. Changesin the spatial distribution of the under-ﬁve mortality rate: Small-area analysis of 122 DHS surveys in262 subregions of 35 countries in Africa.

The Lancet , 390(10100):1084–1150, 2017.M. West and J. Harrison.

The Lancet , 386(10010):2275–2286, Dec. 2015. ISSN 01406736. doi: 10.1016/S0140-6736(15)00120-8. URL https://linkinghub.elsevier.com/retrieve/pii/S0140673615001208https://linkinghub.elsevier.com/retrieve/pii/S0140673615001208