[PDF] Structural identifiability analysis of PDEs: A case study in continuous age-structured epidemic models

Abstract

Computational and mathematical models rely heavily on estimated parameter values for model development. Identifiability analysis determines how well the parameters of a model can be estimated from experimental data. Identifiability analysis is crucial for interpreting and determining confidence in model parameter values and to provide biologically relevant predictions. Structural identifiability analysis, in which one assumes data to be noiseless and arbitrarily fine-grained, has been extensively studied in the context of ordinary differential equation (ODE) models, but has not yet been widely explored for age-structured partial differential equation (PDE) models. These models present additional difficulties due to increased number of variables and partial derivatives as well as the presence of boundary conditions. In this work, we establish a pipeline for structural identifiability analysis of age-structured PDE models using a differential algebra framework and derive identifiability results for specific age-structured models. We use epidemic models to demonstrate this framework because of their wide-spread use in many different diseases and for the corresponding parallel work previously done for ODEs. In our application of the identifiability analysis pipeline, we focus on a Susceptible-Exposed-Infected model for which we compare identifiability results for a PDE and corresponding ODE system and explore effects of age-dependent parameters on identifiability. We also show how practical identifiability analysis can be applied in this example.

Full PDF

NNoname manuscript No. (will be inserted by the editor)

Structural identiﬁability analysis of PDEs:A case study in continuous age-structured epidemicmodels

Marissa Renardy · Denise Kirschner · Marisa Eisenberg

Received: date / Accepted: date

Abstract

Computational and mathematical models rely heavily on estimated pa-rameter values for model development. Identiﬁability analysis determines how wellthe parameters of a model can be estimated from experimental data. Identiﬁabilityanalysis is crucial for interpreting and determining conﬁdence in model parametervalues and to provide biologically relevant predictions. Structural identiﬁabilityanalysis, in which one assumes data to be noiseless and arbitrarily ﬁne-grained,has been extensively studied in the context of ordinary diﬀerential equation (ODE)models, but has not yet been widely explored for age-structured partial diﬀeren-tial equation (PDE) models. These models present additional diﬃculties due toincreased number of variables and partial derivatives as well as the presence ofboundary conditions. In this work, we establish a pipeline for structural identiﬁ-ability analysis of age-structured PDE models using a diﬀerential algebra frame-work and derive identiﬁability results for speciﬁc age-structured models. We useepidemic models to demonstrate this framework because of their wide-spread usein many diﬀerent diseases and for the corresponding parallel work previously donefor ODEs. In our application of the identiﬁability analysis pipeline, we focus ona Susceptible-Exposed-Infected model for which we compare identiﬁability resultsfor a PDE and corresponding ODE system and explore eﬀects of age-dependentparameters on identiﬁability. We also show how practical identiﬁability analysiscan be applied in this example.

Keywords epidemiology · age structure · partial diﬀerential equations · tuberculosis · modeling M. RenardyUniversity of Michigan Medical School, Department of Microbiology and ImmunologyE-mail: [email protected]. KirschnerUniversity of Michigan Medical School, Department of Microbiology and ImmunologyM. EisenbergUniversity of Michigan, Department of EpidemiologyUniversity of Michigan, Department of MathematicsE-mail: [email protected] a r X i v : . [ q - b i o . Q M ] F e b Marissa Renardy et al.

Identiﬁability analysis addresses the question of whether the parameters of a math-ematical model can be uniquely identiﬁed, given observed data. There are two gen-eral types of identiﬁability analysis: structural (assuming data to be noiseless andarbitrarily ﬁne-grained) and practical (using real data). These analyses are crucialto interpreting biologically relevant predictions from computational and mathe-matical models that rely on parameter values that are estimated from experimen-tal datasets. Previous studies have shown that unidentiﬁable models can lead tovastly diﬀerent predictions for diﬀerent parameter values that are able to producethe same observed data (Kao and Eisenberg, 2018). In the context of biologicalmodels, parameter identiﬁability gives increased conﬁdence in model predictionsthat are based on estimated parameters. This is particularly relevant for predic-tions applied to topics such as diseases, treatment outcomes, vaccination eﬃcacy,and future infection dynamics. Speciﬁcally, the use of epidemiological models hasbecome increasingly necessary and useful to predicting disease outcomes, as wehave seen in the case of COVID-19, and these eﬀorts should be as accurate andconﬁdent as possible to produce the most useful predictions. Thus, we focus ourwork in the context of epidemic models due to their use in many diseases andto compare with corresponding parallel work previously done for ODEs regardingstructural identiﬁability.Many theoretical results and algorithms for structural identiﬁability of linearand non-linear ordinary diﬀerential equations (ODEs) models have been derived(Miao et al., 2011; Audoly et al., 2001; Ljung and Glad, 1994; Pohjanpalo, 1978;Chappell et al., 1990; Cobelli and DiStefano, 1980; Chis et al., 2011) and havebeen applied to study an array of diseases. There are models for the spread ofcholera (Eisenberg et al., 2013), vector-borne diseases (Kao and Eisenberg, 2018;Zhu et al., 2018; Tuncer et al., 2016), and general infectious diseases (Tuncerand Le, 2018; Evans et al., 2005). Methods for structural identiﬁability are oftenanalytical in nature (e.g. the commonly used diﬀerential algebra approach (Audolyet al., 2001; Ljung and Glad, 1994)), but may also be numerical, or some blend ofthe two (Jacquez and Greif, 1985; Jacquez and Perry, 1990; Raue et al., 2009).ODE models are not always suﬃcient for representing disease dynamics when,for example, age is an important factor. In many diseases, diﬀerent age groupsmay respond diﬀerently to diseases or disease interventions based on diﬀerences insusceptibility (Gardner, 1980), mortality (Castillo-Chavez et al., 1991), and con-tact rates (Mossong et al., 2008; Glasser et al., 2012). Age preferences have beenshown to have a signiﬁcant eﬀect on mixing patterns that drive disease spread(Valle et al., 2007). This type of information can be incorporated into a mathe-matical model for epidemics, for example, by explicitly representing age-structureddistributions of the susceptible and infected subpopulations. Age-structured mod-els have been used to evaluate and optimize age-targeted vaccination strategiesfor several diseases (Harris et al., 2019; Keeling and White, 2011; Hao et al., 2019;Castillo-Chavez and Feng, 1998; Shim et al., 2006), and it has been shown that population heterogeneity and non-random mixing can have a substantial eﬀect onthe prediction of disease outbreaks and herd immunity (Glasser et al., 2016; Fenget al., 2015).In a continuous setting, age structure is introduced by creating state variablesthat are functions of both age and time, and also by introducing transport terms tructural identiﬁability analysis of PDEs 3 to represent constant aging of a population. These model formulations result inPDEs, rather than ODEs, making them more biologically accurate; however, suchmodels also typically more complex to analyze. There are other PDE formulationsof epidemic models such as age of infection models (Thieme and Castillo-Chavez,1993; Inaba and Sekine, 2004) or spatially explicit models (Bertuzzo et al., 2009;Wu, 2008). The focus of this paper will be speciﬁcally on age-structured models.While there are alternative approaches to incorporating age structure, such asto use compartmentalization to stratify populations (Chow et al., 2011; Fergusonet al., 1996) or to use a stochastic discrete representation such as an agent-basedmodel (Ajelli et al., 2010; Bian, 2004), we focus here on the PDE formulation.Importantly, the issues of both structural and practical parameter identiﬁabilityfor these types of models still needs to be elucidated.While structural identiﬁability analysis has been extensively applied in the con-text of ODE epidemic models, it has not been widely explored for age-structuredPDE models. PDE models present additional diﬃculties due to increased numberof variables and derivatives as well as the presence of boundary conditions. Iden-tiﬁability of age-structured models was ﬁrst considered by Perasso et al. (2011)for a simple SI-type model and was again considered by Perasso and Razaﬁson(2016) for a population dynamics model including only birth and death. Theseanalytical results have not yet been extended to more comprehensive epidemicmodels, though some identiﬁability results have been shown numerically (Tunceret al., 2016). For these types of problems, modelers have relied on practical iden-tiﬁability analyses which are dependent on either available data or data that hasbeen synthesized for the purpose of analysis.In this paper, we derive analytical identiﬁability results for more complex age-structured epidemic models using a diﬀerential algebra framework that is simplerto apply than the previous (aforementioned) previous approaches. We apply thismethodology to a compartmental epidemic Susceptible-Exposed-Infected (SEI)model both with and without age structure explicitly modeled, and we comparethe identiﬁability results under diﬀerent conditions such as the presence of immi-gration in the system or for age-dependent model parameters. To corroborate ourstructural identiﬁability results, we also perform a practical identiﬁability anal-ysis for one version of the model. While we focus on age-structured models, themethods we present herein can be applied to other PDE formulations, with certainassumptions about smoothness and uniqueness of solutions. describe the underlying system of interest, and (2) a measurement model or set of output equations , which describe the observation or measurement of interest (i.e.,the quantities of interest for which one either has or plans to collect data). Inthis study, we assume the following general ﬁrst-order partial diﬀerential equation(PDE) model structure, which is commonly used in age-structured PDE models

Marissa Renardy et al. in biology and health: ∂ x ∂t + ∂ x ∂a = f ( x , t, a, p , u ) y = g ( x , t, a, p ) (1)where x is a vector of state variables describing the system of interest, p is thevector of parameters, t and a are two independent variables. Here we considertime and age as the independent variables, although the approach given here willalso work for more general variables, as well as for additional derivatives of furtherindependent variables on the left hand side. Additionally, y represents the vectorof n y model outputs (measured variables) and u represents any known inputs(e.g. forcing functions) to the system, if they exist. Finally, f and g are rationalfunctions of the model variables and parameters such that the model can be writtenequivalently as diﬀerential polynomials by clearing denominators.In structural identiﬁability analysis, it is assumed that the inputs and outputsare known (observed) perfectly – in this case, we assume that u , and y are knownfor all times and all ages and that there is no noise in the data. Similarly, theindependent variables t and a are assumed to be known. Thus, the resulting iden-tiﬁability results can be interpreted as the upper limit of what can possibly beidentiﬁed with perfect data collection methods. However, structural identiﬁabilityis a necessary condition for identiﬁability in the realistic data case, making it animportant ﬁrst step in the parameter estimation process.A model such as (1) can be thought of as a map from parameter space (e.g.for p ∈ R n p ) to output space (the space of trajectories for y ), often termed themodel map Φ : p (cid:55)→ y . With this framing, the structural identiﬁability questionbecomes that of whether or not the model map Φ is injective. In particular, wedeﬁne structural identiﬁability as follows: Deﬁnition 1

For a model of the form (1), an individual parameter p in p is uniquely (or globally) structurally identiﬁable if for almost every value p ∗ andalmost all initial conditions, the equation y ( x , t, a, p ∗ ) = y ( x , t, a, p ) implies p = p ∗ . A parameter p is said to be non-uniquely (or locally) structurally identiﬁable if for almost any p ∗ and almost all initial conditions, the equation y ( x , t, a, p ∗ ) = y ( x , t, a, p ) implies that p has a ﬁnite number of solutions. Deﬁnition 2

Similarly, a model of the form (1) is said to be uniquely (respec-tively, non-uniquely ) structurally identiﬁable for a given choice of output y if ev-ery parameter is uniquely (respectively, non-uniquely) structurally identiﬁable,i.e. for almost every value p ∗ and almost all initial conditions, the equation y ( x , t, a, p ∗ ) = y ( x , t, a, p ) has only one solution, p = p ∗ (respectively, ﬁnitelymany solutions). Equivalently, a model is uniquely structurally identiﬁable fora given output if and only if the map Φ is injective almost everywhere, i.e. ifthere exists a unique set of parameter values p ∗ which yields a given trajectory y ( x , t, a, p ∗ ) almost everywhere. We note that because some speciﬁc parameter values or initial conditions mayrender an otherwise identiﬁable model unidentiﬁable (e.g. if all parameters or ini-tial conditions are zero), we deﬁne structural identiﬁability generically, i.e., foralmost all parameter values and initial conditions (Audoly et al., 2001; Pia Sacco-mani et al., 2003). tructural identiﬁability analysis of PDEs 5 n y monic polynomial equationsin terms of the known variables y , u , and their derivatives, with rational coeﬃcientsin the parameters p . The input-output equations represent an implicit form of themodel map, and as a set of diﬀerential equations, they are input-output equivalentto the original system. This means that for the same inputs and parameter values,the equations generate the same output trajectories (Eisenberg, 2019). We thenconsider the map from the input parameter space to the coeﬃcients of the resultingmonic polynomial.A key insight of the diﬀerential algebra approach in the rational ODE caseis that injectivity of the model map can be evaluated by considering the the co-eﬃcient map, i.e. the map from parameters to coeﬃcients of the input-outputequations (Audoly et al., 2001; Ljung and Glad, 1994). In the rational ODE case,the coeﬃcient map typically provides complete identiﬁability information for themodel. If this map is injective, then the model is structurally identiﬁable. If themodel is unidentiﬁable, this map determines the identiﬁable combinations of pa-rameters. For example, while some model parameters may not be independentlyidentiﬁable, the product or sum of two or more model parameters may be identi-ﬁable. This may enable the model to be reparameterized in a way that allows themodel to become structurally identiﬁable (Chappell and Gunn, 1998; Cole et al.,2010). It should be noted that a set of identiﬁable combinations is not necessarilyunique – there may be many sets of parameter combinations that are equivalent.For example, the sets of identiﬁable combinations { ab, bc } and { ab, a/c } are equiv-alent since ( ab ) / ( bc ) = a/c .The diﬀerential algebra approach for ODE models typically uses characteristicsets to calculate the input-output equations (Audoly et al., 2001; Ljung and Glad,1994). Characteristic sets are a diﬀerential analog of the Gr¨obner basis (Ritt,1950) based on diﬀerential polynomial psuedodivision. For more details on thecharacteristic set approach, the reader is referred to (Audoly et al., 2001; Ljungand Glad, 1994; Pia Saccomani et al., 2003; Hong et al., 2020; Eisenberg, 2019).This methodology has previously been applied to ODE systems, where deriva-tives are taken with respect to only one variable. In this work, we generalize thismethodology to PDE systems of the form in (1); in particular, we consider age-structured systems where derivatives are taken with respect to both time andage. We show that the same framework can be applied to these systems, withthe additional consideration of boundary conditions. In age-structured systems, aboundary condition must be speciﬁed at a = 0, usually in the form of a Dirich- let boundary condition. Thus, in addition to performing identiﬁability analysis onthe model equations, one must also consider the equations evaluated at a = 0 toexplicitly include information at the boundary. This is analogous to methods fordetermining identiﬁability of initial conditions in ODE systems (Pia Saccomaniet al., 2003). Marissa Renardy et al. y , u and theirderivatives with respect to either independent variable and the parameters p ), weassume that our observed trajectory contains suﬃciently many algebraically inde-pendent points that we can successfully identify the coeﬃcients of the input-outputequations. In other words, from a given observed solution of y over age and time,we assume the observed solution contains an arbitrary number of distinct points atwhich the input-output equations can be evaluated to solve for the input-outputcoeﬃcients (in particular, more distinct points than the number of coeﬃcients). Inpractice most trajectories meet this criterion, although some care must be takento avoid systems or measure zero sets of initial conditions for which the solutionsare at steady state, or have key parameters equal to zero, etc. (Hong et al., 2020).With the coeﬃcient-identiﬁability assumption (Eisenberg, 2019), the rest of thediﬀerential algebra approach follows as with the ODE case, and is described below.Procedurally, the process of performing identiﬁability analysis for PDEs is al-most identical to the ODE case, though substitutions may be made more compli-cated by the presence of an additional derivative. To obtain input-output equationsfrom model equations, a typical ﬁrst step is to solve the output equations for statevariables and, using substitution, eliminate these state variables and their deriva-tives from the model equations. For simplicity, all nonzero terms in the modelequations are moved to the right-hand side. Remaining state variables can then beeliminated by solving one or more of the model equations. The particular substi-tutions required will of course vary depending on the model and output equations.The goal is to obtain a system of polynomial equations in terms of the outputsand their derivatives, with all state variables eliminated, whose coeﬃcients are interms of the model parameters. In the case that square roots arise, for example,they must be eliminated by multiplying by the conjugate.Once input-output polynomial equations are obtained, we divide each equation by the coeﬃcient of its leading term to obtain a monic polynomial (see below fordetails about the ranking of terms). From these monic polynomials, we obtain theset of polynomial coeﬃcients that deﬁnes the coeﬃcient map. This map, as in theODE case, can be used to determine complete identiﬁability information for themodel equations. tructural identiﬁability analysis of PDEs 7 Note that thus far, we have not considered any information about the boundaryconditions. Analysis of the boundary conditions is a key component of identiﬁa-bility analysis for PDEs that does not occur in the ODE case. In the case ofage-structured systems, there is a single boundary condition at a = 0. Thus, wetreat the boundary condition as similar to an initial condition for ODEs (Pia Sac-comani et al., 2003). If information is known about the boundary conditions, suchas values or functional forms, this information can be substituted into the modelequations evaluated at a = 0 to obtain boundary condition equations . Since thevalues of model outputs are known at a = 0, the same identiﬁability analysis asabove can be applied to boundary condition equations. This may reveal additionalidentiﬁability information about the model or any parameters that are present inthe boundary conditions.We note that, as in the initial condition case, if identiﬁability is evaluatedwithout considering boundary conditions, this implicitly assumes that the bound-ary conditions are generic and unknown. However, there are some aspects of theboundary conditions which are more subtle than the ODE case of initial condi-tions, and may warrant further exploration, such as that boundary conditions stillmay vary over time. Exploring how diﬀerent boundary condition types may aﬀectidentiﬁability would be an important direction for future work.Our procedure for performing structural identiﬁability analysis, which can beapplied to both ODE and age-structured PDE systems, is outlined in Figure 1. Rewrite model equations as input-output equationsWrite input-output equations as monic polynomial equationsExtract polynomial coefficientsConsider the map from parameter space to the polynomial coefficients. Is it injective?All parameters are identifiable with generic boundary conditions Find “identifiable combinations” for which the map is one-to-one.

NoYes *for PDE case onlyIf applicable, evaluate identifiability of boundary condition equations

Fig. 1

Flowchart describing the methodology for identiﬁability analysis of ODE and age-structured PDE models based on using a diﬀerential algebra approach. Marissa Renardy et al. F = X T X , where X is the sensi-tivity matrix deﬁned by X ( i,j ) = ∂y∂p i ( t j ) for parameters p , ..., p n and time points t , ..., t m . For an age structured system, we deﬁne F = X T X where the sensitivitymatrix is deﬁned by X ( i,j ) = ∂y∂p i (( a, t ) j ) where ( a, t ) j for j = 1 , ..., m are pairsof age and time points that form a grid. Here, y denotes the observable modeloutput. For multiple model outputs (e.g., y and y ), the matrix X can be deﬁnedby concatenation (i.e., X = [ X ; X ]). The rank of F indicates the number ofidentiﬁable parameters and parameter combinations. If F is singular, the model isstructurally unidentiﬁable; if F is close to singular, the model may be practicallyunidentiﬁable. The FIM can also be used to determine which parameter subsetsare identiﬁable or unidentiﬁable (Eisenberg and Hayashi, 2014).We compute proﬁle likelihoods by ﬁxing a particular parameter p i and ﬁttingall remaining parameters p j (cid:54) = i to maximize a likelihood function. We deﬁne thelikelihood by assuming constant, normally distributed measurement error withstandard deviation equal to 10% of the mean of the data, and mean equal to themodel trajectory. Maximizing the likelihood function is equivalent to minimizing acost function based on the negative log likelihood (in this case equivalent to leastsquares). This is performed at many values across the range of values for p i . Theproﬁle likelihood for p i is given by the maximum values of the likelihood functionacross the range of values for p i . If the proﬁle likelihood is ﬂat, p i is structurallyunidentiﬁable; if it is suﬃciently shallow, p i is practically unidentiﬁable. For moredetails on proﬁle likelihood approaches to identiﬁability, the reader is referred to(Raue et al., 2009; Venzon and Moolgavkar, 1988).In our analyses herein, we solved the age-structured PDEs numerically usingMethod of Lines, using a ﬁrst order ﬁnite diﬀerence approximation of the spatialderivative, and using Matlab’s ODE solver ode45 . In several case studies below, we consider a simple mathematical model of theepidemiology of tuberculosis (TB). TB is transmitted via the aerosolized bac-terium

Mycobacterium tuberculosis from person to person via lung inhalation. TBis unique from many infectious diseases in that most people develop a latent in-fection, meaning that they present no symptoms and cannot transmit the disease,and the infection can remain latent for an extended period of time. Thus, ourmodel consists of three mutually exclusive compartments: Susceptible (S), Ex-posed/latent (E), and Infectious (I). Through contact with infectious individuals, susceptible individuals (called susceptibles) may progress to the exposed class (i.e.latent infection) or the infectious class (i.e. primary infection). This is slightly dif-ferent from the standard SEI model, in that some individuals bypass the exposedclass and immediately become infectious, but is similar to what has been used tomodel tuberculosis epidemics previously (Ozcaglar et al., 2012; Guzzetta et al., tructural identiﬁability analysis of PDEs 9

S E I ! " $ % ! & '( (1 − ,). Parameter Definition (.,/! " , ! $ , ! & birth ratetransmission rateprobability of primary infectionreactivation ratedeath rates Fig. 2

SEI diagram and parameter deﬁnitions for a mathematical model of TB epidemiology. S represents the susceptible population, E represents the exposed/latent population, and I represents the actively infectious population. Arrows describe ﬂow between mutually exclusivecompartments. For simplicity, we ﬁrst demonstrate the identiﬁability analysis framework on amodel with density-dependent transmission, i.e. transmission is proportional to SI rather than S IN . In this case, the per-individual rate of contact increases asthe population density increases. This is not necessarily a realistic model for TB,since TB is often spread only among close contacts, but simpliﬁes the diﬀerentialalgebra for the purposes of demonstration.In order to perform parameter identiﬁability analysis, we must ﬁrst deﬁne whatthe outputs (i.e., the quantities for which we could obtain reliable, biologically rel- evant time series data) of the speciﬁc model are. For all of the following examples,we will assume that we can observe a fraction of the exposed individuals, i.e. y E = k E E where k E is the reporting/detection rate. This type of data could beobtained by, for example, contact tracing to identify individuals who have beenexposed to TB disease, or testing a proportion of the population for exposure. This is a useful practice even in areas with a low TB burden such as the US, sincedetection and treatment of latent infection is crucial for TB elimination (LoBueand Mermin, 2017). Since latent disease is asymptomatic, it is highly unlikely thatall exposed/latent individuals will be identiﬁed, and thus we assume instead thatonly a fraction k E of the exposed individuals are identiﬁed.In addition, we may be able to observe all or some of either all active infections(prevalence) or new active infections (incidence). The relevant output functiondepends on the reporting and surveillance data for the region of interest. Forexample, if all infectious cases are monitored and reported then these data wouldrepresent total prevalence. If not all cases are detected due to misdiagnosis, forexample, then the data would represent only a fraction of the total prevalence.If new infectious cases are reported upon diagnosis but then lost to follow-up,then incidence is an observable output while prevalence is not. Again, since it ispossible that not all infected individuals will be diagnosed, we may only be ableto observe a fraction of incidence. We present here results for observing prevalenceover time, but the identiﬁability analysis of model parameters for incidence wouldfollow similarly.We assume that y I = k I I , i.e. that we are able to obtain data for a fraction ofthe total number of infections at any given time. We assume no other informationabout the system parameters or initial conditions.4.1 ODE SEI model with density-dependent transmissionThough identiﬁability analysis for ODE systems is well established, we include thetechnical details here for completeness and for comparison when we move to thePDE system. The model can be written as a system of ODEs: dSdt = c − βSI − µ S SdEdt = (1 − (cid:15) ) βSI − δE − µ E EdIdt = (cid:15)βSI + δE − µ I I (2)We begin by rewriting the system of equations (2) in terms of the outputs y E and y I . For ease of notation, we adopt the notation y (cid:48) = dydt . Using the substitutions E = y E k E and I = y I k I , we obtain the following equations: S (cid:48) = c − βS y I k I − µ S Sy (cid:48) E k E = (1 − (cid:15) ) βS y I k I − δ y E k E − µ E y E k E y (cid:48) I k I = (cid:15)βS y I k I + δ y E k E − µ I y I . It now remains to eliminate S from these equations so that we obtain equationsin terms of solely the model parameters and outputs. This can be done by solvingfor S in the equation for dy I dt : S = k E ( y (cid:48) I + µ I y I ) − δk I y E βk E (cid:15)y I . tructural identiﬁability analysis of PDEs 11 This expression can then be substituted into the remaining two equations toobtain two input-output equations that are rational equations in terms of the out-puts and their derivatives. By multiplying by a common denominator and movingterms to one side, these input-output equations can be rewritten as the followingpolynomial equations.0 = k I δµ S y E y I + ( ck E k I β(cid:15) − k E k I µ I µ S ) y I + k I βδy E y I − k E βµ I y I + k I δy I y (cid:48) E − k I δy E y (cid:48) I − k I k E µ S y I y (cid:48) I − k E βy I y (cid:48) I + k E k I y (cid:48) I − k E k I y I y (cid:48)(cid:48) I − k I δ − k I (cid:15)µ E ) y E + ( k E µ I − k E (cid:15)µ I ) y I − k I (cid:15)y (cid:48) E + ( k E − k E (cid:15) ) y (cid:48) I Dividing the ﬁrst equation by k E k I and the second equation by k E − k E (cid:15) to formmonic polynomials, we get the following set of polynomial coeﬃcients. (cid:26) − , , βk I , − δk I k E , δk I k E , − βδk E , k I (cid:15)k E ( (cid:15) − , k I ( δ + µ E (cid:15) ) k E ( (cid:15) − ,βµ I k I , µ I , µ S , − δk I µ S k E , µ I µ S − βc(cid:15) (cid:27) We consider this set of coeﬃcients as a function of the model parameters, andwe seek to determine whether this map is injective. If it is not, then we considerwhether it is injective as a function of some parameter combinations. We ﬁnd that µ S and µ I are structurally identiﬁable. Through simpliﬁcation and substitution,we ﬁnd that other identiﬁable parameter combinations are (cid:26) βk I , βc(cid:15), βδk E , δ ( − (cid:15) ) (cid:15) , µ E + δ (cid:27) . If we further assume that µ S = µ E , then δ and (cid:15) also become identiﬁable while β , c , k E , and k I remain unidentiﬁable. However, the following combinations areidentiﬁable: { βk E , βk I , βc } . Consequently, if any one of β , c , k E , or k I is known,then the rest become identiﬁable.4.2 Age-structured PDE model with density-dependent transmission andconstant parametersNow that identiﬁability results are established for the ODE representation of theSEI model 2, we extend these results to a more complex PDE model that capturesthe same epidemiological setting (SEI), but includes age structure. We incorporate age structure into the model by adding a transport term to capture populationaging. To simplify the model, we assume that transmission occurs between individ-uals of the same age, so that the transmission term is proportional to S ( t, a ) I ( t, a )for any given age a and time t . While this is a signiﬁcant simpliﬁcation, datahas shown that people tend to interact primarily with others in their age group (Mossong et al., 2008). The model thus becomes ∂S∂t + ∂S∂a = − βSI − µ S S∂E∂t + ∂E∂a = (1 − (cid:15) ) βSI − δE − µ E E∂I∂t + ∂I∂a = (cid:15)βSI + δE − µ I I (3)for t > a ∈ [0 , ∞ ). Note that all state variables are now functions of bothtime and age. The boundary conditions are S ( t,

0) = c , E ( t,

0) = I ( t,

0) = 0. Notethat the birth rate is now incorporated into the boundary condition rather thaninto the diﬀerential equation for S . We begin by assuming that model parametersare constants, independent of age. We ﬁrst consider identiﬁability of the model parameters without explicitly ac-counting for the presence of boundary conditions, meaning that we assume thereis no a priori knowledge about the boundary conditions and that they cannot beobserved. We call this case generic boundary conditions. To perform the identiﬁa-bility analysis, we utilize the same framework as we did for the above ODE systemanalysis. That is, we rewrite the model system as a set of polynomial equationsin terms of the outputs and their partial derivatives, and then consider the mapfrom parameter space to the polynomial coeﬃcients.We adopt the notation y ( i,j ) = ∂ i + j y∂t i ∂a j . Using the substitutions E = y E k E and I = y I k I , as in the ODE case, we obtain the following equations: S (1 , + S (0 , = − βS y I k I − µ S Sy (0 , E k E + y (1 , E k E = (1 − (cid:15) ) βS y I k I − δ y E k E − µ E y E k E y (0 , I k I + y (1 , I k I = (cid:15)βS y I k I + δ y E k E − µ I y I . We now eliminate S to obtain equations in terms of only the model parametersand outputs. This can be done by solving for S in the third equation: S = k E (cid:16) y (0 , I + y (1 , I + µ I y I (cid:17) − δk I y E βk E (cid:15)y I . As before, this expression can then be substituted into the remaining two equationsto obtain two input-output equations that are rational equations of the outputs andtheir derivatives. By multiplying by a common denominator and moving terms toone side, these input-output equations can be rewritten as the following polynomial tructural identiﬁability analysis of PDEs 13 equations. 0 = − k E k I µ S y (0 , I y I − k E k I µ S y (1 , I y I − k E k I y (0 , I y I − k E k I y (1 , I y I − k E k I y (2 , I y I + k E k I ( y (0 , I ) + k E k I ( y (1 , I ) + 2 k E k I y (0 , I y (1 , I − k E k I µ I µ S y I − βk E y (0 , I y I − βk E y (1 , I y I − βk E µ I y I + δk I y (0 , E y I + δk I y (1 , E y I − δk I y E y (0 , I − δk I y E y (1 , I + δk I µ S y E y I + βδk I y E y I − k E (cid:15)y (0 , I − k E (cid:15)y (1 , I + k E y (0 , I + k E y (1 , I + k E µ I y I − k E µ I (cid:15)y I − k I (cid:15)y (0 , E − k I (cid:15)y (1 , E − δk I y E − k I µ E (cid:15)y E We choose derivatives with respect to time to be higher ranked than those withrespect to age for our ranking of terms within the polynomial, and let y I be higherranked than y E , yielding the ranking y E < y I < y (0 , E < y (0 , I < y (1 , E < y (1 , I

0) = c, E ( t,

0) = I ( t,

0) = 0, where c represents the birth rate. Todetermine the identiﬁability of the boundary condition parameter c , we performthe same analysis as above but with the model equations evaluated at a = 0(boundary condition equations). We let the boundary conditions for E and I begeneric to avoid reducing equations to zero. Then, noting that S ( t,

0) is constantand thus ∂S∂t ( t,

0) = 0, the relevant equations are ∂S∂a ( t,

0) = − βcI ( t, − µ S c∂E∂t ( t,

0) + ∂E∂a ( t,

0) = (1 − (cid:15) ) βcI ( t, − δE ( t, − µ E E ( t, ∂I∂t ( t,

0) + ∂I∂a ( t,

0) = (cid:15)βcI ( t,

0) + δE ( t, − µ I I ( t,

0) (4)4 Marissa Renardy et al.

By transforming the system (4) to a set of polynomial equations we obtain thefollowing polynomial coeﬃcients: (cid:26) , − k I δk E , − ck E β − ck E β(cid:15)k I , δ + µ E , µ I − cβ(cid:15) (cid:27) Since µ I is identiﬁable from the model equations, we thus ﬁnd that βc(cid:15) is anidentiﬁable combination, which is consistent with the ODE model. Thus, the fullset of identiﬁable parameters and combinations for this model is (cid:26) µ S , µ I , βk I , βc(cid:15), δk I k E , δ ( (cid:15) − (cid:15) , µ E + δ (cid:27) which is equivalent to the set of identiﬁable combinations for the ODE model. With generic initial conditions, we do not gain any additional identiﬁability results.However, if we assume that certain information is known, such as the total popula-tion size at the initial time, then further model parameters can become identiﬁable.Here, we assume that the initial population size as well as its derivatives with re-spect to time and age are known for all ages. Under these conditions, we may solvefor the initial susceptible population S (0 , a ) as S (0 , a ) = N (0 , a ) − E (0 , a ) − I (0 , a ),where N represents the total population size. Similarly, since we assume the deriva-tives of N are known at t = 0, we may solve for the derivatives of S at t = 0 interms of the derivatives of E and I . It is thus straightforward to solve for S andits derivatives at t = 0 in terms of the outputs y E and y I .To determine parameter identiﬁability from the initial condition with this ad-ditional knowledge, we evaluate the model equations at t = 0 and perform identi-ﬁability analysis analogous to the previous analysis for the boundary conditions.We obtain the following set of polynomial coeﬃcients: (cid:26) , k I k E , βk E , βk I , − δk I k E , − βk E ( (cid:15) − k I , βk E ( (cid:15) − k I , β(cid:15)k E ,β(cid:15)k I , β − β(cid:15)k I , δ + µ E , µ I − β(cid:15), k I µ S k E , µ S − β, − k I ( µ S + 2) (cid:27) From these coeﬃcients, we ﬁnd that β, δ, (cid:15), µ S , µ E , µ I , k E , and k I are all identi-ﬁable. Combining this with the previously established identiﬁability of cβ(cid:15) fromthe boundary equations, we ﬁnd that all parameters are identiﬁable if the initialpopulation size and its derivatives are known. While density-dependent transmission is convenient for simplicity, it is not gen-erally a realistic model for TB. Rather, most models for TB epidemiology usefrequency-dependent transmission models wherein the transmission term is pro-portional to S IN . In this section we present identiﬁability results for the SEI modelwith frequency-dependent transmission, assuming that our output functions are tructural identiﬁability analysis of PDEs 15 y E = k E E and y I = k I I . In addition to the ODE and constant-parameter PDEmodels, we will also consider variations including immigration and age-dependentdeath rates. The full details of the analysis are cumbersome and thus are notprovided here, but can be found in the Supplementary Material.5.1 ODE SEI model with frequency-dependent transmissionFor completeness, we include here identiﬁability results for the ODE version of themodel. This model can be written as the following system of equations: dSdt = c − βS IN − µ S SdEdt = (1 − (cid:15) ) βS IN − δE − µ E EdIdt = (cid:15)βS IN + δE − µ I I (5)where N ( t ) = S ( t ) + E ( t ) + I ( t ). We ﬁnd that the set of identiﬁable parametersand parameter combinations for this model is (cid:26) β, δ, (cid:15), µ S , µ E , µ I , ck I , k E k I (cid:27) . ∂S∂t + ∂S∂a = − βS IN − µ S S∂E∂t + ∂E∂a = (1 − (cid:15) ) βS IN − δE − µ E E∂I∂t + ∂I∂a = (cid:15)βS IN + δE − µ I I (6)with boundary conditions S ( t,

0) = c , E ( t,

0) = I ( t,

0) = 0. From the model equa-tions, we ﬁnd that the parameters and combinations { β, δ, (cid:15), µ S , µ E , µ I , k E k I } areidentiﬁable. We further ﬁnd from the boundary equations that ck I is identiﬁable.Thus, the full set of identiﬁable parameters and combinations for this model is { β, δ, (cid:15), µ S , µ E , µ I , ck I , k E k I } which is consistent with the ODE model. If the initial age distribution of the totalpopulation and its derivatives were known (as in Section 4.2.3 for the previouscase study), then all parameters in the model would be identiﬁable. dSdt = m + c − βS IN − µ S SdEdt = (1 − (cid:15) ) βS IN − δE − µ E EdIdt = (cid:15)βS IN + δE − µ I I (7)where m is the immigration rate. If we assume that the age distribution θ ( a ) ofimmigrants is known, then the corresponding age-structured PDE system becomes ∂S∂t + ∂S∂a = mθ ( a ) − βS IN − µ S S∂E∂t + ∂E∂a = (1 − (cid:15) ) βS IN − δE − µ E E∂I∂t + ∂I∂a = (cid:15)βS IN + δE − µ I I (8)where θ ( a ) is some known function such that θ (0) = 0 and (cid:82) ∞ θ ( a ) da = 1, withboundary conditions S ( t,

0) = c and E ( t,

0) = I ( t,

0) = 0.The identiﬁability results for the ODE system (7) remain the same as beforebut with c replaced by c + m , i.e., the identiﬁable parameters and combinationsare (cid:26) β, δ, (cid:15), µ S , µ E , µ I , ( c + m ) k I , k E k I (cid:27) . For the PDE system (8), we get the following identiﬁable parameters and combi-nations from the model and boundary equations: (cid:26) β, δ, (cid:15), µ S , µ E , µ I , ck I , mk I , k E k I (cid:27) . Thus, in this case the identiﬁability results for the PDE model diﬀer from thosefor the ODE model in that the combinations ck I and mk I are both identiﬁable,whereas in the ODE model only the sum ( c + m ) k I is identiﬁable. In other words,if all immigrants are susceptible then in an age-structured model it is possible todiﬀerentiate between birth and immigration, whereas in an ODE model it is not. tructural identiﬁability analysis of PDEs 17 death rates. There are several ways to introduce age dependence to the deathrate: one could assume a speciﬁc functional form (e.g., piecewise-constant, linear,exponential), that the function belongs to a particular family (e.g., polynomi-als, splines), or that the function is completely arbitrary with some conditionson smoothness. In this section, we present identiﬁability results for four cases: (1)piecewise-constant death rate functions for all three compartments, (2) a single ex-ponential death rate function for all compartments, (3) a single polynomial deathrate function for all compartments, and (4) a single arbitrary function for all com-partments. For simplicity, we will consider the SEI model without immigration. We ﬁrst consider the case that the death rates are piecewise-constant functionsof age with discontinuities at ages { a , a , a , ...a n = A } , i.e. µ j ( a ) = µ ij for a ∈ ( a i , a i +1 ) and j = S, E, I . This represents the case where death rates areknown independently for diﬀerent age groups. Then the system (6) becomes asequence of PDE systems obeying the equations ∂S∂t + ∂S∂a = − βS IN − µ iS S∂E∂t + ∂E∂a = (1 − (cid:15) ) βS IN − δE − µ iE E∂I∂t + ∂I∂a = (cid:15)βS IN + δE − µ iI I for t > a ∈ ( a i , a i +1 ), 0 ≤ i < n , with boundary conditions S ( t,

0) = c , E ( t,

0) = I ( t,

0) = 0 and S ( t, a i ) = lim a → a − i S ( t, a ) E ( t, a i ) = lim a → a − i E ( t, a ) I ( t, a i ) = lim a → a − i I ( t, a )for 0 < i < n . We will assume that the locations of the discontinuities a , ..., a n are known, and that the sub-intervals are wide enough that the usual structuralidentiﬁability results hold.Since the model equations remain essentially unchanged, structural identiﬁa-bility of the model equations has already been established in Section 5.2. Thus,from the model equations we obtain that { β, δ, (cid:15), µ iS , µ iE , µ iI , k E k I } are identiﬁable for all i . From the boundary equations at a = 0 we again getthat ck I is identiﬁable, as in the constant parameter case. We gain no additionalinformation about the parameter values from the boundary conditions at a = a i for i > One may also wish to use a continuous function to represent age-dependent deathrates. One possibility is to use an exponential function µ ( a ) = µ exp( κa ) so thatdeath rate increases exponentially with age. For simplicity, we assume that deathrates are the same for all three compartments. Then the model system can bewritten as ∂S∂t + ∂S∂a = − βS IN − µS∂E∂t + ∂E∂a = (1 − (cid:15) ) βS IN − δE − µE∂I∂t + ∂I∂a = (cid:15)βS IN + δE − µIdµda = κµ for t > a ∈ [0 , ∞ ), with boundary conditions S ( t,

0) = c , E ( t,

0) = I ( t,

0) =0, and µ (0) = µ . We write the system in this way rather than directly inserting µ ( a ) = µ exp( κa ) so that we maintain a diﬀerential polynomial form.To obtain an input-output equation from the model equations, we use the I equation to solve for S and the E equation to solve for µ ( a ) (note that these choicesare somewhat arbitrary). Then by noting that I = y I /k I and E = y E /k E andsubstituting into the S and µ equations, we obtain rational equations in terms ofthe outputs and their derivatives. Thus, by multiplying by a common denominator,we obtain diﬀerential polynomials. We ﬁnd that (cid:110) β, δ, (cid:15), κ, k E k I (cid:111) are the identiﬁableparameters and parameter combinations.By performing similar analysis on the boundary equations, we ﬁnd that µ and ck I are identiﬁable. Thus, the full set of identiﬁable parameters and combinationsfor this version of the model is (cid:26) β, δ, (cid:15), κ, µ , ck I , k E k I (cid:27) . This shows that both parameters from the exponential death rate are identiﬁable.The identiﬁability of the remaining parameters is consistent with the results forthe ODE model in Section 5.1.

Another option for a continuous death rate is to use a polynomial, µ ( a ) = (cid:80) ni =0 κ i a i .For example, a polynomial may be ﬁt to the available data for death rate by agegroup. Then the model equations become ∂S∂t + ∂S∂a = − βS IN − (cid:32) N (cid:88) i =0 κ i a i (cid:33) S ∂E∂t + ∂E∂a = (1 − (cid:15) ) βS IN − δE − (cid:32) N (cid:88) i =0 κ i a i (cid:33) E∂I∂t + ∂I∂a = (cid:15)βS IN + δE − (cid:32) N (cid:88) i =0 κ i a i (cid:33) I tructural identiﬁability analysis of PDEs 19 for t > a ∈ [0 , ∞ ), with boundary conditions S ( t,

0) = c and E ( t,

0) = I ( t,

0) = 0. The age a is considered to be observable. By using the I equation tosolve for S and substituting into the remaining equations, we obtain a set of input-output equations that are diﬀerential polynomials. We ﬁnd that the correspondingmonic polynomial contains the monomial terms κ i a i y I for all i = 0 , ..., n . Thus,all coeﬃcient parameters κ i in µ ( a ) are identiﬁable and µ ( a ) can be uniquelyrecovered. We also ﬁnd that (cid:110) β, δ, (cid:15), k E k I (cid:111) are identiﬁable from the model equationsand ck I is identiﬁable from the boundary equations. The ﬁnal and most general option we consider is to allow µ ( a ) to be an arbitraryfunction of a , i.e. the model equations become ∂S∂t + ∂S∂a = − βS IN − µ ( a ) S∂E∂t + ∂E∂a = (1 − (cid:15) ) βS IN − δE − µ ( a ) E∂I∂t + ∂I∂a = (cid:15)βS IN + δE − µ ( a ) I for t > a ∈ [0 , ∞ ). In this case, we use the I equation to solve for S and then use the E equation to solve for µ ( a ). We obtain the following expression for µ ( a ) in terms of the outputs and parameters: µ ( a ) = − k E ( (cid:15) − y I + k I (cid:15)y E (cid:16) δk I y E + k I (cid:15)y (1 , E + k I (cid:15)y (0 , E + k E ( (cid:15) − y (1 , I + k E ( (cid:15) − y (0 , I (cid:17) = − (cid:15) − y I + k I k E (cid:15)y E (cid:18) δ k I k E y E + k I k E (cid:15)y (1 , E + k I k E (cid:15)y (0 , E + ( (cid:15) − y (1 , I + ( (cid:15) − y (0 , I (cid:19) (9)By substituting (9) into the remaining equation for S and performing the iden-tiﬁability analysis, we again ﬁnd that (cid:110) β, δ, (cid:15), k I k E (cid:111) are identiﬁable. We note thatthe second derivatives of y E and y I appear in the input-output equations, andhence µ ( a ) must be at least twice diﬀerentiable. We then ﬁnd that the right handside of (9) consists entirely of observable quantities and identiﬁable parametercombinations; hence µ ( a ) is identiﬁable. Note that while we use the term “identiﬁ-able” here, what we actually mean is that µ ( a ) can be determined from observablequantities and identiﬁable parameter combinations. Further, we obtain from theboundary equations that ck I is identiﬁable.Identiﬁability results for all cases of the ODE and age-structured PDE model systems are summarized in Table 1. We note that in the cases where the same deathrate is used for all compartments, the death rate function is always identiﬁable.In cases where the compartments have diﬀerent death rates, the rates for thesusceptible and infected compartments are identiﬁable. These results hold underthe assumption that a fraction of the total prevalence and a fraction of all exposures can be observed at all times and ages. It has been shown previously that otheroutput functions may lead to non-identiﬁability of death rates even in simplepopulation models (Perasso and Razaﬁson, 2016).5.5 Practical identiﬁabilityThe above structural analysis determines identiﬁable parameters and combinationsunder the assumption that data are noiseless and are available for arbitrarilymany time and age points. In reality, epidemiological data are often noisy andare collected only at discrete time points and for discrete age groups. Thus, itis also necessary to perform practical identiﬁability analysis to determine whichparameters and parameter combinations are still identiﬁable given more realisticdata. The practical identiﬁability of model parameters given realistic data maydiﬀer from their structural identiﬁability.As a proof of concept, we explore the practical identiﬁability of the constant-parameter model from Section 5.2 using proﬁle likelihoods. We assume that dataare collected monthly for 5-year age groups from age 0 to 100 over a span of 20 Case Identiﬁable parameters and combinations

Density-dependent ODE(Section 4.1) (cid:26) µ S , µ I , βk I , βc(cid:15), δk I k E , δ ( (cid:15) − (cid:15) , µ E + δ (cid:27) Density-dependent PDE with constant parameters(Section 4.2) (cid:26) µ S , µ I , βk I , βc(cid:15), δk I k E , δ ( (cid:15) − (cid:15) , µ E + δ (cid:27) Frequency-dependent ODE(Section 5.1) (cid:26) β, δ, (cid:15), µ S , µ E , µ I , ck I , k E k I (cid:27) Frequency-dependent PDE with constant parameters(Section 5.2) (cid:26) β, δ, (cid:15), µ S , µ E , µ I , ck I , k E k I (cid:27) Frequency-dependent ODE with immigration(Section 5.3) (cid:26) β, δ, (cid:15), µ S , µ E , µ I , ( c + m ) k I , k E k I (cid:27) Frequency-dependent PDE with immigration(Section 5.3) (cid:26) β, δ, (cid:15), µ S , µ E , µ I , ck I , mk I , k E k I (cid:27) Frequency-dependent PDE with piecewise-constant death rates(Section 5.4.1) (cid:26) β, δ, (cid:15), ck I , k E k I (cid:27) ∪ (cid:32) n − (cid:91) i =0 { µ iS , µ iI , µ iE } (cid:33) Frequency-dependent PDE with exponential death rate(Section 5.4.2) (cid:26) β, δ, (cid:15), κ, µ , ck I , k E k I (cid:27) Frequency-dependent PDE with polynomial death rate(Section 5.4.3) (cid:26) β, δ, (cid:15), ck I , k E k I (cid:27) ∪ (cid:32) n (cid:91) i =0 { κ i } (cid:33) Frequency-dependent PDE with arbitrary death rate function(Section 5.4.4) (cid:26) β, δ, (cid:15), ck I , k E k I , µ ( a ) (cid:27) Table 1

Summary of identiﬁable parameters and combinations for the ODE and age-structured PDE systems where the output functions are y E = k E E and y I = k I I .tructural identiﬁability analysis of PDEs 21 years. We synthetically generate data by choosing a baseline parameter set, sim-ulating the model, and adding noise that is normally distributed with a standarddeviation of 5%. Using this synthetic data, we perform parameter estimation usingthe Matlab function fminsearch , and evaluate practical identiﬁability of the modelparameters as detailed in Section 2.4.We compute the Fisher Information Matrix (FIM) using the outputs y E and y I at the above-speciﬁed ages and time points. We ﬁnd that for a wide variety ofparameter sets within reasonable ranges, the rank of the FIM is eight, implyingthat there should be eight identiﬁable parameters and parameter combinations.This is consistent with our structural identiﬁability analysis performed in Section5.2, which revealed a set of eight identiﬁable parameters and combinations. Fromthe structural analysis, we see that all parameters except c, k I , and k E are struc-turally identiﬁable. However, from proﬁle likelihoods we observe that the practicalidentiﬁability of these parameters does vary across the parameter space. Here,we demonstrate two baseline parameter sets that lead to qualitatively diﬀerentidentiﬁability proﬁles (Figure 3). For both parameter sets, c, k E , and k I are notpractically identiﬁable.Using the parameter set P = { c = 100 , β = 1 , (cid:15) = 0 . , δ = 0 . , µ S =0 . , µ E = 0 . , µ I = 0 . , k E = 0 . , k I = 0 . } , we ﬁnd that certain struc-turally identiﬁable parameters are only practically identiﬁable on one side, that is,we can establish a reasonable lower or upper bound but not both. While the proﬁlelikelihoods do have a unique minimum, the proﬁles are ﬂat and conﬁdence intervalsfor these parameters are thus wide. In particular, the transmission rate β cannotbe reasonably bounded from above, and (cid:15), δ, µ S , and µ E cannot be distinguishedfrom zero.Using a slightly diﬀerent parameter set P = { c = 100 , β = 1 , (cid:15) = 0 . , δ =0 . , µ S = 0 . , µ E = 0 . , µ I = 0 . , k E = 0 . , k I = 1 } , in which report-ing rates and death among the exposed population are increased, we ﬁnd thatall structurally identiﬁable parameters are also practically identiﬁable. While thisparameter set is not necessarily biologically realistic (since µ E > µ I , for exam-ple), this demonstrates that there are regions in the parameter space where theseparameters can be well estimated. Creating mathematical and computational models can greatly impact the biologi-cal community through predictions that can be used and tested. Parameterizationof models is a key factor in connecting them to the biology, and thus identifyingparameters accurately is an important step of model development. Parameters aretypically estimated from data; however, data may be sparse and noisy, and evenperfect data does not guarantee unique estimability of parameters. Understandingif model parameters are identiﬁable lends credibility to the model system. If pa-rameters are not identiﬁable, then the applicability of those models becomes more theoretical. While this may add value to understanding, it limits the potential forpredictions to be translated in a meaningful way toward impacting the biologi-cal system. To this end, we focus on parameter identiﬁability in both ODE andPDE models. While there is a solid literature on structural identiﬁability analysisfor ODE models, there is not a uniﬁed framework for studying identiﬁability in AB Fig. 3

Practical identiﬁability of the constant-parameter model from Section 5.2. Costfunctions (black lines) based on likelihoods are shown for two sets of parameter values, P = { c = 100 , β = 1 , (cid:15) = 0 . , δ = 0 . , µ S = 0 . , µ E = 0 . , µ I = 0 . , k E = 0 . , k I = 0 . } (panel A) and P = { c = 100 , β = 1 , (cid:15) = 0 . , δ = 0 . , µ S = 0 . , µ E = 0 . , µ I =0 . , k E = 0 . , k I = 1 } (panel B). Red dashed lines denote 95% conﬁdence thresholds, andred dots denote local minima (corresponding to maximum likelihoods). See Section 2.4 formore details of the analysis. PDEs. Here, we have presented a methodology for performing structural identi-ﬁability analysis on age-structured epidemic models using a diﬀerential algebraframework. We have applied this methodology to explore the structural identi-ﬁability of a simple age-structured Susceptible-Exposed-Infected (SEI) epidemicmodel under various assumptions.We found that when the age-structured model uses constant parameters anddoes not include immigration, then the identiﬁability results are identical to those for the corresponding ODE model. In the presence of immigration, however, dif-ferences in identiﬁability arise: the rates for birth and immigration appear only asa sum in the identiﬁable combinations of the ODE model, while they appear sep-arately in those for the age-structured PDE model formulation. Further, we foundthat age-dependent death parameters in the PDE model are able to be uniquely tructural identiﬁability analysis of PDEs 23 recovered in at least the following three cases: piecewise-constant death rates, asingle exponential death rate for all compartments, and a single polynomial deathrate for all compartments.These ﬁndings imply that identiﬁability results for corresponding ODE andPDE models are likely equivalent when parameters are constant, but that addi-tional information can be gained from PDE models when parameters vary withthe non-temporal variable(s). Rigorous derivation of necessary and suﬃcient con-ditions for diﬀerent models to have identical identiﬁability results remains to bedone, however based on our results herein, we conjecture that identiﬁability forthe ODE and PDE forms of an age-structured model should be equivalent whenthere are not age-dependent parameters present in the model. Additionally, futurework to investigate how discretization of the PDE (e.g. converting age dynamicsinto linear ‘chains’ of compartments for age groups (Hurtado and Kirosingh, 2019;Hurtado and Richards, 2020)) impacts identiﬁability of the model would be a use-ful direction given the common use of compartmental representations of aging inbiological models.The framework presented here could be extended to other PDE models, suchas reaction-diﬀusion and advection-diﬀusion equations, in a variety of applicationareas. In particular, the assumption used here that age-interactions are only local(i.e. individuals interact with only people their own age) may be reasonable forreaction-diﬀusion models. However, since this methodology relies on the ability toexpress the input-output equations as diﬀerential polynomials, there is a limit tothe amount of complexity that it can handle. Thus, highly nonlinear models and/oroutput functions may require the use of other tools such as numerical methods.One such method is the FIM, which we have demonstrated in the context of anage-structured model with preferential mixing informed by real data.A signiﬁcant limitation of structural identiﬁability analysis is that it assumesthe availability of perfect data. In practice, data may contain noise and bias, andmay be sampled only at discrete time points. Thus, practical identiﬁability anal-ysis must also be performed to gain a complete understanding of the parameteridentiﬁability of a model given the available data. As demonstrated in our exam-ples, practical identiﬁability of a model may vary not only with the quality of thedata but also with the corresponding values of model parameters. Practical iden-tiﬁability thus needs to be handled on a case-by-case basis for models and theircorresponding parameter sets. Taken together, practical and structural identiﬁa-bility provide rigor to the modeling process, adding to the applicability of modelpredictions.

Acknowledgements

This research was supported by NIH grants R01AI123093 and U01HL131072awarded to DK and NSF DMS Grant 1853032 and NIH MIDAS Grant U01GM110712 awarded to ME.

References

Ajelli M, Gon ˜A § alves B, Balcan D, Colizza V, Hu H, Ramasco JJ, Merler S,Vespignani A (2010) Comparing large-scale computational approaches to epi-demic modeling: Agent-based versus structured metapopulation models. BMCInfectious Diseases 10(1):190, DOI 10.1186/1471-2334-10-190Audoly S, Bellu G, D’Angio L, Saccomani MP, Cobelli C (2001) Global iden-tiﬁability of nonlinear models of biological systems. IEEE Trans Biomed Eng48(1):55–65Bertuzzo E, Casagrandi R, Gatto M, Rodriguez-Iturbe I, Rinaldo A (2009) Onspatially explicit models of cholera epidemics. Journal of the Royal Society In-terface 7(43):321–333, DOI 10.1098/rsif.2009.0204Bian L (2004) A conceptual framework for an individual-based spatially explicitepidemiological model. Environment and Planning B: Planning and Design31(3):381–395, DOI 10.1068/b2833Castillo-Chavez C, Feng Z (1998) Global stability of an age-structure model for TBand its applications to optimal vaccination strategies. Mathematical Biosciences151(2):135–154, DOI 10.1016/S0025-5564(98)10016-0Castillo-Chavez C, Hethcote HW, Andreasen V, Levin SA, Liu WM (1991)Epidemiological models with age structure, proportionate mixing, and cross-immunity. Journal of Mathematical Biology 27(3):233–258, DOI 10.1007/BF00275810Chappell MJ, Gunn RN (1998) A procedure for generating locally identiﬁ-able reparameterisations of unidentiﬁable non-linear systems by the similar-ity transformation approach. Mathematical Biosciences 148(1):21–41, DOIhttps://doi.org/10.1016/S0025-5564(97)10004-9Chappell MJ, Godfrey KR, Vajda S (1990) Global identiﬁability of the parametersof nonlinear systems with speciﬁed inputs: A comparison of methods. Mathe-matical Biosciences 102(1):41–73Chis OT, Banga JR, Balsa-Canto E (2011) Structural identiﬁability of systemsbiology models: a critical comparison of methods. PloS one 6(11):e27755Chow L, Fan M, Feng Z (2011) Dynamics of a multigroup epidemiologicalmodel with group-targeted vaccination strategies. Journal of Theoretical Bi-ology 291:56–64, DOI 10.1016/j.jtbi.2011.09.020Cobelli C, DiStefano JJ (1980) Parameter and structural identiﬁability conceptsand ambiguities: a critical review and analysis. American Journal of Physiology- Regulatory, Integrative and Comparative Physiology 239(1):R7–R24Cole D, Morgan B, Titterington D (2010) Determining the parametric structure ofmodels. Mathematical Biosciences 228(1):16–30, DOI https://doi.org/10.1016/j.mbs.2010.08.004Eisenberg M (2019) Input-output equivalence and identiﬁability: some simple gen-eralizations of the diﬀerential algebra approach. arXivEisenberg MC, Hayashi MA (2014) Determining identiﬁable parameter combina-tions using subset proﬁling. Mathematical biosciences 256:116–126 Eisenberg MC, Robertson SL, Tien JH (2013) Identiﬁability and estimation ofmultiple transmission pathways in cholera and waterborne disease. Journal ofTheoretical Biology 324:84–102, DOI 10.1016/j.jtbi.2012.12.021Evans ND, White LJ, Chapman MJ, Godfrey KR, Chappell MJ (2005) The struc-tural identiﬁability of the susceptible infected recovered model with seasonal tructural identiﬁability analysis of PDEs 25 forcing. Mathematical Biosciences 194:175–197, DOI 10.1016/j.mbs.2004.10.011Feng Z, Hill AN, Smith PJ, Glasser JW (2015) An elaboration of theory aboutpreventing outbreaks in homogeneous populations to include heterogeneity orpreferential mixing. Journal of Theoretical Biology 386:177–187, DOI 10.1016/j.jtbi.2015.09.006Ferguson NM, Nokes DJ, Anderson RM (1996) Dynamical complexity in age-structured models of the transmission of the measles virus: Epidemiological im-plications at high levels of vaccine uptake. Mathematical Biosciences 138(2):101–130, DOI 10.1016/S0025-5564(96)00127-7Gardner ID (1980) The eﬀect of aging on susceptibility to infection. Reviews ofInfectious Diseases 2(5):801–810, DOI 10.1016/j.mbs.2011.10.001Glasser J, Feng Z, Moylan A, Valle SD, Castillo-Chavez C (2012) Mixing in age-structured population models of infectious diseases. Mathematical Biosciences235:1–7, DOI 10.1016/j.mbs.2011.10.001Glasser J, Feng Z, Omer SB, Smith PJ, Rodewald LE (2016) The eﬀect of hetero-geneity in uptake of the measles, mumps, and rubella vaccine on the potential foroutbreaks of measles: a modelling study. Lancet Infectious Diseases 16(5):599–605, DOI 10.1016/S1473-3099(16)00004-9Guzzetta G, Ajelli M, Yang Z, Merler S, Furlanello C, Kirschner D (2011) Mod-eling socio-demography to capture tuberculosis transmission dynamics in a lowburden setting. Journal of Theoretical Biology 289(1):197–205Hao L, Glasser JW, Su Q, Ma C, Feng Z, Yin Z, Goodson JL, Wen N, Fan C,Yang H, Rodewald LE, Feng Z, Wang H (2019) Evaluating vaccination policiesto accelerate measles elimination in China: a meta-population modelling study.International Journal of Epidemiology dyz058, DOI 10.1093/ije/dyz058Harris RC, Sumner T, Knight GM, Evans T, Cardenas V, Chen C, White RG(2019) Age-targeted tuberculosis vaccination in China and implications for vac-cine development: a modelling study. Lancet Global Health 7:E209–E218, DOI10.1016/S2214-109X(18)30452-2Hong H, Ovchinnikov A, Pogudin G, Yap C (2020) Global identiﬁability of diﬀer-ential models. Communications on Pure and Applied Mathematics 73(9):1831–1879Hurtado PJ, Kirosingh AS (2019) Generalizations of the ‘linear chain trick’: in-corporating more ﬂexible dwell time distributions into mean ﬁeld ode models.Journal of mathematical biology 79(5):1831–1883Hurtado PJ, Richards C (2020) Time is of the essence: Incorporating phase-type distributed delays and dwell times into ode models. arXiv preprintarXiv:200801318Inaba H, Sekine H (2004) A mathematical model for Chagas disease with infection-age-dependent infectivity. Mathematical Biosciences 1:39–69, DOI 10.1016/j.mbs.2004.02.004Jacquez JA, Greif P (1985) Numerical parameter identiﬁability and estimability:Integrating identiﬁability, estimability, and optimal sampling design. Mathemat-ical Biosciences 77(1-2):201–227

Jacquez JA, Perry T (1990) Parameter estimation: local identiﬁability of param-eters. Am J Physiol 258(4 Pt 1):E727–36Kao YH, Eisenberg MC (2018) Practical unidentiﬁability of a simple vector-bornedisease model: Implications for parameter estimation and intervention assess-ment. Epidemics 25:89–100, DOI 10.1016/j.epidem.2018.05.010

Keeling MJ, White PJ (2011) Targeting vaccination against novel infections: risk,age and spatial structure for pandemic inﬂuenza in Great Britain. Journal ofthe Royal Society Interface 8(58):661–670, DOI 10.1098/rsif.2010.0474Ljung L, Glad T (1994) On global identiﬁability for arbitrary model parameteri-zation. Automatica 30(2):265–276LoBue PA, Mermin JH (2017) Latent tuberculosis infection: the ﬁnal frontier oftuberculosis elimination in the USA. The Lancet Infectious Diseases 17(10):e327– e333, DOI https://doi.org/10.1016/S1473-3099(17)30248-7Miao H, Xia X, Perelson A, Wu H (2011) On identiﬁability of nonlinear ode modelsand applications in viral dynamics. SIAM Review 53(1):3–39, DOI 10.1137/090757009, URL http://epubs.siam.org/doi/abs/10.1137/090757009 , http://epubs.siam.org/doi/pdf/10.1137/090757009 Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, Massari M,Salmaso S, Tomba GS, Wallinga J, Heijne J, Sadkowska-Todys M, Rosinska M,Edmunds WJ (2008) Social contacts and mixing patterns relevant to the spreadof infectious diseases. PLoS Medicine 5(3):e74Ozcaglar C, Shabbeer A, Vandenberg SL, Yener B, Bennett KP (2012) Epidemio-logical models of mycobacterium tuberculosis complex infections. MathematicalBiosciences 236(2):77–96, DOI 10.1016/j.mbs.2012.02.003Perasso A, Razaﬁson U (2016) Identiﬁability problem for recovering the mortal-ity rate in an age-structured population dynamics model. Inverse Problems inScience and Engineering 24(4):711–728, DOI 10.1080/17415977.2015.1061522Perasso A, Laroche B, Chitour Y, Touzeau S (2011) Identiﬁability analysis ofan epidemiological model in a structured population. Journal of MathematicalAnalysis and Applications 374:154–165, DOI 10.1016/j.jmaa.2010.08.072Pia Saccomani M, Audoly S, D’Angio L (2003) Parameter identiﬁability of non-linear systems: the role of initial conditions. Automatica 39(4):619–632Pohjanpalo H (1978) System identiﬁability based on the power series expansion ofthe solution. Mathematical Biosciences 41(1-2):21–33Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmuller U, TimmerJ (2009) Structural and practical identiﬁability analysis of partially observed dy-namical models by exploiting the proﬁle likelihood. Bioinformatics 25(15):1923–9Renardy M, Kirschner D (2019) Evaluating vaccination strategies for tuberculosisin endemic and non-endemic settings. Journal of Theoretical Biology 469:1–11Renardy M, Kirschner D (2020) A framework for network-based epidemiologicalmodeling of tuberculosis dynamics using synthetic datasets. Bulletin of Mathe-matical Biology 82(6):78Ritt JF (1950) Diﬀerential algebra, vol 33. American Mathematical Soc.Shim E, Feng Z, Martcheva M, Castillo-Chavez C (2006) An age-structured epi-demic model of rotavirus with vaccination. Journal of Mathematical Biology53(4):719–746, DOI 10.1007/s00285-006-0023-0Thieme HR, Castillo-Chavez C (1993) How may infection-age-dependent infectiv-ity aﬀect the dynamics of HIV/AIDS? SIAM Journal of Applied Mathematics tructural identiﬁability analysis of PDEs 27 application to Rift Valley fever. Bulletin of Mathematical Biology 78:1796–1827,DOI 10.1007/s11538-016-0200-2Valle SYD, Hyman JM, Hethcote HW, Eubank SG (2007) Mixing patterns betweenage groups in social networks. Social Networks 29(4):539–554Venzon DJ, Moolgavkar SH (1988) A method for computing proﬁle-likelihood-based conﬁdence intervals. Journal of the Royal Statistical Soci-ety: Series C (Applied Statistics) 37(1):87–94, DOI https://doi.org/10.2307/2347496, URL https://rss.onlinelibrary.wiley.com/doi/abs/10.2307/2347496 , https://rss.onlinelibrary.wiley.com/doi/pdf/10.2307/2347496https://rss.onlinelibrary.wiley.com/doi/pdf/10.2307/2347496