[PDF] Parameter estimation in nonlinear mixed effect models based on ordinary differential equations: an optimal control approach

Abstract

We present a parameter estimation method for nonlinear mixed effect models based on ordinary differential equations (NLME-ODEs). The method presented here aims at regularizing the estimation problem in presence of model misspecifications, practical identifiability issues and unknown initial conditions. For doing so, we define our estimator as the minimizer of a cost function which incorporates a possible gap between the assumed model at the population level and the specific individual dynamic. The cost function computation leads to formulate and solve optimal control problems at the subject level. This control theory approach allows to bypass the need to know or estimate initial conditions for each subject and it regularizes the estimation problem in presence of poorly identifiable parameters. Comparing to maximum likelihood, we show on simulation examples that our method improves estimation accuracy in possibly partially observed systems with unknown initial conditions or poorly identifiable parameters with or without model error. We conclude this work with a real application on antibody concentration data after vaccination against Ebola virus coming from phase 1 trials. We use the estimated model discrepancy at the subject level to analyze the presence of model misspecification.

Full PDF

PParameter estimation in nonlinear mixed effect models basedon ordinary differential equations: an optimal control ap-proach

Quentin Clairon

University of Bordeaux, Inria Bordeaux Sud-Ouest, Inserm, Bordeaux Population HealthResearch Center, SISTM Team, UMR1219, F-33000 Bordeaux, FranceVaccine Research Institute, F-94000 Cr ´eteil, France

E-mail: [email protected] ´e Pasin

Institute of Medical Virology, University of Zurich, Zurich, Switzerland.Department of Infectious Diseases and Hospital Epidemiology, University Hospital, Zurich,Switzerland.

Irene Balelli

Universit ´e C ˆote d’Azur, INRIA Sophia Antipolis, EPIONE Research Project, Valbonne,France.

Rodolphe Thi ´ebaut

University of Bordeaux, Inria Bordeaux Sud-Ouest, Inserm, Bordeaux Population HealthResearch Center, SISTM Team, UMR1219, F-33000 Bordeaux, FranceCHU Bordeaux, F-33000 Bordeaux, FranceVaccine Research Institute, F-94000 Cr ´eteil, France

M ´elanie Prague

Summary . We present a parameter estimation method for nonlinear mixed effect modelsbased on ordinary differential equations (NLME-ODEs). The method presented here aimsat regularizing the estimation problem in presence of model misspeciﬁcations, practicalidentiﬁability issues and unknown initial conditions. For doing so, we deﬁne our estima-tor as the minimizer of a cost function which incorporates a possible gap between theassumed model at the population level and the speciﬁc individual dynamic. The costfunction computation leads to formulate and solve optimal control problems at the subjectlevel. This control theory approach allows to bypass the need to know or estimate initialconditions for each subject and it regularizes the estimation problem in presence of poorlyidentiﬁable parameters. Comparing to maximum likelihood, we show on simulation exam-ples that our method improves estimation accuracy in possibly partially observed systemswith unknown initial conditions or poorly identiﬁable parameters with or without model er-ror. We conclude this work with a real application on antibody concentration data aftervaccination against Ebola virus coming from phase 1 trials. We use the estimated modeldiscrepancy at the subject level to analyze the presence of model misspeciﬁcation.

1. Introduction

ODE models are standard in population dynamics, epidemiology, virology, pharmacoki-netics, or genetic regulation networks analysis due to their ability to describe the mainmechanisms of interaction between diﬀerent biological components of complex systems,their evolution in time and to provide reasonable approximations of stochastic dynamicsPerelson et al. (1996); Lavielle and Mentr´e (2007); Wakeﬁeld and Racine-Poon (1995);Andraud et al. (2012); Pasin et al. (2019); M. Lavielle and Mentre (2011); Le et al.(2015); Engl et al. (2009); Wu et al. (2014). Evidence of the relevance of ODEs residesfor example in their joint use with control theory methods for the purpose of optimaltreatment design Guo and Sun (2012); Agusto and Adekunle (2014); Zhang and Xu(2016); Pasin et al. (2018); Villain et al. (2019). In cases of experimental designs in-volving a large number of subjects and limited number of individual measurements,non-linear mixed-eﬀect models may be more relevant than subject-by-subject model togather information from the whole population while allowing between-individual vari-ability. For example, clinical trials and pharmacokinetics/pharmacodynamics studies arameter estimation in NLME-ODEs via optimal control theory often fall into this category M. Lavielle and Mentre (2011); Guedj et al. (2007); Huangand Lu (2008); Wang et al. (2014); Prague et al. (2013). Formally, we are interestedin a population where the dynamics of the compartments of each subject i ∈ (cid:74) , n (cid:75) ismodeled by the d -dimensional ODE:  ˙ x i ( t ) = f θ,b i ( t, x i ( t ) , z i ( t )) x i (0) = x i, (1.1)where f is a d − dimensional vector ﬁeld, θ is a p − dimensional parameter, b i ∼ N (0 , Ψ)is a q − dimensional random eﬀect where Ψ is a variance-covariance matrix, x i, is theinitial condition for subject i belonging to R d and z i is a covariate function. We denote X θ,b i ,x i, the solution of (1.1) for a given set ( θ, b i , x i, ).Our goal is to estimate the true population parameters ( θ ∗ , Ψ ∗ ) as well as the truesubject speciﬁc realizations { b ∗ i } i ∈ (cid:74) , n (cid:75) from partial and noisy observations coming from n subjects and described by the following observational model: y ij = CX θ ∗ ,b ∗ i ,x ∗ i, ( t ij ) + (cid:15) ij where t ij is the j -th measurement time-point for the i − th subject on the observationinterval [0 , T ]. Here C is a d o × d sized observation matrix emphasizing the potentiallypartially observed nature of the process and (cid:15) ij ∼ σ ∗ × N (0 , I d o ) is the measurementerror. We also assume only a subset of the true initial condition x ∗ i, , denoted x k ∗ i, , isknown, the other ones, denoted x u ∗ i, , being unknown. For the sake of clarity, we orderthe state variables as follows: x i, = (cid:18)(cid:16) x ui, (cid:17) T , (cid:16) x ki, (cid:17) T (cid:19) T . We denote n i the number ofobservations for the i -th subject, y i = { y ij } j ∈ (cid:74) , n i (cid:75) its corresponding set of observationsand y = { y i } i ∈ (cid:74) , n (cid:75) the set of all observations in the population.Our problem belongs to the class parameter estimation problem in nonlinear mixedeﬀect models. In this context, frequentist methods based on likelihood maximization(via diﬀerent numerical procedures: Laplace approximation Pinheiro and Bates (1994),Gaussian quadrature Pinheiro and Bates (1994); Lindstrom and Bates (1990); Guedjet al. (2007) or SAEM Kuhn and Lavielle (2005); Lavielle and Mentr´e (2007); Cometset al. (2017)) and Bayesian ones aiming to reconstruct the a posteriori distribution or toderive the maximum a posteriori estimator (via MCMC algorithms Lunn et al. (2000); M ´elanie Prague

Huang and Dagne (2011); Huang et al. (2010), importance sampling Raftery and Bao(2010), approximation of the asymptotic posterior distribution Prague et al. (2013)) havebeen proposed. In particular, dedicated methods/softwares using the structure of ODEmodels have been implemented to increase numerical stability and speed up convergencerate Tornoe et al. (2004), to reduce the computational time Donnet and Samson (2006) orto avoid the repeated model integration and estimation of initial conditions Wang et al.(2014). However, all the preceding methods face similar pitfalls due to speciﬁc featuresof population models based on ODEs (with the exception of Wang et al. (2014)):(a) They do not account for model misspeciﬁcation presence, a common feature in ODEmodels used in biology. Indeed, the ODE modeling process suﬀers from model in-adequacy, understood as the discrepancy between the mean model response andreal world process, and residual variability issues, that is subject speciﬁc stochasticperturbations or missed elements which disappear by averaging over the whole pop-ulation Kennedy and O’Hagan (2001). As examples of model inadequacy causes,one can think of ODE models used in epidemiology and virology which are de-rived by approximations where for instance, interactions are modeled by pairwiseproducts while higher order terms and/or the inﬂuence of unknown/unmeasuredexternal factors are neglected Stein et al. (2013). Regarding residual variability, letus remind that biological processes are often stochastic Bowsher and Swain (2012);Komorowski et al. (2013) and the justiﬁcation of deterministic modeling comesfrom the approximation of stochastic processes Kurtz (1978); Gillespie (2000);Kampen (1992). Moreover, in the context of population models, new sources ofmodel uncertainties emerge. Firstly, error measurement in covariates z i which isnot often considered leads to use a proxy function (cid:98) z i instead of z i Huang andDagne (2011). Secondly, the sequential nature of most inference methods leads toestimate { b ∗ i } i ∈ (cid:74) , n (cid:75) based on an approximation (cid:98) θ instead of the true populationparameter value θ ∗ . Thus, the structure of mixed-eﬀect models spread measure-ment uncertainty into the mechanistic model structure during the estimation. Itturns classical statistical uncertainties into model error causes. Estimation of θ ∗ ,Ψ ∗ and { b ∗ i } i ∈ (cid:74) , n (cid:75) has to be done with model misspeciﬁcation presence although arameter estimation in NLME-ODEs via optimal control theory it is known to dramatically impair the accuracy of methods which do not take intoaccount potential modeling error Brynjarsdottir and O’Hagan (2014); Kirk et al.(2016).(b) They have to estimate or make assumptions on x u ∗ i, values. In ODE models, theinitial conditions are generally nuisance parameters in the sense that knowing theirvalues does not bring answers to the scientiﬁc questions which motivate the modelconstruction but the estimation of the relevant parameters requires x ∗ i, inference aswell. For example partially observed compartmental models used in pharmacoki-netics/pharmacodynamics often involve unknown initial conditions which needs tobe inferred to estimate the transmission rates between compartments which are thetrue parameters of interest. Unknown initial conditions imply either: assumptionson their values M. Lavielle and Mentre (2011); Guedj et al. (2007); Thiebaut et al.(2014), another potential cause of model misspeciﬁcations, or the need to estimatethem Huang et al. (2006); Huang and Lu (2008) which increases the optimiza-tion problem dimension and degrades estimation accuracy due to covariance eﬀectbetween ( θ ∗ , Ψ ∗ ) and x u ∗ i, estimate.(c) They can face accuracy degradation when the inverse problem of parameter esti-mation is ill-posed Engl et al. (2009); Stuart (2010) due to practical identiﬁabilityissues. Ill-posedness in ODE models is often due to the geometry induced by themapping ( θ, b i , x i, ) (cid:55)−→ CX θ,b i ,x i, , where there can be a small number of relevantdirections of variation skewed from the original parameter axes Gutenkunst et al.(2007); Transtrum et al. (2011, 2015); Leary et al. (2015). This problem, calledsloppiness, often appears in ODE models used in biology Gutenkunst et al. (2007);Leary et al. (2015) and leads to an ill-conditioned Fisher Information Matrix. Formaximum likelihood estimators this is a cause for high variance due to the Cram´er-Rao bound. For Bayesian inference, it leads to a nearly singular asymptotic a pos-teriori distribution because of Bernstein–von Mises theorem (see Campbell (2007)for the computational induced problems). Despite this problem is in part mitigatedby the population approach which merges diﬀerent subjects for estimating ( θ ∗ , Ψ ∗ )and uses distribution of b i | Ψ as prior at the subject level Lavielle and Aarons

M ´elanie Prague (2015), estimation accuracy can beneﬁt from the use of regularization techniquesfor the inverse problem.These speciﬁc features of ODE-based population models limit the amount of informa-tion classic approaches can extract for estimation purposes from observations no mattertheir qualities or abundances. This advocates for the development of new estimationprocedures. Approximate methods Varah (1982); Ramsay et al. (2007); Clairon andBrunel (2018) have already proven to be useful for ODE models facing these issues withobservations coming from one subject. These approaches rely on an approximation ofthe solution of the original ODE (1.1) which is expected to have a smoother depen-dence with respect to the parameters and to relax the constraint imposed by the modelduring the estimation process. The interest of such approximations is twofold. Firstlythey produce estimators with a better conditioned variance matrix comparing to clas-sic likelihood based approaches and they reduce the eﬀect of model error on estimatoraccuracy. Secondly, some of these approximations bypass the need to estimate initialconditions Ramsay et al. (2007); Clairon (2020). In this work, we generalize one of theseapproaches to population models by developing a new estimation method speciﬁc toNLME-ODEs aiming to integrate such approximations to mitigate the eﬀect of modelmisspeciﬁcation and poorly identiﬁable parameter on estimation accuracy, while avoid-ing the need to estimate x u ∗ i, . We propose here a nested estimation procedure wherepopulation parameters ( θ ∗ , Ψ ∗ , σ ∗ ) are estimated through the maximization of an outercriterion. This requires in turn an estimator for the { b ∗ i } i ∈ (cid:74) , n (cid:75) obtained through therepeated optimization of inner criteria. We consider that the actual dynamic for eachsubject is described by a perturbed version of the ODE (1.1) where the added perturba-tion captures diﬀerent sources of errors at the subject level Brynjarsdottir and O’Hagan(2014); Tuo and Wu (2015). We control the magnitude of the acceptable perturbationsby deﬁning the inner criteria through a cost function balancing the two contrary objec-tives of ﬁdelity to the observations and to the original model: to this end, we introducea model discrepancy penalization term. The practical computation of the { b ∗ i } i ∈ (cid:74) , n (cid:75) estimators require to solve optimal control problems Clarke (2013); Kirk (1998); Sontag(1998) known as tracking problems. This is done using a method inspired by Cimen arameter estimation in NLME-ODEs via optimal control theory and Banks (2004b,a) based on pseudo-linear representation and Linear-Quadratic the-ory. In addition, our method does not need to estimate x u ∗ i, . Nevertheless, it can providean estimator of x u ∗ i, as a direct byproduct of structural parameters estimation with noadditional computational costs.In section 2, we present the estimation method and derive the inner and outer criteriaas well as an estimator of the asymptotic Variance-Covariance matrix for the estimatorsof ( θ ∗ , Ψ ∗ ). In section 3, we present the optimal control problems related to the diﬀerentcriteria as well as the algorithms used to solve them. In section 4, we compare ourapproach with classic maximum likelihood in simulations. We then proceed to the realdata analysis coming from clinical studies and a model of the antibody concentrationdynamics following immunization with an Ebola vaccine in East African participantsPasin et al. (2019). Section 6 concludes and discuss further applications and extensionsof the method.

2. Construction of the estimator: deﬁnition of the inner and outer criteria

From now on, we use the following Choleski decomposition σ Ψ − = (cid:52) T (cid:52) (or equiv-alently Ψ = σ (cid:0) (cid:52) T (cid:52) (cid:1) − ) and the parametrization ( θ, ∆ , σ ) instead of ( θ, Ψ , σ ). Thisparametrization will allows us to enforce positiveness and symmetry of Ψ and to derivean explicit estimator of σ given a value for ( θ, ∆). The norm (cid:107) . (cid:107) will denote the classicEuclidean one deﬁned by (cid:107) b (cid:107) = √ b T b. Similarly as in the Expectation-Maximization(EM) algorithm, we estimate the population and individual parameters via a nestedprocedure: • Estimation of (cid:98) b i := (cid:98) b i ( θ, ∆) for each subject i by minimization of the inner crite-rion g i , a modiﬁed version of the log joint-likelihood function of the data and therandom eﬀects. • Estimation of ( θ, ∆ , σ ) via the maximization of an outer criterion deﬁned as anapproximation of the proﬁled joint distribution of ( θ, ∆ , σ, b ) with respect to b anddenoted G ( θ, ∆ , σ ) . M ´elanie Prague

In this section, we describe the procedure used to estimate the q − dimensional randomeﬀects { b ∗ i } i ∈ (cid:74) , n (cid:75) for a given ( θ, ∆ , σ ) value. A straightforward approach would be tolook for the minimum of the log joint-likelihood function of the data and (cid:110) b i , x u ,i (cid:111) . However, we want to:(a) avoid estimation of unknown initial conditions,(b) allow for each subject an acceptable departure from the assumed model at thepopulation level to take into account possible model misspeciﬁcations.To solve the ﬁrst point, we deﬁne our estimator as the maximizer of the joint conditionallikelihood P ( y i , b i | x u ,i , θ, ∆ , σ ) proﬁled on the unknown initial condition. Since P ( y i , b i | x u ,i , θ, ∆ , σ ) = P ( y i | b i , x u ,i , θ, ∆ , σ ) P ( b i | θ, ∆ , σ )= (2 π ) − ( d o n i + q ) / σ − ( d o n i + q ) |(cid:52)| e − . (cid:16)(cid:80) j (cid:107) CX θ,bi,x ,i ( t ij ) − y ij (cid:107) + b Ti ( (cid:52) T (cid:52) ) b i (cid:17) /σ by using P ( y i | b i , θ, ∆ , σ ) = (cid:81) j P ( y ij | b i , θ, ∆ , σ ) = (cid:81) j (2 π ) − d o / σ − d o e − . (cid:107) CX θ,bi,x ,i ( t ij ) − y ij (cid:107) /σ , P ( b i | θ, ∆ , σ ) = (2 π ) − q/ | Ψ | − / e − . b Ti Ψ − b i and σ q | Ψ | − = |(cid:52)| , a straightforwardmixed-eﬀect estimator would be (cid:98) b i = arg min b i min x u ,i (cid:110)(cid:80) j (cid:13)(cid:13) CX θ,b i ,x ,i ( t ij ) − y ij (cid:13)(cid:13) + (cid:107) ∆ b i (cid:107) (cid:111) that is, the classic maximum likelihood criteria proﬁled on x u ,i . Concerning the secondpoint, we allow perturbations comparing to the original model, by assuming that thedynamic of each subject i follows a perturbed version of ODE (1.1):  ˙ x i ( t ) = f θ,b i ( t, x i ( t ) , z i ( t )) + Bu i ( t ) x i (0) = x i, (2.1)with the addition of the forcing term t (cid:55)→ Bu i ( t ) with B a d × d u matrix and u i afunction in L (cid:0) [0 , T ] , R d u (cid:1) . We denote X θ,b i ,x i, ,u i the solution of this new ODE (2.1).However, to ensure the possible perturbation remains small, we replace the data ﬁttingcriterion (cid:80) j (cid:13)(cid:13) CX θ,b i ,x ,i ( t ij ) − y ij (cid:13)(cid:13) by min u i C i ( b i , x i, , u i | θ, U ) where C i ( b i , x i, , u i | θ, U ) = (cid:80) j (cid:13)(cid:13) CX θ,b i ,x ,i ,u i ( t ij ) − y ij (cid:13)(cid:13) + (cid:107) u i (cid:107) U,L and (cid:107) u i (cid:107) U,L = (cid:82) T u i ( t ) T U u i ( t ) dt isthe weighted Euclidean norm. Therefore the magnitude of the allowed perturbations iscontrolled by a positive deﬁnite and symmetric weighting matrix U. Finally, we obtain: (cid:98) b i ( θ, ∆) := arg min b i g i ( b i | θ, ∆ , U ) (2.2) arameter estimation in NLME-ODEs via optimal control theory where: g i ( b i | θ, ∆ , U ) = min x u ,i (cid:26) min u i C i ( b i , x i, , u i | θ, U ) + (cid:107) ∆ b i (cid:107) (cid:27) . This requires to solve the inﬁnite dimensional optimization problem min u i C i ( b i , x i, , u i | θ, U ) in L (cid:0) [0 , T ] , R d u (cid:1) . This problem belongs to the ﬁeld of optimal control theory forwhich dedicated approaches have been developed to solve them Sontag (1998); Aliyu(2011); Clarke (2013). Here we use the same method as in Clairon (2020), which ensuresthe existence and uniqueness of the solution and provides a computationally eﬃcientway to ﬁnd it for linear ODEs. This method can be extended to non-linear ODEsthrough an iterative procedure where the original problem is replaced by a sequence ofproblems involving only linear ODEs. In addition, the methods from Clairon (2020)presents the advantage of formulating min u i C i ( b i , x i, , u i | θ, U ) as a quadratic form(or a sequence of quadratic forms) with respect to x u ,i . Thus, the computation ofmin x u ,i { min u i C i ( b i , x i, , u i | θ, U ) } does not add any computational complexity compar-ing to min u i C i ( b i , x i, , u i | θ, U ).The control corresponding to the solution of min x u ,i { min u i C i ( b i , x i, , u i | θ, U ) } isnamed optimal control and denoted u i,θ,b i . The corresponding solution of (2.1) is denoted X θ,b i and named optimal trajectory. In particular, X θ,b i and u i,θ,b i are respectively thesubject speciﬁc state variable and perturbation such that: g i ( b i | θ, ∆ , U ) = (cid:88) j (cid:13)(cid:13) CX θ,b i ( t ij ) − y ij (cid:13)(cid:13) + (cid:107) u i,θ,b i (cid:107) U,L + (cid:107) ∆ b i (cid:107) . (2.3)To incorporate possible model errors in the estimation process, e.g. due to subject spe-ciﬁc exogenous perturbations, X θ,b i is now assumed to be the subject speciﬁc regressionfunction, deﬁned as the state-variable which needs the smallest perturbation in order toget close to the observations. The numerical procedure to derive X θ,b i and g i is presentedin section 3. Remark 2.1.

The deﬁnition of the optimal control u i,θ,b i has an interpretation interms of Bayesian inference in an inﬁnite dimensional space. According to Dashti et al.(2013) (theorem 3.5 and Corollary 3.10), u i,θ,b i is a maximum a posteriori estimatorwhere the chosen prior measure is a centered Gaussian random ﬁeld with the covari-ance operator determined by U . This link can be fruitful to import tools coming from M ´elanie Prague deterministic control theory to solve statistical problem formalized in functional spaces.

We focus in this section on population parameter estimation. Classic approaches relyon maximum a posteriori distribution or the likelihood of the observations in whichthey get rid of the unknown subject speciﬁc parameters by taking the mean valueof P [ θ, ∆ , σ, b | y ] or P [ y | θ, ∆ , σ, b ], E b [ P [ θ, ∆ , σ, b | y ]] or E b [ P [ y | θ, ∆ , σ, b ]] respec-tively, as outer criteria. This generally requires the numerical approximation of integralsof possibly high dimensions (the same as b ), a source of approximation and computa-tional issues Pinheiro and Bates (1994). To avoid this, we consider the random eﬀectsas nuisance parameters and rely on a classic proﬁling approach for ( θ ∗ , (cid:52) ∗ ) estima-tion Murphy and der Vaart (2000). Instead of taking the mean, we rely on the max-imal value of the joint distribution with respect to b . We consider the cost functionmax b P [ θ, ∆ , σ, b | y ] (or equivalently max b ln P [ θ, ∆ , σ, b | y ]). Bayes formula gives us P [ θ, ∆ , σ, b | y ] ∝ P [ y | θ, ∆ , σ, b ] P [ θ, ∆ , σ, b ] . Since P [ θ, ∆ , σ, b ] = P [ b | θ, ∆ , σ ] P [ θ, ∆],we get P [ θ, ∆ , σ, b | y ] ∝ ( (cid:81) i P [ y i | θ, ∆ , σ, b i ] P [ b i | θ, ∆ , σ ]) P [ θ, ∆] by conditional in-dependence of subject by subject observations and subject speciﬁc parameters. It fol-lows that max b ln P [ θ, ∆ , σ, b | y ] ∝ (cid:80) i max b i (ln P [ y i | θ, ∆ , σ, b i ] + ln P [ b i | θ, ∆ , σ ]) +ln P [ θ, ∆] . From now on we will use the estimate (2.2) of the previous section to con-struct a suitable approximation of G (1) ( θ, ∆ , σ | y ) = (cid:88) i max b i (ln P [ y i | θ, ∆ , σ, b i ] + ln P [ b i | θ, ∆ , σ ]) + ln P [ θ, ∆]as our criteria to estimate population parameters. As said in the previous section, wedeﬁne the optimal trajectory X θ,b i as the regression function for each subject. Therefore,we approximate P [ y i | θ, ∆ , σ, b i ] by (cid:101) P [ y i | θ, ∆ , σ, b i ] (cid:39) (cid:81) j (2 π ) − d o / σ − d o e − . (cid:107) CX θ,bi ( t ij ) − y ij (cid:107) /σ . By using the previous section computations, we get arg max b i (cid:16) ln (cid:101) P [ y i | θ, ∆ , σ, b i ] + ln P [ b i | θ, ∆ , σ ] (cid:17) =arg max b i (cid:16)(cid:80) j (cid:13)(cid:13) CX θ,b i ( t ij ) − y ij (cid:13)(cid:13) + (cid:107) ∆ b i (cid:107) (cid:17) . We regularize this estimation problemby approximating it via the addition of the Tikhonov penalization term on perturba-tion magnitude (cid:107) u i,θ,b i (cid:107) U,L , thus arg max b i (cid:16) ln (cid:101) P [ y i | θ, ∆ , σ, b i ] + ln P [ b i | θ, ∆ , σ ] (cid:17) (cid:39) arameter estimation in NLME-ODEs via optimal control theory arg max b i g i ( b i | θ, ∆ , U ) = (cid:98) b i ( θ, ∆) by using deﬁnition (2.3). Also, we use G (2) [ θ, ∆ , σ | y ] = (cid:88) i (cid:16) ln (cid:101) P (cid:104) y i | θ, ∆ , σ, (cid:98) b i ( θ, ∆) (cid:105) + ln P (cid:104)(cid:98) b i ( θ, ∆) | θ, ∆ , σ (cid:105)(cid:17) +ln P [ θ, ∆]as an approximation of G (1) . By replacing (cid:101) P [ y i | θ, ∆ , σ, b i ] and P [ b i | θ, ∆ , σ ] by theirvalues, we notice that arg max ( θ, ∆) (cid:110) G (2) [ θ, ∆ , σ | y ] (cid:111) = arg max ( θ, ∆) (cid:110) G (3) [ θ, ∆ , σ | y ] (cid:111) for every σ > G (3) [ θ, ∆ , σ | y ] = − σ (cid:80) i (cid:18)(cid:80) j (cid:13)(cid:13)(cid:13) CX θ, (cid:98) b i ( θ, ∆) ( t ij ) − y ij (cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13) (cid:52) (cid:98) b i ( θ, ∆) (cid:13)(cid:13)(cid:13) (cid:19) − . d o (cid:80) i n i + qn ) ln (cid:0) σ (cid:1) + 0 . n ln (cid:0)(cid:12)(cid:12) (cid:52) T (cid:52) (cid:12)(cid:12)(cid:1) + ln P [ θ, ∆] . Moreover, for each ( θ, ∆), the maximizer in σ of G (3) has a closed form expression: σ ( θ, ∆) = 1( d o (cid:80) i n i + qn ) (cid:88) i (cid:88) j (cid:13)(cid:13)(cid:13) CX θ, (cid:98) b i ( θ, ∆) ( t ij ) − y ij (cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13) (cid:52) (cid:98) b i ( θ, ∆) (cid:13)(cid:13)(cid:13)  . (2.4)By using the expression of σ ( θ, ∆) given by equation (2.4), we get that arg max ( θ, ∆) max σ G (3) ( θ, ∆ , σ | y ) = arg max ( θ, ∆) { G [ θ, ∆ | y ] } where: G [ θ, ∆ | y ] = − . (cid:32) d o (cid:88) i n i + qn (cid:33) ln (cid:0) σ ( θ, ∆) (cid:1) + n ln |(cid:52)| + ln P [ θ, ∆] . Thus we can proﬁle G (3) on sigma σ and deﬁne our estimator as: (cid:16)(cid:98) θ, (cid:98) ∆ (cid:17) = arg max ( θ, ∆) { G [ θ, ∆ | y ] } (2.5)to reduce the optimization problem dimension and focus on the structural parameters.An estimator of σ ∗ is obtained from there by computing σ (cid:16)(cid:98) θ, (cid:98) ∆ (cid:17) given by equation(2.4). The details of the outer criteria derivation are left in appendix A. In this section, we derive an estimator of the asymptotic variance of (cid:16)(cid:98) θ, (cid:98) ∆ (cid:17) . We highlightthat in practice the matrix ∆ is parametrized by a vector δ of dimension q (cid:48) , i.e (cid:52) := (cid:52) ( δ ).We give here a variance estimator of (cid:16)(cid:98) θ, (cid:98) δ (cid:17) . The variance of (cid:98) ∆ can be obtained using M ´elanie Prague classic delta-methods (see van der Vaart (1998) chapter 3). First of all, we drop thevector ﬁeld dependence in z and we introduce the function: h ( b i , θ, ∆ , y i ) = (cid:107) ∆ b i (cid:107) + (cid:88) j (cid:13)(cid:13) CX θ,b i ( t ij ) − y ij (cid:13)(cid:13) in order to present suﬃcient conditions ensuring our estimator is asymptotically normal:(a) the function (cid:101) G [ θ, ∆( δ )] = − . d o E [ n ] + q ) ln (cid:18) lim n n (cid:80) ni E [ h ( (cid:98) b ( θ, ∆( δ )) ,θ, ∆( δ ) ,y i ) ] d o E [ n ]+ q (cid:19) +ln | ∆( δ ) | has a well separated minimum (cid:0) θ, δ (cid:1) belonging to the interior of a compactΘ × Ω ⊂ R d × q (cid:48) (b) the true initial conditions (cid:110) x ∗ ,i (cid:111) i ∈ (cid:74) ,n (cid:75) ∈ (cid:74) , n (cid:75) have ﬁnite variance and either(i) they are i.i.d,(ii) for ν = 0 and ν = 1:lim n −→∞ (cid:0) V ( ν ) (cid:1) E (cid:34) n (cid:88) i =1 (cid:16) h ( ν ) ( y i ) − E (cid:104) h ( ν ) ( y i ) (cid:105)(cid:17) { h ( y i ) − E [ h ( y i ) ] >ε √ V ( ν ) } (cid:35) where h ( ν ) ( y i ) = d ( ν ) hd ( ν ) ( θ,δ ) ( (cid:98) b i ( θ, ∆( δ )) , θ, ∆( δ ) , y i ) and V ( ν ) = (cid:113)(cid:80) i V ar ( h ( ν ) ( y i )) ,(c) the subject speciﬁc number of observations { n i } i ∈ (cid:74) ,n (cid:75) are i.i.d and uniformly bounded,(d) for all possibles values ( θ, b i ), the solution X θ,b i ,x ∗ ,i belongs to a compact χ of R d ,and for all ( t, θ, x ), the mapping b i (cid:55)−→ f θ,b i ( t, x ) has a compact support Θ b ,(e) ( θ, b i , t, x ) (cid:55)−→ f θ,b i ( t, x ) belongs to C (Θ × Θ b × [0 , T ] × χ, R d ),(f) the matrices ∂ ∂ b i g i ( (cid:98) b i (cid:0) θ, ∆( δ ) (cid:1) | θ, ∆( δ ) , U ) and ∂ C i ∂ x ,i ( (cid:98) b i (cid:0) θ, ∆( δ ) (cid:1) , X θ, (cid:98) b i ( θ, ∆( δ )) (0) , u θ, (cid:98) b i ( θ, ∆( δ )) | θ, U ) are of full rank almost surely for every sequence y i ,(g) there is a neighborhood Θ θ of θ such that ( θ, b i , t, x ) (cid:55)−→ f θ,b i ( t, x ) ∈ C (Θ θ × Θ b × [0 , T ] × χ, R d ).Conditions 1-4 are used to derive the consistency of our estimator toward (cid:0) θ, δ (cid:1) by follow-ing classic steps for M-estimator by proving 1/the uniform convergence of our stochasticcost function to a deterministic one, 2/the existence of a well-separated minimum forthis deterministic function (van der Vaart (1998) chapter 5). Conditions 6-7 ensures thatour cost function is asymptotically smooth enough in the vicinity of (cid:0) θ, δ (cid:1) to proceedto a Taylor expansion and transfer the regularity of the cost function to the asymptotic arameter estimation in NLME-ODEs via optimal control theory behavior of √ n ( (cid:98) θ − θ, (cid:98) δ − δ ). Less restrictive conditions can be established under whichour estimator is still asymptotically normal, in particular regarding f θ,b i regularity withrespect to t . Also, we emphasize that the second assumption does not require to knowthe distribution of the x ∗ ,i . Theorem 2.1.

Under conditions 1-7, there is a model dependent lower bound λ suchthat if (cid:107) U (cid:107) > λ then the estimator (cid:16)(cid:98) θ, (cid:98) δ (cid:17) is asymptotically normal and: √ n ( (cid:98) θ − θ, (cid:98) δ − δ ) (cid:32) N (cid:16) , A ( θ, δ ) − B ( θ, δ ) (cid:0) A ( θ, δ ) − (cid:1) T (cid:17) where A ( θ, δ ) = lim n n (cid:80) ni =1 (cid:104) ∂ (cid:101) J ( θ,δ, y i ) ∂ ( θ,δ ) (cid:105) , B ( θ, δ ) = lim n n (cid:104)(cid:80) i (cid:101) J ( θ, δ, y i ) (cid:101) J ( θ, δ, y i ) T (cid:105) and the vector valued function (cid:101) J ( θ, δ, y i ) =  (cid:101) J θ ( θ, δ, y i ) (cid:101) J δ ( θ, δ, y i )  is given by: (cid:101) J θ ( θ, δ, y i ) = ddθ h ( (cid:98) b ( θ, ∆( δ )) , θ, ∆( δ ) , y i ) (cid:101) J δ ( θ, δ, y i ) = ddδ h ( (cid:98) b i ( θ, ∆( δ )) , θ, ∆( δ ) , y i ) − d o E [ n ]+ q T r (cid:16) (cid:52) ( δ ) − ∂ (cid:52) ( δ ) ∂δ k (cid:17) h ( (cid:98) b i ( θ, ∆( δ )) , θ, ∆( δ ) , y i ) . The proof is left in appendix C. The practical interest of this theorem is to give anestimator of Variance-Covariance: V ( (cid:98) θ, (cid:98) δ ) (cid:39) (cid:98) A ( (cid:98) θ, (cid:98) δ ) − (cid:98) B ( (cid:98) θ, (cid:98) δ ) (cid:16) (cid:98) A ( (cid:98) θ, (cid:98) δ ) − (cid:17) T /n. In the last equation the matrices (cid:98) A and (cid:98) B are deﬁned by:  (cid:98) A ( (cid:98) θ, (cid:98) δ ) = − n (cid:80) ni =1 ∂J ( (cid:98) θ, (cid:98) δ, y i ) ∂ ( θ,δ ) (cid:98) B ( (cid:98) θ, (cid:98) δ ) = n (cid:80) ni =1 J ( (cid:98) θ, (cid:98) δ, y i ) J ( (cid:98) θ, (cid:98) δ, y i ) T where the ( p + q ) components of the vector valued function J for 1 ≤ k ≤ p are given by J k ( θ, δ, y i ) = ddθ k h ( (cid:98) b ( θ, ∆( δ )) , θ, ∆( δ ) , y i )and for p + 1 ≤ k ≤ p + q by J k ( θ, δ, y i ) = ddδ k h ( (cid:98) b i ( θ, ∆( δ )) , θ, ∆( δ ) , y i ) − nd o (cid:80) i n i + qn T r (cid:18) (cid:52) ( δ ) − ∂ (cid:52) ( δ ) ∂δ k (cid:19) h ( (cid:98) b i ( θ, ∆( δ )) , θ, ∆( δ ) , y i ) . Now that we have proven the existence of the variance matrix V ( θ ∗ , δ ∗ ) such that (cid:98) δ − δ ∗ (cid:32) N (0 , V ( θ ∗ , δ ∗ )), we can use the Delta method to derive the asymptotic normality of the M ´elanie Prague original matrix Ψ (cid:16)(cid:98) δ (cid:17) = σ (cid:16) ∆( (cid:98) δ ) T ∆( (cid:98) δ ) (cid:17) − as well as an estimator of its asymptoticvariance. In the case of a diagonal matrix Ψ, composed of the elements (cid:0) Ψ , . . . Ψ q (cid:1) andof the parametrization (cid:52) ( δ ) =  e δ e δ q  used in section 4, we derive:  Ψ ( (cid:98) δ )...Ψ q ( (cid:98) δ )  −  Ψ ( δ ∗ )...Ψ q ( δ ∗ )  (cid:32) N  , σ  e − δ ∗ e − δ ∗ q  V ( θ ∗ , δ ∗ )  e − δ ∗ e − δ ∗ q  . Remark 2.2.

The previous theorem 2.1 states that we retrieve a parametric con-vergence rate despite a number of nuisance parameter increasing with the number ofsubjects. We avoid the pitfall described in Sartori (2003) for proﬁled methods, thanksto the i.i.d structure of the nuisance parameters. This allows us to prevent bias ac-cumulation for score functions among subjects by using the central limit theorem. Ourestimator shares similarities with conditional maximum likelihood ones and our proof forasymptotic normality follows similar steps as in Andersen (1970) since the { b i } i ∈ (cid:74) , n (cid:75) are i.i.d.

3. Numerical procedure for u i,θ,b i , X θ,b i and g i computation In this section we explain how to get numerical approximations for min x u ,i { min u i C i ( b i , x i, , u i | θ, U ) } and u i,θ,b i which are then used to evaluate X θ,b i and g i deﬁned by equation (2.3). Firstlywe approximate g i with a special type of discrete time optimal control problem, knownas ’tracking problem’. Secondly we adapt the method proposed by Cimen and Banks(2004a,b) to solve it. g i expression as an optimal control problem We introduce a pseudo-linear version of model (2.1):  ˙ x i ( t ) = A θ,b i ( t, x i ( t ) , z i ( t )) x i ( t ) + r θ,b i ( t, z i ( t )) + Bu i ( t ) x i (0) = x i, (3.1) arameter estimation in NLME-ODEs via optimal control theory where A θ,b i (resp. r θ,b i ) is a d × d sized matrix (resp. d dimensional vector) valued func-tion, linked to the original model by the relation A θ,b i ( t, x i ( t ) , z i ( t )) x i ( t )+ r θ,b i ( t, z i ( t )) = f θ,b i ( t, x i ( t ) , z i ( t )). This formulation is crucial for solving the optimal control problemdeﬁning our estimators in a computationally eﬃcient way. Linear models already ﬁtin this formalism with A θ,b i ( t, z i ( t )) := A θ,b i ( t, x i ( t ) , z i ( t )). For nonlinear models, thepseudo-linear representation is not unique but always exists Cimen and Banks (2004b)(in order to exploit this non-uniqueness as an additional degree of freedom, see Cimen(2008) section 6).We consider a discretized version of the perturbed ODE (2.1) to proceed to parametricestimation:  x i ( t dk +1 ) = (cid:0) I d + ∆ k A θ,b i ( t dk , x i ( t dk ) , z i ( t dk )) (cid:1) x i ( t dk ) + ∆ k r θ,b i ( t dk , z i ( t dk )) + B ∆ k u i ( t dk ) x i (0) = x i, (3.2)where the discretization is made at K i + 1 time points (cid:8) t dk (cid:9) ≤ k ≤ K i with t d = 0 and t dK i = t in i . This set contains the observations time points i.e. { t ij } ≤ j ≤ n i ⊂ (cid:8) t dk (cid:9) ≤ k ≤ K i ,but can be bigger and patient speciﬁc, allowing to accurately approximate X θ,b i ,x i, evenwhen the observations are sparse on [0 , T ]. We deﬁne: • ∆ k = t dk +1 − t dk , the mesh size between two discretization time-points, • u di the set of discrete values taken by the control at each time step i.e u di = (cid:0) u ( t dk ) , . . . , u ( t dK i − ) (cid:1) , • w k = 1 {∃ t ij | t ij = t dk } / ( t dk +1 − t dk ) i.e. w k is equal to 1 / ( t dk +1 − t dk ) if t dk corresponds toan observation time t ij , otherwise w k = 0, • y dk = y ij if t dk = t ij , 0 otherwise, • X dθ,b i ,x i, ,u di the solution of (3.2).The weights w k and the set of extended data (cid:8) y dk (cid:9) are introduced to have a vector ofobservations with the same length as (cid:8) t dk (cid:9) ≤ k ≤ K i . We now introduce the discretized M ´elanie Prague version of the cost C i to be minimized: C di ( b i , x i, , u di | θ, U ) = (cid:80) n i j =0 (cid:13)(cid:13)(cid:13) CX dθ,b i ,x i, ,u di ( t ij ) − y ij (cid:13)(cid:13)(cid:13) + (cid:80) K i − k =0 (cid:52) k u i ( t k ) T U u i ( t k )= (cid:13)(cid:13)(cid:13) CX dθ,b i ,x i, ,u di ( t in i ) − y in i (cid:13)(cid:13)(cid:13) + (cid:80) K i − k =0 (cid:52) k (cid:18)(cid:13)(cid:13)(cid:13) CX dθ,b i ,x i, ,u di ( t dk ) − y dk (cid:13)(cid:13)(cid:13) w k + u i ( t k ) T U u i ( t k ) (cid:19) . (3.3)such that our inner criteria g i can be approximated by: g i ( b i | θ, ∆ , U ) (cid:39) min x u ,i min u di C di ( b i , x i, , u di | θ, U ) + (cid:107) ∆ b i (cid:107) . The solution of this discrete control problem will be denoted u di,θ,b i , and the relatedoptimal trajectory X dθ,b i : they will be used as numerical approximations of u i,θ,b i and X θ,b i respectively. We present how to numerically obtain min x u ,i min u di C di ( b i , x i, , u di | θ, U ) as well as thecorresponding minimizer u di,θ,b i . We start with linear ODE models (section 3.2.1), thenwe consider nonlinear models (section 3.2.2).

Here, we suppose A θ,b i ( t, z i ( t )) := A θ,b i ( t, x, z i ( t )) in model (1.1), for the sake of claritywe drop the dependence of A θ,b i and r θ,b i in z i . For a given set ( θ, b i , x i, ), Linear-Quadratic theory ensures the existence and uniqueness of the optimal control u di,θ,b i andthat min x u ,i min u di C di ( b i , x i, , u di | θ, U ) can be computed by solving a discrete ﬁnal valueproblem, called the Riccati equation (e.g. Sontag (1998)). Proposition 3.1.

Let us introduce ( R θ,b i ,k , h θ,b i ,k ) for ≤ k ≤ K i , the solution of arameter estimation in NLME-ODEs via optimal control theory the discrete Riccati equation:  R θ,b i ,k = R θ,b i ,k +1 + (cid:52) k w k C T C + ∆ k (cid:0) R θ,b i ,k +1 A θ,b i ( t dk ) + A θ,b i ( t dk ) T R θ,b i ,k +1 (cid:1) + (cid:52) k A θ,b i ( t dk ) T R θ,b i ,k +1 A θ,b i ( t dk ) − (cid:52) k ( I d + (cid:52) k A θ,b i ( t dk ) T ) R θ,b i ,k +1 BG ( R θ,b i ,k +1 ) B T R θ,b i ,k +1 ( I d + (cid:52) k A θ,b i ( t dk )) h θ,b i ,k = h θ,b i ,k +1 − (cid:52) k w k C T y dk + (cid:52) k A θ,b i ( t dk ) T h θ,b i ,k +1 + ∆ k (cid:0) I d + ∆ k A θ,b i ( t dk ) (cid:1) T R θ,b i ,k +1 r θ,b i ( t dk ) − ∆ k ( I d + ∆ k A θ,b i ( t dk )) T R θ,b i ,k +1 BG ( R θ,b i ,k +1 ) B T (cid:0) h θ,b i ,k +1 + ∆ k R θ,b i ,k +1 r θ,b i ( t dk ) (cid:1) (3.4) with ﬁnal condition ( R θ,b i ,K i , h θ,b i ,K i ) = ( C T C, − C T y in i ) and G ( R θ,b i ,k +1 ) = (cid:2) U + (cid:52) k B T R θ,b i ,k +1 B (cid:3) − . Hence we get: g i ( b i | θ, ∆ , U ) = (cid:107) ∆ b i (cid:107) + y Tin i y in i − (cid:16) R ukθ,b i , x k ,i + h uθ,b i , (cid:17) T (cid:16) R uθ,b i , (cid:17) − (cid:16) R ukθ,b i , x k ,i + h uθ,b i , (cid:17) + (cid:16) x k ,i (cid:17) T R kθ,b i , x k ,i + 2 (cid:16) h kθ,b i , (cid:17) T x k ,i + (cid:80) K m − k =0 (cid:52) k (cid:16) w k (cid:0) y dk (cid:1) T y dk + (cid:16) h θ,b i ,k +1 ) T + ∆ k r θ,b i ( t dk ) T R θ,b i ,k +1 (cid:17) r θ,b i ( t dk ) (cid:17) − (cid:80) K m − k =0 (cid:52) k (cid:0) h θ,b i ,k +1 + ∆ k R θ,b i ,k +1 r θ,b i ( t dk ) (cid:1) T BG ( R θ,b i ,k +1 ) B T (cid:0) h θ,b i ,k +1 + ∆ k R θ,b i ,k +1 r θ,b i ( t dk ) (cid:1) (3.5) where R uθ,b i , , R ukθ,b i , , R kθ,b i , , h uθ,b i , and h kθ,b i , are given by the following decomposition R θ,b i , :=  R uθ,b i , R ukθ,b i , (cid:16) R ukθ,b i , (cid:17) T R kθ,b i ,  and h θ,b i , := (cid:16) h uθ,b i , h kθ,b i , (cid:17) . Moreover, the con-trol u di,θ,b i which minimizes the cost (3.3) is unique and equal to: u di,θ,b i ( t dk ) = − G ( R θ,b i ,k +1 ) B T (cid:16) R θ,b i ,k +1 (cid:16)(cid:16) I d + (cid:52) k A θ,b i ( t dk ) (cid:17) X dθ,b i ( t dk ) + ∆ k r θ,b i ( t dk ) (cid:17) + h θ,b i ,k +1 (cid:17) (3.6) where X dθ,b i is the optimal trajectory, i.e. the solution of the initial value problem:  X dθ,b i ( t dk +1 ) = (cid:0) I d + (cid:52) k A θ,b i ( t dk ) (cid:1) X dθ,b i ( t dk ) + ∆ k r θ,b i ( t dk ) − (cid:52) k BG ( R θ,b i ,k +1 ) B T R θ,b i ,k +1 (cid:16)(cid:0) I d + (cid:52) k A θ,b i ( t dk ) (cid:1) X dθ,b i ( t k ) + ∆ k r θ,b i ( t dk ) (cid:17) − (cid:52) k BG ( R θ,b i ,k +1 ) B T h θ,b i ,k +1 (3.7) with estimator (cid:100) x ui, = − (cid:16) R uθ,b i , (cid:17) − (cid:16) R ukθ,b i , x k + h uθ,b i , (cid:17) for x ui, . Remark 3.1.

The theoretical basis for replacing C di and the perturbed ODE (2.1) bytheir discretized counterparts can be found in Clairon (2020) where, under mild regularity M ´elanie Prague conditions on A θ,b i and r θ,b i , X dθ,b i and u di,θ,b i converge to the solution of the continuousoptimal control problem. We adapt the method proposed by Cimen and Banks (2004b) to solve tracking problemfor discrete time models. The outline of the method is the following: we replace theoriginal problem (3.3) by a recursive sequence of problems, where the l -th one is deﬁnedby:min u di C d,li ( b i , x i, , u di | θ, U ) := (cid:13)(cid:13)(cid:13) CX d,lθ,b i ,x i, ,u di ( t in i ) − y in i (cid:13)(cid:13)(cid:13) + (cid:80) K i − k =0 (cid:52) k (cid:18)(cid:13)(cid:13)(cid:13) CX d,lθ,b i ,x i, ,u di ( t dk ) − y dk (cid:13)(cid:13)(cid:13) w k + u i ( t k ) T U u i ( t k ) (cid:19) such that  x i ( t dk +1 ) = (cid:16) I d + ∆ k A θ,b i ( t dk , X d,l − θ,b i ( t dk ) , z i ( t dk )) (cid:17) x i ( t dk ) + ∆ k r θ,b i ( t dk , z i ( t dk )) + B ∆ k u i ( t k ) x i (0) = x i, . (3.8)where X d,l − θ,b i is the solution of problem (3.8) at iteration l −

1. Thus, for each l thematrix A θ,b i ( t dk , X d,l − θ,b i ( t dk ) , z i ( t dk )) does not depend on x i and the problem (3.8) is aLinear-Quadratic one. We use the results of section 3.2.1 to construct the followingalgorithm:(a) Initialization phase: X u,d, θ,b i ( t dk ) = x u,ri, for all k ∈ (cid:74) , n i (cid:75) where x u,ri, is an arbitrarystarting point for the unknown initial condition and X k,d, θ,b i ( t dk ) = x ki, . (b) At iteration l : use proposition 3.1 to obtain ( R lθ,b i , h lθ,b i ) , u d,li,θ,b i , X d,lθ,b i and g li ( b i | θ, ∆ , U ).(c) If (cid:80) K i k =1 (cid:13)(cid:13)(cid:13) X d,lθ,b i ( t dk ) − X d,l − θ,b i ( t dk ) (cid:13)(cid:13)(cid:13) < ε and (cid:12)(cid:12)(cid:12) g li ( b i | θ, ∆ , U ) − g l − i ( b i | θ, ∆ , U ) (cid:12)(cid:12)(cid:12) <ε , then step 4; otherwise get back to step 2.(d) Set ( R θ,b i , h θ,b i ) = ( R lθ,b i , h lθ,b i ) , u di,θ,b i = u d,li,θ,b i , X dθ,b i = X d,lθ,b i and g i ( b i | θ, ∆ , U ) = g li ( b i | θ, ∆ , U ).

4. Results on simulated data

We compare the accuracy of our approach with maximum likelihood (ML) in diﬀer-ent models and experimental designs reﬂecting the problems exposed in introduction, arameter estimation in NLME-ODEs via optimal control theory that is estimation in 1/presence of model error, 2/partially observed framework withunknown initial conditions and 3/presence of poorly identiﬁable parameters. For thefairness of comparison with ML where no prior is speciﬁed, we choose a non-informativeone i.e. ln P [ θ, ∆] = 0 for our method throughout this section. If the diﬀerentialequation (1.1) has an analytical solution, the ML estimator is computed via SAEMalgorithm (SAEMIX package Comets et al. (2017)). Otherwise it is done via a re-stricted likelihood method dedicated to ODE models implemented in the nlmeODEpackage Tornoe et al. (2004). For both our method and the ML, we proceed to Monte-Carlo simulations based on N MC = 100 runs. At each run, we generate n i observa-tions coming from n subjects on an observation interval [0 , T ] with Gaussian measure-ment noise of standard deviation σ ∗ . From these data, we estimate the true popu-lation parameters θ ∗ , Ψ ∗ as well as the subject parameter realizations b ∗ i ∼ N (0 , Ψ ∗ )with both estimation methods. We quantify the accuracy of each entry (cid:98) ψ p of thepopulation parameters estimate (cid:98) ψ = (cid:16)(cid:98) θ, (cid:98) Ψ (cid:17) via Monte-Carlo computation of the bias b ( (cid:98) ψ p ) = E (cid:104) (cid:98) ψ p (cid:105) − ψ ∗ p , the empirical variance V emp ( (cid:98) ψ p ) = E (cid:20)(cid:16) E (cid:104) (cid:98) ψ p (cid:105) − ψ ∗ p (cid:17) (cid:21) , the meansquare error M SE ( (cid:98) ψ p ) = b ( (cid:98) ψ p ) + V emp ( (cid:98) ψ p ), the estimated variance (cid:98) V (cid:16) (cid:98) ψ p (cid:17) as wellas the coverage rate of the 95%-conﬁdence interval derived from it, it corresponds tothe frequency at which the interval (cid:20) (cid:98) ψ p ± z . (cid:114) (cid:98) V (cid:16) (cid:98) ψ p (cid:17)(cid:21) contains ψ ∗ p with z . the0 . − quantile of the centered Gaussian law. We compute the previous quantities forthe normalized values (cid:98) ψ normp := (cid:98) ψ p ψ ∗ p to make relevant comparisons among parameterswith diﬀerent order of magnitude. For the subject speciﬁc parameter, we estimate themean square error M SE ( (cid:98) b i ) = E (cid:20)(cid:13)(cid:13)(cid:13) b ∗ i − (cid:98) b i (cid:13)(cid:13)(cid:13) (cid:21) . For each subsequent examples, we givethe results for n = 50 and present in appendix B the case n = 20 to analyze the evolutionof each estimator accuracy with respect to the sparsity of the available observations.For our method, we need to select U the matrix appearing in the inner criteria deﬁni-tion (2.3) balancing model and data ﬁdelity. We use for this the forward cross-validationmethod presented in G. Hooker and Earn (2011). Let us denote (cid:99) θ U , (cid:110)(cid:100) b i,U (cid:111) i ∈ (cid:74) , n (cid:75) theestimators obtained for a given matrix U . For each subject i , we split [0 , T ] into H = 2sub-intervals [ t h , t h +1 ], such that t = 0 and t H = T . We denote X θ,b i ( ., t h , x h ) the solu-tion of ˙ x ( t ) = f θ,b i ( t, x i ( t ) , z i ( t )) deﬁned on the interval [ t h , t h +1 ] with initial condition M ´elanie Prague X θ,b i ( t h , t h , x h ) = x h . The forward cross-validation uses the causal relation imposed tothe data by the ODE to quantify the prediction error:EP( i, U ) = H (cid:88) h =1 (cid:88) { t ij ∈ [ t h , t h +1 ] } (cid:13)(cid:13)(cid:13) y ij − CX (cid:99) θ U , (cid:100) b i,U ( t ij , t h , X (cid:99) θ U , (cid:100) b i,U ( t h )) (cid:13)(cid:13)(cid:13) . The rationale of this selection method is the following: if U is too small, CX (cid:99) θ U , (cid:100) b i,U ( t h ) willbe close to y h but not to the actual ODE solution, and t (cid:55)−→ CX (cid:99) θ U , (cid:100) b i,U ( t, t h , X (cid:99) θ U , (cid:100) b i,U ( t h ))will diverge from the observations on [ t h , t h +1 ]. If U is too large, X (cid:99) θ U , (cid:100) b i,U ( t h ) will beclose to the ODE solution but far from y h and it will lead to a large value for EP( i, U ).Thus, a proper value for U which minimizes EP( i, U ) will be chosen between these twoextreme cases. The global prediction error for the whole population is computed withEP( U ) = (cid:80) i EP( i, U ). We retain the matrix U which minimizes EP among a trialof tested values and we denote (cid:98) θ, (cid:98) Ψ , (cid:110) (cid:98) b i, (cid:111) i ∈ (cid:74) , n (cid:75) the corresponding estimator. In thefollowing, we use the subscript M L to denote the ML estimator.For solving the optimization problems required for computing our inner and outercriteria, we use the Nelder-Mead algorithm implemented in the optimr package Nash(2016). All optimization algorithms used by the estimation methods require a startingguess value. We start from the true parameter value for each of them. By doing so,we aim to do not mix two distinct problems: 1)the numerical stability of the estimationprocedures, 2)the intrinsic accuracy of the diﬀerent estimators. These two problems arecorrelated, but we aim to adress only the latter which corresponds to the issues raised inintroduction. Still, we check on preliminary analysis that local minima presence was notan issue in the vicinity of ( θ ∗ , (cid:52) ∗ ) by testing diﬀerent starting points for all methods. Noproblem appears for our method and SAEMIX. A negligible number of non convergencecases appear for nlmODE which have been discarded thanks to the convergence criteriaembedded in the package. arameter estimation in NLME-ODEs via optimal control theory We consider the population model where each subject i follows the ODE:  ˙ X ,i = φ ,i X ,i − φ ,i X ,i ˙ X ,i = − φ ,i X ,i ( X ,i (0) , X ,i (0)) = ( x , , x , ,i ) (4.1)with the following parametrization:  log( φ ,i ) = θ + b i log( φ ,i ) = θ where b i ∼ N (0 , Ψ). The true population parameter values are θ ∗ = ( θ ∗ , θ ∗ ) = (log (0 . , log (2)),and Ψ ∗ = 0 . and we are in a partially observed framework where only X ,i is acces-sible. The true initial conditions are subject speciﬁcs and normally distributed with x ∗ , ,i ∼ N (2 , .

5) and x ∗ , ,i ∼ N (3 , . ODE (4.1) has an analytic solution given by X ,i ( t ) = e − φ ,i t ( x , + x , φ ,i φ ,i − φ ,i ( e ( φ ,i − φ ,i ) t − n i = 11 ob-servations per subject on [0 , T ] = [0 ,

10] with Gaussian measurement noise of standarddeviation σ = 0 .

05. An example of observations and corresponding solution is plottedin ﬁgure 4.1.We want to investigate the impact of initial condition, especially the unobserved one x ∗ , ,i , on the ML estimator accuracy. Indeed, our method does not need to estimate x ∗ , ,i and thus no additional diﬃculties appear in this partially observed framework. Forthe ML, however, it is nuisance subject-speciﬁc parameter that should be estimated andfor which no observations are available. For this, we compute (cid:98) θ MLx , ,x , , (cid:98) θ ML,x , and (cid:98) θ ML the ML estimator respectively when: 1) both initial conditions are perfectly known,2) x ∗ , ,i is replaced by the measured value, 3)in addition x ∗ , ,i has to be estimated. We present the estimation results in table 4.1. For ML, the results are goods in terms ofaccuracy and consistent in terms of asymptotic conﬁdence interval coverage rate whenboth initial conditions are known: 95% for θ and θ in accordance with theoretical M ´elanie Prague

Time X1X2Observations

Fig. 4.1.

Examples of (4.1) solutions and corresponding observations. results. However, there is a signiﬁcant drop in accuracy when x ∗ , ,i has to be estimated,especially for θ . In particular, the coverage rate drops to 86% and 80% for θ and θ respectively. Interestingly, ML inaccuracy is driven by bias and under-estimated vari-ance when initial conditions are not known. In this case our method provides a relevantalternative: it gives accurate estimations with a good coverage rate for all parameterswhile avoiding the estimation of the unobserved initial conditions. Estimation of indi-vidual random eﬀects is also more accurate with our method, with a decrease of morethan 90% of MSE for b i comparing to ML. To mimic misspeciﬁcation presence, we now generate the observations from the hypoel-liptic stochastic model:  dX ,i = φ ,i X ,i dt − φ ,i X ,i dtdX ,i = − φ ,i X ,i dt + αdB t ( X ,i (0) , X ,i (0)) = ( x , , x , ,i ) (4.2) arameter estimation in NLME-ODEs via optimal control theory Table 4.1.

Results of estimation for model (4.1). Thedifferent subscripts stand for the following estimation sce-narios: 1) ( x , , x , ) when both initial conditions are set to (cid:0) x ∗ , , x ∗ , (cid:1) , 2) x , when x ,i is set to y i, and x , to x ∗ , ,3/absence of subscript when x ,i is set to y i, and x , is es-timated. Results from our method are in bold. Well-speciﬁed modelMSE Bias Emp. Var Est. Var Cov. Rate MSE b i θ (cid:98) θ ML,x , ,x , (cid:98) θ ML,x , (cid:98) θ ML (cid:98) θ θ (cid:98) θ ML,x , ,x , (cid:98) θ ML,x , (cid:98) θ ML (cid:98) θ Ψ (cid:98) θ ML,x , ,x , (cid:98) θ ML,x , (cid:98) θ ML (cid:98) θ Misspeciﬁed model MSE Bias Emp. Var Est. Var Cov. Rate MSE b i θ (cid:98) θ ML,x , ,x , (cid:98) θ ML,x , (cid:98) θ ML (cid:98) θ θ (cid:98) θ ML,x , ,x , (cid:98) θ ML,x , (cid:98) θ ML (cid:98) θ Ψ (cid:98) θ ML,x , ,x , (cid:98) θ ML,x , (cid:98) θ ML (cid:98) θ M ´elanie Prague . . . . . . . Time ODE solutionSDE solutionObservations

Fig. 4.2.

Solution of (4.1) and a realization of (4.2) for the same parameter values. with B t a Wiener process and α = 0 . α will be hard to estimate when we only have observations for X ,i . Thus,we still estimate the parameters from the model (4.1) which is now seen as a determin-istic approximation of the true stochastic process. Still, it is expected that our methodwill mitigate the eﬀect of stochasticity on the estimation accuracy by taking into ac-count model error presence. Results are presented in table 4.1. The diﬀerences betweenthe two methods are similar to the previous well-speciﬁed case with an additional lossof accuracy coming from model error for both estimators. However, the misspeciﬁca-tion eﬀect for SAEM is more pronounced than for our method which manages to limitthe damages done. This conﬁrms the beneﬁts of taking into account model uncertaintyfor the regularization of the inverse problem, in particular when model error occurs inthe unobserved compartment, a situation in which classic statistical criteria for modelassessment based on a data ﬁtting criterion are diﬃcult to use. arameter estimation in NLME-ODEs via optimal control theory We consider a simpliﬁed version of the model used in Tornoe et al. (2004) for the analysisof glucose and insulin regulation:  ˙ G i = S G ( G B − G i ) − X i G i ˙ I i = γt ( G i − h ) − n i ( I i − I B )˙ X i = − p ( X i + S I ( I i − I B )) . (4.3)We are in a partially observed framework where only the glucose ( G i ) and insulin ( I i )concentration are measured. The values of parameters ( p , γ, h, G B , I B ) are ﬁxed to( − . , − . , . , , θ = ( θ S G , θ S I , θ n ), linked to theoriginal model via the parametrization:  log( S G ) = θ S G log( S I ) = θ S I log( n i ) = θ n + b i where b i ∼ N (0 , Ψ). The true population parameter values are θ ∗ = ( − . , − . , − . ∗ = 0 . . The true initial conditions x ∗ i, = (cid:16) G ∗ ,i , I ∗ ,i , X ∗ ,i (cid:17) are subject-speciﬁcand distributed according to ln( x ∗ i, ) ∼ N ( l x ∗ , Ψ l x ∗ ) with l x ∗ = (5 . , . , −

7) andΨ l x ∗ = (cid:0) . , . , − (cid:1) . We generate n i = 5 observations on [0 , T ] = [0 , σ ∗ = 3. As in the previous example,we investigate the impact of unknown initial conditions on estimators accuracy. We areparticularly interested in the joint estimation of θ S I , which appears only in the equa-tion ruling the unobserved state variable X i , and X ∗ ,i required for each subject by themaximum likelihood based method. For this, we distinguish two cases, 1)when θ S I isknown, 2)when θ S I has to be estimated here and we respectively denote (cid:99) θ S i and (cid:98) θ thecorresponding estimators. Finally, since the model is nonlinear we have to specify apseudo-linear representation to use the algorithm presented in section 3.2.2: A θ,b i ( t, G i , I i , X i ) =  − S G − G i γt − n i − p S I − p  , r θ,b i ( t ) =  S G G B − γth + n i I B − p S I I B  . M ´elanie Prague

We present the estimation results in table 4.2. Our method obtains smaller MSE thanML and escapes the drop in coverage rate of the conﬁdence interval in the case of θ ∗ S I estimation. The diﬀerence between the two estimators behavior is explained by the factthat they are deﬁned through the construction of two diﬀerent optimization problems.At the population level our approach leads to minimize a cost function depending ona 4-dimensional parameter whereas ML, due to its need to estimate x ∗ i, , considers a 7-dimensional one. Thus, the topology of the parameter spaces explored by each methodto look for the minimum are very diﬀerent. To mimic misspeciﬁcation presence, we generate the observations from the stochasticmodel:  dG i = ( S G ( G B − G i ) − X i G i ) dt + α dB ,t dI i = ( γt ( G i − h ) − n i ( I i − I B )) dt + α dB ,t dX i = ( − p ( X i + S I ( I i − I B ))) dt + α dB ,t (4.4)where the B i,t are Wiener processes and ( α , α , α ) = (cid:0) , , × − (cid:1) their diﬀusioncoeﬃcients. We present the estimation results in table 4.2. For ML, the drop in coveragerate for θ ∗ S G and θ ∗ S I is even more striking when θ ∗ S I needs to be estimated. This isexplained by the eﬀect of model misspeciﬁcation which increases bias and the fact thatML does not take into account this new source of uncertainty leading here to under-estimation of variance and too narrow conﬁdence intervals. We consider the model presented in Pasin et al. (2019) to analyze the antibody concen-tration, denoted Ab i , generated by two populations of antibody secreting cells: the short arameter estimation in NLME-ODEs via optimal control theory Table 4.2.

Results of estimation for model (4.3). The different subscripts stand forthe following estimation scenarios: 1) S i when S i is set to S ∗ i , 2)absence of subscriptwhen S i is estimated. Results from our method are in bold. Well-speciﬁed modelMSE Bias Emp. Var Est. Var Cov. Rate MSE b i θ S G (cid:98) θ ML,S i (cid:98) θ ML (cid:98) θ S i (cid:98) θ θ S I (cid:98) θ ML,S i known (cid:98) θ ML (cid:98) θ S i known (cid:98) θ θ n (cid:98) θ ML,S i (cid:98) θ ML (cid:98) θ S i (cid:98) θ Ψ (cid:98) θ ML,S i (cid:98) θ ML (cid:98) θ S i (cid:98) θ Misspeciﬁed modelMSE Bias Emp. Var Est. Var Cov. Rate MSE b i θ S G (cid:98) θ ML,S i (cid:98) θ ML (cid:98) θ S i (cid:98) θ θ S I (cid:98) θ ML,S i known (cid:98) θ ML (cid:98) θ S i known (cid:98) θ θ n (cid:98) θ ML,S i (cid:98) θ ML (cid:98) θ S i (cid:98) θ Ψ (cid:98) θ ML,S i (cid:98) θ ML (cid:98) θ S i (cid:98) θ M ´elanie Prague

Table 4.3.

Biological interpretation and parameter values

Parameters Biological interpretation Values δ L long-lived cells declining rate log(2) / (364 × θ ∗ ψ ∗ δ S Mean log-value for δ S , the short-lived cells declining rate log(log(2) / . (cid:39) − . ψ ∗ φ S Mean log-value for φ S , the antibodies inﬂux from short-lived cells log(2755) (cid:39) . ψ ∗ φ L Mean log-value for φ L , the antibodies inﬂux from long-lived cells log(16) (cid:39) . ψ ∗ δ Ab Mean log-value for δ Ab , the antibodies declining rate log(log(2) / (cid:39) − . ∗ Ψ ∗ φ S Inter individual variance for log( φ S,i ) 0 . Ψ ∗ φ L Inter individual variance for log( φ L,i ) 0 . Ψ ∗ δ Ab Inter individual variance for log( δ Ab,i ) 0 . lived, denoted S i , and the long-lived, denoted L i :  ˙ S i = − δ S S i ˙ L i = − δ L L i ˙ Ab i = θ S,i S i + θ L,i L i − δ Ab,i Ab i ( S i (0) , L i (0) , Ab i (0)) = ( S ,i , L ,i , Ab ,i ) . (4.5)This model is used to quantify the humoral response on diﬀerent populations after anEbola vaccine injection with a 2 doses regimen seven days after the second injection whenthe antibody secreting cells enter in a decreasing phase. These cells being unobserved,the preceding equation can be simpliﬁed to focus on antibody concentration evolution:˙ Ab i = φ S,i e − δ S t + φ L,i e − δ L t − δ Ab,i Ab i (4.6)with φ S,i := θ S,i S ,i and φ L,i := θ L,i L ,i . This equation has an analytic solution whichwill be used for maximum likelihood estimation with SAEMIX. We consider the followingparametrization:  log( δ S ) = ψ δ S log( φ S,i ) = ψ φ S + b φ S ,i log( φ L,i ) = ψ φ L + b φ L ,i log( δ Ab,i ) = ψ δ Ab + b δ Ab ,i . The true parameter values are presented in table 4.3. According to Pasin et al. (2019),the parameter δ L was non-identiﬁable and only a lower bound has been derived for it via arameter estimation in NLME-ODEs via optimal control theory proﬁled likelihood. So, to make fair comparisons between our approach and maximumlikelihood, we do not estimate it. Regarding population parameters, we are particularlyinterested in the behavior of estimation methods for ψ δ S and ψ φ S . Indeed a parametersensitivity analysis shows the symmetric role of ψ δ S and ψ φ S on the ODE solution (seeBalelli et al. (2020)). Thus, they are likely to face practical identiﬁability problems.To investigate this eﬀect, we estimate the parameters when 1) ψ ∗ δ S is known (the corre-sponding estimators will be denoted with the subscript ψ δ S ), 2) it has to be estimatedas well. We generate n i = 11 observations on the interval [0 , T ] = [0 , σ ∗ = 100. For each subject i , the initial conditionhas been generated according to Ab ∗ ,i ∼ N ( Ab , σ Ab ) with Ab = 500 and σ Ab = 260to reﬂect the dispersion observed in the data presented in Pasin et al. (2019). We presentthe estimation results in table 4.4.Our method improves the estimation of ψ ∗ δ S facing practical identiﬁability problemscomparing to the ML. In particular our method reduces its variance. As advocated inthe introduction, our approach provides an improved estimate for the { b ∗ i } i ∈ (cid:74) , n (cid:75) . Weassume that is due to the committed estimation error for θ ∗ , as it causes model error for { b ∗ i } i ∈ (cid:74) , n (cid:75) estimation, which is not taken into account by exact methods. This in turnexplains why their variance Ψ ∗ is better estimated with our approach. In this mixed-eﬀect context, this cause of model error is systematically present and claims for the useof estimation methods taking into account modeling uncertainties when subject speciﬁcparameters are critical for the practitioner. The data are now generated with a stochastic perturbed version of the original model: dAb i = (cid:16) φ S,i e − δ S t + φ L,i e − δ L t − δ Ab,i Ab i (cid:17) dt + αdB t (4.7) M ´elanie Prague

Table 4.4.

Results of estimation for model (4.6). The different subscripts standfor the following estimation scenarios: 1) ψ δ S when ψ δ S is set to ψ ∗ δ S , 2)absence ofsubscript when ψ δ S is estimated. Results from our method are in bold. Well-speciﬁed modelMSE Bias Emp. Var Est. Var Cov. Rate MSE b i ψ δ S (cid:98) θ ML,ψ δS known (cid:98) θ ML (cid:98) θ ψ δS known (cid:98) θ ψ φ S (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ ψ φ L (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ ψ δ Ab (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ Ψ φ S (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ Ψ φ L (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ Ψ δ Ab (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ arameter estimation in NLME-ODEs via optimal control theory where B t is a Wiener process and α = 10 its diﬀusion coeﬃcient. The value for α hasbeen chosen big enough to produce signiﬁcantly perturbed trajectories but small enoughto ensure that ODE (4.6) is still a relevant approximation for estimation purpose. Wekeep the same parameter values and measurement noise level as in the previous section.The results are presented in table 4.5.Our method still outperforms the maximum likelihood for ψ ∗ δ S as well as the { b ∗ i } i ∈ [1 , n ] estimation and their variances. In addition, we mitigate the eﬀect of model error onestimation accuracy.

5. Real data analysis

We now proceed to the estimation using real data presented in Pasin et al. (2019) fromwhich the parameter values given in table 4.3 come from. In Pasin et al. (2019) , theestimation is made from cohorts coming from three phase I trials performed in Africanand European countries. Each subject was vaccinated with two doses, Ad26.ZEBOV(Janssen Vaccines and Prevention) and MVA-BN-Filo (Bavarian Nordic). In these co-horts, both the eﬀect of injection order, either Ad26.ZEBOV ﬁrst and MVA-BN-Filosecond, or MVA-BN-Filo ﬁrst and Ad26.ZEBOV second, and the delay between, 28 or56 days, were evaluated. In this study, we focus on an east African subpopulation whereAd26.ZEBOV was injected ﬁrst and then MVA-BN-Filo with a delay of 28 days be-tween the two doses. As in Pasin et al. (2019) and the simulation section, to stay in thetemporal domain of validity of the model we use measurements made seven days afterthe second dose injection. It leaves us with 5 measurements of antibody concentrationbetween days 7 up to days 330 per subject. The estimation in the original work has beendone using the NIMROD software Prague et al. (2013) and log-transformed antibodyconcentration measurement. We now estimate the parameters with our method withthe aim to compare our results with the existing one. We used the same prior distri-bution π ( θ ) ∼ N  − − .  , 

25 0 0 00 100 0 00 0 100 00 0 0 1  for θ = ( ψ δ S , ψ φ S , ψ φ L , ψ δ Ab ) as M ´elanie Prague

Table 4.5.

Results of estimation in presence of model error when data are gen-erated from (4.7). The different subscripts stand for the same estimation scenarioas in table 4.4. Results from our method are in bold.

Misspeciﬁed modelMSE Bias Emp. Var Est. Var Cov. Rate MSE b i ψ δ S (cid:98) θ ML,ψ δS known (cid:98) θ ML (cid:98) θ ψ δS known (cid:98) θ ψ φ S (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ ψ φ L (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ ψ δ Ab (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ Ψ φ S (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ Ψ φ L (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ Ψ δ Ab (cid:98) θ ML,ψ δS (cid:98) θ ML (cid:98) θ ψ δS (cid:98) θ arameter estimation in NLME-ODEs via optimal control theory Table 5.1.

Estimation presented in Pasin et al. (2019) (left)and via our approach (right)

Pasin et al. CI (95%) OCA CI (95%) ψ δ S -0.57 [-1.02, -0.02] -0.18 [-0.58, 0.22] ψ φ S ψ φ L ψ δ Ab -3.54 [-3.62, -3.45] -3.48 [-3.95, -3.01]Ψ φ S φ L δ Ab the one deﬁned in the NIMROD software. We choose our mesh-size such that we get 200discretization points for each subject on the observation interval and we use U = 10 i.e.a value lower than in the simulated data case because of the model error presence. Wealso proceed to the log-transformation of the data to stabilize the measurement noisevariance. This drives us to use the nonlinear model:˙ (cid:103) Ab i ( t ) = 1ln(10) (cid:16) φ S,i e − δ S t + φ L,i e − δ L t (cid:17) − (cid:103) Ab i ( t ) − δ Ab,i ln(10) (5.1)describing the dynamic of (cid:103) Ab i ( t ) := log Ab i ( t ) for parameter estimation purpose. Weuse A θ,b i ( t, x, z i ( t )) = (cid:0) φ S,i e − δ S t + φ L,i e − δ L t (cid:1) − x x and r θ,b i ( t, z i ( t )) = − δ Ab,i ln(10) forthe pseudo-linear formulation of the model. Our estimations and the ones from theoriginal paper Pasin et al. (2019) are presented in Table 5.1 for the sake of comparison. Inthe following, we denote (cid:16)(cid:98) θ P , (cid:98) b iP (cid:17) (respectively (cid:16)(cid:98) θ, (cid:98) b i (cid:17) ) the estimation obtained by Pasinet al. (2019) (respectively our approach) for the mean population parameter and subjectspeciﬁc ones. Both methods produce estimations with overlapping conﬁdence intervalsfor θ . Still, signiﬁcant diﬀerences appear for (Ψ φ S , Ψ φ L , Ψ δ Ab ) estimation which quantiﬁesthe dispersion of random eﬀects. This is explained by the fact that we only consider asubset of the subjects used in Pasin et al. (2019) for estimation. This has an eﬀect on theobserved diversity within the cohort of patients and thus on (Ψ φ S , Ψ φ L , Ψ δ Ab ) estimation.Regarding the predictions, we present in ﬁgure 5.1 examples of estimated trajectories.The conﬁdence intervals are computed via Monte-Carlo sampling from the approximated M ´elanie Prague

Time: 7 days after second dose injection (Days) E L I SA un i t s / m l OCA based estimation NIMROD based estimation

Estimated trajectories

Time: 7 days after second dose injection (Days) E L I SA un i t s / m l OCA based estimation NIMROD based estimation

Estimated trajectories

Time: 7 days after second dose injection (Days) E L I SA un i t s / m l OCA based estimation NIMROD based estimation

Estimated trajectories

Time: 7 days after second dose injection (Days) E L I SA un i t s / m l OCA based estimation NIMROD based estimation

Estimated trajectories

Fig. 5.1.

Examples of ﬁtted trajectories for both methods for different subjects. Here Time=0is the 7th day post-second dose. Dashed lines: ﬁtted ODE solutions (5.1) with (cid:16)(cid:98) θ P , (cid:98) b iP (cid:17) . Solidline: optimal trajectories X (cid:98) θ, (cid:98) b i . Shaded area are the 95% conﬁdence intervals. arameter estimation in NLME-ODEs via optimal control theory normal laws N ( (cid:98) θ, V ( (cid:98) θ )) and N ( (cid:98) θ P , V ( (cid:98) θ P )) to quantify the eﬀect of estimation uncertainyon θ on the predicted trajectories. For NIMROD estimation, for a given sampled value (cid:101) θ P ∼ N ( (cid:98) θ P , V ( (cid:98) θ P )) and subject i , the sampled regression function X (cid:101) θ P , (cid:98) b iP ,y ,i is obtainedby solving ODE (5.1) for parameter values ( θ, b i , x ,i ) = (cid:16)(cid:101) θ P , (cid:98) b iP , y ,i (cid:17) . Regarding ourapproach we recall the regression functions are now deﬁned as optimal trajectories.So, for (cid:101) θ ∼ N ( (cid:98) θ, V ( (cid:98) θ )) the sampled regression function for subject i is the optimaltrajectory X (cid:101) θ, (cid:98) b i obtained via the minimization of the cost function C i ( (cid:98) b i , x i, , u i | (cid:101) θ, U ).This explain the diﬀerences between the two conﬁdence intervals in terms of shapeand width. Our method gives narrower intervals because for each sampled value anoptimal control problem is solved to obtain the related optimal trajectory. This imposesa common goal of data ﬁdelity to each sampled X (cid:101) θ, (cid:98) b i which limits their inter-variability.Still, despite these diﬀerences in shapes, both prediction intervals cover the same points.Morever, on the long-term our intervals are nearly always contained in the ones givenby NIMROD.Our estimation of θ supports the parameter inference obtained in Pasin et al. (2019)via another method and the subsequent analysis made on the antibody concentrationdynamics. In addition to this parametric comparison, we want to assess the modeladequacy via the temporal evolution analysis of the optimal controls u i, (cid:98) θ,b i ( (cid:98) θ ) estimatedas byproducts of our method. Indeed, they quantify the exogenous perturbations u i weneed to add to model (5.1) so that the solution of its perturbed counterpart,˙ (cid:93) Ab i,u ( t ) = 1ln(10) (cid:16) φ S,i e − δ S t + φ L,i e − δ L t (cid:17) − (cid:93) Ab i,u ( t ) − δ Ab,i ln(10) + u i (5.2)reproduce the observations. This approach is similar to the one developed in Hookeret al. (2015) where control theory replaces non-parametric procedures to estimate u i . Still, their approach relies on a ﬁnite basis approximation of (cid:93) Ab i,u which requires tospecify a basis function family, its dimension as well as a penalization parameter similarto U . At the contrary, our method avoids this complex step of hyper-parameter selectionand only needs U . For comparison, we also quantify the committed model error for (cid:16)(cid:98) θ P , (cid:98) b iP (cid:17) . To do so we compute u Pi , the solution of the optimal control problem: u Pi =arg min u i (cid:26)(cid:80) j (cid:13)(cid:13)(cid:13)(cid:102) Ab i, (cid:98) θ P , (cid:98) b iP ,y i ,u i ( t ij ) − y ij (cid:13)(cid:13)(cid:13) + (cid:107) u i (cid:107) U,L (cid:27) by using the procedure described M ´elanie Prague in section 3 for non-linear models. In the last expression (cid:102) Ab i, (cid:98) θ P , (cid:98) b iP ,y i ,u i is the solution ofthe perturbed ODE (5.2) for ( θ, b i ) = (cid:16)(cid:98) θ P , (cid:98) b iP (cid:17) and y i is the measured concentrationat t = 0 used a surrogate value for the initial condition (as they did in Pasin et al.(2019)). We still use U = 10 for this optimal control problem to allow for the samelevel of perturbation magnitude for both methods. In ﬁgure (5.2), we plot u i, (cid:98) θ,b i ( (cid:98) θ ) and u Pi as well as their mean values and conﬁdence intervals. Our method leads toresidual perturbations of smaller magnitudes and narrower conﬁdence intervals. Thismeans our approach produces an estimation which minimizes the committed modelerror for each subject comparing to a method based only on a data ﬁtting criteria.This is particularly clear at the beginning of the observation interval when the inﬂuenceof the initial conditions is the highest. In this case our narrower conﬁdence intervalclearly excludes a null perturbation and advocates for an over-estimation of the predictedantibody concentration by the model. This makes sense because model (4.5) assumesthat both populations of antibody secreting cells decrease with time, and that is probablynot completely true at the beginning of the dynamic. Thus, despite similar resultsregarding parameter values between our estimation and Pasin et al. (2019), the insightgiven by our method at the dynamic scale leads us to the additional conclusion of modelmisspeciﬁcation presence at the beginning of the observation interval.

6. Conclusion

In this work, we propose an estimation method addressing issues encountered by clas-sic approaches for the problem of parameter estimation in NLME-ODEs. We identifythree potential sources of problems for exact methods such as likelihood based inference:their diﬃculties in presence of model error, their need to estimate initial conditions andtheir dramatic performance degradation when facing poorly identiﬁable parameters. Wepropose here a method based on control theory accounting for the presence of potentialmodel uncertainty at the subject level and which can be easily proﬁled on the initialconditions. Simulations with both presence and absence of model errors illustrate thebeneﬁts of regularization techniques for estimating poorly identiﬁable parameters, sub-ject speciﬁc parameters as well as their variances in NLME-ODEs. In addition, bypass- arameter estimation in NLME-ODEs via optimal control theory −0.04−0.020.000.020.04 0 100 200 300 Time: 7 days after second dose injection (Days) E L I SA un i t s / m l / D a y Individual perturbations for NIMROD based estimation −0.04−0.020.000.020.04 0 100 200 300

Time: 7 days after second dose injection (Days) E L I SA un i t s / m l / D a y Individual perturbations for OCA based estimation −0.04−0.020.000.020.04 0 100 200 300

Time: 7 days after second dose injection (Days) E L I SA un i t s / m L / D a y Mean optimal control

Perturbations: Means and 95% CI for NIMROD estimation −0.04−0.020.000.020.04 0 100 200 300

Time: 7 days after second dose injection (Days) E L I SA un i t s / m L / D a y Mean optimal control

Perturbations: Means and 95% CI for OCA estimation

Fig. 5.2.

1) Up: Estimated residual controls for each subject, 2) bottom: mean optimal con-trol and 95% conﬁdence interval for the optimal controls a) left: u Pi obtained from parameterestimation in Pasin et al. (2019), b) right: u i, (cid:98) θ,b i ( (cid:98) θ ) obtained from our estimation.8 M ´elanie Prague ing estimation of initial conditions represents a clear advantage for partially observedsystems comparing to likelihood based approaches, as emphasized in simulations.Still, this beneﬁt in term of estimation accuracy comes with a computational price.On a server with the parallelization package Snow in R language, it takes approximately10-15 minutes to obtain an estimation for the two-dimensional linear model, 30 minutesfor the insulin model and 3-4 hour for the antibody concentration evolution one, whereasit was a matter of minutes for the other approaches. Nevertheless, the use of compiledlanguages and proper parallelization could reduce the computation time. Moreover, wehave willingly separated the formal deﬁnition of the optimal control problem requiredby our method and the numerical procedure used to solve it, in case it may exist bettersuited approaches for this speciﬁc control problem. Right now, our current strategyallows us to proﬁle on initial conditions, so looking for another numerical procedure isbeyond the scope of this paper.An under-exploited feature of the method so far is the obtained optimal controls. Thequalitative based analysis exposed in section 5 can be made more rigorous. For example,to stay in a Bayesian setting, we can specify a prior distribution for the controls andthen compare it with the obtained posterior once the inference is made. This would leadto a semi-parametric inference problem for which an optimal control based approach hasalready been proven useful (see Clairon and Brunel (2018); Clairon (2020)). This is asubject for further work.

Software

Our estimation method is implemented in R and a code reproducing the examples ofSection 4 is available on a GitHub repository located here.

Acknowledgement arameter estimation in NLME-ODEs via optimal control theory funding from the Innovative Medicines Initiative 2 Joint Undertaking under projectsEBOVAC1 and EBOVAC3 (respectively grant agreement No 115854 and No 800176).The IMI2 Joint Undertaking receives support from the European Union’s Horizon 2020research and innovation programme and the European Federation of PharmaceuticalIndustries and Association. References

Agusto, F. and Adekunle, A. (2014) Optimal control of a two-strain tuberculosis-hiv/aidsco-infection model.

BioSystems , , 20–44.Aliyu, M. (2011) Nonlinear H-Inﬁnity Control, Hamiltonian Systems and Hamilton-Jacobi Equations . CRC Press.Andersen, E. (1970) Asymptotic properties of conditional maximum-likelihood estima-tors.

Journal of the Royal Statistical Society , , 283–301.Andraud, M., Lejeune, O., Musoro, J., Ogunjimi, B., Beutels, P. and Hens, N. (2012)Living on three time scales: the dynamics of plasma cell and antibody populationsillustrated for hepatitis a virus. Plos Computational Biology , .Balelli, I., Pasin, C., Prague, M., Crauste, F., Van Eﬀelterre, T., Bockstal, V., Solforosi,L. and Thi´ebaut, R. (2020) A model for establishment, maintenance and reactivationof the immune response after vaccination against ebola virus. Journal of TheoreticalBiology , 110254.Bowsher, C. G. and Swain, P. (2012) Identifying source of variation and the ﬂow ofinformation in biochemical networks.

PNAS , , 1320–1328.Brynjarsdottir, J. and O’Hagan, A. (2014) Learning about physical parameters: Theimportance of model discrepancy. Inverse Problems , , 24.Campbell, D. (2007) Bayesian Collocation Tempering and Generalized Proﬁling for Es-timation of Parameters from Diﬀerential Equation Models . Ph.D. thesis, McGill Uni-versity Montreal,Quebec. M ´elanie Prague

Cimen, T. (2008) State-dependent riccati equation (sdre) control: A survey.

IFACProceedings , , 3761–3775.Cimen, T. and Banks, S. (2004a) Global optimal feedback control for general nonlinearsystems with nonquadratic performance criteria. Systems and Control Letters , ,327–346.— (2004b) Nonlinear optimal tracking control with application to super-tankers forautopilot design. Automatica , , 1845–1863.Clairon, Q. (2020) A regularization method for the parameter estimation problem inordinary diﬀerential equations via discrete optimal control theory. Journal of StatisticalPlanning and Inference .Clairon, Q. and Brunel, N. J.-B. (2018) Optimal control and additive perturbations helpin estimating ill-posed and uncertain dynamical systems.

Journal of the AmericanStatistical Association , , 1195–1209.Clarke, F. (2013) Functional Analysis, Calculus of Variations and Optimal Control .Graduate Texts in Mathematics. Springer-Verlag London.Comets, E., Lavenu, A. and Lavielle, M. (2017) Parameter estimation in nonlinear mixedeﬀect models using saemix, an r implementation of the saem algorithm.

Journal ofStatistical Software , , 1–42.Dashti, M., Law, K. J. H., Stuart, A. and Voss, J. (2013) Map estimators and theirconsistency in bayesian nonparametric inverse problems. Inverse Problems , .Donnet, S. and Samson, A. (2006) Estimation of parameters in incomplete data modelsdeﬁned by dynamical systems. Journal of Statistical Planning and Inference , ,2815–2831.Engl, H., Flamm, C., K¨ugler, P., Lu, J., M¨uller, S. and Schuster, P. (2009) Inverseproblems in systems biology. Inverse Problems , . arameter estimation in NLME-ODEs via optimal control theory G. Hooker, S. P. Ellner, L. D. V. R. and Earn, D. J. D. (2011) Parameterizing state-spacemodels for infectious disease dynamics by generalized proﬁling: measles in ontario.

Journal of the Royal Society , , 961–974.Gillespie, D. (2000) The chemical langevin equation. Journal of Chemical Physics , ,297–306.Guedj, J., Thiebaut, R. and Commenges, D. (2007) Maximum likelihood estimation indynamical models of hiv. Biometrics , , 1198–206.Guo, B. and Sun, B. (2012) Dynamic programming approach to the numerical solu-tion of optimal control with paradigm by a mathematical model for drug therapies. Optimization and Engineering , 1–18.Gutenkunst, R. N., Waterfall, J., Casey, F., Brown, K., Myers, C. and Sethna, J. (2007)Universally sloppy parameter sensitivities in systems biology models.

Public Libraryof Science Computational Biology , , e189.Hooker, G., Ellner, S. P. et al. (2015) Goodness of ﬁt in nonlinear dynamics: misspeciﬁedrates or misspeciﬁed states? The Annals of Applied Statistics , , 754–776.Huang, Y. and Dagne, G. (2011) A bayesian approach to joint mixed-eﬀects modelswith a skew normal distribution and measurement errors in covariates. Biometrics , , 260–269.Huang, Y., Liu, D. and Wu, H. (2006) Hierachical bayesian methods for estimation ofparameters in a longitudinal hiv dynamic system. Biometrics , , 413–423.Huang, Y. and Lu, T. (2008) Modeling long-term longitudinal hiv dynamics with appli-cation to an aids clinical study. Annal of Applied Statistics , , 1348–1408.Huang, Y., Wu, H. and Acosta, E. P. (2010) Hierarchical bayesian inference for hivdynamic diﬀerential equation models incorporating multiple treatment factors. BiomJ , , 470–486.Kampen, N. V. (1992) Stochastic Process in Physics and Chemistry . Elsevier. M ´elanie Prague

Kennedy, M. C. and O’Hagan, A. (2001) Bayesian calibration of computer models.

Jour-nal of the Royal Statistical Society: Series B (Statistical Methodology) , , 425–464.Kirk, D. E. (1998) Optimal Control Theory: An Introduction . Dover Publication.Kirk, P., Silk, D. and Michael, M. (2016) Reverse engineering under uncertainty. In

Uncertainty in Biology , 15–32. Springer.Komorowski, M., Miekisz, J. and Stumpf, M. (2013) Decomposing noise in biochemicalsignaling systems highlights the role of protein degradation.

Biophysical journal , ,1783–1793.Kuhn, E. and Lavielle, M. (2005) Maximum likelihood estimation in nonlinear mixedeﬀects models. Computational Statistics and Data Analysis , , 1020–1038.Kurtz, T. (1978) Strong approximation theorems for density dependent markov chains. Stochastic Processes and their Applications , , 223–240.Lavielle, M. and Aarons, L. (2015) What do we mean by identiﬁability in mixed eﬀectsmodels? Journal of pharmacokinetics and pharmacodynamics .Lavielle, M. and Mentr´e, F. (2007) Estimation of population pharmacokinetic parametersof saquinavir in hiv patients with the monolix software.

Journal of Pharmacokineticsand Pharmacodynamics , .Le, D., Miller, J. and Ganusov, V. (2015) Mathematical modeling provides kinetic detailsof the human immune response to vaccination. Frontiers in Cellular and InfectionMicrobiology , , 177.Leary, T. O., Sutton, A. and Marder, E. (2015) Computational models in the age oflarge datasets. Current Opinion in Neurobiology , , 87–94.Lindstrom, M. J. and Bates, D. M. (1990) Nonlinear mixed eﬀects models for repeatedmeasures data. Biometrics , , 673–687.Lunn, D., A.Thomas, Best, N. and Spiegelhalter, D. (2000) Winbugs - a bayesian mod-elling framework: Concepts, structure and extensibility. Statistics and Computing , ,325–337. arameter estimation in NLME-ODEs via optimal control theory M. Lavielle, A. Samson, A. F. and Mentre, F. (2011) Maximum likelihood estimation oflong terms hiv dynamic models and antiviral response.

Biometrics , , 250–259.Murphy, S. and der Vaart, A. V. (2000) On proﬁle likelihood. Journal of AmericanStatistical Association , , 449–465.Nash, J. C. (2016) Using and extending the optimr package.Pasin, C., Balelli, I., Van Eﬀelterre, T., Bockstal, V., Solforosi, L., Prague, M., Douoguih,M. and Thi´ebaut, R. (2019) Dynamics of the humoral immune response to a prime-boost ebola vaccine: quantiﬁcation and sources of variation. Journal of virology , ,e00579–19.Pasin, C., Dufour, F., Villain, L., Zhang, H. and Thiebaut, R. (2018) Controlling il-7injections in hiv-infected patients. Bulletin of Mathematical Biology , , 2349–2377.Perelson, A., Neumann, A., Markowitz, M., Leonard, J. and Ho, D. (1996) Hiv-1 dy-namics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science , , 1582–1586.Pinheiro, J. and Bates, D. M. (1994) Approximations to the loglikelihood function in thenonlinear mixed eﬀects model. Journal of the Computational and Graphical Statistics , , 12–35.Prague, M., Commengues, D., Guedj, J., Drylewicz, J. and Thi´ebaut, R. (2013) Nim-rod: A program for inference via a normal approximation of the posterior in modelswith random eﬀects based on ordinary diﬀerential equations. Computer Methods andPrograms in Biomedicine , , 447–458.Raftery, A. and Bao, L. (2010) Estimating and projecting trends in hiv/aids generalizedepidemics using incremental mixture importance sampling. Biometrics , , 1162–1173.Ramsay, J., Hooker, G., Cao, J. and Campbell, D. (2007) Parameter estimation for dif-ferential equations: A generalized smoothing approach. Journal of the Royal StatisticalSociety (B) , , 741–796. M ´elanie Prague

Sartori, N. (2003) Modiﬁed proﬁle likelihood in models with stratum nuisance parame-ters.

Biometrika , , 553–549.Sontag, E. (1998) Mathematical Control Theory: Deterministic ﬁnite-dimensional sys-tems . Springer-Verlag (New-York).Stein, R., Bucci, V., Toussaint, N., Buﬃe, C., Ratsch, G., Pamer, E., Sander, C. andXavier, J. (2013) Ecological modeling from time-series inference: Insight into dynamicsand stability of intestinal microbiota.

Public Library of Science Computational Biology , , 12.Stuart, A. (2010) Inverse problems: A bayesian perspective. Acta Numerica , 451–559.Thiebaut, R., Drylewicz, J., Prague, M., Lacabaratz, C. and et al., S. B. (2014) Quantify-ing and predicting the eﬀect of exogenous interleukin on cd4+t cells in hiv-1 infection.

Plos Computational Biology ,

10 (5) .Tornoe, C., Agerso, H., Jonsson, E. N., Madsen, H. and Nielsen, H. A. (2004) Non-linearmixed-eﬀects pharmacokinetic/pharmacodynamic modelling in nlme using diﬀerentialequations.

Computer Methods and Programs in Biomedicine , , 31–41.Transtrum, M., Machta, B. and Sethna, J. (2011) Geometry of nonlinear least squareswith applications to sloppy models and optimization. Physical Review , , 35.Transtrum, M. K., Machta, B. B., Brown, K. S., Daniels, B. C., Myers, C. R. andSethna, J. P. (2015) Perspective: Sloppiness and emergent theories in physics, biology,and beyond. The Journal of chemical physics , , 07B201 1.Tuo, R. and Wu, C. (2015) Eﬃcient calibration for imperfect computer models. Annalsof Statistics .van der Vaart, A. (1998)

Asymptotic Statistics . Cambridge Series in Statistical andProbabilities Mathematics. Cambridge University Press.Varah, J. M. (1982) A spline least squares method for numerical parameter estimationin diﬀerential equations.

SIAM J.sci. Stat. Comput. , , 28–46. arameter estimation in NLME-ODEs via optimal control theory Villain, L., Commenges, D., Pasin, C., Prague, M. and Thi´ebaut, R. (2019) Adaptiveprotocols based on predictions from a mechanistic model of the eﬀect of il7 on cd4counts.

Statistics in medicine , , 221–235.Wakeﬁeld, J. and Racine-Poon, A. (1995) An application of bayesian population pharma-cokinetic/pharmacodynamic models to dose recommendation. Statistics in Medicine , , 971–986.Wang, L., Cao, J., Ramsay, J., Burger, D., Laporte, C. and Rockstroh, J. (2014) Es-timating mixed-eﬀects diﬀerential equation models. Statistics and Computing , ,111–121.Wu, H., Lu, T., Xue, H. and Liang, H. (2014) Sparse additive odes for dynamic generegulatory network modeling. Journal of the American Statistical Association , ,700–716.Zhang, S. and Xu, X. (2016) Dynamic analysis and optimal control for a model ofhepatitis c with treatment. Communications in Nonlinear Science and NumericalSimulation ,46