[PDF] Recovering the time-dependent transmission rate from infection data via solution of an inverse ODE problem

Abstract

The transmission rate of many acute infectious diseases varies significantly in time, but the underlying mechanisms are usually uncertain. They may include seasonal changes in the environment, contact rate, immune system response, etc. The transmission rate has been thought difficult to measure directly. We present a new algorithm to compute the time-dependent transmission rate directly from prevalence data, which makes no assumptions about the number of susceptibles or vital rates. The algorithm follows our complete and explicit solution of a mathematical inverse problem for SIR-type transmission models. We prove that almost any infection profile can be perfectly fitted by an SIR model with variable transmission rate. This clearly shows a serious danger of over-fitting such transmission models. We illustrate the algorithm with historic UK measles data and our observations support the common belief that measles transmission was predominantly driven by school contacts.

Full PDF

aa r X i v : . [ q - b i o . Q M ] J un Recovering the time-dependent transmission rate from infection data viasolution of an inverse ODE problem

Mark Pollicott , Hao Wang † , and Howie Weiss Mathematics Institute, University of Warwick, Coventry, CV4 7AL, United Kingdom Department of Mathematical and Statistical Sciences,University of Alberta, Edmonton, Alberta, T6G 2G1, Canada School of Mathematics, Georgia Institute of Technology, Atlanta, Georgia, 30332, United States † To whom correspondence should be addressed. Phone: 1-780-492-8472.Fax: 1-780-492-6826. Electronic address: [email protected]

Abstract

The transmission rate of many acute infectious diseases varies signiﬁcantly in time, but theunderlying mechanisms are usually uncertain. They may include seasonal changes in theenvironment, contact rate, immune system response, etc. The transmission rate has beenthought diﬃcult to measure directly. We present a new algorithm to compute the time-dependent transmission rate directly from prevalence data, which makes no assumptionsabout the number of susceptibles or vital rates. The algorithm follows our complete andexplicit solution of a mathematical inverse problem for SIR-type transmission models. Weprove that almost any infection proﬁle can be perfectly ﬁtted by an SIR model with variabletransmission rate. This clearly shows a serious danger of over-ﬁtting such transmissionmodels. We illustrate the algorithm with historic UK measles data and our observationssupport the common belief that measles transmission was predominantly driven by schoolcontacts.

Keywords : epidemiology, time-dependent transmission rate, recovery algorithm, inverseproblem, measles, modulus of Fourier transform, Fourier analysis, periodicity, overﬁtting ecovery algorithm I. INTRODUCTION

The transmission rate of an infectious disease is the rate of which susceptible individuals be-come infected. In Section 3.4.9 of Anderson and May [1], the authors state that “ ... the directmeasurement of the transmission rate is essentially impossible for most infections. But if we wishto predict the changes wrought by public health programmes, we need to know the transmissionrate ... .”

The transmission rate of many acute infectious diseases varies signiﬁcantly in timeand frequently exhibits signiﬁcant seasonal dependence [3, 13, 40]: inﬂuenza, pneumococcus, androtavirus cases peak in winter; respiratory syncytial virus and measles cases peak in spring; andpolio cases peak in summer.Most investigators deﬁne the transmission rate β ( t ) of an infectious disease via the discretetransmission model: S ( k + 1) = S ( k ) − I ( k + 1) , (1) I ( k + 1) = β ( k ) S ( k ) I ( k ) , (2)where S ( k ) , I ( k ) are the fractions of susceptible and infected individuals during week k [2, 16].Equation (2) is equivalent to β ( k ) = I ( k + 1) / [ I ( k ) S ( k )], which provides a formula for the trans-mission rate. Application of this “algorithm” requires knowledge of S (0), which depends on vitalrates and in general is believed to be very diﬃcult to estimate. We developed our algorithm, inpart, to avoid having to estimate S (0).Our new algorithm is based on the soluition of a mathematical inverse problem for SIR-typetransmission models. We ﬁrst consider the simplest SIR model (Kermack and McKendrik [26])and allow the transmission rate to be a time-dependent function, i.e. , there is a positive function β ( t ) such that S ′ ( t ) = − β ( t ) S ( t ) I ( t ) , (3) I ′ ( t ) = β ( t ) S ( t ) I ( t ) − νI ( t ) , (4) R ′ ( t ) = νI ( t ) , (5)where S ( t ) , I ( t ), and R ( t ) are the fractions of susceptible, infected, and removed individuals attime t . We pose the mathematical question: Given “smooth infection data” f ( t ) on time interval [0 , T ] and removal rate ν > , can onealways ﬁnd a non-negative transmission rate function β ( t ) such that the I ( t ) output of the SIRmodel always coincides with f ( t ) with the given ν ? ecovery algorithm ν , and we provide an explicit formula for the solution.The construction also illustrates the danger of overﬁtting a transmission model where one canchoose the time-dependent transmission rate.However, in practice, infection data are always discrete, not continuous. We show that onecan robustly estimate β ( t ) by ﬁrst smoothly interpolating the data with a spline or trigonometricfunction and then applying the formula to smooth data.The usual transmission rate recovery method based on (1) and (2) can be viewed as a dis-cretization of (3) and (4). However, unlike the discrete recovery method, our method extends toa large spectrum of transmission models (see Section IV B) including those with diﬀerent trans-mission modes or immunity memory periods. Our approach also yields an explicit formula for thetransmission rate.One of the extensions in Section IV B is to the SEIR epidemic model with historical (time-dependent) vital rates, and we illustrate this extended algorithm using UK measles data during1948-1966. Our recovered transmission rate exhibits two dominant spectral peaks: at frequencies1 and 3 per year, respectively. We show that the latter peak reﬂects the “cycle” of Christmas,Easter, and Summer school vacations. These observations support the common belief that measlestransmission is predominantly driven by school contacts [19, 25]. However, we also ﬁnd indicationsof a cycle with 2 year period. II. RESULTS

We ﬁrst derive the algorithm for recovering the time-dependent transmission rate from infectiondata. We then apply the algorithm to two simulated data sets representing two characteristic“types” of infectious diseases. Finally we illustrate the algorithm using UK measles data from1948-1966.

A. Solution of inverse problem

The new algorithm follows from the complete solution of an inverse problem for the SIR systemof ODEs. The generality of the result seems striking, while the proof is almost trivial.

Theorem II.1

Given a smooth positive function f ( t ) , ν > , β > , and T > , there exists K > such that if β < K there is a solution β ( t ) with β (0) = β such that I ( t ) = f ( t ) for ecovery algorithm ≤ t ≤ T if and only if f ′ ( t ) /f ( t ) > − ν for ≤ t ≤ T . The growth condition imposes no restrictions on how f ( t ) increases, but requires that f ( t )cannot decrease too quickly, in the sense that its logarithmic derivative is always bounded belowby − ν . It is easy to see that f ′ ( t ) /f ( t ) > − ν is a necessary condition, since Equation (4) impliesthat f ′ ( t ) + νf ( t ) = β ( t ) S ( t ) f ( t ), which must be positive for 0 ≤ t ≤ T .The proof of the theorem consists of showing that this condition is also suﬃcient. We rewriteEquation (4) as S ( t ) = f ′ ( t ) + νf ( t ) β ( t ) f ( t ) , (6)then compute S ′ ( t ), and then equate with Equation (3) to obtain ddt (cid:18) f ′ ( t ) + νf ( t ) β ( t ) f ( t ) (cid:19) = − β ( t ) (cid:18) f ′ ( t ) + νf ( t ) β ( t ) f ( t ) (cid:19) f ( t ) . (7)Calculating the derivative and simplifying the resulting expression yields the following Bernoullidiﬀerential equation for β ( t ) β ′ ( t ) − p ( t ) β ( t ) − f ( t ) β ( t ) = 0 , where p ( t ) = f ′′ ( t ) f ( t ) − f ′ ( t ) f ( t )( f ′ ( t ) + νf ( t )) . (8)The change of coordinates x ( t ) = 1 /β ( t ) transforms this nonlinear ODE into the linear ODE x ′ ( t ) − p ( t ) x ( t ) − f ( t ) = 0 . (9)The method of integrating factors provides the explicit solution1 β ( t ) = x ( t ) = x (0) e − P ( t ) − e − P ( t ) Z t e P ( s ) f ( s ) ds, where P ( t ) = Z t p ( τ ) dτ. (10)A problem that could arise with this procedure is for the denominator of p ( t ) to be zero. Asingular solution is prevented by requiring that the denominator be always positive, i.e. , f ′ ( t ) + νf ( t ) >

0. Having done this, to ensure that β ( t ) is positive, β (0) must satisfy Z T e P ( s ) f ( s ) ds < /β (0) . (11)Mathematically, there are inﬁnitely many choices of β (0) and thus inﬁnitely many transmissionfunctions β ( t ). In this sense the inverse problem is under-determined. ecovery algorithm B. Recovery algorithm

We now turn the proof of the theorem into an algorithm to recover the transmission rate β ( t )from an infection data set. The algorithm has four steps and requires two conditions. Step 1.

Smoothly interpolate the infection data with a spline or trigonometric function to generatea smooth f ( t ). Check condition 1: f ′ ( t ) /f ( t ) > − ν , where ν is the removal rate. Step 2.

Compute the function p ( t ) = f ′′ ( t ) f ( t ) − f ′ ( t ) f ( t )( f ′ ( t ) + νf ( t )) . Condition 1 prevents a zero denomina-tor in p ( t ). Step 3.

Choose β (0) and compute the integral P ( t ) = Z t p ( τ ) dτ . Check condition 2: β (0) < (cid:30)Z T e P ( s ) f ( s ) ds , where T is the time length of the infection data. Alternatively, choose β (0) suﬃciently small to satisfy condition 2. Step 4.

Apply the formula β ( t ) = 1 (cid:30)(cid:20) e − P ( t ) /β (0) − e − P ( t ) Z t e P ( s ) f ( s ) ds (cid:21) to compute β ( t ) onthe given interval [0 T ].Condition 1 is equivalent to d (ln f ( t )) /dt > − ν , i.e. , the time series of infection data cannotdecay too fast at any time. This is a mild condition that most data sets satisfy. If a data setdoes not satisfy this condition, we propose a scaling trick in Section III to be able to apply thealgorithm.In Section IV A, we present extensions of the basic recovery algorithm to several popular ex-tensions of the SIR model, including the SEIR model with variable vital rates. Our algorithm canbe extended to virtually any such compartment model. C. Recovering the transmission rate from simulated data

We ﬁrst illustrate the recovery algorithm using two simulated data sets. The functions f ( t ) and g ( t ) are the fractions of the infected population for two characteristic “types” of infectious diseases.The ﬁrst data set simulates an infectious disease with periodic outbreaks, as observed in measles(before mass vaccination) and cholera [5, 30]. The periodic function f ( t ) = 10 − [1 . . t )]represents the continuous infection data, and Figure 1(a) contains plots of both f ( t ) (solid) andits associated transmission rate function β ( t ) (dashed). ecovery algorithm g ( t ) = 10 − [1 . t )] exp( − . t ) representsthe continuous infection data, and Figure 2(a) contains plots of both g ( t ) (solid) and its associatedtransmission rate function β ( t ) (dashed).We extract discrete data from functions f ( t ) and g ( t ) by sampling them at equi-spaced inter-vals (see the small black squares in Figure 1(a) and Figure 2(a)). To each discrete time series, weapply two well-known interpolation algorithms (trigonometric approximation and spline approxi-mation) [27, 39]. Figure 1(b) and Figure 2(b) contain plots of β ( t ) obtained from the two smoothinterpolations together with the recovery algorithm. Both interpolation schemes yield excellentapproximations of β ( t ) in both examples.Many simulations show that the recovery algorithm is robust with respect to white noise up to10% of the data mean, as well as the number of sample points. D. Recovering the transmission rate from UK measles data

Previous studies [14, 24] employed the SEIR model with vital rates to explore the epidemicand endemic behaviors of measles infections, using the notiﬁcation data in [31]. To compare ournew recovery technique with previous measles studies, we extend our recovery algorithm to theSEIR model with variable vital rates (see Section IV A 5) and use the same data set. To examinethe robustness of our new results, we post condition the data to account for underreporting andreapply the extended recovery algorithm.We use the measles parameter values from Anderson and May [1], OPCS et al. [31]: ν =52/year= 52 / /ν is the removal period), a = 52/year= 52 / /a is the latent period), and δ = 1 / / / − − − − − ecovery algorithm / / β ( t ) recovered from our algorithm forfour diﬀerent January initial values chosen to represent a wide range of β (0). The recovered β ( t ) inpanels (e)(g) are stationary, slowly increasing peaks in panel (c), and fast increasing peaks in panel(a). Note that annual minima occur in July-August during the summer school vacation period andthat annual maxima occur in January or September during the ﬁrst month after the winter andsummer school vacations.In Figure 4(b)(d)(f)(h), we plot the moduli of Fourier transform of all recovered β ( t ) and observethat there are two competing dominant spectral peaks. These two dominant peaks have 1 and 1 / i.e. , 1 / β ( t ) is due to the three majorschool terms, which are separated by the Christmas, Easter, and Summer breaks (see Figure 6).For stationary β ( t ) (see panels (e)(g)), the one per year spectral peak is dominant (see panels(f)(h)), and the one half per year (2-year period) spectral peak is comparable to the three per yearspectral peak (see panel (h)). We conclude that for a huge range of β (0), the transmission ratealways possesses both strong 1 and 1 / .

3% to account for the underreporting bias in the UK measles data (with estimated meanreporting rate 52%, note that 92 .

3% is computed from 1 / . −

1) [6, 11, 41]. Since both theremoval stage in the SEIR model and the data notiﬁcation period are one week, and the 92 . ecovery algorithm β ( t ) with the 92 .

3% correction factor and for the largerange of β (0). All the recovered β ( t ) have identical spectral peaks as those in Figure 4. Thus ourobservation of the two spectral peaks with frequencies 1/year and 3/year seems robust. Again, weobserve that the scale of the infection data regulates the scale of β ( t ) but does not aﬀect spectralpeaks of Fourier transform of β ( t ).Most previous measles models represent the time-varying transmission rate using a school term-time forcing function (such as sinusoidal or Haar function), and ﬁtted parameters using the methodof least squares, without providing variances for their estimates [8]. Little empirical evidencecurrently supports their assumed functions. The school mixing assumption behind these simpletransmission functions ignores many other seasonal factors such as environmental changes andimmune system changes [18, 32]. However, our observations support this common belief thatmeasles transmission is predominantly driven by school contacts. III. DISCUSSION

We present a new algorithm to compute the time-dependent transmission rate from prevalencedata, which makes no assumptions about the number of susceptibles or vital rates. We do haveto estimate β (0), which can be a formidable challenge. By manipulating our derivation, Hadelerrecently derived an even simpler inversion formula for β ( t ) that requires knowledge of S (0) insteadof β (0) [20].Our algorithm can be viewed as a continuous version of the well-known discrete method forestimating the time-varying transmission rate. The inverse method in this paper can be applied toderive recovery algorithms for a large spectrum of epidemiological models (see Section IV B). Inthis sense our new method provides a more general method than the discrete method since it canaccount for factors such as the transmission mode and the immunity memory period which canhave a signiﬁcant eﬀect on transmission rate.We illustrate the recovery algorithm for the SEIR model with variable vital rates using UKmeasles data from 1948 to 1956. Fourier transform of our recovered transmission rate functionshows dominant spectral peaks at frequencies 1 and 3 per year. The 3 per year frequency arises fromthree major school breaks. Our observations support the common belief that measles transmission ecovery algorithm f ( t ), can not decrease too fast over the full time interval of interest. In general, one canadd a suﬃciently large constant to f ( t ) to ensure this, but this will change the range of applicable β (0), and applicability needs to be checked. Second, one must assume that the proportion (ornumber) of notiﬁcations is always strictly positive. In practice this restriction can be overcome byreplacing zero values in the time series with a very small positive value. Third, for a chosen β (0),the algorithm can only apply to a ﬁnite length of infection data. Finally, one either needs to knowthe value of the transmission rate at some ﬁxed time, or verify that the desired properties of β ( t )hold for all β (0) in the range where the estimated β ( t ) is stationary.The algorithm should apply to the vast majority of infection data sets, and a consequence isthat one can nearly always construct a time-dependent transmission rate β ( t ) such that SIR modelwill ﬁt the data perfectly. This illustrates a potential danger of overﬁtting an epidemic model withtime-dependent transmission rate.Recently, likelihood-based methods for estimating the time-varying transmission function havebeen developed by Cauchemez and Ferguson [10], He et al. [21] using stochastic transmissionmodels. Our algorithm can also be extended to incorporate stochastic eﬀects by using stochasticdiﬀerential equations. IV. MATERIALS AND METHODSA. Extensions of the basic model

Analogous results and inversion formulae hold for all standard variations of the standard SIRmodel and their combinations. The proofs are very similar to the proof of Theorem (II.1). Here, weonly present the full algorithm for the SEIR model with vital rates, since we apply this algorithmto UK measles data.

1. SIR model with vital rates S ′ ( t ) = δ − β ( t ) S ( t ) I ( t ) − δS ( t ) , (12) I ′ ( t ) = β ( t ) S ( t ) I ( t ) − νI ( t ) − δI ( t ) , (13) R ′ ( t ) = νI ( t ) − δR ( t ) . (14)The necessary and suﬃcient condition for recovering β ( t ) given ν and δ is f ′ ( t ) /f ( t ) > − ( ν + δ ). ecovery algorithm

2. SIR model with waning immunity S ′ ( t ) = mR ( t ) − β ( t ) S ( t ) I ( t ) , (15) I ′ ( t ) = β ( t ) S ( t ) I ( t ) − νI ( t ) , (16) R ′ ( t ) = νI ( t ) − mR ( t ) , (17)where 1 /m is the memory period of immunity. The necessary and suﬃcient condition for recovering β ( t ) given ν is f ′ ( t ) /f ( t ) > − ν .

3. SIR model with time-dependent indirect transmission rate (Joh et al. [23]) S ′ ( t ) = − ω ( t ) S ( t ) , (18) I ′ ( t ) = ω ( t ) S ( t ) − νI ( t ) , (19) R ′ ( t ) = νI ( t ) , (20)where ω ( t ) is the time-dependent indirect transmission rate. The necessary and suﬃcient conditionfor recovering β ( t ) given ν is f ′ ( t ) /f ( t ) > − ν .

4. SEIR model S ′ ( t ) = − β ( t ) S ( t ) I ( t ) , (21) E ′ ( t ) = β ( t ) S ( t ) I ( t ) − αE ( t ) , (22) I ′ ( t ) = αE ( t ) − νI ( t ) , (23) R ′ ( t ) = νI ( t ) , (24)where 1 /α is the latent period for the disease. By simple calculations, we can show that thenecessary and suﬃcient condition for recovering β ( t ) from infection data is f ′ ( t ) /f ( t ) > − ν .

5. SEIR model with vital rates S ′ ( t ) = δ − β ( t ) S ( t ) I ( t ) − δS ( t ) , (25) E ′ ( t ) = β ( t ) S ( t ) I ( t ) − aE ( t ) − δE ( t ) , (26) I ′ ( t ) = aE ( t ) − νI ( t ) − δI ( t ) , (27) R ′ ( t ) = νI ( t ) − δR ( t ) . (28) ecovery algorithm β ( t ) from infection data are f ′ ( t ) + ( ν + δ ) f ( t ) > f ′′ ( t ) + ( ν + 2 δ + a ) f ′ ( t ) + ( δ + a )( ν + δ ) f ( t ) > . (29)In this case, β ( t ) satisﬁes the Bernoulli equation β ′ + p ( t ) β + q ( t ) β = 0 , (30)where p ( t ) = − af ′′′ ( t ) f ( t ) − a ( ν + 2 δ + a ) f ′′ ( t ) f ( t ) − a ( δ + a )( ν + δ ) f ′ ( t ) f ( t ) + af ′′ ( t ) f ′ ( t ) + a ( ν + 2 δ + a ) f ′ ( t ) af ( t )[ f ′′ ( t ) + ( ν + 2 δ + a ) f ′ ( t ) + ( δ + a )( ν + δ ) f ( t )]+ a ( δ + a )( ν + δ ) f ′ ( t ) f ( t ) − δaf ′′ ( t ) f ( t ) − δa ( ν + 2 δ + a ) f ′ ( t ) f ( t ) − δa ( δ + a )( ν + δ ) f ( t ) af ( t )[ f ′′ ( t ) + ( ν + 2 δ + a ) f ′ ( t ) + ( δ + a )( ν + δ ) f ( t )] , and q ( t ) = δa f ( t ) − af ′′ ( t ) f ( t ) − a ( ν + 2 δ + a ) f ′ ( t ) f ( t ) − a ( δ + a )( ν + δ ) f ( t ) af ( t )[ f ′′ ( t ) + ( ν + 2 δ + a ) f ′ ( t ) + ( δ + a )( ν + δ ) f ( t )] . The modiﬁed recovery algorithm has ﬁve steps together with three conditions.

Step 1.

Smoothly interpolate the infection data to generate a smooth function f ( t ) that has atleast a continuous second derivative. Check condition 1: f ′ ( t ) + ( ν + δ ) f ( t ) >

0; and checkcondition 2: f ′′ ( t ) + ( ν + 2 δ + a ) f ′ ( t ) + ( δ + a )( ν + δ ) f ( t ) > Step 2.

Compute the function p ( t ) = − af ′′′ ( t ) f ( t ) − a ( ν +2 δ + a ) f ′′ ( t ) f ( t ) − a ( δ + a )( ν + δ ) f ′ ( t ) f ( t )+ af ′′ ( t ) f ′ ( t )+ a ( ν +2 δ + a ) f ′ ( t ) af ( t )[ f ′′ ( t )+( ν +2 δ + a ) f ′ ( t )+( δ + a )( ν + δ ) f ( t )] + a ( δ + a )( ν + δ ) f ′ ( t ) f ( t ) − δaf ′′ ( t ) f ( t ) − δa ( ν +2 δ + a ) f ′ ( t ) f ( t ) − δa ( δ + a )( ν + δ ) f ( t ) af ( t )[ f ′′ ( t )+( ν +2 δ + a ) f ′ ( t )+( δ + a )( ν + δ ) f ( t )] . Step 3.

Choose β (0) and compute the integral P ( t ) = Z t p ( τ ) dτ . Check condition 3:1 β (0) + Z T e − P ( s ) q ( s ) ds > . Step 4.

Compute the function q ( t ) = δa f ( t ) − af ′′ ( t ) f ( t ) − a ( ν +2 δ + a ) f ′ ( t ) f ( t ) − a ( δ + a )( ν + δ ) f ( t ) af ( t )[ f ′′ ( t )+( ν +2 δ + a ) f ′ ( t )+( δ + a )( ν + δ ) f ( t )] . Step 5.

Apply the formula β ( t ) = 1 (cid:30)(cid:20) e P ( t ) /β (0) + e P ( t ) Z t e − P ( s ) q ( s ) ds (cid:21) to compute β ( t ) onthe given interval [0 T ].With variable birth rate η ( t ) and constant death rate δ , then the SEIR model becomes S ′ ( t ) = η ( t ) − β ( t ) S ( t ) I ( t ) − δS ( t ) , (31) E ′ ( t ) = β ( t ) S ( t ) I ( t ) − aE ( t ) − δE ( t ) , (32) I ′ ( t ) = aE ( t ) − νI ( t ) − δI ( t ) , (33) R ′ ( t ) = νI ( t ) − δR ( t ) . (34) ecovery algorithm q ( t ) = η ( t ) a f ( t ) − af ′′ ( t ) f ( t ) − a ( ν + 2 δ + a ) f ′ ( t ) f ( t ) − a ( δ + a )( ν + δ ) f ( t ) af ( t )[ f ′′ ( t ) + ( ν + 2 δ + a ) f ′ ( t ) + ( δ + a )( ν + δ ) f ( t )] . All other steps in the algorithm remain the same.

Acknowledgments

The UK measles data and birth data was obtained from [22] and [9]. The authors very muchappreciate the eﬀorts of David Earn and Ben Bolker for maintaining these public databases ofinfectious disease data. Thank David Earn for helpful discussion. The authors also thank ananonymous referee for identifying the source of the three times per year frequency component inour approximation of the transmission rate for measles. [1] Anderson RM, May RM (1992) Infectious Diseases of Humans: Dynamics and Control. Oxford Uni-versity Press.[2] Becker NG (1989) Analysis of infectious disease data. Chapman and Hall Ltd, New York.[3] Becker NG, Britton T (1999) Statistical studies of infectious disease incidence. Journal of the RoyalStatistical Society: Series B (Statistical Methodology) 61: 287-307.[4] Beggs CB, Shepherd SJ, Kerr KG (2010) Potential for airborne transmission of infection in the waitingareas of healthcare premises: stochastic analysis using a Monte Carlo model. BMC Infectious Diseases10: 247-254.[5] Sanitary Commissioner for Bengal Reports and Bengal Public Health Reports (1891-1942) BengalSecretariat Press, Calcutta and Bengal Government Press, Alipore.[6] Bjørnstad ON, Finkenst¨adt BF, Grenfell BT (2002) Dynamics of measles epidemics: estimating scalingof transmission rates using a time series SIR model. Ecological Monographs 72: 169-184.[7] Anonymous (1947) Day Nurseries and Industry. British Medical Journal 1: 644-645.[8] Bolker BM, Grenfell BT (1993) Chaos and Biological Complexity in Measles Dynamics. Proc. R. Soc.Lond. B 251: 75-81.[9] Infectious disease data, http://people.biology.uﬂ.edu/bolker/measdata.html.[10] Cauchemez S, Ferguson NM (2008) Likelihood-based estimation of continuous-time epidemic modelsfrom time-series data: application to measles transmission in London. J. R. Soc. Interface 5: 885-897.[11] Clarkson JA, Fine PEM (1985) The Eﬃciency of Measles and Pertussis Notiﬁcation in England andWales. International Journal of Epidemiology 14: 153-168.[12] Code¸co CT (2001) Endemic and epidemic dynamics of cholera: the role of the aquatic reservoir. BMCInfect Dis 1: 1. ecovery algorithm [13] Dowell SF (2001) Seasonal Variation in Host Susceptibility and Cycles of Certain Infectious Diseases.Emerging Infectious Diseases 7.[14] Earn DJD, Rohani P, Bolker BM, Grenfell BT (2000) A Simple Model for Complex Dynamical Tran-sitions in Epidemics. Science 287: 667-670.[15] Ellner SP, Bailey BA, Bobashev GV, Gallant AR, Grenfell BT, Nychka DW (1998) Noise and Nonlinear-ity in Measles Epidemics: Combining Mechanistic and Statistical Approaches to Population Modeling.The American Naturalist 151: 425-440.[16] Fine PEM, Clarkson JA (1982) Measles in England and Wales - I: An Analysis of Factors UnderlyingSeasonal Patterns. International Journal of Epidemiology 11: 5-14.[17] Finkenst¨adt BF, Grenfell BT (2000) Time series modelling of childhood diseases: a dynamical systemsapproach. Journal of the Royal Statistical Society - Series C 49: 187-205.[18] Fujinami RS, Sun X, Howell JM, Jenkin JC, Burns JB (1998) Modulation of Immune System Functionby Measles Virus Infection: Role of Soluble Factor and Direct Infection. Journal of Virology 72: 9421-9427.[19] Grassly NC, Fraser C (2006) Seasonal infectious disease epidemiology. Proc Roy Soc B: BiologicalScience 273: 2541-2550.[20] Hadeler K (2010) Parameter Identiﬁcation in Epidemic Models, manuscript in review.[21] He D, Lonides EL, King AA (2009) Plug-and-play inference for disease dynamics: measles in large andsmall populations as a case study. J. R. Soc. Interface (published online: doi:10.1098/rsif.2009.0151).[22] International Infectious Disease Data Archive, http://iidda.mcmaster.ca/.[23] Joh RI, Wang H, Weiss H, Weitz JS (2009) Dynamics of indirectly transmitted infectious diseases withimmunological threshold. Bulletin of Mathematical Biology 71: 845-862.[24] Keeling MJ, Rohani R (2008) Modeling infectious diseases in humans and animals. Princeton UniversityPress.[25] Keeling MJ, Rohani P, Grenfell BT (2001) Seasonally forced disease dynamics explored as switchingbetween attractors. Physica D 148: 317-335.[26] Kermack WO, McKendrik AG (1927) A contribution to the mathematical theory of epidemics. ProcRoy Soc Lond A 115: 700-721.[27] Kincaid D, Cheney W (2002) Numerical analysis: mathematics of scientiﬁc computing (3rd edition).American Mathematical Society, 788 pages.[28] London WP, Yorke JA (1973) Recurrent outbreaks of measles, chickpox and mumps I. Seasonal variationin contact rates. American Journal of Epidemiology 98: 453-468.[29] Markowitz LE, Preblud SR, Fine PE, Orenstein WA (1990) Duration of live measles vaccineinducedimmunity. Pediatr Infect Dis J 9: 101-110.[30] Mollison D (1995) Epidemic Models: Their Structure and Relation to Data. Cambridge UniversityPress.[31] The weekly OPCS (Oﬃce of Population Censuses and Surveys) reports, the Registrar General’s Quar- ecovery algorithm ecovery algorithm f ( t ) =10 − [1 . . t )]; the dashed curve is β ( t ) recovered from f ( t ) using (10). (b) These transmissionfunctions are estimated using spline and trigonometric interpolations on the 21 data points.Figure 2. (a) We extract 21 equally spaced data points from the oscillatory decaying function g ( t ) = 10 − [1 . t )] exp( − . t ); the dashed curve is β ( t ) recovered from g ( t ) using (10). (b)These transmission functions are estimated using spline and trigonometric interpolations on the21 data points.Figure 3. (a) Aggregated monthly measles data from England and Wales in 1948 − − β ( t ) recovered from our extended algorithm withhistoric birth rates. (a) The recovered β ( t ) with β (0) = 270: fast increasing peaks. (b) Fouriertransform of ﬁltered β ( t ) showing the dominant frequency component, 3 per year. (c) The recovered β ( t ) with β (0) = 230: slowly increasing peaks. (d) Fourier transform of ﬁltered β ( t ) showingtwo comparable dominant frequencies components, 1 and 3 per year. (e) The recovered β ( t )with β (0) = 140: stationary peaks. (f) Fourier transform of ﬁltered β ( t ) showing the dominantfrequency component, 1 per year. (g) The recovered β ( t ) with β (0) = 80: stationary peaks. (h)Fourier transform of ﬁltered β ( t ) showing the dominant frequency component, 1 per year; 1 / β ( t ) with data correction. The moduli of Fourier transformof β ( t ) with the 92 .

3% correction factor have identical spectral peaks as those without data correc-tion in Figure 4. (a) The recovered β ( t ) with β (0) = 120. (c) The recovered β ( t ) with β (0) = 100.(e) The recovered β ( t ) with β (0) = 60. (g) The recovered β ( t ) with β (0) = 30. Panels (b)(d)(f)(h)plot moduli of Fourier transform of corresponding β ( t ) in (a)(c)(e)(g), respectively.Figure 6. The transmission rate β ( t ) is generated from Haar function with major school holidaysvia aggregation. The modulus of Fourier transform of β ( t ) generated from the widely used periodic ecovery algorithm ecovery algorithm Time T r an s m i ss i on f un c t i on , β ( t ) ν =5 −5 D a t a f o r f r a c t i on o f i n f e c t ed popu l a t i on , I ( t ) = f ( t ) β ( ) = (a) Β H t L SplineTrigonometric (b)

FIG. 1: ecovery algorithm Time T r an s m i ss i on f un c t i on , β ( t ) ν =5 −5 D a t a f o r f r a c t i on o f i n f e c t ed popu l a t i on , I ( t ) = f ( t ) β ( ) = (a) Β H t L SplineTrigonometric (b)

FIG. 2: ecovery algorithm −3 Time (year) F r a c t i on o f i n f e c t ed popu l a t i on ( a ft e r c o rr e c t i on ) (a) N o r m a li z ed S pe c t r u m Modulus of Fourier transform of monthly data (b) o f b i r t h s i n U K Time (year) (c) N o r m a li z ed S pe c t r u m Modulus of Fourier transform of historical birth data (d)

FIG. 3: ecovery algorithm Β (0)=270 with no data correctionJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (a) S pe c t r u m Modulus of Fourier transform of β (t) Max (b) Β (0)=230 with no data correctionJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (c) S pe c t r u m Modulus of Fourier transform of β (t) Max Max (d) Β (0)=140 with no data correctionJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (e) S pe c t r u m Modulus of Fourier transform of β (t) Max (f) Β (0)=80 with no data correctionJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (g) S pe c t r u m Modulus of Fourier transform of β (t) Max 2−year period peak is large and comparable to 1/3−year period peak (h)

FIG. 4: ecovery algorithm Β (0)=120 with 92.3% data correction factorJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (a) S pe c t r u m Modulus of Fourier transform of β (t) Max (b) Β (0)=100 with 92.3% data correction factorJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (c) S pe c t r u m Modulus of Fourier transform of β (t) Max Max (d) Β (0)=60 with 92.3% data correction factorJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (e) S pe c t r u m Modulus of Fourier transform of β (t) Max (f) Β (0)=30 with 92.3% data correction factorJan 1948 Jan 1950 Jan 1952 Jan 1954 Jan 1956 Time H year L Β H t L (g) S pe c t r u m Modulus of Fourier transform of β (t) Max 2−year period peak is large and comparable to 1/3−year period peak (h)

FIG. 5: ecovery algorithm S pe c t r u m Modulus of Fourier transform of β (t) generated fromperiodic Haar function with major school terms(t) generated fromperiodic Haar function with major school terms