Stochastic Modeling of an Infectious Disease Part II: Simulation Experiments and Verification of the Analysis
SStochastic Modeling of an Infectious DiseasePart II: Simulation Experiments and Verification of the Analysis. ∗ Hisashi Kobayashi † Dept. of Electrical EngineeringPrinceton UniversityPrinceton, NJ 08544, U.S.A.January 25, 2021
Abstract
In [2], we introduced a stochastic model of an infectious disease, based on the birth-and-death-with-immigration (BDI) process. The model can capture the essence of dynamics of an infection process.The most significant finding is that the time-dependent (or transient) probability distribution ofthe infected population size is negative binomial distributed at all times with a very long taildistribution. Consequently, an epidemic pattern exhibits a much larger variation than has beenconsidered by most modeling experts, including the epidemiology community.In this report, Part II, simulation runs and their interpretations are presented to support theabove finding. The sample paths of different simulation runs exhibit indeed enormous variations,which confirm our analysis.The epidemic pattern that a given environment (e.g., city or country) experiences is merely one sample path out of infinitely many possible paths. The enormous variations we observe in simulationruns explain why some cities and countries experience much fewer infections and casualties thanothers. The size of the infected or dead populations could differ by factor of 100 or more even underidentical statistical conditions.There are two important implications of our findings. First, it would be a futile effort to attemptto identify all possible causes or reasons why environments of similar situations differ so much interms of epidemic patterns and the number of casualties. Our stochastic model suggests that “luck”or “chances” play a more significant role than most of us would be led to believe. Second, we shouldbe prepared for a worst possible scenario, which only a stochastic model can provide with someprobabilistic qualification, rather than preparing for the future, based on a single prediction curvethat a deterministic model can provide.For all these probabilistic arguments, however, our analysis also shows that the infection processwill immediately start declining, once the exponential parameter a = λ − µ should turn negative. ∗ This article was originally published on August 5, 2020 in the authored HP [1]. Additional simulation re-sults were reported in the keynote at ITC-32. For the slides and video, see https://hp.hisashikobayashi.com/a-stochastic-model-of-an-infectious-disease/ . † The Sherman Fairchild University Professor of Electrical Engineering and Computer Science, Emeritus, Email:[email protected], Home page: https.hp.HisashiKobayashi.com
Wipipedia: https://en.wikipedia.org/wiki/Hisashi_Kobayashi a r X i v : . [ q - b i o . P E ] J a n his result will be discussed in full, by analysis [3] and simulation [4], where we assume the infectionrate parameter λ and the recovery (which includes removal and death) parameter µ are arbitraryfunctions of time. Keywords:
Stochastic vs. deterministic models, Birth-and-death process with immigration (BDI),Event-driven simulation, Random number generator, Sample paths of the processes I ( t ), R ( t ) and R ( t ), The death process D ( t ), Probability generating function, Time-dependent PMF, Daily newinfections, Percentile curves, Negative binomial distribution, Probability distribution with a longtail, Large coefficient of variation, Branching process, Disparity or large variation among differ-ent sample paths, Analogy to disparity in wealth distribution, Statistical fluctuation in the initialphase, Law of large numbers. Contents I ( t ), the number of infections at time t . . . . . . . 92.3.2 Simulation of the external arrivals A ( t ) . . . . . . . . . . . . . . . . . . . . . 102.3.3 Simulation of the processes B ( t ) and R ( t ) . . . . . . . . . . . . . . . . . . . . 112.4 Other important statistical data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.4.1 Number of daily new infections . . . . . . . . . . . . . . . . . . . . . . . . . . 132.4.2 Cumulative number of deaths . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 I ( t ) with the Initial Condition I ≥ I = 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.1.2 An efficient computation of the PMF of I ( t ) with I = 1 . . . . . . . . . . . . 183.2 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Our model formulation began with the set of linear differential-difference equations (Part I, page8, replicated below in (4) and (5) ) for the probability mass function (PMF) P n ( t ) (cid:44) Pr[ I ( t ) = n ] , n = 0 , , , · · · , t ≥ , (1)where I ( t ) is the number of infected, hence infectious persons , at time t . We can express I ( t ) as I ( t ) = I + A ( t ) + B ( t ) − R ( t ) , (2)Copyright ©©
Stochastic vs. deterministic models, Birth-and-death process with immigration (BDI),Event-driven simulation, Random number generator, Sample paths of the processes I ( t ), R ( t ) and R ( t ), The death process D ( t ), Probability generating function, Time-dependent PMF, Daily newinfections, Percentile curves, Negative binomial distribution, Probability distribution with a longtail, Large coefficient of variation, Branching process, Disparity or large variation among differ-ent sample paths, Analogy to disparity in wealth distribution, Statistical fluctuation in the initialphase, Law of large numbers. Contents I ( t ), the number of infections at time t . . . . . . . 92.3.2 Simulation of the external arrivals A ( t ) . . . . . . . . . . . . . . . . . . . . . 102.3.3 Simulation of the processes B ( t ) and R ( t ) . . . . . . . . . . . . . . . . . . . . 112.4 Other important statistical data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.4.1 Number of daily new infections . . . . . . . . . . . . . . . . . . . . . . . . . . 132.4.2 Cumulative number of deaths . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 I ( t ) with the Initial Condition I ≥ I = 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.1.2 An efficient computation of the PMF of I ( t ) with I = 1 . . . . . . . . . . . . 183.2 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Our model formulation began with the set of linear differential-difference equations (Part I, page8, replicated below in (4) and (5) ) for the probability mass function (PMF) P n ( t ) (cid:44) Pr[ I ( t ) = n ] , n = 0 , , , · · · , t ≥ , (1)where I ( t ) is the number of infected, hence infectious persons , at time t . We can express I ( t ) as I ( t ) = I + A ( t ) + B ( t ) − R ( t ) , (2)Copyright ©© Hisashi Kobayashi 2021 2here I is the initial value: I (0) = I , (3)and1. A ( t ) is the cumulative count of infected arrivals from the outside. We assume the arrivalpattern is completely random, i.e., a Poisson process with rate ν [persons/day].2. B ( t ) is the cumulative count of the infections that occur in the interval [0 , t ]. An infectionoccurs at the rate λ [person/day/infectious person].3. R ( t ) is the cumulative count of the infected persons who recover, are removed or die in [0 , t ].This event occurs at the rate µ [persons/day/infected person].The P n ( t ) , n = 0 , , , · · · should satisfy the following set of differential equations: dP ( t ) dt = − νP ( t ) + µP ( t ) , (4) dP n ( t ) dt = [( n − λ + ν ] P n − ( t ) − [ n ( λ + µ ) + ν ] P n ( t ) + ( n + 1) µP n +1 ( t ) , n = 1 , , . · · · , (5)with the initial condition (3).In order to find solve the above differential equations, we use the probability generating function (PGF) defined by G ( z, t ) (cid:44) E [ z I ( t ) ] = ∞ (cid:88) n =0 P n ( t ) z n , (6)The set of differential equations (4) and (5) are then transformed into one partial differentialequation (PDE): ∂G ( z, t ) ∂t = ( z − (cid:20) ( λz − µ ) ∂G ( z, t ) ∂z + νG ( z, t ) (cid:21) , (7)with the condition G ( z,
0) = z I . (8)We apply Lagrange’s method to the above PDE, obtaining G ( z, t ) = (cid:18) aλz − µ − λ ( z − e at (cid:19) r (cid:18) λz − µ − µ ( z − e at λz − µ − λ ( z − e at (cid:19) I , (9)where a (cid:44) λ − µ, and r (cid:44) νλ . (10)If I = 0, (9) reduces to G ( z, t ) = (cid:18) aλz − µ − λ ( z − e at (cid:19) r = (cid:18) − β ( t )1 − β ( t ) z (cid:19) r , (11)Copyright ©©
0) = z I . (8)We apply Lagrange’s method to the above PDE, obtaining G ( z, t ) = (cid:18) aλz − µ − λ ( z − e at (cid:19) r (cid:18) λz − µ − µ ( z − e at λz − µ − λ ( z − e at (cid:19) I , (9)where a (cid:44) λ − µ, and r (cid:44) νλ . (10)If I = 0, (9) reduces to G ( z, t ) = (cid:18) aλz − µ − λ ( z − e at (cid:19) r = (cid:18) − β ( t )1 − β ( t ) z (cid:19) r , (11)Copyright ©© Hisashi Kobayashi 2021 3here β ( t ) (cid:44) λ ( e at − λe at − µ . (12)To obtain E [ I ( t )] = I ( t ), we use the formula E [ I ( t )] = ∂G ( z,t ) ∂z (cid:12)(cid:12)(cid:12) z =1 : I ( t ) = rβ ( t )1 − β ( t ) = νa ( e at − . (13)It would be easy to observe that • If a > I ( t ) grows exponentially to infinity, as t → ∞ ; • If a <
0, it converges to − νa > • If a = 0, I ( t ) = νt for all t ≥ T [days] be the number of days that is required for I ( t ) to double. Then from (13), we find e aT = 2 , or aT = ln 2 = 0 . . (14)If a = 0 . − ] (as in the running example of Part I and the present paper), I ( t ) doubles inevery T ≈ . In order to obtain the PMFs P n ( t ), by referring to the definition of the PGF (6), we differentiate G ( z, t ) (11) w.r.t. z , set z = 0, divide the resulting expression by n !. That is, P ( t ) = G (0 , t ) = (1 − β ( t )) r P ( t ) = ∂G ( z, t ) ∂z (cid:12)(cid:12)(cid:12)(cid:12) z =0 = (1 − β ( t )) r rβ ( t ) P ( t ) = 12! ∂ G ( z, t ) ∂z (cid:12)(cid:12)(cid:12)(cid:12) z =0 = (1 − β ( t )) r ( r + 1) r β ( t ) ... P n ( t ) = 1 n ! ∂ n G ( z, t ) ∂z n (cid:12)(cid:12)(cid:12)(cid:12) z =0 = (1 − β ( t )) r ( n + r − · · · ( r + 1) rn ! β ( t ) n , (15)which we can write as P n ( t ) = K ( n, r )(1 − β ( t )) r β ( t ) n , n = 0 , , , · · · , (16)where K ( n, r ) is the binomial coefficient: K ( n, r ) = ( n + r − n + r − · · · ( r + 1) rn ! (cid:44) (cid:18) n + r − n (cid:19) . (17) When the exponential growth rate in the environment of your interest differs from our example, you can stillinterpret our simulation results meaningfully. For example if I ( t ) double every week in the environment of yourinterest, you can scale the time axis by factor of two, i.e., t = 0 , , · · · ,
50 in our simulation plots, you assigndifferent time scale, t (cid:48) = 0 , , , · · · , Copyright ©©
50 in our simulation plots, you assigndifferent time scale, t (cid:48) = 0 , , , · · · , Copyright ©© Hisashi Kobayashi 2021 4he above distribution P n ( t ) takes the form of a Pascal distribution , or negative binomial distri-bution (NBD), often denoted as NB( k, q ), which is the probability of the number of failures n inBernoulli trials needed to achieve k successes, where q is the probability of failure per trial P NBD n = K ( n, k )(1 − q ) k q n , n = 0 , , , · · · . (18)The PGF of this NBD is given by G NBD ( z ) (cid:44) ∞ (cid:88) n =0 P NBD n z n = (cid:18) − q − qz (cid:19) k . (19)Thus, by equating k = r (cid:44) νλ and q = β ( t ), we see from (11) that the BDI process I ( t ) is negativebinomial distributed according to NB( r, β ( t )) at given t .We note the following properties of K ( n, r ) (17) and the NBD (16):i K (0 , r ) = 1, thus P ( t ) = (1 − β ( t )) r for all t ≥ K (1 , r ) = r , thus P ( t ) = r (1 − β ( t )) r β ( t ). Therefore, P ( t ) = rβ ( t ) P ( t ).iii K ( n,
1) = 1 for all n . Then the PMF reduces to a geometric distribution (1 − q ) q n with q = β ( t )iv K ( n,
2) = n + 1 for all n ;v If r <
1, then K ( n, r ) < n ≥
0. From Property ii and (16), we find that for each t , thePMF P n ( t ) is a monotone decreasing function of n , i.e., P n ( t ) ∝ β ( t ) n , where β ( t ) approaches1 from below as t becomes large.vi If 1 < r <
2, then K ( n, r ) is a monotone increasing concave (i.e., convex cap) function of n ,bounded by 1 < K ( n, r ) < n + 1.vii If r >
2, then K ( n, r ) is a monotone increasing convex function of n , bounded from below K ( n, r ) > n + 1.viii By using Stirling’s approximation for factorials and the Gamma function,Γ( n + 1) = n ! ≈ √ πn (cid:16) ne (cid:17) n , for n > , (20)we can approximate K ( n, r ), for n > r >
1, by K ( n, r ) ≈ (cid:114) π ( n + r − n + r − ( r − r − n n . (21)Thus, if r is small, e.g., r ≈
1, the distribution of (16), for any given t , slowly decreases towardszero as n increases. Thus, the distribution has a long tail (see Figures 7-12 of Part I). Consequently,different sample paths of I ( t ) are expected to exhibit enormous disparities, as will be shown in thenext section, where we present simulation results, which demonstrate huge spreads across differentsample paths.It goes without saying that the most decisive factor that shapes the epidemic pattern is theexponential parameter a = λ − µ . However, for a given a , different simulation runs show enormousCopyright ©©
1, the distribution of (16), for any given t , slowly decreases towardszero as n increases. Thus, the distribution has a long tail (see Figures 7-12 of Part I). Consequently,different sample paths of I ( t ) are expected to exhibit enormous disparities, as will be shown in thenext section, where we present simulation results, which demonstrate huge spreads across differentsample paths.It goes without saying that the most decisive factor that shapes the epidemic pattern is theexponential parameter a = λ − µ . However, for a given a , different simulation runs show enormousCopyright ©© Hisashi Kobayashi 2021 5ifferences in their infectious patterns. This wild random behavior is perhaps well beyond whatmost epidemiologists are cognizant of. In other words, we should accept that we won’t be able tofind out all factors that can explain why some cities or countries are experiencing more hardshipthan others in terms of the number of infections and death tolls.This also implies that Japan and other countries that have observed a relatively small numberof infections and casualties should be aware that they may not be as fortunate in new waves ofthe pandemic. They should be well prepared for the worst case scenario in order to protect theircitizens.In order to be able to provide specifically what will be the worst possible scenario, we need tocome up with an accurate estimates of the model parameters λ, ν and ν , by carefully analyzingreal and reliable data, and make the most likely estimates (or ranges of estimate), which will bethe main focus in Part IV [5] of our report. Simulations are often used when analytic techniques to estimate or predict the performance of acomplex system are hard to come by. So-called
Monte-Carlo simulation techniques, such as variancereduction techniques , are often adopted to estimate a given performance measure accurately andefficiently. The main purpose of our simulation experiments here, however, is different from thesesituations in that we fortunately have obtained an exact analytic result of the system, i.e., we havefound closed-form expressions for the PMFs P n ( t )’s by solving a partial differential equation.We present here the results of various simulation runs primarily to better explain and demon-strate the validity and utility of our stochastic model. Plots of various sample paths of our simula-tion experiments should serve as a convincing and understandable evidence to support our model. We shall briefly describe how our simulator is designed for the benefit of some readers who maynot be sufficiently familiar with probabilistic simulation of a Markov process such as the birth-death-immigration process. We adopt the time-asynchronous approach as opposed to the time-synchronous approach, which may be more suited to simulating events that occur periodically atpredetermined moments, such as seen in time-synchronous communication systems. The time-asynchronous approach is also referred to as the event-scheduling approach.In our model, there are three types of events; (A) an arrival of an infected person from theoutside; (B) an infection caused by an infectious person; and (R) a recovery/removal/death of aninfected person.The random process I ( t ) defined by the differential equations (4) and (5) is a Markov process.At an arbitrarily given time t , when I ( t ) = n , the time interval until the next event is a randomvariable (RV) X with a negative exponential distribution of mean 1 /γ ( n ), where γ ( n ) = n ( λ + µ ) + ν. (22)That is, the distribution function of the RV X is F X ( x ) (cid:44) Pr[ X ≤ x ] = 1 − e − γ ( n ) x . (23) As for detailed discussion of simulation techniques, see e.g., [6], Chapter 4, [7], Chapter 16. Note that [ n ( λ + µ ) + ν ] is the coefficient of the P n ( t ) term in the RHS of differential equation of P n ( t ) (5). Copyright ©©
Monte-Carlo simulation techniques, such as variancereduction techniques , are often adopted to estimate a given performance measure accurately andefficiently. The main purpose of our simulation experiments here, however, is different from thesesituations in that we fortunately have obtained an exact analytic result of the system, i.e., we havefound closed-form expressions for the PMFs P n ( t )’s by solving a partial differential equation.We present here the results of various simulation runs primarily to better explain and demon-strate the validity and utility of our stochastic model. Plots of various sample paths of our simula-tion experiments should serve as a convincing and understandable evidence to support our model. We shall briefly describe how our simulator is designed for the benefit of some readers who maynot be sufficiently familiar with probabilistic simulation of a Markov process such as the birth-death-immigration process. We adopt the time-asynchronous approach as opposed to the time-synchronous approach, which may be more suited to simulating events that occur periodically atpredetermined moments, such as seen in time-synchronous communication systems. The time-asynchronous approach is also referred to as the event-scheduling approach.In our model, there are three types of events; (A) an arrival of an infected person from theoutside; (B) an infection caused by an infectious person; and (R) a recovery/removal/death of aninfected person.The random process I ( t ) defined by the differential equations (4) and (5) is a Markov process.At an arbitrarily given time t , when I ( t ) = n , the time interval until the next event is a randomvariable (RV) X with a negative exponential distribution of mean 1 /γ ( n ), where γ ( n ) = n ( λ + µ ) + ν. (22)That is, the distribution function of the RV X is F X ( x ) (cid:44) Pr[ X ≤ x ] = 1 − e − γ ( n ) x . (23) As for detailed discussion of simulation techniques, see e.g., [6], Chapter 4, [7], Chapter 16. Note that [ n ( λ + µ ) + ν ] is the coefficient of the P n ( t ) term in the RHS of differential equation of P n ( t ) (5). Copyright ©© Hisashi Kobayashi 2021 6eneration of instances of the random variable X can be done by calling the random numbergenerator (RNG) that is available in almost all programming languages, including MATLAB. Itwill generate an instances u of the random variable U , which is uniformly distributed between 0and 1. Then we transform u by the function F − X ( · ), i.e., find x such that u = F X ( x ) = 1 − e − γ ( n ) x .In other words, x = log(1 − u ) /γ ( n ). Since u is uniformly distributed between 0 and 1, so is 1 − u ,thus x (cid:48) (cid:44) log u/γ ( n ) can be used instead of x ,At the occurrence of an event, we classify it as one of the three types of events as follows:choose type-(A) with probability P A (cid:44) ν/γ ( n ); choose type-(B) with probability P B = nλ/γ ( n );and choose type-(R) with probability P R = 1 − P A − P B = nµ/γ ( n ). Note that if n = 0, then P A = 1, and P B = P R − memoryless property , (b) the reproductiveadditivity , and (c) the decomposition property. Thus, the core of our simulation program is as follows.1. If I ( t ) = 0, then γ (0) = ν , hence, the only possible event to consider is a type-(A) event.The time until the next arrival, x , is determined by x = log u/ν , where u ∈ [0 ,
1] is an outputdrawn from the RNG. Advance the clock time to t (cid:48) (cid:44) t + x , and increment A ( t ) by 1 for t ≥ t (cid:48) .2. If I ( t ) = n ≥
1, the next event can be any of the three types. The time-interval until the nextevent is determined by x = log u/γ ( n )T, where u ∈ [0 ,
1] as defined above. Take another RNGoutput u (cid:48) ∈ [0 , ≤ u (cid:48) ≤ P A : classify it as a type-(B)event, if P A < u (cid:48) ≤ P A + P B ; and classify it as type-(R), if P A + P B < u (cid:48) ≤ t (cid:48) (cid:44) t + x , and increment the counter A ( t ) , B ( t ) or R ( t ) by one dependingon the classified result is type (A), (B) or (R).3. Go back to step 1 or 2, depending I ( t (cid:48) ) = 0 or I ( t (cid:48) ) ≥
1, and use t (cid:48) as a new clock time, andrepeat the above steps. As we discussed in Part I, the ratio of the standard deviation of a given probability distribution toits mean, called the coefficient of variation (CV), sometimes serve as a simple and good measure The random number generator used today in MATLAB, Python, Pokemon, and others is what is known asMersenne Twisted GFSR (general feedback shift register) sequence, developed by Makoto Matsumoto and TakujiNishimoto (see “Mersenne Twister: a 623-dimensionally equidistributed uniform pseudo-random number generator,”
ACM Transactions on Modelling and Computer Simulation , January 1998). This algorithm generates a pseudo-random sequence of the period 2 −
1, and the implementation is abbreviated as MT19937. It uses a Mersenneprime number M , of the form M = 2 n −
1, for some integer n . Sometimes n is restricted to a prime number, as well. The property (a) has to do with the memoryless property of the variable X which has an exponential distribution.Assume that Y time units have elapsed since the last arrival. The time until the next arrival R (cid:44) X − Y has the sameexponential distribution as X , regardless of Y . The property (b) means that when m independent Poisson processeswith rate λ k , k = 1 , , · · · , m are merged, the resulting stream is another Poisson process with rate (cid:80) mk =1 λ k . Theproperty (c) is an opposite of the property (b). If a Poisson process with rate λ is split into m sub-streams, byassigning each arrival independently into the k th sub-stream with probability p k , where (cid:80) mk =1 p k = 1. Then thesub-streams are independent Poisson processes with rate p k λ ’s. For proofs of these properties, see e.g. [8], pp.403-405). Copyright ©©
1, for some integer n . Sometimes n is restricted to a prime number, as well. The property (a) has to do with the memoryless property of the variable X which has an exponential distribution.Assume that Y time units have elapsed since the last arrival. The time until the next arrival R (cid:44) X − Y has the sameexponential distribution as X , regardless of Y . The property (b) means that when m independent Poisson processeswith rate λ k , k = 1 , , · · · , m are merged, the resulting stream is another Poisson process with rate (cid:80) mk =1 λ k . Theproperty (c) is an opposite of the property (b). If a Poisson process with rate λ is split into m sub-streams, byassigning each arrival independently into the k th sub-stream with probability p k , where (cid:80) mk =1 p k = 1. Then thesub-streams are independent Poisson processes with rate p k λ ’s. For proofs of these properties, see e.g. [8], pp.403-405). Copyright ©© Hisashi Kobayashi 2021 7 t n Percentile curves of I(t) with I(0)=0
Figure 1:
Percentile curves of P n ( t ) of I ( t ) for ≤ t ≤ , λ = 0 . , µ = 0 . , ν = 0 . . t -1012345 l og10 ( I ( t )) Percentile curves of I(t) with I(0)=0
Figure 2:
Semi-log plot of the percentile curvesfor ≤ t ≤ , λ = 0 . , µ = 0 . , ν = 0 . . of the dispersion of the distribution. We obtained the CV of the distribution of I ( t ) in the BDImodel, as given in (74) in Part I.If the distribution of a random variable X is close to a normal (or Gaussian) distribution, it iswell known that realizations of X fall within the mean ± σ X with the probability 68.2%. Similarly,the mean ± σ X provides a 95.4% confidence level, and the mean ± σ X gives 98% confidence level.But this rule of thumb does not apply to the NB( r, q ) with small r , because the distribution ishighly skewed, far from being symmetrical around the mean. A more accurate and reliable way isto compare simulation plots with the percentile curves of the NB( r, q ). To that end, we calculatethe cumulative distribution function (CDF) by F BDI ( x, t ) (cid:44) Pr[ N ( t ) ≤ x ] = (cid:98) x (cid:99) (cid:88) n =0 P n ( t ) , (24)where P n ( t ) is the PMF of (16) and (cid:98) x (cid:99) is the largest integer, not exceeding x . Figure 1 shows thecurves where the CDF take 0.975, and 0.025 (in red dash curves), 0.84 and 0.16 (in green dash)and 0.5 (in blue dash). The stochastic mean E [ I ( t )] (cid:44) I ( t ) is shown in black solid curve.We can expect that if we conduct many simulation experiments, about 68% of the sample pathswill fall within the region between the green dashed curves, and about 95% of the sample pathsshould fall within the range given by the two red dashed curves. Roughly one half of the simulationcurves should be above the blue dashed curve and the other half should be below this curve. Notethat the stochastic mean curve is appreciably above the median (50%) curve: nearly by a factor oftwo.Note that the top curve (the upper half of the 95% confidence interval) climbs up to as large as100,000 by the 50th day, while it is only 1,800 on the 30th day. This is the power of the exponentialgrowth, with which we are all familiar now by observing how rapidly the COVID-19’s infectionshave grown in many parts of the world.Copyright ©©
Semi-log plot of the percentile curvesfor ≤ t ≤ , λ = 0 . , µ = 0 . , ν = 0 . . of the dispersion of the distribution. We obtained the CV of the distribution of I ( t ) in the BDImodel, as given in (74) in Part I.If the distribution of a random variable X is close to a normal (or Gaussian) distribution, it iswell known that realizations of X fall within the mean ± σ X with the probability 68.2%. Similarly,the mean ± σ X provides a 95.4% confidence level, and the mean ± σ X gives 98% confidence level.But this rule of thumb does not apply to the NB( r, q ) with small r , because the distribution ishighly skewed, far from being symmetrical around the mean. A more accurate and reliable way isto compare simulation plots with the percentile curves of the NB( r, q ). To that end, we calculatethe cumulative distribution function (CDF) by F BDI ( x, t ) (cid:44) Pr[ N ( t ) ≤ x ] = (cid:98) x (cid:99) (cid:88) n =0 P n ( t ) , (24)where P n ( t ) is the PMF of (16) and (cid:98) x (cid:99) is the largest integer, not exceeding x . Figure 1 shows thecurves where the CDF take 0.975, and 0.025 (in red dash curves), 0.84 and 0.16 (in green dash)and 0.5 (in blue dash). The stochastic mean E [ I ( t )] (cid:44) I ( t ) is shown in black solid curve.We can expect that if we conduct many simulation experiments, about 68% of the sample pathswill fall within the region between the green dashed curves, and about 95% of the sample pathsshould fall within the range given by the two red dashed curves. Roughly one half of the simulationcurves should be above the blue dashed curve and the other half should be below this curve. Notethat the stochastic mean curve is appreciably above the median (50%) curve: nearly by a factor oftwo.Note that the top curve (the upper half of the 95% confidence interval) climbs up to as large as100,000 by the 50th day, while it is only 1,800 on the 30th day. This is the power of the exponentialgrowth, with which we are all familiar now by observing how rapidly the COVID-19’s infectionshave grown in many parts of the world.Copyright ©© Hisashi Kobayashi 2021 8 t I ( t ) I(t) with I(0)=0: Runs 1 through 6
Run 12345697.5%84%E[I(t)]50%16%2.5%
Figure 3:
Simulated I ( t ) process, Runs 1-6; λ =0 . , µ = 0 . , ν = 0 . . t I ( t ) log I(t) with I(0)=0; Runs 1 through 6 Run 12345697.5%84%E[I(t)]50%16%2.5%
Figure 4:
Semi-log plots of simulated I ( t ) process,Runs 1-6; λ = 0 . , µ = 0 . , ν = 0 . . t I ( t ) I(t) with I(0), Runs 1-6
Run 12345697.5%84%E[I(t)]50%16%2.5%
Figure 5:
Plots of simulated I ( t ) process in thefirst 25 days, Runs 1-6; λ = 0 . , µ = 0 . , ν = 0 . . t I ( t ) I(t) with I(0), Runs 1-6
Run 12345697.5%84%E[I(t)]50%16%2.5%
Figure 6:
Semi-log plots of the I ( t ) process, Runs1-6; λ = 0 . , µ = 0 . , ν = 0 . . I ( t ) , the number of infections at time t We present the results of six consecutive simulation runs (in solid curves) in Figure 3, where wesuperimpose the percentile curves (in dashed curves) obtained in the previous section. The expectedvalue I ( t ) = E [ I ( t )] of (13) is shown in a black dashed curve that ends at t = 500 with the value . . (cid:0) e . ∗ − (cid:1) = e − ≈ . × . Three runs (Runs 4, 1 and 3) are clearly above I ( t ), andthree runs (Runs 6, 5 and 2) lie below I ( t ), although Run 6 (in light blue) is very close to I ( t ).This may not necessarily be a typical situation, because one half of the simulation runs, on average,should be above the “median curve”, i.e., 50 percentile curve (shown in blue dash). Five of the sixruns are within the 68% confidence interval, (between the two green dashed curves).The most important observation to make here is the enormous disparity among the six simula-tions. Run 4 (in cyan) has as many as 1 . × infected people at t = 500 [days], whereas RunCopyright ©©
Semi-log plots of the I ( t ) process, Runs1-6; λ = 0 . , µ = 0 . , ν = 0 . . I ( t ) , the number of infections at time t We present the results of six consecutive simulation runs (in solid curves) in Figure 3, where wesuperimpose the percentile curves (in dashed curves) obtained in the previous section. The expectedvalue I ( t ) = E [ I ( t )] of (13) is shown in a black dashed curve that ends at t = 500 with the value . . (cid:0) e . ∗ − (cid:1) = e − ≈ . × . Three runs (Runs 4, 1 and 3) are clearly above I ( t ), andthree runs (Runs 6, 5 and 2) lie below I ( t ), although Run 6 (in light blue) is very close to I ( t ).This may not necessarily be a typical situation, because one half of the simulation runs, on average,should be above the “median curve”, i.e., 50 percentile curve (shown in blue dash). Five of the sixruns are within the 68% confidence interval, (between the two green dashed curves).The most important observation to make here is the enormous disparity among the six simula-tions. Run 4 (in cyan) has as many as 1 . × infected people at t = 500 [days], whereas RunCopyright ©© Hisashi Kobayashi 2021 9 (in red) has only ≈ × . Their ratio, therefore, is more than a factor of 40. If we run thesimulator many times, for instance, the ratio of the numbers of the infected of the worst (e.g.,97%level) vs. the luckiest (e.g., 2.5% level) cases can be as large as 1,000 (see [9], slide t ) of these simulation runsof the stochastic process I ( t ). From the plots in the semi-log scale, we can see that once I ( t ) hasreached the level of ≈ . This is because once the I ( t ) has reached ≈ I ( t ) is the sum of many statistically independent and identicalinfection processes, and exhibits a more stable and predictable behavior than in the initial phase.In other words, the large variance among different sample paths of the BDI process is due to moreerratic and unpredictable behavior in the early phase of the stochastic process I ( t ). In the initialphase the random arrivals of the infected from outside, as well as the fluctuations in the internalinfection (a branching process) and the recovery/death, all contribute to the stochastic behavior of I ( t ) because I ( t ) is still small. As I ( t ) grows the effect of external arrivals will become negligibleas long as r = νλ is small, i.e., the order of unity or less.As remarked in the present author’s keynote speech [9], it may be worth noting that the BDI pro-cess model can be also used to explain the enormous disparity we observe in the wealth among differ-ent individuals of similar income levels, similar expenditure, and similar intelligence and knowledgein investment, thus they have a more or less similar investment portfolio in their initial phase. After30 or 40 years after beginning their careers after schooling, some may become multi-millionaires (oreven billionaires), whereas some end up with a life of modest means, if not living hand to mouth.The present author believes that this type of disparity in wealth distribution can be explained byapplying the BDI process model. . A ( t )Figure 8 show the arrival patterns. The Poisson distribution of mean λ has the variance equal tothe mean: σ = λ ; and for large λ , its CDF (cumulative distribution function) converges to that ofthe normal (or Gaussian distribution); Thus, 68 percent and 95 percent confidence levels could bewell approximated by the λ ± √ λ , and the λ ± √ λ , respectively. log e at = log e × . t = 0 . t . In this analogy between the infection process and the individual’s wealth growth, we interpret the infection rate λ as the ROI (return on investment), and the recovery/death rate µ as the rate of expenditures. The rate of arrivalsof the infected from outside ν can be translated into the amount of additional money that can be put into investment.The parameter a = λ − µ is equivalent to the net growth rate of the wealth. Given that all these parameters arealmost identical among different individuals, the disparity in their wealth growth depends, to a large extent, on theirluckiness or unluckiness in the early phase of their investments, because the growth or decrease of wealth will bedictated by randomness in success or failure of investment, unexpected large expenditure or loss, and the availabilityof new fund for investment. Once the wealth exceeds a certain level, for instance, a million dollars, the future growthis more or less predictable from their investment strategy and expenditure because failures in some investments willaverage out with successes in other investments. Unavailability of new fund for investment will not affect muchcompared with the early phase when the wealth is small. The net asset growth will behave more or less predictablyas a deterministic model can tell. To the best of the author’s knowledge the applicability of the BDI process modelto explain the disparity in wealth distribution seems a novel approach. Copyright ©©
Semi-log plots of the I ( t ) process, Runs1-6; λ = 0 . , µ = 0 . , ν = 0 . . I ( t ) , the number of infections at time t We present the results of six consecutive simulation runs (in solid curves) in Figure 3, where wesuperimpose the percentile curves (in dashed curves) obtained in the previous section. The expectedvalue I ( t ) = E [ I ( t )] of (13) is shown in a black dashed curve that ends at t = 500 with the value . . (cid:0) e . ∗ − (cid:1) = e − ≈ . × . Three runs (Runs 4, 1 and 3) are clearly above I ( t ), andthree runs (Runs 6, 5 and 2) lie below I ( t ), although Run 6 (in light blue) is very close to I ( t ).This may not necessarily be a typical situation, because one half of the simulation runs, on average,should be above the “median curve”, i.e., 50 percentile curve (shown in blue dash). Five of the sixruns are within the 68% confidence interval, (between the two green dashed curves).The most important observation to make here is the enormous disparity among the six simula-tions. Run 4 (in cyan) has as many as 1 . × infected people at t = 500 [days], whereas RunCopyright ©© Hisashi Kobayashi 2021 9 (in red) has only ≈ × . Their ratio, therefore, is more than a factor of 40. If we run thesimulator many times, for instance, the ratio of the numbers of the infected of the worst (e.g.,97%level) vs. the luckiest (e.g., 2.5% level) cases can be as large as 1,000 (see [9], slide t ) of these simulation runsof the stochastic process I ( t ). From the plots in the semi-log scale, we can see that once I ( t ) hasreached the level of ≈ . This is because once the I ( t ) has reached ≈ I ( t ) is the sum of many statistically independent and identicalinfection processes, and exhibits a more stable and predictable behavior than in the initial phase.In other words, the large variance among different sample paths of the BDI process is due to moreerratic and unpredictable behavior in the early phase of the stochastic process I ( t ). In the initialphase the random arrivals of the infected from outside, as well as the fluctuations in the internalinfection (a branching process) and the recovery/death, all contribute to the stochastic behavior of I ( t ) because I ( t ) is still small. As I ( t ) grows the effect of external arrivals will become negligibleas long as r = νλ is small, i.e., the order of unity or less.As remarked in the present author’s keynote speech [9], it may be worth noting that the BDI pro-cess model can be also used to explain the enormous disparity we observe in the wealth among differ-ent individuals of similar income levels, similar expenditure, and similar intelligence and knowledgein investment, thus they have a more or less similar investment portfolio in their initial phase. After30 or 40 years after beginning their careers after schooling, some may become multi-millionaires (oreven billionaires), whereas some end up with a life of modest means, if not living hand to mouth.The present author believes that this type of disparity in wealth distribution can be explained byapplying the BDI process model. . A ( t )Figure 8 show the arrival patterns. The Poisson distribution of mean λ has the variance equal tothe mean: σ = λ ; and for large λ , its CDF (cumulative distribution function) converges to that ofthe normal (or Gaussian distribution); Thus, 68 percent and 95 percent confidence levels could bewell approximated by the λ ± √ λ , and the λ ± √ λ , respectively. log e at = log e × . t = 0 . t . In this analogy between the infection process and the individual’s wealth growth, we interpret the infection rate λ as the ROI (return on investment), and the recovery/death rate µ as the rate of expenditures. The rate of arrivalsof the infected from outside ν can be translated into the amount of additional money that can be put into investment.The parameter a = λ − µ is equivalent to the net growth rate of the wealth. Given that all these parameters arealmost identical among different individuals, the disparity in their wealth growth depends, to a large extent, on theirluckiness or unluckiness in the early phase of their investments, because the growth or decrease of wealth will bedictated by randomness in success or failure of investment, unexpected large expenditure or loss, and the availabilityof new fund for investment. Once the wealth exceeds a certain level, for instance, a million dollars, the future growthis more or less predictable from their investment strategy and expenditure because failures in some investments willaverage out with successes in other investments. Unavailability of new fund for investment will not affect muchcompared with the early phase when the wealth is small. The net asset growth will behave more or less predictablyas a deterministic model can tell. To the best of the author’s knowledge the applicability of the BDI process modelto explain the disparity in wealth distribution seems a novel approach. Copyright ©© Hisashi Kobayashi 2021 10 t A ( t ) Percentile curves of A(t) with A(0)=0; =0.2
Figure 7:
Percentile curves of Poisson process A ( t ) of rate ν = 0 . . t A(t): Runs 1 through 6, with percentile curves
Run 1234562.5%16%Median84%97.5%E[A(t)]
Figure 8:
External arrival process A ( t ) , Runs 1-6. The coefficient of variation of the Poisson process is given by c A ( t ) = 1 √ νt , (25)which converges to zero as t → ∞ . Thus, the arrival process A ( t ) behaves much more predictablythan the process I ( t ), whose coefficient of variation remains on the order of unity at all t . c I ( t ) = 1 (cid:112) rβ ( t ) , (26)where r = νλ and β ( t ) ≤ t → ∞ , β ( t ) →
1, thus, the coefficient of deviation(CV) converges to (cid:113) λν , which, in our running example is (cid:112) . / . ≈ .
225 as we discussed in PartI, p. 19, Example 3. B ( t ) and R ( t )Although we were able to obtain the time-dependent probability mass function (PMF) of I ( t ), itseems rather difficult to obtain closed form expressions of PMFs for the processes B ( t ) and R ( t ),although we can find their PGFs. It appears that we need to be content with approximate PMFsof B ( t ) and R ( t ). The technique we will explore is the saddle-point integration method, whichBernhard Riemann (1826-1866) pioneered in his pursuit of the famous 1859 conjecture, known asthe Riemann Hypothesis . We will report our full discussion of the processes B ( t ) and R ( t ) inPart V [10].In the present section we only show the results of the above six simulation runs, together with See, e.g., Harold M. Edwards,
Riemann’s Zeta Function , Dover Publications, Inc, 1974. Also http://hp.hisashikobayashi.com/towards-a-proof-of-the-riemann-hypothesis-rh/ and references therein.
Copyright ©©
Copyright ©© Hisashi Kobayashi 2021 11 t B ( t ) B(t): Runs 1 through 6
Run 123456E[B(t)]
Figure 9:
The process B ( t ) : Runs 1-6. t R ( t ) R(t): Runs 1 through 6
Run 123456E[R(t)]
Figure 10:
The process R ( t ) : Runs 1-6. the stochastic means B ( t ) and R ( t ), which we obtained in Part I, viz: B ( t ) = λa ( I ( t ) − I − A ( t ) , (27) R ( t ) = νa ( I ( t ) − I − A ( t )) , (28)As we can see, the process B ( t ) and R ( t ) also exhibit enormous variations across the runs. Theyare positively correlated with each other and with the I ( t ) process, as well. Run 4 (magenta), Run1 (blue) and Run 3 (yellow) are well above the mean curve, and Run 5 (green) and Run 2 (red)are below the mean in all the three processes B ( t ) , R ( t ) and I ( t ). Run 6 (cyan) is just below themean. The fact that these three processes behave in a similar fashion is quite expected, becauseboth dB ( t ) dt and dR ( t ) dt are, on average, proportional to I ( t ) at a given time.Referring to (2), A ( t ) is much smaller than I ( t ) , B ( t ) and R ( t ), except for the initial period,thus we have an approximate formula I ( t ) ≈ B ( t ) − R ( t ) , (30)These large variations we observe in the processes B ( t ), R ( t ) and I ( t ) are all consequences of thebuilt-in positive feedback loop inherent to the internal infection process B ( t ), which is a branchingprocess and gives rise to exponential growth exp( at ). In this section we discuss two additional topics for the benefits of the readers. One is how to derivethe number of new infections for each day; the second is how to estimate the number of deaths. New internal infections (excluding the new arrivals from the outside), occur at the rate of λI ( t ), and recover-ies/removals/deaths occur at the rate of µI ( t ). Their expected values are related by the differential equations dB ( t ) dt = λI ( t ) and dR ( t ) dt = µI ( t ) , (29)as shown in Part I, page 11 (27) and (31). See https://en.wikipedia.org/wiki/Branching_process and references therein.
Copyright ©©
Copyright ©© Hisashi Kobayashi 2021 12 .4.1 Number of daily new infections
The Number of Newly Infected; The first 25 days t [day] N e w i n f e c t i on s Run 2Expected value
Figure 11:
New infections on Daily-basis, Run-2, ≤ t ≤ . The Number of Newly Infected t [day] N e w i n f e c t i on s Run 2Expected value
Figure 12:
New infections on Daily-basis, Run-2, ≤ t ≤ . The Number of Newly Infected; The first 25 days t [day] N e w i n f e c t i on s Run 4Expected value
Figure 13:
New infections on Daily-basis, Run-4, ≤ t ≤ . The Number of Newly Infected t [day] N e w i n f e c t i on s Run 4Expected value
Figure 14:
New infections on Daily-basis, Run-4, ≤ t ≤ . We take “day” as the time unit, as we do in our running examples throughout this report. Wechoose to interpret the interval t ∈ (0 ,
1) as the 1st day, then the number of newly infected persons reported on the t th day, denoted as N ( t ), should be computed as follows: I new [ t ] = B ( t ) − B ( t −
1) + A ( t ) − A ( t − , (31)By substituting the B ( t ) and A ( t ) obtained in Part I, we readily find the stochastic mean or theexpected value of I new [ t ]: I new [ t ] (cid:44) E [ I new [ t ]] = λνa (1 − e − a ) e at − λνa . (32)For our running example of a = λ − µ = 0 . − . .
2, and ν = 0 .
2. Figures 11 through 16show the plots of the first 25 days and 50 days of three runs: Run-2, Run-4 and Run-6. TheseCopyright ©©
2. Figures 11 through 16show the plots of the first 25 days and 50 days of three runs: Run-2, Run-4 and Run-6. TheseCopyright ©© Hisashi Kobayashi 2021 13 he Number of Newly Infected; The first 25 days t [day] N e w i n f e c t i on s Run 6Expected value
Figure 15:
New infections on Daily-basis, Run-4, ≤ t ≤ . The Number of Newly Infected t [day] N e w i n f e c t i on s Run 6Expected value
Figure 16:
New infections on Daily-basis, Run-4, ≤ t ≤ . simulations correspond to shown in the previous figures concerning the processes I ( t ) , B ( t ) , R ( t )and use the same colors as used in the curves. Run-2 (red) is the lowest; Run-4 (magenta) is thehighest, exceeding the expected value E [ I new [ t ]] by factor of 4 or 5; Run-6 (light blue) is close to theexpectation. We observe enormous variations among the 6 runs. Since we have not yet obtained thePMFs for B ( t ), we cannot compute the percentile curves at this point. But from these simulationruns, the variations among different runs seem even more pronounced than we observed in I ( t ).Note that for the short range 1 ≤ t ≤
25, the vertical axis range is [0, 100], whereas for1 ≤ t ≤
50, the range is expanded by a factor of 150. This is consistent with the exponentialgrowth of exp( at ). From t = 25 to t = 25, it will grow by the factor of exp( a ( t − t )) =exp(0 . ×
25) = e = 148 . The most important aspect of any model of an infectious disease should be how to estimate orpredict the number of deaths. In the current COVID-19 epidemic, the case fatality rate is reportedlyless than one percent for young people but will be much higher for 60 years or older, and those withcomorbidity. In the present model, we assume a homogeneous population model, but the modelcan be generalized to multiple “classes” (or types), by assigning the model parameters ν c , λ c and µ c for different classes of infected population. As an illustration, let us assume fr= 2% as an overall fatality rate . Simulation of the death process D ( t ) can be done by randomly splitting the process R ( t ) with the specified fatality rate and produce D ( t ) as a sub-process of R ( t ). Figures 17 showsthe result of the six runs. Figure 18 shows the first 25 days of the same D ( t ). I ( t ) with the Initial Condition I ≥ By looking at the semi-log plots of simulation runs of the process I ( t ), some readers may wonderwhether the huge variations among different runs may have to do with the fact that some simulationruns remain zero for a considerable period. This question can be answered by examining the process I ( t ) with the initial condition I = 1. In referring to (9), the second term represents the PGF ofCopyright ©©
25) = e = 148 . The most important aspect of any model of an infectious disease should be how to estimate orpredict the number of deaths. In the current COVID-19 epidemic, the case fatality rate is reportedlyless than one percent for young people but will be much higher for 60 years or older, and those withcomorbidity. In the present model, we assume a homogeneous population model, but the modelcan be generalized to multiple “classes” (or types), by assigning the model parameters ν c , λ c and µ c for different classes of infected population. As an illustration, let us assume fr= 2% as an overall fatality rate . Simulation of the death process D ( t ) can be done by randomly splitting the process R ( t ) with the specified fatality rate and produce D ( t ) as a sub-process of R ( t ). Figures 17 showsthe result of the six runs. Figure 18 shows the first 25 days of the same D ( t ). I ( t ) with the Initial Condition I ≥ By looking at the semi-log plots of simulation runs of the process I ( t ), some readers may wonderwhether the huge variations among different runs may have to do with the fact that some simulationruns remain zero for a considerable period. This question can be answered by examining the process I ( t ) with the initial condition I = 1. In referring to (9), the second term represents the PGF ofCopyright ©© Hisashi Kobayashi 2021 14 t D ( t ) D(t): Runs 1 through 6
Run 123456E[D(t)]
Figure 17:
The cumulative of deaths D ( t ) : Runs1-6, the fatality rate (fr) 2%. t D ( t ) First 25 days of $D(t)$: Runs 1-6
Run 123456E[D(t)]
Figure 18:
The first 25 days of D ( t ) : Runs 1-6. the BD process without immigration ( ν =0), which is often referred to as the simple birth-and-deathprocess or as Feller-Arley (FA) process . We rewrite (9) as G BDI : I ( z, t ) = G BDI :0 ( z, t ) G FA : I ( z, t ) , (33)where G BDI :0 ( z, t ) is the PGF of the BDI process with the initial condition I = 0, which we havestudied thus far: G BDI :0 ( z, t ) = (cid:18) aλe at − µ − λ ( e at − z (cid:19) r (34)and G FA : I ( z, t ) = (cid:18) µ ( e at −
1) + ( λ − µe at ) zλe at − µ − λ ( e at − z (cid:19) I . (35)The product form expression (33) means that the BDI process with nonzero I is the sum of thetwo independent processes, whose PGFs are given by (34) and (35): I v = I BDI :0 ( t ) + I FA : I ( t ) . (36)By defining α ( t ) (see [13][14]) by α ( t ) (cid:44) µ ( e at − λe at − µ = µλ β ( t ) , (37) In his seminal paper of 1939 [11], W. Feller (1906-1970) introduced this process for the study of population growthof some species. N. Arley [12] applied this model to cascade showers in cosmic ray theory.
Copyright ©©
Copyright ©© Hisashi Kobayashi 2021 15here β ( t ) was earlier defined by (12), we can rewrite (35) as G FA : I ( z, t ) = (cid:18) α ( t ) + [1 − α ( t ) − β ( t )] z − β ( t ) z (cid:19) I . (38)By taking the natural logarithm, differentiating w.r.t. z , and setting z = 1, we find the expectedvalue of this random process: I FA : I ( t ) (cid:44) E [ I FA ( t )] = I (1 − α ( t ))(1 − β ( t )) = I e at , (39)Similarly, we find σ FA : I ( t ) (cid:44) Var[ I FA : I ( t )] = I (1 − α ( t ))( α ( t ) + β ( t ))(1 − β ( t )) = I ( λ + µ ) e at ( e at − a . (40)which leads to the following expression for the coefficient of variation (CV) of the FA process,when a > c FA : I ( t ) ≥ (cid:114) λ + µaI , for all t. (41)For the case of our running example with λ = 0 . µ = 0 .
1, the RHS of the above becomes (cid:112) /I . When I = 1, and the CV c FA :1 ( t ) ≈ .
414 for all t , which is somewhat larger than the CVof the BDI process with I = 0, which is c BDI :0 ( t ) ≈ r − / = 1 .
225 (See Part I, p. 19 Example 2). I = 1 : For the case I = 1, it is easy to expand the PGF (35) in powers of z n , obtaining P FA :1 ( t ) = α ( t ) (42) P FA :1 n ( t ) = (1 − α ( t ))(1 − β ( t )) β n − ( t ) , for n ≥ . (43)We further analyze the above result depending (i) a >
0, (ii) a <
0, or (iii) a = 0.(i) When a = λ − µ >
0: We findlim t →∞ α ( t ) = µλ = R − < , and lim t →∞ β ( t ) = 1 , (44)where R = λµ > basic reproduction number in epidemiology, as defined in (32) of Part I. Thedistribution form of (43) is very similar to (16): it is a geometric distribution of the form ∝ β n ( t ), with β ( t ) ≈
1, and has a long tail similar to Figures 7-12 in Part I. The expression given in our earlier version [1] was incorrect.
Copyright ©©
Copyright ©© Hisashi Kobayashi 2021 16ii) When a <
0: We have lim t →∞ α ( t ) = 1 , and lim t →∞ β ( t ) = λµ = R < , (46)(iii) When a = 0, (i.e., µ = λ ): We rearrange (37) as α ( t ) = µ ( e at − λ ( e at −
1) + a , and β ( t ) = λ ( e at − λ ( e at −
1) + a . (47)Divide both the denominator and the nominator by a , and let a →
0. Using the familiarformula lim a → ( e at − /a = t , we find α ( t ) = β ( t ) = λt λt , t ≥ . (48)Rearrange the PGF (35) in a similar fashion, and we find G FA : I ( z, t ) = (cid:18) λt + (1 − λt ) z λt − λtz (cid:19) I , (49)from which we obtain I FA : I ( t ) (cid:44) E [ I FA : I ( t )] = I , (50) σ FA : I ( t ) (cid:44) Var[ I FA : I ( t )] = 2 I λt, (51)which could have been directly obtained from (39) and (40). The PMFs of (42) and (43) willbe further simplified, when µ = λ : P FA :1 ( t ) = λt λt (52) P FA :1 n ( t ) = ( λt ) n − (1 + λt ) n +1 , for n ≥ . (53)Referring to (16), the PMF of the BDI process with I = 1 is given by the convolution of thetwo PMFs, for which we have found closed form expressions: P BDI :1 n ( t ) = P BDI :0 n ( t ) ⊗ P FA :1 n ( t )= (1 − β ( t )) r (cid:34) n − (cid:88) k =0 (cid:18) k + r − k (cid:19) (1 − α ( t ))(1 − β ( t )) β n − ( t ) + (cid:18) n + r − n (cid:19) β n ( t ) α ( t ) (cid:35) . (54)In Figure 19 is the PMF P n ( t ) of the I ( t ) given in (16) at t = 10 for our running example, i.e. a = 0 . r = 0 .
667 (see Part I, P. 18, Figure 9). Figure 20 is the PMF (42) and (43). Figure 22is the PDF of I ( t ) with the initial condition I = 1 given by (16). We computed this by computingthe convolution of the above two PDFs, i.e., (54). Figure 22 shows the CDFs of I ( t ) at t = 0, with I = 0 (in blue) and with I = 1 (in magenta).Copyright ©©
667 (see Part I, P. 18, Figure 9). Figure 20 is the PMF (42) and (43). Figure 22is the PDF of I ( t ) with the initial condition I = 1 given by (16). We computed this by computingthe convolution of the above two PDFs, i.e., (54). Figure 22 shows the CDFs of I ( t ) at t = 0, with I = 0 (in blue) and with I = 1 (in magenta).Copyright ©© Hisashi Kobayashi 2021 17
MF of I(t) at t=10, with I =0 n P ( n ) Figure 19:
PMF of I ( t ) at t = 10 ; I = 0 PMF of FA process at t=10, with I =1 n P ( n ) Figure 20:
PMF of FA Process at t = 10 , I = 1 For the CDF F X ( x ) of a non-negative random variable X , you can show the following simpleformula: (cid:90) ∞ (1 − F X ( x )) dx = E [ X ] , (55)Thus, the white area should be equal to I ( t ) = νa ( e at −
1) = e − . − .
39 at t = 10.Thus, the blue bar area behind the magenta bars should be equal to I e at = e = 7 .
39, an increasein E [ I ( t )] at t = 10 due to the presence of an infected person at t = 0. I ( t ) with I = 1An alternative and computationally more efficient way to compute the P BDI :1 n will be presentedbelow. We use alternative representation of the PGF (35) as follow G FA : I ( z, t ) = (cid:18) − β ( t )1 − β ( t ) z (cid:19) I (1 − p ( t ) + p ( t ) z ) I , (56)where p ( t ) = 1 − α ( t ) − β ( t )1 − α ( t ) = λ − µe at a . (57)Then we can have an alternative product representation to (33), as follows. G BDI : I ( z, t ) = (cid:18) − β ( t )1 − β ( t ) z (cid:19) I + r (1 − p ( t ) + p ( t ) z ) I . (58)We may term the second term in the above product representation as the PGF of a generalizedbinomial distribution in the sense that the coefficients of z n terms are not necessarily non-negative.For I = 1, this term is extremely simple, having only two terms: P ( t ) = 1 − p ( t ) and P ( t ) = p ( t ) , which makes the convolution extremely simple: Fig 23 and Figure 24 show the PDFs of the abovetwo PGFs. Their convolution results in the PMF as shown in Figure 21. See e.g.,[8], pp.73-74.
Copyright ©©
Copyright ©© Hisashi Kobayashi 2021 18
MF of I(t) t=10, with I =1 n P ( n ) Figure 21:
PMF of I ( t ) at t = 10 ; I = 0 Two CDFs: I(10), with I =0 blue bar); I(10) with I =1 (magenta bar n CD F CDF of I(10) with I =0CDF of I(10) with I =1 Figure 22:
CDFs of I ( t ) at t = 10 , I = 0 (inblue) I = 1 (in Magenta) The percentile curves for the PMF P n ( t ) of I ( t ) with the initial condition I (0) = 1 are plottedtogether with the mean I ( t ) (cid:44) E [ I ( t )] are shown in Figure 25 and its semi-log plots are given inFigure 26.In Figure 27 we show six simulation runs with the initial condition I (0) = 1 and their semi-logplots in Figure 28. The large variances among the sample paths still persist, thus we do not seeany fundamental differences from the case where the simulations start with the initial condition I (0) = 0. This conclusion is not unexpected, if we go back to Equation (36): I BDI : I ( t ) = I BDI :0 ( t ) + I FA : I ( t ) ., (59)in which we set I = 1. Since the two processes are statistically independent, the sum of theirvariances is the variance of the summed process: σ BDI :1 ( t ) = σ BDI :0 ( t ) + σ FA :1 ( t ) , (60)where the first term on RHS is from Part I (53) σ BDI :0 ( t ) = ν ( λe at − µ )( e at − a ≈ νλa e at , (61)and the second term is from (40) σ FA :1 ( t ) = ( λ + µ ) e at ( e at − a ≈ ( λ + µ ) a e at , (62)which leads to σ BDI :1 ( t ) ≈ (cid:18) νλa + ( λ + µ ) a (cid:19) e at = ( ν + λ ) λ − µ ( λ − µ ) e at . (63)Copyright ©©
CDFs of I ( t ) at t = 10 , I = 0 (inblue) I = 1 (in Magenta) The percentile curves for the PMF P n ( t ) of I ( t ) with the initial condition I (0) = 1 are plottedtogether with the mean I ( t ) (cid:44) E [ I ( t )] are shown in Figure 25 and its semi-log plots are given inFigure 26.In Figure 27 we show six simulation runs with the initial condition I (0) = 1 and their semi-logplots in Figure 28. The large variances among the sample paths still persist, thus we do not seeany fundamental differences from the case where the simulations start with the initial condition I (0) = 0. This conclusion is not unexpected, if we go back to Equation (36): I BDI : I ( t ) = I BDI :0 ( t ) + I FA : I ( t ) ., (59)in which we set I = 1. Since the two processes are statistically independent, the sum of theirvariances is the variance of the summed process: σ BDI :1 ( t ) = σ BDI :0 ( t ) + σ FA :1 ( t ) , (60)where the first term on RHS is from Part I (53) σ BDI :0 ( t ) = ν ( λe at − µ )( e at − a ≈ νλa e at , (61)and the second term is from (40) σ FA :1 ( t ) = ( λ + µ ) e at ( e at − a ≈ ( λ + µ ) a e at , (62)which leads to σ BDI :1 ( t ) ≈ (cid:18) νλa + ( λ + µ ) a (cid:19) e at = ( ν + λ ) λ − µ ( λ − µ ) e at . (63)Copyright ©© Hisashi Kobayashi 2021 19igure 23:
NBD with k = 1 + r PMF of Geberalized Binomia1, I =1 n -2-101234 P ( n ) Figure 24:
Generalized binomial distribution
The expected value of the process I ( t ) with I (0) = 1 is from (39) E [ I BDI :1 ( t )] = (1 + νa ) e at − νa ≈ ν + λ − µλ − µ e at , (64)Thus, the CV of the the process I ( t ) remains constant for all t whether not the initial condition iszero or nonzero c BDI :1 ( t ) = (cid:112) ( ν + λ ) λ − µ λ − µ = 1 . , (65)which is even larger than the CV of I ( t ) with I (0) = 0, which is 1.225. In the present report, we have presented simulation results by implementing a simulator of ourstochastic model of an epidemic disease analyzed in Part I [2]. The simulated infection process I ( t ) indeed exhibits huge variations among different simulation runs. In this report, however, wepresented only six consecutive runs in the interest of space. By presenting results of many moreruns we could certainly show an even larger disparity between the worst and the best scenarios ofa given BDI process I ( t ). By showing only several runs, however, we can sufficiently demonstratethe great value of a stochastic model in that a deterministic model is more often than not far fromany sample path. In the same token, a single sample path of I ( t ), which we can observe in a realsituation, can contain limited information about the ensemble of the process I ( t ).In the current pandemic of COVID-19, most experts and policy makers in Japan seem lookingfor all possible factors that might help them understand why they see such enormous disparitiesbetween Japan and many countries in the world in terms of the magnitude of infected populationsand death tolls. Our analysis and simulation shows that the epidemic process which is a branchingprocess and is driven by inherently positive feedback can have enormous variations in the initialbuild-up phase, by mere luck or probabilistic chances, sometimes by factor of 100 or more, amongCopyright ©©
The expected value of the process I ( t ) with I (0) = 1 is from (39) E [ I BDI :1 ( t )] = (1 + νa ) e at − νa ≈ ν + λ − µλ − µ e at , (64)Thus, the CV of the the process I ( t ) remains constant for all t whether not the initial condition iszero or nonzero c BDI :1 ( t ) = (cid:112) ( ν + λ ) λ − µ λ − µ = 1 . , (65)which is even larger than the CV of I ( t ) with I (0) = 0, which is 1.225. In the present report, we have presented simulation results by implementing a simulator of ourstochastic model of an epidemic disease analyzed in Part I [2]. The simulated infection process I ( t ) indeed exhibits huge variations among different simulation runs. In this report, however, wepresented only six consecutive runs in the interest of space. By presenting results of many moreruns we could certainly show an even larger disparity between the worst and the best scenarios ofa given BDI process I ( t ). By showing only several runs, however, we can sufficiently demonstratethe great value of a stochastic model in that a deterministic model is more often than not far fromany sample path. In the same token, a single sample path of I ( t ), which we can observe in a realsituation, can contain limited information about the ensemble of the process I ( t ).In the current pandemic of COVID-19, most experts and policy makers in Japan seem lookingfor all possible factors that might help them understand why they see such enormous disparitiesbetween Japan and many countries in the world in terms of the magnitude of infected populationsand death tolls. Our analysis and simulation shows that the epidemic process which is a branchingprocess and is driven by inherently positive feedback can have enormous variations in the initialbuild-up phase, by mere luck or probabilistic chances, sometimes by factor of 100 or more, amongCopyright ©© Hisashi Kobayashi 2021 20 t I ( t ) Percentile curves of I(t) with I(0)=1
Figure 25:
Percentile curves of P n ( t ) of I ( t ) with I (0) = 1 ; λ = 0 . , µ = 0 . , ν = 0 . . t I ( t ) Smilog-plot of percentile curves of I(t) with I(0)=1
Figure 26:
Semi-log plot of the percentile curvesfor I ( t ) with I (0) = 1 t I ( t ) I(t) with I(0)=1: Runs 1 through 6
Run 12345697.5%84%E[I(t)]50%16%2.5%
Figure 27:
The I ( t ) process with I (0) = 1 , Runs1-6; λ = 0 . , µ = 0 . , ν = 0 . . t l og ( I ( t )) log I(t): Runs 1 through 6 Run 123456
Figure 28:
Semi-log plots of I ( t ) with I (0) = 1 ,Runs 1-6; λ = 0 . , µ = 0 . , ν = 0 . . environments with identical conditions, i.e., environments with the same λ, µ and ν parameters inour BDI process based model.In both analysis and simulations we have so far assumed a time-homogeneous model, whetherthe model parameters do not change in time. In Part III-A[3], we will report on our analysis oftime-nonhomogeneous models, whereby we allow the model parameters λ ( t ) and µ ( t ) to be arbitraryfunctions of time. This generalization will help us better understand, for instance, how a change in social distancing , availability of effective vaccines and/or an improvement/degradation in medicaltreatment will affect the infection process. In Part III-B [4] we will report simulation results to helpthe reader better understand the significance of the analysis of the time-nonhomogeneous modelsand augment the analysis, which is limited in finding a closed form solution for a full-fledged BDIprocess.In Part IV [5], we plan to develop a statistical theory to estimate the model parameters from realdata of COVID-19 epidemics and demonstrate how our stochastic model can be used in predictingCopyright ©©
Semi-log plots of I ( t ) with I (0) = 1 ,Runs 1-6; λ = 0 . , µ = 0 . , ν = 0 . . environments with identical conditions, i.e., environments with the same λ, µ and ν parameters inour BDI process based model.In both analysis and simulations we have so far assumed a time-homogeneous model, whetherthe model parameters do not change in time. In Part III-A[3], we will report on our analysis oftime-nonhomogeneous models, whereby we allow the model parameters λ ( t ) and µ ( t ) to be arbitraryfunctions of time. This generalization will help us better understand, for instance, how a change in social distancing , availability of effective vaccines and/or an improvement/degradation in medicaltreatment will affect the infection process. In Part III-B [4] we will report simulation results to helpthe reader better understand the significance of the analysis of the time-nonhomogeneous modelsand augment the analysis, which is limited in finding a closed form solution for a full-fledged BDIprocess.In Part IV [5], we plan to develop a statistical theory to estimate the model parameters from realdata of COVID-19 epidemics and demonstrate how our stochastic model can be used in predictingCopyright ©© Hisashi Kobayashi 2021 21 t B ( t ) B(t): Runs 1 through 6
Run 123456E[B(t)]
Figure 29:
The process B ( t ) with I (0) = 1 : Runs1-6. t R ( t ) R(t): Runs 1 through 6
Run 123456E[R(t)]
Figure 30:
The process R ( t ) : Runs 1-6. the future behavior of an infectious disease, given its past and present value.Part V [10] will be devoted to use of saddle-point integration technique to approximately charac-terize the internal infection process B ( t ) and the recovery process R ( t ), which seem to defy an exactsolution unlike the process I ( t ) for which we have an exact time-dependent probability distributionfunction.Our BDI process based model has a fundamental advantage over nonlinear models such asSIR model and other models dominantly used in the epidemiology community in that our modelshould be readily extensible to more complex and realistic situations where different variants ofthe infectious diseases may coexist, and different classes of susceptible populations (e.g., agedpopulation, those with comorbidity, etc.). These situations can be modeled by introducing a set ofmodel parameters λ r , µ r and ν r , where r = ( v, c ) represent the variant v and the class c . , Acknowledgments
I thank Prof. Brian L. Mark of George Mason University for his valuable suggestions and help inthe MATLAB simulation. Were it not for his kind help, I could not have completed this ratherlaborious study.
References [1] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part II: Simulation Experimentsand Verification of the Analysis .” http://hp.hisashikobayashi.com , August 5 2020.[2] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part I: Understand theNegative Binomial Distriution and Predict an Epidemic More Reliably.” https://hp.hisashikobayashi.com and https://arxiv.org/pdf/2006.01586.pdf , May 28 2020.Copyright ©©
References [1] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part II: Simulation Experimentsand Verification of the Analysis .” http://hp.hisashikobayashi.com , August 5 2020.[2] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part I: Understand theNegative Binomial Distriution and Predict an Epidemic More Reliably.” https://hp.hisashikobayashi.com and https://arxiv.org/pdf/2006.01586.pdf , May 28 2020.Copyright ©© Hisashi Kobayashi 2021 223] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part III-A: Analysis of Time-Nonhomogeneous Models.” http://hp.hisashikobayashi.com , and https://arxiv.org/abs/2101.09109 , January 19 2021.[4] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part III-B: Simulation of Time-Nonhomogenous Models.” http://hp.hisashikobayashi.com , (in preparation) 2021.[5] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part IV: Estimation of ModelParameters from Real Data, and Validation of Our Model.” http://hp.hisashikobayashi.com , (in preparation) 2021.[6] H. Kobayashi,
Modeling and Analysis: An Introduction to System Performance EvaluationMethodology . Addison Wesley, 1978.[7] H. Kobayashi and B. L. Mark,
System Modeling and Analysis: Foundations for System Per-formance Evaluation . Prentice Hall, 2008.[8] H. Kobayashi, B. L. Mark and W. Turin,
Probability, Random Processes, and Statistical Anal-ysis . Cambridge University Press, 2012.[9] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Keynote speech at ITC-32.” https://hp.hisashikobayashi.com/a-stochastic-model-of-an-infectious-disease/ ,September 23 2020.[10] H. Kobayashi, “Stochastic Modeling of an Infectious Disease: Part V: Approximate Analysisof the Internal Infection Process B ( t ) and the Recovery Process R ( t ), based on Saddle-pointIntegration.” http://hp.hisashikobayashi.com , (in preparation) 2021.[11] W. Feller, “Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in wahrschein-lichkeitstheoritischer Behandlung,” Acta Biotheoretica , vol. 5, pp. 11–40, 1939.[12] N. Arley,
On the theory of stochastic processes and their application to the theory of cosmicradiation . Wiley, 1943.[13] N. T. Bailey,
The Elements of Stochastic Processes With Applications to the Natural Sciences .John Wiley & Sons, Inc., 1964.[14] H. Takagi, “Lecture Note: Birth-and-Death Process and Its Application (in Japanese),” March2007. Copyright ©©