Optimal Sampling Regimes for Estimating Population Dynamics
Rebecca E. Atanga, Edward L. Boone, Ryad A. Ghanam, Ben Stewart-Koster
OOptimal Sampling Regimes for Estimating Population Dynamics
Rebecca E. Bergee , Edward L. Boone , Ryad A. Ghanam , Ben Stewart-Koster Department of Statistical Sciences and Operations Research,Virginia Commonwealth University, Richmond, VA 23284, USA Virginia Commonwealth University - Qatar,Education City, Doha, Qatar. Australian Rivers Institute,Griffith University, Southport, QLD, Australia.
September 25, 2020
Abstract
Ecologists are interested in modeling the population growth of species in various ecosystems.Studying population dynamics can assist environmental managers in making better decisions forthe environment. Traditionally, the sampling of species and tracking of populations have beenrecorded on a regular time frequency. However, sampling can be an expensive process due toavailable resources, money and time. Limiting sampling makes it challenging to properly trackthe growth of a population. Thus, we propose a new and novel approach to designing samplingregimes based on the dynamics associated with population growth models. This design studyminimizes the amount of time ecologists spend in the field, while maximizing the informationprovided by the data.
Keywords:
Environmental Flow, Bayesian Hierarchical Models, Population Dynamics, Lotka-Volterra,Optimality Criteria;
Population growth models are commonly used in ecology when studying plants, animals, and or-ganisms as they grow over time and interact with the environment. In earlier research, Hsu et al.(1984) evaluate seedling growth under laboratory and field conditions. According to Huang et al.(2016), early life-cycle events play critical roles in determining the dynamics of plant populationsand their interactions with the surrounding environment. Germination patterns modeled by growthhelp ecologists understand the diversity, variation, and climate change within a system. Gamito(1998) further emphasize the importance of modeling growth and community dynamics using ap-plications to fish populations. Population dynamics of fish are commonly studied to learn aboutwater quality and the environment.In riverine settings, the health of a waterway can be determined by the abundance of fish present.Kennard et al. (2005) study species that are highly tolerant of human-induced disturbances andconsider certain fish as good indicators of river health. Ecologists model the population growth of1 a r X i v : . [ s t a t . M E ] S e p ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster
Population growth models are commonly used in ecology when studying plants, animals, and or-ganisms as they grow over time and interact with the environment. In earlier research, Hsu et al.(1984) evaluate seedling growth under laboratory and field conditions. According to Huang et al.(2016), early life-cycle events play critical roles in determining the dynamics of plant populationsand their interactions with the surrounding environment. Germination patterns modeled by growthhelp ecologists understand the diversity, variation, and climate change within a system. Gamito(1998) further emphasize the importance of modeling growth and community dynamics using ap-plications to fish populations. Population dynamics of fish are commonly studied to learn aboutwater quality and the environment.In riverine settings, the health of a waterway can be determined by the abundance of fish present.Kennard et al. (2005) study species that are highly tolerant of human-induced disturbances andconsider certain fish as good indicators of river health. Ecologists model the population growth of1 a r X i v : . [ s t a t . M E ] S e p ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster The well-known logistic equation was first developed by Verhulst (1838). The rediscovery of theequation by Reed and Pearl (1927) is often used in ecology to represent the population dynamicwhere the rate of reproduction is proportional to both the existing population and the amountof available resources. This model is popular in ecology due to the realistic self-limiting growthof a population and the existence of its analytic solution. The logistic equation is written in themathematical form dNdt = rN (cid:18) − NK (cid:19) (2.1)with solution N ( t ) = KN ( K − N ) e − rt + N . The variables N and t represent population size and time respectively. Parameter r is the populationgrowth rate, and parameter K is the maximum level that the population can reach also knownas the carrying capacity. N represents the population at time t = 0 referred to as the initialpopulation. Given that ecologists have expertise in modeling logistic growth, the Bayesian approachcan be used to incorporate prior knowledge of the system. Thus, the ecological model is specifiedprobabilistically in the Bayesian framework.Given our well-defined model, we implement the Bayesian approach using Bayes’ Theorem(Bayes, 1763). The theorem combines prior beliefs with the likelihood of an experiment to de-termine a posterior belief. The prior belief of a parameter θ is noted by π ( θ ). The likelihood ofthe experiment is a conditional probability of the data x given the parameter θ noted by L ( x | θ ).The posterior belief is given by the continuous case of Bayes’ Theorem, also known as the posteriordistribution. The posterior distribution defines the conditional probability of a parameter θ giventhe observance of data x , which is written mathematically as P ( θ | x ) = L ( x | θ ) π ( θ ) (cid:82) Θ L ( x | θ ) π ( θ ) dθ . (2.2)Specifying equation (2.1) as a Bayesian model requires the specification of the likelihood of theexperiment and the prior distributions for each parameter. The nature of the logistic equationtracks the population of a species N across a fixed time t where the growth rate parameter r and carrying capacity K are independent fixed values. The population at a specified time N i | t i is assumed to have a Poisson likelihood given that the observations N i are independent countsoccurring at known times t i . The analytic solution to the logistic equation (2.1) provides theexpected value of the likelihood set as λ i = N ( t i ). This provides the conditional likelihood of theexperiment N i | t i ∼ P oisson ( λ i | t i ) , (2.3)where λ i depends on the parameters r and K . Thus, the prior distributions must be specified forthe growth rate and carrying capacity parameters.When modeling logistic growth, ecologists are aware that a population can only reach a maximumcapacity by increasing at a positive rate. This implies that the growth rate r and carrying capacity K must be positive values. Though many distributions have positive supports, the log-normaldistribution easily specifies numerical information by using the mean and variance. Less informativepriors such as the gamma distribution can cause model instability, which defeats the purpose ofincorporating prior beliefs. Thus, informative priors are specified as r ∼ lnnorm (1 , K ∼ lnnorm (2000 ,
110 ) , (2.4) ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster
110 ) , (2.4) ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster Optimal designs are used to estimate statistical models while reducing experimental costs. Theobjective of optimal design is to eliminate uncertainty by minimizing the variability of the parameterestimates. Traditionally, optimal designs (Kiefer, 1959) minimize the variance of the parameterestimates while maximizing the information matrix. Specifically, the Fisher information matrixis defined as the negative expectation of the second derivative of the log-likelihood function withrespect to the parameter θ F ( θ ) = − E [ ∂ ∂θ logL ( x | θ )] , (2.5)where the expectation is taken over the sample space of the observations x and parameter spaceof θ . Optimality criteria are applied to the Fisher information matrix F ( θ ) and provide measuresof fit to assist with model selection. The popular A and D optimal designs are commonly usedin experimentation due to their ability to limit computational expenses. A optimality criterionminimizes the trace of the inverse of the Fisher information matrix, which equivalently minimizesthe average variance of the parameter estimates of a model. Whereas, D optimal designs consist ofminimizing the determinant of the inverse of the Fisher information matrix, again minimizing theparameter estimates of the model. A - Optimal: min tr ( F − ( θ )) D - Optimal: min | F − ( θ ) | (2.6)Though these designs are commonly used in practice, F ( θ ) can be difficult to calculate whenthe model parameters are unknown. Instead of estimating the Fisher information matrix usingparameter estimates noted by ˆ θ , we consider the variance of the parameter estimates. The inverseof the estimated Fisher information matrix F (ˆ θ ) is an estimator of the asymptotic covariance matrix(Abt and Welch, 1998). Φ = V ar (ˆ θ ) = [ F (ˆ θ )] − (2.7)The covariance matrix is much easier to calculate than the Fisher information matrix for certainmodels. Thus, defining F (ˆ θ ) in terms of Φ redefines the optimality criteria in terms of the estimatedcovariance matrix. I-optimal designs (Atkinson et al., 2007) known for minimizing the averageprediction variance over the entire design region are also considered. A Φ - Optimal: min tr (Φ) D Φ - Optimal: min | Φ | I - Optimal : min ¯Φ pred (2.8)¯Φ pred notes the average prediction variance of the model. Rather than minimizing the trace of theinverse of the Fisher information matrix, A Φ -optimal designs minimize the trace of the covariancematrix. D Φ -optimal designs minimize the determinant of the covariance matrix. While, I-optimaldesigns minimize the average prediction variance over the design space. Again, optimality criteriaare measures of fit used to guide optimization processes. A Φ , D Φ and I optimality criteria are usedin the proposed sequential optimality algorithm presented in the next section. In practice, design techniques are used prior to collecting data in order to optimize the amountof information obtained during experimentation. Thus, we propose a new and novel approach todesigning experiments that predicts the next best design point based on current information of thesystem. Our algorithm is an adaptation of the well-known simulated annealing algorithm appliedby Laarhoven and Aarts (1987). Simulated annealing searches a discrete design space for the globaloptimum of the system. Our proposed method sequentially explores subsets of the design space insearch of the global optimum within a set window of time.
Algorithm 1:
Sequential Optimality
Begin
Choose an initial design t , ..., t n Set a design budget, b Set a design window, w = Tb Set criteria, C :(i.) A Φ = min tr (Φ)(ii.) D Φ = min | Φ | (iii.) I = min ¯Φ pred For D = t , ..., t n :(a) Draw a sample t ∗ = { t n +1 , ..., t n + w } (b) Accept the new state t new = min arg( t ∗ c ) Repeat until D = t , ..., t b End
The algorithm begins by setting an initial design of size n typically set as the number of param-eters in the system plus one. Then, a design point budget, b , is chosen according to the number ofruns the experimenter can afford. Once the initial data is collected and a design budget is set, thedesign window, w , needs to be determined. The design window can be established by dividing theplanned time interval, T , by the design point budget. For example, this study focuses on samplingacross a one hundred day season with a design budget of size ten. Thus, a practical design windowwould be to search ten days into the future. The window of points following the current design D are explored as candidate samples. Each candidate is evaluated by running the Metropolis-Hastings(MH) algorithm (Metropolis et al., 1953) guided by specified optimality criterion, C . The optimalpoint in the window t new is added to the design, and the process repeats until the design pointbudget, b , is exhausted.The sequential optimality algorithm is intended to search the design space of temporal models.The specified Bayesian model in Section 2.1 tracks the population of a species across one hundredtime points with a carrying capacity of two thousand. The simulation study involves differentgrowth rate parameters and simulates data using MCMC sampling to predict the probability ofthe model parameters. Then, the proposed method is implemented using all design criteria from ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster
The algorithm begins by setting an initial design of size n typically set as the number of param-eters in the system plus one. Then, a design point budget, b , is chosen according to the number ofruns the experimenter can afford. Once the initial data is collected and a design budget is set, thedesign window, w , needs to be determined. The design window can be established by dividing theplanned time interval, T , by the design point budget. For example, this study focuses on samplingacross a one hundred day season with a design budget of size ten. Thus, a practical design windowwould be to search ten days into the future. The window of points following the current design D are explored as candidate samples. Each candidate is evaluated by running the Metropolis-Hastings(MH) algorithm (Metropolis et al., 1953) guided by specified optimality criterion, C . The optimalpoint in the window t new is added to the design, and the process repeats until the design pointbudget, b , is exhausted.The sequential optimality algorithm is intended to search the design space of temporal models.The specified Bayesian model in Section 2.1 tracks the population of a species across one hundredtime points with a carrying capacity of two thousand. The simulation study involves differentgrowth rate parameters and simulates data using MCMC sampling to predict the probability ofthe model parameters. Then, the proposed method is implemented using all design criteria from ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster This simulation study demonstrates methods that can be used to design sampling schedules forecologists based on varying population dynamics. Real data cannot be used due to the fact thatthis experimental study has not yet been implemented in practice. Thus, data is simulated fornormal, fast and slow growth models across one hundred time points. The normal growth rate isset to 10%, the fast rate is set to 100%, and the slow rate is set to 5% growth. All models havea maximum carrying capacity of 2000, and ten thousand MCMC samples are used to predict theprobability of the model parameters. Based on the planned time interval of one hundred days, alloptimal designs will consist of ten points that search windows of size ten. This allows the entirebudget to be used should the farthest point in each window be selected. Therefore, ten designpoints are also selected during simulated annealing for comparison purposes. Simulated annealingis implemented first to demonstrate the global optimum for each scenario as a reference whencomparing sequential optimality. Then, sequential optimality is demonstrated as a novel approachto designing optimal sampling regimes.All ground truth models are generated using the logistic equation and analytic solution providedby equation (2.1). Each simulation plots the ground truth model as a black curve tracking thepopulation size across time. 10,000 MCMC samples are used for each simulation, and the optimaldesign points selected are plotted as blue open circles. The designs are fit by a red curve with theprediction interval plotted as red dotted lines calculated by the 2.5% and 97.5% quantiles of thepredicted values. It will become clear by the results that there are many ways to successfully designexperiments for ecological models.
Simulated annealing is implemented as an adaptation of the Metropolis-Hastings algorithm (Metropo-lis et al., 1953) to explore the design space for an optimal solution. Considering normal, fast andslow growth rates, the specified optimality criteria from equation (2.8) are used to guide the algo-rithm and find global optimums. As stated previously, the models simulate growth at 10%, 100%,and 5% rates. Each criterion guides the simulated annealing process to produce optimal designsfor each model. lll ll l l lll
Time P opu l a t i on (a) llllllllll Time P opu l a t i on (b) l l l ll l l l l l Time P opu l a t i on (c) Figure 1: 10,000 MCMC samples are used to simulate A Φ - optimal designs for (a) normal, (b)fast and (c) slow growth models. All models are simulated across 100 time points with a carryingcapacity of 2000. The ground truth models are plotted in black. The optimal design points areplotted in blue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predictedvalues are plotted as red dotted lines. ll l ll l l l ll Time P opu l a t i on (a) llllllllll Time P opu l a t i on (b) l l l l l l l l l l Time P opu l a t i on (c) Figure 2: 10,000 MCMC samples are used to simulate D Φ - optimal designs for (a) normal, (b)fast and (c) slow growth models. All models are simulated across 100 time points with a carryingcapacity of 2000. The ground truth models are plotted in black. The optimal design points areplotted in blue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predictedvalues are plotted as red dotted lines. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster
Time P opu l a t i on (a) llllllllll Time P opu l a t i on (b) l l l ll l l l l l Time P opu l a t i on (c) Figure 1: 10,000 MCMC samples are used to simulate A Φ - optimal designs for (a) normal, (b)fast and (c) slow growth models. All models are simulated across 100 time points with a carryingcapacity of 2000. The ground truth models are plotted in black. The optimal design points areplotted in blue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predictedvalues are plotted as red dotted lines. ll l ll l l l ll Time P opu l a t i on (a) llllllllll Time P opu l a t i on (b) l l l l l l l l l l Time P opu l a t i on (c) Figure 2: 10,000 MCMC samples are used to simulate D Φ - optimal designs for (a) normal, (b)fast and (c) slow growth models. All models are simulated across 100 time points with a carryingcapacity of 2000. The ground truth models are plotted in black. The optimal design points areplotted in blue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predictedvalues are plotted as red dotted lines. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster l l l l l ll l l l Time P opu l a t i on (a) ll l l l l l l ll Time P opu l a t i on (b) l l l l l l ll l l Time P opu l a t i on (c) Figure 3: 10,000 MCMC samples are used to simulate I-optimal designs for (a) normal, (b) fast and(c) slow growth models. All models are simulated across 100 time points with a carrying capacityof 2000. The ground truth models are plotted in black. The optimal design points are plotted inblue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predicted values areplotted as red dotted lines.Simulated annealing successfully captures the dynamics of all growth models using each criterion.Figure 1 demonstrates the designs generated using A Φ optimality criterion. Figure 2 fits the D Φ -optimal designs. Figure 3 illustrates the fit of I - optimal designs. In the next section, simulationstudies are conducted using sequential optimality to demonstrate a new method for predictionbased optimization. Sequential optimality is an adaptation of simulated annealing that predictsoptimal design points. Sequential optimality searches subsets of the design space to find the next best point. Normal, fastand slow growth at 10%, 100% and 5% rates respectively are simulated. All simulations use 10,000MCMC samples plotted across 100 time points. The carrying capacity is 2000 specified as priorknowledge in equation (2.4). The design point budget consists of ten points, and the algorithmuses the optimality criteria from equation (2.8). As each criterion guides the sequential optimalityprocess, the final optimal designs are compared for each model. Again, the ground truth model isplotted as a black curve. The optimal design points are plotted as blue points fit by a solid redcurve with the 2.5% and 97.5% quantiles of the predicted values plotted as red dotted lines. As thesystem updates, our goal is to fit the ground truth model as accurately as possible with the finaloptimal design. I, A Φ , and D Φ optimal designs are compared across normal, fast, and slow growthrates.The simulations begin with the I-optimality criterion to demonstrate prediction based designs.Figure 4 demonstrates sequential optimality using the prediction based I-optimality criterion. Aninitial design is set of three points given the logistic equation has two parameters, carrying capacityand growth rate. One hundred day seasons are modeled with a design point budget of ten. The tencandidate points following the base design are evaluated using the I-optimality criterion to selectthe point with the minimum prediction variance. Once the optimal point is added to the design,the process repeats until all ten design points are sampled. Based on the simulation, the predictionbased I optimality criterion is successful in capturing the dynamics of a normal growth model.However, different combinations of criteria and growth models are simulated to test the robustnessof the technique. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster
Time P opu l a t i on (a) llllllllll Time P opu l a t i on (b) l l l ll l l l l l Time P opu l a t i on (c) Figure 1: 10,000 MCMC samples are used to simulate A Φ - optimal designs for (a) normal, (b)fast and (c) slow growth models. All models are simulated across 100 time points with a carryingcapacity of 2000. The ground truth models are plotted in black. The optimal design points areplotted in blue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predictedvalues are plotted as red dotted lines. ll l ll l l l ll Time P opu l a t i on (a) llllllllll Time P opu l a t i on (b) l l l l l l l l l l Time P opu l a t i on (c) Figure 2: 10,000 MCMC samples are used to simulate D Φ - optimal designs for (a) normal, (b)fast and (c) slow growth models. All models are simulated across 100 time points with a carryingcapacity of 2000. The ground truth models are plotted in black. The optimal design points areplotted in blue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predictedvalues are plotted as red dotted lines. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster l l l l l ll l l l Time P opu l a t i on (a) ll l l l l l l ll Time P opu l a t i on (b) l l l l l l ll l l Time P opu l a t i on (c) Figure 3: 10,000 MCMC samples are used to simulate I-optimal designs for (a) normal, (b) fast and(c) slow growth models. All models are simulated across 100 time points with a carrying capacityof 2000. The ground truth models are plotted in black. The optimal design points are plotted inblue and fit by a solid red curve, where the 2.5% and 97.5% quantiles of the predicted values areplotted as red dotted lines.Simulated annealing successfully captures the dynamics of all growth models using each criterion.Figure 1 demonstrates the designs generated using A Φ optimality criterion. Figure 2 fits the D Φ -optimal designs. Figure 3 illustrates the fit of I - optimal designs. In the next section, simulationstudies are conducted using sequential optimality to demonstrate a new method for predictionbased optimization. Sequential optimality is an adaptation of simulated annealing that predictsoptimal design points. Sequential optimality searches subsets of the design space to find the next best point. Normal, fastand slow growth at 10%, 100% and 5% rates respectively are simulated. All simulations use 10,000MCMC samples plotted across 100 time points. The carrying capacity is 2000 specified as priorknowledge in equation (2.4). The design point budget consists of ten points, and the algorithmuses the optimality criteria from equation (2.8). As each criterion guides the sequential optimalityprocess, the final optimal designs are compared for each model. Again, the ground truth model isplotted as a black curve. The optimal design points are plotted as blue points fit by a solid redcurve with the 2.5% and 97.5% quantiles of the predicted values plotted as red dotted lines. As thesystem updates, our goal is to fit the ground truth model as accurately as possible with the finaloptimal design. I, A Φ , and D Φ optimal designs are compared across normal, fast, and slow growthrates.The simulations begin with the I-optimality criterion to demonstrate prediction based designs.Figure 4 demonstrates sequential optimality using the prediction based I-optimality criterion. Aninitial design is set of three points given the logistic equation has two parameters, carrying capacityand growth rate. One hundred day seasons are modeled with a design point budget of ten. The tencandidate points following the base design are evaluated using the I-optimality criterion to selectthe point with the minimum prediction variance. Once the optimal point is added to the design,the process repeats until all ten design points are sampled. Based on the simulation, the predictionbased I optimality criterion is successful in capturing the dynamics of a normal growth model.However, different combinations of criteria and growth models are simulated to test the robustnessof the technique. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster Time P opu l a t i on l l l (a) l l l Time P opu l a t i on (b) l l l l Time P opu l a t i on (c) l l l l l Time P opu l a t i on (d) l l l l l l Time P opu l a t i on (e) l l l l l l l Time P opu l a t i on (f) l l l l l l l l Time P opu l a t i on (g) l l l l l l l ll Time P opu l a t i on (h) l l l l l l l ll l Time P opu l a t i on (i) Figure 4: 10,000 MCMC samples are used to simulate the normal 10% growth curve across a100 day period with a carrying capacity of 2000. The ground truth model is plotted as a blackcurve. I-optimality criterion guides the sequential optimality process. The optimal design pointsare plotted in blue and are fit by a solid red curve. The red dotted lines represent the 2.5% and97.5% quantiles of the predicted values. Panel (a) plots the base design of three points, panel (b)plots the fit of the base design, and panels (c)-(i) plot the fourth through tenth optimal points inthe design as they are fit.1 l l l l l l l ll l
Time P opu l a t i on (a) l l l l l l l l l l Time P opu l a t i on (b) l l ll l ll l l l Time P opu l a t i on (c) Figure 5: 10,000 MCMC samples are used in this simulation of a normal growth curve increasing ata 10% rate across 100 time points with a carrying capacity of 2000. Sequential optimality is usedto find the ten best points for (a) I, (b) A Φ and (c) D Φ optimal designs. The optimal points areplotted in blue and fit with a solid red curve. The red dotted lines represent the 2.5% and 97.5%quantiles of the predicted values of the model. The black curve represents the ground truth model. l l ll l l l ll l Time P opu l a t i on (a) l l ll l l lll l Time P opu l a t i on (b) l l l l l l l l l l Time P opu l a t i on (c) Figure 6: 10,000 MCMC samples are used in this simulation of a rapid growth curve increasing ata 100% rate across 100 time points with a carrying capacity of 2000. Sequential optimality is usedto find the ten best points for (a) I, (b) A Φ and (c) D Φ optimal designs. The optimal points areplotted in blue and fit with a solid red curve. The red dotted lines represent the 2.5% and 97.5%quantiles of the predicted values of the model. The black curve represents the ground truth model. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster
Time P opu l a t i on (a) l l l l l l l l l l Time P opu l a t i on (b) l l ll l ll l l l Time P opu l a t i on (c) Figure 5: 10,000 MCMC samples are used in this simulation of a normal growth curve increasing ata 10% rate across 100 time points with a carrying capacity of 2000. Sequential optimality is usedto find the ten best points for (a) I, (b) A Φ and (c) D Φ optimal designs. The optimal points areplotted in blue and fit with a solid red curve. The red dotted lines represent the 2.5% and 97.5%quantiles of the predicted values of the model. The black curve represents the ground truth model. l l ll l l l ll l Time P opu l a t i on (a) l l ll l l lll l Time P opu l a t i on (b) l l l l l l l l l l Time P opu l a t i on (c) Figure 6: 10,000 MCMC samples are used in this simulation of a rapid growth curve increasing ata 100% rate across 100 time points with a carrying capacity of 2000. Sequential optimality is usedto find the ten best points for (a) I, (b) A Φ and (c) D Φ optimal designs. The optimal points areplotted in blue and fit with a solid red curve. The red dotted lines represent the 2.5% and 97.5%quantiles of the predicted values of the model. The black curve represents the ground truth model. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster l l l l l l l l l l Time P opu l a t i on (a) l l l l l l l l ll Time P opu l a t i on (b) l l ll l lll l l Time P opu l a t i on (c) Figure 7: 10,000 MCMC samples are used in this simulation of a slow growth curve increasing ata 5% rate across 100 time points with a carrying capacity of 2000. Sequential optimality is usedto find the ten best points for (a) I, (b) A Φ and (c) D Φ optimal designs. The optimal points areplotted in blue and fit with a solid red curve. The red dotted lines represent the 2.5% and 97.5%quantiles of the predicted values of the model. The black curve represents the ground truth model.The simulations of normal growth show varying in Figure 5 results across the optimality criteria.The I and A Φ optimal designs successfully capture the dynamics of the population with ten points.Whereas, the D Φ optimal design did not capture the dynamic of the carrying capacity. Thisindicates the need for a larger design point budget or smaller design window when using D Φ optimality criterion with a normal growth curve. On the other hand, all three optimality criteriacapture the dynamics of the fast growth rate in Figure 6, which is expected given that the modelplateaus rapidly. This implies that the design point budget could be decreased in this case. As forthe slow growth rate model in Figure 7, no criteria could capture the population dynamics. Thiscould indicate a need to increase the design point budget or decrease the design window for slowgrowth models. The size of the design point budget was set to ten points with a window of tenpoints to simply demonstrate this novel approach of sequential optimality.Bayesian optimal designs (Chaloner and Verdinelli, 1995) are also examined for comparisonpurposes. Bayesian optimal designs can be written in the form of a utility function U ( d ∗ x f ), where d ∗ x f represents a design chosen from the design region d x f . The design region is explored usingBayes I, D, and A optimal designs written mathematically as follows.Bayesian I-Optimal¯ U I ( d ∗ x f ) = min (cid:90) Θ min d xf ( Y − E [ Y | y , d x f ]) p ( Y | y , d x f ) dY Bayesian D-Optimal¯ U D ( d ∗ x f ) = min (cid:90) Θ min d xf | Cov ( k, K | Y , y , d x f ) | p ( Y | y , d x f ) dY Bayesian A-Optimal¯ U A ( d ∗ x f ) = min (cid:90) Θ min d xf tr ( k, K | Y , y , d x f ) p ( Y | y , d x f ) dY Y represents the predicted design points, while y represents the current design. Θ is the parameterspace including parameters k and K from the model in equation (2.1). Similar to the criteria fromequation (2.8), the Bayesian optimality criteria are minimizing the posterior predictive distributionacross the parameter space Θ and design space d x f . The Bayesian I-Optimal designs minimizes thedistance between the squared predictions. Bayesian D-optimal designs minimize the determinant ofthe covariance of the posterior predictive distribution. The Bayesian A-optimal designs minimizethe trace of the posterior predictive distribution. Implementing the Bayesian criteria into ourprocess of sequential optimality gives various results. The frequency tables illustrate the precisionof each criteria based on the probabilities assigned to each candidate point.
11 12 13 14 15 16 17 18 19 20Candidate F r equen cy (a)
11 12 13 14 15 16 17 18 19 20Candidate F r equen cy (b)
11 12 13 14 15 16 17 18 19 20Candidate F r equen cy (c) Figure 8: Bayesian optimality criteria are used to guide the sequential optimality process. Thefrequency tables plot the probabilities assigned to the first ten candidate points following theinitial base design for a normal growth model. Panel (a) represents candidate probabilities underthe ¯ U I criterion. Panel (b) plots probabilities of the candidates for ¯ U A criterion. Panel (c) providesthe frequencies associated with ¯ U D criterion.The frequency plots in Figure 8 show the probability densities associated with the first set ofcandidate points evaluated in the sequential optimality algorithm. The Bayesian I-optimal designgives weight to specific candidate points, whereas the Bayesian A and D optimality criteria providea wide range of optimal candidates. Based on the frequency charts, it is clear that the BayesianI-optimal design can provide a precise optimal design point. Whereas, the Bayesian A and Doptimal designs appear to lack precision. These results are further visualized by plotting theBayesian sequential designs in Figure 9 for the normal growth model. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster
11 12 13 14 15 16 17 18 19 20Candidate F r equen cy (c) Figure 8: Bayesian optimality criteria are used to guide the sequential optimality process. Thefrequency tables plot the probabilities assigned to the first ten candidate points following theinitial base design for a normal growth model. Panel (a) represents candidate probabilities underthe ¯ U I criterion. Panel (b) plots probabilities of the candidates for ¯ U A criterion. Panel (c) providesthe frequencies associated with ¯ U D criterion.The frequency plots in Figure 8 show the probability densities associated with the first set ofcandidate points evaluated in the sequential optimality algorithm. The Bayesian I-optimal designgives weight to specific candidate points, whereas the Bayesian A and D optimality criteria providea wide range of optimal candidates. Based on the frequency charts, it is clear that the BayesianI-optimal design can provide a precise optimal design point. Whereas, the Bayesian A and Doptimal designs appear to lack precision. These results are further visualized by plotting theBayesian sequential designs in Figure 9 for the normal growth model. ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster l l l l l l l l l l Time P opu l a t i on (a) l ll lll ll l l Time P opu l a t i on (b) l l llllllll Time P opu l a t i on (c) Figure 9: 10,000 MCMC samples simulate a normal growth curve increasing at a 10% rate across100 time points with a carrying capacity of 2000. Sequential optimality is used with the Bayesiandesign criteria to find the ten best points for (a) ¯ U I , (b) ¯ U A and (c) ¯ U D optimal designs. Theoptimal points are plotted in blue and fit with a solid red curve. The red dotted lines representthe 2.5% and 97.5% quantiles of the predicted values of the model. The black curve represents theground truth model.Given all of the criteria and varying growth rates, the simulations using sequential optimal-ity resulted in different designs. Some designs are able to capture the dynamics of the systemwhile others require a larger design point budget. Ultimately, this indicates that there are severalways to design experiments for dynamic models. Using the simple example of population growthmodeled by the logistic equation, the sequential optimality simulations demonstrate an adaptationof the commonly performed simulated annealing algorithm. Thus, sequential optimality can berecommended when limitations arise for sampling. In this study, the dynamics of various population growth models were investigated in order todesign optimal sampling regimes for ecologists. Efforts were focused on a simple model with lim-ited external factors to demonstrate a new and novel approach of sequential optimality. For thepurpose of this study, the logistic equation provided a straightforward model used to compare sam-pling techniques. Simulated annealing was implemented to explore the design space using variousoptimality criteria. Despite the growth rate and criterion used, simulated annealing was able tocapture the dynamics of the models by exploring the entire design region. However, sequentialoptimality found sampling regimes that capture the dynamics of a system in a sequential manner.The proposed method of sequential optimality incorporates prior information into the designprocess. Rather than exploring the entire design space at once, the sampling method evaluatessubsets of the region using prediction based criteria. Sequential optimality captured the dynamicsof a normal growth rate using the I optimality criterion. However, implementing this method acrosscriteria and growth rates led to different results. The normal growth rate model was captured by Iand A Φ optimal designs but required more design points for the D Φ optimal design. Models withfast growth rates were always captured leading to the belief that a smaller design point budget5could be explored. On the other hand, slow growth rates required a larger design point budget nomatter the criteria used. The algorithm had a more difficult time distinguishing the dynamics ofthe slower growth rate.Bayesian optimal designs were also considered as a method for comparison. The ¯ U I -optimalitycriterion had more precise candidate points than the ¯ U A or ¯ U D optimality criteria. This led to theresults that only the ¯ U I Bayesian optimal design was able to capture the dynamics of the curve. Theother two designs required more design points to capture the dynamics of the model. ComparingBayesian sequential designs to the traditional criteria provided yet another technique available fordetermining the optimal design points in the region. Comparing these processes across criteriaand models suggests that there are several ways to design experiments. However, the approach ofsequential optimality is beneficial given that the procedure predicts the next optimal design point,which could be helpful when sampling budgets change.
In this paper, the ecological model known as the logistic equation was used based on the variousapplications and popularity of the model. The proposed design space tracked the change in pop-ulation dynamics across time and was explored using our novel approach of sequential optimality.When using prediction based criteria for a normal growth model, we captured the dynamics of thesystem and found an optimal design that translates to an optimal sampling regime for ecologists.Rather than sampling out of convenience or across an equal interval, we were able to demonstratea method that can provide the next optimal time to sample given the current state of the environ-ment. Real data cannot be incorporated at this time given that this paper introduces a new methodfor designing sampling regimes. However, the developed method does learn about processes in asequential manner and has the potential to assist ecologists when planning sampling schedules.Though the analysis is performed on a straightforward univariate model, we acknowledge thatmore complex dynamics exist and can represent more realistic environmental encounters. Incor-porating external factors into the model could lead to substantial improvements to our method.Furthermore, there are many design problems that can be investigated by a Bayesian approach.Specifically related to sequential optimality, this study focused on designs of a set size that explorea set window of time into the future. The method could be expanded upon by studying varying de-sign sizes and design windows. Also, sampling techniques used by ecologists when fishing, farming,or testing germination could be invasive and change the process. These changes could be taken intoconsideration as they may affect the model. In this paper, we focused on providing optimal designsbased on temporal models. However, in the future we could explore spatial models or create ahierarchical network of temporal models. Given these additional conditions, there are many levelsof uncertainty that we could incorporate to reflect the reality of an ecological system.
References
Abt M, Welch WJ (1998) Fisher information and maximum-likelihood estimation of covarianceparameters in gaussian stochastic processes. Canadian Journal of Statistics 26(1):127–137Atkinson A, Donev A, Tobias R (2007) Optimum experimental designs, with SAS, vol 34. OxfordUniversity Press ergee, Boone, Ghanam & Stewart-Kosterergee, Boone, Ghanam & Stewart-Koster