Applying Evolutionary Metaheuristics for Parameter Estimation of Individual-Based Models
by Antonio Prestes García and Alfonso Rodríguez-Patón
Abstract
Individual-based models are complex and usually have a large number of input parameters which must be tuned so that the model reproduces the observed population data or the experimental results as accurately as possible. Thus, one of the weakest points of this modelling approach lies in the fact that the modeler rarely has enough information about the correct values, or even the acceptable ranges, of the input parameters. Consequently, several parameter combinations must be tried in order to find an acceptable set of input factors minimizing the deviation between the simulated and the reference datasets. In practice, it is usually computationally unfeasible to traverse the complete search space, trying every possible combination, to find the best set of parameters. This is precisely an instance of a combinatorial problem which is suitable for being solved by metaheuristics and evolutionary computation techniques. In this work we introduce EvoPER, an R package for simplifying parameter estimation using evolutionary computation methods.
Introduction
Modeling and simulation is certainly a vast discipline with a broad and complex body of knowledge having, beyond the surface, a large technical and theoretical background (Minsky, 1965; Banks et al., 2009; Zeigler et al., 2000; Boccara, 2003) which, consequently, is hard to master completely for modelers coming from diverse domains like biology, ecology or even computer science. Among the existing formalisms, the agent-based or individual-based one has gradually gained adepts in recent years. Individual-based modeling is a powerful methodology which enjoys more and more acceptance among researchers and practitioners of distinct branches, from the social to the biological sciences, including specifically the modeling of ecological processes and microbial consortia studies. Certainly, one of the main reasons for the success of this approach is the relative simplicity of capturing micro-level properties, stochasticity and spatially complex phenomena without requiring a high level of mathematical background (Grimm and Railsback, 2005). But the counterpart of the ease of building complex and feature-rich models is the lack of a closed mathematical form of the model, which implies that these models cannot be studied analytically. Thereby, the only way to explore and adjust the parameters of this type of model is the brute-force approach: executing the model many times and evaluating the results of each execution.

The systematic search for the best set of model parameters is a costly task, for which there are basically two different types of approaches for exploring the solution space of simulation outputs. The first one uses some static sampling scheme based on the design of experiments (DoE), such as randomized, factorial or Latin hypercube designs (Little and Hills, 1978; Loh, 1996; Thiele et al., 2014), which works by generating a collection of sampling points of the parameter space which are subsequently evaluated.
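To make the static-sampling idea concrete, the sketch below (a generic Python illustration, not part of EvoPER; the parameter names are invented for the example) enumerates a full-factorial design over two hypothetical model parameters. Each resulting combination would then be fed to the simulation and scored:

```python
import itertools

def factorial_design(levels):
    """Enumerate every combination of the given parameter levels
    (a full-factorial design over the input space)."""
    names = list(levels)
    return [dict(zip(names, values))
            for values in itertools.product(*levels.values())]

# Hypothetical two-parameter model: 3 x 4 levels -> 12 runs to evaluate.
design = factorial_design({"growth_rate": [0.1, 0.5, 1.0],
                           "capacity": [10, 50, 100, 500]})
print(len(design))  # 12 parameter combinations
```

The combinatorial growth of such designs with the number of parameters is exactly why the guided-search alternative discussed next becomes attractive.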
Alternatively, parameter estimation can also be stated as an instance of an optimization problem and therefore addressed more conveniently using the whole arsenal of metaheuristics and evolutionary strategies. The main difference between these two ways of tackling the problem lies in the fact that the first one is fundamentally a static sampling technique, whereas the second is an intrinsically dynamic form of guided partial search over the solution space, where the set of initial solutions is continuously improved, hopefully converging towards a global optimum (Weise, 2009; Boussaïd et al., 2013). It turns out that, comparatively, the optimization approach may require fewer model evaluations to find the best, or at least an acceptable, solution for the parameter estimation problem which, in the case of models with an average complexity level, means a difference in the upper bound of the computational cost from hours to days.

The parameter estimation of individual-based models is a particularly hard instance of an optimization problem, as such models are highly stochastic and their parameter space most often tends to show nonlinear interactions, which hinders the localization of good parameter combinations producing the minimal deviation from the reference data. Differently from deterministic and closed-form models, where the most significant computational cost is due to the optimization algorithm itself, in the case of individual-based models the time required by every single model execution alone is responsible for most of the computational time taken by the optimization process. It is important to consider that some metaheuristics may produce better results than others, depending on the problem type and more
especially on the structure and characteristics of the model being analyzed. Therefore, before undertaking a full-length run, it is worth exploring different algorithms in order to find the best suited one for the problem instance.

In the next sections we briefly describe the scope of the parameter estimation problem and give usage examples of the EvoPER R package, which has been developed to facilitate the task of estimating the parameters of individual-based models. The current version of EvoPER includes implementations of the Particle Swarm Optimization (PSO), Simulated Annealing (SA) and Ant Colony Optimization (ACO) algorithms, developed exclusively for this work and adapted for use in the parameter estimation of agent-based models. We also introduce two simple evolutionary strategies implemented for exploratory analyses of the parameter space of individual-based models, which can be useful for mapping the promising zones of the solution space.

Parameter Estimation and Optimization
The terms model calibration and parameter estimation, although informally used interchangeably and functionally similar, are semantically distinct entities with different scopes and objectives. Therefore, to provide a more formal definition of these terms, let us briefly define the basic structure of a mathematical model. A model is normally expressed as some form of algebraic composition expressing the relationship between three element types, namely the independent variables, the dependent or state variables and, finally, the constants. For the sake of simplicity, a model expressing a linear relationship between variables is shown below,

y = α + βx,

where x and y are the independent and the state variable respectively, and α and β are the model constants. More generally, the structure of a stochastic model can be represented by the functional relation (Haefner, 1996) given by the expression shown in Equation (1),

y = f(x, p, ε),   (1)

where the terms x, p and ε denote respectively the vector of independent variables, the vector of model parameters and the vector of stochastic deviations. The model constants are referred to as the model parameters, which do not necessarily have any correspondence with some element of the system being modeled (Beck and Arnold, 1977). The direct problem is, knowing the model structure and also knowing the independent variables and the parameters, to estimate the value of the state variable. Of course, this oversimplified case is rarely seen when modeling real systems, especially when dealing with biological systems.
In addition, in most cases the constants and the independent variables are impossible to observe directly, the right model structure for representing the system under study being also unknown. Usually the only value elucidated experimentally, or backed by observations of some population data, is the state variable; therefore the parameters, which are the structural part of the model, must be estimated having as the only reference the measurements of the dependent variable. Hence, the term calibration can be defined as the procedure where the values of the state variable "y" are compared to known standard values, say "Y", which in the context of biological research are those sampled from population true values (Zeigler et al., 2000). On the other hand, parameter estimation is the task of estimating the values of the constants of a model, and it can be seen somehow as an inverse problem, since we are using the reference values Y in order to determine suitable values for the model constants (Ashyraliyev et al., 2009; Beck and Arnold, 1977). The parameter estimation procedure implicitly encompasses the calibration process since, in order to discover the values of the constants, the model outputs must be checked against the reference values. Thus the problem can also be stated as an optimization problem, because the process requires the search for the minimum of some function f(y_i, Y_i) measuring the distance between y_i and Y_i, which are the simulated and the reference values respectively.

The family of functions measuring how close the simulated and the reference values are is the goodness-of-fit metric of a model and is known as the objective function. The objective function facilitates the determination of how well the model is able to reproduce the reference data. In other words, the objective function provides a numerical hint about how close the output of the model is to the reference data.
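A typical objective function of this kind is the root-mean-square error. The short sketch below (a generic Python illustration, not the EvoPER implementation) scores how far a simulated series y_i lies from the reference series Y_i, with smaller values meaning a better fit:

```python
import math

def rmse(simulated, reference):
    """Root-mean-square error between simulated output y_i and
    reference data Y_i: the smaller the value, the better the fit."""
    assert len(simulated) == len(reference)
    return math.sqrt(sum((y - Y) ** 2 for y, Y in zip(simulated, reference))
                     / len(reference))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 -> perfect fit
print(rmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 1.0
```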
For any given model, a family of different objective functions can be defined over the output data, depending on the chosen distance metric and on what the target of the parameter estimation process is. More formally, the objective function can be defined as f : Rⁿ → R and can be further generalized for an individual-based model where, differently from a pure mathematical form,
the objective function domain may assume any valid computational type. Thus, being S the set of valid computational model parameters, the objective function can be rewritten as f : S → R. Therefore, every candidate solution x is instantiated from the solution set S, the best of them being known as the solution, or the optimum, and represented as x*. Hence, the target of the optimization process is to find the solution x* which minimizes the objective function, such that f(x*) ≤ f(x) for every x in the set X of all candidate solutions. It is worth mentioning that, although uncommon, optimization can also be defined as a maximization process. Another important aspect to note is that the objective function can be much more than a simple distance measurement; more complex tasks can be carried out using an algorithmic approach (Weise, 2009). We will illustrate that kind of approach, tuning a problem solution to make the model output oscillate with a fixed period, in the example section of this work.

There are fundamentally three approaches to defining the distance metrics of a model (Thiele et al., 2014). The first approach, based on using acceptable ranges for the model outputs, is the most straightforward one. It is also known as categorical calibration and works by defining intervals for the model output values: when the output falls inside the interval, it is considered as having a good fit. One of the main drawbacks of this approach is the fact that it is not possible to determine how close the model and the reference data are. The second metric relies on measuring the differences between simulated and observed values, least squares being the most commonly used method for computing the quality of fit (Beck and Arnold, 1977). Finally, the last approach requires the use of likelihood functions.
It is hard to implement and requires that the underlying distribution be known, which usually precludes its application to complex non-linear computer models.

The systematic exploration of the solution space, which is compulsory for the calibration process, requires many model executions, as well as many evaluations of the goodness-of-fit function over the output data, to find the best estimation for the model parameters, provided that they minimize the discrepancies between simulated and observed values. This is a computationally expensive task, especially in the case of individual-based models, as the problem bounds increase with model complexity and with the number of input parameters which must be tested. Roughly speaking, there are basically two different approaches for generating the sampling points required for estimating parameters. The first of them is based on the definition of sampling schemes such as Monte Carlo sampling, factorial designs or Latin hypercube sampling. These techniques work by generating an a priori set of samples in the search space, that is to say, a collection of parameter combinations which are subsequently used for running the model and evaluating the cost function (Thiele et al., 2014; Viana, 2013). Latin hypercube sampling is a generalization of the Latin squares randomization, a classical experimental design typically found in agricultural experimentation (Little and Hills, 1978). On the other hand, in the case of using optimization methods, only an initial set of points sampled from the input space is instantiated, and these solutions are updated dynamically, searching for neighboring solutions which could approximate better towards the minima.
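As an illustration of the first family of approaches, a minimal Latin hypercube sampler can be sketched as follows (generic Python, not EvoPER code; the parameter names are invented). Each parameter range is divided into n equally wide strata and each stratum is sampled exactly once, in shuffled order:

```python
import random

def latin_hypercube(bounds, n, rng=random.Random(42)):
    """Draw n points: each parameter's range is cut into n equal strata
    and every stratum is sampled exactly once, in shuffled order."""
    samples = {}
    for name, (lo, hi) in bounds.items():
        width = (hi - lo) / n
        strata = [lo + (i + rng.random()) * width for i in range(n)]
        rng.shuffle(strata)
        samples[name] = strata
    return [{k: v[i] for k, v in samples.items()} for i in range(n)]

points = latin_hypercube({"alpha": (0.0, 1.0), "beta": (10.0, 20.0)}, n=5)
```

Unlike a full-factorial grid, the number of model runs here grows with n rather than with the number of level combinations, which is why this scheme is popular for expensive simulations.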
The exact implementation details depend on the metaheuristic chosen for the parameter estimation process but, despite the diversity of existing metaheuristics, practically all of them can be functionally described and completely characterized by combining the building blocks contained in the pseudocode shown in Figure 1.

P ← Initialize()
f ← Evaluate(P)
while !terminate
    P′ ← Selection(P, f)
    P′ ← Recombination(P′)
    P′ ← Mutation(P′)
    f′ ← Evaluate(P′)
    (P, f) ← Replace(P, f, P′, f′)
end

Figure 1: The general outline of a metaheuristic optimization method. The pseudocode shows the initialization step followed by the main loop, where the initial solution is improved and guided through the search space using the value of the fitness or the solution cost.
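The building blocks of Figure 1 can be turned into a minimal working example. The Python sketch below (a toy illustration minimizing the sphere function, not EvoPER code) implements each block of the loop, with truncation selection, mean recombination, Gaussian mutation and elitist replacement:

```python
import random

rng = random.Random(1)

def evaluate(pop):
    """Fitness = value of the objective (here the sphere function, minimized)."""
    return [sum(x * x for x in ind) for ind in pop]

def evolve(dim=2, size=20, generations=200):
    P = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(size)]
    f = evaluate(P)
    for _ in range(generations):                  # while !terminate
        parents = sorted(zip(f, P))[: size // 2]  # Selection(P, f)
        Q = []
        for _ in range(size):                     # Recombination + Mutation
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 + rng.gauss(0, 0.1)
                     for x, y in zip(a[1], b[1])]
            Q.append(child)
        fq = evaluate(Q)                          # Evaluate(P')
        merged = sorted(zip(f + fq, P + Q))[:size]   # Replace(P, f, P', f')
        f, P = [m[0] for m in merged], [m[1] for m in merged]
    return min(f)

best = evolve()
print(best)  # near 0 (the sphere optimum)
```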
Despite the multiplicity of existing metaphors, inspired by many different sources ranging from physics or the collective behavior of some eusocial insects to musical theory (Sörensen, 2015), most of them are just slight variations over the basic evolutionary strategy skeleton. As can be seen in Figure 1, the metaheuristic structure contains a few operators, depicted in the algorithm by the functions
Selection(), Recombination(), Mutation() and Replace(); the other components are the Initialize() and Evaluate() functions. The selection function is responsible for picking, from the current population, the parents which will later be used for producing the offspring of the next generation. The selection process can be stochastic or can use the fitness metric for selecting breeding individuals. The objective
of the recombination process is to mimic, to some extent, genetic chromosomic recombination, mixing together n parents, with n ≥ 2. The role of the mutation operator, roughly speaking, is to generate stochastic random changes in the solution structure, providing the necessary variability for exploring the problem space. Finally, the replacement process selects, based on their fitness values, the individuals from the current solution which will be conserved in the next algorithm iteration, returning a tuple (P, f) with the individual solutions and their associated fitness. The accessory functions Initialize and Evaluate are required, respectively, for instantiating the initial solution and for evaluating the fitness metric of the provided solution set S. With respect to the termination condition of the algorithm, the most commonly used approach is a combination of a convergence criterion and a maximum number of iterations. It is worth saying that not all of these components are necessarily present in specific metaheuristics. For instance, simulated annealing (Kirkpatrick et al., 1983) uses a population of size one, therefore the selection and recombination processes are superfluous in this metaheuristic.

Metaheuristics for Parameter Estimation
In order to facilitate the parameter estimation task of individual-based models, we introduce the GNU R (R Core Team, 2015) package
EvoPER (Evolutionary Parameter Estimation for Repast), an open source project intended to facilitate the adoption and application of evolutionary optimization methods and algorithms to the parameter estimation of IBMs developed using the Repast Simphony framework (North et al., 2013). The EvoPER package is released under the MIT license, the binaries being available for download from CRAN (https://cran.r-project.org/web/packages/evoper/), and the complete source code for the project can be found on GitHub (https://github.com/antonio-pgarcia/evoper). The EvoPER package provides implementations of the most common and successful metaheuristic algorithms for optimization, specially crafted for searching the best combination of input parameters for individual-based models developed in Repast Simphony. The current version of the EvoPER package supports the Particle Swarm Optimization (PSO) (Kennedy and Eberhart, 1995), Simulated Annealing (SA) (Kirkpatrick et al., 1983) and Ant Colony Optimization (ACO) (Dorigo et al., 2006) algorithms for parameter estimation. We also plan to support more algorithms in successive versions. These metaheuristic algorithms use bio-inspired, natural or physical system analogies, each of them having subtleties making it suitable for different types of problems. Nonetheless, despite the differences in the chosen natural metaphor, all the algorithms share an important aspect, which is that the search space is traversed downhill while allowing uphill moves, so as to avoid getting trapped in a local optimum far from the global one.

The basic PSO algorithm uses the idea of particles moving in a multidimensional search space, the direction being controlled by the velocity. The velocity has two components, one towards the direction of the best value of particle p_i and the other towards the best value found in the neighborhood of particle p_i (Kennedy and Eberhart, 1995; Poli et al., 2007).
The behavior and convergence of the algorithm are controlled by the particle population size and by the φ1 and φ2 parameters, which respectively control the particle acceleration towards the local and the neighborhood best. The algorithm implementation and the default values for the algorithm parameters follow the guidelines and standard values facilitated by Clerc (2012), which have been shown to provide the best results.

The metaheuristic known as Simulated Annealing uses the idea of a cooling scheme to control how the problem solutions are searched. The algorithm generates an initial solution and then iterates, looking for neighboring solutions, accepting new solutions when they are better than the current solution, or with some probability P which is a function of the current temperature and of the cost of the solutions. Important parameters are the initial temperature T0, the final temperature and the cooling scheme (Kirkpatrick et al., 1983). In the implementation available in the EvoPER package, the default function for the temperature update is T ← αT, α being the parameter controlling how fast the temperature is decremented. In addition, there are other cooling methods readily available in the package, and users can also provide their own temperature decrement function.

The Ant Colony Optimization algorithm is settled over the computational metaphor of the stigmergy mechanism found in ant communities and used by the individuals for coordinating their activities in the search for food. Specifically, in the case of ant foraging behavior, stigmergy is implemented by the pheromone reinforcing system, where the most travelled way becomes the preferred one, owing to the proportional increment of pheromones deposited on the environment (Dorigo and Gambardella, 1997). The algorithm controls convergence with the pheromone update and pheromone evaporation processes. Evaporation avoids rapid convergence to a local optimum (Dorigo et al., 2006).
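The pheromone bookkeeping just described can be illustrated with a toy sketch (generic Python, not the EvoPER implementation; the deposit amounts and edge names are arbitrary). Deposits reinforce the trails the ants actually used, while a multiplicative evaporation factor lets unused trails fade and so prevents premature convergence:

```python
def update_pheromone(tau, deposits, evaporation=0.5):
    """One ACO pheromone update: every trail first evaporates by a
    constant factor, then ants deposit pheromone on the edges they used."""
    return {edge: (1 - evaporation) * level + deposits.get(edge, 0.0)
            for edge, level in tau.items()}

# Two edges start with equal pheromone; only edge "a" keeps being reinforced.
tau = {"a": 1.0, "b": 1.0}
for _ in range(10):
    tau = update_pheromone(tau, {"a": 1.0})
print(tau)  # "a" approaches 2.0, "b" decays towards 0
```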
The standard version of the ACO algorithm is well suited to discrete combinatorial problems, but its application to continuous problems requires some tweaking. Thus, to cope with these limitations, an extension generalizing ACO to continuous domains, denominated ACO_R,
has been proposed (Socha and Dorigo, 2008). That extension, while keeping the underlying idea, replaces the pheromone system with an equivalent structure called the solution archive, which stores the solutions s_l, the results of the function evaluations f(s_l) and, finally, the weights ω_l (Socha and Dorigo, 2008), for l = 1, . . . , k, where k is an algorithm parameter configuring the size of the solution archive. Finally, another component of the algorithm is the Gaussian kernel, which is sampled for updating the solutions. The kernel contains k Gaussian functions, one for each row l of the solution archive.

The two metaheuristics introduced in this work, ees1 and ees2, are simple strategies for tackling the high computational cost of executing complex individual-based models, minimizing the number of iterations required for ensuring that the algorithm has found an acceptable optimum value for the combination of model parameters when there is no information about their physically relevant ranges. We developed ees1 and ees2 for analyzing the parameter space of our own individual-based models (Prestes García and Rodríguez-Patón, 2015) of conjugative plasmid (Arutyunov and Frost, 2013) dynamics within bacterial colonies. The underlying idea behind these metaheuristics is that we are interested in keeping track of, and mapping, the visited search space, rather than getting a point estimate of the best value, which may not be completely suitable for individual-based models because of the high stochasticity of the model output response.
The obvious alternative for dealing with the variability of the model output is increasing the number of replications for each parameter combination, but that would increase the execution time so much as to render the approach impractical without parallelizing and distributing the load across several computing nodes.

The ees1 metaheuristic introduced in this work stands for EvoPER Evolutionary Strategy 1, and it is an instance of a custom evolutionary strategy which can be described in the commonly used notation as a (µ/ρ + λ)-ES with µ = λ, which basically means that in every generation only the fittest individuals, or some suboptimal individuals selected with a probability P, taken from the parents and offspring collection, will become parents for the next generation. The mating selection process implemented in ees1 chooses half of the existing µ parents to be used in the recombination round. The algorithm has a parameter for parent selection which also uses the Greek letter µ, but it must not be confounded with the term used previously for describing the parent number in the evolutionary strategy descriptive notation. The parents are sorted by their fitness values and they are selected with an exponentially decreasing probability weight, calculated using the expression P(µ)^k, for k = 1, . . . , N, where P(µ) is the probability of selecting individuals with a suboptimal fitness; that is to say, when the values of P(µ) are small the solutions with the best fitness are strongly preferred, and as the value of P(µ) tends to 1 the selection becomes a random process where the individuals are selected using a uniform deviate. See Figure 2
for visualizing how the parameter µ affects the probability of picking an individual having higher cost values.

The selected parents are used for calculating the geometric average which, in turn, serves as the centroid measure C for the recombination process, in the sense that all individuals in the current solution population are recombined by calculating the arithmetic mean of the solution value and C. For executing the recombination process two different approaches are used, one of them being chosen with a probability P and the other with probability 1 − P. The first approach uses a weight value calculated as w_k = f(x_k) / Σ_{k=1..N} f(x_k) for computing the recombination component R = (x_k + C) · w_k; the new value of x_k is the average (x_k + R)/2. The second approach, selected with probability 1 − P, is simply the average of x_k and C. More details on the algorithm can be seen in Figure 3.

The last component of the ees1 metaheuristic is the environmental selection, where the fittest individuals are selected for being part of the next generation, replacing the current solution elements. The best of all individuals from the current population is always chosen to participate as a parent in the subsequent algorithm iteration, and the remaining population of suboptimal individuals can be replaced by new individuals with a worse fitness value with a probability P(S) which represents the strength of the selective pressure. Thus, the range of the κ parameter may vary between 0 ≤ κ ≤ 1, the lower bound being the absence of selective preference and the upper bound the maximum selective pressure over the population. In other words, within the κ parameter range, the selection process executes downhill movements with a probability P(1 − κ).

The second algorithm, ees2, is a very simple metaheuristic based on gradually reducing the initial dimensions of the search space. The algorithm works by taking sample points of the parameter space using the Latin hypercube sampling scheme and allowing them to vary over their full range. Thus, at each iteration step, the initial set of problem solution points is further refined, using the defined fitness metric as guidance, by reducing the full span of variation of each of the analyzed input factors. The parameters of this algorithm are the population size N and the fraction ρ of the population size which will be used for calculating the new ranges for the model parameters. The default values of these parameters
Figure 2: The probability P(µ)^k of selecting an individual having a suboptimal fitness value.
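The rank-based, exponentially decreasing weighting described above can be sketched in a few lines (generic Python, not the EvoPER implementation; the population and counts are invented for the example). Individuals are ranked best first, rank k receiving weight µ^k:

```python
import random

rng = random.Random(3)

def pick_parent(sorted_pop, mu):
    """Rank-based selection: the individual at rank k (best first) gets
    weight mu**k, so a small mu strongly favours the fittest, while
    mu -> 1 approaches uniform random selection."""
    weights = [mu ** k for k in range(len(sorted_pop))]
    return rng.choices(sorted_pop, weights=weights, k=1)[0]

pop = ["best", "second", "third", "fourth"]
greedy  = sum(pick_parent(pop, 0.1) == "best" for _ in range(1000))
uniform = sum(pick_parent(pop, 0.999) == "best" for _ in range(1000))
print(greedy, uniform)  # roughly 900 vs 250 out of 1000 draws
```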
N ← …   µ ← …   ρ ← …   κ ← …   iterations ← …
P ← LatinHypercubeSampling(p, N)
f ← Evaluate(P)
S ← Sort({P, f})
while i < iterations
    mates ← select N/2 individuals S_i, each if (U < P(µ, i))
    r ← 2/N
    G ← (∏ mates)^r                      # geometric mean of the mates
    for i = 1 to N
        w ← f(S_i) / Σ f(S_i)
        if (U < P)
            S′_i ← (S_i + (S_i + G)·w) / 2
        else
            S′_i ← (S_i + G) / 2
        end
    end
    for each component (S′_i)_p
        if (U < ρ)
            (S′_i)_p ← (S′_i)_p + U      # mutation
    end
    if f(S′) better than f(S) or U < κ
        S ← S′
    end
end

Figure 3: The pseudocode of evolutionary strategy 1, ees1. The algorithm encompasses the standard components of an evolutionary strategy. First, the initial set of solutions is instantiated and evaluated. Subsequently, inside the main loop, N/2 individuals are selected with a probability P for estimating the geometric mean, which is then recombined with the current solution population using a fitness-weighted arithmetic mean. The next step is the mutation process, consisting in making random changes in solution components with a probability ρ, p being the number of model parameters. Finally, if the new solution improves the fitness, the current best solution is updated. The current best population solution S_i, for i > 1, is also updated with worse solutions with a probability κ.

The default values of these parameters are N =
100 and a fraction ρ typically between 5 and 10% of the population size (see below). The inputs of the algorithm are the input parameter vector p, which contains tuples with the ranges of values of the input factors, and the population size N. Once initialized, the fitness function is evaluated and the results are added to the solution S, sorted using the fitness value as sort key. Thus, the best input parameter combinations for the problem are in the initial rows of S. The solution S is a matrix with m = N rows and n = |X| +
1 columns, N being the parameter population size and X the set of input parameters; hence every row of S holds an instance of the problem input parameters and the fitness value of that parameter combination. The next section of the algorithm is the main loop, where the search region is updated at every iteration considering the first k values from the solution matrix. The number k is calculated using the parameter ρ and usually should be between 5 and 10% of the population size N. With this subset of the solution matrix, the value of the interval I is calculated as the arithmetic average of the minimum and maximum values of the first k elements of the solution matrix S. The minimum and maximum values of each parameter range are then calculated using the average of the first k values of the solution matrix S, the interval and a small random perturbation; the new vector of input parameters p is then used for generating a new round of sampling points using the Latin hypercube. The new population is evaluated and combined with the current solution S. The new solution is sorted and a new iteration takes place, using again the
first k values from the solution matrix. As can be observed, the algorithm is quite simple but effective for mapping promising zones of the solution space with a relatively small number of iterations. It has not been extensively tested yet, but when applied to standard optimization problems it produces consistent results.

ρ ← …   N ← …   iterations ← …
k ← trunc(N · ρ)
P ← LatinHypercubeSampling(p, N)
f ← Evaluate(P)
S ← Sort({P, f})
while i < iterations
    P ← first k rows of S
    m ← (min(P) × max(P))^r              # r = 1/2
    I ← |min(P) + max(P)| / 2
    if U < 0.5
        p_min ← m − I − U
        p_max ← m + I + U
    else
        p_min ← m − I·U
        p_max ← m + I·U
    end
    P ← LatinHypercubeSampling(p, N)
    f ← Evaluate(P)
    S ← S add {P, f}
    S ← Sort(S)
    i ← i + 1
end

Figure 4: The complete pseudocode of the ees2 metaheuristic. This algorithm uses Latin hypercube sampling together with a solution cost metric for shrinking the solution search space towards the probable solution zone.

The package was designed following an object-oriented approach, being structured around an entry point function, a class hierarchy representing an abstraction of the objective function to be optimized, and a class encapsulating the return type of the optimization methods. The classes abstracting the objective function are the basic input for the optimization algorithms available in the EvoPER package and can be extended for supporting other simulation platforms. There is a parent class called ObjectiveFunction with two subclasses, namely the
PlainFunction and the
RepastFunction. The purpose of the first subclass is to allow users to run the optimization algorithms on their own mathematical functions, which can be useful for testing purposes or for wrapping other types of simulation subsystems. The second subclass encapsulates the Repast model calls for executing the chosen optimization algorithm for estimating the model parameters. The entry point function returns an object instance of the
Estimates class. A brief description of the package classes and the main methods is given in Table 1; for a complete and updated reference please refer to the package manual.
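The shape of this class design can be mimicked in a few lines. The sketch below is an illustrative Python analogue, not the real R API: the class and method names (PlainFunction, Parameter, Evaluate, Estimates, getBest) mirror those listed in Table 1, while the random_search driver is a made-up stand-in for the package's entry point function:

```python
import random

class PlainFunction:
    """Minimal analogue of a plain-function wrapper: a user function
    plus named parameter ranges (illustrative only, not EvoPER)."""
    def __init__(self, fn):
        self.fn, self.ranges = fn, {}
    def Parameter(self, name, minimum, maximum):
        self.ranges[name] = (minimum, maximum)
    def Evaluate(self, **params):
        return self.fn(**params)

class Estimates:
    """Holds every visited point plus the best solution found."""
    def __init__(self):
        self.visited = []
    def record(self, params, cost):
        self.visited.append((params, cost))
    def getBest(self):
        return min(self.visited, key=lambda v: v[1])

def random_search(objective, n=200, rng=random.Random(0)):
    """Hypothetical driver standing in for an optimization method."""
    est = Estimates()
    for _ in range(n):
        p = {k: rng.uniform(lo, hi)
             for k, (lo, hi) in objective.ranges.items()}
        est.record(p, objective.Evaluate(**p))
    return est

f = PlainFunction(lambda x, y: (x - 1) ** 2 + (y + 2) ** 2)
f.Parameter("x", -5, 5)
f.Parameter("y", -5, 5)
best_params, best_cost = random_search(f).getBest()
```

The point of the design is visible even in this toy: the optimizer only ever talks to the wrapper's Evaluate method and parameter ranges, so swapping a plain function for a full Repast model is transparent to the search algorithm.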
ArXiv.org ISSN 2331-8422
Table 1: The partial structure of EvoPER classes for encapsulating the target of parameter estimation.
Class name         Methods            Description
ObjectiveFunction                     The base class in the hierarchy, providing the skeleton for running the optimization algorithms.
                   Parameter          Adds a parameter with a range between a minimum and a maximum value. The parameters must coincide with those defined in the model.
                   getParameter       Returns a previously defined parameter. The companion method getParameterNames returns a list with the names of all user-defined parameters.
                   Evaluate           Wraps the objective function evaluation; must be overridden by all classes extending the parent class ObjectiveFunction.
                   EvalFitness        Executes the objective function and returns the results. User code should prefer this method for executing the objective function.
                   RawData            Returns the complete raw output of the objective function whenever it is available.
                   stats              Provides some basic statistics for the ObjectiveFunction execution.
PlainFunction                         Allows the optimization of plain functions implemented in R.
                   initialize         Must be overridden in subclasses of ObjectiveFunction; responsible for bootstrapping the real implementation of the target function. In the case of PlainFunction it takes any user-provided R function as parameter.
                   Evaluate           Overrides the superclass method with the specific function call.
RepastFunction                        Wrapper for the Repast model objective function.
                   initialize         Wraps the Repast model constructor. Requires the model directory, an aggregated data source, the total simulation time and a user-defined cost function.
                   Evaluate           Overrides the superclass method with the specific function call.
Estimates                             The standardized return type for all optimization methods available in the package. An initialized instance stores the best value ever found during the metaheuristic execution, the list of best values for every algorithm iteration and the complete collection of points visited in the solution space, which can be particularly useful for mapping the promising zones of the solution space.
                   getBest            Returns the best value ever found for the objective function.
                   getIterationBest   Returns a list with the best values found for every iteration.
                   getVisitedSpace    Returns a collection containing the results of all evaluations of the objective function.

The object-oriented approach allows the easy extension of the package to other types of individual-based modeling tools or methods. As can be seen in Table 1, the only requirement to apply the methods contained in the EvoPER package is to extend the
ObjectiveFunction class and override the Evaluate method to support the new parameter estimation target. One of the useful aspects of the EvoPER implementation is the possibility of specifying constraints on the search space by individually setting lower and upper bounds for every parameter being analyzed, using the
ObjectiveFunction$Parameter(name, min, max) method. That is an important point for limiting the parameter values to the biologically acceptable range.

The workflow for carrying out the parameter estimation consists in a simple sequence of steps. First, an object instance of any of the
ObjectiveFunction subclasses must be created and properly initialized. As mentioned previously, there are currently two options available for parameter estimation: one for
simple functions, which can be used for testing purposes (PlainFunction), and another for estimating the parameters of Repast models (
RepastFunction). Once the objective function has been initialized, the required parameters must be provided with the appropriate lower and upper bounds. Finally, the extremize function can be applied to the previously defined function. Its required arguments are the optimization method and the objective function instance; a third, optional parameter provides custom options for the underlying optimization method.

The optimization functions available are shown in Table 2 to provide an overview of the package contents. The EvoPER package is still in an early phase of development, therefore the list could change over time. The package manual will be the most up-to-date source of information on the package contents.
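The workflow above can be condensed into a short sketch. The sphere function and its bounds are made-up illustrative choices, but the PlainFunction, Parameter and extremize calls follow the package API described in this section:

```r
library(evoper)  # assumes the EvoPER package is installed

# Step 1: a made-up objective function to be minimized (illustration only)
sphere <- function(x1, x2) { x1^2 + x2^2 }

# Step 2: wrap it in a PlainFunction instance
objective <- PlainFunction$new(sphere)

# Step 3: constrain the search space with per-parameter bounds (arbitrary here)
objective$Parameter(name = "x1", min = -10, max = 10)
objective$Parameter(name = "x2", min = -10, max = 10)

# Step 4: run a metaheuristic; extremize returns an Estimates instance
results <- extremize("saa", objective)
results$getBest()
```

The same four steps apply unchanged when the objective is a RepastFunction wrapping a simulation model instead of a plain R function.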
Table 2: An overview of the most relevant EvoPER functions, listing the currently implemented metaheuristic methods for parameter estimation.
Function    Description
extremize   The entry point for all available parameter estimation methods; it should be preferred over direct calls to the underlying functions. It has three parameters, the first two required and the third optional. The first parameter is a string indicating the metaheuristic algorithm; currently accepted values are ("pso" | "saa" | "acor" | "ees1" | "ees2") for particle swarm, simulated annealing, ant colony, evolutionary strategy 1 and evolutionary strategy 2 respectively. The next version will also include genetic algorithms (GA) and Tabu Search (TS). The second parameter is an instance of the objective function and the third is an instance of the algorithm-specific subclass of the Options class; if not provided, the default options for the metaheuristic are used. The extremize function returns an initialized object instance of the Estimates class containing the results of the optimization method.
abm.acor    Implements the Ant Colony Optimization for continuous domains. The function requires an instance of a subclass of ObjectiveFunction and accepts an optional instance of OptionsACOR. Currently there are two subclasses of ObjectiveFunction, one for optimizing plain R functions (PlainFunction) and another for Repast models (RepastFunction).
abm.pso     The function call for running the Particle Swarm Optimization method. It is necessary to provide a subclass of ObjectiveFunction and optionally an instance of OptionsPSO. If no options are given, a default instance is used for the maximum iterations, the swarm size, the acceleration coefficients, the inertia weight or constriction coefficient and the neighborhood type.
abm.saa     The implementation of the Simulated Annealing algorithm. As in the previous cases, the function requires an instance of the objective function and accepts an instance of OptionsSAA. The options class has acceptable default values for the initial temperature, the minimum temperature, the temperature length, the cooling ratio, the neighborhood distance as a fraction of the parameter range, and the neighborhood function.
abm.ees1    The EvoPER Evolutionary Strategy 1 (ees1) is a simple evolutionary strategy which uses the geometric mean as the focal point for constructing the recombination model for the next generation of candidate solutions. The metaheuristic allows the configuration of several parameters, namely the solution size (N), the mating selection strength (μ), the mutation rate (ρ) and the selective pressure (κ). The default values can be changed by providing an instance of the options class OptionsEES1 with the desired values.
abm.ees2    Not exactly an evolutionary strategy stricto sensu, because the new generation of solutions is not created directly from the parent solutions; instead, the parents are used for searching the new range of the solution parameter space. The algorithm is based on reducing the initial parameter space and generating new solutions with the new ranges at each iteration. The solutions are generated using the Latin hypercube sampling scheme. The configurable parameters are basically the population size N, the number of algorithm iterations and the selection ratio ρ, which specifies the fraction of N used to estimate the new boundaries. The default values for both parameters are 10 and 100 respectively. To modify these settings, it is necessary to provide an instance of the options class OptionsEES2 with the desired values. It is intended to provide an acceptable approximation for parameter estimation in fewer model executions.

The particle swarm optimization metaheuristic implementation requires a topological neighborhood function which provides the structure of the swarm particles, allowing the algorithm to select the best position in the solution search space. The package provides three different implementations for the neighborhood selection: pso.neighborhood.K2, pso.neighborhood.K4 and pso.neighborhood.KN. The first topology function returns the two neighbors of a solution particle x_i, namely the particles x_{i-1} and x_{i+1} on a ring topology Zambrano-Bigiarini et al. (2013). The second function returns four neighbors of particle x_i using a von Neumann neighborhood function. Finally, the last function returns a complete graph with the whole set of particles; the default implementation uses the entire population as the neighborhood. In addition to these functions, it is also possible to provide a user-defined neighborhood function by creating a non-default instance of the OptionsPSO class and
passing the reference to the alternative implementation using the neighborhoodFunction() method of the Options class. The neighborhood function signature is shown in Figure 5. The function is invoked inside the PSO code, which passes the position of the current particle i and the size of the particle population n; the function must return a collection of integers with the neighbor positions, grouped with the R c() call.

    myneighborhood <- function(i, n) {
      c()
    }

Figure 5: The function signature for a custom particle swarm neighborhood. The function can return any number of neighbors of i and the returned values must not be greater than n.

Most of the aspects implemented in the optimization code are standard and perhaps the only points which are specific to the EvoPER package are the neighborhood functions pso.neighborhood.K4 and saa.neighborhood. The von Neumann neighborhood for particle swarm optimization is generated using a topology created by converting the linear collection of particles to a matrix with the R code m <- matrix(seq(1,N), nrow=ceiling(sqrt(N))), where N is the swarm size.

The generation of neighborhood solutions for simulated annealing has been approached with the following logic: first, randomly pick the parameters to be perturbed, then update them using one of two different paths selected at random, preferring the second with a probability P. The first path uses the expression

    S' = Δ · Z + S̄,

where S' represents the new solution, Δ the standard deviation calculated as the range of parameter p_k multiplied by the algorithm parameter distance d, and S̄ is the arithmetic average between the minimum and maximum allowed values of parameter p_k. The second one uses the expression

    S' = S + S · U(-1, 1),

where S', S and U are respectively the new neighbor solution, the current solution, and a uniform random number in [-1, 1].

The simulated annealing algorithm also needs a function for generating other points in the solution space close to the current solution. The currently available neighborhood functions for perturbing the best solution are saa.neighborhood1, saa.neighborhoodH and saa.neighborhoodN. The difference between these implementations is basically the number of problem dimensions to be perturbed: the first function alters just one element of the current solution, the second changes half of the solution elements at a time and the last one modifies all dimensions of a solution. The solution components to be perturbed are chosen randomly. Again, it is possible for package users to provide their own implementation of the neighborhood function.

The package provides acceptable default values for most of the parameters related to the optimization method in use. In spite of the fact that the parameter estimation functions can be called directly, users should use the function extremize(m, f, o), which is the standard entry point for the optimization methods. As mentioned previously, the function has three parameters, which are respectively the method (m), the objective function (f) and the options (o); only the first two are required. When the options parameter is not provided, the default values are used. If settings different from the default values are required, the user must pass an instance of the corresponding options class. For example, if more iterations are required for the PSO method, an instance of OptionsPSO must be created and the method setValue("iterations", value) called with the appropriate value. Many other parameters can be customized in order to fit the specific needs of the model being analyzed, such as the neighborhood functions or the temperature update for the simulated annealing.
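To make the customizable pieces concrete, the following base-R sketch reimplements the ring (K2) selection and the multiplicative perturbation rule S' = S + S·U(-1, 1). These are illustrative reimplementations of the logic described above, not the package internals:

```r
# Ring (K2) topology: the neighbors of particle i are i-1 and i+1,
# wrapping around at the ends of the linear particle collection.
ring.neighborhood <- function(i, n) {
  c(ifelse(i == 1, n, i - 1), ifelse(i == n, 1, i + 1))
}

# Multiplicative simulated annealing rule S' = S + S * U(-1, 1),
# here applied to every component of the solution vector.
perturb.solution <- function(s) {
  s + s * runif(length(s), min = -1, max = 1)
}
```

A function with the same (i, n) signature as ring.neighborhood is what could be handed to the OptionsPSO neighborhoodFunction() hook mentioned above.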
Discussion
In this section, we show some small, illustrative examples of how to use the EvoPER package for estimating the parameters of different kinds of models. The first example covers the parameter estimation required for minimizing standard functions employed for testing optimization methods. The second example shows how to adjust the output of a real individual-based model to match experimental data. The next one shows how to tune the model output to oscillate with a specific user-defined period. Finally, the last one gives an example of how to explore the parameter solution space for obtaining a landscape of suitable solutions for the problem being addressed.

The neighborhood functions currently implemented allow choosing from 1, n/2 or n, where n is the number of parameters which will be perturbed. The implementation can be easily extended to accommodate any user-defined neighborhood algorithm.
Optimizing simple functions
It is worth mentioning that although the package is oriented to the application of evolutionary optimization methods to the parameter estimation of models developed with Repast Simphony, it can also be used to minimize mathematical functions, as well as extended to other individual-based modeling frameworks. In the following example, shown in Figure 6, we demonstrate the package usage by applying it to the two-variable Rosenbrock function.

    rm(list = ls())
    library(evoper)
    set.seed(161803398)

    rosenbrock2 <- function(x1, x2) { (1 - x1)^2 + 100 * (x2 - x1^2)^2 }

    objective <- PlainFunction$new(rosenbrock2)
    objective$Parameter(name = "x1", min = -…, max = …)
    objective$Parameter(name = "x2", min = -…, max = …)
    results <- extremize("pso", objective)

Figure 6: A simple example of minimizing the Rosenbrock function with the EvoPER package.
As can be seen in Figure 6, step 1 shows the definition of a simple function to be minimized; step 2 demonstrates how to create an instance of the PlainFunction class; in step 3 the range for each function parameter is provided; and finally in step 4 the EvoPER extremize function is used to minimize the objective function. The results of running the example are shown in Figure 7, where the estimated parameters, the value of the fitness function, the execution time and the number of times the function has been evaluated can be seen.

    > system.time(results <- extremize("pso", objective))
       user  system elapsed
          …       …       …
    > results$getBest()
       x1   x2   pset   fitness
        …    …      …         …
    > objective$stats()
         total_evals converged tolerance
    [1,]        2416         1 2.013409e-…

Figure 7: The R console output session showing the results of running the previous example.

Tuning oscillations
Oscillatory behavior is a structural component of many types of systems requiring timers for controlling and coordinating their processes. It is an integral part of several types of systems, ranging from electronic circuits to ecological or biological processes, which normally rely on circadian clocks for faithfully regulating all their internal activities and interactions with the environment. Therefore, the design of synthetic biological oscillatory circuits is an important research subject, and tuning these circuits to oscillate with a precise period is a cumbersome trial-and-error activity Khalil and Collins (2010). Fortunately, tuning the model parameters for finding the desired oscillatory behavior can be expressed as an optimization problem which can be solved using evolutionary algorithms. It is worth mentioning that the parameters estimated for any model are just a starting point for the wet-lab work, because reality is extremely stubborn, insisting on not working in line with the values estimated by the model.

To illustrate how to turn this into an optimization problem, we introduce the example shown in Figure 8, where the cost function is crafted for tuning the model parameters to accomplish a specific output. Specifically, a simple toy model representing the Lotka-Volterra ordinary differential equation system, also known as the predator-prey model, is presented, and we want to estimate the parameters required to make the output oscillate with a specific period. The model, despite
being developed for modeling the predator and prey relationship, has a broad range of applications and can be used for representing many types of ecological and biological interactions Shonkwiler (2008). Additionally, the standard model can be extended to support N-species interactions.

Table 3: The results of applying the particle swarm optimization metaheuristic for period tuning. The table shows the optimum parameter values for each oscillation period.
Period   c1          c2          c3          c4          cost
12       1.798102    1.618035    1.192361    1.453045    0
24       0.675586    1.375913    1.169076    0.8311187   0.04166667
48       0.4558475   0.4602389   1.192546    0.5483637   0
72       0.3297914   0.4675479   1.650108    0.778639    0

The standard predator-prey model has four parameters which must be estimated, as can be seen in Equation (2),

    dx/dt =  c1·x − c2·xy
    dy/dt = −c3·y + c4·xy,                                  (2)

where the terms c1, c2, c3 and c4 are the model parameters to be estimated.

Exploring the solution space
The particularities of individual-based models which make them so appealing for modeling populations and ecosystems, such as structural realism, predictive power, individual-level stochasticity and the emergence of complex global dynamics from elemental interactions (Grimm et al., 2016), also imply that parameter estimation becomes a complicated matter even when using approximate techniques such as those presented in this work. Even simple models contain many levels of uncertainty, certainly present nonlinear behavior and possibly discontinuities; consequently it is very complicated, using a computationally tractable number of model executions, to make sure that the solution converges to an optimum which is close to the best solution. Normally, modelers do not have enough information about all model parameters, and it is not uncommon to make assumptions or educated guesses for the acceptable ranges considering the physical or biological constraints. Of course, this is not a random choice, but it is far from being a perfect process and, although the guessed parameter ranges are hopefully within the same order of magnitude as their real values, they may diverge by a factor of two or three (Mil et al., 2016), which may produce odd results when adjusting several parameters.

It is worth noting that a perfect match between the model predictions and the experimental data should not be expected; consequently it is very unlikely that the optimization algorithms converge under the tolerance levels normally used for the numerical optimization of plain functions. One way to tackle this situation is to define the objective function for the parameter estimation using a categorical approach with a less strict range of acceptance, but that may lead to losing information which may be relevant, giving the false feeling that the right parameter combination has been found.
That is a serious issue which may make it impossible to draw any conclusion from the parameter estimation results. A possible alternative is not to rely exclusively on the best value ever found for the cost function but instead to leverage the intermediate results, using them to build a landscape of the solution space. The metaheuristics described in this work have a slight modification for saving the partial best results of every iteration and the complete set of points visited in the problem solution space, which are made available as two methods of the Estimates class, respectively getIterationBest() and getVisitedSpace(). Using these two methods the solution space can be mapped for
    rm(list = ls())
    library(evoper)
    set.seed(2718282)
    ...
    f0.periodtuningpp <- function(x1, x2, x3, x4, period) {
      v <- predatorprey(x1, x2, x3, x4)
      np <- naiveperiod(ifelse(v[, "y"] < 0, .Machine$double.xmax, v[, "y"]))
      rrepast::AoE.NRMSD(np$period, period)
    }
    ...
    objective <- PlainFunction$new(f0.periodtuningpp72)
    objective$Parameter(name = "x1", min = 0.2, max = 2)
    objective$Parameter(name = "x2", min = 0.2, max = 2)
    objective$Parameter(name = "x3", min = 0.2, max = 2)
    objective$Parameter(name = "x4", min = 0.2, max = 2)
    results <- extremize("pso", objective)

Figure 8: Tuning the oscillation period of the predator-prey model. The listing has five sections identified by the tags
Step 0 to Step 4. The first section consists in loading the library and setting the random seed. The next section defines the cost function, which solves the initial value problem with the provided parameters and feeds the ODE results to the function named naiveperiod for finding the period in the differential equation output, which is later compared with the reference period using a normalized root mean square deviation (AoE.NRMSD). The subsequent sections encompass the initialization of the function to be optimized, which is a wrapper for the previously defined cost function; the definition of the range of variation for the model parameters; and finally the application of the metaheuristic with the extremize function call.

    > system.time(results <- extremize("pso", f))
       user  system elapsed
          …       …       …
    > results$getBest()
       x1   x2   x3   x4   pset   fitness
        …    …    …    …      …         …
    > f$stats()
         total_evals converged tolerance
    [1,]         672         1 2.013409e-…

Figure 9: The R console output session showing the results of running the predator-prey model in Figure 8.

viewing the most promising zones. The code in Figure 11 shows how to generate a contour plot of the solution space for a four-variable instance of the
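The predatorprey helper used in Figure 8 solves the initial value problem of Equation (2). A minimal base-R stand-in, using a fixed-step RK4 integrator with assumed initial conditions and time grid, could look as follows (an illustration, not the actual helper shipped with the examples):

```r
# Base-R stand-in for the predatorprey() helper of Figure 8: the
# Lotka-Volterra system of Equation (2) integrated with a fixed-step
# RK4 scheme. Initial conditions, time horizon and step size are
# assumptions made for illustration.
predatorprey.sketch <- function(c1, c2, c3, c4, x0 = 10, y0 = 5,
                                tmax = 100, dt = 0.01) {
  deriv <- function(s) {
    c( c1 * s[1] - c2 * s[1] * s[2],   # dx/dt =  c1*x - c2*x*y
      -c3 * s[2] + c4 * s[1] * s[2])   # dy/dt = -c3*y + c4*x*y
  }
  n <- round(tmax / dt)
  out <- matrix(0, nrow = n + 1, ncol = 3,
                dimnames = list(NULL, c("time", "x", "y")))
  s <- c(x0, y0)
  out[1, ] <- c(0, s)
  for (i in seq_len(n)) {
    k1 <- deriv(s)
    k2 <- deriv(s + dt / 2 * k1)
    k3 <- deriv(s + dt / 2 * k2)
    k4 <- deriv(s + dt * k3)
    s <- s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    out[i + 1, ] <- c(i * dt, s)
  }
  out
}
```

The y column of a matrix like this is exactly the kind of signal the naiveperiod function in Figure 8 scans for oscillation periods.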
Rosenbrock function. The complete plot generated with the fourth step of the command sequence provided in Figure 11 is shown in Figure 12. These contour plots facilitate mapping and visualizing the promising zones of the solution space using the fitness of the generated solutions as the z value. The model parameters are arranged two by two, making it easy to find the zone where the best solution for adjacent parameters may be situated. The algorithm employed was the Ant Colony Optimization for continuous domains, which did not converge, and the complete execution required approximately 32K model evaluations, as the default options for the metaheuristic are 500 iterations with a population of 64 ants. Such a number is certainly not acceptable for a costly individual-based model, which may require from several hours to days for such a high number of evaluations. Thus, it is necessary to tune the metaheuristics to reduce the total number of iterations or, alternatively, to use the algorithm introduced in this work, ees2, for making an initial tour of the solution landscape. The ees2 is not intended to find the minima; instead, it is well suited for partitioning the problem space towards the good solution zones. Figure 13 shows the contour plots generated using ees2, which required just 600 model evaluations for reducing the solution zone. The global minimum of the Rosenbrock function of N variables is zero and is found by setting all variables to 1. Figures 12 and 13 demarcate the possible zones where the problem solution can be found.
[Four time-series panels of the x and y species: (a) pso, period 12; (b) pso, period 24; (c) pso, period 48; (d) pso, period 72 time units.]

Figure 10: An example of tuning the oscillation periods of the predator-prey model. In this figure we can observe how the x and y species, respectively the prey and predator components, oscillate with different periods. The example uses the particle swarm optimization metaheuristic for finding the parameter combinations required to make the model produce oscillations with periods of 12, 24, 48 and 72 time units, as can be seen respectively in subfigures (a), (b), (c) and (d). As can be observed, the objective function of the metaheuristic can be tweaked for generating any desired output behavior. The most common use is to assess the quality of fit between simulated and experimental data, but it is not limited to that and can be used to find parameter combinations which generate practically any global behavior.
    rm(list = ls())
    library(evoper)
    set.seed(161803398)

    objective <- PlainFunction$new(f0.rosenbrock4)
    objective$Parameter(name = "x1", min = -…, max = …)
    objective$Parameter(name = "x2", min = -…, max = …)
    objective$Parameter(name = "x3", min = -…, max = …)
    objective$Parameter(name = "x4", min = -…, max = …)
    results <- extremize("acor", objective)

    p1 <- contourplothelper(results$getVisitedSpace()[1:200, ], "x1", "x2", "fitness")
    p2 <- contourplothelper(results$getVisitedSpace()[1:200, ], "x2", "x3", "fitness")
    p3 <- contourplothelper(results$getVisitedSpace()[1:200, ], "x3", "x4", "fitness")
    p4 <- contourplothelper(results$getVisitedSpace()[1:200, ], "x4", "x1", "fitness")

Figure 11: Exploring the solution space of the Rosenbrock function of four variables using the Ant Colony Optimization for continuous domains (acor) algorithm. The sequence of steps is practically the same as presented previously, with the exception of
Step 4, which shows the generation of four contour plots using the first 200 values returned by the getVisitedSpace() method, sorted in ascending order by the fitness value.
In the case of the Rosenbrock test function, which has been used in these examples, the best solution is known a priori to be zero when all parameters are 1. Thus, looking at the first plot, it can easily be observed that the hot zones marked with the best fitness are those corresponding to the best problem solution. Nonetheless, the complete solution landscape encompasses a very wide zone when compared to the solution estimated using ees2, which shows a much more restricted portion of the solution space. It is important to note that real models normally do not generate results as evident as those generated with test functions. Usually, real individual-based models will show discontinuities and possibly more than one zone with good fitness values, owing to nonlinear or second-order interactions between the model parameters.
Comparing metaheuristics
The task of choosing the most suitable metaheuristic for the parameter estimation problem is not an easy one, mainly because the available algorithms behave differently depending on the problem type. It is also a consequence of the intrinsic stochasticity present in individual-based models, as well as the random nature of the algorithms themselves. Therefore, it is hard to provide general recipes for deciding which method is right for a particular problem instance. The algorithms enclosed in this work are tuned with parameters adequate for general cases, but some tweaking may be required for achieving the best results. Specifically, the optimization metaheuristics can be very sensitive to the neighborhood structure and to the parameters controlling the balance between local search, which accelerates convergence, and breadth of search, which may avoid getting stuck in local optima and failing to converge to the best solution. Some trial and error may be necessary, testing different algorithms, observing the convergence speed and the value of the cost function, and then tuning the algorithm parameters accordingly.

The application of optimization metaheuristics to individual-based models, as mentioned previously, poses an additional problem because every model execution is computationally costly when compared to other model types, so the cost of the algorithm itself can be neglected. Therefore, one of the most important factors for selecting an algorithm is the minimal number of objective function evaluations needed for finding an acceptable solution satisfying the optimization target. Bearing this in mind, this section provides a systematic comparison between some of the metaheuristics mentioned in this work. The metaheuristics have been compared using the functions known as
Cigar, Schaffer, Griewank and Bohachevsky (Qu et al., 2016) (Jamil and Yang, 2013), which are standard test functions commonly employed for benchmarking optimization algorithms. The benchmarks were performed using the four-variable version of the test functions, and the experiments were replicated seven times using randomly selected initial random seeds. The summarized numerical results obtained from the benchmark process are shown in Table 4. The details and the code used for the benchmark are enclosed with the package sources, which are available at https://github.com/antonio-pgarcia/evoper
Table 4: The output of benchmarking the metaheuristic algorithms. These results were produced using the function compare.algorithms1 included in the package distribution and are the average values of seven replications with different initial random seeds. The convergence values are the ratio of replications which actually converged over the total number of performed experiments.
Function      Algorithm   Evaluations   Convergence   Fitness
Cigar         saa         458.14        1.0           0.04702100
              pso         2621.71       1.0           0.05033806
              acor        2651.43       1.0           0.07268033
              ees1        332.86        1.0           0.06308869
Schaffer      saa         649.57        1.0           0.07438586
              pso         6269.71       0.3           0.57479139
              acor        2249.14       1.0           0.08662474
              ees1        495.71        1.0           0.08821531
Griewank      saa         501.00        1.0           0.04579390
              pso         4214.86       0.6           0.09860997
              acor        3812.57       1.0           0.07698225
              ees1        308.57        0.9           0.07089164
Bohachevsky   saa         258.14        1.0           0.05232314
              pso         5053.71       0.4           0.44120018
              acor        1362.29       1.0           0.04323902
              ees1        258.57        1.0           0.06660127

The experiments were conducted setting a tolerance level of 10^-… in order to avoid a time-consuming process and to mimic tolerance levels which may be relevant for individual-based models, which may be considered to converge at higher values than plain mathematical functions. These values can be taken as a starting point for deciding which algorithm is most likely to provide acceptable results for the optima with a lower number of model evaluations. Figure 14 presents the benchmark results graphically, showing the total number of model evaluations required for converging at the given tolerance level. As can be observed, the evolutionary strategy 1 (ees1) metaheuristic consistently requires fewer evaluations of the objective function than the other algorithms, except for the Bohachevsky function, which required practically the same number of evaluations as the second best algorithm, the simulated annealing (saa). The third best algorithm is the ant colony for continuous domains (acor), followed by particle swarm optimization (pso), which curiously has not behaved as expected with the parameters tuned according to the recommended values Clerc (2012); we are evaluating other combinations of parameters and neighborhood functions. Figure 15 shows the value of the objective function when the algorithm terminates, either when the convergence criterion is met or when the algorithm reaches the maximum number of configured iterations.
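The aggregation behind Table 4 can be reproduced with a few lines of base R. The data frame below is made-up example data standing in for the replicated benchmark runs; the summary computes the mean evaluations and the convergence ratio per algorithm:

```r
# Made-up results of six benchmark replications (illustrative data only):
# evaluations consumed and whether each replication converged (1/0).
runs <- data.frame(
  algorithm = rep(c("saa", "ees1"), each = 3),
  evals     = c(460, 450, 465, 330, 340, 329),
  converged = c(1, 1, 1, 1, 0, 1)
)

# Mean evaluations and convergence ratio per algorithm, as in Table 4.
summary.table <- aggregate(cbind(evals, converged) ~ algorithm,
                           data = runs, FUN = mean)
print(summary.table)
```

Averaging the 0/1 convergence indicator yields exactly the "ratio of converged replications over total experiments" reported in the table caption.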
It is important to note that the comparisons shown here are just an initial set of hints providing a general overview of the behavior of each of the algorithms mentioned in this work. The fact that the simulated annealing has been the best performer is the expected result because, differently from the other algorithms, it operates on a single candidate solution per iteration, while the remaining metaheuristics maintain populations, for instance N = 64 and N = 10 being, respectively, the number of ants of the ant colony optimization and the solution size of the EvoPER evolutionary strategy 1, in addition to the particle count of the particle swarm optimization. This is one of the factors behind the differentiated performance figures. Consequently, the initial parameter set defined for each algorithm should be seen as the starting point for tuning the metaheuristics, and a considerable amount of testing may be necessary for achieving the best results in the parameter estimation process of a particular model instance. Additionally, some algorithms can be more adequate for a problem than others; therefore, checking the initial outputs of multiple algorithms while limiting the number of iterations may be an interesting exercise for choosing the most suitable metaheuristic.
Parameter estimation of individual-based models
ArXiv.org ISSN 2331-8422

One of the remarkable aspects is that the syntax is simple and consistent regardless of the function whose parameters are being estimated; the set of API primitives required for applying the metaheuristics is the same whether the target model is a plain mathematical function, an ODE, an individual-based model implemented in Repast, or any other environment for which an add-on has been implemented. The following example consists in the search for the best parameter combination minimizing the discrepancies between the simulated values for bacterial conjugation produced using the BactoSIM simulation model (Prestes García and Rodríguez-Patón, 2015) and the experimental values for conjugative plasmids measured experimentally (del Campo et al., 2012). The model provides several outputs, but only two values will be used as reference for the parameter estimation process: the conjugation rates and the doubling times for donor and transconjugant cells, because experimental observations are available only for these variables. Both model outputs, conjugation rate and doubling time, are time series, but we will compare the former at every simulated time step, while for the generation time the overall average will be employed for defining the cost metric. The conjugation rate metric used in the simulation experiments is the ratio between the number of transconjugant cells and the sum of transconjugant cells plus the uninfected recipients, defined as T/(T + R). For building the metrics comparing the experimental and simulated time series, two approaches have been explored: one using simple metamodeling (Jin et al., 2003) (Saltelli, 2008), fitting a linear model to the model output and to the observed data, and another using the dynamic time warping technique (Lee et al., 2015).
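The rate metric and the metamodel-based distance just described can be made concrete with a small sketch. The code below is illustrative Python rather than the package's R implementation (which relies on lm and AoE.RMSD); each series is summarized by a fitted line and the two lines are compared through their slope and intercept coefficients:

```python
from statistics import mean

def conjugation_rate(T, R):
    # T / (T + R): transconjugants over transconjugants plus
    # uninfected recipients, as used in the simulation experiments.
    return T / (T + R)

def linfit(t, y):
    # Ordinary least-squares line y = a*t + b, the "metamodel".
    tm, ym = mean(t), mean(y)
    a = (sum((ti - tm) * (yi - ym) for ti, yi in zip(t, y))
         / sum((ti - tm) ** 2 for ti in t))
    return a, ym - a * tm

def metamodel_cost(t, simulated, observed):
    # Compare the two series through their fitted coefficients;
    # the RMSD of two scalars reduces to their absolute difference.
    a_s, b_s = linfit(t, simulated)
    a_o, b_o = linfit(t, observed)
    return abs(a_s - a_o) + abs(b_s - b_o)
```

Two identical series yield a cost of zero, and the cost grows as the trend or offset of the simulation diverges from the experiment.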
These alternative approaches are presented in Figure 16, where the functions my.cost1 and my.cost2 show, respectively, the implementations of the cost function using a metamodel and the dynamic time warping algorithm (Giorgino, 2009) for comparing the experimental and simulated conjugation rate time series. The implementation of the cost function shown in Step 1a creates two simple linear regression models of the observed and simulated data; the comparison of the slope and intercept coefficients serves as the distance metric for measuring how close both time series are, using root mean square deviations. The cost function also considers the values of the doubling time for creating a composite metric. Additionally, the function my.cost1 uses a hybrid categorical-quantitative metric for the doubling time output, which is described in Equations (3) and (4), respectively the cost estimators for the doubling time of donor (g_D) and transconjugant (g_T) bacterial cells. Basically, the cost is zero if the value estimated by the model falls within a limited range around the average value of the experimental data, or the root mean square deviation (RMSD) between the simulated and observed values otherwise.

C(g_D) = \begin{cases} 0, & \text{if } 42 \le g_D \le 62 \\ \mathrm{RMSD}(g_D, 52), & \text{otherwise} \end{cases} \quad (3)

C(g_T) = \begin{cases} 0, & \text{if } 33 \le g_T \le 53 \\ \mathrm{RMSD}(g_T, 43), & \text{otherwise} \end{cases} \quad (4)

In both cost functions, my.cost1 and my.cost2, the first 60 minutes are excluded from the distance metric because during that time lapse the experimental data is somewhat noisy, possibly due to the lag time or the adaptation to the culture medium. It is also worth mentioning that before undertaking a full-length run, which certainly would last a large amount of time, it would be much better to start by trying several algorithms with a limited number of iterations for getting an initial map of the problem solution space, or to use the ees2 metaheuristic, which is limited to 600 evaluations of the objective function. An example of an initial mapping for a real individual-based model is shown in Figure 17. One of the things that can be perceived at first glance is that this plot has more diffuse borders delimiting the best zones of the problem solution space than the examples presented previously in Figures 12 and 13, where the objects of study were plain mathematical functions. That is the commonly observed pattern for real models, owing to the stochasticity and the nonlinear interactions between the model elements. Hence, it is normally necessary to make several initial mappings of the problem for recognizing the zones of best fitness and then adjusting the initial parameter ranges for achieving the best estimation results. Returning to Figure 17, despite the lack of clearly defined limits for the best parameter combinations, it can be observed that some zones generate better cost values than others. Of course, the process should not be guided by just one algorithm with a single run, but rather by a more exhaustive exploration with several runs of the available algorithms, also using different sets of random seeds. But for the sake of brevity, and just to illustrate the process, we will extract conclusions from this single execution.
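The hybrid categorical-quantitative metric for the doubling times reduces to a simple banded cost. Below is an illustrative Python sketch, not the package's R code; the band limits and target values (in minutes) are those used for the BactoSIM outputs in the listing of Figure 16, and the RMSD of two scalars reduces to their absolute difference:

```python
def banded_cost(g, lower, upper, target):
    # Hybrid categorical-quantitative metric: zero while the doubling
    # time lies inside the accepted band, deviation from the target
    # value otherwise.
    return 0.0 if lower <= g <= upper else abs(g - target)

# Bands and targets taken from the cost function of Figure 16:
def cost_gD(g):
    return banded_cost(g, 42.0, 62.0, 52.0)

def cost_gT(g):
    return banded_cost(g, 33.0, 53.0, 43.0)

print(cost_gD(50.0), cost_gD(70.0))  # 0.0 inside the band, 18.0 outside
```

The categorical part (the zero-cost band) keeps the optimizer from over-fitting an output that is only known within experimental uncertainty, while the quantitative part still penalizes clearly implausible values.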
Thus, the first contour plot, which shows the parameters cyclePoint and conjugationCost, allows detecting interesting zones for both parameters, circumscribed to those values between 25% and 75%, with peaks for cyclePoint settled approximately over values of 40% and 70% and conjugationCost being close to 50%. The plot relating conjugationCost and pilusExpressionCost shows similar results for the first parameter as in the previous case, having its best fitness values near 50% of the cell cycle, while the second has three promising zones at 25%, 40% and 70%. Finally, the last two plots, relating pilusExpressionCost vs. gamma0 and gamma0 vs. cyclePoint, are far from being conclusive but seem to indicate better performance around the zone of γ0 = 5. (The cycle point parameter represents the point in time, from cell birth to division, when conjugation is most likely to happen.) The next step in the analysis of the parameter space may include refining the initial assumptions about the parameter bounds, for instance, limiting the initial range of the parameter cyclePoint to the values found to contain the best cost (25%-75%) and running the exploratory analysis again. Therefore, adjusting the range and running the simulated annealing algorithm with the maximum number of iterations limited to 50, new results are obtained and presented in Figure 18. These exploratory steps can be repeated a number of times before starting a complete run of the metaheuristics.
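The refine-and-rerun loop described above can be sketched generically: sample the current ranges, keep the best-scoring fraction, and shrink each range to the span of those samples. The function refine_bounds below and its defaults are hypothetical Python for illustration, not part of EvoPER, demonstrated on a simple sphere function over ranges resembling those of the example:

```python
import random

def refine_bounds(f, bounds, rounds=3, samples=200, keep=0.25, seed=1):
    # Coarsely sample the current ranges, rank by cost, and shrink each
    # range to the span covered by the best-scoring fraction of samples.
    rng = random.Random(seed)
    for _ in range(rounds):
        pts = [[rng.uniform(lo, hi) for lo, hi in bounds]
               for _ in range(samples)]
        pts.sort(key=f)
        elite = pts[:max(2, int(keep * samples))]
        bounds = [(min(p[i] for p in elite), max(p[i] for p in elite))
                  for i in range(len(bounds))]
    return bounds

sphere = lambda p: sum(x * x for x in p)
print(refine_bounds(sphere, [(1.0, 90.0), (0.0, 100.0)]))
```

For a stochastic individual-based model each ranking step should of course average several replications per point, and the shrinking should be conservative so that promising but noisy regions are not discarded too early.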
Summary
The systematic parameter estimation should be a fundamental part of individual-based modeling, but it is normally omitted by modelers. One of the main reasons is the relative complexity of the available methods, the effort required for applying them, and the lack of simple tools for practitioners, who usually come from different domains with different backgrounds. Ecology has always greatly benefited from the application of mathematical models to the description of complex processes and interactions. Recently, individual-based models have been becoming a lingua franca for ecological modeling, but normally the acceptance of the produced results is hindered by the lack of a thorough parameter estimation and analysis. The cause may be attributed to the deficit of experience with the methods and techniques required for carrying out the analysis of simulation output. Individual-based models are complex, stochastic and non-linear in their nature; therefore, finding the input parameters that make the model reproduce the reference data reliably is a hard computational task. The best available approach is to state the parameter estimation as an instance of an optimization problem and to apply the existing arsenal of optimization metaheuristics. Bearing this in mind, we have introduced in this work a partial set of the features available in the EvoPER package, alongside illustrative usage cases, including one application to a real individual-based model with the interpretation of the produced outputs and the steps necessary for the complete parameter estimation of the model. The package is being developed keeping in mind the idea of minimizing the effort required for the application of sophisticated methods in the parameter estimation process of individual-based models. This package allows modelers to try different alternatives without having to write ad hoc and complex integration code for the existing packages.
Acknowledgments
This work was supported by the European FP7-ICT-FET EU research project 612146 (PLASWIRES, "Plasmids as Wires" project) and by the Spanish Government (MINECO) research grant TIN2012-36992.
Bibliography
D. Arutyunov and L. S. Frost. F conjugation: Back to the beginning. Plasmid, 70(1):18-32, July 2013. doi: 10.1016/j.plasmid.2013.03.010.
M. Ashyraliyev, Y. Fomekong-Nanfack, J. A. Kaandorp, and J. G. Blom. Systems biology: Parameter estimation for biochemical models, 2009. ISSN 1742-464X.
J. Banks, J. S. Carson, B. L. Nelson, and D. M. Nicol. Discrete-Event System Simulation. Prentice Hall, 5th edition, July 2009. ISBN 0136062121.
J. V. Beck and K. J. Arnold. Parameter Estimation in Engineering and Science. Wiley Series in Probability and Mathematical Statistics. Wiley, New York, 1977. ISBN 0471061182.
N. Boccara. Modeling Complex Systems. Graduate Texts in Contemporary Physics. Springer, 1st edition, Nov. 2003. ISBN 0387404627.
I. Boussaïd, J. Lepagnot, and P. Siarry. A survey on optimization metaheuristics. Information Sciences, 237:82-117, 2013. doi: 10.1016/j.ins.2013.02.041.
M. Clerc. Standard Particle Swarm Optimisation. 2012.
I. del Campo, R. Ruiz, A. Cuevas, C. Revilla, L. Vielva, and F. de la Cruz. Determination of conjugation rates on solid surfaces. Plasmid, 67(2):174-182, Mar. 2012. doi: 10.1016/j.plasmid.2012.01.008.
M. Dorigo and L. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53-66, Apr. 1997. doi: 10.1109/4235.585892.
M. Dorigo, M. Birattari, and T. Stützle. Ant colony optimization. IEEE Computational Intelligence Magazine, 1(4):28-39, Nov. 2006. doi: 10.1109/MCI.2006.329691.
T. Giorgino. Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software, 31(7):1-24, 2009. doi: 10.18637/jss.v031.i07.
V. Grimm and S. F. Railsback. Individual-based Modeling and Ecology. Princeton Series in Theoretical and Computational Biology. Princeton University Press, Princeton, July 2005. ISBN 069109666X.
V. Grimm, D. Ayllón, and S. F. Railsback. Next-generation individual-based models integrate biodiversity and ecosystems: Yes we can, and yes we must. Ecosystems, pages 1-8, Nov. 2016. doi: 10.1007/s10021-016-0071-2.
J. W. Haefner. Modeling Biological Systems: Principles and Applications. Springer-Verlag, New York, 1996. ISBN 9781461498087.
M. Jamil and X. S. Yang. A literature survey of benchmark functions for global optimisation problems. International Journal of Mathematical Modelling and Numerical Optimisation, 4(2):150, 2013. doi: 10.1504/IJMMNO.2013.055204.
R. Jin, X. Du, and W. Chen. The use of metamodeling techniques for optimization under uncertainty. Structural and Multidisciplinary Optimization, 25(2):99-116, July 2003. doi: 10.1007/s00158-002-0277-0.
J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, volume 4, pages 1942-1948, 1995. doi: 10.1109/ICNN.1995.488968.
A. S. Khalil and J. J. Collins. Synthetic biology: Applications come of age. Nature Reviews Genetics, 11(5):367-379, May 2010. doi: 10.1038/nrg2775.
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983. doi: 10.1126/science.220.4598.671.
J.-S. Lee, T. Filatova, A. Ligmann-Zielinska, B. Hassani-Mahmooei, F. Stonedahl, I. Lorscheid, A. Voinov, J. G. Polhill, Z. Sun, and D. C. Parker. The complexities of agent-based modeling output analysis. Journal of Artificial Societies and Social Simulation, 18(4):4, 2015.
T. M. Little and F. J. Hills. Agricultural Experimentation: Design and Analysis. Wiley, 1978. ISBN 9780471023524.
W.-L. Loh. On Latin hypercube sampling. The Annals of Statistics, 24(5):2058-2080, Oct. 1996. doi: 10.1214/aos/1069362310.
R. Milo, R. Phillips, and N. Orme. Cell Biology by the Numbers. Garland Science, New York, NY, USA, 2016. ISBN 978-0-8153-4537-4.
M. Minsky. Matter, mind and models. In Proceedings of IFIP Congress 65, pages 45-49, Jan. 1965. URL http://groups.csail.mit.edu/medg/people/doyle/gallery/minsky/mmm.html.
M. J. North, N. T. Collier, J. Ozik, E. R. Tatara, C. M. Macal, M. Bragen, and P. Sydelko. Complex adaptive systems modeling with Repast Simphony. Complex Adaptive Systems Modeling, 1(1):3, 2013. doi: 10.1186/2194-3206-1-3.
R. Poli, J. Kennedy, and T. Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007. doi: 10.1007/s11721-007-0002-0.
A. Prestes García and A. Rodríguez-Patón. BactoSIM: An individual-based simulation environment for bacterial conjugation. Pages 275-279. Springer International Publishing, 2015. doi: 10.1007/978-3-319-18944-4_26.
A. Prestes García and A. Rodríguez-Patón. A preliminary assessment of three strategies for the agent-based modeling of bacterial conjugation. In R. Overbeek, M. P. Rocha, F. Fdez-Riverola, and J. F. De Paz, editors, volume 375 of Advances in Intelligent Systems and Computing, pages 1-9. Springer International Publishing, 2015. doi: 10.1007/978-3-319-19776-0_1.
B. Qu, J. Liang, Z. Wang, Q. Chen, and P. Suganthan. Novel benchmark functions for continuous multimodal optimization with comparative results. Swarm and Evolutionary Computation, 26:23-34, 2016. doi: 10.1016/j.swevo.2015.07.003.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015.
A. Saltelli. Global Sensitivity Analysis: The Primer. John Wiley, 2008. ISBN 9780470059975.
R. W. Shonkwiler. Mathematical Biology: An Introduction with Maple and Matlab. Springer, 2nd edition, 2008.
K. Socha and M. Dorigo. Ant colony optimization for continuous domains. European Journal of Operational Research, 185(3):1155-1173, 2008. doi: 10.1016/j.ejor.2006.06.046.
K. Sörensen. Metaheuristics: The metaphor exposed. International Transactions in Operational Research, 22(1):3-18, Jan. 2015. doi: 10.1111/itor.12001.
J. C. Thiele, W. Kurth, and V. Grimm. Facilitating parameter estimation and sensitivity analysis of agent-based models: A cookbook using NetLogo and R. Journal of Artificial Societies and Social Simulation, 17(3), 2014. doi: 10.18564/jasss.2503.
F. A. C. Viana. Things you wanted to know about the Latin hypercube design and were afraid to ask. Pages 1-9, 2013.
T. Weise. Global Optimization Algorithms: Theory and Application. Self-published, Germany, 2009.
M. Zambrano-Bigiarini, M. Clerc, and R. Rojas. Standard Particle Swarm Optimisation 2011 at CEC-2013: A baseline for future PSO improvements. Pages 2337-2344, 2013. doi: 10.1109/CEC.2013.6557848.
B. P. Zeigler, H. Praehofer, and T. G. Kim. Theory of Modeling and Simulation. Academic Press, 2nd edition, Jan. 2000. ISBN 0127784551.
Antonio Prestes García
Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid
Campus de Montegancedo s/n, Boadilla del Monte, Madrid, Spain
[email protected]

Alfonso Rodríguez-Patón
Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid
Campus de Montegancedo s/n, Boadilla del Monte, Madrid, Spain
[email protected]
Figure 12: Exploring the solution landscape for the visited space generated during the execution of the Ant Colony for Continuous domains (acor) algorithm. The four contour plots provide a panoramic view of the fitness surface for the variable pairs (x1,x2), (x2,x3), (x3,x4) and (x4,x1). The contour curves employ a color scheme from light green to red, indicating respectively the worst and the best fitness for the function.
Figure 13: Exploring the solution landscape for the visited space generated during the execution of the EvoPER Evolutionary strategy 2 (ees2) algorithm. The four contour plots provide a panoramic view of the fitness surface for the variable pairs (x1,x2), (x2,x3), (x3,x4) and (x4,x1). The contour curves employ a color scheme from light green to red, indicating respectively the worst and the best fitness for the function.
Figure 14: Comparing the number of objective function evaluations required by each algorithm. The subplots show the average number of model evaluations which the metaheuristics required for reaching convergence using the benchmark functions (a) Cigar, (b) Schaffer, (c) Griewank and (d) Bohachevsky. A gray bar means that the proportion of experiments in which the algorithm converged was inferior to 60%.
Figure 15: Comparing the final value of the objective function (fitness) for each algorithm. The subplots show the average fitness value for the benchmark functions (a) Cigar, (b) Schaffer, (c) Griewank and (d) Bohachevsky. A gray bar means that the proportion of experiments in which the algorithm converged was inferior to 60%.
## Step 0: load the required libraries and set the random seed
rm(list = ls())
library(evoper)
set.seed(161803398)

## Step 1a: cost function based on a linear metamodel of the time series
my.cost1 <- function(params, results) {
  results <- results[results$Time > 60, ]
  mm1 <- with(results, lm(X.Simulated. ~ Time, data = results))
  mm2 <- with(results, lm(X.Experimental. ~ Time, data = results))
  rate <- AoE.RMSD(coef(mm1)[1], coef(mm2)[1]) + AoE.RMSD(coef(mm1)[2], coef(mm2)[2])
  gT <- AoE.RMSD(ifelse((results$G.T. > 41 & results$G.T. < 63), 52, results$G.T.), 52)
  gD <- AoE.RMSD(ifelse((results$G.D. > 32 & results$G.D. < 54), 43, results$G.D.), 43)
  criteria <- cbind(rate, gT, gD)
  return(criteria)
}

## Step 1b: cost function based on the dynamic time warping distance
my.cost2 <- function(params, results) {
  results <- results[results$Time > 60, ]
  alignment <- dtw(results$X.Simulated., results$X.Experimental., keep = TRUE)
  rate <- alignment$distance
  ...
  criteria <- cbind(rate, gT, gD)
  return(criteria)
}

## Step 2: wrap the Repast model together with one of the cost functions
objective <- RepastFunction$new("/usr/models/BactoSim", "ds::Output", 360, my.cost1)

## Step 3: declare the parameters of interest and their ranges
objective$Parameter(name = "cyclePoint", min = 1, max = 90)
objective$Parameter(name = "conjugationCost", min = 0, max = 100)
objective$Parameter(name = "pilusExpressionCost", min = 0, max = 100)
objective$Parameter(name = "gamma0", min = 1, max = 10)
objective$setTolerance(0.1)

## Step 4: limit the maximum number of algorithm iterations
my.options <- OptionsACOR$new()
my.options$setValue("iterations", 30)

## Step 5: run the optimization
results <- extremize("acor", objective, my.options)

Figure 16: The code required for running the parameter estimation of an individual-based model using the ant colony optimization algorithm. This code snippet shows the implementation details of two alternative cost functions. The sequence of required steps is: Step 0 loads the required libraries and sets the random seed; Steps 1a and 1b are the implementations of two alternative cost functions, one using a metamodel fitted to the simulated data and another using the dynamic time warping distance as the cost metric; Step 2 creates an instance of the RepastFunction class for the underlying model, initializing the model directory and the total simulated time; Step 3 initializes the model parameters of interest, which can be a subset of all declared parameters; Step 4 shows the creation of a non-default options class for setting the maximum number of algorithm iterations to 30; finally, in Step 5 the extremize function performs the optimization of the cost function. This example shows how to find the best combination of model parameters minimizing the differences between the observed and the simulated data for the simulated variables conjugation rate and doubling time.
Figure 17: The mapping of the solution space of the BactoSIM individual-based model of bacterial conjugation dynamics. The series of four contour plots shows the effects on the fitness value, defined by the cost function my.cost1, of the different parameter values explored by the optimization algorithm.
Figure 18: Refining the initial mapping of the solution space of BactoSIM, limiting the variation range of the cyclePoint parameter based on the previous mapping results.