Agent-Based Model Calibration using Machine Learning Surrogates
Francesco Lamperti∗, Andrea Roventini† and Amir Sani‡
April 7, 2017
Abstract
Taking agent-based models (ABM) closer to the data is an open challenge. This paper explicitly tackles parameter space exploration and calibration of ABMs, combining supervised machine learning and intelligent sampling to build a surrogate meta-model. The proposed approach provides a fast and accurate approximation of model behaviour, dramatically reducing computation time. In that, our machine-learning surrogate facilitates large-scale explorations of the parameter space, while providing a powerful filter to gain insights into the complex functioning of agent-based models. The algorithm introduced in this paper merges model simulation and output analysis into a surrogate meta-model, which facilitates fast and efficient ABM calibration. We successfully apply our approach to the Brock and Hommes (1998) asset pricing model and to the "Island" endogenous growth model (Fagiolo and Dosi, 2003). Performance is evaluated against a relatively large out-of-sample set of parameter combinations, while employing different user-defined statistical tests for output analysis. The results demonstrate the capacity of machine learning surrogates to facilitate fast and precise exploration of agent-based models' behaviour over their often rugged parameter spaces.
Keywords: agent based model; calibration; machine learning; surrogate; meta-model.
JEL codes:
C15, C52, C63.

∗ Corresponding author. Institute of Economics, Scuola Superiore Sant'Anna, Piazza Martiri della Libertà 33, 56127 Pisa (Italy). Email: [email protected].
† Institute of Economics, Scuola Superiore Sant'Anna (Pisa) and OFCE-Sciences Po (Nice). Email: [email protected].
‡ Université Paris 1 Panthéon-Sorbonne and CNRS. Email: [email protected].

Introduction
This paper proposes a novel approach to model calibration and parameter space exploration in agent-based models (ABM), combining supervised machine learning and intelligent sampling in the design of a novel surrogate meta-model.

Agent-based models deal with the study of socio-ecological systems that can be properly conceptualized through a set of micro and macro relationships. One problem with this framework is that the relevant statistical properties for variables of interest are a priori unknown, even to the modeler. Such properties emerge from the repeated interactions among ecologies of heterogeneous, boundedly-rational and adaptive agents. As a result, the dynamic properties of the system cannot be studied analytically, the identification of causal mechanisms is not always possible, and interactions give rise to the emergence of relationships that cannot simply be deduced by aggregating those of micro variables (Anderson et al., 1972; Tesfatsion and Judd, 2006; Grazzini, 2012; Gallegati and Kirman, 2012). This raises the issue of finding appropriate tools to investigate the emergent behavior of the model with respect to different parameter settings, random seeds, and initial conditions (see also Lee et al., 2015). Once this search is successful, one can safely move to calibration, validation and, finally, employ the model for policy exercises (more on that in Fagiolo and Roventini, 2017). Unfortunately, this procedure is hardly implementable in practice, notably due to large computation times.

In the last two decades a variety of ABM have been applied to study many different issues across a broad spectrum of disciplines beyond economics, including ecology (Grimm and Railsback, 2013), health care (Effken et al., 2012), sociology (Macy and Willer, 2002), geography (Brown et al., 2005), bio-terrorism (Carley et al., 2006), medical research (An and Wilensky, 2009), military tactics (Ilachinski, 1997) and many others. See also Squazzoni (2010) for a discussion on the impact of ABM in social sciences, and Fagiolo and Roventini (2012, 2017) for an assessment of macroeconomic policies in agent-based models.

Indeed, many ABMs simulate the evolution of a complex system using many parameters and a relatively large number of time steps. In a calibration setting, this rich expressiveness results in a "curse of dimensionality" that leads to an exponential number of critical points along the parameter space, with multiple local maxima, minima and saddle points, which negatively impact the performance of gradient-based search procedures. Exploring model behaviour through all possible parameter combinations (a full factorial exploration) is practically impossible even for small models. Budgetary constraints also restrict our use of multi-objective optimization procedures, such as multimodel optimization or niching (for a review, see e.g. Li et al., 2013; Wong, 2015), and kriging-based procedures, due to the large number of evaluations required for these procedures to converge to meaningful interpretations of the model parameter space. However, if a model is to be useful for policy makers, it must provide timely and accurate insights into the problem. As a result, for computationally expensive models such as ABMs to provide practical insights with their rich expressiveness, they must be efficiently calibrated on a limited budget of evaluations.

For example, consider a model with 5 parameters and assume that a single evaluation of the ABM requires 5 seconds on a single compute core (CPU). If one discretizes the parameter space by splitting each dimension into 10 intervals, the resulting 10^5 evaluations would require approximately 6 CPU days to explore (10^5 × 5 s ≈ 5.8 days). With a finer partition of, say, 15 intervals, the 15^5 ≈ 7.6 × 10^5 evaluations would roughly require 1.5 CPU months.

Traditionally, three computationally expensive steps are involved in ABM calibration: running the model, measuring calibration quality and locating parameters of interest. As remarked in Grazzini et al. (2017), such steps account for more than half of the time required to estimate ABMs, even for extremely simple models. Recently, kriging (also known as Gaussian processes) has been employed to build surrogate meta-models of ABMs (Salle and Yildizoglu, 2014; Dosi et al., 2016, 2017c,b; Bargigli et al., 2016) to facilitate parameter space exploration and sensitivity analyses. However, kriging cannot reasonably be applied to large-scale models with more than 20 parameters, even in the linear-time extensions proposed in Wilson et al. (2015) and Herlands et al. (2015). Moreover, the smooth surfaces produced by kriging meta-models do not provide an accurate approximation of the rugged parameter spaces characteristic of most ABMs.

In this paper, we explicitly tackle the problem of efficiently exploring the complex parameter space of agent-based models by employing an efficient, adaptive, gradient-free search over the parameter space. The proposed approach exploits both labeled and unlabeled parameter combinations in a semi-supervised manner to build a fast, efficient, machine-learning surrogate mapping a statistic, based on a user-defined measure of fit, to a specific parameterization of the ABM. This procedure results in a dramatic reduction in computation time, while providing an accurate surrogate of the original ABM. The surrogate can then be employed for a detailed exploration of the possibly wild parameter space. Moreover, we move towards calibration by identifying parameter combinations that allow the ABM to match user-desired properties.

Surrogate meta-models are traditionally employed to approximate or emulate computationally costly experiments or simulation models of complex physical phenomena (see Booker et al., 1999). In particular, surrogates provide a proxy that can be exploited for fast parameter-space exploration and model calibration. Given their speed advantage, surrogates are regularly exploited to locate promising calibration values and gain rapid intuition over a model. Note that the objective is not to return a single optimal parameter, but all parametrizations that positively identify the ABM with user-desired behaviour. Accordingly, if the surrogate approximation error is small, it can be interpreted as an efficient and reasonably good replacement for the original ABM during parameter space exploration and calibration.

Our approach to learning a surrogate occurs over multiple rounds. First, a large "pool" of unlabelled parametrizations is drawn using a standard sampling routine, such as quasi-random Sobol sampling. Next, a very small subset of the pool is randomly drawn without replacement for evaluation in the ABM, making sure to have at least one example of the user-desired behaviour. These points are "labelled" according to the statistic measured on the output generated by the ABM and act as a "seed" set of samples to initialize the surrogate model learned in the first round. This first surrogate is then exploited to predict the labels of the unlabelled points remaining in the pool. Another very small subset of points is drawn from the pool for evaluation in the agent-based model. Then, over multiple rounds, this process is repeated until a specified budget of evaluations is reached. In each round, the surrogate directs which unlabelled points are drawn from the pool so as to maximize the performance of the surrogate learned in the next round.
This semi-supervised "active" learning procedure incrementally improves the surrogate model, while maximizing the information gained over the ABM parameter space.

The interested reader might want to look at van der Hoog (2016) for a broad discussion on possible applications of machine learning algorithms to agent-based modelling. In the machine learning jargon, supervised learning refers to the task of inferring a function from labeled training data, that is, data that are assigned either a numerical value or a symbol. Semi-supervised learning indicates a setting where there is a small amount of labelled data relative to unlabelled ones. The term active refers instead to an algorithm that actively selects which data point to evaluate and, therefore, to label.

As stated in Fagiolo et al. (2007) and Fagiolo and Roventini (2012, 2017), the extreme flexibility of ABMs concerning e.g. various forms of individual behaviour, interaction patterns and institutional arrangements has allowed researchers to explore the positive and normative consequences of departing from the often over-simplifying assumptions characterizing most mainstream analytical models. Recent years have witnessed a trend in macro and financial modeling towards more detailed and richer models, targeting a higher number of stylized facts and claiming a strong empirical content. A common theme informing both theoretical analysis and methodological research concerns the estimation and validation of these models against the data.

See e.g. Dosi et al. (2010, 2013, 2015); Caiani et al. (2016); Assenza et al. (2015) and Dawid et al. (2014a) on business cycle dynamics, Lamperti et al. (2017) on growth, green transitions and climate change, Dawid et al. (2014b) on regional convergence and Leal et al. (2014) on financial markets. The surveys in Fagiolo and Roventini (2012, 2017) provide a more exhaustive list.

Finally, Recchioni et al. (2015) use a simple gradient-based calibration procedure and then test the performance of the model they obtain through out-of-sample forecasting. A parallel stream of research has recently been focusing on the development of tools to investigate the extent to which ABM outputs are able to approximate reality (see Marks, 2013; Lamperti, 2017, 2016; Barde, 2016b,a; Guerini and Moneta, 2016). Some of these contributions also offer new measures that can be used to build objective functions in place of longitudinal moments within an estimation setting (e.g. the GSL-div introduced in Lamperti, 2017). However, a common limitation of both these calibration/estimation and validation exercises lies in their computational time, which is usually extremely high. As discussed in detail by Grazzini et al. (2017), simulating the model is the most computationally expensive step for all these procedures. For instance, in order to train his algorithm, Barde (2016b) needs Monte Carlo (MC) runs of considerable length, and many macroeconomic ABMs might take weeks just to perform a single MC exercise of this kind. This explains why the vast majority of previous contributions employ extremely simple ABMs (few parameters, few agents, no stochastic draws) to illustrate their approach, and why large macro ABMs are usually poorly validated and calibrated. Hence, using standard statistical techniques, the number of parameters must be minimized to achieve feasible estimation.

From a theoretical perspective, the curse of dimensionality implies that the convergence of any estimator to the true value of a smooth function defined on a high-dimensional parameter space is very slow (Weeks, 1995; De Marchi, 2005).
Several methods have been introduced in the design of experiments literature to circumvent this problem, but the assumptions of smoothness, linearity and normality do not generally hold for ABMs (see the extensive discussion in Lee et al., 2015). Unfortunately, recent developments in agent-based macroeconomics have led to the development of increasingly large and computationally demanding models.

See also Grazzini and Richiardi (2015) and Fabretti (2012) for other applications of the same approach.

Given the imbalanced nature of the sample and the non-negligible computation cost of evaluating ABM parameters, it makes sense to carefully select which parametrizations to evaluate, while exploiting the cheap (almost free) cost of generating unevaluated parametrizations. The problem of sequentially selecting the most informative subset of samples over multiple sampling rounds underlies active learning (see Settles, 2010, for a survey). In particular, given a large pool of unlabelled parametrizations and a fixed evaluation budget, active learning chooses parametrizations from the pool that maximize the generalization or learning performance of the surrogate meta-model.
Surrogate modelling methodology
One can represent an agent-based model as a mapping m : I → O from a set of input parameters I into an output set O. The set of parameters can be conceived as a multidimensional space spanned by the support of each parameter. The number of parameters in large macroeconomic ABMs generally ranges up to several dozen. The output set is generally larger, as it corresponds to time-series realizations of a very large number of micro and macro level variables. This rich set of outputs allows a qualitative validation of agent-based models based on their ability to reproduce the statistical properties of empirical data (e.g. non-stationarity of GDP, cross-correlations and relative volatilities of macroeconomic time series), as well as microeconomic distributional characteristics (e.g. distribution of firms' size, of households' income, of assets' returns). Beyond stylized facts, the quantitative validation of an agent-based model also requires the calibration/estimation of the model on a (generally small) set of aggregate variables (e.g. GDP growth rates, inflation and unemployment levels, asset returns, etc.).

Without loss of generality, we can represent this quantitative calibration as the determination of input values such that the output satisfies certain calibration conditions, coming from, e.g., a statistical test or the evaluation of a likelihood or loss function. This is in line, for example, with the method of simulated moments (Gilli and Winker, 2003; Franke and Westerhoff, 2012). We consider two settings:

• Binary outcome. In this setting the calibration criterion can be considered as a function, v : O → {0, 1}, that maps the ABM output to a binary variable that takes value 1 if a certain property of the output (or set of properties) is found, and 0 otherwise. For example, a property that one might want a financial ABM to match is the presence of excess kurtosis in the distribution of returns. This setting leads to what is referred to in the machine learning literature as a classification problem.

• Real-valued outcome. In this setting the calibration criterion can be considered as a function, v : O → R, that maps the ABM output to a real-valued number providing a quantitative assessment of a certain property of the model. For example, one might want to compute the excess kurtosis of simulated data and then compare it to the one obtained from real data. This setting leads to what is referred to in the machine learning literature as a regression problem.

To keep consistency with the machine learning terminology, we say that the function v assigns a label to the parameter vector x. Obviously, one would like to find the set of input parameters x ∈ I such that their labels indicate that a chosen condition is met. More formally, we say that C is the set of labels indicating that the condition is satisfied. For example, in the case of binary outcomes we can set C = {1}, which indicates that the chosen property is observed; in the case of real-valued outcomes, assuming that v expresses the distance between some statistic of the simulated and real data, one might consider C = {x : v(x) ≤ α} or C = {min_{x ∈ I_j} v(x), j = 1, 2, ..., J}. The latter case reflects exactly the common calibration problem of minimizing some loss function over the parameter space with random restarts to avoid ending up in local minima.

Definition 1. We say that a positive calibration is a parameter vector x ∈ I whose label is contained in the set C, i.e. x : v(x) ∈ C.
By contrast, a negative calibration is a parameter vector whose label is not contained in C.

The problem now is to find all positive calibrations. However, an intensive exploration of the input set I is computationally infeasible. As emphasized above, it is crucial to drastically reduce the computation time required to identify positive calibrations. This paper proposes to train a surrogate model that efficiently approximates the value of f(x) = v ∘ m(x) using a limited number of input parameters (budget) to evaluate the true ABM. Once the surrogate is trained, it provides an efficient means of exploring the behaviour of the ABM over the entire parameter space. The surrogate training procedure requires three decisions:

1. Choosing a machine learning algorithm to act as a surrogate for the original ABM, taking care that the assumptions made by the machine learning model do not force unrealistic assumptions on the parameter space;

2. Selecting a sampling procedure to draw samples from the parameter space in order to train the surrogate;

3. Selecting a score or criterion that can be used to evaluate the performance of the surrogate.

We prefer to avoid smoothness assumptions and the challenges of selecting a good prior and kernel when using a kriging-based approach (see Rasmussen and Williams, 2006; Ryabko, 2016), so we propose to use extreme gradient boosted trees (XGBoost; see Chen and Guestrin, 2016), which form a random ensemble (see Breiman, 2001) of "boosted" (see Freund, 1990; Freund et al., 1996) classification and regression trees (CART; see Breiman et al., 1984). This choice endows our surrogate with the ability to learn non-linear "knife-edge" properties, which typically characterize ABM parameter spaces. Sampling should carefully select which parametrizations of an ABM should be evaluated according to the performance of the surrogate. Here, we leverage pool-based active learning according to a pre-specified budget of evaluations. The structure of the surrogate, active learning approach and performance criterion are detailed below.
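To fix ideas, the sketch below shows what the two label functions v(·) introduced above might look like in practice. It is a minimal Python illustration, assuming the ABM output is a vector of simulated returns; the function names and the excess-kurtosis example mirror the text but are otherwise our own illustrative choices.

```python
from scipy.stats import kurtosis

def v_binary(simulated_returns):
    """Classification label: 1 if returns display excess kurtosis, else 0."""
    return int(kurtosis(simulated_returns, fisher=True) > 0.0)

def v_real(simulated_returns, real_returns):
    """Regression label: distance between simulated and empirical kurtosis."""
    return abs(kurtosis(simulated_returns) - kurtosis(real_returns))

# A positive calibration is a parameter vector x with v(m(x)) in C,
# e.g. C = {1} (binary case) or C = {x : v(x) <= alpha} (real-valued case).
```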
Here, we employ an iterative training procedure (see Figure 1) to construct a different surrogate at each of several rounds until we approach a predefined budget of evaluations on the true ABM. At each round additional parameter vectors are used in the iterative procedure. The budget is set in advance by the user according to a pre-determined, acceptable computation cost of learning the surrogate. In each round, a surrogate is trained using all available parameter vectors, and their respective labels, which have been aggregated up to that round. Once the budget is exhausted, the last trained surrogate is retained for exploring the parameter space.

Notwithstanding its precision, the surrogate remains an approximation of the original model. We suggest the user, in any case, to identify positive calibrations and further study the model's behaviour therein and in their close neighbourhoods employing the original ABM. For a review of active learning, see e.g. Settles (2010).
A trained surrogate can be used to efficiently explore the behaviour of the ABM over the entire parameter space. Relevant parameter combinations can then be selected for evaluation using the original ABM. Given the desire to avoid evaluating the computationally expensive true ABM, while also identifying positive calibrations, it is critical to maximize the performance of the surrogate in predicting these calibrations. Recall that positive calibrations are points in the parameter space that fulfil specific conditions specified by an ABM modeller/user. Such conditions might include any test that compares simulated output with real data (e.g. distance between real and simulated moments, a non-parametric test on distribution equality, mean squared prediction errors, etc.) and/or any specific feature the model might generate (e.g. fat tails in a specific distribution, growth rates of any variable above or below a given threshold, correlation patterns among a set of variables, etc.). In the two exercises presented in this paper, conditions of both kinds are employed.

Figure 2: An example classification and regression tree (CART) used for regression. Features are labelled f_1, ..., f_J.

Though our aim is to maximize the TPR (true positive rate) of our surrogate, the scores used to train the surrogate depend on the particular form of the output condition. Several procedures exist for tuning machine learning hyper-parameters, see e.g. Feurer et al. (2015). According to the two settings introduced above, we consider the following scores:

• Binary outcome. In this case the output of the calibration condition is discrete, such as Accept/Reject, and a measure of classification ability is needed. Specifically, we aim at maximizing the F1-score, the harmonic mean of the precision p, the ratio of true positives to predicted positives, and the recall r, the ratio of true positives to total positives:

F1 = 2 · (p · r) / (p + r).   (1)

The F1-score takes a value between 0 and 1. In terms of Type I and Type II errors, it equates to:

F1 = 2 · true positives / (2 · true positives + false positives + false negatives).   (2)

• Real-valued outcome. In this case, our aim is to minimize the mean squared error (MSE),

MSE = (1/N) Σ_{i=1}^{N} (ŷ_i − y_i)²,   (3)

where the surrogate predicts ŷ_i over N evaluation points with true labels y_i. We notice that this approach is in line, for instance, with Recchioni et al. (2015).
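As a quick illustration of equations (1)-(3), the snippet below computes both scores with scikit-learn (an assumption of convenience; any implementation of the formulas would do). The toy labels and predictions are purely illustrative.

```python
from sklearn.metrics import f1_score, mean_squared_error

# Binary outcome: 0/1 calibration labels, equations (1)-(2).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(f1_score(y_true, y_pred))        # 2*p*r / (p + r) = 0.8 here

# Real-valued outcome: surrogate predictions over N points, equation (3).
y = [0.10, 0.80, 0.30]
y_hat = [0.12, 0.75, 0.40]
print(mean_squared_error(y, y_hat))    # (1/N) * sum((y_hat_i - y_i)^2)
```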
The XGBoost algorithm employed in our surrogate modelling procedure also allows us to perform parameter sensitivity analysis at no extra cost. In particular, the machine learning algorithm provides an intuitive procedure for assessing the explained variance of the surrogate according to the relative number of times a parameter was "split on" in the ensemble (for details see e.g. Archer and Kimes, 2008; Louppe et al., 2013; Breiman, 2001). As each tree is constructed according to an optimized splitting of the possible values for a specific parameter vector, and increasingly focuses on difficult-to-predict samples, splits dictate the relative importance of parameters in discriminating the output conditions of the ABM. Accordingly, the relative number of splits over a specific parameter provides a quantitative assessment of the surrogate model's sensitivity to that parameter under the user-specified conditions. This also allows a ranking over parameters on the basis of their relative importance in producing model behaviour that satisfies whatever conditions are specified by the user. As this procedure is non-parametric, the resulting values should be interpreted as a rank-based statistic. In particular, the relative importance values associated with the number of splits only characterize the specific instantiation of the ensemble. The resulting counts provide insight into the relative performance of each parameter. A changing number of trees would result in a different number of splits for each parameter. As the number of trees approaches infinity, the ratio of splits per parameter converges to its true value by the law of large numbers.

Note that there is "no free lunch" with regard to performance measures, so their choice depends on the problem setting (see e.g. Wolpert, 2002). For a detailed description of the F1-score, see e.g. Van Rijsbergen (1979).

Training procedure

The primary constraint we face is the limited number of parameter combinations that can be used for model evaluation (budget) without incurring excessive computational costs. To address this issue, we propose a budgeted online active semi-supervised learning approach that iteratively builds a training set of parameter vectors on which the agent-based model is actually evaluated, in order to provide labelled data points for the training of the surrogate. The aim of actively sampling the parameter space is to reduce the discrepancy between the regions that contain a manifold of interest and the function approximation produced by the surrogate model. This semi-supervised learning approach (see e.g. Zhu, 2005; Goldberg et al., 2011) minimizes the number of required evaluations, while improving the performance of the surrogate. Given that evaluated parametrizations are aggregated over several rounds and the stationary nature of the parameter space labels, we can use the log convergence results proved in Ross et al. (2011) to provide a guideline on the number of parametrizations to evaluate in each round. In particular, we evaluate C·log(budget) parameters per round, with C = 1, also ensuring at least one positive calibration in the initial seed round; the constant C can be increased or decreased according to the particular ABM.
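A minimal sketch of this budgeting rule is given below, assuming SciPy's quasi-random Sobol sampler as the pool generator; the pool size, budget and variable names are illustrative, not the authors' settings (in particular, the snippet does not reproduce the Saltelli et al. (2010) total-variation analysis).

```python
import numpy as np
from scipy.stats import qmc

budget, n_params = 500, 12
# Cheap unlabelled "pool" of parametrizations, quasi-randomly spread in [0, 1]^d.
pool = qmc.Sobol(d=n_params, scramble=True, seed=0).random(2 ** 14)

per_round = max(1, int(np.log(budget)))   # C * log(budget) with C = 1 -> 6
seed_idx = np.random.default_rng(0).choice(len(pool), per_round, replace=False)
seed_points = pool[seed_idx]              # initial points to label with the ABM
```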
Generally, positive calibrations represent a very small percentage of points in the parameter space. For example, the concentration of positive calibrations in both of the ABMs presented in this paper represents less than 1% of the parameter space. Our approach exploits this imbalance by iteratively selecting a random subset of positively predicted calibrations over a finite number of rounds. As we use positively predicted calibrations, we exploit semi-supervised learning with the surrogate to select which parameters should be evaluated in the next round. In order to maximize computing speed, the algorithm is initialized with a fixed subset of evaluated parameter combinations that are drawn according to a quasi-random Sobol sampling over the parameter space (Morokoff and Caflisch, 1994). Further, the number of samples is drawn according to the "total variation" analysis presented in Saltelli et al. (2010). These initial "training" points are then evaluated through the ABM, their labels recorded and finally used to initialize the first surrogate model. Once the surrogate is trained, new parameter combinations are sampled over the entire parameter space and labelled using the surrogate. A random subset of points x_i is then selected from the predicted positive calibrations of the surrogate and evaluated for their true labels y_i using the ABM, with per-round sizes guided by the log convergence rates presented in Ross et al. (2011).

These new points are then added to the training set to train a new surrogate in the next round. This "self-training" procedure exploits the imbalance in the data to incrementally increase true positives, while reducing false positives. Note that this simple self-training procedure may result in no new predicted positives. In this case, the algorithm selects new points according to their predicted binary label entropy, where the latter is defined as the entropy between the predicted positive and negative calibration label probabilities. This incremental procedure continues until the targeted training budget is achieved. The algorithm pseudo-code is presented in Figure 3.

Note that in high dimensional spaces, standard designs of experiments are computationally costly and show little or no advantage over random sampling (Bergstra and Bengio, 2012; Lee et al., 2015).

Set:
  • Agent-based model ABM ∈ R^J
  • Sampling distribution ν ∈ R^J
  • Calibration function C(·)
  • Learning algorithm A, with parameters Θ
  • Evaluation budget B
  • Initial training set size N ≪ B
  • X_Training ∈ R^{N×J}
  • Calibration labels Y_Training ∈ N^N, binary outcome case (at least 1 positive calibration)
  • Calibration labels Y_Training ∈ R^N, real-valued outcome case (at least 1 positive calibration)
  • Hyper-parameter optimization algorithm (HPO)

Initialize:
  • Per-round sampling size S ≪ B
  • Per-round out-of-sample size K ≫ B

While |Y| < B, repeat:
  1. Θ = HPO(A(Θ, X_Training, Y_Training))
  2. Draw out-of-sample points X_OOS ∈ R^{K×J} ∼ ν
  3. Select X_sample ∈ R^{S×J} from X_OOS
  4. Evaluate X_Training = X_Training ∪ X_sample
  5. Evaluate Y_sample = {C(ABM(X_sample,i))}_{i=1...S}
  6. Evaluate Y_Training = Y_Training ∪ Y_sample
end while

Figure 3: Pseudo-code of our training algorithm. Note: Y indicates labels; X indicates parameter vectors. HPO: hyper-parameter optimization; OOS: out of sample.
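The following Python sketch mirrors the loop in Figure 3 under strong simplifications: a one-line toy stand-in replaces the true ABM, the hyper-parameter optimization step is skipped, and the entropy-based fallback is approximated by ranking pool points by predicted probability. It is meant to convey the mechanics, not to reproduce the authors' released code.

```python
import numpy as np
import xgboost as xgb
from scipy.stats import qmc

def abm_label(x):
    """Toy stand-in for C(ABM(x)): positive only in a small corner of I."""
    return int(x[0] > 0.8 and x[1] > 0.8)

J, B, S, K = 5, 60, 5, 4096          # dimensions, budget B, per-round S, pool K
rng = np.random.default_rng(0)
pool = qmc.Sobol(d=J, scramble=True, seed=0).random(K)
labelled = np.zeros(K, dtype=bool)

# Seed round: a few random points plus one guaranteed positive calibration.
idx = rng.choice(K, size=S, replace=False)
pos = int(np.argmax((pool[:, 0] > 0.8) & (pool[:, 1] > 0.8)))
idx = np.union1d(idx, [pos])
labelled[idx] = True
X, y = pool[idx], np.array([abm_label(x) for x in pool[idx]])

while len(y) < B:
    surrogate = xgb.XGBClassifier(n_estimators=100, max_depth=4)
    surrogate.fit(X, y)                      # this round's surrogate
    proba = surrogate.predict_proba(pool)[:, 1]
    proba[labelled] = -1.0                   # never re-evaluate labelled points
    cand = np.argsort(proba)[::-1][:S]       # predicted positives first; when
    labelled[cand] = True                    # none exist, this degenerates to a
    X = np.vstack([X, pool[cand]])           # highest-probability fallback
    y = np.append(y, [abm_label(x) for x in pool[cand]])

print(f"budget used: {len(y)}, positives found: {int(y.sum())}")
```

After the loop, split-count importances can be read off the final surrogate (e.g. via `surrogate.get_booster().get_score(importance_type='weight')`), which is one way to obtain the sensitivity ranking discussed in the previous section.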
Surrogate modelling examples: The Brock and Hommes model

In their seminal contribution, Brock and Hommes (1998) develop an asset pricing model (referred to here as B&H), where a heterogeneous population of agents trades a generic asset according to different strategies (fundamentalist, chartist, etc.). We briefly introduce the model (cf. Section 4.1). Then we report the empirical setting (see Section 4.2) and the results of our machine learning calibration and exploration exercise (cf. Section 4.3). Recall that the seed of the pseudo-random number generator is fixed and kept constant across runs of the model over different parameter vectors.

The model
There is a population of N traders that can invest either in a risk-free asset, which is perfectly elastically supplied at a gross return R = (1 + r) > 1, or in a risky one, which pays an uncertain dividend y and has a price denoted by p. Wealth dynamics are given by

W_{t+1} = R W_t + (p_{t+1} + y_{t+1} − R p_t) z_t,   (4)

where p_{t+1} and y_{t+1} are random variables and z_t is the number of shares of the risky asset bought at time t. Traders are heterogeneous in terms of their expectations about future prices and dividends and are assumed to be myopic mean-variance maximizers. As information about past prices and dividends is publicly available in the market, agents of type h can apply the conditional expected value E_{h,t} and variance V_{h,t}. The demand for shares z_{h,t} of agents with expectations of type h is computed by solving:

max_{z_{h,t}} { E_{h,t}(W_{t+1}) − (ν/2) V_{h,t}(W_{t+1}) },   (5)

which in turn implies

z_{h,t} = E_{h,t}(p_{t+1} + y_{t+1} − R p_t) / (ν σ²),   (6)

where ν controls for agents' risk aversion and σ² indicates the conditional volatility, assumed to be equal across traders and constant over time. In case of zero supply of outside shares and different trader types, the market equilibrium equation can be written as:

R p_t = Σ_h n_{h,t} E_{h,t}(p_{t+1} + y_{t+1}),   (7)

where n_{h,t} denotes the share of traders of type h at time t. In presence of homogeneous traders, perfect information and rational expectations, one can derive the no-arbitrage market equilibrium condition:

R p*_t = E_t(p*_{t+1} + y_{t+1}),   (8)

where the expectation is conditional on all histories of prices and dividends up to time t and where p* indicates the fundamental price. As dividends are independent and identically distributed over time with constant mean, equation (8) has a unique solution where the fundamental price is constant and equal to p* = E(y_t)/(R − 1). In what follows, prices are expressed in deviation from the fundamental, x_t = p_t − p*_t.

At the beginning of each trading period t = {1, 2, ..., T}, agents form expectations about future prices and dividends. Agents are heterogeneous in their forecasts. More specifically, investors believe that, in a heterogeneous world, prices may deviate from the fundamental value by some function f_h(·) depending upon past deviations from the fundamental price. Accordingly, the beliefs about p_{t+1} and y_{t+1} of agents of type h evolve according to:

E_{h,t}(p_{t+1} + y_{t+1}) = E_t(p*_{t+1} + y_{t+1}) + f_h(x_{t−1}, ..., x_{t−L}).   (9)

Many forecasting strategies specifying different trading behaviours and attitudes have been studied in the economic literature (see e.g. Banerjee, 1992; Brock and Hommes, 1997; Lux and Marchesi, 2000; Chiarella et al., 2009). Brock and Hommes (1998) adopt a simple linear representation of beliefs:

f_{h,t} = g_h x_{t−1} + b_h,   (10)

where g_h is the trend component and b_h the bias of trader type h. If b_h = 0, agent h is a pure trend chaser if g_h > 0 (a strong trend chaser if g_h > R), or a contrarian if g_h < 0 (a strong contrarian if g_h < −R). If g_h = 0, the agent of type h is purely biased (upward or downward biased if b_h > 0 or b_h < 0, respectively). When both g_h and b_h are equal to zero, the agent is a "fundamentalist", i.e. she believes that prices return to their fundamental value. Agents can also be fully rational, with f_{rational,t} = x_{t+1}. In such a case, they have perfect foresight, but they must pay a cost C. In our application, we use a simple model with only two types of agents, whose behaviours vary according to the choice of trend components, biases and perfect forecasting costs.
Combining equations (7), (9) and (10), one can derive the following equilibrium condition:

R x_t = n_{1,t} f_{1,t} + n_{2,t} f_{2,t},   (11)

which allows one to compute the price of the risky asset (in deviation from the fundamental) at time t. Traders switch among different strategies according to their evolving profitability. More specifically, each strategy h is associated with a fitness measure of the form:

U_{h,t} = (p_t + y_t − R p_{t−1}) z_{h,t−1} − C_h + ω U_{h,t−1},   (12)

where ω ∈ [0, 1] is a weight attributed to past profits. At the beginning of each period, agents reassess the profitability of their trading strategy with respect to the others. The probability that an agent chooses strategy h is given by:

n_{h,t} = exp(β U_{h,t}) / Σ_h exp(β U_{h,t}),   (13)

where the parameter β ∈ [0, +∞) captures traders' intensity of choice. According to equation (13), successful strategies gain an increasing number of followers. In addition, the algorithm introduces a certain amount of randomness, as less profitable strategies may still be chosen by traders. In this way, the model captures imperfect information and agents' bounded rationality. Moreover, the system can never be stuck in an equilibrium where all traders adopt the same strategy.

In our experiments we allow for the possibility that a positive cost might be paid also by non-rational traders. This mirrors the fact that some traders might want to buy additional information, which they might not be able to use (due e.g. to computational mistakes).
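For concreteness, the sketch below simulates the deterministic skeleton of equations (10)-(13) for two belief types. The stochastic dividend component is omitted and timing conventions are simplified, so it should be read as an illustrative toy version of the B&H dynamics, not the exact code used in the experiments; all parameter values and names are our own assumptions.

```python
import numpy as np

def simulate_bh(T=500, g=(0.0, 1.2), b=(0.0, 0.0), cost=(1.0, 0.0),
                beta=3.0, R=1.01, nu=1.0, sigma2=1.0, omega=0.0):
    """Deterministic skeleton of eqs (10)-(13): price deviations x_t."""
    g, b, cost = (np.asarray(v, dtype=float) for v in (g, b, cost))
    x = np.zeros(T)
    x[1] = 0.1                               # small initial mispricing
    U = np.zeros(2)                          # strategy fitness, eq (12)
    for t in range(2, T):
        n = np.exp(beta * U)
        n = n / n.sum()                      # strategy shares, eq (13)
        f = g * x[t - 1] + b                 # linear beliefs, eq (10)
        x[t] = n @ f / R                     # equilibrium pricing, eq (11)
        z = (f - R * x[t - 1]) / (nu * sigma2)            # demands, eq (6)
        U = (x[t] - R * x[t - 1]) * z - cost + omega * U  # fitness, eq (12)
    return x

x = simulate_bh()
log_returns = np.diff(np.log(x + 100.0))     # illustrative fundamental p* = 100
```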
Table 1: Parameters and explored ranges in the Brock and Hommes model.

Parameter | Brief description | Theoretical support | Explored range
β | intensity of choice | [0; +∞) | [0.0; 10.0]
n₁ | initial share of type 1 traders | [0; 1] | 0.5
b₁ | bias of type 1 traders | (−∞; +∞) | [−2.0; 2.0]
b₂ | bias of type 2 traders | (−∞; +∞) | [−2.0; 2.0]
g₁ | trend component of type 1 traders | (−∞; +∞) | [−2.0; 2.0]
g₂ | trend component of type 2 traders | (−∞; +∞) | [−2.0; 2.0]
C | cost of the perfect forecast | [0; +∞) | [0.0; 5.0]
ω | weight to past profits | [0.0, 1.0] | [0.0; 1.0]
σ | asset volatility | (0; +∞) | (0.0; 1.0]
ν | attitude towards risk | [0; +∞] | [0; 100]
r | risk-free return | (1; +∞) | [1.01; 1.1]
T_BH | number of periods | N | –

Although the model is relatively simple, different contributions have tried to match the statistical properties of its output with those observed in real financial markets (Boswijk et al., 2007; Recchioni et al., 2015; Lamperti, 2016; Kukacka and Barunik, 2016). This makes the model an ideal test case for our surrogate: it is relatively cheap in terms of computational needs, it offers a reasonably large parameter space and it has been extensively studied in the literature. There are 12 free parameters (Table 1) determined through calibration. The ranges for the parameters' values have been identified relying on both economic reasoning and previous experiments on the model. However, their selection is ultimately a user-specific decision. Our procedure manages the computational constraints faced by modellers working with large parameter spaces. In what follows, we refer to the parameter space spanned by the intervals specified in the last column of Table 1. Naturally, it can be further expanded or reduced according to the user's needs and the available budget.

We underline that the dimension of the parameter space is in line with, or even larger than, that of recent studies on ABM meta-modelling (see e.g. Salle and Yildizoglu, 2014; Bargigli et al., 2016).

Let us now consider the conditions identifying positive calibrations. As already discussed above, any feature of the model's output can be employed to express such conditions. According to Section 3, two types of calibration criteria are considered, giving respectively binary and real-valued outcomes. In the binary outcome case, we employ a two-sample Kolmogorov-Smirnov (KS) test between the distribution of logarithmic returns obtained from the numerical simulation of the model and its empirical counterpart. More specifically, we rely on daily adjusted closing prices for the S&P 500 from December 09, 2013 to December 07, 2015, for a total of 502 observations, and we compute the following test statistic:

D_{RW,S} = sup_r |F_RW(r) − F_S(r)|,   (14)

where r indicates logarithmic returns and F_RW and F_S are the empirical distribution functions of the real-world (RW) and simulated (S) samples, respectively.

Let p_t and p_{t−1} be the prices of an asset at two subsequent time steps. The logarithmic return from t − 1 to t is given by r_t = log(p_t / p_{t−1}) ≈ (p_t − p_{t−1}) / p_{t−1}. The data have been obtained from Yahoo Finance: https://finance.yahoo.com/quote/%5EGSPC/history. The test is passed if the null hypothesis of equality of the distributions is not rejected at the 5% confidence level.

Then, in a real-valued outcome setting, we use the p-value of the KS test, P(D > D_{RW,S}), as an expression of the model's fit with the data.
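In code, the criterion in equation (14) amounts to a two-sample KS test; a minimal sketch with SciPy follows, with placeholder arrays standing in for the 502 S&P 500 log returns and the simulated series.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
real_returns = rng.standard_t(df=3, size=502) * 0.01   # placeholder for S&P 500
sim_returns = rng.normal(0.0, 0.01, size=502)          # placeholder ABM output

D, p_value = ks_2samp(real_returns, sim_returns)       # D = sup_r |F_RW - F_S|
label_binary = int(p_value > 0.05)     # positive calibration at the 5% level
label_real = p_value                   # real-valued target P(D > D_RW,S)
```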
We also consider an equivalent condition for binary outcomes, where parameter vectors with a p-value above 5% are considered positive calibrations. This choice is intentional, as equivalent conditions allow a comparison between the binary and real-valued outcomes in terms of precision (ability to identify true calibrations) and computational time (in the real-valued scenario there is more information to be processed). We train the surrogate 100 times over 10 different budgets of 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250 and 2500 labelled parameter combinations and evaluate it on 100000 unlabelled points. Having a large number of out-of-sample, unlabelled, possibly well-spread points is fundamental to evaluate the performance of the meta-model. We use a larger evaluation set than any other meta-modelling contribution we are aware of (see, for instance, Salle and Yildizoglu, 2014; Dosi et al., 2017c; Bargigli et al., 2016).
In Figure 4, we show the parameter importance results for the Brock and Hommes (B&H) model. We find that the most relevant parameters to fit the empirical distribution of returns observed in the S&P 500 are those characterizing traders' attitude towards the trend (g₁ and g₂) and, secondly, their bias (b₁ and b₂). This result is in line with recent findings by Recchioni et al. (2015) and Lamperti (2016) obtained using the same model. Moreover, the "intensity of choice" parameter (β, cf. Section 4) is of crucial importance in the original model developed by Brock and Hommes (1998), but does not appear to be particularly relevant in determining the fit of the model with the data when compared to other behavioural parameters (at least within the range expressed by Table 1). See also Boswijk et al. (2007), where the authors estimate the B&H model on the S&P 500 and, in many exercises, find the switching parameter not to be significant. Also traders' risk attitude (ν) and the weight associated to past profits (ω) are relatively unimportant in shaping the empirical performance of the model.

Let us now consider the behaviour of the surrogate. As outlined in Section 3.2, we run a series of exercises where the surrogate is employed to explore the behaviour of the model over the parameter space and filter out positive calibrations matching the distribution of real stock-market returns. Figure 5 collects the results and shows the performance of the surrogate in the two proposed settings (binary and real-valued outcome). Within the binary outcome exercise, the F1-score increases with the size of the training sample and reaches a peak of around 0.80. We also assess the surrogate's searching ability, which is reported in Figure 5c and indicates the share of total positive calibrations that the surrogate is able to find. Specifically, we find that around 75% of the positive calibrations present in the large set of out-of-sample points are found.

Obviously, the surrogate's performance worsens as the training sample size is reduced. However, once we move to the real-valued setting, where the surrogate is learned using a continuous variable (containing more information about the model's behaviour), its performance is remarkably higher. Indeed, even when the sample size of the training points is particularly low (500), the true positive rate (TPR) is around 70%, and it reaches almost 95% (on average) when 2500 parameter vectors are employed (see Figure 5d).

Timing results are reported according to the average number of seconds required for a single compute core to complete the specific task 100 times. The increase in performance from classification (see Figure 5e) to regression (see Figure 5f) requires roughly 3X the modelling time and a nearly equivalent prediction time. Given this negligible prediction time, our approach facilitates a nearly costless exploration of the parameter space, delivering good results in terms of F1-score, TPR and MSE. The time savings in comparison to running the original ABM are substantial: in this exercise, over a set of 10000 out-of-sample points, the surrogate is 500X faster on average in prediction. Note also that the learned surrogate is reusable on any number of out-of-sample parameter combinations, without the need for additional training. Further, we remark that computational gains are expected to be larger as more complex and expensive-to-simulate models are used. The next section goes in this direction.

Figure 5: Brock and Hommes surrogate modelling performance averaged over a pool of 10000 parametrizations. Panels: (a) binary outcome, F1 score; (b) real-valued outcome, mean squared error; (c) binary outcome, true positive rate; (d) real-valued outcome, true positive rate; (e) binary outcome, computation time; (f) real-valued outcome, computation time. Black vertical lines indicate 95% confidence intervals on 100 repeated and independent experiments.
Surrogate modelling examples: The Island growth model

In the "Island" growth model (Fagiolo and Dosi, 2003), a population of heterogeneous firms locally interact, discovering and diffusing new technologies, which ultimately lead to the emergence (or not) of endogenous growth. After presenting the model in Section 5.1, we describe the empirical setting (see Section 5.2) and the results of the machine learning calibration and exploration exercises (cf. Section 5.3). Recall that the seed used for the pseudo-random number generator is fixed and kept constant across runs of the model over different parameter vectors.

The model
A fixed population of heterogeneous firms (i = 1, 2, ..., N) explores an unknown technological space ("the sea"), punctuated by islands (indexed by j = 1, 2, ...) representing new technologies. The technological space is represented by a 2-dimensional, infinite, regular lattice endowed with the Manhattan metric d. The probability that each node (x, y) is an island is equal to p(x, y) = π. There is only one homogeneous good, which can be "mined" from any island. Each island is characterized by a productivity coefficient s_j = s(x, y) > 0. The production of agent i on island j, having coordinates (x_j, y_j), is equal to:

Q_{i,t} = s(x_j, y_j) [m_t(x_j, y_j)]^{α−1},   (15)

where α ≥ 1 and m_t(x_j, y_j) indicates the total number of miners working on j at time t. The GDP of the economy is simply obtained by summing up the production of each island.

Each agent can choose to be a miner and produce the homogeneous final good on her current island, to become an explorer and search for new islands (i.e. technologies), or to be an imitator and move towards a known island. In each time step, miners can decide to become explorers with probability ε > 0. In that case, the agent leaves the island and "sails" around until another (possibly still unknown) island is discovered. During the search, explorers are not able to extract any output and randomly move in the lattice. When a new island (technology) is discovered, its productivity is given by:

s_{j_new} = (1 + W) { [ |x_{j_new}| + |y_{j_new}| ] + ϕ Q_i + ω },   (16)

where W is a Poisson-distributed random variable with mean λ > 0, ω is a uniformly distributed random variable with zero mean and unitary variance, ϕ is a constant between zero and one and, finally, Q_i is the output memory of agent i. Therefore, the initial productivity of a newly discovered island depends on four factors (see Dosi, 1988): (i) its distance from the origin; (ii) cumulative learning effects (ϕ); (iii) a random variable W capturing radical innovations (i.e. changes in technological paradigms); (iv) a stochastic i.i.d. zero-mean noise controlling for high-probability low jumps (i.e. incremental innovations).

Miners can also decide to imitate currently available technologies by taking advantage of informational spillovers stemming from more productive islands located in their technological neighbourhoods.
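A minimal sketch of equations (15) and (16) is given below; the lattice bookkeeping, sailing and imitation dynamics of the full model are omitted, and all names and default values are illustrative assumptions.

```python
import numpy as np

def output(s_j, miners, alpha=1.5):
    """Eq (15): production of a single agent mining on island j."""
    return s_j * miners ** (alpha - 1.0)

def new_island_productivity(x, y, q_memory, phi=0.5, lam=1.0, rng=None):
    """Eq (16): productivity of a newly discovered island at (x, y)."""
    rng = rng or np.random.default_rng()
    W = rng.poisson(lam)                            # radical innovations (jumps)
    eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0))  # zero mean, unit variance
    return (1.0 + W) * (abs(x) + abs(y) + phi * q_memory + eps)
```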
Table 2: Parameters and explored ranges in the Island model.

Parameter | Brief description | Theoretical support | Explored range
ρ | degree of locality in the diffusion of knowledge | [0; +∞) | [0; 10]
λ | mean of Poisson r.v. - jumps in technology | [0; +∞) | 1
α | productivity of labour in extraction | [0; +∞) | [0.8; 2]
ϕ | cumulative learning effect | [0, 1] | [0.0; 1.0]
π | probability of finding a new island | [0.0, 1.0] | [0.0; 1.0]
ε | willingness to explore | [0, 1] | [0.0; 1.0]
m | initial number of agents in each island | [2; +∞) | 50
T_IS | number of periods | N | –

More specifically, agents mining on any colonized island deliver a signal, which is instantaneously spread in the system. Other agents in the lattice receive the signal with probability:

w_t(x_j, y_j; x, y) = [m_t(x_j, y_j) / m_t] exp{−ρ [ |x − x_j| + |y − y_j| ]},   (17)

which depends on the magnitude of the technology gap as well as on the physical distance between two islands (ρ > 0). Each agent i chooses the strongest signal and becomes an imitator, sailing to the corresponding island along the shortest possible path. Once the imitated island is reached, the imitator starts mining again.

The model shows that the very possibility of notionally unlimited (albeit unpredictable) technological opportunities is a necessary condition for the emergence of endogenous exponential growth. Indeed, self-sustained growth is achieved whenever technological opportunities (captured by both the density of islands π and the likelihood of radical innovations λ), path dependency (i.e. the fraction of idiosyncratic knowledge, ϕ, agents carry over to newly discovered technologies), and the spreading intensity in the information diffusion process (ρ) are beyond some minimum thresholds (Fagiolo and Dosi, 2003). Moreover, the system endogenously generates exponential growth if the trade-off between exploration and exploitation is solved, i.e. if the ecology of agents finds the right balance between searching for new technologies and mastering the available ones.

The Island model employs eight input parameters to generate a wide array of growth dynamics. We report the parameters, their theoretical support and the explored range in Table 2. We kept the number of firms fixed (and equal to 50) to study what happens to the same economic system when the parameters linked to behavioural rules are changed.

Note that the Island model does not exhibit scale effects: the results generated by the model do not depend on the number of agents in the system (Fagiolo and Dosi, 2003).

Similarly to Section 4.2, we characterize a binary outcome and a real-valued outcome setting. In the first case, the surrogate is learnt using a binary target variable y taking value 1 if a user-defined specific set of conditions is satisfied and zero otherwise. More specifically, we define two conditions characterizing the GDP time series generated by the model. The first condition
concerns the average growth rate (AGR):

AGR = [log(GDP_T) − log(GDP_0)] / (T − 1),   (18)

and sustained growth emerges if the AGR exceeds a minimum threshold. The second condition requires the distribution of output growth rates to display fat tails. To check it, growth rates are fitted with a symmetric exponential power density:

f(x) = 1 / (2 a b^{1/b} Γ(1 + 1/b)) · e^{−(1/b) |(x − µ)/a|^b},   (19)

where a controls for the standard deviation, b for the shape of the distribution and µ represents the mean. As b gets smaller, the tails become fatter. In particular, when b = 2 the distribution reduces to a Gaussian one, while for b = 1 the density is Laplacian. We say that the output growth-rate distribution exhibits fat tails if b ≤ 1. Note that there is a hierarchy in the conditions we have just defined: only those parametrizations satisfying the first one (a sufficiently high AGR) are checked against the second. In the real-valued outcome setting, the surrogate is learnt directly on the estimated shape parameter b of the symmetric power exponential distribution, and a positive calibration is found when the fat-tail condition on b is satisfied. Again, the choice of the condition to be satisfied ensures (partial, in this case) consistency between the two settings.

In the real-valued outcome setting our exercise is comparable to those performed in Dosi et al. (2017c), where the same distribution and parameters are used in a model of industrial dynamics.
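The fat-tail check can be sketched as follows, using SciPy's gennorm, whose shape parameter corresponds to b in equation (19) (the generalized normal family coincides with the symmetric exponential power distribution); the growth-rate data here are placeholders.

```python
import numpy as np
from scipy.stats import gennorm

rng = np.random.default_rng(2)
growth_rates = rng.laplace(0.02, 0.05, size=1000)   # placeholder GDP growth rates

b, mu, a = gennorm.fit(growth_rates)                # shape b, location mu, scale a
agr = growth_rates.mean()                           # crude stand-in for eq (18)
is_positive = (agr > 0.0) and (b <= 1.0)            # hierarchical conditions
```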
Figure 7: Islands surrogate modelling performance versus budget size, averaged over a pool of 10000 parametrizations. Panels: (a) binary outcome, F1 score; (b) real-valued outcome, mean squared error; (c) binary outcome, true positive rate; (d) real-valued outcome, true positive rate; (e) binary outcome, computation time; (f) real-valued outcome, computation time. Black vertical lines indicate 95% confidence intervals on 100 repeated and independent experiments.

We train the surrogate as we did with the B&H model but, given the higher computational complexity of the Island model, we reduce the number of unlabelled points to 10000. This choice is motivated by the fact that we need to run the model on the out-of-sample points in order to evaluate the surrogate.

As for the Brock and Hommes model, we start our analysis by reporting the relative importance of all the parameters characterizing the Island model (Figure 6). We find that all the parameters of the model linked to production, innovation and imitation appear to be relevant for the emergence of sustained economic growth.

The surrogate's performance is presented in Figure 7, where the first column of plots refers to the binary outcome setting and the second one to the real-valued one. The F1-score displays relatively high values even for low training sample sizes (250 and 500), pointing to a good classification performance of the surrogate (see Figure 7a). However, it quickly saturates, reaching a plateau around 0.8. Conversely, in the real-valued setting, the surrogate's performance keeps increasing with the training sample size, and it displays remarkably low values of the MSE when more than 1000 points are employed (cf. Figure 7b).

In both settings, the searching ability of the surrogate behaves in a similar way: the TPR steadily increases with the training sample size (cf. Figures 7c and 7d). In absolute terms, the real-valued setting delivers much better results than the binary one, as for the Brock and Hommes model (Section 4.3). In particular, the largest true positive ratio reaches 0.9 for the real-valued case and 0.8 for the binary one. Therefore, by training the surrogate on 2500 points we are able to (i) find 90% of true positive calibrations (Figure 7d) and predict the thickness of the associated distribution of growth rates incurring a mean squared error of less than 0.08 (Figure 7b) using a continuous target variable, and (ii) find 80% of the true positives (panel 7c) and correctly classify around 80% of them (panel 7a) using a binary target variable.

Given the satisfactory explanatory performance of the surrogate, do we also achieve considerable improvements in the computational time required to perform such exploration exercises? Figures 7e and 7f provide a positive answer. Indeed, the surrogate is 3750 times faster than the fully-fledged Island agent-based model. Moreover, the increase in speed is considerably larger than in the Brock and Hommes model. This confirms our intuition on the increasing usefulness of our surrogate modelling approach when the computational cost of the ABM under study is higher. Such a result is a desirable property for real applications, where the complexity of the underlying ABM could even prevent the exploration of the parameter space.
We now assess the robustness of our training procedure with respect to different surrogate models. More specifically, we compare the XGBoost surrogate employed in the previous analysis with the simpler and more widely used Logit one. Our comparison exercise is performed in a binary outcome setting, where positive calibrations are identified through the sustained-growth condition on the AGR.
The algorithm mirrors exactly the one described in Section 3. The exercise begins by sampling 1000000 points at random from the Islands parameter space. Given the fixed budget of 500 evaluations of the true ABM, for both XGBoost and Logit the first surrogate is provided with 35 labelled parameters selected at random from the 1000000 points, according to the total-variation sampling procedure in Saltelli et al. (2010). Then, over several rounds, a surrogate is fit to the labelled parameters and used to predict a labelling over the 1000000 points. The predicted labels are then employed by the proposed procedure to select the points that are added at each round to the set of labelled points. A new surrogate is learned in the subsequent round, and the procedure repeats until the budget of true evaluations has been reached.

The proposed procedure results in a comparable precision of 94.17% and 94.72% for Logit and XGBoost, respectively. The negligible difference between the precision of the two surrogates suggests that our training procedure provides satisfying results even when the standard Logit statistical model is employed. However, when the XGBoost predicted probabilities are corrected through the Platt scaling procedure, its precision rises to 99.35%. Moreover, the scaled XGBoost is considerably superior to Logit with regard to true vs. false positives (see Table 3). Considering the higher computation costs and the need for hyper-parameter optimization when using the more precise XGBoost surrogate, users might prefer the faster Logit surrogate when false positives are cheap. Nevertheless, our proposed surrogate modelling procedure works well in both the Logit and XGBoost cases.

The code is available at https://github.com/amirsani/BOASM. Unlike Logit, which produces accurate probabilities for each of the class labels, probabilities produced by non-parametric algorithms such as XGBoost require scaling. Here, we use Platt scaling to correct the probabilities produced with XGBoost. For more information, see Platt et al. (1999).

Table 3: Surrogate modelling performance using the learning procedure presented in this paper. Note that only the precision is computable in a real-life scenario, as only true and false positives are available when positive predicted calibrations are evaluated.

Surrogate algorithm | True negatives | False positives | False negatives | True positives | Precision
Logit | 62 | 22 | 61 | 355 | 94.17%
XGBoost | 178 | 17 | 0 | 305 | 94.72%
XGBoost (scaled) | 193 | 2 | 0 | 305 | 99.35%
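One standard way to implement the Platt correction used above is scikit-learn's sigmoid calibration wrapped around the XGBoost classifier, as sketched below on toy data; this is an assumed implementation, not necessarily the routine behind Table 3.

```python
import numpy as np
import xgboost as xgb
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(3)
X = rng.random((400, 8))
y = (X[:, 0] * X[:, 1] > 0.5).astype(int)       # toy calibration labels

raw = xgb.XGBClassifier(n_estimators=50, max_depth=3)
platt = CalibratedClassifierCV(raw, method="sigmoid", cv=3)  # Platt scaling
platt.fit(X, y)                                 # fits trees + logistic rescaling
calibrated = platt.predict_proba(X)[:, 1]       # rescaled probabilities
```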
Concluding remarks

In this paper, we have proposed a novel approach to the calibration and parameter space exploration of agent-based models (ABM), which combines the use of supervised machine learning and intelligent sampling to construct a cheap surrogate meta-model. To the best of our knowledge, this is the first attempt to exploit machine-learning techniques for calibration and exploration in an agent-based framework.

Our machine-learning surrogate approach is different from kriging, which has recently been applied to ABMs dealing with industrial dynamics (Salle and Yildizoglu, 2014; Dosi et al., 2017c), financial networks (Bargigli et al., 2016) and macroeconomic issues (Dosi et al., 2016, 2017b). In particular, apart from the different statistical framework kriging relies on (it assumes a multivariate Gaussian process), the results it delivers once applied to ABMs may suffer from three relevant limitations. First, kriging is difficult to apply to large-scale models, where the number of parameters goes beyond 20. This constrains the modeller to introduce additional procedures to select, a priori, the subset of parameters to study, while leaving the rest constant (see e.g. Dosi et al., 2016). Second, the machine-learning surrogate approach performs better in out-of-sample testing: the typical kriging-based meta-model is tested on 10-20 points within an extremely large space, while our surrogate is tested on samples of size 10000 in the first set of exercises and 1000000 points in the last exercise. Finally, the response surfaces generated by kriging meta-models suffer from smoothness assumptions that collapse interesting patterns, which cannot be captured by common Gaussianity assumptions. This results in incredibly smooth and well-behaved surfaces, which may falsely relate parameters and model behaviour. Given the rugged, unsmooth surfaces commonly reported in agent-based models (see e.g. Gilli and Winker, 2003; Fabretti, 2012; Lamperti, 2016), inferring the behaviour of the true ABM on the basis of the insights produced by a kriging meta-model may result in large errors.

Further, the main advantage of our methodology remains its practical usefulness. Indeed, the surrogate can be learnt at virtually zero computational cost (for research applications) and requires a trivial amount of time to predict the areas of the parameter space the modeller should focus on, with reasonably good results. Two modelling options are presented, a binary outcome setting and a real-valued one: the first is faster and especially useful when a large number of samples is available, while the second has more explanatory power. Furthermore, the usual trade-off between the quantity of information that needs to be processed (computational costs) and the surrogate performance improvements is, in practice, absent. Ultimately, the surrogate prediction exercises proposed in this paper take less than a minute to complete, with the majority of computation coming from the time needed to run the budget of true ABM evaluations. This means, in practical terms, that the modeller can use an arbitrarily large set of parameter combinations and a relatively small training sample to build the surrogate at almost no cost, and leverage the resulting meta-model to gain insight on the dynamics of the parameter space for further exploration using the original ABM.

Finally, an additional relevant result emerges from the exercises investigated in this paper. The surrogate is much more effective in reducing the relative cost of exploring the properties of the model over the parameter space for the "Islands" model, which is more computationally intensive than the Brock and Hommes one. This suggests that the adoption of surrogate meta-modelling achieves increasing computational gains as the complexity of the underlying model increases.

This work is only the first step towards a comprehensive assessment of agent-based model properties through machine-learning techniques. Such developments are especially important for complex macroeconomic agent-based models (see e.g. Dosi et al., 2010, 2013, 2015, 2017a; Popoyan et al., 2017), as they could allow the development of a standardized and robust procedure for model calibration and validation, thus closing the existing gap with Dynamic Stochastic General Equilibrium models (see Fagiolo and Roventini, 2017, for a critical comparison of ABM and DSGE models).
Accordingly, a user-friendly Python surrogate modelling library will also be released for general use.
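Pending that release, the following hypothetical snippet illustrates the intended workflow; it is not the announced library's API and simply reuses the surrogate and pool objects from the training sketch above. Once trained, the meta-model scores an arbitrarily large set of parameter combinations at negligible cost, and only the most promising combinations are passed back to the true ABM.

import numpy as np

# `surrogate` and `pool` are assumed to come from the training sketch above.
# Scoring the full pool requires no ABM runs, so it is essentially free.
scores = surrogate.predict_proba(pool)[:, 1]

# Rank all candidate parameter combinations and keep, say, the top 100
# for further exploration with the original (expensive) agent-based model.
promising = pool[np.argsort(scores)[::-1][:100]]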
Acknowledgements
A special thanks goes to Antoine Mandel, who constantly engaged in fruitful discussions with the authors and provided incredibly valuable insights and suggestions. We would also like to thank Daniele Giachini, Mattia Guerini, Matteo Sostero and Balázs Kégl for their comments. Further, we would like to thank all the participants in seminars and workshops held at Scuola Superiore Sant'Anna (Pisa), the XXI WEHIA conference (Castellon), the XXII CEF conference (Bordeaux), the NIPS 2016 "What If? Inference and Learning of Hypothetical and Counterfactual Interventions in Complex Systems" workshop (Barcelona), the CCS 2016 conference (Amsterdam) and the 2016 Paris-Bielefeld Workshop on Agent-Based Modeling (Paris). FL acknowledges financial support from the European Union's FP7 project IMPRESSIONS (G.A. No 603416). AS acknowledges financial support from the H2020 project DOLFINS (G.A. No 640772) and hardware support from NVIDIA Corporation, AdapData SAS and the Grid5000 testbed for this research. AR acknowledges financial support from the European Union's FP7 IMPRESSIONS, H2020 DOLFINS and H2020 ISIGROWTH (G.A. No 649186) projects.

References
Alfarano, S., Lux, T., and Wagner, F. (2005). Estimation of agent-based models: The case of an asymmetric herding model. Computational Economics, 26(1):19–49.
Alfarano, S., Lux, T., and Wagner, F. (2006). Estimation of a simple agent-based model of financial markets: An application to Australian stock and foreign exchange data. Physica A: Statistical Mechanics and its Applications, 370(1):38–42.
Amilon, H. (2008). Estimation of an adaptive stock market model with heterogeneous agents. Journal of Empirical Finance, 15(2):342–362.
An, G. and Wilensky, U. (2009). From artificial life to in silico medicine. In Komosinski, M. and Adamatzky, A., editors, Artificial Life Models in Software, pages 183–214. Springer London, London.
Anderson, P. W. et al. (1972). More is different. Science, 177(4047):393–396.
Archer, K. J. and Kimes, R. V. (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis, 52(4):2249–2260.
Assenza, T., Gatti, D. D., and Grazzini, J. (2015). Emergent dynamics of a macroeconomic agent based model with capital and credit. Journal of Economic Dynamics and Control, 50:5–28.
Banerjee, A. V. (1992). A simple model of herd behavior. The Quarterly Journal of Economics, 107(3):797–817.
Barde, S. (2016a). Direct comparison of agent-based models of herding in financial markets. Journal of Economic Dynamics and Control, 73:329–353.
Barde, S. (2016b). A practical, accurate, information criterion for nth order Markov processes. Computational Economics, pages 1–44.
Bargigli, L., Riccetti, L., Russo, A., and Gallegati, M. (2016). Network Calibration and Metamodeling of a Financial Accelerator Agent Based Model. Working papers (economics), Università degli Studi di Firenze, Dipartimento di Scienze per l'Economia e l'Impresa.
Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.
Booker, A., Dennis, J. E., Jr., Frank, P., Serafini, D., Torczon, V., and Trosset, M. (1999). A rigorous framework for optimization of expensive functions by surrogates. Structural Optimization, 17(1):1–13.
Boswijk, H., Hommes, C., and Manzan, S. (2007). Behavioral heterogeneity in stock prices. Journal of Economic Dynamics and Control, 31(6):1938–1970.
Bottazzi, G. and Secchi, A. (2006). Explaining the distribution of firm growth rates. The RAND Journal of Economics, 37(2):235–256.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A. (1984). Classification and Regression Trees. CRC Press.
Brock, W. A. and Hommes, C. H. (1997). A rational route to randomness. Econometrica, 65(5):1059–1095.
Brock, W. A. and Hommes, C. H. (1998). Heterogeneous beliefs and routes to chaos in a simple asset pricing model. Journal of Economic Dynamics and Control, 22(8–9):1235–1274.
Brown, D. G., Page, S., Riolo, R., Zellner, M., and Rand, W. (2005). Path dependence and the validation of agent-based spatial models of land use. International Journal of Geographical Information Science, 19(2):153–174.
Caiani, A., Godin, A., Caverzasi, E., Gallegati, M., Kinsella, S., and Stiglitz, J. E. (2016). Agent based-stock flow consistent macroeconomics: Towards a benchmark model. Journal of Economic Dynamics and Control, 69:375–408.
Carley, K. M., Fridsma, D. B., Casman, E., Yahja, A., Altman, N., Chen, L.-C., Kaminsky, B., and Nave, D. (2006). BioWar: scalable agent-based model of bioattacks. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 36(2):252–265.
Castaldi, C. and Dosi, G. (2009). The patterns of output growth of firms and countries: Scale invariances and scale specificities. Empirical Economics, 37(3):475–495.
Chen, S.-H., Chang, C.-L., and Du, Y.-R. (2012). Agent-based economic models and econometrics. The Knowledge Engineering Review, 27:187–219.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM.
Chiarella, C., Iori, G., and Perelló, J. (2009). The impact of heterogeneous trading rules on the limit order book and order flows. Journal of Economic Dynamics and Control, 33(3):525–537.
Cireşan, D. C., Giusti, A., Gambardella, L. M., and Schmidhuber, J. (2013). Mitosis detection in breast cancer histology images with deep neural networks. In International Conference on Medical Image Computing and Computer-assisted Intervention, pages 411–418. Springer.
Claesen, M., Simm, J., Popovic, D., Moreau, Y., and De Moor, B. (2014). Easy hyperparameter search using Optunity. arXiv preprint arXiv:1412.1114.
Conti, S. and O'Hagan, A. (2010). Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, 140(3):640–651.
Dawid, H., Gemkow, S., Harting, P., Van der Hoog, S., and Neugart, M. (2014a). Agent-based macroeconomic modeling and policy analysis: the Eurace@Unibi model. Technical report, Bielefeld Working Papers in Economics and Management.
Dawid, H., Harting, P., and Neugart, M. (2014b). Economic convergence: Policy implications from a heterogeneous agent model. Journal of Economic Dynamics and Control, 44:54–80.
De Marchi, S. (2005). Computational and Mathematical Modeling in the Social Sciences. Cambridge University Press.
Dosi, G. (1988). Sources, procedures and microeconomic effects of innovation. Journal of Economic Literature, 26:126–71.
Dosi, G., Fagiolo, G., Napoletano, M., and Roventini, A. (2013). Income distribution, credit and fiscal policies in an agent-based Keynesian model. Journal of Economic Dynamics and Control, 37(8):1598–1625.
Dosi, G., Fagiolo, G., Napoletano, M., Roventini, A., and Treibich, T. (2015). Fiscal and monetary policies in complex evolving economies. Journal of Economic Dynamics and Control, 52(C):166–189.
Dosi, G., Fagiolo, G., and Roventini, A. (2010). Schumpeter meeting Keynes: A policy-friendly model of endogenous growth and business cycles. Journal of Economic Dynamics and Control, 34(9):1748–1767.
Dosi, G., Pereira, M., Roventini, A., and Virgillito, M. (2017a). When more flexibility yields more fragility: The microfoundations of Keynesian aggregate unemployment. Journal of Economic Dynamics and Control, forthcoming.
Dosi, G., Pereira, M., Roventini, A., and Virgillito, M. E. (2016). The Effects of Labour Market Reforms upon Unemployment and Income Inequalities: an Agent Based Model. LEM Working Papers Series 2016-27, Scuola Superiore Sant'Anna.
Dosi, G., Pereira, M., Roventini, A., and Virgillito, M. E. (2017b). Causes and consequences of hysteresis: Aggregate demand, productivity and employment. LEM Working Papers Series 2017-07, Scuola Superiore Sant'Anna.
Dosi, G., Pereira, M. C., and Virgillito, M. E. (2017c). On the robustness of the fat-tailed distribution of firm growth rates: a global sensitivity analysis. Journal of Economic Interaction and Coordination, pages 1–21.
Effken, J. A., Carley, K. M., Lee, J.-S., Brewer, B. B., and Verran, J. A. (2012). Simulating nursing unit performance with OrgAhead: strengths and challenges. Computers, Informatics, Nursing: CIN, 30(11):620.
Fabretti, A. (2012). On the problem of calibrating an agent based model for financial markets. Journal of Economic Interaction and Coordination, 8(2):277–293.
Fagiolo, G., Birchenhall, C., and Windrum, P. (2007). Empirical validation in agent-based models: Introduction to the special issue. Computational Economics, 30(3):189–194.
Fagiolo, G. and Dosi, G. (2003). Exploitation, exploration and innovation in a model of endogenous growth with locally interacting agents. Structural Change and Economic Dynamics, 14(3):237–273.
Fagiolo, G., Napoletano, M., and Roventini, A. (2008). Are output growth-rate distributions fat-tailed? Some evidence from OECD countries. Journal of Applied Econometrics, 23(5):639–669.
Fagiolo, G. and Roventini, A. (2012). Macroeconomic policy in DSGE and agent-based models. Revue de l'OFCE, 124:67–116.
Fagiolo, G. and Roventini, A. (2017). Macroeconomic policy in DSGE and agent-based models redux: New developments and challenges ahead. Journal of Artificial Societies and Social Simulation, 20(1).
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., and Hutter, F. (2015). Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems, pages 2962–2970.
Franke, R. (2009). Applying the method of simulated moments to estimate a small agent-based asset pricing model. Journal of Empirical Finance, 16(5):804–815.
Franke, R. and Westerhoff, F. (2012). Structural stochastic volatility in asset pricing dynamics: Estimation and model contest. Journal of Economic Dynamics and Control, 36(8):1193–1211.
Freund, Y. (1990). Boosting a weak learning algorithm by majority. In COLT, volume 90, pages 202–216.
Freund, Y., Schapire, R. E., et al. (1996). Experiments with a new boosting algorithm. In ICML, volume 96, pages 148–156.
Gallegati, M. and Kirman, A. (2012). Reconstructing economics. Complexity Economics, 1(1):5–31.
Gilli, M. and Winker, P. (2003). A global optimization heuristic for estimating agent based models. Computational Statistics & Data Analysis, 42(3):299–312.
Goldberg, A. B., Zhu, X., Furger, A., and Xu, J.-M. (2011). OASIS: Online active semi-supervised learning. In AAAI.
Grazzini, J. (2012). Analysis of the emergent properties: Stationarity and ergodicity. Journal of Artificial Societies and Social Simulation, 15(2):7.
Grazzini, J. and Richiardi, M. (2015). Estimation of ergodic agent-based models by simulated minimum distance. Journal of Economic Dynamics and Control, 51:148–165.
Grazzini, J., Richiardi, M. G., and Tsionas, M. (2017). Bayesian estimation of agent-based models. Journal of Economic Dynamics and Control, 77:26–47.
Grimm, V. and Railsback, S. F. (2013). Individual-based Modeling and Ecology. Princeton University Press.
Guerini, M. and Moneta, A. (2016). A Method for Agent-Based Models Validation. LEM Papers Series 2016/16, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
Herlands, W., Wilson, A., Nickisch, H., Flaxman, S., Neill, D., Van Panhuis, W., and Xing, E. (2015). Scalable Gaussian processes for characterizing multidimensional change surfaces. arXiv preprint arXiv:1511.04408.
Ilachinski, A. (1997). Irreducible semi-autonomous adaptive combat (ISAAC): An artificial-life approach to land warfare. Technical report, DTIC Document.
Kukacka, J. and Barunik, J. (2016). Estimation of financial agent-based models with simulated maximum likelihood. IES Working Paper 7/2016, Charles University of Prague.
Lamperti, F. (2016). Empirical Validation of Simulated Models through the GSL-div: an Illustrative Application. LEM Papers Series 2016/18, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
Lamperti, F. (2017). An information theoretic criterion for empirical validation of simulation models. Econometrics and Statistics, forthcoming.
Lamperti, F., Dosi, G., Napoletano, M., Roventini, A., and Sapio, A. (2017). Faraway, so close: coupled climate and economic dynamics in an agent based integrated assessment model. LEM Working Papers Series, Scuola Superiore Sant'Anna.
Lamperti, F. and Mattei, C. E. (2016). Going Up and Down: Rethinking the Empirics of Growth in the Developing and Newly Industrialized World. LEM Papers Series 2016/01, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
Leal, S. J., Napoletano, M., Roventini, A., and Fagiolo, G. (2014). Rock around the clock: an agent-based model of low- and high-frequency trading. Journal of Evolutionary Economics, pages 1–28.
Lee, J.-S., Filatova, T., Ligmann-Zielinska, A., Hassani-Mahmooei, B., Stonedahl, F., Lorscheid, I., Voinov, A., Polhill, J. G., Sun, Z., and Parker, D. C. (2015). The complexities of agent-based modeling output analysis. Journal of Artificial Societies and Social Simulation, 18(4):4.
Li, X., Engelbrecht, A., and Epitropakis, M. G. (2013). Benchmark functions for CEC'2013 special session and competition on niching methods for multimodal function optimization. Technical report, RMIT University, Evolutionary Computation and Machine Learning Group, Australia.
Louppe, G., Wehenkel, L., Sutera, A., and Geurts, P. (2013). Understanding variable importances in forests of randomized trees. In Advances in Neural Information Processing Systems, pages 431–439.
Lux, T. and Marchesi, M. (2000). Volatility clustering in financial markets: a microsimulation of interacting agents. International Journal of Theoretical and Applied Finance, 3(04):675–702.
Macy, M. W. and Willer, R. (2002). From factors to actors: Computational sociology and agent-based modeling. Annual Review of Sociology, pages 143–166.
Marks, R. E. (2013). Validation and model selection: Three similarity measures compared. Complexity Economics, 2(1):41–61.
Morokoff, W. J. and Caflisch, R. E. (1994). Quasi-random sequences and their discrepancies. SIAM Journal on Scientific Computing, 15(6):1251–1279.
Moss, S. (2008). Alternative approaches to the empirical validation of agent-based models. Journal of Artificial Societies and Social Simulation, 11(1):5.
Petrovic, S., Osborne, M., and Lavrenko, V. (2011). RT to win! Predicting message propagation in Twitter. ICWSM, 11:586–589.
Platt, J. et al. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, 10(3):61–74.
Popoyan, L., Napoletano, M., and Roventini, A. (2017). Taming Macroeconomic Instability: Monetary and Macro Prudential Policy Interactions in an Agent-Based Model. Journal of Economic Behavior & Organization, 134:117–140.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
Recchioni, M. C., Tedeschi, G., and Gallegati, M. (2015). A calibration procedure for analyzing stock price dynamics in an agent-based framework. Journal of Economic Dynamics and Control, 60:1–25.
Ross, S., Gordon, G. J., and Bagnell, D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, volume 1(2), page 6.
Ryabko, D. (2016). Things Bayes can't do. In International Conference on Algorithmic Learning Theory, pages 253–260. Springer.
Salle, I. and Yildizoglu, M. (2014). Efficient Sampling and Meta-Modeling for Computational Economic Models. Computational Economics, 44(4):507–536.
Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., and Tarantola, S. (2010). Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications, 181(2):259–270.
Settles, B. (2010). Active learning literature survey. Technical Report 55-66, University of Wisconsin, Madison.
Squazzoni, F. (2010). The impact of agent-based models in the social sciences after 15 years of incursions. History of Economic Ideas, pages 197–233.
Subbotin, M. T. (1923). On the law of frequency of error. Matematicheskii Sbornik, 31(2):296–301.
ten Broeke, G., van Voorn, G., and Ligtenberg, A. (2016). Which sensitivity analysis method should I use for my agent-based model? Journal of Artificial Societies & Social Simulation, 19(1).
Tesfatsion, L. and Judd, K. L. (2006). Handbook of Computational Economics: Agent-Based Computational Economics, volume 2. Elsevier.
Thiele, J. C., Kurth, W., and Grimm, V. (2014). Facilitating parameter estimation and sensitivity analysis of agent-based models: A cookbook using NetLogo and R. Journal of Artificial Societies and Social Simulation, 17(3):11.
van der Hoog, S. (2016). Deep Learning in Agent-Based Models: A Prospectus. Technical report, Faculty of Business Administration and Economics, Bielefeld University.
Van Rijsbergen, C. (1979). Information Retrieval. London: Butterworths.
Weeks, M. (1995). Circumventing the curse of dimensionality in applied work using computer intensive methods. The Economic Journal, 105(429):520–530.
Wilson, A. G., Dann, C., and Nickisch, H. (2015). Thoughts on massively scalable Gaussian processes. arXiv preprint arXiv:1511.01870.
Winker, P., Gilli, M., and Jeleskovic, V. (2007). An objective function for simulation based inference on exchange rate data. Journal of Economic Interaction and Coordination, 2(2):125–145.
Wolpert, D. H. (2002). The supervised learning no-free-lunch theorems. In Soft Computing and Industry, pages 25–42. Springer.
Wong, K.-C. (2015). Evolutionary multimodal optimization: A short survey. arXiv preprint arXiv:1508.00457.
Zhu, X. (2005). Semi-supervised learning literature survey. Technical report, University of Wisconsin-Madison.