How to Identify Investor's types in real financial markets by means of agent based simulation
Filippo Neri
Department of Electrical and Computer Engineering, University of Naples, Naples
ProMarket 11 srl, Milan
email: [email protected]
31, 2020
Abstract
The paper proposes a computational adaptation of the principles underlying principal component analysis with agent based simulation in order to produce a novel modeling methodology for financial time series and financial markets. The goal of the proposed methodology is to find a reduced set of investor's models (agents) which is able to approximate or explain a target financial time series. As computational testbed for the study, we choose the learning system L-FABS, which combines simulated annealing with agent based simulation for approximating financial time series. We will also comment on how L-FABS's architecture could exploit parallel computation to scale when dealing with massive agent simulations. Two experimental case studies showing the efficacy of the proposed methodology are reported.

Keywords: Computing methodologies Artificial intelligence, Computing methodologies Learning paradigms, Applied computing Economics

Introduction
In this paper, we show how the principles of Principal Component Analysis (PCA) [12] can be combined with Agent based Simulation and Evolutionary Computation, resulting in a novel methodology for identifying investor's types in financial markets. The discussed methodology would discover a computational function whose independent variables are models of investor's behaviors and whose dependent variable is the target market financial time series. The computational function is implemented by means of software agents. By exploiting the PCA principles, we will show how to build subsets of investor's behaviors that are able to approximate the financial time series. Any subset of models could then be considered a simpler explanatory model for the studied market.

The learning system L-FABS [17, 18], which combines simulated annealing with an agent based simulation, will be used as experimental testbed. An additional advantage of using agent based simulation as implemented in L-FABS is that, because the execution of each agent's computational procedure is independent from the others, L-FABS can exploit parallel processing and scale relatively easily when large markets, i.e. markets made of a large number of investors, have to be analysed.

To frame the complexity of the modeling task at hand, let us consider that, because the relationship among the investors' models used by the machine learning system L-FABS is not linear, the PCA methodology as used in Statistics cannot be plainly applied. Indeed, given the iterative and intricate, yet computationally definable, interactions among the several investors that make up a financial market, it would not make any sense to try to determine simple linear correlations among them. We will return to this point below with a specific case.

A key feature that we would like to point out, and also one of the challenges of our work, consists in the use of real financial time series.
Our research will not use artificial time series generated by artificial financial markets because we believe that works studying artificially generated financial time series may either overfit the hypothetical generative model of the time series, or the artificial time series may simply fail to display important properties that can be found in real ones. That is the reason why in our work we only use publicly available and real financial time series.

When modeling data in financial time series (values of a dependent variable in Statistics' terminology) using linear regression, it is quite common to involuntarily select too many regression variables (independent variables/components) which either have poor statistical relations with the dependent variable or are strongly correlated with other independent variables occurring in the regression equation. The result is a linear regression equation (or model) with a redundancy of explanatory/independent variables.

Consider for instance the following multiple linear regression explaining the SP500 data by using a number of independent variables (such as the Gross Domestic Product, the Consumer Price Index / inflation index, etc.):

\[
\begin{aligned}
SP(\mathit{today}) = {} & \mathit{constantK} + \alpha \ast SP(\mathit{yesterday}) + \beta \ast GDP(\mathit{currentMonth}) \\
& + \gamma \ast CPI(\mathit{currentMonth}) + \delta \ast \mathit{yieldsOf1YearTreasuryBonds}(\mathit{today}) \\
& + \theta \ast \mathit{rainyDaysInNewYork}(\mathit{pastYear}) + \dots
\end{aligned}
\]

In the above regression equation, some of the variables (case 1) may have no linear regression relation with the dependent variable, as is hopefully the case for the number of rainy days in New York in the past year, or (case 2) may have some linear relation with the dependent variable but also be correlated with one or more other independent variables in the linear regression equation, as in the case of the CPI and the yield of 1 year Treasury Bonds.
For case 1, the multiple linear regression expression will contain a coefficient close to zero for the variable, thus signalling that the independent variable does not contribute to the explanation of the dependent variable (null/low explanatory power). For case 2, the use of principal component analysis will help in identifying which independent variables make the highest contribution to the prediction of the dependent variable (high explanatory power). Thus PCA can guide the researcher in removing from the regression equation those variables/components/factors that have low explanatory power. The resulting regression equation will then contain only some of the original independent variables: those with the highest explanatory power.

In the above example, it is reasonable to estimate linear relationships among the supposedly independent variables because the regression equation is based on Econometrics' theory. However, if the hypothesized model for the time series assumed the existence of non linear interactions, or if the hypothesized model were a recursive combination of time indexed variables or were procedurally defined, as in the case of agent based modeling, then it would be meaningless to test for linear correlation or for other linear relationships among the variables. Thus the whole array of statistical tests used to test for linear correlation, and based on the assumption of a normal distribution of the variable values, becomes void of utility when dealing with sophisticated and interrelated components.

The rest of the paper is organized as follows: in Section 2, we comment on how agent based modeling can be used to model financial time series; in Sections 3 and 4, we describe our methodology and how the approximation error between two time series can be measured; Sections 5 and 6 report the commented experimental analysis; Section 7 shows a comparison of L-FABS to other learning systems; and in Section 8, we draw our conclusions.
Agent based modeling as a computation tool for evaluating models of investors
By considering the structure of financial markets, it can be observed that any market as a whole is made up of the investment decisions of many individuals, the investors. If we match this structural perspective about financial markets with the observations made by researchers in the agent based modeling community, we open up the possibility of studying financial markets by using agent based computational simulation. In fact the state-of-the-art literature shows that agent based modeling is a flexible modeling methodology for simulating several types of domains [3, 4, 27, 7, 10] including consumer markets [26], economies [8] or societies [9] and financial time series [2, 11, 18]. Moreover, examples of how evolutionary computation [5, 6] and agent based modeling can be used to deal with economic tasks can be found in [1, 25]. The cited papers have been selected with the only intent to provide examples of the listed approaches and without any claim to be an exhaustive list of previous work on the topic.

A classic approach to model a financial market by using agent based simulation would be to define the behavior of a group of investors as a set of decision making algorithms that could be implemented into a computational procedure (an agent). Then a simulator could be run to make several agents so defined interact with one another in order to reproduce the particular behavior of the financial market that has to be studied. As already said, in this paper we will use a machine learning system, the Learning Financial Agent Based Simulator L-FABS [17], as the computational context where sets of investors' models can be combined to approximate a given financial time series. Because of L-FABS's modular architecture, where all the agents/investors are instantiated and run independently from one another, L-FABS would allow for an easy parallel implementation where several processors would run many agents' decision making procedures in parallel.
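The independence of the agents' decision procedures can be illustrated with a minimal Python sketch. It is not the L-FABS implementation: the function names `agent_decision` and `run_agents`, the buy/sell rule, and the `buy_probability` parameter are hypothetical and chosen only for illustration. Threads are used to keep the sketch self-contained; for CPU-bound agents a process pool or distributed workers would be the more natural choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def agent_decision(agent_id, buy_probability, seed):
    """Hypothetical agent: returns +1 (buy) or -1 (sell).

    Each agent owns a private random generator, so its decision does not
    depend on the order in which the other agents are executed.
    """
    rng = random.Random(seed * 1_000_003 + agent_id)
    return 1 if rng.random() < buy_probability else -1


def run_agents(n_agents, buy_probability, seed, parallel=True):
    """Evaluate all agents for one simulated day, in parallel or serially."""

    def task(agent_id):
        return agent_decision(agent_id, buy_probability, seed)

    if parallel:
        # pool.map preserves the order of the inputs in its results.
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(task, range(n_agents)))
    return [task(agent_id) for agent_id in range(n_agents)]


decisions = run_agents(500, 0.55, seed=7)
net_demand = sum(decisions)  # aggregate buy/sell pressure for the day
```

Because each call depends only on the agent's own inputs, the work can be split across workers without changing the result.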
Therefore L-FABS could both exploit parallel processing and scale relatively easily when massive markets, made of a large number of investors, need to be investigated. We refer the reader to [17] for a detailed description of the L-FABS architecture, which may allow the reader to obtain a better understanding of how investor's behaviors can be modeled in an agent based system to explain/predict financial time series. Different research aspects of L-FABS have also been studied in [23, 22, 21, 19, 18, 16].

For the sake of completeness, we mention the fact that alternative approaches to model investors' or customers' behavior in a variety of markets or trading situations also exist, as discussed for instance in [14, 20].

Applying the PCA principles to find a reduced set of agent based models of investors
The approach that we describe here can be used to discover simpler explanatory/predictive models of a target financial time series in terms of a computational combination of a set of investors' behaviors. As computational tool to define and manage different models of investors, we will use the system L-FABS. L-FABS is essentially an agent based simulator combined with a machine learning algorithm. The machine learning algorithm is used to discover good simulation parameters so that the simulated time series can closely approximate the target one. If this is the case, the learned computational model, which would include several agent based models, could then be thought of as a simplified computational representation of the financial market that has generated the target financial time series. By applying the PCA principles to the investors/agent models learned by L-FABS, a reduced set of investors' models may be found, with respect to the input one, while still maintaining a high explanatory power for the target time series. Here is how the PCA principles are adapted and employed in our methodology:

a) start with a hypothetical and possibly redundant set Orig of investor's behavior models and run Orig in L-FABS. The output agent based model produced by Orig and its approximation error will act as the benchmark model and error.
b) measure the explanatory power of each of the models in Orig.
c) add to an initially empty set Reduced the individual model with the highest explanatory power in Orig, that is, with the lowest approximation error. Remove the selected model from Orig.
d) repeat step c) until the approximation error of the set Reduced, when run in L-FABS, is better than or close enough to the approximation error of model Orig as built in step a).

The informed reader will have recognized in the above algorithm a classic hill climbing procedure as described in any artificial intelligence textbook. Of course we are aware of the risk of getting stuck on local maxima when using a simple hill climbing method, but the selection of the best optimization method is not the focus of the research discussed here. Here we are indeed making the point, and empirically showing, that PCA principles can be successfully combined with agent based simulation to discover sets of investors' models. We leave to future work the investigation of whether and how using a better optimization method could improve the composition of the discovered reduced set of investors' models. This future investigation would also open the door to exploring connections with meta-learning studies that have appeared in the machine learning community; see [24] as an example reference.

Measuring the explanatory power of models
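Before detailing the error measure, the selection loop of steps a) to d) from the previous section can be sketched in Python. This is a minimal sketch under stated assumptions, not the L-FABS code: `evaluate` stands in for a full L-FABS run returning an approximation error, `tolerance` is a hypothetical way of expressing "close enough", and the toy error function with its numbers is invented purely to exercise the loop (it is additive, which a real agent based simulation is not).

```python
def select_reduced_set(orig, evaluate, tolerance):
    """Greedy forward selection of investor models (steps a to d).

    orig      : list of model labels (hypothetical names)
    evaluate  : callable mapping a list of models to an approximation error;
                an assumed stand-in for running the models in L-FABS
    tolerance : assumed margin defining "close enough" to the benchmark
    """
    benchmark = evaluate(list(orig))                    # step a
    individual = {m: evaluate([m]) for m in orig}       # step b
    remaining = sorted(orig, key=individual.get)        # best model first
    reduced, error = [], float("inf")
    while remaining and error > benchmark + tolerance:  # step d
        reduced.append(remaining.pop(0))                # step c
        error = evaluate(reduced)
    return reduced, error, benchmark


# Toy, invented error model: each active model removes a fixed share of error.
TOY_GAIN = {"banks": 30, "govt": 15, "funds": 2, "individual": 1}


def toy_evaluate(models):
    return 50 - sum(TOY_GAIN[m] for m in models)


reduced, error, benchmark = select_reduced_set(
    ["individual", "funds", "banks", "govt"], toy_evaluate, tolerance=4
)
```

With the toy numbers the loop stops after two models, since adding the second brings the error within the tolerance of the benchmark. Note that this sketch ranks the models once, using their individual errors; re-ranking the remaining models after each addition would be an alternative reading of step c).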
In order to measure the explanatory power of a set of investors' behaviors, first we will codify them into a set of agents that can be run in L-FABS, then we will run L-FABS with the objective of approximating a target financial time series and, finally, we will measure the Mean Absolute Percentage Error (MAPE) of the predicted time series with respect to the target one [18].

We selected the Mean Absolute Percentage Error, or MAPE, function to measure the approximation error between two time series because it is commonly used in Statistics when two data samplings have to be compared:

\[
\mathrm{MAPE}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{x_i - y_i}{x_i} \right|
\]

Given two time series X and Y, the lower the value of MAPE, the closer the two are. Thus the lower the MAPE value, the higher the explanatory power of the agent based model (set of investor's behaviors) run in L-FABS.

Experimental analysis
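For reference in the experiments that follow, the MAPE measure of the previous section can be computed with a few lines of Python. This is a minimal sketch: the function name `mape` and the `percent` flag are assumptions introduced here, the latter because the result tables are labelled "MAPE %", which suggests the values are reported multiplied by 100.

```python
def mape(x, y, percent=False):
    """Mean Absolute Percentage Error of series y with respect to target x.

    x, y    : equally long sequences of numbers; x must contain no zeros
    percent : if True, return the value multiplied by 100 (assumed to match
              the "MAPE %" convention of the result tables)
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs((xi - yi) / xi) for xi, yi in zip(x, y))
    value = total / len(x)
    return 100.0 * value if percent else value


# Each prediction is off by 10% of the target value.
example = mape([100, 200], [110, 180])
```

The error is normalised by the target value `x_i`, so the measure is not symmetric in its two arguments.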
The financial time series selected for our experiments consists of a period of the SP500 index from 3 Jan 2008 to 20 Aug 2010. As usual when working with learning systems, we will train L-FABS on a part of the dataset, the learning set, and then we will use the remaining part of the dataset as test set to assess the performance of the learned model. We therefore divided the original period into a learning set, SP500 data from 3 Jan 2008 to 31 Dec 2008, and a test set, SP500 data from 2 Jan 2009 to 20 Aug 2010.

Also, the interested reader may note that we will run L-FABS configured in the partial knowledge (PK) operating modality. In the PK modality, only the starting point of the time series, t=0, is given to L-FABS in order to initialize the simulation. In this configuration, the time series model in L-FABS will move from one predicted value of the time series to the next without knowing/using the correct value of the time series at time t-1 in order to estimate a value for time t. We selected the PK scenario because the predicted time series will not make use of any other information apart from the value of the target time series at time 0 and the information coded and expressed by the set of models of investors' behaviors.

In the experimental settings, we will explore two different sets of agent based models of investor's behavior, denoted as Configuration A and Configuration B. To keep things simple, we will use four types of investors (Financial Agents) to capture the variety of investment decisions and the variety of sizes of financial transactions that occur in real financial markets. According to each investor's type, many agents (Financial Agents) are then created in the simulation with similar but not identical behavior. The four types of investors that we model can be thought of as: individual investors (and the like), banks (and the like), hedge funds (and the like), and central banks (and the like). They differ in terms of the size of the assets they can invest in financial markets and in their risk/reward appetite. In addition, their numerical presence is also different. As already said, two different configurations of investor types (or two different sets of agent based models) have been studied; we will identify them as Configuration A (Table 1) and Configuration B (Table 2).

Note on central banks: a superficial objection to our choice to include Central Banks among the actors influencing financial markets can be easily and firmly rejected considering how, in recent decades, Central Banks have acted to 'pump up' financial markets by adding large quantities of liquidity to the related faltering real economies. Thus Central Banks, who have always acted in the background, have finally lost their image of neutral agents with respect to the financial systems.

Table 1: Configuration A of Investor types

Investor type         Total Assets per investor type (in millions)    Number of investors
Individual            0.1                                             150
Funds                 100                                             100
Banks                 1000                                            245
Govt/Central Banks    10000                                           5

Table 2: Configuration B of Investor types

Investor type         Total Assets per investor type (in millions)    Number of investors
Individual            0.1                                             150
Funds                 100                                             100
Banks                 1000                                            245
Govt/Central Banks    100000                                          5

Case Configuration A

Let us start by observing the performance of L-FABS when all the investors' models are used: as can be seen in fig. 1a, L-FABS is able to output a predicted time series that is very close to the real one. The corresponding MAPE errors for all configurations are reported in Table 3.

If we disable all the models for the investors, the output of L-FABS becomes, as expected, a constant value equal to the value of the target time series at time 0, causing a very high MAPE error, as reported in fig. 1b.
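The partial knowledge modality, and the constant output obtained when all investor models are disabled, can be illustrated with a small Python sketch. It is only an illustration under strong assumptions: the price update rule (the price moves in proportion to the agents' net demand), the names `pk_rollout` and `always_buy`, the `impact` parameter and the starting value 100.0 are hypothetical and not taken from L-FABS. What the sketch shows is that the simulation is seeded with the value at t=0 and afterwards feeds forward only its own predictions.

```python
def pk_rollout(x0, n_steps, agents, impact=0.001):
    """Free-running simulation seeded only with the target value at t=0.

    agents : list of callables (t, last_predicted_price) -> net order (+/-),
             hypothetical stand-ins for investor models
    impact : assumed sensitivity of the price to one unit of net demand
    """
    series = [x0]
    for t in range(1, n_steps):
        previous = series[-1]  # a predicted value, never the true one
        net_demand = sum(agent(t, previous) for agent in agents)
        series.append(previous * (1.0 + impact * net_demand))
    return series


def always_buy(t, price):
    """Toy investor model that buys one unit every day."""
    return 1


baseline = pk_rollout(100.0, 5, agents=[])           # all models disabled
rising = pk_rollout(100.0, 5, agents=[always_buy])   # one toy model active
```

With no active agents the net demand is always zero, so the series stays at its initial value, which mirrors the constant baseline described above.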
Let us consider what happens when only one type of investor can act in approximating the financial time series, figs. 1c, 1d, 1e, and 1f. As can be seen from the graphs, each type of investor has an explanatory power ranging from high to low when it comes to modeling the target time series. If we also look at the MAPE values, we can observe that just by using the Institutional Investors/Banks type of investors' model we can achieve an optimal approximation of the target time series. Thus all the other investors' types are redundant in this case.

Applying the adapted principal component methodology, described in Section 3, to model selection in this case is then trivial: in fact it will reduce the original set of investor's types by selecting the model associated with Institutional Investors/Banks.
Case Configuration B
Let us then observe the results obtained when L-FABS is run in the Configuration B case. Again starting with all the investors' models, in fig. 2a, L-FABS is able to output a predicted time series that is very close to the target one. The corresponding MAPE errors for all configurations are reported in Table 4. When only one type of investor is used, for example see figs. 2b and 2c, each type of investor displays its own explanatory power, ranging from high to low, when it comes to modeling the target time series. The behavior obtained by using only the models for Retail/Private Investors or Hedge Funds is the same as observed in Case Configuration A and so is not reported, given their limited explanatory power. Instead we report in fig. 2d what happens if both of them are used to predict the target time series. The combined approximation error is better than their separate ones but still very far from the benchmark MAPE value of the original set of models.

Table 3: MAPE values for Case Configuration A. The reported MAPE values are averaged on 10 runs. To keep the table readable, we report the standard deviation only for the lowest and closest values of MAPE.

case    MAPE
a       3.21 ±

Figure 1: Comparison of the actual and predicted time series obtained by L-FABS under the experimental settings Configuration A as measured on the testing set. Panels (a) to (f).

Figure 2: Comparison of the actual and predicted time series obtained by L-FABS under the experimental settings Configuration B. Panels (a) to (e).

Also in Configuration B, and this is a significant difference with respect to Configuration A, none of the models for the various investors' types, when taken individually, is able to approximate the target time series very well. If we apply the adapted principal component methodology to this case, then we will start by selecting the model for the Govt. type of investors plus the model for Institutional Investors/Banks as first candidates for a reduced set of models of types of investors' behaviors. We will thereafter run L-FABS with the two models active, and we will obtain the graph in fig. 2e and the corresponding MAPE value shown in Table 4. As can be seen, the approximation result is very good and the adapted principal component methodology would stop here, providing as output the reduced set of models: { Govt. investors, Institutional Investors/Banks }.

Commentary on the experiments
Note that in the reported set of experiments, we have only explored how thechange in the balance of available assets among the types of investors, seethe table below, can alter the composition of the reduced set of investors’models.
Investor type         Total Assets in Configuration A (in millions)    Total Assets in Configuration B (in millions)
Individual            15                                               15
Funds                 10000                                            10000
Banks                 245000                                           245000
Govt/Central Banks    50000                                            500000
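The totals in the table above are consistent with multiplying the per-investor assets by the number of investors listed in Tables 1 and 2. The short Python check below makes that arithmetic explicit. The dictionary layout and the function names `total_assets` and `asset_shares` are illustrative assumptions, and the third column of Tables 1 and 2 is read here as the number of investors of each type.

```python
# (assets per investor in millions, number of investors), from Tables 1 and 2
CONFIG_A = {
    "Individual": (0.1, 150),
    "Funds": (100, 100),
    "Banks": (1000, 245),
    "Govt/Central Banks": (10000, 5),
}
# Configuration B differs only in the assets of Govt/Central Banks.
CONFIG_B = {**CONFIG_A, "Govt/Central Banks": (100000, 5)}


def total_assets(config):
    """Total assets per investor type, in millions."""
    # round() removes floating point noise such as 0.1 * 150 = 15.000000000000002
    return {name: round(assets * count, 6) for name, (assets, count) in config.items()}


def asset_shares(config):
    """Fraction of all available assets held by each investor type."""
    totals = total_assets(config)
    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}


totals_a = total_assets(CONFIG_A)
totals_b = total_assets(CONFIG_B)
shares_a = asset_shares(CONFIG_A)
shares_b = asset_shares(CONFIG_B)
```

Under this reading, Banks hold roughly 80% of the total assets in Configuration A, while in Configuration B Govt/Central Banks and Banks hold roughly 66% and 32% respectively.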
The Total Assets columns show that in the Configuration A case one type of investor has the majority of the available assets to invest, whereas in the Configuration B case the balance of available assets among investors has been changed and now there are two types of investors with similar investment capacity. However, the adapted principal component methodology described here is not limited to the evaluation of only quantitative changes in a model's parameters. In fact we could use the same methodology of model simplification to explore algorithmic differences in the decision making process of the investors. Thus we could explore how different ways to participate in the market could result in more complex or simpler models of the time series. This point will be the object of future research.

Experimental comparison of L-FABS to other systems
Even though a comparison of L-FABS to other learning systems is beyond the scope of the paper, we also briefly report on its performance with respect to that obtained by other learning algorithms for which enough implementation details have been reported in the literature. Results are taken from [18] and we refer to the cited paper for more information on the empirical evaluation of L-FABS on several other time series. In Table 5, we compare L-FABS, a Particle Swarm Optimization algorithm (PSO) [13], and a Multi-Layer Perceptron (MLP) [28] when operating on time series from the SP500 and DJIA respectively. These results have been reported with the aim of locating L-FABS with respect to other learning systems representative of the state of the art and without any intention of laying claim to the superiority of one system over the others.

Table 5: Experimental results, averaged over 10 runs, for Dataset DJIA and Dataset SP500.

Dataset    Day to predict    PSO MAPE %    MLP MAPE %    L-FABS MAPE %
SP500      1                 0.66          1.00          0.714 ±

Conclusion
We proposed a computational adaptation of the principles underlying Principal Component Analysis and we implemented them in the agent based simulator L-FABS. We have also pointed out how easily an agent based simulation could allow for a parallel implementation and thus scale when markets made of a large number of investors are to be studied. The methodology we propose can be applied to the task of finding a reduced set of models of investor's behaviors used to approximate a target financial time series and its generative market. The reported methodology can thus be used to evaluate the explanatory power of a set of investor's models with respect to a given market, which also allows one to artificially reproduce the behavior of the target market, in terms of its generated financial time series, by using L-FABS. We believe that the use of simple behavioral models, such as those found by L-FABS, can allow for a better understanding of the underlying and usually hidden mechanisms that result in a macro behavior like those captured by a market index. Two case studies have been employed to show the efficacy of the proposed methodology in two instances of real financial time series. As future research, we plan to explore how to use agent based modeling to find the simplest explanatory models in new domains, such as when tackling control problems [15].
References

[1] B. Alexandrova-Kabadjova, A. Krause, and E. Tsang. An agent-based model of interactions in the payment card market. In Intelligent Data Engineering and Automated Learning - IDEAL 2007, pages 1063–1072. Springer, 2007.
[2] W. B. Arthur, J. H. Holland, B. LeBaron, R. Palmer, and P. Tayler. Asset pricing under endogenous expectation in an artificial stock market. In The Economy as an Evolving Complex System II, pages 15–44. Santa Fe Institute Studies in the Sciences of Complexity Lecture Notes, 1997.
[3] R. Axelrod. Agent-based modeling as a bridge between disciplines. In K. L. Judd and L. Tesfatsion, editors, Handbook of Computational Economics, Vol. 2: Agent-Based Computational Economics, Handbooks in Economics Series, pages 1562–1583. North-Holland, Amsterdam, The Netherlands, 2006.
[4] C. Bruun. The economy as an agent-based whole simulating Schumpeterian dynamics. Industry and Innovation, 10, 2003.
[5] M. Camilleri and F. Neri. Parameter optimization in decision tree learning by using simple genetic algorithms. WSEAS Transactions on Computers, 13:582–591, 2014.
[6] M. Camilleri, F. Neri, and M. Papoutsidakis. An algorithmic approach to parameter selection in machine learning using meta-optimization techniques. WSEAS Transactions on Systems, 13(1):203–212, 2014.
[7] S. Chen. Agent-based artificial markets in the AI-ECON research center: A retrospect from 1995 to the present. Systems, Control and Information, 46(9):23–30, 2002.
[8] M. A. H. Dempster, T. W. Payne, Y. Romahi, and G. W. P. Thompson. Computational learning techniques for intraday FX trading using popular technical indicators. IEEE Transactions on Neural Networks, 12(4):744–754, 2001.
[9] J. M. Epstein and R. Axtell. Growing artificial societies: social science from the bottom up. The Brookings Institution, Washington, DC, USA, 1996.
[10] I. García-Margariño, I. Plaza, and F. Neri. ABS-MindBurnout: An agent-based simulator of the effects of mindfulness-based interventions on job burnout. Journal of Computational Science, 36, 2019.
[11] A. O. I. Hoffmann, S. A. Delre, J. H. von Eije, and W. Jager. Artificial multi-agent stock markets: Simple strategies, complex outcomes. In Advances in Artificial Economics, volume 584 of Lecture Notes in Economics and Mathematical Systems, pages 167–176. Springer-Verlag, 2006.
[12] I. Jolliffe. Principal Component Analysis. Springer Verlag, 1986.
[13] J. Kennedy and R. Eberhart. Particle swarm optimization. In Int. Conf. on Neural Networks, pages 1942–1948. IEEE Press, 1995.
[14] B. Lariviere and D. Van den Poel. Banking behaviour after the life-cycle event of moving in together: An exploratory study of the role of marketing investments. European Journal of Operational Research, 183(1):345–369, 2007.
[15] A. Marino and F. Neri. PID tuning with neural networks. Intelligent Information and Database Systems, LNCS 11431:476–487, 2019.
[16] F. Neri. Learning and predicting financial time series by combining natural computation and agent simulation. Lecture Notes in Computer Science, 6625:111–119, 2011.
[17] F. Neri. Agent-based modeling under partial and full knowledge learning settings to simulate financial markets. AI Communications, 25(4):295–304, 2012.
[18] F. Neri. A comparative study of a financial agent based simulator across learning scenarios. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7103:86–97, 2012.
[19] F. Neri. Learning predictive models for financial time series by using agent based simulations. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7190:202–221, 2012.
[20] F. Neri. Open research issues on computational techniques for financial applications. WSEAS Transactions on Systems, 13(1):390–391, 2014.
[21] F. Neri. Case study on modeling the silver and NASDAQ financial time series with simulated annealing. Advances in Intelligent Systems and Computing, 746:755–763, 2018.
[22] F. Neri. Combining machine learning and agent based modeling for gold price prediction. Artificial Life and Evolutionary Computation, Communications in Computer and Information Science 900:91–100, 2019.
[23] F. Neri and I. Margarino. Simulating and modeling the DAX index and the USO ETF financial time series by using a simple agent-based learning architecture. Expert Systems, 37(4), 2020.
[24] M. Reif, F. Shafait, and A. Dengel. Meta-learning for evolutionary parameter optimization of classifiers. Machine Learning, 87(3):357–380, 2012.
[25] S. G. Rodríguez, D. Quintana, I. M. Galván, and P. Isasi. Portfolio optimization using SPEA2 with resampling. In Intelligent Data Engineering and Automated Learning - IDEAL 2011: 12th International Conference, LNCS 6936, pages 127–134. Springer, 2011.
[26] S. Schulenburg, P. Ross, and S. Bridge. An adaptive agent based economic model. In Learning Classifier Systems: From Foundations to Applications, volume 1813 of Lecture Notes in Artificial Intelligence, pages 265–284. Springer-Verlag, 2000.
[27] L. Tesfatsion. Agent-based computational economics: Growing economies from the bottom up. Artif. Life, 8(1):55–82, 2002.
[28] J. Zirilli.