Risk-Seeking versus Risk-Avoiding Investments in Noisy Periodic Environments
J. Emeterio Navarro Barrientos, Frank E. Walter, Frank Schweitzer
Institute for Informatics, Humboldt University, 10099 Berlin, Germany
Chair of Systems Design, ETH Zurich, Kreuzplatz 5, 8032 Zurich, Switzerland

Abstract

We study the performance of various agent strategies in an artificial investment scenario. Agents are equipped with a budget, x(t), and at each time step invest a particular fraction, q(t), of their budget. The return on investment (RoI), r(t), is characterized by a periodic function with different types and levels of noise. Risk-avoiding agents choose their fraction q(t) proportional to the expected positive RoI, while risk-seeking agents always choose a maximum value q_max if they predict the RoI to be positive ("everything on red"). In addition to these different strategies, agents have different capabilities to predict the future r(t), dependent on their internal complexity. Here, we compare 'zero-intelligent' agents using technical analysis (such as moving least squares) with agents using reinforcement learning or genetic algorithms to predict r(t). The performance of agents is measured by their average budget growth after a certain number of time steps. We present results of extensive computer simulations, which show that, for our given artificial environment, (i) the risk-seeking strategy outperforms the risk-avoiding one, and (ii) the genetic algorithm was able to find this optimal strategy itself, and thus outperforms the other prediction approaches considered.

keywords: risk, investment strategies, genetic algorithm

PACS Nos.: 05.40.-a, 89.65.Gh

1 Introduction

In the course of this paper, we investigate a model in which agents with different strategies participate in a simple investment scenario with noisy returns [28]. We use this setup to approach the question of how the (internal) complexity of agents enhances their performance in a hard-to-predict environment. In the field of artificial intelligence and complex systems, one can distinguish between two types of agents: first, agents which only react to external changes (also known as "zero-intelligence agents" [9, 11]) and, second, agents which have a complex internal architecture (e.g. "belief-desire-intention agents"). Despite these clear differences in agent architecture, it is difficult to determine what influence these properties have on the overall performance of the agents. In order to study this question in a controlled environment, we have chosen an investment model with noisy returns, to compare the performance of simple and complex agents. To what extent does the agents' internal complexity pay off in such a setting?

In our model, each agent is equipped with a budget x(t) and is able to invest a certain fraction of its budget on a market. The gain or loss it makes depends on the market return, or return on investment (RoI). In other words, at each time step t, the agent adjusts its risk propensity, the fraction of its budget that it is willing to invest on the market, denoted by q(t), thereby controlling gains and losses resulting from the RoI, denoted by r(t). We assume that only the past and current values of r(t) are known to the agent; it does not know the dynamics governing future values of r(t). Agents observe the market through the value of r(t) and, based on analysing a set of past r(t) values, they predict future r(t) values and determine their behaviour on that market through specifying q(t).

In this simple model, we consider agents that invest independently in the market, i.e.
there is no interaction or communication with other agents. Also, there is no feedback of the investments made by agents on the market return. In other words, the environment of the agents is not influenced by their investments. This is a crucial assumption which makes our model different from other attempts to model real market dynamics, e.g. for financial markets [18, 20, 32]. Consequently, we do not construct and investigate a market model; rather, our focus lies on investigating what are good and what are bad strategies, in a rather artificial and controlled market environment (see also Section 4). Regarding the relevance of our results for real financial markets, see also our comments in the concluding Section 7.

The essence of the model is captured in Figure 1: (a) plots the returns in percent of a real stock over a range of about two years. This illustrates the range and shape of the returns of a real-world stock. (b) illustrates the dynamics of the model: r(t), the market return, influences the strategies agents have to adjust q(t), the risk propensity. In this particular model, we do not consider the influence that the adjusted risk propensity has on the market return, i.e. the influence of q(t) on r(t).

Figure 1: (a) Real returns of a stock (in this example, the stock was AAA, the Altana Group) in percent over time (01/05/2002 – 01/03/2004). r(t) was computed for a current price p(t) as follows: r(t) = log p(t) − log p(t−1). (b) The dynamics of the market as indicated by the market return r(t) and the strategy as defined by the invested budget q(t). Note that our model does not consider the feedback of investments on the market dynamics (dashed line).
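For illustration, the log-returns used in Figure 1(a), r(t) = log p(t) − log p(t−1), can be computed from a price series as follows. This is our own minimal sketch, not part of the original study; the price values are hypothetical:

```python
import numpy as np

def log_returns(prices):
    """Log-returns r(t) = log p(t) - log p(t-1), as in the caption of Figure 1(a)."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

# Hypothetical daily closing prices; the result is printed in percent.
prices = [100.0, 101.5, 99.8, 102.3]
print(100 * log_returns(prices))  # -> approx. [ 1.49, -1.69,  2.47 ]
```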
The challenge for the agents thus is twofold: first, agents have to predict r(t) as accurately as possible, and second, they have to adjust q(t) appropriately at a particular time. In the theory of risk, several authors assume that individuals choose among assets based on the mean return and on the variance of the return [17, 21, 22]. Others have focused their attention on the important task of how to measure risk, which has led to different types of measures [1, 31]. In general, these measures are based on the risk aversion of a decision maker having the choice to receive a random or a non-random amount.

A typical scenario to study investment strategies is to let an agent choose between investing in a risk-free asset or in a risky asset [31, 38]. It was shown [38] that sometimes it may be more reasonable to invest in a risk-free asset as a means to transfer wealth over time. However, assuming a model with no consumption [13], those agents investing in risk-free assets will be driven out of the market in the long run by agents investing in a risky asset. When dealing with risky assets, it is typically assumed that the agent considers the expected return and its volatility as indicators for the investment strategy [21].

For the sake of simplicity, in this paper we assume that the agent's behavior is risk-neutral in the sense that the agent estimates only the expected return, r(t), and does not consider risk measures such as the volatility. Based on the estimation of r(t), the decision to increase or decrease the investment fraction of the risky asset is taken. Hence, the two terms 'risk-seeking' and 'risk-avoiding' refer only to the choice of the investment fraction, q(t).

2 The Investment Model

Agents in our model are characterized by two properties:

1. their budget x(t), which is a measure of their "wealth" or "liquidity", and
2. the strategy that they employ in order to control the fraction of the budget q(t) to invest at each time step.

In other words, at each time step t, an agent invests a portion q(t) x(t) of its budget. The investment yields a gain or a loss, determined by the value of r(t). Being a fraction of the investment budget, q(t) is, of course, restricted to the interval [0, 1]. However, in our model we further restrict it to the interval [q_min, q_max], where we choose a small positive q_min and q_max = 1.0. This implies that, at each time step t, there is a minimal investment of a small fraction of the budget, and a maximal investment of the entire budget.

We can then define the dynamics for the budget of agents x(t) as

    x(t+1) = x(t) [1 + q(t) r(t)]    (1)

where r(t) is the market return at time step t. The market return function r(t) is restricted to the range [−1, 1]. A value of r(t) = −1 corresponds to a total loss of the invested fraction of the budget q(t), and r(t) = 1 corresponds to a gain equivalent to the invested fraction. Thus, an agent can, at any time step t, lose its complete budget (for q(t) = 1 and r(t) = −1), but also double its budget (for q(t) = 1 and r(t) = 1). In principle, there is no upper boundary for r(t); the bound r(t) ≤ 1 was chosen to obtain a mean of zero for r(t), which allows us to better understand the basic dynamics of this model. We emphasize again that, in our model, the aim is not a most realistic simulation of the market return, but a comparison of different agent strategies. The difficulty for the agents lies in properly predicting the next value of r(t) and then adjusting q(t) accordingly.

Restricting r(t) to the range [−1, 1] is not a realistic assumption for a real market. There, some r(t) will also fall outside this range; these are rare, extreme events that occur, e.g. in cases of stock market bubbles and crashes. Normally, however, returns will be in the range [−1, 1], e.g. as the ones of the stock depicted in Figure 1. As we would like to focus on the questions of choosing appropriate agent strategies in environments with noisy, periodic returns, it is reasonable to exclude such rare, extreme events and assume a restriction of r(t) to the range [−1, 1].
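As a minimal illustration of eq. (1) (our own sketch, not part of the original study), the following Python fragment applies the budget update to a sequence of uncorrelated returns; it also previews the multiplicative decay under a constant investment fraction that is discussed in Sections 3.1 and 4:

```python
import numpy as np

def update_budget(x, q, r):
    """One step of eq. (1): x(t+1) = x(t) * (1 + q(t) * r(t))."""
    return x * (1.0 + q * r)

# Example: a constant fraction q = 0.5 under zero-mean uniform returns.
rng = np.random.default_rng(0)
x = 1.0
for t in range(10_000):
    r = rng.uniform(-1.0, 1.0)       # r(t) restricted to [-1, 1]
    x = update_budget(x, 0.5, r)
print(x)  # typically decays towards 0: a multiplicative stochastic process
```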
In the next two sections, we outline the agent strategies (Section 3) and the return on investment (Section 4) that we consider.

3 Agent Strategies

As explained before, we are interested in how the market dynamics, r(t), affect the different investment strategies of the agents, q(t). It is very important to realise that the market dynamics, while affecting each agent's q(t), are not known to the agents. That is, at time t, each agent only receives the actual value of the return on investment (RoI) and adjusts its risk propensity accordingly, without having complete knowledge about the dynamics of r(t). The agent may, of course, have some bounded memory of past RoI that can be used for predictions of future RoI. However, the agent has to gather information about the ups and downs of the RoI and to draw its own conclusions from this information by itself. Therefore, the agent will perform better in the environment if it is able to guess the market dynamics.

In the following, we present a selection of strategies that can be applied by agents. We distinguish a reference strategy, which serves as a frame of reference to compare and evaluate the performance of other strategies, as well as technical analysis-based and machine learning-based strategies. Usually (there are exceptions, as will be discussed in the following), a strategy consists of two components: a prediction component and an action component. For such strategies, the prediction component predicts a variable in the system (in this case, the next value of r(t)) and the action component then defines an action upon the prediction of the variable (in this case, it defines the appropriate value for q(t)).

3.1 Reference Strategy

In order to compare different strategies, we need a point of reference against which the performance of each strategy can be measured. The reference strategy that we use is the simplest strategy possible, i.e. an agent always assumes a constant risk-propensity value q_0 at every time step t:

    q(t) = q_0 = const.    (2)

Since q(t) is always fixed, this is not really a "strategy", but it plays a role in more physics-inspired investment models [28, 34, 37]. We use this strategy to compare it with more complex strategies. Note that this reference strategy requires no knowledge of the RoI.

3.2 Technical Analysis

The following simple strategies for risk adjustment are based on "technical analysis" [3]. Technical analysis tries to deduce information about the dynamics of r(t) by looking at trends (averages, variances, higher-order moments) of the RoI values over a range of time. This assumes that an agent has a bounded memory of size M to record previous RoI; this information is then processed in different ways to predict the next RoI. In the following, we consider two strategies from the field of technical analysis: the first strategy is based on calculating moving averages (MA) of previous RoI, while the second strategy uses moving least squares (MLS) on previous RoI, r(t), over a fixed period of time, M. Both of them can be regarded as "zero-intelligence" strategies, as agents do not do any reasoning or learning.

3.2.1 Moving Averages

The moving averages technique computes \hat{r}_{MA}(t), an estimate of the next r(t), as the average of the previous M values of r(t):

    \hat{r}_{MA}(t) = \frac{1}{M} \sum_{n=t-M}^{t-1} r(n)    (3)

3.2.2 Moving Least Squares

The moving least squares technique fits a function to the data of the previous M values of r(t) to estimate the next r(t). In our case, we choose this function to be a linear trend line, which is found by minimising the distance to the data points of r(t). Based on the previous M values of r(t), the squared estimation error ε_r is defined as:

    \epsilon_r(t) = \frac{1}{M} \sum_{n=t-M+1}^{t} \left[ r(n) - \hat{r}_{MLS}(n) \right]^2    (4)

where \hat{r}_{MLS}(t) is the predicted RoI based on the linear regression trend line, defined as:

    \hat{r}_{MLS}(t') = m(t)\, t' + b(t) \quad \text{for} \quad t - M \le t' \le t    (5)

The slope m and intercept b are obtained by minimising the squared estimation error, eq. (4). From ∂ε_r/∂m = 0 and ∂ε_r/∂b = 0, we get, as is well known:

    m(t) = \frac{ M \sum_{n=t-M+1}^{t} n\, r(n) - \left( \sum_{n=t-M+1}^{t} n \right) \left( \sum_{n=t-M+1}^{t} r(n) \right) }{ M \sum_{n=t-M+1}^{t} n^2 - \left( \sum_{n=t-M+1}^{t} n \right)^2 }    (6)

    b(t) = \frac{1}{M} \left[ \sum_{n=t-M+1}^{t} r(n) - m(t) \sum_{n=t-M+1}^{t} n \right]    (7)

These two strategies use different approaches to estimate future r(t); it remains to define the corresponding adjustment of the risk propensity. Here, we consider two possibilities: first, a risk-seeking (RS) and, second, a risk-avoiding (RA) approach. In the risk-seeking approach, the value of q_RS(t) is defined as follows for \hat{r}(t) ∈ {\hat{r}_{MA}(t), \hat{r}_{MLS}(t)}, i.e. for \hat{r}(t) being an MA or MLS estimate of r(t):

    q_{RS}(t) = \begin{cases} q_{min} & \hat{r}(t) \le 0 \\ q_{max} & \hat{r}(t) > 0 \end{cases}    (8)

where q_min, q_max ∈ [0, 1] and q_min < q_max. In other words, agents invest q_min if the next value of r(t) is predicted to be negative or zero, and agents invest q_max if the next value of r(t) is predicted to be positive.

In the risk-avoiding approach, the value of q_RA(t) is defined as follows for \hat{r}(t) ∈ {\hat{r}_{MA}(t), \hat{r}_{MLS}(t)}:

    q_{RA}(t) = \begin{cases} q_{min} & \hat{r}(t) \le q_{min} \\ \hat{r}(t) & q_{min} < \hat{r}(t) < q_{max} \\ q_{max} & \hat{r}(t) \ge q_{max} \end{cases}    (9)

where q_min, q_max ∈ [0, 1] and q_min < q_max. Here, the respective q(t) is set to the predicted r(t), with appropriate adjustments to ensure that q(t) = q_min whenever \hat{r}(t) ≤ q_min and q(t) = q_max whenever \hat{r}(t) ≥ q_max.
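To make the technical-analysis strategies concrete, here is a minimal Python sketch of the two predictors, eqs. (3)-(7), and of the two action rules, eqs. (8) and (9). It is our own illustration under the definitions above; the function names and the local reindexing of the fit window are ours:

```python
import numpy as np

def predict_ma(history, M):
    """Moving average, eq. (3): mean of the last M returns."""
    return float(np.mean(history[-M:]))

def predict_mls(history, M):
    """Moving least squares, eqs. (4)-(7): fit a linear trend line to the
    last M returns and extrapolate it one step ahead."""
    window = np.asarray(history[-M:])
    n = np.arange(M)
    m, b = np.polyfit(n, window, 1)   # least-squares slope and intercept
    return m * M + b                  # trend-line value at the next time step

def q_risk_seeking(r_hat, q_min, q_max):
    """Eq. (8): all-or-nothing, depending on the sign of the prediction."""
    return q_max if r_hat > 0 else q_min

def q_risk_avoiding(r_hat, q_min, q_max):
    """Eq. (9): invest the predicted return, clipped to [q_min, q_max]."""
    return float(np.clip(r_hat, q_min, q_max))
```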
3.3 Machine Learning

The following strategies are based on techniques from the field of machine learning: the first uses an incremental update rule from reinforcement learning, the second a genetic algorithm.

3.3.1 Reinforcement Learning: Incremental Update Rule

A basic estimation technique from reinforcement learning is the incremental update rule:

    NewEst ← OldEst + StepSize [Target − OldEst]    (10)

where OldEst and NewEst are the old and new estimates for the quantity of interest. Thus, Target − OldEst gives the error of the current estimation, which is weighted by the factor StepSize. That is, a new estimate is computed by taking the old estimate and adjusting it by the error of the current estimate. NewEst has to be updated at each time step. Applying eq. (10) to our model, we find the following instance of the incremental update rule:

    \hat{r}_{IUR}(t+1) = \hat{r}_{IUR}(t) + \gamma \left[ r(t) - \hat{r}_{IUR}(t) \right]    (11)

Consequently, OldEst and NewEst are the old and new estimates for the return, \hat{r}_{IUR}(t) and \hat{r}_{IUR}(t+1); furthermore, r(t) − \hat{r}_{IUR}(t) is the error of the current estimate. Because of its recursive definition, the incremental update rule considers an infinite history of returns; of course, the weight of a value depends on its age, and its impact fades over time. We chose \hat{r}_{IUR}(0) = 0 as the initial value of \hat{r}_{IUR}(t). Different values of γ lead to different performance of the algorithm: for small γ, the adjustment of the estimate will be small, and for large γ, the adjustment of the estimate will be large. It is important to choose an optimal value for γ in order to be able to compare the algorithm with other algorithms; in Section 5, we discuss this in more detail.

Finally, it remains to specify what action to take given a particular estimate for the next return; we again define a risk-seeking and a risk-avoiding approach, similar to eqs. (8) and (9) for the MA and MLS strategies. In the risk-seeking approach, q_RS(t) is defined as follows for \hat{r}_{IUR}(t):

    q_{RS}(t) = \begin{cases} q_{min} & \hat{r}_{IUR}(t) \le 0 \\ q_{max} & \hat{r}_{IUR}(t) > 0 \end{cases}    (12)

In the risk-avoiding approach, q_RA(t) is defined as follows for \hat{r}_{IUR}(t):

    q_{RA}(t) = \begin{cases} q_{min} & \hat{r}_{IUR}(t) \le q_{min} \\ \hat{r}_{IUR}(t) & q_{min} < \hat{r}_{IUR}(t) < q_{max} \\ q_{max} & \hat{r}_{IUR}(t) \ge q_{max} \end{cases}    (13)

where, for both definitions, q_min, q_max ∈ [0, 1] and q_min < q_max.

It is important to note that reinforcement learning and the incremental update rule are not identical; rather, reinforcement learning describes a group of machine learning approaches, and the incremental update rule is one instance of these approaches.

We note eventually that a different representation of γ in eq. (11) could be used to study some aspects of the prospect theory of decision-making. This theory takes into account that decisions are made based on changes from a certain reference point, i.e. humans, for example, decide differently for profits and for losses, as they perceive losses as roughly twice as large as profits [6, 14, 15, 36]. This, however, is not the target of the present investigations.
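A minimal sketch of the incremental update rule, eq. (11), with the initial value \hat{r}_{IUR}(0) = 0 chosen above (our own illustration):

```python
def iur_estimates(returns, gamma):
    """Incremental update rule, eq. (11):
    r_hat(t+1) = r_hat(t) + gamma * (r(t) - r_hat(t))."""
    r_hat = 0.0                  # initial estimate, r_hat_IUR(0) = 0
    estimates = []
    for r in returns:
        r_hat += gamma * (r - r_hat)
        estimates.append(r_hat)
    return estimates

# Example: the estimate tracks a constant signal with exponentially fading memory.
print(iur_estimates([1.0, 1.0, 1.0, 1.0], gamma=0.5))  # -> [0.5, 0.75, 0.875, 0.9375]
```

This is exponential smoothing: the weight of a past return decays geometrically with its age, which is the fading impact described above.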
3.3.2 Genetic Algorithm

Genetic algorithms (GA) are a technique from the field of artificial intelligence which finds approximate solutions to problems; they belong to the class of evolutionary algorithms. Genetic algorithms are based on modelling solutions to a problem as a population of chromosomes; the chromosomes are candidate solutions to the problem which gradually evolve to better solutions. The following is a description of the instance of a genetic algorithm which we apply to our scenario.

Let j = 1, ..., C index the chromosomes of a population of size C. Each chromosome j is an array of genes, g_jk (k = 0, ..., G−1). The values of the genes are real numbers [24]. In our model, each chromosome j represents a set of possible strategies of an agent, so g_jk refers to possible values for the risk propensity q.

In the beginning, each g_jk is assigned a random value: g_jk ∈ (q_min, q_max). Each chromosome j is then evaluated by a fitness function, f_j(τ), which is defined as follows:

    f_j(\tau) = \sum_{k=0}^{G-1} r(t)\, g_{jk} \,; \quad k \equiv t \bmod G    (14)

In our model, the fitness is determined by the gain/loss that each strategy g_jk yields depending on the RoI, r(t); this is why we consider the product r(t) g_jk. Since the fitness of a chromosome is to be maximised, negative r(t) lead to very small values of g_jk, i.e. a low risk propensity, whereas positive r(t) lead to larger values of g_jk. Note that the g_jk are always multiplied by different r(t) values, i.e. depending on t. For the chromosome, we define a further time scale τ in terms of generations. A generation is completed after each g_jk has been multiplied by a RoI from consecutive time steps, t. This means that the index k refers to a particular time t in the following manner: k ≡ t mod G, i.e. k = t̂ ∈ {0, ..., G−1}, with t = t̂ + τG, τ = 0, 1, 2, ....

After each generation τ, the population of chromosomes is replaced by a new population of better-fitting chromosomes with the same population size C. This new population is determined in the following manner: after calculating the fitness of each chromosome according to eq. (14), we find the best chromosomes from the old population by applying elitist and tournament selection of size two:

• Elitist selection considers the best s percent of the population, which is found by ranking the chromosomes according to their fitness. The best chromosomes are directly transferred to the new population.

• Tournament selection is done by randomly choosing two pairs of two chromosomes from the old population and then selecting from each pair the one with the higher fitness. These two chromosomes are not simply transferred to the new population, but undergo a transformation based on the genetic operators crossover and mutation, as follows: a single-point crossover operator finds the cross point, or cut point, in the two chromosomes beyond which the genetic material from the two parents is exchanged, to form two new chromosomes. This cut point is the integer part of a random number drawn from a uniform distribution p_c ∈ U(1, G).

After the crossover, a mutation operator is applied to each gene of the newly formed chromosomes. With a given mutation probability p_m, a gene is mutated by replacing its value with a random number from a uniform distribution U(q_min, q_max). After the cycle of selection, crossover and mutation is completed, we eventually arrive at a new population of chromosomes that consists of a percentage of the best-fitted chromosomes from the old population plus a number of new chromosomes that ensure further possibilities for the evolution of the set of strategies.

Given the optimised population of chromosomes representing a set of possible strategies, the agent still needs to update its actual risk propensity, q(t). This works as follows: at the beginning of each generation, the agent takes the set of strategies g_jk from the chromosome j with the highest fitness in the previous generation. Given G = T, this means that, for each time step of the upcoming cyclic change, the agent chooses the appropriate risk propensity by computing the following:

    q_{GA}(t) = g_{jk} \quad \text{with} \quad j = \arg\max_{j=1,...,C} f_j \,; \quad k \equiv t \bmod G    (15)
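The evolutionary loop can be sketched compactly as follows. This is our own illustration of eqs. (14)-(15), not the authors' code; the parameter values in the usage comments are placeholders, not the tuned values of Table 1:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(pop, roi_window):
    """Eq. (14): f_j = sum_k r(t) * g_jk, one RoI value per gene."""
    return pop @ roi_window

def evolve(pop, roi_window, s=0.1, p_m=0.05, q_min=0.01, q_max=1.0):
    """One generation: elitism, tournament selection, crossover, mutation."""
    C, G = pop.shape
    f = fitness(pop, roi_window)
    order = np.argsort(f)[::-1]                        # rank by decreasing fitness
    n_elite = max(1, int(s * C))
    new_pop = [pop[i].copy() for i in order[:n_elite]]  # elitist selection
    while len(new_pop) < C:
        # tournament selection of size two: two pairs, keep the fitter of each
        parents = []
        for _ in range(2):
            a, b = rng.integers(0, C, size=2)
            parents.append(pop[a] if f[a] >= f[b] else pop[b])
        # single-point crossover at a random cut point in [1, G-1]
        cut = rng.integers(1, G)
        child1 = np.concatenate([parents[0][:cut], parents[1][cut:]])
        child2 = np.concatenate([parents[1][:cut], parents[0][cut:]])
        for child in (child1, child2):
            # per-gene mutation with probability p_m
            mask = rng.random(G) < p_m
            child[mask] = rng.uniform(0.01, 1.0, size=mask.sum())
            new_pop.append(child)
    return np.array(new_pop[:C])

# Usage (hypothetical): initial population of C chromosomes with G genes,
# then one evolution step per window of G consecutive returns r(t):
# pop = rng.uniform(0.01, 1.0, size=(1000, 100))
# pop = evolve(pop, roi_window)
```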
4 Return on Investment

We now specify the market return, r(t), which is independent of q(t). In some of the models previously studied in the literature [16], the influence of the market is simply treated as random, i.e. r(t) is a random number drawn from a uniform distribution in the interval [−1, 1].

However, it is known that for returns with a uniform distribution centered around the origin, and agents which do not have any information on future market returns, this situation will lead the agents to a complete loss of their budget. This is a well-known property of multiplicative stochastic processes [28]. The only way to make a profit on the return is by having a certain knowledge of future returns of the market. This requires both that there exist correlations in the market return function, and that the agents are able to resolve and use those correlations to make correct predictions. Consequently, we choose to introduce correlations in our market return r(t) in the form of a seasonal or periodic signal.

We study two different market return functions that depend on a noise level σ_{1,2}; for σ_{1,2} = 0, they correspond to a pure sine wave function with frequency ω, and for σ_{1,2} = 1, they are completely uncorrelated:

    r_{Phase}(t) = \sin(\omega t + \sigma_1 \pi \xi)    (16)

    r_{Amplitude}(t) = (1 - \sigma_2) \sin(\omega t) + \sigma_2 \xi    (17)

where ξ is distributed uniformly in the interval [−1, 1], i.e. ξ ∈ U(−1, 1). There are two types of noise that can occur with such a sine wave function: noise on the phase and noise on the amplitude. We consider both cases: the first function can be seen as a periodic market return signal with phase noise (determined by σ_1), the second as a periodic market return signal with amplitude noise (determined by σ_2). In our simulations, we chose the arbitrary value of T = 100 for the period of the sine wave. Fig. 2 shows plots of these two kinds of return functions with different noise levels. Note that periodic returns with a periodicity changing over time have been investigated recently as well [27].

The noise parameter σ_{1,2} gives us a way of controlling the noise in the RoI, thereby allowing us to evaluate the various strategies for different scenarios, ranging from a completely clear signal with no noise at all (for σ_{1,2} = 0) to a noise-only signal (for σ_{1,2} = 1).

Figure 2: Plots of the return functions r(t), eqs. (16) and (17), for: (a) two different phase noise levels σ_1; (b) two different amplitude noise levels σ_2.

Agents have no knowledge about future market returns; of course, they do not know the functions that determine r(t). Thus, the only way for agents to maximize their gain and minimize their losses is by making a correct prediction of the future market return and choosing the appropriate investment action. Conceptually, we can separate an agent's strategy into two components: a prediction component and an action component. The prediction algorithm estimates the future values of the market return, and the action algorithm determines the best action based on the predicted results.

We study the performance of the different algorithms or strategies that are explained in Section 3. We define the performance of an agent employing a particular strategy as the average growth of the budget x(t) of the agent after a certain number of time steps. We choose to take this average over t = T = 100 time steps, i.e. over one period of the RoI.

Figure 3: Probability distribution of the RoI, r(t), eqs. (16) and (17), for: (a) phase noise σ_1 ∈ {0.1, 0.5, 0.9} and (b) amplitude noise σ_2 ∈ {0.1, 0.5, 0.9}.

For the probability distributions of r(t) with phase noise, we see that there is a higher probability for values close to −1 and 1, and a lower probability for values close to 0. Note that this is the same distribution that is found for a sine wave with no noise at all. For phase noise, the value of σ_1 has no effect: the distributions are virtually identical.
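The two return functions, eqs. (16) and (17), are straightforward to generate. The following is our own minimal sketch; we fix the frequency via the period as ω = 2π/T, which is an assumption consistent with T = 100 above:

```python
import numpy as np

T = 100                      # period of the sine wave, as in the text
OMEGA = 2 * np.pi / T        # corresponding angular frequency (our assumption)
rng = np.random.default_rng(1)

def r_phase(t, sigma1):
    """Eq. (16): periodic return with phase noise."""
    xi = rng.uniform(-1.0, 1.0, size=np.shape(t))
    return np.sin(OMEGA * np.asarray(t) + sigma1 * np.pi * xi)

def r_amplitude(t, sigma2):
    """Eq. (17): periodic return with amplitude noise."""
    xi = rng.uniform(-1.0, 1.0, size=np.shape(t))
    return (1.0 - sigma2) * np.sin(OMEGA * np.asarray(t)) + sigma2 * xi

t = np.arange(5 * T)                     # five periods
returns = r_amplitude(t, sigma2=0.5)
```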
For the probability distributions of r(t) with amplitude noise, we observe a distribution which is a combination of the probability distribution of a sine wave without noise (caused by the sine wave) and a uniform probability distribution (caused by the noise). For higher levels of noise, this convolution of probability distributions more closely resembles the uniform distribution, and for lower levels of noise, it more closely resembles the sine wave distribution. For amplitude noise, the value of σ_2 is therefore crucial, and different values lead to different distributions. Note that the distribution of the returns for phase noise is independent of the level of noise, whereas for the RoI with amplitude noise there is a significant change as the level of noise is increased.

Since the distribution of the RoI is independent of the noise level σ_1 for phase noise, it is expected that the average absolute RoI, Fig. 4, is constant with respect to σ_1. This, as we have explained, is not the case for the noise level σ_2 for amplitude noise, where the average absolute RoI varies with σ_2. For σ_1 = 0 and σ_2 = 0, the average absolute values of the RoI are equal. Roughly, the average absolute value of the RoI with amplitude noise decreases for σ_2 < 0.5 and increases for σ_2 ≥ 0.5. This matches the observations for the probability distributions: there, for σ_2 = 0.5, the values are concentrated around r(t) = 0, leading to a smaller ⟨|r(t)|⟩, and for σ_2 = 0.1 and σ_2 = 0.9, the values are less concentrated around r(t) = 0, leading to a larger ⟨|r(t)|⟩.

Figure 4: Average absolute value of the RoI, ⟨|r(t)|⟩, as a function of the noise level, for phase noise (σ_2 = 0; red, top line) and for amplitude noise (σ_1 = 0; black, bottom line).

The average absolute RoI is of importance because it is known from multiplicative stochastic processes [28, 34] that, for a constant investment q(t) = q_0, the better-performing constant strategies are the ones that invest the least possible amount. In our model, the agents are forced to invest at least the minimum amount q_min. Since q(t) is multiplied by r(t) in eq. (1), a change in the average absolute value of r(t) has an impact similar to a change in q_0 in the multiplicative stochastic processes studied in [28]. This leads to changes in performance that are not necessarily related to the performance of the agents' strategies, and it should be taken into account when interpreting the results.

Another relevant property is the dependence of the sign of r(t+1) on the sign of r(t). In Fig. 5 we show the distribution of the correlations of the RoI with respect to two consecutive returns, r(t) r(t+1). We can clearly see that for low levels of noise there is a greater correlation between consecutive values. As the noise increases, this correlation diminishes until, finally, for high levels of noise, the returns are completely uncorrelated.

Figure 5: Distribution of the correlations of the RoI in time, r(t) r(t+1), for: (a) phase noise with σ_1 ∈ {0.1, 0.5, 0.9} and (b) amplitude noise with σ_2 ∈ {0.1, 0.5, 0.9}.

Most of the algorithms studied are sensitive to correlations in consecutive RoI with the same sign. We notice that the correlations of the returns with phase noise and with amplitude noise do not vary with the noise level in exactly the same manner. In particular, at intermediate noise levels, the returns with amplitude noise still show more correlation than those with phase noise. This difference can account for some discrepancies seen between the performance of the agents for the two types of market return functions.
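Both diagnostics are easy to compute from a sample of the return series. A minimal sketch (our own illustration, reusing the generators and the period T from the previous listing):

```python
import numpy as np

def mean_abs_roi(r):
    """Average absolute RoI, <|r(t)|>, as plotted in Fig. 4."""
    return float(np.mean(np.abs(r)))

def lag1_products(r):
    """Products of consecutive returns, r(t) * r(t+1), whose distribution is
    shown in Fig. 5; positive values mean consecutive returns of equal sign."""
    r = np.asarray(r)
    return r[:-1] * r[1:]

r = r_amplitude(np.arange(100 * T), sigma2=0.5)   # from the previous sketch
print(mean_abs_roi(r))                            # average absolute RoI
print(np.mean(lag1_products(r) > 0))              # fraction of same-sign pairs
```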
5 Optimal Parameter Adjustment

In the previous sections, we have defined several different strategies that can be applied by agents to determine when to invest which amount of money. In the next section, we want to compare their performance in a periodic environment. However, in order to make this comparison meaningful, we have to ensure that we have adjusted the different parameters of the strategies properly. Only if the strategies perform at their optimum can they really be compared.

The procedure that we apply to adjust the optimal parameters is straightforward: we compare the performance, averaged over periods, of each of the algorithms for a range of possible parameters and then choose the optimal one. At this point, it remains to define the notion of optimality: we have already defined that we measure the performance of agents as the average of their budget growth over a certain number of time steps. The optimal strategy is the strategy that performs better than all the other strategies, i.e. the strategy that, on average, leads to the greatest budget growth. Of course, for the measurement, each agent has to be provided with enough time to gather the information necessary for the proper calibration of the algorithm that it applies.

For the MA, MLS, and IUR strategies, there is only one parameter that requires adjustment: either the memory size M (in the case of MA and MLS) or the step size γ (in the case of IUR). This implies that for these strategies, it is possible to choose the optimal value of the parameter by comparing the average budget ⟨x(t)⟩ for several possible values of the parameter, and then take the one which gives the best results. For MA and MLS, we considered a range of memory sizes starting from M = 1, and for IUR, we considered step sizes γ ∈ [0, 1].

For the GA, there are several parameters to adjust: the population size C, the crossover probability p_c, the mutation probability p_m, and the elitism size s. Consequently, the process of finding the optimal combination of values for the parameters is not as trivial as for the other strategies. The +CARPS (Multiagent System for Configuring Algorithms in Real Problem Solving) tool [25, 26] was used for this step. This application uses autonomous, distributed, cooperative agents that search for solutions to a configuration problem, thereby fine-tuning the meta-heuristic's parameters. The agents in +CARPS apply a random-restart hill-climbing approach and exchange their so-far best solutions to the problem in the process. The intervals of definition, i.e. the intervals in which the most acceptable GA configurations should lie, were set as a discrete set of candidate population sizes C and subintervals of [0, 1] for p_c, p_m, and s.

Table 1: Optimal Strategy Parameters

Algorithm | Parameters
----------|-----------------------------------------------
MA        | M = 5 (risk-seeking), M = 2 (risk-avoiding)
MLS       | M = 25 (both risk-seeking and risk-avoiding)
IUR       | γ = 0.… (both risk-seeking and risk-avoiding)
GA        | C = 1000, p_c = 0.…, p_m = 0.…, s = 0.…

Table 1 shows the optimal parameters that we chose for the comparison of the different strategies.
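For the single-parameter strategies, the sweep described above amounts to a simple grid search. A minimal sketch (our own illustration; the callable passed in would wrap a full simulation of eq. (1) under the given strategy, which we do not reproduce here):

```python
import numpy as np

def best_parameter(run_simulation, candidates, trials=100):
    """Grid search: return the parameter value whose average budget growth,
    estimated over `trials` independent runs, is greatest."""
    scores = [np.mean([run_simulation(p) for _ in range(trials)])
              for p in candidates]
    return candidates[int(np.argmax(scores))]

# Hypothetical usage: `simulate_mls(M)` would run the MLS strategy with
# memory size M and return the average budget growth.
# best_M = best_parameter(simulate_mls, candidates=list(range(1, 51)))
# best_gamma = best_parameter(simulate_iur, candidates=np.linspace(0.01, 1.0, 100))
```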
Of course, the optimal parameters usually are not the same for different types and levels of noise or for risk-seeking and risk-avoiding behaviour, so at times, a compromise between several alternative values for different situations had to be found.

6 Results

In this section, we compare all strategies presented in this article for RoI with periodicity T = 100 and different noise levels for both phase and amplitude noise. In our comparison, we consider a set of agents, each one using one of the following strategies: Q0, eq. (2); MA, eq. (3); MLS, eq. (6); IUR, eq. (11); and GA, eq. (15).

In our comparison, we make two assumptions: first, all agents receive the same RoI at a particular time, i.e. the fact that some agents win or lose more than others is influenced only by their different strategies to determine the correct risk-propensity value; second, all agents use the optimal parameter values of their respective strategies. Let us state again that only the past and current values of r(t) are known to the agents; they do not know the dynamics governing future values of r(t).

We run N = 100 trials of the same experiment, i.e. RoI with the same parameters, where at each end of a cycle of the RoI, i.e. for all t such that t mod T ≡ 0, an average budget is obtained for each agent over the 100 trials. This is done for a large number of time steps. We vary the amplitude noise values, σ_2 ∈ (0, 1), while leaving the phase noise value constant, σ_1 = 0, and we vary the phase noise values, σ_1 ∈ (0, 1), while leaving the amplitude noise value constant, σ_2 = 0. For the simulations which distinguish between a risk-seeking and a risk-avoiding action upon a prediction of the RoI, we compute the average budget for both approaches. This gives us four variants of the simulations: amplitude noise/risk-seeking, phase noise/risk-seeking, amplitude noise/risk-avoiding, and phase noise/risk-avoiding.
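This experimental protocol can be summarised in a short harness. The following is our own sketch; the predictors, action rules and return generators are assumed from the previous listings, and the parameter values in the usage comment are placeholders:

```python
import numpy as np

def run_trial(strategy, returns, x0=1.0):
    """One trial: apply eq. (1) along a given return series.
    `strategy(history)` maps the past returns to the next q(t)."""
    x, history = x0, []
    for r in returns:
        q = strategy(history)      # agent sees only past values of r(t)
        x *= 1.0 + q * r           # eq. (1)
        history.append(r)
    return x

def average_budget(strategy, make_returns, n_trials=100):
    """Average final budget over n_trials independent return realisations,
    as in the comparison of Section 6."""
    return np.mean([run_trial(strategy, make_returns()) for _ in range(n_trials)])

# Hypothetical usage, combining the earlier sketches (placeholder q_min/q_max):
# ma_rs = lambda h: q_risk_seeking(predict_ma(h, 5), 0.01, 1.0) if len(h) >= 5 else 0.01
# print(average_budget(ma_rs, lambda: r_phase(np.arange(10 * T), 0.1)))
```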
6.1 Comparison

Fig. 6 shows the result of the simulations by plotting the average budget resulting from the different strategies against the noise level for each of the four variants of the simulations (amplitude noise/risk-seeking in (a), phase noise/risk-seeking in (b), amplitude noise/risk-avoiding in (c), and phase noise/risk-avoiding in (d)).

For all variants of the simulations, the constant-risk strategy is the worst strategy. The constant-risk strategy always puts a constant proportion of the budget at stake. This money is won when the return is positive, but also lost when the return is negative; even though ⟨r(t)⟩ = 0, this leads to a loss in budget over time, which is a well-known property of multiplicative stochastic processes.

Furthermore, for all strategies, the average budget decreases with increasing noise. This is the expected behaviour: with increasing noise, the accuracy of the predictions made by the agents decreases, and thus they cannot necessarily choose the appropriate risk propensity in the action.

There are no significant crossovers of the performance of different strategies. In general, this implies that if a strategy s_1 performs better than a strategy s_2 for a given noise level σ_a (either on the phase or on the amplitude), s_1 can be expected to perform better than s_2 for a different noise level σ_b. Consequently, the choice of strategy is independent of the noise in the return: a good strategy is a good strategy for all noise levels, and a bad strategy is a bad strategy for all noise levels, too. However, for low noise levels, the GA is slightly outperformed by the other strategies; this is due to the intrinsic stochastic nature of the algorithm. For the same reason, this algorithm performs better for high noise levels. Note that the experiments in these simulations are run over a large number of time steps.

Figure 6: Average budget ⟨x⟩ over N = 100 trials, for agents using strategies Q0, MA, MLS, GA, IUR and SW (square wave, to be introduced in Section 6.2). Agents use optimal parameter values for RoI with periodicity T = 100 and different noise levels: (a) different amplitude noise values, σ_2 ∈ (0, 1), and no phase noise, σ_1 = 0, with a risk-seeking strategy; (b) different phase noise values, σ_1 ∈ (0, 1), and no amplitude noise, σ_2 = 0, with a risk-seeking strategy; (c) different amplitude noise values, σ_2 ∈ (0, 1), and no phase noise, σ_1 = 0, with a risk-avoiding strategy; and (d) different phase noise values, σ_1 ∈ (0, 1), and no amplitude noise, σ_2 = 0, with a risk-avoiding strategy.

6.2 The Square Wave Strategy

We further examine the risk propensity q(t) as chosen by the GA for r(t) over time, in order to analyse why the GA performs so well. Fig. 7 plots the values of r(t) and the corresponding q(t) as chosen by the GA against time t for different noises and from different times t_n on. From the graph, it is visible that the behaviour of the GA resembles a square wave function, which is a special case of a ramp-rectangle function.

Figure 7: Values of the return r(t) and the risk propensity q(t) as chosen by the GA for RoI with different types of noise and at different times during the simulation: (a) amplitude noise (σ_1 = 0) at an earlier time t_n; (b) amplitude noise (σ_1 = 0) at a later time t_n; (c) phase noise (σ_2 = 0) at an earlier time t_n; (d) phase noise (σ_2 = 0) at a later time t_n.

More generally, a ramp-rectangle function for the risk propensity can be defined as:

    q(t+1) = \begin{cases}
      \left( \frac{q_{max} - q_{min}}{h_1} \right) \hat{t} + q_{min} & \hat{t} \in (0, h_1) \\
      q_{max} & \hat{t} \in [h_1, h_2] \\
      -\left( \frac{q_{max} - q_{min}}{h_3 - h_2} \right) (\hat{t} - h_2) + q_{max} & \hat{t} \in (h_2, h_3) \\
      q_{min} & \hat{t} \in [h_3, h_4]
    \end{cases}    (18)

In this function, h_1 (h_3) sets the transition from an increasing (decreasing) ramp function to a rectangle function, and h_2 (h_4) sets the transition from a rectangle function to a decreasing (increasing) ramp function. Moreover, for each time step t, the following congruence is used: t̂ ≡ t mod h_4; this maps each time step t ∈ (0, ∞) to a time step in the ramp-rectangle function, t̂ ∈ (0, h_4).

Furthermore, we assume that the differences between the time steps when an agent increases and decreases its risk propensity are symmetric. This means that the time difference Δh between when the ramp function starts and stops to increase or decrease can be expressed as follows:

    \Delta h = h_1 = h_3 - h_2    (19)

which, for Δh = 1, means that agents use a Square Wave (SW) strategy.
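A sketch of the ramp-rectangle risk propensity, eq. (18), and its square-wave limit (our own illustration; the q_min and q_max values below are placeholders):

```python
def ramp_rectangle_q(t, h1, h2, h3, h4, q_min, q_max):
    """Eq. (18): ramp-rectangle risk propensity with transition points h1..h4."""
    t_hat = t % h4                       # maps t onto one cycle, as in the text
    if 0 < t_hat < h1:                   # increasing ramp
        return (q_max - q_min) / h1 * t_hat + q_min
    if h1 <= t_hat <= h2:                # rectangle at q_max
        return q_max
    if h2 < t_hat < h3:                  # decreasing ramp
        return -(q_max - q_min) / (h3 - h2) * (t_hat - h2) + q_max
    return q_min                         # rectangle at q_min

# Square wave (SW) limit, eq. (19): delta_h = h1 = h3 - h2 = 1, cycle length T.
T = 100
q_sw = [ramp_rectangle_q(t, 1, T // 2, T // 2 + 1, T, 0.01, 1.0) for t in range(T)]
```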
We are particularly interested in this case of the ramp-rectangle function: it implies that an agent invests q_max for time steps t̂ ∈ (0, T/2) and invests q_min for time steps t̂ ∈ [T/2, T]. This is the optimal strategy.

The GA approaches the optimal strategy: for all different noises, the risk propensity q(t) chosen by the GA approximates the one that would have been chosen by SW. Considering that the GA does not have an 'a priori' behaviour defined, it is interesting to realise that it finds the optimal strategy (investing the maximum when, at a particular time t in the period, the probability of winning is higher than that of losing, and vice versa) on its own.

Fig. 7 illustrates this behaviour. It plots the values of r(t) and the corresponding q(t) as chosen by the genetic algorithm against time t for different noises and from different times t_n on. From this, it is clearly visible that the behaviour of the GA is very similar to the behaviour of the SW, which is the optimal strategy. Comparing Fig. 7 (a) with (b) and Fig. 7 (c) with (d), i.e. the same scenario but at different times t_n = 10,000 and t_n = 100,000, one can see that the q(t) chosen by the GA approximates the SW strategy more closely as the evolution progresses.

To see why the SW strategy is optimal, consider r(t) to be periodic with a period of T and, for the moment, assume that there is no noise, i.e. σ_1 = 0 as well as σ_2 = 0. Then, the optimal strategy is to invest the complete budget, q_max, during [0, T/2) and to invest nothing, q_min, during [T/2, T). This is because it is certain (we assumed that there is no noise) that during the first half of the period, [0, T/2), the value of r(t) will be positive, and that during the second half of the period, [T/2, T), the value of r(t) will be negative. No matter what the precise values of r(t) are, once they are positive, this leads to a gain, and thus q(t) should be as large as possible to maximise the gain; conversely, once the values of r(t) are negative, this leads to a loss, and thus q(t) should be as small as possible, or zero, to minimise the loss. In other words, for determining q(t), it is not the magnitude of the expected return that matters, but whether the probability of the expected return being positive is greater than the probability of it being negative. This explains why the risk-seeking behaviour outperforms the risk-avoiding behaviour for periodic returns with no noise. The behaviour of this strategy is shown in Fig. 8 (a).

For periodic returns with noise, i.e. σ_1 ≠ 0 or σ_2 ≠ 0, the situation is quite similar. Depending on the values of σ_1 and σ_2, there will be two intervals [0 + ε, T/2 − ε) and [T/2 + ε, T − ε) such that during [0 + ε, T/2 − ε) the value of r(t) will, on average, be positive, and such that during [T/2 + ε, T − ε) the value of r(t) will, on average, be negative, see Fig. 8 (b). In these intervals, the optimal strategy would again be to invest the complete budget, q_max, and to invest nothing, q_min, respectively. The value of ε, of course, depends on σ_1 and σ_2: the more noise, the greater ε. What still has to be considered are the intervals [0, ε), [T/2 − ε, T/2 + ε), and [T − ε, T). Because of the noise, it is not possible to determine the exact sign of r(t) during these intervals.

However, it is still possible to say that, on average, the probability of r(t) being positive is greater than the probability of r(t) being negative during [0, ε) and [T/2 − ε, T/2), and the probability of r(t) being negative is greater than the probability of r(t) being positive during [T/2, T/2 + ε) and [T − ε, T). Consequently, it makes sense to invest during [0, ε) and [T/2 − ε, T/2), and not to invest during [T/2, T/2 + ε) and [T − ε, T).

Figure 8: Intervals of certainty and uncertainty: (a) shows r(t) with no noise and the corresponding q(t) of the square wave (SW) strategy plotted against t, and (b) shows r(t) with noise and the different intervals for which different conclusions about the sign of the return can be drawn: (2) and (5) are the intervals in which the sign of r(t) is certain to be positive or negative, respectively, and (1), (3), (4), and (6) are the intervals in which the sign of r(t) is uncertain.

With such behaviour, there will, however, be situations in which an agent invests the complete budget but the return is negative. In this type of situation, |r(t)| depends on σ_1 and σ_2: for low levels of noise, it will be small, too. Consequently, for low levels of noise, the product q(t) r(t) will be a small value, which signifies, for r(t) < 0, a small loss and, for r(t) > 0, a small gain. Thus, even for q(t) = 1, the loss is bounded by a proportion of the budget corresponding to the value of r(t). This explains why the risk-seeking behaviour outperforms the risk-avoiding behaviour for low levels of noise. For high levels of noise, the product q(t) r(t) need not be a small value, which potentially signifies, for r(t) < 0, a large loss and, for r(t) > 0, a large gain. Thus, an agent could potentially lose a significant amount of its budget if it invests the complete budget; this is the reason why, for high levels of noise, the risk-avoiding behaviour outperforms the risk-seeking behaviour.

This also provides a straightforward explanation of why different algorithms using the same rule to determine q(t) perform differently. Even though the best strategy is still to invest the maximum when there is a slightly higher probability that r(t) > 0 than that r(t) ≤ 0, the algorithms differ in how accurately they estimate the probabilities of r(t) > 0 and of r(t) ≤ 0 from the past values of r(t).