Minority games played by arbitrageurs on the energy market

Abstract

Along with the energy transition, the energy markets change their organization toward more decentralized and self-organized structures, striving for locally optimal profits. These tendencies may endanger the physical grid stability. One realistic risk is the exhaustion of reserve energy through its abuse by arbitrageurs. We map the energy market to different versions of a minority game and determine the expected amount of arbitrage as well as its fluctuations as a function of the model parameters. Of particular interest are the impact of heterogeneous contributions of arbitrageurs, the interplay between external stochastic events and nonlinear price functions of reserve power, and the effect of risk aversion due to suspected penalties. The non-monotonic dependence of arbitrage on the control parameters reveals an underlying phase transition that is the counterpart to replica symmetry breaking in spin glasses. Based on our results, we propose economic and statutory measures to counteract the detrimental effect of arbitrage.



Tim Ritmeester and Hildegard Meyer-Ortmanns

Physics and Earth Sciences, Jacobs University Bremen, 28759 Bremen, Germany. Email: [email protected]


Keywords:

Arbitrage, Energy Markets, Statistical Physics, Agent-based Modeling

1. Introduction

Along with the energy transition, not only is the physical power grid being reorganized toward a more decentralized structure, but energy markets also try to act in a more self-organized way and on spatially local platforms (local in contrast to country-wide), each striving for its own optimum of profit. Occasionally, these efforts to make maximal profit in a given local trading area of the grid may endanger the global physical grid stability. Supply and demand of power need to be balanced to ensure the stability of the power transmission grid. Any entity that wishes to trade on the European energy market is obliged to be managed by a Balancing Responsible Party (BRP), which has a legal obligation to match the supply and demand of power in its portfolio to the best of its abilities [1]. However, deviations from this delicate balance are inevitable due to unforeseen contingencies, fluctuating renewable power sources such as wind and solar energy, and fluctuations in power consumption. The Transmission System Operators (TSOs) are responsible for the stability of the power grid itself; they counteract these deviations by activating reserve power. This can be either positive reserve power (in case of a lack of supply) or negative reserve power (in case of a lack of demand). The reserve power has a cost, which is paid by the BRPs in proportion to the imbalance in their portfolios. Of course, only a limited amount of reserve power is available (∼ […]). […] The activation of reserve power is determined by the so-called merit order: the cheapest power is activated first, and the larger the imbalance, the more expensive the power that has to be activated. The merit order is essential to prevent arbitrage from becoming a runaway process.

From the point of view of the BRPs, the merit order limits the arbitrage opportunities: the more BRPs engage in this behaviour, the higher the cost of the reserve power, until eventually the possibility for arbitrage disappears. Even with the merit order in place, this leaves room for some amount of arbitrage, which uses up precious reserve power. Here, the incentives for the BRPs amount to a so-called 'minority mechanism': performing arbitrage is only advantageous for a BRP if not many other BRPs behave in the same way. Since a BRP only finds out what the reserve energy price is after making its decision (trade on the intraday market stops 15 minutes before the actual delivery of the power), this minority mechanism leads to uncertainty about who actually contributes to buying reserve power and what the reserve power price will be. To profit, the BRPs therefore have to anti-coordinate with the average behaviour: predict the total arbitrage performed by the other BRPs and anti-align their own behaviour with the aggregated behaviour of the others (that is, refrain from arbitrage if many others engage in it, and perform arbitrage otherwise). The BRPs, our agents, thus have to learn the behaviour of the other agents and adjust their own behaviour accordingly.

(Footnote on reserve power: In reality, different types of reserve power are distinguished. Relevant for us are secondary and tertiary reserve power, of which a total of around 2500 MW and 1500 MW, respectively, are available in Germany and Austria. Both have different start-up times, and whether (and when) tertiary reserve power gets activated depends on the expected duration of the disturbance. In our model we consider a single reserve market, which represents the combination of these two types of reserve power.)

Preprint submitted to Elsevier, December 11, 2020 (arXiv, econ.TH).
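The merit-order rule ("cheapest first, more expensive as the imbalance grows") can be illustrated with a short sketch. It assumes hypothetical reserve offers given as (capacity, price) pairs and, for simplicity, marginal pricing, i.e. the most expensive offer that still has to be activated sets the reserve price; real settlement rules may differ. Only one direction (positive reserve, non-negative imbalance) is modelled, and the function name `merit_order_price` is ours.

```python
from bisect import bisect_left
from itertools import accumulate


def merit_order_price(bids, imbalance):
    """Reserve price for a non-negative imbalance under a merit order.

    bids: hypothetical (capacity_MW, price_EUR_per_MWh) offers.
    Offers are activated cheapest first; we assume the price is set by the
    most expensive offer that still has to be activated (marginal pricing).
    """
    if imbalance < 0:
        raise ValueError("this sketch covers positive reserve only")
    ordered = sorted(bids, key=lambda bid: bid[1])        # merit order: cheapest first
    cumulative = list(accumulate(cap for cap, _ in ordered))
    if imbalance > cumulative[-1]:
        raise ValueError("imbalance exceeds the available reserve power")
    k = bisect_left(cumulative, imbalance)                # first offer that covers it
    return ordered[k][1]


bids = [(100, 80.0), (200, 40.0), (150, 120.0)]
print([merit_order_price(bids, x) for x in (150, 250, 400)])  # [40.0, 80.0, 120.0]
```

The resulting step function is non-decreasing in the imbalance, which is the property the arbitrage argument relies on: each additional arbitrageur can only push the reserve price up.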
This is not straightforward, as there is no common 'best strategy' that all of them could follow to achieve this: if all agents used the same strategy, all of them would come to the same decision, and the strategy would invalidate itself. Anti-coordination thus requires the agents to reach some degree of heterogeneity in their decisions. If they managed to anti-coordinate perfectly, they would always estimate the reserve price correctly; there would be no fluctuations around the average amount of arbitrage and no additional risk caused by fluctuations of their actions. However, they infer information on the behaviour of other agents only indirectly and in hindsight, so fluctuations due to their decisions are unavoidable.

The minority mechanism just described has some universal features, as it underlies a number of optimization problems in dynamical systems whenever it is favorable to belong to the minority. It is often formulated as a game, the minority game, whose players are the agents. A well-known prototype of the minority game is the El Farol bar problem [2, 3], for which ref. [4] provided a deeper understanding in terms of statistical physics. Different market mechanisms have been described in terms of minority games in [5]. There, the spectrum of agents ranged from producers to speculators and "noise traders", with a focus on the information flow between the agents. In [3, 6] it is shown that real markets seem to operate near criticality, where they are marginally efficient. The stylized description in terms of minority games captures collective effects, even when the minority game is extended toward models of real markets. The minority mechanism also takes effect in financial markets, as considered in [7], where the minority game was extended toward a realistic model of the stock market. After all, models of minority games can be mapped to spin glass models (see e.g. [3, 5]), sharing features such as many random variables, quenched disorder, and a phase transition between a phase in which replica symmetry is broken and one in which it is restored.

In this paper we focus on the energy market and consider only one type of agent, the arbitrageurs, representing the BRPs of the real market when they engage in arbitrage. We first translate the dynamics of arbitrage into a minority game. On the formal level, we mainly follow the framework of [5], but extend their versions of minority games to include (in our applications) a non-vanishing intraday price in combination with risk aversion, as well as external stochastic events in combination with nonlinear payoff functions. Furthermore, we use real data for the contributions of BRPs to the exhaustion of reserve energy as a result of arbitrage. In contrast to the perspective from economics, our focus is on the impact of fluctuations around the average arbitrage and their dependence on the model parameters, which is subtle and sometimes counterintuitive.

The paper is organized as follows. Section 2 presents the general form of minority games with an overview of the special cases considered later, followed by a summary of an algorithmic implementation of the minority games. In section 3.1, we derive some analytical bounds on the fluctuations suited for a comparison with numerical results. Numerical results are presented in section 3: first for the minimal version of the minority game, to introduce the basic framework (subsection 3.2), then extended to non-vanishing intraday prices I and nontrivial distributions of the contributions to the imbalance of power (subsection 3.3), the impact of nonlinear price functions in combination with noise (subsection 3.4), and the implementation of some degree of risk aversion (subsection 3.5). We conclude with a proposal of countermeasures to protect the market from arbitrage (section 4) and summarize some features that should survive more realistic modelling of the energy market in section 5.

2. The Energy Market in Terms of Minority Games

We distinguish only two types of energy markets: the intraday market and the reserve power market. The markets themselves are described by a few parameters and functions.

Parameters.

The parameters are the total number N of BRP parties, represented by individual agents who behave as arbitrageurs; the total available reserve power $P_{res}$ (which should not be exhausted); the total amount of power W available for arbitrage, where each agent $i \in \{1, \dots, N\}$ has access to an amount of power $w_i$; and the price I on the intraday market, here kept fixed over time.

Functions and distributions.

The reserve power cost function R has to be taken from real data or specified as a (linear or nonlinear) function of the required power. Ongoing fluctuations in the energy balance, caused by differences between forecast and actual consumption or between estimated and real production due to fluctuating renewables, are altogether captured by a noise term η of various types and strengths. This "noise" adds to the imbalances caused by our agents, which are the main focus of this paper. A further characteristic is the "information" that is in principle available on the market. It may refer to the real history of trade or to forecasts of weather, consumption or production. All this information is implemented in a highly stylized way as different options of integer numbers $\mu = 1, 2, \dots, P$. Assigning to each value of µ a value of either +1 or −1 yields $2^P$ possible strings of ±1, which later serve as a "pool" of strategies, to be defined below. Thus information enters only indirectly, through the choice of strategies.
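A strategy in this sense is just a string of P signs, so the pool of all $2^P$ strategies can be enumerated directly for small P. The sketch below is only an illustration; the helper names are hypothetical, and drawing "with replacement" (an agent may receive the same strategy twice) is our assumption, since the text does not specify it.

```python
import random
from itertools import product


def strategy_pool(P):
    """All 2**P strategies: maps from mu in {1, ..., P} to a decision +1 or -1.

    Strategy s is stored as a tuple with s[mu - 1] in {+1, -1}.
    """
    return list(product((+1, -1), repeat=P))


def draw_strategies(P, S, rng):
    """Draw S strategies uniformly (with replacement) from the pool without
    enumerating it: a uniform draw equals P independent fair +-1 entries."""
    return [tuple(rng.choice((+1, -1)) for _ in range(P)) for _ in range(S)]


pool = strategy_pool(3)
print(len(pool), pool[0])  # 8 (1, 1, 1)
```

Enumerating the pool is only feasible for small P; for realistic P (thousands of patterns) one draws the entries directly, as `draw_strategies` does.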

The agents.

Our agents are the players of the minority games. They represent BRP parties, but do not behave like general traders (whose aim is to trade such that upcoming or existing imbalances get balanced, while profiting from regular trade of energy). Our agents are arbitrageurs with two possible decisions, $a_i = \pm 1$. For $a_i = +1$, the agent sells energy on the intraday market while not feeding enough power into the grid. (This mimics a situation where the agent tries to buy reserve energy at a price R < I and trades energy on the intraday market as an uncovered sale.) For R < I, his payoff per unit of power,

$$u_i := a_i\,(I - R), \tag{1}$$

is positive; for I < R and the same decision, it is negative. The decision $a_i = -1$ (buying power on the intraday market) pays off for I < R, leading to $u_i > 0$, and loses for I > R, leading to $u_i < 0$. The total arbitrage A is given by

$$A = \sum_{i=1}^{N} w_i a_i. \tag{2}$$

The weights $w_i$ with which the agents contribute to the arbitrage need not be uniform; they can be drawn from some distribution $P_w(w_i)$. The arbitrage A adds to other sources of imbalance, summarized under η. (Note that if A + η is positive, some agents can actually reduce the total imbalance by playing $a_i = -1$, and vice versa.) The price of the reserve power is a function of the total imbalance A + η (see section 1), denoted as R(A + η). Including this in Eq. 1, the payoff (per unit of power) for agent i is, more precisely,

$$u_i = a_i\,[I - R(A + \eta)]. \tag{3}$$

Since the reserve power is activated according to the merit order, a larger imbalance means that more expensive reserve power is activated. Thus, the reserve power price increases with the imbalance: $dR(x)/dx > 0$. Figure 1 shows the distribution of the intraday price I and the typical shape of the reserve power price function R.

Strategies.

Aspects of game theory enter through the way the agents take their decisions. The decisions are based on strategies $s_i(\mu) = a_i \in \{\pm 1\}$, $\mu \in \{1, \dots, P\}$. These are maps from the information coded in µ to decisions $a_i = \pm 1$. In this section we give only an overview of the different choices that we consider in more detail later. Each agent can have S ≥ 1 strategies. For stochastic strategies, agent i chooses $a_i = 1$ with probability $p_i$ and $a_i = -1$ with probability $1 - p_i$ (independently of µ). If strategies are chosen deterministically, those available to the agents are a certain subset of the pool of all $2^P$ strategies (footnote: the number of possible strategies, $2^P$, is the number of maps from the information coded in $\mu = 1, \dots, P$ to the decisions ±1). The subset can be chosen uniformly at random or with a certain bias; the latter case is considered in section 3.3. In the course of time, individual agents can switch between the S strategies that had been selected from the pool of $2^P$ ones, depending on their learning process.

(Footnote to Eq. 1: His total payoff is given by $w_i \times u_i$.)

Figure 1: (a) Histogram of the average price on the intraday market over all 15-min intervals (weighted by the total amount of power during these intervals), between 01-01-2017 00:00 and 07-11-2017 24:00, log y-scale. Data are taken from [8]. (b) The reserve power price (for secondary reserve) as a function of the size of the total imbalance, on June 1st, 2020, 16:00-20:00 (data from [9]). The shape is typical; however, numerical values differ between the four-hour intervals. In particular, prices can be significantly lower than the ones displayed here, leading to dangerous situations.

During the learning phase, the agents evaluate the past success of all their strategies, keeping track of their evaluations, which are recursively defined according to

$$U^{t+1}_{s_i} = U^{t}_{s_i} + s_i(\mu^t)\,[I - R(A^t)], \tag{4}$$

with $s_i(\mu^t)$ denoting the decision of agent i according to strategy s, given information $\mu^t$ at time t, so that $s_i(\mu^t)\,[I - R(A^t)]$ is the payoff that strategy s would have earned at time t in the past.
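Eqs. (2) to (4) translate almost line by line into code. The sketch below is illustrative only: the linear price `linear_price` and its numbers are hypothetical, the function names are ours, and µ is taken to run from 1 to P as in the text.

```python
def arbitrage(weights, decisions):
    """Eq. (2): A = sum_i w_i a_i."""
    return sum(w * a for w, a in zip(weights, decisions))


def payoffs(weights, decisions, I, R, eta=0.0):
    """Eq. (3): u_i = a_i [I - R(A + eta)], payoff per unit of power."""
    margin = I - R(arbitrage(weights, decisions) + eta)
    return [a * margin for a in decisions]


def update_scores(scores, strategies, mu, margin):
    """Eq. (4) for one agent: U_s <- U_s + s(mu) * margin for every strategy s.

    margin = I - R(A_t) uses the measured A_t, so the change in A_t the agent
    would have caused by playing another strategy is neglected (mu in 1..P).
    """
    return [U + s[mu - 1] * margin for U, s in zip(scores, strategies)]


def linear_price(x):
    """Hypothetical increasing reserve price R(x)."""
    return 10.0 + 0.5 * x


print(payoffs([1, 1, 1], [1, 1, -1], I=12.0, R=linear_price))          # [1.5, 1.5, -1.5]
print(update_scores([0.0, 0.0], [(1, -1), (-1, 1)], mu=1, margin=1.5))  # [1.5, -1.5]
```

Note that all S strategies of an agent are scored with the same measured margin, not only the one that was actually played.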
In comparing the scores of his different strategies, the agent uses for all of them (usually for both, if S = 2) the actual value $A^t$ that was measured when he decided in favor of one strategy at time t. In the evaluation he therefore neglects the change in $A^t$ that he would have caused had he chosen a strategy other than the one actually chosen. (This error, of order 1/N, need not be negligible in certain cases.) At time t + 1 he chooses the strategy that would have been the best one in hindsight according to his evaluation. Additionally, in section 3.5 we implement a certain degree of risk aversion through the constraint that an agent only takes a decision if the expected payoff lies above some threshold ε; otherwise he refrains from playing at this time step.

Observables.

Our main observables are the expected amount of arbitrage ⟨A⟩ and the variance in the amount of arbitrage, $\sigma_A^2 \equiv \langle A^2\rangle - \langle A\rangle^2$, averaged over time, as functions of the model parameters N, P, η, the risk aversion ε and the price function R. We measure the resulting distribution of A and, importantly, the associated fluctuations $\sigma_A$ and their scaling. The main interest is in the dependence of $\sigma_A$ on the model parameters. Large fluctuations may lead to outliers in the amount of arbitrage and induce an exhaustion of the reserve energy.

Agent Based Modelling of Minority Games.

Apart from the analytical derivation of bounds on the fluctuations in section 3.1, the steps of the agent-based modelling can be summarized in the following algorithm:

• Initialization and fixing the choices. Choose the number of agents N, the number of patterns of information P, the intraday price I, the number of strategies S, the price function R(x), the distribution P(η) of the external noise, the distribution $P_w(w_i)$ of the weights, and the updating rule for the evaluations $U^{t+1}_{s_i}$ (here we only consider the updating rule given in Eq. 4), and set the initial values $U^{t=0}_{s_i}$ (here we always set them to zero). Choose furthermore the value of the bias p in the strategies and assign (in general different) subsets of S strategies $s_i$ to each agent i. Keep these sets of strategies fixed for each agent over the simulation time.

• Learning phase. All agents simultaneously update their evaluations $U_{s_i}$ at time t + 1, based on the measured value $A^t = \sum_i w_i a_i(t)$, to determine the most promising strategy $s_i^{best}$ in hindsight (the strategy with the highest evaluation out of their subset). They select this strategy and decide accordingly, $a_i = s_i^{best}(\mu)$, at time t + 1. The learning steps are repeated until the observables converge. The specific stopping criterion that we use is as follows: calculate $\sigma_A$ both for the last quarter and for the third quarter of the time steps. If the difference between these values is less than 0.1%, stop the simulation and calculate the observables over the last half of all time steps; otherwise continue.

• Measurement of observables. Calculate the mean value of arbitrage as $\langle A\rangle = \frac{1}{T}\sum_{t=t_0}^{t_0+T} A(t)$ from the last T measurements, as well as the histogram of A over the time interval T. Calculate the corresponding fluctuations $\sigma_A$.

• Gain statistics for the measurements. Repeat the whole procedure, including a new initialization, 100 times and average ⟨A⟩ and $\sigma_A$ over the hundred runs. (As we shall see, depending on the ratio P/N, the results may depend on the initialization.)
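The steps above can be sketched as a compact pure-Python simulation. This is a minimal illustration, not the authors' code, and it rests on assumptions the text leaves open: the information $\mu^t$ is drawn uniformly at random at every step (indexed 0..P−1 in code), ties between equal scores go to the first strategy, strategies are unbiased, the margin includes η as in Eq. 3 (it reduces to Eq. 4 for η = 0), and a `max_steps` cap is added so the loop always terminates. The defaults reproduce the standard game of section 3.2 ($w_i = 1$, I = 0, η = 0, R(A) = A/N). The outer loop over 100 initializations is left to the caller.

```python
import random
from statistics import fmean, pstdev


def run_minority_game(N, P, S, I=0.0, R=None, weights=None, noise=None,
                      max_steps=20000, check_every=1000, tol=1e-3, seed=0):
    """Minority game following the algorithm in the text (a sketch).

    Returns (<A>, sigma_A, number of steps); observables use the last half.
    """
    rng = random.Random(seed)
    R = R or (lambda x: x / N)                 # standard game: R(A) = A/N
    weights = weights or [1.0] * N             # standard game: w_i = 1
    noise = noise or (lambda r: 0.0)           # standard game: eta = 0
    # Initialization: S unbiased strategies per agent, fixed for the whole run.
    strategies = [[[rng.choice((1, -1)) for _ in range(P)] for _ in range(S)]
                  for _ in range(N)]
    scores = [[0.0] * S for _ in range(N)]     # U^{t=0} = 0
    history = []
    for t in range(1, max_steps + 1):
        mu = rng.randrange(P)                  # assumption: random information
        # Each agent plays its best strategy so far (max() keeps the first on ties).
        decisions = [strats[max(range(S), key=sc.__getitem__)][mu]
                     for strats, sc in zip(strategies, scores)]
        A = sum(w * a for w, a in zip(weights, decisions))
        margin = I - R(A + noise(rng))
        # Learning phase, Eq. (4): all strategies scored with the measured A_t.
        for strats, sc in zip(strategies, scores):
            for k in range(S):
                sc[k] += strats[k][mu] * margin
        history.append(A)
        # Stopping rule: sigma_A of the 3rd quarter vs. the 4th quarter of all steps.
        if t % check_every == 0 and t >= 8:
            q = t // 4
            s3, s4 = pstdev(history[2 * q:3 * q]), pstdev(history[3 * q:])
            if s3 > 0 and abs(s4 - s3) / s3 < tol:
                break
    tail = history[len(history) // 2:]         # observables over the last half
    return fmean(tail), pstdev(tail), len(history)
```

A call such as `run_minority_game(N=101, P=16, S=2)` returns one sample of ⟨A⟩ and $\sigma_A$; scanning P at fixed N and averaging over seeds is how one would build curves of $\sigma_A/\sqrt{N}$ versus P/N. Being pure Python, the sketch is only practical for small N.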

3. Results

The payoffs and possible decisions given in the previous section define a game. To gain some insight into the structure of the payoffs and their influence on the agents, we consider a version in which the game is played only once (in reality the game is repeated every 15 minutes, which we investigate in more detail in section 2). To find the expected behaviour of the BRPs, we first try to find Nash equilibria (following a standard game-theoretical procedure [10]). These are sets of strategies in which no agent has an incentive to deviate from his strategy. In this sense, Nash equilibria are 'stable' solutions of the game.

We consider strategies where each agent i chooses $a_i = 1$ with probability $p_i$ and $a_i = -1$ with probability $1 - p_i$. If $p_i$ equals either 0 or 1, the choice is deterministic. Assuming that an individual agent's contribution is small compared to the total imbalance, that is, $w_i/A = O(1/N)$ for all i (while keeping $R(A+\eta) = O(1)$), we can neglect the individual agent's contribution to A at large N. A set $\{p_i\}$ is a Nash equilibrium if no agent has an incentive to change $p_i$. Looking at Eq. 3, this is the case when either:

1. $I - R(\sum_i w_i + \eta) > 0$ and $p_i = 1$,
2. $I - R(-\sum_i w_i + \eta) < 0$ and $p_i = 0$, or
3. $\langle I - R(A + \eta)\rangle = 0$.

Cases 1 and 2 occur when the difference between the intraday price and the reserve energy price is so large that there are simply not enough agents to exploit all of the arbitrage opportunities. They correspond to all agents performing arbitrage, by selling or by buying power on the intraday market, respectively. The first case corresponds to the aforementioned events in June 2019. Such extreme amounts of arbitrage are rare. Furthermore, measures have been taken to reduce the incentives for the arbitrageurs, reducing the likelihood of case 1. Case 3 requires less extreme price differences, can be expected to occur more often than the other two cases, and is also the more interesting one.
Any set $\{p_i\}$ for which this equation is satisfied corresponds to a Nash equilibrium. In general there are many solutions to these equations, each usually corresponding to a different strength of fluctuations around the mean amount of arbitrage, and thus to a different level of anti-coordination. Since there are many Nash equilibria, their identification as such does not tell the whole story: it is not clear to which equilibrium interacting agents will converge, or whether they converge to an equilibrium at all. Nevertheless it is useful to identify two extremes:

• Perfect anti-coordination. Agents choose deterministically, with $p_i$ equal to either 0 or 1, such that $I - R(A + \eta)$ is always equal to zero (footnote: since we assumed each agent's weight $w_i$ to be of order 1/N, for large N the total amount of arbitrage A is effectively a real number, which can be adjusted to satisfy the equation). This requires the agents to separate into two groups, one playing a = 1 and the other playing a = −1. The sizes of the groups must be such that A solves $I = R(A + \eta)$.

• No anti-coordination. An equilibrium that requires no (anti-)coordination between agents can be found by assuming that the strategies of all agents are homogeneous, that is, $p_i = p$ for all i. One then needs to find p such that $\langle I - R(A + \eta)\rangle = 0$.

From the agents' point of view, their payoff is zero in either case, up to corrections of order 1/N. Keeping N finite and looking at these corrections, they give an expected total payoff to all agents of $\langle (I - R(A+\eta)) \sum_i w_i a_i \rangle = \langle A\,(I - R(A+\eta))\rangle$, which is the correlation between A and $I - R(A+\eta)$. Since $R(A+\eta)$ increases with A, this payoff is only zero if the variance of A is zero, and it is negative otherwise. The precise value depends on the degree of anti-coordination that the agents achieve. These payoffs are small (the average payoff is of order 1/N), but for finite N one does expect some tendency for the agents to move toward lower variances and thus achieve some level of anti-coordination. In the 'perfect anti-coordination' equilibrium, the payoff is exactly zero. From the agents' point of view this is optimal in the sense that arbitrage is exploited exactly up to the limit beyond which they no longer make a profit. However, it requires internal organization between the agents, and it is not clear how agents would reach this state without explicit agreements. The 'no anti-coordination' solution requires no organization between the agents. In general we expect agents to reach some intermediate position between these extremes. To study the extent to which agents can learn to anti-coordinate, we need to give them some explicit learning dynamics.
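For the 'no anti-coordination' equilibrium, the condition $\langle I - R(A+\eta)\rangle = 0$ can be solved numerically. The sketch below assumes uniform weights $w_i = W/N$ and η = 0, so that A = w(2k − N) with k ~ Binomial(N, p) agents playing a = +1; the expectation is then computed exactly and p is found by bisection, which works because $\langle I - R(A)\rangle$ decreases in p when R is increasing. The function names and the example price functions are hypothetical, and `math.comb` limits the sketch to moderate N (a few hundred).

```python
from math import comb


def expected_margin(p, N, W, I, R):
    """<I - R(A)> for homogeneous agents (p_i = p), uniform weights w = W/N,
    eta = 0: A = w (2k - N), where k ~ Binomial(N, p) agents play a = +1."""
    w = W / N
    return sum(comb(N, k) * p**k * (1 - p)**(N - k) * (I - R(w * (2 * k - N)))
               for k in range(N + 1))


def no_anticoordination_p(N, W, I, R, iters=60):
    """Solve <I - R(A)> = 0 for p by bisection (requires R increasing)."""
    lo, hi = 0.0, 1.0
    if expected_margin(lo, N, W, I, R) < 0 or expected_margin(hi, N, W, I, R) > 0:
        raise ValueError("no interior solution: case 1 or case 2 applies")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_margin(mid, N, W, I, R) > 0:
            lo = mid                           # arbitrage still pays: p must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical numbers: W = 100, A* = 20, R linear around A* with slope 0.2.
def linear_R(x):
    return 30.0 + 0.2 * (x - 20.0)


print(round(no_anticoordination_p(N=50, W=100.0, I=30.0, R=linear_R), 6))  # 0.6
```

For the linear R the result reproduces $2p - 1 = A^*/W$. For a convex R, for instance one with an added quadratic term, the same routine gives a p below this value, so the mean arbitrage then depends on the size of the fluctuations.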
Explicit learning dynamics for the agents are introduced in section 3.2.

Estimates of the order of magnitude of fluctuations.

We can find the order of magnitude of the fluctuations by looking at the variance predicted by the 'no anti-coordination' Nash equilibrium. Let us denote by $A^*$ the amount of arbitrage that removes all incentives for further arbitrage. That is, $A^*$ is such that the price of reserve power equals the price of power on the intraday market, $R(A^* + \eta) = I$. If we approximate R as linear around $R(A^* + \eta) = I$, that is, $R(A + \eta) \approx I + c\,(A - A^*)$ for some c, a simple solution can be found. If the fluctuations around $A^*$ are small, this is a good approximation. If the reserve price function is linear, one simply has to set the mean amount of arbitrage $\mu_A \equiv \langle A\rangle = A^*$, giving $\langle a\rangle = 2p - 1 = A^*/\sum_i w_i$ and the variance

$$\sigma_A^2 \equiv \langle A^2\rangle - \mu_A^2 = \frac{W^2}{N/X}\left(1 - \frac{\mu_A^2}{W^2}\right) = O(N). \tag{5}$$

Here we defined the total power available for arbitrage by

$$W \equiv \sum_i w_i \tag{6}$$

(which is O(N)) and

$$X \equiv \frac{\frac{1}{N}\sum_i w_i^2}{\left(\frac{1}{N}\sum_i w_i\right)^2}, \tag{7}$$

which measures the non-uniformity of the distribution of weights $w_i$. (Here expectation values ⟨…⟩ refer to the distribution of $a_i$.) For a uniform distribution of weights X = 1, while for a non-uniform distribution it is always greater than 1 (for example, for exponential distributions X = 2). N/X is an effective number of agents contributing to the fluctuations: if N′ agents have (uniform) non-zero weight and the rest have weight zero, then N/X = N′. Since W = O(N), also $\sigma_A^2 = O(N)$. In fact, adding an additional arbitrageur always increases $\sigma_A$, disproportionately so for arbitrageurs with relatively high weight. (Footnote: To be precise, adding an arbitrageur with weight $w_j \ll W_{tot}$ and expanding the variance $\sigma_A^2$ to first order in $w_j/W_{tot}$, we find that it is multiplied by a factor $1 + \frac{2 w_j \mu_A^2}{W(W^2 - \mu_A^2)} + \frac{w_j^2}{\sum_i w_i^2} > 1$.)
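Eqs. (5) to (7) can be checked numerically: with independent decisions and $\langle a_i\rangle = \mu_A/W$ for every agent, the variance is simply $\sum_i w_i^2 (1 - \langle a\rangle^2)$, and Eq. (5) is the same quantity rewritten with X. The sketch below, with hypothetical function names and weights, compares both forms and computes the effective number of agents N/X.

```python
def weight_nonuniformity(weights):
    """Eq. (7): X = mean(w_i^2) / mean(w_i)^2 (X = 1 for uniform weights)."""
    N = len(weights)
    return (sum(w * w for w in weights) / N) / (sum(weights) / N) ** 2


def no_anticoordination_variance(weights, mu_A):
    """Eq. (5): sigma_A^2 = W^2 / (N/X) * (1 - mu_A^2 / W^2), linear R."""
    N, W = len(weights), sum(weights)
    X = weight_nonuniformity(weights)
    return W ** 2 / (N / X) * (1 - (mu_A / W) ** 2)


def direct_variance(weights, mu_A):
    """Same quantity summed agent by agent: each a_i = +-1 has <a_i> = mu_A/W,
    so Var(A) = sum_i w_i^2 (1 - <a>^2) for independent decisions."""
    W = sum(weights)
    return sum(w * w for w in weights) * (1 - (mu_A / W) ** 2)


weights = [2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # N' = 3 active agents
print(len(weights) / weight_nonuniformity(weights))    # N/X, equals N' = 3 up to rounding
```

Because $\sum_i w_i^2$ and $1 - \mu_A^2/W^2$ both grow when an agent is added at fixed $\mu_A$, the variance in this sketch increases with every additional arbitrageur, in line with the statement above.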

To estimate the significance of these fluctuations for arbitrage on the energy market, we need some estimate of $A^*$ and of the weights $w_i$ (note that the variance is actually independent of c). In Appendix A we use the description of the events of June 2019 given by [1] to make such an estimate, which leads to an order of magnitude $\sigma_A/\mu_A$ between 0.[…] and 0.5. The fact that these fluctuations are not much smaller than the expected amount of arbitrage $\mu_A$ means that they contribute significantly to the risk of exhausting the reserve power. What is more, in Sec. 3.4 we will find that for a nonlinear price function, the expected amount of arbitrage itself also changes with the magnitude of the fluctuations. These findings make clear that the fluctuations require a careful study.

Let us first reproduce known results and discuss the 'standard' minority game, which uses a special case of the payoff described in Sec. 2. This will lead to an understanding of the basic structure of minority games resulting from the learning dynamics. We consider the following case (studied in [5]):

• the weights $w_i$ are uniformly equal to 1,
• the intraday price I = 0,
• the imbalance due to other causes η = 0,
• the reserve price is given by $R(A + \eta) = R(A) = A/N$.

This gives the payoff

$$u_i = -a_i A/N. \tag{8}$$

We note that setting R(A) = cA/N for some constant c > 0 simply multiplies the payoffs by c and does not change the behaviour of the agents compared to the case c = 1. We furthermore note that [3] studies the payoff $u_i = -a_i\,\mathrm{sgn}(A)$ instead. We follow the learning dynamics introduced in Sec. 2. The strategies are drawn with zero bias: in any given strategy, a = 1 has the same probability of occurring as a = −1. We choose N = 4100, P = 2050 and S = 2. The time evolution of $A_t$ is shown in Fig. 2(a). The precise value of $A_t$ at a given time step is highly unpredictable. A histogram of all values attained by $A_t$ (Fig. 2(b)) shows that they follow a Gaussian distribution. Furthermore, a general trend can be observed in the time evolution of A (Fig. 2(a)): for small t, values of A far from the average are relatively common; as time evolves, the agents learn to anti-coordinate, and such values become rarer.

We can quantify this effect by interpreting the histogram shown in Fig. 2(b) as a probability distribution from which the $A_t$ are drawn independently (indeed, ref. [6] shows that this is the case for a wide range of parameters, including the ones chosen here). This implies that the values $A_t$ have effectively become random variables (footnote: this is despite the fact that once the initial conditions have been chosen, i.e. the strategies have been drawn, the process is completely deterministic; the randomness is effective, caused by the large number of agents interacting disorderly with each other, an effect that occurs more often in disordered systems [11]). Averages over the distribution P(A) from which the $A_t$ are drawn can then be calculated by performing time averages:

$$\langle f(A)\rangle_P \equiv \frac{1}{T}\sum_{t=t_0}^{t_0+T} f(A_t), \tag{9}$$

for large T and $t_0$. The time instants $t_0$ and $t_0 + T$ determine the interval over which the average is taken, and the distribution P(A) is assumed to remain (approximately) the same throughout this interval. The statement that the values of $A_t$ come closer to the mean as the agents learn to anti-coordinate then translates into the statement that the variance $\sigma_A^2 \equiv \langle A^2\rangle - \langle A\rangle^2$ decreases as time progresses.

Figure 2: (a) The evolution of $A_t$ for N = 4100, P = 2050 and S = 2, showing the decrease of its amplitude with time. (b) A histogram of $A_t$ (for the same parameters) over 10^[…] time steps. The solid line shows a Gaussian distribution with the same variance as the histogram.

Figure 3: Evolution of $\sigma_A$ for N = 4100, P = 2050 and S = 2, showing that the agents learn to anti-coordinate; $\sigma_A$ is calculated using a running average over 2000 time steps. Only initially is it proportional to $\sqrt{N}$, as naively expected.

The time evolution of $\sigma_A$ is shown in Fig. 3.
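A running estimate of $\sigma_A$, like the one in Fig. 3, can be obtained by applying Eq. (9) to f(A) = A and f(A) = A² over a sliding window of T steps. The sketch below is one way to do this with running sums; the name `running_sigma` is ours. For very long series with large |A|, Welford's algorithm would be the numerically safer choice, since the difference of running sums can lose precision.

```python
from collections import deque
from math import sqrt


def running_sigma(A_series, window):
    """sigma_A over a sliding window of `window` steps, using Eq. (9) for
    f(A) = A and f(A) = A^2 with T = window; one value per full window."""
    buf, s1, s2, out = deque(), 0.0, 0.0, []
    for A in A_series:
        buf.append(A)
        s1, s2 = s1 + A, s2 + A * A
        if len(buf) > window:                  # drop the oldest value
            old = buf.popleft()
            s1, s2 = s1 - old, s2 - old * old
        if len(buf) == window:
            mean = s1 / window
            out.append(sqrt(max(s2 / window - mean * mean, 0.0)))
    return out


print(running_sigma([0, 0, 0, 3, 3, 3], window=3))
# [0.0, 1.4142135623730951, 1.4142135623730951, 0.0]
```

Applied to the series $A_t$ produced by a simulation, with a window of 2000 steps, this gives a curve of the type described for Fig. 3.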
At the start of the game, agents have not yet learned to anti-coordinate, and they behave according to the 'no anti-coordination' Nash equilibrium (which would give $\sigma_A/\sqrt{N} = 1$). (Footnote to Eq. 9: this is true for $t_0 \gg T$, as it is shown in [5] that the system eventually approaches a fixed distribution P(A).) As time goes on, they learn from the results in the past to predict A to some extent: the […] A. The extent to which they anti-coordinate depends on the values of the parameters. Even for this simplest case, we have three different parameters to tune: N, P and S. To investigate the dependence of the collective behaviour on these parameters, we run the minority game for different values of N, P and S. For any given set of parameters, we run the minority game until $\sigma_A$ converges to a constant value (see section 2). The values of $\sigma_A$ to which the system converges are shown in Fig. 4(a) for fixed N and varying values of P and S. For each value of S, the standard deviation $\sigma_A$ to which the system converges shows a non-trivial dependence on P. For high P, the agents fail to reach any anti-coordination at all and behave equivalently to the 'no anti-coordination' Nash equilibrium. As P is lowered, the anti-coordination that the agents achieve increases, until finally a turning point is reached where $\sigma_A$ shoots up, eventually reaching values much higher than they would have had without any learning.

Different values of N are easily incorporated, as $\sigma_A/\sqrt{N}$ depends only on the ratio $\alpha \equiv P/N$ [3, 5]; this is shown for S = 2 in Fig. 4(b). The same holds for S > 2. The non-monotonic behaviour is known to be due to a phase transition at $\alpha_c \approx 0.34$ [3, 5]. The phase transition corresponds to a type of transition known from the theory of disordered systems, called replica-symmetry breaking [3, 11]. For $\alpha > \alpha_c$, the distribution from which A is effectively drawn is Gaussian, as in Fig. 2(b) (repeated in Fig. 5(a)). The low-α phase shows qualitatively different behaviour: the histogram in Fig. 5(b) shows that the distribution of A is no longer Gaussian and has rather extreme outliers. In the context of reserve power, these outliers represent dangerous situations in which the amount of arbitrage is much higher than would naively be expected.

The low-α phase has the further detrimental property that $\sigma_A/\sqrt{N}$ increases with decreasing α; Fig. 4(b) shows that for small α, the standard deviation scales as $\sigma_A/\sqrt{N} \sim 1/\sqrt{\alpha}$. Since α = P/N, this means that for fixed P the deviations scale as $\sigma_A \sim N$ instead of the expected $\sigma_A \sim \sqrt{N}$. The strength of the fluctuations therefore depends non-trivially on the number of agents N. Starting at high α (low N), $\sigma_A/\sqrt{N}$ is more or less constant and equal to 1, so $\sigma_A$ increases as $\sqrt{N}$, as one would initially expect. Increasing N further decreases α and actually makes $\sigma_A$ grow slightly more slowly ($\sigma_A/\sqrt{N}$ decreases, as seen in Fig. 4(b), due to anti-coordination by the agents). Increasing N even more, the phase transition is reached, until eventually $\sigma_A \sim N$.

To apply the ideas developed for the minority game to reserve power arbitrage, we need to generalize the game to more realistic assumptions. First of all, let us consider some realistic parameter values. Formally, in this section we follow ref. [4], where, however, the meaning of I is quite different. In ref. [4], the El Farol bar problem is studied (the prototype realization of a minority game), where W = N players consider going to the El Farol bar, and I plays the role of the maximum number of visitors for which the available space in the bar is still convenient.
We leave $W$ and $I$ arbitrary and first take the weights to be uniform, $w_i = W/N$. We note that the value of $W$ simply rescales the amount of arbitrage $A$ and therefore, for linear cost functions, does not influence the dynamics. The case discussed in the previous section corresponds to $W = N$ and $I = 0$.

Figure 4: (a) Scaled fluctuations of arbitrage for $N = 1025$, as a function of $P$ for different $S$. Each data point is an average over 100 samples. The dashed line corresponds to the hypothetical case in which the agents would not have learned at all, given by the 'no anti-coordination' Nash equilibrium. (b) Scaled fluctuations for $S = 2$. The curves for different $N$ collapse onto each other if they are shown as a function of $\alpha \equiv P/N$.

We change the bias of the initial drawing of strategies (as discussed in section 2) such that $a = \pm 1$ […]. Agents with a single strategy ($S = 1$) have no ability to learn and behave according to the no anti-coordination Nash equilibrium. Here we also choose $S = 2$, for which it is not automatically guaranteed that the agents approach this Nash equilibrium. We note that for realistic learning dynamics the bias itself should also be obtained by some learning process; to our knowledge, no such dynamics has been studied so far, although it would be an interesting extension. We stick with the procedure given in [4] and change the bias as discussed above, according to the no anti-coordination Nash equilibrium. This corresponds to choosing $a_i = \pm 1$ with probability $p = (1 \pm A^*/W)/2$. As a reminder, $A^*$ solves $I - R(A^* + \eta) = 0$; for a linear reserve price function $R$, it is equal to the mean amount of arbitrage. Assuming uniform weights, the no anti-coordination Nash equilibrium gives the standard deviation

$$\sigma_A = \frac{\sqrt{W^2 - A^{*2}}}{\sqrt{N}} = \frac{W}{\sqrt{N}}\sqrt{1 - (A^*/W)^2}. \tag{10}$$

For $S = 1$ the agents do not learn and (due to the bias) behave according to this Nash equilibrium, defined in terms of the probability $p$ in the stochastic strategy that leads to $I - R(A^* + \eta) = 0$.
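The no anti-coordination estimate can be checked numerically with a short Python sketch: every agent independently plays $+1$ with probability $p = (1 + A^*/W)/2$ and $-1$ otherwise, and the sampled standard deviation of $A = \sum_i w_i a_i$ is compared with Eq. 10. The sketch also accepts non-uniform weights through the effective number of agents $N/X = (\sum_i w_i)^2/\sum_i w_i^2$, with $X = \overline{w^2}/\overline{w}^2$ as used for non-uniform weights in this paper. Function names and sample sizes are our own, hypothetical choices.

```python
import math
import random
import statistics


def effective_agents(weights):
    """Effective number of agents N/X = (sum w)^2 / sum(w^2), X = mean(w^2)/mean(w)^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def nash_sigma(weights, a_star):
    """'No anti-coordination' estimate sqrt(W^2 - A*^2) / sqrt(N/X); Eq. 10 for uniform weights."""
    total = sum(weights)  # W
    return math.sqrt(total ** 2 - a_star ** 2) / math.sqrt(effective_agents(weights))


def sampled_sigma(weights, a_star, n_samples=10_000, seed=1):
    """Monte Carlo check: each agent independently plays +1 with p = (1 + A*/W)/2, else -1."""
    rng = random.Random(seed)
    p = 0.5 * (1.0 + a_star / sum(weights))
    samples = [sum(w if rng.random() < p else -w for w in weights)
               for _ in range(n_samples)]
    return statistics.pstdev(samples)
```

For 100 agents with unit weights and $A^* = 50$, Eq. 10 gives $\sigma_A = \sqrt{7500}/10 \approx 8.66$, and `sampled_sigma` should agree to within about a percent. For uniform weights `effective_agents` returns $N$, so the general formula reduces to Eq. 10.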
For $S \geq 2$, in the high-$\alpha$ phase the standard deviation still scales exactly with $\sqrt{W^2 - A^{*2}}$, simply replacing the first equality sign in Eq. 10 with a proportionality:

$$\sigma_A \propto \frac{\sqrt{W^2 - A^{*2}}}{\sqrt{N}}, \tag{11}$$

where the proportionality factor depends on $\alpha$. However, while the proportionality is exact in the high-$\alpha$ phase, there is a small correction in the low-$\alpha$ phase.

Figure 5: Histograms of $A^t$ (after convergence) for (a) $\alpha = 1 > \alpha_c$ and (b) $\alpha \approx$ […] $< \alpha_c$; $N = 4100$, averages over 10 time steps. For $\alpha < \alpha_c$ (b), the distribution is strongly non-Gaussian, in contrast to the $\alpha > \alpha_c$ case (a).

To illustrate the scaling according to Eq. (11), we use the reserve price function $R(A + \eta) = A$. Solving $I - R(A^* + \eta) = 0$ then simply gives $A^* = I$. In Fig. 6(a) we plot the resulting standard deviation as a function of $\alpha$ for different values of $I$. The results are nearly indistinguishable from simply multiplying the standard deviation by $\sqrt{W^2 - I^2}$ (Fig. 6(b)).

The case with non-uniform weights has been worked out in [6]. It was found that for weights $w_i$ drawn from a distribution $P_w(w)$, the results do not change exactly according to the scaling $\sigma_A \propto \sqrt{X}$, with $X \equiv \overline{w^2}/\overline{w}^2$ (where the overbar denotes an average over $P_w(w)$), which is what the 'no anti-coordination' Nash equilibrium would imply (section 3.1). In Fig. 7(b) we show the results for an exponential distribution ($X = 2$), the Pareto distribution $P_w(w) \propto w^{-4}$ for $w > 2/3$ ($X = 4/3$), and the realistic distribution described in Appendix A ($X \approx$ […]). Although the scaling $\sigma_A \propto \sqrt{X}$ is not exact, it still gives a good approximation. This is despite the fact that the realistic distribution has a very small effective number of agents ($N/X \approx$ […]). […] $N/X$ and the realistic weights of contributions to the overall power $W$ do not interfere with the underlying structure of a phase transition.

The scaling of the variance, $\sigma_A^2 \propto (W^2 - I^2)/(N/X)$, is not exact. However, in this section we have found that, despite a few quantitative differences, the scaling gives a very good approximation over a large range of different values. Apparently the choice of bias in the selected strategies (here also for $S \geq 2$) is responsible for the success of the scaling relation. The scaling thus provides a very useful way of understanding the behaviour of the variance of $A$ for a wide variety of intraday prices $I$ and distributions of weights $\{w_i\}$, and should be exploited for optimizing the choice of parameters in realistic market designs.

Figure 6: Rescaling of the fluctuations as a function of $\alpha$ for various intraday prices $I$ with $N = 1025$, $S = 2$ and $P = \alpha N$. Each data point is an average over 100 samples. For further explanations see the text.

In general, arbitrage is not the only contribution to the total imbalance. Even in the absence of arbitrage, there are BRPs that do not have their portfolio balanced due to deviations of renewable power or consumption from their predictions. As in section 2, we denote the imbalance caused by these fluctuations by $\eta$. Since by definition the deviations cannot be predicted by the BRPs, we treat them as a random variable with zero mean, drawn independently for each time step. They thus introduce noise into the minority game. We will now inspect how this noise affects the dynamics of the agents.

First, let us understand how the noise changes the pay-off for the agents. Up to order $1/N$, we can split the expected pay-off of the correlated quantities $a$ and $I - R(A + \eta)$ into the product $\langle a \rangle \langle I - R(A + \eta) \rangle$. The expectation $\langle I - R(A + \eta) \rangle$ thus determines which of the choices $a_i = \pm 1$ is profitable. For a linear price function $R(A + \eta) = I + c\,(A + \eta - A^*)$, the expected pay-off is given by $-\langle c\,(A + \eta - A^*) \rangle = -c\,(\langle A - A^* \rangle + \langle \eta \rangle)$.
Any noise with mean zero will on average not change the preference between $a_i = \pm 1$. Since we expect $\langle \eta \rangle$ to average out to zero, one expects noise in $\eta$ not to influence the choices of the agents at all, and to be uncorrelated with $A$. The variance of the total imbalance $A + \eta$ (denoted by $\sigma^2$) would then simply be the sum of the separate variances, $\sigma^2 = \sigma_A^2 + \sigma_\eta^2$. The results of Fig. 8 show that this is indeed the case.

Figure 7: (a) Visualization (with logarithmic y-scale) of different weight distributions: an exponential distribution, a Pareto distribution with $P_w(w) \propto w^{-4}$, and the realistic distribution described in Appendix A, all scaled such that $\langle w \rangle = 1$. (b) Rescaled fluctuations for different weight distributions, where the rescaling is prescribed by the 'no anti-coordination' Nash equilibrium estimate. For the realistic distribution, the weights are set exactly according to Appendix A; for the other distributions, the weights are drawn independently according to the respective distribution. Each data point is an average over 100 samples.

It is well known from the physics of complex systems that noise in combination with nonlinear dynamics may have counterintuitive or unforeseen effects, in particular constructive ones, as we shall see in this section. For a nonlinear reserve power price or pay-off function, the role of noise is more interesting than for a linear reserve power price. For a linear price function we had the simplification that $\langle I - R(A + \eta) \rangle = -c\,\langle A + \eta - A^* \rangle$ only depends on the mean of $P(A + \eta)$, so that setting it to zero uniquely determines the mean amount of arbitrage. For a nonlinear price function, $\langle R(A + \eta) \rangle = \int_{-\infty}^{\infty} dx\, R(x)\, P(A + \eta = x)$ depends on the whole distribution of $A + \eta$.
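To make the last point concrete, the following Python sketch uses the quadratic price function $R(x) = I + c_1(x - A^*) + c_2(x - A^*)^2$ considered next in the text. It computes $\langle R(A + \eta) \rangle$ from the mean and variance of $A + \eta$, checks this against a Gaussian Monte Carlo estimate, and solves $\langle I - R \rangle = 0$ for the shift $\langle A \rangle - A^*$, treating the total variance as given (in the game it is itself set by the dynamics). The function names, the Gaussian sampling, the assumption $c_1 > 0$, and keeping the $(\langle A \rangle - A^*)^2$ term that a leading-order treatment drops are our own choices.

```python
import math
import random
import statistics


def expected_quadratic_price(mean, var, i_price, a_star, c1, c2):
    """<R(A + eta)> for R(x) = I + c1 (x - A*) + c2 (x - A*)^2.

    Exact for any distribution of A + eta with the given mean and variance,
    because <(x - A*)^2> = (mean - A*)^2 + var.
    """
    d = mean - a_star
    return i_price + c1 * d + c2 * (d * d + var)


def equilibrium_shift(var, c1, c2):
    """Shift d = <A> - A* solving <I - R> = 0, i.e. c2 d^2 + c1 d + c2 var = 0.

    Picks the root that vanishes as c2 -> 0 (assumes c1 > 0), written in a
    cancellation-free form of the quadratic formula.
    """
    disc = c1 * c1 - 4.0 * c2 * c2 * var
    if disc < 0:
        raise ValueError("no real equilibrium for this variance")
    return -2.0 * c2 * var / (c1 + math.sqrt(disc))


def sampled_quadratic_price(mean, sd, i_price, a_star, c1, c2, n=200_000, seed=2):
    """Monte Carlo estimate of <R(A + eta)> for Gaussian A + eta."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        d = rng.gauss(mean, sd) - a_star
        values.append(i_price + c1 * d + c2 * d * d)
    return statistics.fmean(values)
```

For $c_1 = 1$, $c_2 = 10^{-3}$ and a total variance $\sigma_A^2 + \sigma_\eta^2 = 3400$, the shift is about $-3.41$, close to the leading-order value $-c_2(\sigma_A^2 + \sigma_\eta^2)/c_1 = -3.4$; a positive $c_2$ lowers the mean, as stated in the text.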
To understand how this changes the outcomes of the game, let us first consider the price function up to quadratic order around $A^*$:

$$R(A + \eta) = I + c_1 (A + \eta - A^*) + c_2 (A + \eta - A^*)^2 \tag{12}$$

$$\Rightarrow \quad \langle R(A + \eta) \rangle \approx I + c_1 \langle A - A^* \rangle + c_2 \left(\sigma_A^2 + \sigma_\eta^2\right), \tag{13}$$

where we used $\langle \eta \rangle = 0$, took $\eta$ to be uncorrelated with $A$, and neglected the term $c_2 \langle A - A^* \rangle^2$, which is of second order in the shift of the mean. Thus, setting $\langle I - R(A + \eta) \rangle = 0$ shows that, if we consider Nash equilibria of the non-repeated game, the variance in $A$ and $\eta$ shifts the mean of $A$ away from the value it would have for a linear cost function (which would be $A^*$). There is thus an interaction between the mean and the variance of the imbalance $A + \eta$. If $c_2$ is positive, the mean of $A$ becomes lower, and vice versa. Essentially, this effect is due to the fact that in the presence of a positive second derivative of the price function, noise lowers the profits of the arbitrageurs, causing them to leave some arbitrage opportunities unexploited (which would have been worth exploiting in the absence of noise).

A look at the example price function of Fig. 1(b) shows that for positive values of the imbalance, the second derivative is in general positive throughout almost the entire graph (the actual function changes every four hours; although it is often similar to the one in Fig. 1(b), it occasionally occurs that the second derivative is negative throughout large parts of the graph). As shown in Appendix B, if the marginal price of reserve power increases fast enough, the second derivative of the price function is always positive. A positive second derivative of the price function means that noise $\eta$ and variance in $A$ lower the expected amount of arbitrage.

Figure 8: The standard deviation of $A + \eta$, denoted by $\sigma$, for a linear pay-off function. The dashed lines correspond to the prediction that the variance in $A$ and the variance of the noise add linearly, $\sigma^2 = \sigma_A^2 + \sigma_\eta^2$: for given $\sigma_\eta$ and measured $\sigma_A$, $\sigma$ is calculated and compared with the numerically measured $\sigma$ (crosses and dots). Gaussian noise $\eta$ has been added with fixed $N = 1025$ and $S = 2$. Each data point is an average over 100 samples.

To further confirm that a positive second derivative of the price function implies that noise lowers the expected amount of arbitrage, we run the 'basic' minority game from section 2 (keeping the initial drawing of the strategies unbiased, i.e. $s(\mu) = \pm 1$ with equal probability) with $N = 500$ and $S = 2$. This time we add a small quadratic component to the price function:

$$R(x) = x + c_2 x^2. \tag{14}$$

Fig. 9 shows the expected amount of arbitrage $\langle A \rangle$ as a function of the noise (Fig. 9(a), for $c_2 = 1/$[…]) and as a function of $c_2$ (Fig. 9(b), for $\sigma_\eta = 50$), for $\alpha = 0.$[…] and $\alpha = 1$. In section 2 we found that, even for zero noise, the minority game for $\alpha = 0.$[…] produces larger fluctuations of $A$. Consequently, Fig. 9(a) shows a mean amount of arbitrage that is lower than that for $\alpha = 1$. In both cases the total variance of $A + \eta$ increases when the noise strength increases, which in turn means that the mean decreases when the strength of the noise increases (Fig. 9(a)). (The larger the noise, the lower the mean must be to satisfy the Nash equilibrium condition $\langle I - R \rangle = 0$.) Likewise, Fig. 9(b) shows that the expected amount of arbitrage can also be decreased by increasing the nonlinearity of the reserve power price function (as determined by the value of $c_2$).

Note that the discussion so far holds for arbitrary distributions of noise, since for a quadratic price function only the variance of the noise enters the average pay-off. For general price functions, the entire distributions of the noise and of $A$ are important. However, similar behaviour may hold: in Appendix C we consider a distribution of $A + \eta$ that is symmetric about its mean. We show that on any interval where the second derivative of the price function is positive, a broadening of the distribution, that is, a shift of the probability mass away from the mean amount of imbalance (e.g. due to noise $\eta$ or due to the distribution of $A$ itself), increases $\langle R(A + \eta) \rangle$ and therefore decreases the mean amount of arbitrage for which a Nash equilibrium is achieved. The reverse also holds: a negative second derivative increases the mean amount of arbitrage.

Figure 9: Decrease of the expected arbitrage $\langle A \rangle$ (a) as the external fluctuations $\eta$ increase and (b) as the strength of the nonlinearity introduced by $c_2$ increases, here for the minority game with the quadratic price function of Eq. 14, for $N = 1025$, $S = 2$.

In section 3.1 we found that the fewer agents are involved in the game, the lower the severity of the fluctuations $\sigma$. This is disproportionately true for agents of relatively high weight. It is thus beneficial to prevent these agents from considering arbitrage altogether. This can be achieved by threatening the agents with legal prosecution if they are identified as being involved in arbitrage (such measures have already been applied to the energy market [1]). Nevertheless, such prosecution may not completely fend off the arbitrageurs: if they expect a large enough profit, the prospect of this financial profit may outweigh the risk of being punished by the relevant authorities. To investigate the response of the agents to the threat of legal prosecution, we give the agents an additional course of action, following ref. [7] to a large extent: the agents will refrain from arbitrage altogether if they do not expect the profit to outweigh the risk of being prosecuted.

So far we have assumed that agents always decide to play either $a_i = 1$ or $a_i = -1$. This means that the agents never refrain from arbitrage. If the agents are eager to make a profit this is natural, as $a_i = 1$ makes a profit (up to $O(1/N)$) whenever $I > R(A)$, and $a_i = -1$ makes a profit (up to $O(1/N)$) whenever $I < R(A)$ (neglecting the correlations between $\langle a_i \rangle$ and $\langle A \rangle$). Thus, a given forecast of $A$ would prescribe an agent to decide $a_i = \pm 1$. However, in reality agents are not willing to perform arbitrage for arbitrarily small profits: for one, since the behaviour is illegal, they will only take their chances if they expect the profit to be larger than the risk of being noticed by the authorities. To include this in the minority dynamics, we give each agent $i$ a risk aversion $\epsilon_i$ […] $s$ from the evaluation $U_s^t$: as the evaluation $U_s^t$ represents the total pay-off an agent would have achieved had he always used strategy $s$, the pay-off per time step that he expects from a strategy $s$ is simply equal to $U_s^t/t$. The dynamics of the minority game is then altered as follows:

• If an agent $i$ has a strategy with expected pay-off $U_s^t/t$ larger than $\epsilon_i$, he proceeds as usual;
• otherwise, he plays $a_i^t = 0$ independently of the signal $\mu^t$.

In other words, if an agent expects the pay-off to be worth the risk, he will perform arbitrage; otherwise he will refrain from arbitrage altogether.

Before actually running the minority game, let us first estimate the effects of this modification. Agents will refrain from playing as long as their expected profit is smaller than $\epsilon_i$. The expected profit (up to $O(1/N)$) is $\langle a \rangle \langle I - R(A + \eta) \rangle$. An agent will thus refrain from playing unless

$$\langle a_i \rangle \langle I - R(A + \eta) \rangle \geq \epsilon_i. \tag{15}$$

This requires $A$ to be closer to zero (on average) than in the $\epsilon = 0$ case, in order to increase the average pay-off. (The only way the inequality can be satisfied for a substantial fraction of the agents is a situation in which a number of agents refrain from playing: in this case $A$ becomes closer to 0. Since $dR(x)/dx > 0$, this means that $\langle I - R(A + \eta) \rangle$ goes up if $A > 0$, while it goes down if $A < 0$, independently of whether $I > R$ or $I < R$. These cases imply that for most agents $\langle a_i \rangle > 0$ or $\langle a_i \rangle < 0$, respectively, thereby increasing the profits of most of the active agents. The same effect cannot be achieved by the agents simply changing their expected $\langle a_i \rangle$: if it increases (decreases), $A$ also increases (decreases), thereby lowering the average profits.)

To investigate whether this is achieved by the minority game dynamics, we choose $w_i = 1$ for all $i$, $\eta = 0$, a linear reserve price function $R(A + \eta) = A$, $I = 500$, $S = 2$, $N = 2000$ and homogeneous $\epsilon$. Strategies are biased as in section 3.3, where for a given $s$ and $\mu$, $a = \pm 1$ with probability $(1 \pm A^*/W)/2$. The resulting $\langle A \rangle$ is shown in Fig. 10 for different values of $\epsilon$ and $\alpha$. It can be seen that the larger the risk aversion $\epsilon$, the lower the expected amount of arbitrage. Note that $\langle A \rangle$ is not monotonically decreasing with decreasing $\alpha$, but is larger for $\alpha = 0.$[…] […] $\alpha$.

We investigate whether the structure of the standard version of the minority game (section 2), including the phase transition, remains intact. To this end we run the same version of the risk-averse minority game and measure $\sigma_A$ for a wide range of values of $\alpha = P/N$. The results are shown in Fig. 11(a). The case $\epsilon = -\infty$ corresponds to the standard minority game discussed in section 2, whereas $\epsilon = 0$ corresponds to the situation where the agents are willing to perform arbitrage whenever they expect to make a profit (that is, expect their pay-off to be positive), no matter how small. The results for these two situations are similar, as shown in Fig. 11(a), leaving the overall structure of the results intact. As soon as $\epsilon > 0$, however, a transition to different behaviour is seen. For high values of $\alpha$ the results remain the same as for $\epsilon = -\infty$. However, the location of the phase transition is different: for $\epsilon > 0$, the phase transition occurs at much lower values of $\alpha$, and the standard deviation $\sigma_A$ reaches very low values.

For positive risk aversion $\epsilon$ there is thus a range of $\alpha$ values for which $\sigma_A$ reaches very low values, largely decreasing the range for which the system is in the $\alpha < \alpha_c$ phase. As this phase would be associated with strong outliers (see Fig. 5), even a small risk aversion $\epsilon$ has a positive effect in decreasing the risk of reserve power exhaustion.

Figure 10: Decreasing expected arbitrage with increasing levels of risk aversion $\epsilon$, for $N = 2000$, $S = 2$, $I = 500$, for different values of $\alpha$.

Figure 11: (a) Effect of risk aversion on the fluctuations of arbitrage, for $N = 2000$, $S = 2$, $I = 500$, as a function of $\alpha$, showing that for positive $\epsilon$ the phase transition occurs at very low values of $\alpha$ (with correspondingly low values of $\sigma_A$). (b) Fluctuations of arbitrage for fixed risk aversion $\epsilon = 1$, $S = 2$, for different $N$ ($N = 2000$, […], 500) and $I$ ($I = 500$, […]); […] $\alpha$, although the high-$\alpha$ phase is almost not affected.

We also investigate whether $\alpha \equiv P/N$ remains the only control parameter determining the behaviour of the system. In Fig. 11(b) we show the same risk-averse game as before, with $\epsilon = 1$, for different values of $N$. If $\alpha$ is truly the only control parameter (as it was for the standard minority game of section 2), the results should be the same for different values of $N$, as long as $\alpha = P/N$ is kept constant. Interestingly, while for high $\alpha$ the results in Fig. 11(b) show that this is the case (to a good approximation), the phase transition does not occur at the same value of $\alpha$: rather, it occurs at approximately constant $P \approx 20$. For $N \to \infty$ this means that $\alpha_c \to 0$. For $P$ of the same order as $N$, this means that the system is always in the $\alpha > \alpha_c$ phase. We note that the authors of [7], who investigate the $I = 0$ case, find analytically that the phase transition completely disappears when $N \to \infty$; it would be interesting to study whether this corresponds to the same mechanism.

Figure 12: The minority game with heterogeneous risk aversion: either agents with high weight are risk averse while the agents with low weight are not, or vice versa. Simulated for the realistic weight distribution described in Appendix A, $S = 2$, $P = 120$, and $I = 50$.

Finally, we want to investigate another aspect of risk aversion. In section 3.1 we found that the 'no anti-coordination' Nash equilibrium suggests that agents with high weight have a disproportionately strong contribution to the fluctuations $\sigma_A$. For discouraging arbitrage by threatening with penalties, this would imply that the focus should be especially on BRPs that trade large volumes of power. To investigate this in the context of the learning dynamics of the minority game, we take the realistic weight distribution described in Appendix A and split the agents into two groups, agents with high weight and agents with low weight, making sure that the total weight in each group is the same. We then give each of the groups a different risk aversion: either the group of agents with low weight has $\epsilon = 0$ and the group of agents with high weight has non-zero $\epsilon$, or vice versa. We then measure how the overall magnitude of the fluctuations, $\sigma_A$, depends on the strength of the non-zero $\epsilon$ and on the group for which this non-zero $\epsilon$ is implemented. The results are displayed in Fig. 12, showing that if agents with high weight have a large value of $\epsilon$, the fluctuations strongly decrease. In the opposite situation, where agents with high weight have $\epsilon = 0$ and agents with low weight have high $\epsilon$, the fluctuations (unexpectedly) increase compared to the case without risk aversion.
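The modified decision rule can be written as a short Python sketch. It assumes strategies stored as $\pm 1$ lookup tables indexed by the signal $\mu$, scores $U_s^t$ accumulated over $t \geq 1$ past rounds, and a virtual pay-off $a_s(\mu)\,(I - R(A))$ for every strategy with the agent's own impact neglected (the $O(1/N)$ correction mentioned above). The function names and the default linear price $R(A) = A$ are hypothetical illustrations, not the authors' code.

```python
def choose_action(scores, t, epsilon, strategies, mu):
    """Risk-averse choice: play the best strategy only if U_s^t / t > epsilon.

    scores[s] is the virtual pay-off U_s^t accumulated over t >= 1 past rounds;
    strategies[s][mu] is the action (+1 or -1) of strategy s for signal mu.
    Returns 0 (abstain) when no strategy is expected to be worth the risk.
    """
    if t < 1:
        raise ValueError("need at least one past round to form U_s^t / t")
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] / t > epsilon:
        return strategies[best][mu]
    return 0


def update_scores(scores, strategies, mu, i_price, total_arbitrage, price=lambda x: x):
    """Add the virtual pay-off a_s(mu) * (I - R(A)) of every strategy (own impact neglected)."""
    for s, table in enumerate(strategies):
        scores[s] += table[mu] * (i_price - price(total_arbitrage))
```

Setting `epsilon=float("-inf")` recovers the standard game, in which the best strategy is always played, while `epsilon=0.0` corresponds to playing only when the expected pay-off is positive, matching the two reference cases discussed for Fig. 11(a).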

4. Conclusions: Suggestion of Measures for Controlling the Amount of Arbitrage

As conclusions from the results of the previous sections, we suggest some measures for controlling the amount of arbitrage and distinguish between economic incentives via suitable price policies and statutory measures.

4.1. Economic incentives

Determining the reserve price by the merit order is essential, as it means the reserve price has a positiveﬁrst derivative, such that any agent performing arbitrage reduces the arbitrage opportunities for the otheragents (the minority mechanism). In addition to this, the most natural way to decrease the incentive forarbitrage is to increase the reserve energy price. Indeed most measures that have actually been implementedhave focused on this aspect [1].
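The merit-order construction behind this (made precise in Appendix B, Eq. B.1) can be sketched in Python for a hypothetical list of reserve offers given as (capacity in MW, price) pairs. The sketch assumes a stepwise marginal price and covers only positive imbalance $x > 0$; it illustrates that the average price $R(x)$ never exceeds the marginal price $\mathrm{pr}(x)$ and does not decrease as more reserve is activated.

```python
def _merit_order(offers):
    """Sort (capacity_mw, price) offers from cheap to expensive."""
    return sorted(offers, key=lambda offer: offer[1])


def marginal_price(x, offers):
    """pr(x): price of the last offer needed to cover x > 0 MW of imbalance."""
    cumulative = 0.0
    for capacity, price in _merit_order(offers):
        cumulative += capacity
        if x <= cumulative:
            return price
    raise ValueError("imbalance exceeds the offered reserve capacity")


def average_price(x, offers):
    """R(x) = (1/x) * integral_0^x pr(x') dx' (Eq. B.1) for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only covers positive imbalance x > 0")
    remaining, cost = x, 0.0
    for capacity, price in _merit_order(offers):
        used = min(capacity, remaining)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            return cost / x
    raise ValueError("imbalance exceeds the offered reserve capacity")
```

For offers of 100 MW at 50, 100 MW at 80 and 50 MW at 200, activating 150 MW gives $R = (100 \cdot 50 + 50 \cdot 80)/150 = 60$ while $\mathrm{pr} = 80$. Note that with a stepwise marginal price, $R(x)$ is concave within a flat step, so the Appendix B condition for a positive second derivative need not hold everywhere.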

A significant measure that has been implemented, designed to remove all incentives for harmful arbitrage, is to couple the reserve price to the intraday price [1, 9]. We have seen that if the intraday price is higher than the reserve energy price, there is an incentive for the agents to perform the arbitrage corresponding to the decision $a = 1$ (sell energy on the intraday market, feed too little energy into the grid). This increases $A$. On the other hand, if the intraday price is lower than the reserve energy price, there is an incentive for the agents to perform the arbitrage corresponding to $a = -1$, which decreases $A$. From the point of view of grid stability, one wishes to have as small an imbalance as possible. In our framework, this means that $A + \eta$ should be close to zero. Arbitrage increasing or decreasing $A$ is thus harmful if $A + \eta > 0$ or $A + \eta < 0$, respectively. In these cases the action of the arbitrageur increases the imbalance $|A + \eta|$, and thus the risk of exhausting all of the reserve power. On the other hand, these actions can also have a positive impact on the security of the grid: if $A + \eta < 0$ or $A + \eta > 0$, respectively, the arbitrage brings the total imbalance closer to 0. To make sure that any arbitrage performed is always helpful, a price function must therefore satisfy the following requirements, depending on the sign of the imbalance:

$$R(x) > I \ \text{if } x > 0, \qquad R(x) < I \ \text{if } x < 0. \tag{16}$$

After the events of June 2019, this has been pursued by simply implementing a cut-off to $R(x)$ [1]:

$$R(x) = \begin{cases} -1.25\, I_{\mathrm{avg}} & \text{if } R^*(x) > -1.25\, I_{\mathrm{avg}} \text{ and } x < 0,\\ 1.25\, I_{\mathrm{avg}} & \text{if } R^*(x) < 1.25\, I_{\mathrm{avg}} \text{ and } x > 0,\\ R^*(x) & \text{otherwise}, \end{cases} \tag{17}$$

where $R^*(x)$ is what the price function would have been without this rule, and $I_{\mathrm{avg}}$ is the average price on the intraday market for a given 15-minute interval. The interpretation is the following:

• If the price function is such that the requirements in Eq. 16 hold, there is no intervention.
• If the requirements do not hold, the price function is replaced by a constant price for which the requirements do hold.

Figure 13: Histogram of the intraday price when the market closes, minus 1.25 times the average intraday price (a positive value means that there are opportunities for arbitrage). The data shown contain every 15-min interval on the intraday market between 01-01-2017 00:00 and 07-11-2017 24:00 for which the difference is positive and the total trading volume in the interval is larger than 500 MW. Data from [8].
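For the positive-imbalance case shown in Fig. 13, the requirement of Eq. 16 and the cut-off of Eq. 17 can be sketched in Python as follows. The function names are hypothetical, the $x < 0$ branch of Eq. 17 is deliberately left out, and treating $x = 0$ as unproblematic is our own convention.

```python
def satisfies_eq16(x, reserve_price, intraday_price):
    """Eq. 16: arbitrage only helps if R(x) > I for x > 0 and R(x) < I for x < 0."""
    if x > 0:
        return reserve_price > intraday_price
    if x < 0:
        return reserve_price < intraday_price
    return True  # no imbalance, nothing to correct (our convention)


def capped_price_positive(r_star, i_avg, factor=1.25):
    """Positive-imbalance branch of Eq. 17: R = max(R*, factor * I_avg) for x > 0."""
    return max(r_star, factor * i_avg)
```

With $I_{\mathrm{avg}} = 50$ and $R^* = 40$, the capped price is $62.5$. If the intraday price at closing is $70$, Eq. 16 is still violated and an incentive for arbitrage remains, which is the situation counted in Fig. 13.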

The factor of 1.25 serves as a safety margin: $I_{\mathrm{avg}}$ is the average price (for power to be delivered in a given 15-min interval) on the intraday market. At a specific moment in time, the actual intraday price can thus be higher than $I_{\mathrm{avg}}$. For this reason, although the cut-off certainly helps to reduce the incentives for arbitrage, it does not prevent arbitrage completely. Figure 13 shows that large differences between intraday prices and the average intraday price do occur; in particular, it shows the difference between the intraday price at the closing of the market and $1.25\, I_{\mathrm{avg}}$. As the intraday price at closing time often exceeds $1.25\, I_{\mathrm{avg}}$, sometimes by a large amount, opportunities for arbitrage remain. Changing the cut-off such that the price at the closing of the market (rather than the average price) is taken into account may further reduce arbitrage opportunities. In practice, the price at the closing of the market might be too volatile for this purpose, so that a combination of the average price and the price at closing may be more suitable.

The measures that have been taken so far have focused on the fact that increasing the reserve power price removes economic incentives for harmful arbitrage. In this paper we have furthermore investigated the implications of the minority mechanism, which gives rise to fluctuations around the expected amount of arbitrage. We found that these lead to a significant risk of exhausting a large amount of reserve power (section 3.1). So far little attention has been paid to this aspect. We introduced learning dynamics and found that for a range of parameters (in our model, for $\alpha > \alpha_c \approx 0.34$) […] $\langle R(A + \eta) \rangle$ increases.

In view of the impact of fluctuations, one may think of optimizing the design of the market in terms of bounds on the number of participating BRPs. We have seen a sensitive dependence on $N$ (via the parameter $\alpha = P/N$).
Our results were restricted to the effect of fluctuations on the arbitrage and are therefore not representative of all market activities, but they have indicated that the dependence on $N$ may be non-monotonic and depends on the distribution of fluctuations, which need not be Gaussian. In short, volatility need not increase only as $\sqrt{N}$.

Strictly speaking, reserve power arbitrage is illegal. Despite this, statutory measures taken to penalize arbitrageurs can only discourage them to some extent: if they expect a large enough pay-off, they might take the risk. This is the set-up we studied in section 3.5, where we found the following:

• Making the agents apprehensive about performing arbitrage decreases the expected amount of arbitrage. The larger the fear of punishment (risk aversion), the lower the amount of arbitrage.
• Inspecting the phase structure of the minority game, even a very small fear of penalty significantly decreases the range of $\alpha$ over which the collective dynamics behaves according to the dangerous $\alpha < \alpha_c$ phase. Instead, the critical $\alpha$ is shifted toward smaller values, and the small fluctuations reveal an effectively large degree of anti-coordination between the agents, dramatically reducing the fluctuations.
• Placing an extra focus on statutory prosecution of high-weight agents decreases the fluctuations.

Overall, the conclusion is that the threat of statutory prosecution of arbitrageurs has a positive effect in decreasing the risk of reserve power exhaustion, and that placing emphasis on high-weight agents is disproportionately effective at decreasing the fluctuations in the total amount of arbitrage. More surprising is the fact that if the agents experience even a very small risk of legal prosecution, this may already dramatically decrease the strength of the fluctuations for a certain range of $\alpha$, and thereby decrease the risk of reserve power exhaustion.
5. Summary and Outlook

As conclusions from our results we have suggested economic and statutory measures to protect the market from the detrimental effect of arbitrage. Implementing the mechanism of minority games in the description of the energy market has led to some useful insights which should be relevant also in less stylized and more realistic models of the market. These insights are specific to the physics approach. One is related to the identification of an underlying phase transition (in spin glasses, from the replica-symmetry-broken phase to the replica-symmetric phase). In our case, varying the dimension of the space of strategies $P$ and/or the number of agents $N$, thereby tuning $\alpha = P/N$, leads from a phase of unexpectedly strong outliers in the arbitrage to a phase in which the arbitrage is Gaussian distributed (so that outliers are strongly suppressed). Such a transition, which is responsible for the qualitatively different behaviour, is not accessible by intuition alone. We have confirmed the occurrence of this phase transition under several extensions of the minimal minority game model. The transition appears rather stable under variations of the input, such as the replacement of a uniform distribution of power contributions of the BRPs to the market by a power-law distribution and by a distribution taken from real data.

A second feature we observed is well known from complex systems in which noise acts in combination with nonlinear dynamics, here the nonlinear reserve price function. The effect of noise can be counterintuitive, as confirmed here: more external fluctuations on the energy market (stronger $\eta$) reduce the amount of arbitrage, so that the fluctuations can have a positive effect on the balance of the system. The action of noise is even more subtle in the case of colored noise. Together with a nonlinear price function, it then has to be determined from case to case how higher moments of a noise distribution shift the Nash equilibrium toward higher or lower arbitrage, with a beneficial or detrimental effect. These insights on the impact of fluctuations should complement results from economics, which mainly focus on measures directly related to the control of arbitrage rather than on its fluctuations.

For future work it seems interesting to further elaborate on the interaction between various realistic sources of noise and nonlinear price functions. Moreover, since agent-based modeling should be complemented by an analytical treatment, we think of applying the cavity method as an alternative to the replica trick. The goal then would be to minimize the "implicit information" in the market by finding the ground state of the related spin-glass problem, here in terms of the strategies used by the agents.

Declaration of Interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

Acknowledgments

We would like to dedicate this work to the memory of Dietrich Stauffer. One of us (H.M.-O.) is indebted to Dietrich Stauffer for his constructive support and valuable discussions when changing the field of research from particle physics to network science in 2001. We also thank Martin Palovic (Jacobs University Bremen) for useful discussions from the point of view of an economist and for pointing us to reference [1]. The authors gratefully acknowledge support from the German Federal Ministry of Education and Research (BMBF, Grant No. 03EK3055D).

Appendix A. Order of magnitude of the ﬂuctuations

In Section 3.1 we derived the following order-of-magnitude relation for the ratio of the size of the fluctuations to the mean amount of arbitrage:

$$\frac{\sigma_A}{\mu_A} = \sqrt{\frac{W}{\mu_A} - 1}\,\sqrt{\frac{N}{X}}. \tag{A.1}$$

To apply it to realistic data, we need an estimate of the distribution of the weights $w_i$. As noted in [1], in June 2019 all BRPs were imbalanced in the same direction, that is, they did not feed enough power into the grid. This corresponds to the case where all BRPs play $a = 1$, in which case the distribution of contributions to the imbalance equals the distribution of weights. From [1] we know that on June 25, 2019 there was a total imbalance of $\approx \ldots$ […] gives the following distribution in units of MW:

$$
w_i = \begin{cases}
\ldots, & 1 \le i \le 5,\\
\ldots, & 6 \le i \le 10,\\
\ldots, & 11 \le i \le 20,\\
\ldots, & i \ge 21.
\end{cases} \tag{A.2}
$$

This leads to $N/X \approx \ldots$. Estimating $W/\mu_A \approx \sqrt{\ldots} \approx \ldots$ (…$\mu_A$, so this should be taken as an order of magnitude) and combining these results gives $\sigma_A/\mu_A \approx 0.4$. The same approach applied to the data of June 6, 2019 and June 12, 2019 gives $N/X \approx \ldots$ and $N/X \approx \ldots$, respectively. These values give slightly higher estimates of $\sigma_A/\mu_A$, around $0.5$. As mentioned before, these estimates should merely be taken as an order of magnitude, but they indicate that the fluctuations are about half the size of the average arbitrage.

Appendix B. Characteristics of the pay-off function
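The properties of the average price $R(x)$ stated in this appendix can be checked numerically for a concrete curve. The Python sketch below is an illustration only. It assumes a hypothetical merit-order curve $\mathrm{pr}(x) = a x + b x^3$ with made-up values $a = 1$, $b = 0.5$. This curve is increasing and has the same sign as $x$, but neither the curve nor the helper names (`avg_price`, `check`, `d1`, `d2`) come from the paper. The sketch evaluates (B.1) with the composite Simpson rule. It then compares finite-difference derivatives with (B.2), (B.3) and the sufficient condition in (B.6).

```python
# Hypothetical merit-order curve, for illustration only (not from the paper):
# the marginal price pr(x) = a*x + b*x**3 increases with the activated reserve
# power x and has the same sign as x, as assumed in Appendix B.
A_LIN, B_CUB = 1.0, 0.5


def pr(x: float) -> float:
    """Marginal price per unit of reserve power (hypothetical)."""
    return A_LIN * x + B_CUB * x**3


def avg_price(x: float, n: int = 200) -> float:
    """Average price R(x) = (1/x) * int_0^x pr(x') dx' (Eq. B.1), composite Simpson rule (n even)."""
    if x == 0.0:
        return pr(0.0)  # limit of the average price for x -> 0
    h = x / n  # negative for x < 0, which yields the signed integral from 0 to x
    total = pr(0.0) + pr(x)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * pr(k * h)
    return total * h / 3 / x


def d1(f, x: float, h: float = 1e-4) -> float:
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d2(f, x: float, h: float = 1e-3) -> float:
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


def check(x: float) -> dict:
    """Compare numerical derivatives of R(x) with the statements of Appendix B (x != 0)."""
    r, p = avg_price(x), pr(x)
    dr = d1(avg_price, x)
    return {
        # R(x) <= pr(x) for x > 0 and R(x) >= pr(x) for x < 0
        "R_vs_pr_ok": r <= p + 1e-9 if x > 0 else r >= p - 1e-9,
        # Eq. (B.2): dR/dx = (pr(x) - R(x)) / x
        "matches_B2": abs(dr - (p - r) / x) < 1e-6,
        # Eq. (B.3): R is non-decreasing
        "nondecreasing": dr >= -1e-9,
        # sufficient condition from Eq. (B.6): dpr/dx > 2 pr(x) / x
        "condition_B6": d1(pr, x) > 2 * p / x,
        "d2R": d2(avg_price, x),
    }


for x in (-2.0, 0.5, 2.0):
    print(x, check(x))
```

For this curve $R(x) = a x/2 + b x^3/4$, so the condition in (B.6) holds only where $b x^2 > a$. At $x = 0.5$ the condition fails, yet $\mathrm{d}^2R/\mathrm{d}x^2 > 0$, which illustrates that the converse need not hold. At $x = -2$ (negative $R$) the condition holds and the second derivative is negative, as stated for that case.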

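Reserve power is activated according to the merit order, from cheap to expensive. As the total amount of activated reserve power, $x$, increases, the marginal price per unit of reserve power, $\mathrm{pr}(x)$, thus increases as well. The total cost of an amount $x$ of reserve power is given by $\int_0^x \mathrm{pr}(x')\,\mathrm{d}x'$. The average price per unit of reserve power is thus given by

$$R(x) = \frac{1}{x}\int_0^x \mathrm{pr}(x')\,\mathrm{d}x'. \tag{B.1}$$

Note that $R(x) \le \mathrm{pr}(x)$ if $x > 0$ and $R(x) \ge \mathrm{pr}(x)$ if $x < 0$. This holds because $R(x)$ is the average of the non-decreasing function $\mathrm{pr}$ over the interval between $0$ and $x$. Moreover, $R(x)$ is a non-decreasing function of $x$:

$$\frac{\mathrm{d}R(x)}{\mathrm{d}x} = \frac{\mathrm{pr}(x) - R(x)}{x}, \tag{B.2}$$

$$\frac{\mathrm{pr}(x) - R(x)}{x} \ge \frac{\mathrm{pr}(x) - \mathrm{pr}(x)}{x} = 0, \tag{B.3}$$

where the inequality follows from the previous remark for either sign of $x$.

For positive $R(x)$, assuming that $\mathrm{pr}(x)$ and $x$ have the same sign, the second derivative is positive if $(\mathrm{d}\,\mathrm{pr}(x)/\mathrm{d}x)/\mathrm{pr}(x)$ is large enough:

$$\frac{\mathrm{d}^2R(x)}{\mathrm{d}x^2} = \frac{1}{x}\left(\frac{\mathrm{d}\,\mathrm{pr}(x)}{\mathrm{d}x} - 2\,\frac{\mathrm{d}R(x)}{\mathrm{d}x}\right) \tag{B.4}$$

$$\frac{\mathrm{d}^2R(x)}{\mathrm{d}x^2} \ge \frac{1}{x}\left(\frac{\mathrm{d}\,\mathrm{pr}(x)}{\mathrm{d}x} - \frac{2\,\mathrm{pr}(x)}{x}\right) \tag{B.5}$$

$$\frac{1}{x}\left(\frac{\mathrm{d}\,\mathrm{pr}(x)}{\mathrm{d}x} - \frac{2\,\mathrm{pr}(x)}{x}\right) = C(x)\left(\frac{\mathrm{d}\,\mathrm{pr}(x)/\mathrm{d}x}{\mathrm{pr}(x)} - \frac{2}{x}\right), \tag{B.6}$$

Here (B.5) uses $\mathrm{d}R(x)/\mathrm{d}x = (\mathrm{pr}(x) - R(x))/x \le \mathrm{pr}(x)/x$ for $R(x) > 0$ and $x > 0$, and $C(x) = \mathrm{pr}(x)/x$ is positive by assumption. Thus if $\mathrm{d}\,\mathrm{pr}(x)/\mathrm{d}x > 2\,\mathrm{pr}(x)/x$, the second derivative of $R(x)$ is positive (the converse does not necessarily hold). If $R(x)$ is negative, the inequality in Eq. (B.5) is reversed. Then $\mathrm{d}\,\mathrm{pr}(x)/\mathrm{d}x > 2\,\mathrm{pr}(x)/x$ implies a negative second derivative instead (note that in this case $\mathrm{pr}(x) < 0$).

Appendix C. Non-linear price functions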
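The mechanism analysed in this appendix can be illustrated numerically for one concrete case. The Python sketch below rests on several hypothetical choices, none taken from the paper:

- a convex curve $R(x) = a x/2 + b x^3/4$, which is the average price of $\mathrm{pr}(x) = a x + b x^3$ with made-up values $a = 1$, $b = 0.5$;
- a centre $A^* = 2$;
- $A + \eta$ uniformly distributed on $[A^* - s, A^* + s]$.

The sketch evaluates $\langle R(A+\eta)\rangle$ in the form of (C.3). It also checks that the bracket $f(\Delta) = R(A^*+\Delta) + R(A^*-\Delta)$ grows with $\Delta$ while $A^* - \Delta$ stays in the region where $R'' > 0$. It illustrates the sufficient condition $R'' > 0$; it does not prove the general statement.

```python
# Hypothetical convex average-price curve (assumption, not from the paper):
# R(x) = a*x/2 + b*x**3/4 is the average price belonging to pr(x) = a*x + b*x**3,
# and it is convex for x > 0.
A_LIN, B_CUB = 1.0, 0.5
A_STAR = 2.0  # hypothetical centre of the symmetric distribution of A + eta


def R(x: float) -> float:
    return A_LIN * x / 2 + B_CUB * x**3 / 4


def f(delta: float) -> float:
    """Bracket of Eq. (C.3): R(A* + delta) + R(A* - delta)."""
    return R(A_STAR + delta) + R(A_STAR - delta)


def simpson(g, lo: float, hi: float, n: int = 200) -> float:
    """Composite Simpson rule for int_lo^hi g(t) dt (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def expected_price(s: float) -> float:
    """<R(A + eta)> for A + eta uniform on [A* - s, A* + s], evaluated as in Eq. (C.3)."""
    if s == 0.0:
        return R(A_STAR)  # no noise: the price is R(A*)
    density = 1.0 / (2.0 * s)  # P(A + eta = A* + delta) for 0 <= delta <= s
    return simpson(lambda d: density * f(d), 0.0, s)


# f(delta) should increase with delta while A* - delta stays where R'' > 0 ...
deltas = [0.1 * k for k in range(20)]  # 0 <= delta < A* = 2
print(all(f(b) > f(a) for a, b in zip(deltas, deltas[1:])))
# ... so widening the symmetric noise should raise the expected average price.
for s in (0.0, 0.5, 1.0, 1.5):
    print(s, expected_price(s))
```

For this curve the exact result is $\langle R(A+\eta)\rangle = 2 + s^2/4$. The expected average price therefore exceeds $R(A^*) = 2$ by an amount that grows with the noise amplitude $s$.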

We search for conditions under which statements about the effect of a nonlinear price function on arbitrage are possible. Assuming that $P(A)$ is symmetric around $A = A^*$, we have:

$$\langle R(A+\eta)\rangle \equiv \int_{-\infty}^{\infty} \mathrm{d}x\, P(A+\eta = x)\, R(x) \tag{C.1}$$

$$= \int_{A^*}^{\infty} \mathrm{d}x\, P(A+\eta = x)\, R(x) + \int_{-\infty}^{A^*} \mathrm{d}x\, P(A+\eta = x)\, R(x) \tag{C.2}$$

$$= \int_{0}^{\infty} \mathrm{d}\Delta\, P(A+\eta = A^*+\Delta)\,\bigl[R(A^*+\Delta) + R(A^*-\Delta)\bigr], \tag{C.3}$$

where $\Delta = |A + \eta - A^*|$, and the last step uses the symmetry around $A^*$. We have:

$$\frac{\mathrm{d}}{\mathrm{d}\Delta}\bigl[R(A^*+\Delta) + R(A^*-\Delta)\bigr] = R'(A^*+\Delta) - R'(A^*-\Delta). \tag{C.4}$$

We give two sufficient conditions for this derivative to be positive. For a given $\Delta$:

• either $R''(x) > 0$ for $A^* - \Delta < x < A^* + \Delta$,
• or $R'(A^*-\Delta) = 0$ and $R'(A^*+\Delta) > 0$.

If one of these conditions holds for all $\Delta > 0$, then

$$\langle R(A+\eta)\rangle = \int_{0}^{\infty} \mathrm{d}\Delta\, P(A^*+\Delta)\, f(\Delta), \tag{C.5}$$

with $f(\Delta) = R(A^*+\Delta) + R(A^*-\Delta)$ and $\mathrm{d}f(\Delta)/\mathrm{d}\Delta > 0$. […] $A^*$) always increases the expectation value of the price as a function of the arbitrage.

The assumption in Appendix B that $\mathrm{pr}(x)$ and $x$ have the same sign means that both positive and negative reserve power always come at a cost. This is generally the case, but for negative $x$ close to $0$ it does not always hold [9].

References

[1] 50Hertz, Amprion, TenneT, TransnetBW, Untersuchung von Systembilanzungleichgewichten in Deutschland im Juni 2019 (2019).
[2] W. B. Arthur, Inductive reasoning and bounded rationality, The American Economic Review 84 (2) (1994) 406–411.
[3] A. Coolen, The Mathematical Theory of Minority Games, Oxford University Press, 2005.
[4] D. Challet, M. Marsili, G. Ottino, Shedding light on El Farol, Physica A: Statistical Mechanics and its Applications 332 (2004) 469–482. doi:10.1016/j.physa.2003.06.003.
[5] D. Challet, M. Marsili, Y.-C. Zhang, Modeling market mechanism with minority game, Physica A: Statistical Mechanics and its Applications 276 (1) (2000) 284–315. URL https://ideas.repec.org/a/eee/phsmap/v276y2000i1p284-315.html
[6] D. Challet, A. Chessa, M. Marsili, Y.-C. Zhang, From minority games to real markets, Quantitative Finance 1 (1) (2001) 168–176. doi:10.1080/713665543.
[7] D. Challet, M. Marsili, Criticality and market efficiency in a simple realistic model of the stock market, Phys. Rev. E 68 (3) (2003) 036132. doi:10.1103/PhysRevE.68.036132. URL https://link.aps.org/doi/10.1103/PhysRevE.68.036132
[8] Epex Spot, Products: Intraday auction (2017).
[9] regelleistung.net, official website of the German TSOs. Accessed: 2020-11-06.
[10] S. Tadelis, Game Theory: An Introduction, Princeton University Press, Princeton; Oxford, 2013.
[11] T. Castellani, A. Cavagna, Spin-glass theory for pedestrians, J. Stat. Mech. 2005 (05) P05012. doi:10.1088/1742-5468/2005/05/P05012. URL https://iopscience.iop.org/article/10.1088/1742-5468/2005/05/P05012