Asymmetric Tsallis distributions for modelling financial market dynamics

Sandhya Devi †

509 6th Ave S, Edmonds, WA 98020, United States of America
Email: [email protected]

Abstract:

Financial markets are highly non-linear and non-equilibrium systems. Earlier works have suggested that the behavior of market returns can be well described within the framework of non-extensive Tsallis statistics or superstatistics. For small time scales (delays), a good fit to the distributions of stock returns is obtained with q-Gaussian distributions, which can be derived either from Tsallis statistics or superstatistics. These distributions are symmetric. However, as the time lag increases, the distributions become increasingly non-symmetric. In this work, we address this problem by considering the data distribution as a linear combination of two independent normalized distributions: one for negative returns and one for positive returns. Each of these two independent distributions is a half q-Gaussian with its own non-extensivity parameter q and temperature parameter beta. Using this model, we investigate the behavior of stock market returns over time scales from 1 to 80 days. The data cover both the dot-com bubble and the 2008 crash periods. These investigations show that for all the time lags, the fits to the data distributions are better using asymmetric distributions than symmetric q-Gaussian distributions. The behaviors of the q parameter are quite different for positive and negative returns. For positive returns, q approaches a constant value of 1 after a certain lag, indicating that the distributions have reached equilibrium. On the other hand, for negative returns, the q values do not reach a stationary value over the time scales studied. In the present model, the markets show a transition from normal to superdiffusive behavior (a possible phase transition) during the 2008 crash period. Such behavior is not observed with a symmetric q-Gaussian distribution model with q independent of time lag.

Keywords: non-extensive systems, Tsallis statistics, superstatistics, entropy, nonlinear dynamics, asymmetry, superdiffusion, models of financial markets, econophysics

1. Introduction

Many well-known financial models [1] are based on the efficient market hypothesis [2], according to which: a) investors have all the information available to them and independently make rational decisions using this information, b) the market reacts to all the available information, reaching equilibrium quickly, and c) in this equilibrium state the market essentially follows a random walk [3]. In such a system, extreme changes are very rare. In reality, however, the market is a complex system that results from the decisions of interacting agents (e.g., herding behavior), traders who speculate and/or act impulsively on little news, etc. Such collective/chaotic behavior can lead to wild swings in the system, driving it away from equilibrium into the realm of nonlinearity. This results in a variety of interesting phenomena such as phase transitions, critical phenomena such as bubbles and crashes [4], superdiffusion [5] and so on.

The distribution of a system following a random walk can be obtained by maximizing Shannon entropy [6][7] with constraints on the moments. In particular, this maximization with constraints on the normalization and the second moment yields a Gaussian distribution. Therefore, if the stock market follows a random walk, the corresponding returns should show a Gaussian distribution. However, it is well known [8] that stock market returns, in general, have a more complicated distribution. This is illustrated in figure 1, which compares the distributions of 1 day and 20 day log returns of the S&P 500 and Nasdaq stock markets (1994-2014) with the corresponding Gaussian distributions. The data distributions show sharp peaks in the center and fat tails over many scales, neither of which is captured by the Gaussian distribution.

† Shell International Exploration and Production Co. (Retired)
Several studies [9][10] indicate that these issues can be addressed using statistical methods based on Tsallis entropy [11] (also known as q-entropy), which is a generalization of Shannon entropy to non-extensive systems. These formalisms were originally proposed to study classical and quantum chaos, physical systems far from equilibrium such as turbulent systems, and long range interacting Hamiltonian systems. However, in the last several years, there has been considerable interest in applying these methods to analyze financial market dynamics as well. Such applications fall into the category of econophysics [5].

As in the case of Shannon entropy, the equilibrium distribution of a non-extensive system can be obtained by maximizing Tsallis entropy with constraints on the moments of the distribution [11]. Constraints on the normalization and the second moment yield a symmetric q-Gaussian distribution. As $q \to 1$, the system tends to an extensive system and the q-Gaussian distribution approaches a Gaussian distribution.

Let us now look at the non-equilibrium situation. In the random walk model, the dynamics of the probabilities are described by a Fokker-Planck (FP) equation with linear drift and diffusion terms. The solution to this FP equation is a Gaussian distribution. In this case, the autocorrelation of the standard deviation $\sigma$ falls as $\sqrt{\tau}$, where $\tau$ is the time delay. A generalization of the Fokker-Planck equation to non-extensive systems, using statistical methods based on Tsallis entropy, has been given in [12][13]. This results in a Fokker-Planck equation with a linear drift term and a non-linear diffusion term. The solution to this equation, under some assumptions, is a Tsallis q-Gaussian distribution, with the parameter $q$ independent of time. For $q < 5/3$, the generalized inverse mean square deviation of this distribution follows a power law, with the magnitude of the exponent > 1. This points to a superdiffusive process. This model has been applied to study both high frequency [14][15] and long-term low frequency stock market returns [16]. However, in a previous publication [17] we have shown that for low frequency returns (time delay ≥ 1 day) with $q$ independent of time delay, the inverse mean square deviation shows a power law behavior with the exponent very close to 1, pointing to a normal diffusive behavior. Hence, for returns longer than one day, the superdiffusive behavior predicted by the non-linear FP equations discussed above is not supported by the empirical results.

Non-extensive statistics is relevant for two classes of systems:

a) Systems which have interacting sub-systems: the Tsallis formulation of non-extensivity based on the entropy principle is used for this class. In this case, the entropies of the sub-systems are not additive.
b) Superstatistics: Beck [18] showed that non-extensivity can also arise in systems following ordinary statistics if the temperature or energy dissipation rate is locally fluctuating. In this case, the distribution can be considered as the integral of the probability density $f(\beta)$ of the inverse temperature $\beta$ with the Boltzmann-Gibbs (BG) distribution function $e^{-\beta E}$. This is a superposition of two statistics, hence the name 'superstatistics'. When $\beta$ has a $\chi^2$ distribution (which is widely applicable to many common systems [19]), the superstatistics yields a q-exponential or q-Gaussian distribution, depending on the definition of $E$ [20][21].

One thing to note in the case of superstatistics is that even though the system follows a BG distribution, the entropy is quasi-additive because of the temperature variation [18]. This means that, if we consider two independent systems $A$ and $B$ having some value $q$, the entropy of the composite system $A + B$ is the sum of the entropies of $A$ and $B$ but with a different $q$. For dynamical systems, if $A$ and $B$ belong to smaller time scales, the system with a larger time scale will have a different $q$. Hence in superstatistics, the non-extensivity parameter changes with time scale. We will discuss the dynamics of $q$ in more detail in section 2 since this is an important aspect of the present analysis.

Neither of the two non-extensivity models discussed above supports asymmetry. A perturbative approach to treating asymmetry has been discussed in [22]. However, figure 2, which displays the variation of skew with time lag for S&P 500 and Nasdaq returns for the dot-com bubble and the 2008 crash periods, clearly shows that the skew rises very rapidly with time lag. This is particularly pronounced during the crash period. Hence, except for very small time scales, the perturbative approach for treating asymmetry will not be adequate.
Budini [23] has derived a family of extended q-Gaussian distributions starting from two gamma distributions that have different shape parameters but the same temperature parameters. The resulting distribution is asymmetric only in the shape parameter and not in the spatial scale parameter. In this paper we address the asymmetry in the distribution of financial market returns by requiring that both $q$ and the 'temperature' parameter $\beta$ depend on the sign of the market returns. Since asymmetry increases with time delay, both these parameters change with time scale.

The rest of the paper is organized as follows. A brief review of the derivations of q-Gaussian distributions from the Tsallis entropic formulation and superstatistics is given in section 2. Our formulation of asymmetric distributions is discussed in section 3. Results of applying this formulation to financial market dynamics are given in section 4, followed by a summary and conclusions in section 5.

2. q-Gaussian distributions

A detailed review of the derivation of the q-Gaussian distribution from Tsallis entropy and the estimation of the parameters of the model is given in an earlier publication [17]. As discussed earlier, in Tsallis statistics the non-extensivity of a system arises due to the long range interactions of the sub-systems. This is evident from the non-additive property of Tsallis entropy

$$S_q(A+B) = S_q(A) + S_q(B) + (1-q)\,S_q(A)\,S_q(B) \tag{1}$$

The third term on the right hand side of equation (1) is the result of interaction between the sub-systems $A$ and $B$. As the non-extensivity parameter $q \to 1$, the additive property of the entropies is recovered. Note that $q$ for the combined system is the same as for the individual systems. An important thing to note here is that if $A$ and $B$ refer to financial market returns at two shorter time scales, the bigger system $A + B$ referring to a larger time lag has the same non-extensivity index $q$.
So, in this case, $q$ is static with respect to time scale.

Beck [18] showed that non-extensivity can also arise in systems following ordinary statistics, if the temperature or energy dissipation rate is locally fluctuating. In this case, the distribution $P$ can be considered as the integral of the distribution of the inverse temperature $\beta$ with the Boltzmann-Gibbs (BG) distribution function $e^{-\beta E}$.

$$P(E) = \frac{1}{N} \int_0^\infty f(\beta)\, e^{-\beta E}\, d\beta \tag{2}$$

Here $N$ is the normalization constant. The right hand side of equation (2) is the weighted average of a BG distribution function. When $\beta$ is chosen to have a $\chi^2$ distribution, the probability density function is

$$f(\beta) = \frac{1}{\Gamma(\alpha)} \left(\frac{\alpha}{\beta_0}\right)^{\alpha} \beta^{\alpha-1}\, e^{-\alpha\beta/\beta_0} \tag{3}$$

where $\beta_0 = \int \beta f(\beta)\, d\beta$. In this case, the integral in equation (2) can be shown to be [20][21] the q-exponential function

$$[1 + (q-1)\beta_0 E]^{-1/(q-1)} \tag{4}$$

Here $q = 1 + 1/\alpha$. The integral in equation (2) holds for any positive $E$. In our case, we choose $E$ to be the square of the stock market return $\Omega$. In this case equation (4) becomes the q-Gaussian function

$$[1 + (q-1)\beta_0 \Omega^2]^{-1/(q-1)} \tag{5}$$

The q-exponential function given in equation (4) is a special case of superstatistics where $\beta$ is chosen to have a $\chi^2$ distribution. There are many other distribution functions besides $\chi^2$ which can multiply the BG function in superstatistics. Some of these are discussed in [24][25][26]. For low frequency stock returns (one day or longer), a $\chi^2$ distribution is shown to be appropriate [26][27]. It should be noted that for superstatistics to be a valid model, the variation of $\beta$ should be slow compared to that of $\Omega$. This is shown to be true in general if there are no outliers like isolated sharp spikes in the data [25][26]. For $q > 1$, $q$ can now be given a physical meaning as the relative variance of $\beta$

$$(q-1) = \frac{\langle\beta^2\rangle - \langle\beta\rangle^2}{\langle\beta\rangle^2} \tag{6}$$

Here the expectation value $\langle\,\cdot\,\rangle$ is taken with respect to $f(\beta)$.

Let us now consider the non-equilibrium case.
Note that unlike in Tsallis statistics, here there is no long-range interaction between sub-systems. However, the variation of $\beta$ gives rise to a quasi-additive property of the entropies [18]

$$S_q(A) + S_q(B) = S_{q'}(A+B) \tag{7}$$

If we consider $A$ and $B$ as systems with smaller time scales, then the one with larger time delay, $A + B$, will have a different $q'$. In general, one can expect the temperature fluctuations of the smaller systems to be larger than those of the composite system and hence $q$ to be a decreasing function of time scale. For $q$ close to 1, it can be shown [18] that $q'$ and $q$ are related as

$$\frac{q'-1}{q-1} = \left[1 + \frac{\langle \ln P_i\rangle^2}{\langle (\ln P_i)^2\rangle}\right]^{-1} \tag{8}$$

Equation (8) shows that for $q$ values close to 1, $q$ monotonically decreases as the time lag increases. The variation of $q$ with time lag is one of the important things that will be discussed in the present analysis.
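As a numerical illustration of equations (2) to (6), the following minimal Python sketch averages the Boltzmann factor over a gamma-distributed $\beta$ and compares the result with the q-exponential. It is a sketch under stated assumptions: the normalization is taken as $N = 1$, the mean of $\beta$ is written `beta0`, the relation $q = 1 + 1/\alpha$ is the one implied by equation (6), and the function names and the midpoint-rule integration are illustrative choices rather than anything used in the original analysis.

```python
import math


def gamma_density(beta, alpha, beta0):
    """Density f(beta) of equation (3): gamma form with shape alpha and mean beta0."""
    rate = alpha / beta0
    return rate ** alpha * beta ** (alpha - 1.0) * math.exp(-rate * beta) / math.gamma(alpha)


def superstatistical_average(energy, alpha, beta0, steps=50000, cutoff=40.0):
    """Midpoint-rule estimate of the integral in equation (2), assuming N = 1.

    The upper limit is truncated at cutoff * beta0 (an illustrative choice).
    """
    width = cutoff * beta0 / steps
    total = 0.0
    for i in range(steps):
        beta = (i + 0.5) * width
        total += gamma_density(beta, alpha, beta0) * math.exp(-beta * energy)
    return total * width


def q_exponential(energy, q, beta0):
    """q-exponential of equation (4); this sketch assumes q > 1."""
    return (1.0 + (q - 1.0) * beta0 * energy) ** (-1.0 / (q - 1.0))


def relative_variance(alpha, beta0, steps=50000, cutoff=40.0):
    """Numerical (<beta^2> - <beta>^2) / <beta>^2, to compare with q - 1 in equation (6)."""
    width = cutoff * beta0 / steps
    m1 = m2 = 0.0
    for i in range(steps):
        beta = (i + 0.5) * width
        weight = gamma_density(beta, alpha, beta0) * width
        m1 += beta * weight
        m2 += beta * beta * weight
    return (m2 - m1 * m1) / (m1 * m1)


if __name__ == "__main__":
    alpha, beta0 = 4.0, 1.0          # hypothetical parameter values
    q = 1.0 + 1.0 / alpha
    for energy in (0.5, 2.0, 8.0):
        print(energy, superstatistical_average(energy, alpha, beta0),
              q_exponential(energy, q, beta0))
    print("q - 1 =", q - 1.0, "relative variance =", relative_variance(alpha, beta0))
```

Setting `energy` to the squared return turns the same comparison into a check of the q-Gaussian in equation (5).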

3. Asymmetric q-Gaussian distributions

To generalize Tsallis statistics and superstatistics to include asymmetry, we consider a distribution (PDF) which is a linear combination of two independent normalized distributions $P_1(\Omega)$ and $P_2(\Omega)$

$$P(\Omega) = [P_1(\Omega) + P_2(\Omega)] \tag{9}$$

where $\Omega = (x - \mu)/\sigma$ is a zero mean random variable, $\mu$ is the mean of the data variable $x$ and $\sigma$ is a standard deviation at the lowest time scale. Denoting the distributions for negative and positive returns $\Omega$ as $P_-$ and $P_+$ respectively, we rewrite (9) as

$$P(\Omega) = [P_-(\Omega) + P_+(\Omega)] \tag{10}$$

where

$$P_-(\Omega) = 0, \quad \Omega > 0 \tag{11a}$$

$$P_+(\Omega) = 0, \quad \Omega \le 0 \tag{11b}$$

$P_-(\Omega)$ and $P_+(\Omega)$ are half q-Gaussians, given by

$$P_-(\Omega) = \frac{1}{Z_-}\left[1 + (q_- - 1)\beta_- \Omega^2\right]^{-1/(q_- - 1)}, \quad \Omega \le 0 \tag{12a}$$

$$P_+(\Omega) = \frac{1}{Z_+}\left[1 + (q_+ - 1)\beta_+ \Omega^2\right]^{-1/(q_+ - 1)}, \quad \Omega > 0 \tag{12b}$$

Here, $q_-$, $\beta_-$ and $q_+$, $\beta_+$ are the q-Gaussian parameters for negative and positive $\Omega$ respectively. The normalizations $Z_-$ and $Z_+$ are such that

$$\int_{-\infty}^{0} P_-(\Omega)\, d\Omega = \int_{0}^{\infty} P_+(\Omega)\, d\Omega = 1/2 \tag{13}$$

so that the complete PDF $P(\Omega)$ in (10) is normalized. This gives

$$Z_- = C_{q_-} / \sqrt{\beta_-} \tag{14a}$$

$$Z_+ = C_{q_+} / \sqrt{\beta_+} \tag{14b}$$

$$C_q = \frac{\sqrt{\pi}\; \Gamma\!\left(\frac{1}{q-1} - \frac{1}{2}\right)}{\sqrt{q-1}\; \Gamma\!\left(\frac{1}{q-1}\right)} \tag{14c}$$

Equations (11a)-(14c) can be derived from the maximization of q-entropy, assuming $q$ is dependent on the sign of $\Omega$ and giving separate constraints to negative and positive $\Omega$ (see appendix A). In this case, $\beta_-$ and $\beta_+$ are the Lagrange multipliers for the constraints on the variance. The same results can be obtained from superstatistics by letting the parameters $\alpha$ and $\beta$ in $f(\beta)$ be dependent on the sign of $\Omega$. In this case $\beta_-$ and $\beta_+$ can be interpreted as the expectation values of the temperature with respect to the temperature distributions for the negative and positive values of the variable $\Omega$.
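The two-branch construction in equations (10) to (14c) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the fits: the function names, the restriction to $1 < q < 3$, and the parameter values in the usage note are assumptions made here, and the midpoint-rule integrator is included only to check the normalization in equation (13).

```python
import math


def c_q(q):
    """Constant C_q of equation (14c); this sketch only covers 1 < q < 3."""
    if not 1.0 < q < 3.0:
        raise ValueError("this sketch only covers 1 < q < 3")
    n = 1.0 / (q - 1.0)
    # lgamma avoids overflow of the gamma function when q is close to 1
    ratio = math.exp(math.lgamma(n - 0.5) - math.lgamma(n))
    return math.sqrt(math.pi) * ratio / math.sqrt(q - 1.0)


def half_q_gaussian(omega, q, beta):
    """One branch: (1/Z) [1 + (q - 1) beta omega^2]^(-1/(q - 1)), with Z = C_q / sqrt(beta)."""
    z = c_q(q) / math.sqrt(beta)
    return (1.0 + (q - 1.0) * beta * omega * omega) ** (-1.0 / (q - 1.0)) / z


def asymmetric_pdf(omega, q_minus, beta_minus, q_plus, beta_plus):
    """Equation (10): the negative branch for omega <= 0, the positive branch otherwise."""
    if omega <= 0.0:
        return half_q_gaussian(omega, q_minus, beta_minus)
    return half_q_gaussian(omega, q_plus, beta_plus)


def integrate(func, lower, upper, steps=200000):
    """Plain midpoint rule, used only to check the normalization numerically."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


if __name__ == "__main__":
    params = (1.4, 0.8, 1.2, 1.5)    # hypothetical (q-, beta-, q+, beta+)
    print(integrate(lambda w: asymmetric_pdf(w, *params), -200.0, 0.0))
    print(integrate(lambda w: asymmetric_pdf(w, *params), 0.0, 200.0))
```

With any admissible parameters, each branch should integrate to about 1/2. Note that the two branches generally take different values at $\Omega = 0$, since $1/Z_-$ and $1/Z_+$ differ unless the parameters coincide.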
Since $\Omega$ has zero mean, the mean square deviation for the asymmetric case is given by

$$\sigma_{asym}^2 = \int \Omega^2 P(\Omega)\, d\Omega = \frac{1}{2}\left[\frac{1}{\beta_-(5 - 3q_-)} + \frac{1}{\beta_+(5 - 3q_+)}\right] \tag{15}$$

for $q_-, q_+ < 5/3$. Equations (9)-(15) go over to the symmetric case results when $q_- = q_+ = q$ and $\beta_- = \beta_+ = \beta$.

Let us now consider the non-equilibrium case. As discussed in section 2, in the Tsallis entropic model the parameter $q$ is independent of time scale. However, in the superstatistics model $q$ varies with time scale. Equation (8) gives the variation for small $(q - 1)$. Since $P_-$ and $P_+$ are independent, the parameters $q_-$ and $q_+$ are also independent. Hence, following a procedure similar to that given in [18], one can show (see appendix B) that an equivalent relationship can be obtained for the asymmetric case.

$$\frac{q_-' - 1}{q_- - 1} = \left[1 + \frac{\langle \ln P_-\rangle^2}{\langle (\ln P_-)^2\rangle}\right]^{-1} \tag{16a}$$

$$\frac{q_+' - 1}{q_+ - 1} = \left[1 + \frac{\langle \ln P_+\rangle^2}{\langle (\ln P_+)^2\rangle}\right]^{-1} \tag{16b}$$

The expectation values $\langle\,\cdot\,\rangle$ are taken with the corresponding distributions. Note that the $q$ for each branch decreases monotonically with time scale.
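The small $(q - 1)$ relation used above can be checked numerically for a single normalized discrete distribution, which is the setting of equation (8). The sketch below is illustrative only: it solves the quasi-additivity condition $S_{q'}(A+B) = 2 S_q(A)$ for $q'$ by bisection and compares $(q'-1)/(q-1)$ with the closed-form ratio. The function names and the three-state example distribution are assumptions made here.

```python
import math


def tsallis_entropy(probs, q):
    """S_q = sum_i (P_i - P_i^q) / (q - 1), written with expm1 for accuracy near q = 1."""
    if q == 1.0:
        return -sum(p * math.log(p) for p in probs)
    return -sum(p * math.expm1((q - 1.0) * math.log(p)) for p in probs) / (q - 1.0)


def composite_q(probs, q, iterations=200):
    """Solve S_q'(A + B) = 2 S_q(A) for q' (two independent, identical systems, q > 1)."""
    joint = [pi * pj for pi in probs for pj in probs]
    target = 2.0 * tsallis_entropy(probs, q)
    low, high = 1.0, q               # the root lies between the Shannon limit and q
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if tsallis_entropy(joint, mid) > target:
            low = mid                # entropy still too large, so q' must be larger
        else:
            high = mid
    return 0.5 * (low + high)


def predicted_ratio(probs):
    """Right-hand side of equation (8): [1 + <ln P>^2 / <(ln P)^2>]^(-1)."""
    mean_log = sum(p * math.log(p) for p in probs)
    mean_log_sq = sum(p * math.log(p) ** 2 for p in probs)
    return 1.0 / (1.0 + mean_log ** 2 / mean_log_sq)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]          # hypothetical example distribution
    q = 1.001
    q_prime = composite_q(probs, q)
    print((q_prime - 1.0) / (q - 1.0), predicted_ratio(probs))
```

The bisection relies on the Tsallis entropy of a fixed distribution decreasing as $q$ increases. The two printed numbers should agree closely for $q$ near 1 and drift apart as $q$ grows, which is the sense in which the relation is perturbative.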

4. Results

The data chosen for our analysis are S&P 500 and Nasdaq daily (close of the day) stock prices which are de-trended with CPI to remove inflation trends. These are displayed in figure 3.

We will consider the period after 1991 (about a year before the time when electronic trading over the internet was launched), since the character of the stock price variation changes dramatically after that. The time series shows a non-stationary character with wild fluctuations. The data for analysis are divided into two regions bounded by vertical dotted lines. Regions 1 and 2 cover the dot-com bubble period and the crash of 2008 respectively. Each region has 3000 samples.

The variables used for the estimation of $q$ and $\beta$ are the standardized log returns $\Omega(t, t_0)$ for delay $t$:

$$\Omega(t, t_0) = \left(y(t, t_0) - \mu_t\right)/\sigma \tag{17}$$

computed for several starting times $t_0$ over the period of interest. Here

$$y(t, t_0) = \ln X(t_0 + t) - \ln X(t_0)$$

$X$ is the stock value, $\mu_t$ is the mean of $y(t)$, and $\sigma$ is the standard deviation for 1 day log returns. With this choice, $\bar{\Omega} = 0$. The 1 day standardized log returns are displayed in figure 4. Note that, in general, the negative log returns show more high spikes than the positive returns.

In [17], we discussed the estimation of the q-Gaussian parameters for the symmetric case using the maximum likelihood estimate method [28]. There the $q$ parameter was held constant as a function of time scale. We follow a similar procedure here to estimate the parameters $q_-$, $\beta_-$ and $q_+$, $\beta_+$ from the data with $\Omega \le 0$ and $\Omega > 0$ respectively, for time delays of 1-80 days. This range is long enough to study the asymmetric effects and short enough that the number of samples for the parameter estimation is not drastically reduced.
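The construction of the standardized log returns in equation (17), and the split into the two branches used for the separate fits, can be sketched as follows. This is a minimal illustration under assumptions: `prices` is a hypothetical list of de-trended daily closing values, overlapping starting times are used for every delay, and the population standard deviation is used for $\sigma$; the text does not specify these details.

```python
import math
import statistics


def log_returns(prices, delay):
    """y(t, t0) = ln X(t0 + t) - ln X(t0) for every admissible starting time t0."""
    return [math.log(prices[i + delay]) - math.log(prices[i])
            for i in range(len(prices) - delay)]


def standardized_log_returns(prices, delay):
    """Equation (17): subtract the mean at this delay, divide by the 1 day standard deviation."""
    sigma_1day = statistics.pstdev(log_returns(prices, 1))
    y = log_returns(prices, delay)
    mu = statistics.fmean(y)
    return [(value - mu) / sigma_1day for value in y]


def split_branches(omega):
    """Separate the samples for the negative (omega <= 0) and positive (omega > 0) fits."""
    negative = [w for w in omega if w <= 0.0]
    positive = [w for w in omega if w > 0.0]
    return negative, positive


if __name__ == "__main__":
    # Synthetic price series, for illustration only
    prices = [100.0 * math.exp(0.0005 * i + 0.01 * math.sin(i)) for i in range(500)]
    omega = standardized_log_returns(prices, 20)
    negative, positive = split_branches(omega)
    print(len(negative), len(positive), statistics.fmean(omega))
```

Because every delay is scaled by the same 1 day standard deviation, the spread of the standardized returns grows with the delay, which is what allows the variance to be followed as a function of time scale.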

Figures 5 and 6 show the comparison of the asymmetric q-Gaussian distributions obtained from the estimated parameters with the data distributions for regions 1 and 2 respectively. All the PDFs have fat tails, particularly for negative returns. This is the result of large jumps in the time series of returns shown in figure 4. One thing to note is that even at large time delays, the distributions do not approach Gaussian distributions (central limit theorem). In fact, the asymmetry increases with delay.

To quantify the goodness of the parameter estimates, Kolmogorov-Smirnov (KS) tests [29] are performed at all delays considered. To do this, synthetic data are generated at each delay using a generalized Box-Müller method for generating q-Gaussian random deviates [30] given the $q$, $\beta$ values. The synthetic data are standardized in the same way as the empirical data. The maximum absolute distances $D_{max}$ between the empirical and synthetic cumulative distribution functions (CDF) are calculated. If $D_{max}$ exceeds a critical distance $D_{crit}$ [31] at a particular significance level, that fit should either be rejected or accepted at a higher significance level. $D_{crit}$ is given by

$$D_{crit} = c(\gamma)\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$

Here, $n_1$ and $n_2$ are the number of samples in the empirical and synthetic CDFs respectively. The table for the function $c(\gamma)$ at different significance levels $\gamma$ can be found in [31].

Figure 7 shows $D_{max}$ as a function of delay for the negative and positive returns branches. Also shown are the critical distances $D_{crit}$ for a significance level of 0.05 (confidence 95%). The distances are in general all well below the corresponding $D_{crit}$. However, for the negative returns in region 2 (the crash period of 2008), $D_{max}$ increases for higher delays. Figure 8, which shows the S&P 500 70 day log returns, explains why this is so. There are a few big spikes for negative returns which dominate the estimates, resulting in higher errors.
But even for these delays, $D_{max}$ is well below the critical level.

For comparison, the KS test results for the symmetric q-Gaussian fit are displayed in figure 9. For easier comparison with the asymmetric case, the results for negative and positive returns are shown separately. In almost all cases, $D_{max}$ is much higher and closer to $D_{crit}$ than in the asymmetric case. Hence any conclusions drawn on the behavior of the parameters estimated with the symmetric model are questionable. This is particularly so for the negative returns, indicating the importance of considering asymmetry.

The asymmetric character of the distributions becomes more obvious when we look at the variation of $q_-$ and $q_+$ with time scale (figure 10). For positive returns, $q_+$ is greater than 1 but less than the limit 5/3 (equation (15)) for smaller delays (< 20 days) and approaches the value of 1 after that. The attractor in this case is a (half) Gaussian, which has a finite variance. The behavior is quite different for negative returns. No stationary state is reached even at 80 days. For region 2, the $q_-$ values at longer delays are higher than the limit 5/3 and continue to increase slowly. The high values of $q_-$ are the result of very fat tails, as shown in figure 6. Note that, even though the variance is infinite for $q > 5/3$, the attractor (steady state distribution) in this case could be an $\alpha$-stable distribution, which has infinite variance and satisfies a generalized central limit theorem [32].

The variations of the estimated $\beta_-$ and $\beta_+$, along with error bars, are shown in figure 11 on a log-log scale. The two plots are very close to each other, indicating that the main contribution to asymmetry comes from $q$ and not from the temperature parameter. The straight line character of the plots shows that $\beta$ for both negative and positive returns follows a power law behavior.
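The KS comparison described above reduces to two small computations: the maximum distance between two empirical CDFs, and the critical distance for the given sample sizes. The Python sketch below illustrates both. It is not the procedure of [29] or [31] verbatim: the function names are chosen here, and $c(\gamma)$ is computed from the common large-sample approximation $c(\gamma) = \sqrt{-\ln(\gamma/2)/2}$ instead of being read from the table in [31].

```python
import math


def ks_distance(sample_a, sample_b):
    """Maximum absolute distance between the two empirical CDFs (D_max)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # advance both CDFs past every sample equal to the current value
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


def ks_critical_distance(n1, n2, gamma=0.05):
    """D_crit = c(gamma) * sqrt((n1 + n2) / (n1 * n2)).

    c(gamma) uses the large-sample approximation sqrt(-ln(gamma / 2) / 2),
    an assumption standing in for the tabulated values.
    """
    c = math.sqrt(-0.5 * math.log(gamma / 2.0))
    return c * math.sqrt((n1 + n2) / (n1 * n2))


if __name__ == "__main__":
    empirical = [-2.1, -0.7, -0.3, 0.1, 0.4, 0.9, 1.5]     # made-up samples
    synthetic = [-1.8, -0.9, -0.2, 0.0, 0.5, 1.1, 1.3]
    print(ks_distance(empirical, synthetic),
          ks_critical_distance(len(empirical), len(synthetic)))
```

A fit is questioned at the chosen significance level when `ks_distance` exceeds `ks_critical_distance` for the same pair of sample sizes.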
In an earlier publication [17] we showed that in the framework of Tsallis statistics, where the entropic index $q$ is independent of time scale, no superdiffusive behavior is observed either during the dot-com bubble or the 2008 crash periods. The variance estimated from the symmetric q-Gaussian distribution increases almost linearly with time scale. This points to normal diffusive behavior.

The case for asymmetric distributions is given in figure 12, which shows the variation of the mean square deviation $\sigma_{asym}^2$ (equation (15)) with time delay $t$, on a log-log scale. The straight line character of the plots shows that $\sigma_{asym}^2$ follows a power law in $t$. In region 1, the exponents are close to 1, indicating normal diffusion. An exponent slightly greater than 1 for Nasdaq (which was greatly affected by the dot-com bubble) indicates a mild superdiffusive behavior. In region 2, the exponent is > 1. This clearly indicates a superdiffusive behavior. At larger time delays $q_-$ exceeds the limit of 5/3 and $\sigma_{asym}^2$ is no longer defined. These delays are omitted from figure 12. This transition from normal diffusion in region 1 to superdiffusion in region 2 indicates a possible phase transition.

Beck showed [18] that when $(q - 1)$ is small, $(q - 1) \sim 1/r^{\delta}$, where $r$ is the spatial scale. Laboratory observations of hydrodynamic turbulence give a value of ~ 0.3 for $\delta$. Theoretical estimation by Beck gives a value of $\delta$ ~ 0.4. Figure 13 shows the corresponding results versus time scale for delays up to 20 days, since $q_+ \to 1$ after this. For positive returns, $\delta$ lies between 0.33 and 0.55, which is close to both the laboratory observations and the theoretical estimates of Beck for hydrodynamic turbulence. However, the situation is quite different for negative returns. The $(q_- - 1)$ values do not decrease monotonically with delay. The values remain high at all times and hence the perturbation result (16a) may not hold.
The high $q_-$ values are the result of much fatter tails on the negative return side than on the positive return side (figures 5 and 6). This might be an indication that during a heavy market sell-off, the reaction of agents is heavily dependent on what others are doing (herding behavior). This in turn indicates that the non-extensive character of the market in this case likely arises from long range interactions rather than from local temperature variations.
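The diffusion exponents discussed in this section are slopes on a log-log plot of the mean square deviation against time delay. A minimal Python sketch of that calculation is given below. It assumes the variance formula of equation (15), including the factor 1/2 that follows from each branch being normalized to 1/2, and an ordinary least-squares fit; the function names and the synthetic parameter values are illustrative and are not the estimates reported here.

```python
import math


def asymmetric_variance(q_minus, beta_minus, q_plus, beta_plus):
    """Equation (15); defined only when both q values are below 5/3."""
    if q_minus >= 5.0 / 3.0 or q_plus >= 5.0 / 3.0:
        raise ValueError("the variance is not defined for q >= 5/3")
    return 0.5 * (1.0 / (beta_minus * (5.0 - 3.0 * q_minus))
                  + 1.0 / (beta_plus * (5.0 - 3.0 * q_plus)))


def power_law_exponent(delays, variances):
    """Least-squares slope of ln(variance) against ln(delay)."""
    xs = [math.log(t) for t in delays]
    ys = [math.log(v) for v in variances]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    delays = [1, 2, 5, 10, 20, 40, 80]
    # Synthetic parameters: both beta values fall as t^-1.2, q values held fixed
    variances = [asymmetric_variance(1.4, 0.5 * t ** -1.2, 1.1, 0.6 * t ** -1.2)
                 for t in delays]
    print(power_law_exponent(delays, variances))
```

An exponent near 1 corresponds to normal diffusion and an exponent above 1 to superdiffusion. Delays at which either $q$ reaches 5/3 raise an error here, mirroring their omission from figure 12.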

5. Summary and Conclusions

Investigations of the behavior of long term financial market returns, over a period which includes both the dot-com bubble of 2000 and the crash of 2008, show that the distributions of the returns are non-Gaussian and fat-tailed for time delays ranging from 1 to 80 days. This behavior cannot be adequately described by Boltzmann-Gibbs statistics. One needs to consider non-extensive statistical mechanics such as Tsallis statistics and/or superstatistics.

Observations of the PDFs of market returns show that for small time scales, they can be well modelled by a symmetric Tsallis q-Gaussian distribution. However, as the time delay increases, the PDFs become increasingly asymmetric. In this work we generalize the Tsallis distribution to the asymmetric case by modelling the data distribution as a linear combination of two independent distributions, each of which is a half q-Gaussian, that separately describe the PDFs corresponding to negative and positive returns. The q-Gaussian parameters, namely the entropic index $q$ and the temperature $\beta$, are different for negative and positive returns and are estimated separately. These parameters are allowed to vary with time scale, since the asymmetry changes with time scale.

The goodness of fit tests (KS tests) show that the asymmetric model distributions fit the data distributions very well for all delays. The same tests performed on the corresponding symmetric model distributions show considerably higher errors.

The time scale behavior of the entropic indices $q_-$ and $q_+$ and the temperature parameters $\beta_-$ and $\beta_+$ shows that the asymmetry arises mainly from the quite different dynamical behaviors of the entropic indices for negative and positive market returns. In all cases, $q_+$ decreases monotonically with time delay, approaching an asymptotic value of 1 in a relatively short period (20 days). The equilibrium distribution in this case is a Gaussian distribution.
This behavior is consistent with the superstatistics model put forth by Beck. In contrast, the behavior of $q_-$ is very different. The value remains high at all times, even increasing slowly at higher delays. No stationary state is reached during the period of study. Heavy market sell-offs are usually due to strong interactions between agents (herding behavior). Hence, in this case, the non-extensivity of the system might be best described by Tsallis statistics.

Finally, the system shows a phase transition from a normal diffusive state during the dot-com bubble period to a superdiffusive state during the crash period. This is indicated by the power law behavior of the variance, with the exponent changing from ~1 to a value > 1.

The present investigations show the importance of including asymmetry for an adequate description of stock market return distributions. The results show that both the superstatistics and Tsallis statistics non-extensive models are needed to describe the complex financial market dynamics.

Acknowledgements

Many thanks to Sherman Page for a critical reading of the manuscript.

Appendix A: Asymmetric q-Gaussian distributions from Tsallis statistics

The Tsallis q-entropy is given by

$$S = \sum_i P(\Omega_i)\, \ln_q\!\left(1/P(\Omega_i)\right) \tag{A1}$$

where $P(\Omega_i)$ is the probability density function at sample $i$ of the variable $\Omega$ and the q-logarithm $\ln_q(x)$ is given by

$$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q} \tag{A2}$$

In order to consider asymmetry, we make $q$ dependent on the sign of $\Omega$. Using equations (10) and (A2), equation (A1) can be written as

$$S = \sum_{\Omega \le 0} P_- \ln_{q_-}\!\left(1/P_-\right) + \sum_{\Omega > 0} P_+ \ln_{q_+}\!\left(1/P_+\right) \tag{A3}$$

The parameters $q_-$ and $q_+$ are estimated by maximizing $S$ under suitable constraints. Assuming $\Omega$ has zero mean and considering the continuous case for the random variable $\Omega$,

$$S = \int_{-\infty}^{0} P_-(\Omega) \ln_{q_-}\!\left(1/P_-(\Omega)\right) d\Omega + \int_{0}^{\infty} P_+(\Omega) \ln_{q_+}\!\left(1/P_+(\Omega)\right) d\Omega$$

$$= \frac{1}{1 - q_-} \int_{-\infty}^{0} \left[P_-^{q_-} - P_-\right] d\Omega + \frac{1}{1 - q_+} \int_{0}^{\infty} \left[P_+^{q_+} - P_+\right] d\Omega$$

It is straightforward to show that maximizing $S$ with respect to $P_-$ and $P_+$ under the constraints

$$\int_{-\infty}^{0} P_-(\Omega)\, d\Omega = \int_{0}^{\infty} P_+(\Omega)\, d\Omega = 1/2 \tag{A4}$$

$$\int_{-\infty}^{0} \Omega^2 P_-^{q_-}(\Omega)\, d\Omega = \sigma_{q_-}^2 \tag{A5}$$

$$\int_{0}^{\infty} \Omega^2 P_+^{q_+}(\Omega)\, d\Omega = \sigma_{q_+}^2 \tag{A6}$$

gives

$$P_-(\Omega) = \frac{1}{Z_-}\left[1 + (q_- - 1)\beta_- \Omega^2\right]^{-1/(q_- - 1)}, \quad \Omega \le 0 \tag{A7}$$

$$P_+(\Omega) = \frac{1}{Z_+}\left[1 + (q_+ - 1)\beta_+ \Omega^2\right]^{-1/(q_+ - 1)}, \quad \Omega > 0 \tag{A8}$$

with

$$Z_- = C_{q_-}/\sqrt{\beta_-}, \qquad Z_+ = C_{q_+}/\sqrt{\beta_+}, \qquad C_q = \frac{\sqrt{\pi}\; \Gamma\!\left(\frac{1}{q-1} - \frac{1}{2}\right)}{\sqrt{q-1}\; \Gamma\!\left(\frac{1}{q-1}\right)}$$

Here $\beta_-$ and $\beta_+$ are the Lagrange multipliers of the constraints (A5) and (A6).

Appendix B: Time scale variation of q

In the superstatistics model, the entropies are quasi-additive, as given by equation (7)

$$S_{q'}(A+B) = S_q(A) + S_q(B) \tag{B1}$$

For asymmetric distributions, we denote

$$q' = \{q_-', q_+'\} \quad \text{and} \quad q = \{q_-, q_+\} \tag{B2}$$

In deriving the following equations, we follow a procedure similar to the one given by Beck [18]. For two independent sub-systems $A$ and $B$ with identical PDFs

$$S_q(A) + S_q(B) = 2S_q = 2\sum_{i-} \frac{P_{i-} - P_{i-}^{q_-}}{q_- - 1} + 2\sum_{i+} \frac{P_{i+} - P_{i+}^{q_+}}{q_+ - 1} \tag{B3}$$

Writing $P^q = P e^{(q-1)\ln P}$ and expanding the exponential to the order $(q-1)^2$, it is straightforward to show

$$S_q(A) + S_q(B) = \left[-2\langle \ln P_-\rangle - (q_- - 1)\langle (\ln P_-)^2\rangle\right] + \left[-2\langle \ln P_+\rangle - (q_+ - 1)\langle (\ln P_+)^2\rangle\right] \tag{B4}$$

Here the expectation values $\langle\,\cdot\,\rangle$ are taken with the corresponding distributions. Now let us consider the entropy of the joint system $A + B$.

The entropy in this case is given by

$$S_{q'}(A+B) = \sum_{k^-}\frac{\mathbb{P}_{k^-} - \mathbb{P}_{k^-}^{q'_-}}{q'_- - 1} + \sum_{k^+}\frac{\mathbb{P}_{k^+} - \mathbb{P}_{k^+}^{q'_+}}{q'_+ - 1} \tag{B5}$$

Here the joint probabilities are

$$\mathbb{P}_{k^-} = P_{i^- j^-} = P_{i^-} P_{j^-} \quad \text{and} \quad \mathbb{P}_{k^+} = P_{i^+ j^+} = P_{i^+} P_{j^+} \tag{B6}$$

where $i$ and $j$ are the indices of the subsystems and the summation over the joint probability index $k$ stands for summation over $i$ and $j$. Using (B5) and (B6), it is straightforward to show that

$$S_{q'}(A+B) = -2\langle \ln P_-\rangle - (q'_- - 1)\left[\langle \ln P_-\rangle^2 + \langle(\ln P_-)^2\rangle\right] - 2\langle \ln P_+\rangle - (q'_+ - 1)\left[\langle \ln P_+\rangle^2 + \langle(\ln P_+)^2\rangle\right] \tag{B7}$$

Here again, we have used $P^q = P e^{(q-1)\ln P}$ and expanded the exponential up to the order $(q-1)^2$. Equating (B4) and (B7) and noting that $P_-$ and $P_+$ are independent, we get

$$\frac{q'_- - 1}{q_- - 1} = \left[1 + \frac{\langle \ln P_-\rangle^2}{\langle(\ln P_-)^2\rangle}\right]^{-1} \tag{B8a}$$

$$\frac{q'_+ - 1}{q_+ - 1} = \left[1 + \frac{\langle \ln P_+\rangle^2}{\langle(\ln P_+)^2\rangle}\right]^{-1} \tag{B8b}$$

References

[1] Black F and Scholes M 1973 The pricing of options and corporate liabilities J. Polit. Econ. 637

[2] Fama E F 1965 The behavior of stock-market prices J. Bus. 34

[3] Bachelier L 1964 Theory of speculation The Random Character of Stock Market Prices ed P Cootner (Cambridge, MA: MIT Press)

[4] Sornette D 2003 Critical market crashes Phys. Rep.

[5] Mantegna R N and Stanley H E 2000 An Introduction to Econophysics: Correlations and Complexity in Finance (Cambridge: Cambridge University Press)

[6] Shannon C E 1948 A mathematical theory of communication Bell Syst. Tech. J. 379

[7] Jaynes E T 1983 Papers on Probability, Statistics and Statistical Physics (Dordrecht: Reidel)

[8] Mandelbrot B B 1997 Fractals and Scaling in Finance (New York: Springer)

[9] Tsallis C, Anteneodo C, Borland L and Osorio R 2003 Nonextensive statistical mechanics and economics Physica A 89

[10] Osorio R, Borland L and Tsallis C 2004 Distributions of high-frequency stock-market observables Nonextensive Entropy: Interdisciplinary Applications ed C Tsallis and M Gell-Mann (New York: Oxford University Press)

[11] Tsallis C 2000 Introduction to Nonextensive Statistical Mechanics (New York: Springer)

[12] Tsallis C and Bukman D J 1996 Anomalous diffusion in the presence of external forces: Exact time-dependent solutions and their thermostatistical basis Phys. Rev. E

[13] Zanette D H 1999 Statistical-thermodynamical foundations of anomalous diffusion Braz. J. Phys. 108

[14] Cortines A A G and Riera R 2007 Non-extensive behavior of a stock market index at microscopic time scales Physica A 181

[15] Michael F and Johnson M D 2003 Financial market dynamics Physica A 525

[16] Borland L 2002 A theory of non-Gaussian option pricing Quant. Finance 415

[17] Devi S 2017 Financial market dynamics: superdiffusive or not? J. Stat. Mech.

[18] Beck C 2002 Non-additivity of Tsallis entropies and fluctuations of temperature Europhys. Lett. 329

[19] Hastings N A J and Peacock J B 1974 Statistical Distributions (London: Butterworth)

[20] Beck C 2001 On the small-scale statistics of Lagrangian turbulence Phys. Lett. A 240

[21] Wilk G and Wlodarczyk Z 2000 Interpretation of the nonextensivity parameter q in some applications of Tsallis statistics and Lévy distributions Phys. Rev. Lett.

[22] Van der Straeten E and Beck C 2011 Skewed superstatistical distributions from a Langevin and Fokker-Planck approach Chin. Sci. Bull.

[23] Budini A 2015 Extended q-Gaussian and q-exponential distributions from gamma random variables Phys. Rev. E

[24] Beck C and Cohen E G D 2003 Superstatistics Physica A 267

[25] Van der Straeten E and Beck C 2009 Superstatistical fluctuations in time series: Applications to share-price dynamics and turbulence Phys. Rev. E

[26] Xu D and Beck C 2016 Transition from lognormal to chi-square superstatistics for financial time series Physica A 173

[27] Ausloos M and Ivanova K 2003 Dynamical model and nonextensive statistical mechanics of a market index on large time windows Phys. Rev. E

[28] Shalizi C R 2007 Maximum likelihood estimation for q-exponential (Tsallis) distributions (arXiv:math/0701854)

[29] Massey F J 1951 The Kolmogorov-Smirnov test for goodness of fit J. Am. Stat. Assoc.

[30] Thistleton W, Marsh J A, Nelson K and Tsallis C 2007 Generalized Box-Müller method for generating q-Gaussian random deviates IEEE Trans. Inf. Theory

[31] Chakravarti I M, Laha R G and Roy J 1967 Handbook of Methods of Applied Statistics Volume I (New York: Wiley)

[32] Tsallis C, Levy S V F, Souza A M C and Maynard R 1995 Statistical-mechanical foundation of the ubiquity of Lévy distributions in nature Phys. Rev. Lett.

Figures

Figure 1. Comparison of the distributions of standardized log returns with the Gaussian distributions (solid blue line) having the same mean and standard deviation as the data (black dots). (a) S&P 500 for 2 January 1994 – 31 December 2013 and (b) Nasdaq over the same period.

Figure 2. Skew of the PDF as a function of time scale (delay). Region 1 covers 14 December 1993 – 8 November 2005 and region 2 covers 9 November 2005 – 11 October 2017 for (a) S&P 500 and (b) Nasdaq.

Figure 3. S&P 500 and Nasdaq stock prices for 2 January 1990 – 31 December 2019. Region 1 covers 14 December 1993 – 8 November 2005 and region 2 covers 9 November 2005 – 11 October 2017. Blue – S&P 500. Red – Nasdaq.

Figure 4. 1 day standardized log returns for region 1 (14 December 1993 – 8 November 2005) and region 2 (9 November 2005 – 11 October 2017) for (a) S&P 500 and (b) Nasdaq.

Figure 5. Comparison of the estimated asymmetric q-Gaussian distributions with the data distributions for region 1 (14 December 1993 – 8 November 2005). The delays corresponding to the distributions are given on the right-hand side of the figure. The distributions for each delay are shifted by multiplying the corresponding PDF by the factors shown on the right-hand side next to the delays. (a) S&P 500 and (b) Nasdaq.

Figure 6. Same as figure 5 for region 2 (9 November 2005 – 11 October 2017).

Figure 7. Kolmogorov-Smirnov goodness of fit test for asymmetric q-Gaussian distributions. The maximum distances between the empirical and synthetic CDFs are shown as functions of delay for region 1 (14 December 1993 – 8 November 2005) and region 2 (9 November 2005 – 11 October 2017). The bold lines show the critical distances for a significance level of 0.05 (95% confidence). Blue: negative returns; red: positive returns. (a) S&P 500 and (b) Nasdaq.

Figure 8. S&P 500 70 day log return for region 2 (9 November 2005 – 11 October 2017).

Figure 9. Kolmogorov-Smirnov goodness of fit test for symmetric q-Gaussian distributions. The maximum distances between the empirical and synthetic CDFs are shown as functions of delay for region 1 (14 December 1993 – 8 November 2005) and region 2 (9 November 2005 – 11 October 2017). The bold lines show the critical distances for a significance level of 0.05 (95% confidence). Blue: negative returns; red: positive returns. (a) S&P 500 and (b) Nasdaq.

Figure 10. Comparison of $q_-$ (blue) and $q_+$ (red) variation vs. time delay for region 1 (14 December 1993 – 8 November 2005) and region 2 (9 November 2005 – 11 October 2017). Error bars on the estimates are also displayed. (a) S&P 500 and (b) Nasdaq.

Figure 11. Comparison of the estimated $\beta_-$ (blue) and $\beta_+$ (red) variation vs. time delay for region 1 (14 December 1993 – 8 November 2005) and region 2 (9 November 2005 – 11 October 2017). The error bars are also shown. The solid line is the linear fit. (a) S&P 500 and (b) Nasdaq.

Figure 12. The time scale behavior of the q-variance $\sigma_{asym}^2$. The q-variance is not defined for $q_- > 5/3$, and those points are not shown in the figure. The solid line is the linear fit to the log-log data. Region 1 (14 December 1993 – 8 November 2005) and region 2 (9 November 2005 – 11 October 2017). (a) S&P 500 and (b) Nasdaq.

Figure 13. $1/(q - 1)$