Information-theoretic measures for non-linear causality detection: application to social media sentiment and cryptocurrency prices
Z. Keskin and T. Aste
Department of Computer Science & Centre for Blockchain Technologies, University College London, Gower Street, WC1E 6EA, London, United Kingdom
Department of Physics and Astronomy, University College London, Gower Street, WC1E 6EA, London, United Kingdom
(Dated: June 18, 2019)

Information transfer between time series is calculated using the asymmetric information-theoretic measure known as transfer entropy. Geweke's autoregressive formulation of Granger causality is used to find linear transfer entropy, and Schreiber's general, non-parametric, information-theoretic formulation is used to detect non-linear transfer entropy. We first validate these measures against synthetic data. Then we apply these measures to detect causality between social sentiment and cryptocurrency prices. We perform significance tests by comparing the information transfer against a null hypothesis, determined via shuffled time series, and calculate the Z-score. We also investigate different approaches for partitioning in nonparametric density estimation which can improve the significance of results. Using these techniques on sentiment and price data over a 48-month period to August 2018, for four major cryptocurrencies, namely bitcoin (BTC), ripple (XRP), litecoin (LTC) and ethereum (ETH), we detect significant information transfer, on hourly timescales, in directions of both sentiment to price and of price to sentiment. We report the scale of non-linear causality to be an order of magnitude greater than linear causality.
I. INTRODUCTION
Causality is a central concept in natural sciences, commonly understood to describe where some process, evolving in time, has some observable effect on a second process. However, the nature of this causative effect is challenging to describe and quantify with precision. There is a long history in determining whether some change truly causes another [1, 2], especially if the effect is not deterministic, and is observed only in aggregate. In this paper, we consider a statistical form of causality, which can be observed in co-dependent time series where a response in the dependent series is more likely to follow after some change in the driving series. The direction of information transfer is forced by requiring the cause to precede the effect. This concept was conceived first by Wiener in 1956 [3], and formalised by Granger in 1969 [4], who was subsequently awarded the Nobel memorial prize in economics for his work on the analysis of time series. In simplest terms, the so-called Granger causality describes by how much a response in the dependent series can be explained by a change in the first; or, more exactly, the extent to which a given series is better able to be predicted by considering the information provided by a prior sequence of another series. If this response scales as a linear multiple of the driving signal, this relationship is described as a linear coupling. If, instead, the response follows some other function of the signal, the relationship is non-linear.

In modern portfolio theory, investors commonly calculate correlations between asset types to construct portfolios aiming to maximise their return at a given level of risk [5]. In the search for excess returns, quantitative approaches are often exploited to detect predictive signals across time series. In the ideal case, from knowing the movement of one price, we can infer the movement of a second.
For an investor, it is sufficient to know that the first movement anticipates the second, and in this paper we explore the effectiveness of two promising techniques for detecting anticipatory signals between alternative data and cryptocurrency prices.

The concept of an entirely peer-to-peer digital currency managed via a distributed ledger was described and applied by Nakamoto in 2008 [6], who named the currency 'Bitcoin'. The proposal and subsequent implementation captured the attention of technologists, economists, libertarians and futurists, and spawned numerous adaptations utilising the blockchain technology [7], which have come to be known as cryptocurrencies. Trading in these cryptocurrencies has become widely available even to less sophisticated retail investors, and volumes have grown significantly as interest in the currencies has widened. The crypto market is characterised by high volatility which seems to reflect changes in the attitudes of investors. The usage of cryptocurrencies in the traditional economy remains limited, and it is reasonable to assume that prices are in part driven by speculative dynamics, separate to any utility as a medium of exchange or to any revenue-generating process. Therefore a similar, and more marked, predictive effect than that observed in equity markets should be observed between measures of social media market sentiment and cryptocurrency prices. We therefore hypothesise that investor sentiment on future prices may be expected to feed into short-term price movements via speculation. This paper tests this hypothesis.

The relationship between social media sentiment and price has been explored in the literature for traditional markets and, more recently, for crypto markets as well. For instance, Bollen et al. [8] showed that the mood of Twitter messages can be used as a proxy for market sentiment, and that this can show a linear relationship with price movements in US equities.
Zheludev & Aste [9] also performed sentiment analysis using Natural Language Processing (NLP) on Twitter data, to show sentiment is significantly coupled with price movements for a number of instruments issued by S&P 500 firms. Souza & Aste used Twitter messages to model market sentiment, and showed the non-linear predictive relationship may be greater than the linear one [10].

In the cryptocurrency market specifically, one of the authors of the present paper has recently applied information-theoretic techniques, together with approaches from network theory, to characterise the structure of the market as a complex system [11]. This provided evidence that the market forms a complex, causally-interrelated network linking prices and sentiments across multiple currencies.

Hypothesising, therefore, that cryptocurrency price depends on prior values of both price and market sentiment, a Granger causality test can detect the impact of past values of X_t on future values of Y_t [4]. This can be calculated using a vector auto-regressive (VAR) model, which describes the extent to which including past values of X, at some time-lag k, reduces the sum of squared residuals in the regression of Y, hence estimating the predictive effect of the social sentiment at time t − k on the price at time t.

The VAR approach performs a regression analysis which is limited to linear associations between variables. To investigate non-linear effects, we can adopt techniques developed in information theory. Many popular information-theoretic measures for comparing distributions, such as mutual information, are symmetric and so cannot model a directional information transfer from X to Y.
Therefore, to generalise Granger causality to the non-linear case, we adopt the measure formalised by Schreiber [12], known as transfer entropy, which is able to capture the size and also the direction of information transfer.

Transfer entropy arises from the formulation of conditional mutual information; when conditioning on past values of the variables, it quantifies the reduction in uncertainty provided by these past values in predicting the dependent variable. This presents a natural way to model statistical causality between variables in multivariate distributions. In the general formulation, transfer entropy is a model-free statistic, able to measure the time-directed transfer of information between stochastic variables, and therefore provides an asymmetric method to measure information transfer. As presented in this paper, transfer entropy appears naturally as a generalisation of Granger causality. In fact it has been shown that, for multivariate normally-distributed statistics, where the relationship is therefore linear, this is indeed the case; Granger causality and transfer entropy are equivalent [13].

Though developed relatively recently, information-theoretic methods have been used with success in research across disciplines, to detect information transfer where interventionist approaches are not possible. For example, in neuroscience, Vicente et al. [14] found transfer entropy to be a superior measure for detecting causality in electrophysiological communication than the autoregressive Granger causality formulation. In climatology, Liang derived from first principles a linear information flow measure, and used this to show that El Niño tends to stabilise the Indian Ocean Dipole [15]. This analysis also detected a causal effect in the other direction; the Indian Ocean Dipole was shown to amplify El Niño oscillations. The technique was used with further success by Stips et al.
[16] to confirm that recent CO2 emissions show a one-way causality towards global mean temperature anomalies, but that on paleoclimate timescales, this direction is reversed and temperatures drive CO2 levels. Finally, in finance, information transfer was measured between equities indices by Kwon & Yang, showing that the information transfer was greatest from the US, and greatest towards the APAC region [17]. In particular, the S&P 500 was shown to be the strongest driver of other stock indices. In an earlier and somewhat related work, Marschinski & Kantz [18] defined and used effective transfer entropy to quantify contagion in financial markets. Similarly, Tungsong et al. [19] developed upon the previous work by Diebold & Yilmaz [20] in quantifying spillover effects between financial markets, generalising the methodology and estimating the time evolution of interconnectedness between financial systems.

The rest of the paper is organised as follows. In Section II we provide a brief background on Granger causality (linear causality measure) and transfer entropy (non-linear causality measure). In Section III we describe details of the methodology adopted to quantify and validate linear and non-linear causality, and the techniques used to generate synthetic series of linear and non-linear causal coupling. Section IV demonstrates that the methodologies correctly detect causality in the linear and non-linear case when testing against synthetic data. Results for real data, concerning causality between cryptocurrency price and sentiment, are presented in Section V. Section VI reports conclusions and perspectives.

II. BACKGROUND
We calculate statistical causality between time series using two different approaches. The first assumes linearity and employs vector auto-regressive techniques to estimate the extent to which knowing the driving time series can help predict the dependent series. The second technique compares the difference in mutual information between the independent case and the joint case to describe the success of predicting the dependent series. When predictability is increased by considering the past values of the driving variable, statistical causality is observed.
A. Linear Causality
We model a time series as autoregressive by expressing its value Y_t at time t as a sum of the contributions over m distinct lagged series, using the linear equation:

    Y_t = \sum_{k=1}^{m} \beta_k^{(Y)} Y_{t-k} + \epsilon_t ,    (1)

where \beta_k^{(Y)} is a general coefficient term and \epsilon_t is the residual. Linear regression estimates the coefficient parameters \beta_k^{(Y)} which minimise the sum of squared residuals.

To detect whether the values of some second time series X anticipate the future values of Y, we can compare equation 1 with:

    Y_t = \sum_{k=1}^{m} \beta'^{(Y)}_k Y_{t-k} + \sum_{k=1}^{m} \beta'^{(X)}_k X_{t-k} + \epsilon'_t .    (2)

We determine that the distribution Y is Granger-caused by X if the residual in the second regression is significantly smaller than the residual in the first. When this holds, then there must be some information transfer from X to Y. Following Geweke [21], we can represent the information transfer by:

    TE_{X \to Y} = \frac{1}{2} \log \left( \frac{\mathrm{var}(\epsilon_t)}{\mathrm{var}(\epsilon'_t)} \right) ,    (3)

where we adopt the transfer entropy notation (TE), following the result from Barnett et al. [13] showing Granger causality to be equivalent to transfer entropy for multivariate normal distributions.

B. Non-Linear Causality
To detect non-linear causality, we apply an information-theoretic approach. Equation 3 measures the extent to which the additional information in the lagged variable reduces the variance in the model residuals. Transfer entropy extends this concept by considering the uncertainty, instead of the variance. Adopting Shannon's measure of information [22], we can express the uncertainty associated with the random variable X by:

    H(X) = - \sum_{x} p(x) \log p(x) ,    (4)

where H(X) is termed the Shannon entropy of the distribution, and p(x) represents the probability of X = x. This can be conditioned on a second variable to give the conditional entropy:

    H(Y|X) = H(X, Y) - H(X) .    (5)

Where two random variables share information, the mutual information is given by:

    I(X; Y) = H(Y) - H(Y|X) .    (6)

The entropy of Y conditioned on two variables is:

    H(Y|X, Z) = H(X, Y, Z) - H(X, Z) ,    (7)

and the conditional mutual information is therefore:

    I(X; Y|Z) = H(Y|Z) - H(Y|X, Z) .    (8)

Now, for each lag k, we can describe the information transfer from X_{t-k} to Y_t in terms of the following conditional mutual information:

    TE^{(k)}_{X \to Y} = I(Y_t; X_{t-k} | Y_{t-k}) = H(Y_t | Y_{t-k}) - H(Y_t | X_{t-k}, Y_{t-k}) .    (9)

This represents the resolution of uncertainty in predicting Y when considering the past values of both Y and X, compared with considering the past values of Y alone. Considering equations 5 and 7, we can therefore represent the transfer entropy for a single lag k, which is shown in equation 9, in terms of four separate joint entropy terms. Following equation 4, these may be estimated from the data using a nonparametric density estimation of the probability distributions. For multivariate normal statistics, equations 9 and 3 coincide [13].

III. METHODS
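Before detailing the estimation choices, the two measures defined in Section II (Eq. 3 and Eq. 9) can be sketched concretely. This is an illustrative sketch, not the authors' implementation: it assumes a single lag k, discretises with the quantile (equal-occupancy) bins adopted below, and all function names are ours; note the linear estimate below is in nats while the histogram estimate is in bits.

```python
import numpy as np

def linear_te(x, y, k=1):
    """Linear transfer entropy TE_{X->Y} at lag k (Eq. 3), from the residual
    variances of the restricted (Eq. 1) and full (Eq. 2) OLS regressions."""
    yt, ylag, xlag = y[k:], y[:-k], x[:-k]
    A1 = np.column_stack([np.ones_like(ylag), ylag])           # past of Y only
    r1 = yt - A1 @ np.linalg.lstsq(A1, yt, rcond=None)[0]
    A2 = np.column_stack([np.ones_like(ylag), ylag, xlag])     # past of Y and X
    r2 = yt - A2 @ np.linalg.lstsq(A2, yt, rcond=None)[0]
    return 0.5 * np.log(np.var(r1) / np.var(r2))

def quantile_labels(v, bins=6):
    """Discretise a series into equal-occupancy bins in its marginal."""
    edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
    return np.searchsorted(edges, v, side='right')

def joint_entropy(*labelled):
    """Shannon entropy (Eq. 4, in bits) of the joint label distribution."""
    _, counts = np.unique(np.column_stack(labelled), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def transfer_entropy(x, y, k=1, bins=6):
    """Non-linear TE_{X->Y} at lag k (Eq. 9) via four joint-entropy terms."""
    yt, ylag, xlag = (quantile_labels(v, bins) for v in (y[k:], y[:-k], x[:-k]))
    return (joint_entropy(yt, ylag) - joint_entropy(ylag)
            + joint_entropy(xlag, ylag) - joint_entropy(xlag, ylag, yt))
```

For multivariate normally-distributed data the two measures coincide [13] (up to the base of the logarithm); for non-linear couplings they can differ substantially, which is the effect studied in this paper.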
We calculate linear transfer entropy using ordinary least squares regression, by comparing the variance of the residuals in the joint vector space {Y_t, Y_{t-k}, X_{t-k}} against those in the independent vector space {Y_t, Y_{t-k}}, following equation 3.

To detect non-linear transfer entropy, we perform nonparametric density estimation to calculate the joint entropy terms in equations 5 and 7. The density is estimated using a multidimensional histogram approach, where the choice in partitioning of the vector space impacts the calculation of the transfer entropy. In this paper we adopt a partitioning approach which, to our knowledge, is new in entropy estimation, and which we demonstrate to be robust to varying the coarseness of the partition. Specifically, we use a quantile-based binning approach in the marginals, which results in bin edges in each dimension containing equal numbers of data points. To partition the sample space in this way, we calculate bin edges for each dimension independently, such that each bin contains roughly equal numbers of data points. These are used to construct multi-dimensional histograms for estimating the probability distribution. We observe that the quantile bins perform better than equal-sized bins, as large gradients in the probability distribution function can be captured without the introduction of additional information through refining the partition.

In estimating Shannon entropy, the coarseness of the partition directly impacts the numerical value, with finely-partitioned histograms returning larger entropy values over the same data, since more information is acquired about the distribution. This effect should cancel out in the calculation of transfer entropy; however, we observe instead that more bins generally result in larger transfer entropies for the same data, which amplifies both signal and noise.
We therefore adopt a parsimonious ap-proach in this paper, using a small number of bins com-patible with a sufficient resolution, to capture the infor-mation transfer. We tested granular partitions of 3 to 8classes per dimension, finding comparable results in eachcase. We report the results using histograms of 6 classesper dimension, a partition size which leads to good andmeaningful results for each of the currencies analysed.It is a feature of the nonparametric estimation of en-tropy that the absolute scale of the transfer entropy mea-sure has only limited meaning; to detect causality, a rel-ative position must be considered. A simple techniqueproposed by Marschinski & Kantz [18] is the EffectiveTransfer Entropy (ETE), derived by subtracting fromthe observed transfer entropy an average transfer entropyfigure calculated over independently-shuffled time series,which destroys the temporal order and hence any possiblecausality.We adopt a shuffling approach producing 50 null-hypothesis transfer entropy values from independentlyshuffled time-series over the same domain, containing nocausality. By calculating the mean and standard devia-tion of the shuffled transfer entropy figures, we estimatethe significance of a causal result as the distance betweenthe result and the average shuffled result, standardisingby the shuffled standard deviation: Z := TE − ¯TE shuffle σ shufle . (10)This corresponds to the degree to which the result liesin the right tail of the distribution of the zero-causalityshuffled samples, and hence how unlikely the result isdue to chance. Therefore the Z-score figure representsthe significance of the excess transfer entropy in the un-shuffled case. We compute the Z-score in Eq.10 for bothlinear and non-linear results.To justify the usage of these techniques in detect-ing causal relationships in practice, we first validatethe methodology using coupled time series of predefined causative relationships. A. Synthetic Geometric Brownian Motion
We validate the approach by generating synthetic data following a directionally coupled random walk. First, we generate a driving series, following a discrete Geometric Brownian Motion (GBM):

    X_{t+1} = (1 + \mu) X_t + \sigma X_t \eta_t ,    (11)

where \eta_t is a normally distributed random noise, \eta_t \sim N(0, 1), and \mu and \sigma are respectively drift and diffusion coefficients. Then we produce a dependent series Y_t, which is a linear combination of X and a second, independent GBM process X', the strength of the dependency being determined by some coupling constant \alpha, over some lag length k:

    Y_t = \alpha X_{t-k} + (1 - \alpha) X'_{t-k} .    (12)

B. Synthetic Coupled Logistic Map
We generate non-linear coupled time series using a coupled logistic map. This system can be represented in terms of two stationary difference equations; the independent series is defined by the difference equation given by the general update function f(X):

    f(X_t) = X_{t+1} = r X_t (1 - X_t) ,    (13)

where X_t is the value of X at time t, and r is a parameter which defines the dynamical state of the system. Following Hahs & Pethel [23], we take r = 4 so the function evolves chaotically. We then introduce a second map, which is dependent on the first, taking the form:

    Y_{t+1} = (1 - \alpha) r Y_t (1 - Y_t) + \alpha g(X_t) ,    (14)

where \alpha \in [0, 1] is the cross-similarity, or coupling strength, and g(x) is a coupling function which may be chosen to produce different dynamic effects. We follow the choice of Boba et al. [24] and Hahs & Pethel [23] in the coupling function:

    g(X_t) = (1 - \epsilon) f(X_t) + \epsilon f(f(X_t)) ,    (15)

where \epsilon \in [0, 1] is a second coupling parameter, describing the extent to which Y_{t+1} depends on f(f(X_t)). It should be noted that the logistic map, in contrast to Geometric Brownian Motion, is a deterministic, albeit chaotic, system, and that therefore f(f(X_t)) is equivalent to X_{t+2}. The extent of this anticipatory effect is driven by the selection of the \epsilon parameter. We follow Hahs & Pethel in selecting \epsilon = 0.4. Indeed, as \alpha increases, with large \epsilon, the direction of information transfer is less clear, as Y_t contains more information about the future values of X.

IV. VALIDATION WITH SYNTHETIC DATA
In order to validate the autoregressive and information-theoretic approaches to detecting causality, we apply these to the calculation of transfer entropy for synthetic data generated by both linear and non-linear coupled time series, of increasing coupling strength.
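The two synthetic systems used in this validation, the linearly coupled GBM pair of Eqs. 11 and 12 and the coupled logistic maps of Eqs. 13-15, can be generated as in the following sketch. Parameter defaults are illustrative, the convention is that α = 0 gives no coupling, and the function names are ours.

```python
import numpy as np

def coupled_gbm(n, alpha=0.4, k=1, mu=0.0, sigma=0.01, x0=100.0, seed=None):
    """Driving GBM X (Eq. 11) and dependent Y (Eq. 12): Y mixes lagged X
    with an independent GBM X', with coupling strength alpha at lag k."""
    rng = np.random.default_rng(seed)
    x, xp = np.empty(n), np.empty(n)
    x[0] = xp[0] = x0
    for t in range(n - 1):
        x[t + 1] = (1 + mu) * x[t] + sigma * x[t] * rng.standard_normal()
        xp[t + 1] = (1 + mu) * xp[t] + sigma * xp[t] * rng.standard_normal()
    y = np.full(n, x0)
    y[k:] = alpha * x[:-k] + (1 - alpha) * xp[:-k]   # Eq. 12: alpha = 0 -> no coupling
    return x, y

def coupled_logistic(n, alpha=0.4, eps=0.4, r=4.0, seed=None):
    """Chaotic driver X (Eq. 13) and dependent Y (Eq. 14) coupled through
    g(X) = (1 - eps) f(X) + eps f(f(X)) (Eq. 15)."""
    rng = np.random.default_rng(seed)
    f = lambda v: r * v * (1.0 - v)
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = rng.uniform(0.0, 1.0, size=2)
    for t in range(n - 1):
        g = (1 - eps) * f(x[t]) + eps * f(f(x[t]))        # Eq. 15
        x[t + 1] = f(x[t])                                # Eq. 13
        y[t + 1] = (1 - alpha) * f(y[t]) + alpha * g      # Eq. 14
    return x, y
```

Setting α = 1 (and, for the logistic system, ε = 0) reduces Y to a lagged copy of X, which is a convenient sanity check on the generators.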
A. Linear Process Causality Validation
We calculate the directional information transfer from the driving series to the dependent series, and in the reverse direction, using both autoregressive and information-theoretic approaches for the linearly-coupled system of GBM walks defined by equations 11 and 12. Figure 1 shows the results for coupling strengths from α = 0 to α = 0.5. For each coupling strength, a data set is simulated over 2500 time steps. Both techniques are applied to each data set to calculate the information transfer, in both directions, with the results from X → Y and from Y → X plotted on separate axes.

In the information-theoretic approach we calculate transfer entropy using histograms with quantile binning of 6 classes per dimension. We generate multiple synthetic coupled random walks, calculating transfer entropy and Z-scores for each realisation, and reporting the mean values. Quantile bins are generated independently for each realisation.

We observe that using finer-grained partitions, hence more bins, results in an increased estimate of transfer entropy for the same data. However, the choice of coarseness does not affect the final analysis in validating causality; equivalent results are observed when considering significance instead of just the numerical transfer entropy figure.

As can be observed from Fig. 1, the qualitative correspondence between both methods is clearly visible, and quantitatively the results are similar. Additionally, the one-way direction of information transfer is accurately detected, with large transfer entropy and Z-scores observed in the direction X → Y, and small values in the opposite direction.

B. Non-Linear Process Causality Validation
We calculate the directional information transfer from the driving series to the dependent series, and in the reverse direction, using both autoregressive and information-theoretic approaches for the non-linear coupled logistic map system from equations 13, 14 and 15. In the information-theoretic approach we calculate transfer entropy again using histograms with quantile binning of 6 classes per dimension, generating bins independently for each realisation.

Figure 2 shows the mean transfer entropy results for 2500 synthetic data points. We observe that, for this system, the linear method is incapable of detecting causality; it finds no significant information transfer, fails to represent the expected exposure-response relationship, and also suggests a slight causality in the reverse direction. The information-theoretic method, by contrast, produces results which better represent the increasing coupling-strength relationship, and the direction of causality in the system. However, this technique also detects causality from Y to X, for large values of α, and the effect is greater than in the linear case. We explain this with reference to the coupling function g(x), which involves repeated application of the update function f(x); from equation 13 we see that f(f(X_t)) is equivalent to X_{t+2} so, for large α, Y_t will contain increasing amounts of the future information of X_t. In fact, at large coupling strengths approaching α = 1, the observed transfer entropy from X to Y begins to decrease, as more information exists in Y about its future evolution.

The results of these validation experiments suggest that the information-theoretic approach is superior in detecting causal signals, being model-free and so able to detect relationships of more complex, non-linear modes.

C. Decay of Causal Signals with Lag Length
As a final validation exercise, we explore the performance of the methods in detecting signals in coupled time series when the lag of the relationship is unknown. In general, it is expected that causal links should be strongest at time-lags closest to the true signal lag, and gradually decay as the time-lag considered is increased. However, the complexity of causative relationships, particularly where any feedback exists between the time series, suggests that there could also be multi-modal causalities, operating at different lags.

We use the coupled GBM system defined in equations 11 and 12 to create a coupling of a fixed lag L = 6, and then perform both autoregressive and information-theoretic analysis to detect the transfer entropy at time-lags from k = 1 up to k = 35. The information-theoretic approach is again applied using histograms partitioned into 6 classes per dimension. The results are shown in Fig. 3.

We observe two interesting features. First, a surprising anticipation of the peak is seen at lags k shorter than the true lag L of the causal relationship. Secondly, a clear peak is seen at the expected lag, which decays slowly and incompletely. We explain this by the comparison to the transfer entropy observed in the decoupled case with α = 0. In the limit of increasing time-lag k, the information-theoretic approach detects a causality even when there is no coupling in the data; we note that the Effective Transfer Entropy measure could perform better in such cases, where subtracting the average zero-causality transfer entropy would give a better estimate of the true information transfer [18]. Importantly, both techniques show a clear peak at the true causal time-lag, with the autoregressive technique displaying considerably greater significance, albeit this is also observed even at spurious lags. It is possible that the observed trend of increasing causality at long lags is due to the way in which data points are excluded for increased lags; for k = 35, for example, we discard 35 data points from the set in the calculation of transfer entropy.

FIG. 1: Demonstration that both linear and non-linear transfer entropy methods detect causality for linearly coupled synthetic data. The plots are calculated over 2500 data points of the synthetic random walk process from equations 11 and 12. Non-linear transfer entropy is calculated using a quantile histogram of 6 classes per dimension. The Z-score of each result is also plotted for both methods. We observe a small but non-zero baseline transfer entropy in the non-causal direction Y → X, which explains the systematic over-estimation of transfer entropy calculated in the direction X → Y. The size of this over-estimation increases with the number of histogram bins. (Panels: transfer entropy, in bits, and significance, as Z-score, against coupling strength, in each direction X → Y and Y → X, for both non-linear and linear TE.)

FIG. 2: Demonstration that the non-linear causal relationship in synthetic data generated from equations 14 and 15 is detected only by the non-linear method. The plots are calculated over 2500 data points of the synthetic coupled logistic map process, with ε = 0.4. Non-linear transfer entropy is calculated using a quantile histogram of 6 classes per dimension. The Z-score for each result is also plotted for both methods. We note that the transfer entropy from X → Y decreases as α approaches 1. (Panels: transfer entropy, in bits, and significance, as Z-score, against coupling strength α, in each direction X → Y and Y → X, for both non-linear and linear TE.)

FIG. 3: Demonstration that both methods identify the true lag L = 6 with maximal transfer entropy. Non-linear transfer entropy is calculated using a quantile-binned histogram, of 6 classes per dimension, over 2500 points. The Z-score for each result is also plotted for both methods. We observe a non-zero transfer entropy in the non-causal case α = 0, which grows with time-lag k. This might explain the systematic over-estimation of transfer entropy calculated in the direction X → Y. The size of this over-estimation increases with the number of histogram bins. (Panels: transfer entropy, in bits, and significance, as Z-score, against time-lag k, for α = 0 and α = 0.8, for both non-linear and linear TE.)

V. RESULTS WITH REAL DATA
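The preprocessing used throughout this section, log-differencing of hourly values and backward-looking rolling windows, can be sketched as follows. This is an illustrative sketch: the helper names and the sample-count window convention are ours.

```python
import numpy as np

def log_diff(series):
    """Difference of logarithms between consecutive observations,
    applied to both the price and the sentiment series to reduce
    non-stationarity: r_t = log(v_t) - log(v_{t-1})."""
    v = np.asarray(series, dtype=float)
    return np.log(v[1:]) - np.log(v[:-1])

def rolling_windows(n, window, stride):
    """Start/end index pairs of rolling windows of `window` samples,
    advanced by `stride` samples at a time."""
    return [(s, s + window) for s in range(0, n - window + 1, stride)]
```

For hourly data, a 24-month window corresponds to roughly 24 × 730 = 17520 samples, and a two-week stride to 14 × 24 = 336 samples; the transfer entropy and Z-score are then computed independently on each window slice.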
Having confirmed that the information-theoretic approach is able to detect both linear and non-linear signals, we apply the technique to investigate the effect of social media sentiment on cryptocurrency prices. We also apply the linear method to compare whether linear or non-linear dynamics dominate any causal relationship.

We estimate information transfer over 24-month windows, rolling forward with a stride of two weeks from the earliest market data available to September 2018. Price is taken as the combined close price, on the hour, over an aggregation of exchanges (see appendix A 2). Social sentiment is estimated from NLP analysis of Twitter tweets and StockTwits messages during the preceding hour; we quantify this sentiment as the sum of positive messages in the previous hour. In early periods of the data, some hours infrequently have no messages; in these cases we forward-fill from the previous hour, making the assumption that sentiment does not drop to neutral in these cases. To handle non-stationarity in the data, we take the difference between the logarithms of the values at times t and t − 1. This differencing is applied to both time series.

The choice of timescale in aggregating raw sentiment data involves a trade-off; with too fine a timescale, there are not enough messages to estimate sentiment, but too long a timescale cannot capture the dynamics of the time series. We hypothesise that causal signals between sentiment and price operate at sub-hourly timescales; hourly aggregation is the smallest time period available in the data, and so this aggregation of sentiment is used.

The transfer entropy is calculated over multiple backward-looking 24-month windows, which are passed over all available data with a two-week stride. For the information-theoretic approach, it is observed that performing the analysis with histograms of equal-width bins gives different results depending on the number of bins selected. Specifically, partitioning the axes of the sample space into odd numbers of bins produces no significant result over this data, suggesting the information is captured mostly from the middle peak of the distribution. However, we note that the use of quantile binning avoids this issue, finding both odd-numbered and even-numbered bin counts to provide similar results, suggesting a key benefit in using quantile bins for the calculation of transfer entropy. Accordingly, in this analysis we partition the sample space into quantile bins, using six classes per dimension, having validated this choice in Section IV.

The histogram bins for the non-linear approach are calculated once, using the full data set for each currency, and then they are applied across all windows. In selecting an appropriate partition, further bias is inevitably introduced. By calculating appropriate bins for each window, the results cannot be directly compared between windows.
However, the growth in message volumes over time means that selecting bins sized to capture the full spread of values also introduces a bias, since such bins are better suited to the later months than the earlier months. Additionally, since the granularity of the histogram partition also impacts the transfer entropy value, we perform significance tests over each window independently, calculating the Z-scores and comparing these across windows and currencies to report any causality.

We report the windows with greatest significance using a time-lag of k = 1 hour. Performing the analysis with longer time-lags shows weaker causal signals over this data. This provides evidence in support of the hypothesis that the true causal dynamics operate at sub-hourly timescales.

We report results for linear and non-linear transfer entropy, calculated using multidimensional histograms with 6 quantile classes per dimension. The transfer entropy figure and Z-score are calculated independently for each 24-month window; bins are generated once for each currency, over the whole dataset, and used for every window. This selection produces the clearest detection of a causal relationship between sentiment and price.

Plots showing the information transfer for the four cryptocurrencies investigated are reported in Fig. 4, Fig. 5, Fig. 6 and Fig. 7.

For BTC, in Fig. 4, we detect a strong causative signal of roughly similar scale in both directions, from sentiment to BTC price and in the reverse direction.

LTC, in Fig. 5, shows a similar pattern to BTC, although it is less equivocal in the direction of information transfer, with significance in the direction of sentiment to price consistently appearing greater than in the reverse direction. We note the Z-scores reveal greater overall significance compared to the other currencies.

XRP, in Fig. 6, shows a clear non-linear causality from sentiment to price and also in the opposite direction. However, the signal is more significant from sentiment to price, especially in the periods ending in 2018.

ETH, in Fig. 7, shows an interesting and unique behaviour. In particular, there appears to be, initially, a significant signal which collapses in both directions in the windows ending around January 2018. This strongly suggests another driving mechanism, the effect of which first becomes present around January 2016 (due to the 24-month windows). This effect is likely to be associated with the rapid price movements at the time.

VI. CONCLUSION
Information-theoretic and autoregressive techniques were developed and validated on coupled random walks and chaotic logistic maps, confirming the ability of both techniques to detect linear information transfer, and of the information-theoretic technique to detect non-linear information transfer. Following validation, the techniques were applied to historical data describing social media sentiment and cryptocurrency prices, to detect information transfer between sentiment and price movements.

The information-theoretic investigation detected a significant non-linear causal relationship in BTC, LTC and XRP, over multiple timescales and in both directions: sentiment to price and price to sentiment. The effect was strongest and most consistent for BTC and LTC. Given the hypothesis that low barriers to entry and unsophisticated investor speculation are key drivers of price movements, and that these represent the most widely known and traded cryptocurrencies, the fact that causality is detected most clearly for these currencies corresponds to expectations. We observe that the direction of information transfer is stronger from sentiment to price for all currencies except BTC, for which the causal signal is slightly stronger in the direction from price to sentiment.

The significance tests confirm the existence of causally coupled relationships, though their strength is challenging to quantify accurately from the data, especially for the sake of comparison between different time series, and between the linear and non-linear results over the same data. However, the significance values themselves offer a means of quantifying the strength of causality, which may be used as a proxy when using transfer entropy as a tool for detecting statistical causality.
With this work we demonstrate that the dynamics of the causative relationship are non-linear, as the autoregressive technique observed at most very limited causality in either direction, for any of the currencies.

Let us point out that there is a risk in assuming ergodicity in the results; we have shown the level of causation in-sample, but there is no fundamental reason that this should continue out-of-sample. Up to this point, research into information transfer has been restricted to backwards-looking statistical analyses, overlooking any analysis of the forward evolution of causal relationships with time.
VII. ACKNOWLEDGMENTS
The authors thank Thársis Souza for advising on the method of testing for linear Granger causality, along with Yuqing Long, whose data collation and wrangling was a great help. Finally, a great debt of thanks is owed to PsychSignal for providing their market sentiment data for this academic study. TA acknowledges support from ESRC (ES/K002309/1), EPSRC (EP/P031730/1) and EC (H2020-ICT-2018-20

[Figure 4: four panels showing Transfer Entropy (bits) and Significance (Z-score), from Sentiment → Price and from Price → Sentiment, comparing non-linear and linear TE.]

FIG. 4: Evidence that BTC sentiment and price are causally coupled in both directions in a non-linear way. Non-linear TE is calculated by multidimensional histograms with 6 quantile bins per dimension. Z-scores, calculated over 50 shuffles, show a high level of significance, especially during 2017 and 2018, in both directions.

[Figure 5: four panels showing Transfer Entropy (bits) and Significance (Z-score), from Sentiment → Price and from Price → Sentiment, comparing non-linear and linear TE.]

FIG. 5: Evidence that LTC price and sentiment are causally coupled in both directions in a non-linear way, with sentiment having a larger influence on price than the other way round. Non-linear TE is calculated by multidimensional histograms with 6 quantile bins per dimension. Z-scores, calculated over 50 shuffles, show a small but clear significant signal, in both directions, with the net information transfer generally operating in the direction of sentiment to price.
[Figure 6: four panels showing Transfer Entropy (bits) and Significance (Z-score), from Sentiment → Price and from Price → Sentiment, comparing non-linear and linear TE.]

FIG. 6: Evidence that XRP price and sentiment are causally coupled in both directions in a non-linear way, with the prevailing direction of information transfer flowing from sentiment to price in the first period, and from price to sentiment in the second. Non-linear TE is calculated by multidimensional histograms with 6 quantile bins per dimension. Z-scores, calculated over 50 shuffles, show a small but clear significant signal, in both directions, which decays rapidly towards January 2018 and does not recover afterward.
[Figure 7: four panels showing Transfer Entropy (bits) and Significance (Z-score), from Sentiment → Price and from Price → Sentiment, comparing non-linear and linear TE.]

FIG. 7: Evidence that ETH price and sentiment are causally coupled in both directions in a non-linear way. Overall this coupling is of lower significance compared to the other currencies investigated. Non-linear TE is calculated by multidimensional histograms with 6 quantile bins per dimension. Z-scores, calculated over 50 shuffles, indicate some significance, followed by low significance after the collapse in signal strength beginning around January 2016.
Appendix A: Appendix

1. Source Code

All analysis for this paper was performed using a Python package (PyCausality) created during the lead author's MSc. This is maintained on the author's public GitHub profile, at https://github.com/ZacKeskin/PyCausality. The latest release can be installed from PyPI using pip.

Ongoing maintenance and pre-release development of the package will be made available through this repository, and contributors may fork the code and submit pull requests to develop it further.
2. Data
The social sentiment data was provided courtesy of PsychSignal, and may be made available pending request to the authors. The data takes the form of the number of positive messages and the number of negative messages, publicly shared on either Twitter or StockTwits, associated each hour with the cryptocurrencies in question. The association is detected via the use of a 'hashtag' (or 'cashtag'). Price data was retrieved via https://min-api.cryptocompare.com/.

[1] D. Hume, A Treatise of Human Nature (1738).
[2] J. Pearl, Causality (New York, NY: Cambridge University Press, 2009).
[3] N. Wiener, The theory of prediction, Modern mathematics for the engineer (McGraw-Hill, New York City, NY, 1956).
[4] C. W. Granger, Econometrica: Journal of the Econometric Society pp. 424–438 (1969).
[5] H. Markowitz, The Journal of Finance, 77 (1952).
[6] S. Nakamoto (2008).
[7] T. Aste, P. Tasca, and T. Di Matteo, Computer, 18 (2017).
[8] J. Bollen, H. Mao, and X. Zeng, Journal of Computational Science, 1 (2011).
[9] I. Zheludev, R. Smith, and T. Aste, Scientific Reports, 4213 (2014).
[10] T. T. P. Souza and T. Aste, arXiv preprint arXiv:1601.04535 (2016).
[11] T. Aste, Digital Finance (2019), ISSN 2524-6186, URL https://doi.org/10.1007/s42521-019-00008-9.
[12] T. Schreiber, Physical Review Letters, 461 (2000).
[13] L. Barnett, A. B. Barrett, and A. K. Seth, Physical Review Letters, 238701 (2009).
[14] R. Vicente, M. Wibral, M. Lindner, and G. Pipa, Journal of Computational Neuroscience, 45 (2011).
[15] X. San Liang, Physical Review E, 052150 (2014).
[16] A. Stips, D. Macias, C. Coughlan, E. Garcia-Gorriz, and X. San Liang, Scientific Reports, 21691 (2016).
[17] O. Kwon and J.-S. Yang, EPL (Europhysics Letters), 68003 (2008).
[18] R. Marschinski and H. Kantz, The European Physical Journal B - Condensed Matter and Complex Systems, 275 (2002).
[19] S. Tungsong, F. Caccioli, and T. Aste, The Journal of Network Theory in Finance, 1 (2018).
[20] F. X. Diebold and K. Yilmaz, The Economic Journal, 158 (2009).
[21] J. F. Geweke, Journal of the American Statistical Association, 907 (1984).
[22] C. E. Shannon, The Bell System Technical Journal, 379 (1948), ISSN 0005-8580.
[23] D. W. Hahs and S. D. Pethel, Physical Review Letters, 128701 (2011).
[24] P. Boba, D. Bollmann, D. Schoepe, N. Wester, J. Wiesel, and K. Hamacher, Frontiers in Physics, 10 (2015).