A note on causation versus correlation
arXiv preprint [physics.data-an]
X. San Liang ∗ Nanjing Institute of Meteorology, Nanjing, China
Xiuqun Yang
School of Atmospheric Sciences, Nanjing University, Nanjing, China (Dated: January 30, 2020)
Abstract
Recently, it has been shown that the causality and information flow between two time series can be inferred in a rigorous and quantitative sense, and, besides, the resulting causality can be normalized. A corollary that follows is, in the linear limit, causation implies correlation, while correlation does not imply causation. Now suppose there is an event $A$ taking a harmonic form (sine/cosine), and it generates through some process another event $B$ so that $B$ always lags $A$ by a phase of $\pi/2$. Here the causality is obviously seen, while by computation the correlation is, however, zero. This seeming contradiction is rooted in the fact that a harmonic system always leaves a single point on the Poincaré section; it does not add information. That is to say, though the absolute information flow from $A$ to $B$ is zero, i.e., $T_{A\to B} = 0$, the total information increase of $B$ is also zero, so the normalized $T_{A\to B}$, denoted as $\tau_{A\to B}$, takes the indeterminate form $0/0$. By slightly perturbing the system with some noise, solving a stochastic differential equation, and letting the perturbation go to zero, it can be shown that $\tau_{A\to B}$ approaches 100%, just as one would have expected.

PACS numbers: 05.45.-a, 89.70.+c, 89.75.-k, 02.50.-r
Keywords: Causality; Time series; Information flow; Correlation

∗ Electronic address: [email protected]

Consider a two-dimensional system $\mathbf{x} = (x_1, x_2)$,
$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) + \mathbf{B}(\mathbf{x}, t)\,\dot{\mathbf{w}}, \qquad (1)$$
where $\mathbf{F} = (F_1, F_2)$ may be arbitrary nonlinear functions of $\mathbf{x}$ and $t$, $\dot{\mathbf{w}}$ is a vector of white noise, and $\mathbf{B} = (b_{ij})$ is the matrix of perturbation amplitudes, which may also be any functions of $\mathbf{x}$ and $t$. Here we adopt the convention in physics and do not distinguish deterministic and random variables; in probability theory, they are usually distinguished with capital and lower-case symbols. Assume that $\mathbf{F}$ and $\mathbf{B}$ are both differentiable with respect to $\mathbf{x}$ and $t$. Then the information flow from $x_2$ to $x_1$ (in nats per unit time) can be explicitly found in a closed form:
$$T_{2\to1} = -E\left[\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right] + \frac12\, E\left[\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right], \qquad (2)$$
where $E$ stands for mathematical expectation, $g_{ii} = \sum_{k=1}^{n} b_{ik} b_{ik}$, and $\rho_i = \rho_i(x_i)$ is the marginal probability density function (pdf) of $x_i$. The rate of information flowing from $x_1$ to $x_2$ can be obtained by switching the indices. If $T_{j\to i} = 0$, then $x_j$ is not causal to $x_i$; otherwise it is causal, and the absolute value measures the magnitude of the causality from $x_j$ to $x_i$.
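In the linear case, $\mathbf{F} = \mathbf{A}\mathbf{x}$ with constant $\mathbf{B}$ and Gaussian statistics, (2) reduces to $T_{2\to1} = a_{12}\sigma_{12}/\sigma_{11}$, the form used in (5) below [2]. A minimal numerical sketch of this reduction follows; the particular matrices $\mathbf{A}$ and $\mathbf{B}$ are illustrative choices, not taken from this note:

```python
import numpy as np

# Illustrative stable linear system dx/dt = A x + B dw/dt (not the
# harmonic system of this note): x2 drives x1 through a12 = 0.8,
# while x1 does not enter the x2 equation (a21 = 0).
A = np.array([[-1.0,  0.8],
              [ 0.0, -1.0]])
G = np.eye(2)          # G = B B^T, here with B = I

# The stationary covariance solves the Lyapunov equation
#   A Sigma + Sigma A^T + G = 0,
# written as a 4x4 linear system via Kronecker products (row-major vec).
I = np.eye(2)
M = np.kron(A, I) + np.kron(I, A)
Sigma = np.linalg.solve(M, -G.flatten()).reshape(2, 2)

# Linear-case information flows: T_{2->1} = a12 s12 / s11, and vice versa
T21 = A[0, 1] * Sigma[0, 1] / Sigma[0, 0]   # nonzero: x2 is causal to x1
T12 = A[1, 0] * Sigma[0, 1] / Sigma[1, 1]   # zero: x1 is not causal to x2
print(Sigma)        # [[0.66, 0.2], [0.2, 0.5]]
print(T21, T12)     # ~0.242 nats per unit time, 0.0
```

Note the one-way coupling is recovered: the flow vanishes in exactly the direction where no dynamical influence exists, even though the correlation $\sigma_{12} \ne 0$ is symmetric between the two directions.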
For discrete-time mappings, the information flow takes a much more complicated form; see [1]. In the case with only two time series (no dynamical system given), $X_1$ and $X_2$, under the assumption of a linear model with additive noise, the maximum likelihood estimator (MLE) of the rate of information flowing from $X_2$ to $X_1$ is [2]
$$\hat T_{2\to1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2}, \qquad (3)$$
where $C_{ij}$ is the sample covariance between $X_i$ and $X_j$, and $C_{i,dj}$ the sample covariance between $X_i$ and a series derived from $X_j$ using the Euler forward differencing scheme: $\dot X_{j,n} = (X_{j,n+k} - X_{j,n})/(k\Delta t)$, with $k \ge 1$. For details, refer to [2].

Considering the long-standing debate ever since Berkeley (1710) [3] over correlation versus causation, we may rewrite (3) in terms of linear correlation coefficients, which immediately implies [2]: causation implies correlation, but correlation does not imply causation.

The above formalism has been validated with many benchmark systems (e.g., [1]), such as the baker transformation, the Hénon map, the Kaplan-Yorke map, the Rössler system, etc. It has also been successfully applied to the study of many real-world problems, such as those in financial economics (e.g., the "Seven Dwarfs vs. a Giant" problem [4]), earth system science (e.g., the Antarctic mass balance problem [5] and the global warming problem [6]), and neuroscience (e.g., the concussion problem [7]), to name but a few.

Now suppose we have a dynamic event $A$ which drives another event $B$. The former has a harmonic form, leading the latter by a phase of $90^\circ$. That is to say, the time series resulting from the two are in quadrature, so the correlation between them is zero. However, since $A$ causes $B$, the result is apparently in contradiction to the corollary that "causation implies correlation."

The problem can be more formally stated with the harmonic system:
$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) = \mathbf{A}\mathbf{x} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \qquad (4)$$
If the system is initialized with $x_1(0) = 1$, $x_2(0) = 0$, the solution is $x_1 = \cos t$, $x_2 = \sin t$. So the population covariance $\sigma_{12} = \int \cos t \sin t \, dt = 0$ (the integral is taken over one or many periods). This yields an information flow from $x_2$ to $x_1$:
$$T_{2\to1} = a_{12}\frac{\sigma_{12}}{\sigma_{11}} = 0. \qquad (5)$$

Fundamentally, the above problem arises from the fact that this is a deterministic system. In the Granger causality test, this case has been explicitly excluded, as in such a case the trajectories do not form appropriate ensembles in the sample space. A harmonic series occupies only one single point on a Poincaré section, so the total information does not accrue. If the total information does not change, the information flow to $x_1$ must also vanish. However, the vanishing information flow does not mean that there is no influence of $x_2$ on $x_1$. As we argued in Liang (2015), the so-obtained information flow must be normalized, just as covariance needs to be normalized into correlation, for one to assess the causal influence. Here, if the normalizer is zero, the problem becomes an indeterminate form of $0/0$, as in calculus. We may then approach it by taking the limit. Specifically, we may enlarge the sample space slightly, i.e., add some stochasticity to the system, and then take the limit by letting the stochastic perturbation amplitude go to zero.

By Liang (2015), the normalizer for $T_{2\to1}$ is
$$Z_{2\to1} = |T_{2\to1}| + \left|\frac{dH_1^*}{dt}\right| + \left|\frac{dH_1^{\rm noise}}{dt}\right|, \qquad (6)$$
where, on the right hand side, the second term is the contribution from $x_1$ itself, and the third term the contribution from noise. In Liang (2015), it has been shown that $dH_1^*/dt$ is a Lyapunov exponent-like, phase-space stretching rate, and $dH_1^{\rm noise}/dt$ a noise-to-signal ratio. In this problem, noise is not taken into account; but in reality, noise is ubiquitous.
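The singular situation above can be checked directly from data: sampling $x_1 = \cos t$ and $x_2 = \sin t$ over whole periods yields a vanishing sample correlation, and the estimator (3) likewise returns an absolute information flow of essentially zero. A minimal sketch (the sampling choices are illustrative):

```python
import numpy as np

# Sample the harmonic pair x1 = cos t, x2 = sin t over 50 full periods
N, P = 200_000, 50
t = np.linspace(0.0, 2 * np.pi * P, N, endpoint=False)
dt = t[1] - t[0]
x1, x2 = np.cos(t), np.sin(t)

# Series in quadrature: the sample correlation vanishes
r = np.corrcoef(x1, x2)[0, 1]

# MLE of the information flow x2 -> x1, Eq. (3), with k = 1
dx1 = (x1[1:] - x1[:-1]) / dt           # Euler forward difference of x1
y1, y2 = x1[:-1], x2[:-1]               # align with the derivative series
C = np.cov(np.vstack([y1, y2, dx1]))    # covariances among x1, x2, dx1/dt
C11, C22, C12 = C[0, 0], C[1, 1], C[0, 1]
C1d1, C2d1 = C[0, 2], C[1, 2]
T21 = (C11 * C12 * C2d1 - C12**2 * C1d1) / (C11**2 * C22 - C11 * C12**2)

print(r, T21)   # both ~ 0: zero correlation AND zero absolute flow
```

Both quantities are zero (up to discretization error), which is precisely the $0/0$ indeterminacy that motivates the noise perturbation below.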
We may hence view a deterministic system as a limit, or extreme case, as the amplitude of the stochastic perturbation goes to zero. For this case, we add to (4) a stochastic term:
$$\frac{d\mathbf{x}}{dt} = \mathbf{A}\mathbf{x} + \mathbf{B}\dot{\mathbf{w}}, \qquad (7)$$
where $\mathbf{w}$ is a vector of standard Wiener processes. For simplicity, let the perturbation amplitude $\mathbf{B}$ be a constant matrix. Further let $\mathbf{G} = \mathbf{B}\mathbf{B}^T$, with elements $g_{ij} = \sum_k b_{ik} b_{jk}$. Liang (2008) established that
$$\frac{dH_1^*}{dt} = a_{11} = 0, \qquad (8)$$
$$\frac{dH_1^{\rm noise}}{dt} = \frac12 \frac{g_{11}}{\sigma_{11}}. \qquad (9)$$
So in this case, the normalized flow from $x_2$ to $x_1$ is
$$\tau_{2\to1} = \frac{a_{12}\sigma_{12}/\sigma_{11}}{\left|a_{12}\sigma_{12}/\sigma_{11}\right| + 0 + \left|g_{11}/(2\sigma_{11})\right|} = \frac{-\sigma_{12}}{|\sigma_{12}| + g_{11}/2}. \qquad (10)$$
Now for the stochastic equation, the covariance matrix $\mathbf{\Sigma}$ evolves as
$$\frac{d\mathbf{\Sigma}}{dt} = \mathbf{A}\mathbf{\Sigma} + \mathbf{\Sigma}\mathbf{A}^T + \mathbf{B}\mathbf{B}^T = \mathbf{A}\mathbf{\Sigma} + \mathbf{\Sigma}\mathbf{A}^T + \mathbf{G}. \qquad (11)$$
Expanding, this is
$$\frac{d}{dt}\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} -\sigma_{21} & -\sigma_{22} \\ \sigma_{11} & \sigma_{12} \end{bmatrix} + \begin{bmatrix} -\sigma_{12} & \sigma_{11} \\ -\sigma_{22} & \sigma_{21} \end{bmatrix} + \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix}.$$
We hence obtain the following equation set:
$$\frac{d\sigma_{11}}{dt} = -2\sigma_{12} + g_{11}, \qquad \frac{d\sigma_{12}}{dt} = -\sigma_{22} + \sigma_{11} + g_{12}, \qquad \frac{d\sigma_{22}}{dt} = 2\sigma_{12} + g_{22}.$$
Solving, we get
$$\frac{d^2\sigma_{12}}{dt^2} = -4\sigma_{12} + (g_{11} - g_{22} + g_{12}).$$
So the solution is
$$\sigma_{12} = C_1 \cos 2t + C_2 \sin 2t + \frac12 (g_{11} - g_{22} + g_{12})\, t.$$
If $\sigma_{12}(0) = 0$ and $\dot\sigma_{12}(0) = 0$, then the integration constants $C_1 = C_2 = 0$. So
$$\tau_{2\to1} = \frac{-\sigma_{12}}{|\sigma_{12}| + g_{11}/2} = \frac{-1}{1 + \dfrac{g_{11}}{(g_{11} - g_{22} + g_{12})\, t}}.$$
Two cases are distinguished:

Case I: $g_{11} - g_{22} = {\rm const} \ne 0$. Then $\lim_{g_{11}\to 0} \tau_{2\to1} \to -1$.

Case II: $g_{11} - g_{22} = 0$. Then $\lim_{g\to 0} \tau_{2\to1} \to \dfrac{-1}{1 + 1/t}$. As $t$ goes to infinity, $\tau_{2\to1}$ also approaches $-1$.

If initially there exists some covariance, say, $\sigma_{12}(0) = c$, then $C_1 = c$, and hence
$$\tau_{2\to1} = \frac{-1}{1 + \dfrac{g_{11}}{2c \cos 2t + (g_{11} - g_{22} + g_{12})\, t}}.$$
In this case, as $g \to 0$, we always have $\tau_{2\to1} \to -1$. Either way, the relative information flow $\tau_{2\to1}$ approaches $-1$ in the limit of a deterministic system. This is indeed what we expect. So even for this extreme case, there is no contradiction at all for causal inference using information flow.

To summarize, a recent rigorously formulated causality analysis asserts that, in the linear limit, causation implies correlation, while correlation does not necessarily mean causation. In this short note, a case which seemingly violates the assertion is examined. In this case an event $A$ takes a harmonic form (sine/cosine), and generates through some process another event $B$ so that $B$ is always out of phase with $A$, i.e., lags $A$ by $90^\circ$. Obviously $A$ causes $B$, but by computation the correlation between $A$ and $B$ is zero. In this study we show that this is an extreme case, with only one point in the ensemble space, and hence the problem becomes singular. We re-examine the problem by enlarging the ensemble space slightly through adding some noise. A stochastic differential equation is then solved for the corresponding covariances, which allows us to obtain the information flows for the perturbed system. Then, as the noisy perturbation goes to zero, the normalized information flow rate from $A$ to $B$ is established to be 100%, just as one would have expected. So actually no contradiction exists.

One thing that merits mentioning is that, although here it seems that $A$ causes $B$, the normalized information flow rate from $B$ to $A$ is actually also 100%. That is to say, for such a harmonic system with a circular cause-effect relation, it is actually impossible to differentiate causality by simply assessing which takes place first; anyhow, taking a lead by $\pi/2$ is the same as taking a lag by $3\pi/2$. The moral is, for a process that is nonsequential (e.g., that in the nonsequential stochastic control systems), where circular cause and consequence coexist, it is essentially impossible to distinguish a delay from an advance.
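As a closing numerical note, the covariance evolution (11) is easy to integrate directly, and any closed-form solution can be checked against an exact invariant that follows from adding the first and third equations of the set above: $d(\sigma_{11} + \sigma_{22})/dt = g_{11} + g_{22}$, so that, starting from $\mathbf{\Sigma}(0) = 0$, ${\rm trace}\,\mathbf{\Sigma}(t) = (g_{11} + g_{22})\,t$. A minimal sketch (the particular $\mathbf{B}$, and hence $\mathbf{G}$, is an illustrative choice):

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])             # the harmonic system, Eq. (4)
eps = 0.01                              # small perturbation amplitude
B = np.sqrt(eps) * np.array([[1.0, 0.0],
                             [0.5, 1.0]])
G = B @ B.T                             # g11, g12, g22 (positive semi-definite)

# Forward-Euler integration of d(Sigma)/dt = A Sigma + Sigma A^T + G
dt, T = 1e-4, 10.0
Sigma = np.zeros((2, 2))
for _ in range(int(T / dt)):
    Sigma = Sigma + dt * (A @ Sigma + Sigma @ A.T + G)

# Invariant check: since A is skew-symmetric, trace(A Sigma + Sigma A^T) = 0,
# so trace Sigma(t) = (g11 + g22) t holds exactly, step by step.
print(np.trace(Sigma), (G[0, 0] + G[1, 1]) * T)   # both ~ 0.225
```

Because $\mathbf{A}$ here is skew-symmetric, the trace identity holds even for the discretized iteration, which makes it a convenient sanity check for the integration.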
Acknowledgments.
This study was partially supported by the National Science Foundation of China (NSFC) under Grant No. 41975064, and the 2015 Jiangsu Program for Innovation Research and Entrepreneurship Groups.

[1] X.S. Liang, Information flow and causality as rigorous notions ab initio. Phys. Rev. E 94, 052201 (2016).
[2] X.S. Liang, Unraveling the cause-effect relation between time series. Phys. Rev. E 90, 052150 (2014).
[3] G. Berkeley, A Treatise Concerning the Principles of Human Knowledge (1710).
[4] X.S. Liang, Normalizing the causality between time series. Phys. Rev. E 92, 022126 (2015).
[5] S. Vannitsem, Q. Dalaiden, H. Goosse, Testing for dynamical dependence: Application to the surface mass balance over Antarctica. Geophys. Res. Lett. (2019). DOI: 10.1029/2019GL084329.
[6] A. Stips and coauthors, On the causal structure between CO$_2$ and global temperature. Sci. Rep. 6, 21691 (2016). DOI: 10.1038/srep21691.
[7] D.T. Hristopulos, A. Babul, S. Babul, L.R. Brucar, N. Virji-Babul, Disrupted information flow in resting-state in adolescents with sports-related concussion. Front. Hum. Neurosci. 13, 419 (2019). DOI: 10.3389/fnhum.2019.00419.
[8] X.S. Liang, Information flow within stochastic dynamical systems. Phys. Rev. E 78, 031113 (2008).