A note on causation versus correlation
arXiv preprint [physics.data-an]
X. San Liang ∗ Nanjing Institute of Meteorology, Nanjing, China
Xiuqun Yang
School of Atmospheric Sciences, Nanjing University, Nanjing, China (Dated: January 30, 2020)
Abstract
Recently, it has been shown that the causality and information flow between two time series can be inferred in a rigorous and quantitative sense, and, besides, the resulting causality can be normalized. A corollary that follows is, in the linear limit, causation implies correlation, while correlation does not imply causation. Now suppose there is an event $A$ taking a harmonic form (sine/cosine), and it generates through some process another event $B$ so that $B$ always lags $A$ by a phase of $\pi/2$. Here the causality is obviously seen, while by computation the correlation is, however, zero. This seeming contradiction is rooted in the fact that a harmonic system always leaves a single point on the Poincaré section; it does not add information. That is to say, though the absolute information flow from $A$ to $B$ is zero, i.e., $T_{A\to B} = 0$, the total information increase of $B$ is also zero, so the normalized $T_{A\to B}$, denoted as $\tau_{A\to B}$, takes the indeterminate form $0/0$. By slightly perturbing the system with some noise, solving a stochastic differential equation, and letting the perturbation go to zero, it can be shown that $\tau_{A\to B}$ approaches 100%, just as one would have expected.

PACS numbers: 05.45.-a, 89.70.+c, 89.75.-k, 02.50.-r
Keywords: Causality; Time series; Information flow; Correlation

∗ Electronic address: [email protected]

Consider a two-dimensional system $\mathbf{x} = (x_1, x_2)$,
$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) + \mathbf{B}(\mathbf{x}, t)\,\dot{\mathbf{w}}, \qquad (1)$$
where $\mathbf{F} = (F_1, F_2)$ may be arbitrary nonlinear functions of $\mathbf{x}$ and $t$, $\dot{\mathbf{w}}$ is a vector of white noise, and $\mathbf{B} = (b_{ij})$ is the matrix of perturbation amplitudes, which may also be any functions of $\mathbf{x}$ and $t$. Here we adopt the convention in physics and do not distinguish deterministic and random variables; in probability theory, they are usually distinguished with capital and lower-case symbols. Assume that $\mathbf{F}$ and $\mathbf{B}$ are both differentiable with respect to $\mathbf{x}$ and $t$. Then the information flow from $x_2$ to $x_1$ (in nats per unit time) can be explicitly found in a closed form:
$$T_{2\to1} = -E\left[\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right] + \frac12\, E\left[\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right], \qquad (2)$$
where $E$ stands for mathematical expectation, $g_{ii} = \sum_{k=1}^{n} b_{ik} b_{ik}$, and $\rho_i = \rho_i(x_i)$ is the marginal probability density function (pdf) of $x_i$. The rate of information flowing from $x_1$ to $x_2$ can be obtained by switching the indices. If $T_{j\to i} = 0$, then $x_j$ is not causal to $x_i$; otherwise it is causal, and the absolute value measures the magnitude of the causality from $x_j$ to $x_i$.
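In the linear case, $\mathbf{F} = \mathbf{A}\mathbf{x}$ with constant $\mathbf{B}$ and Gaussian statistics, (2) reduces to $T_{2\to1} = a_{12}\sigma_{12}/\sigma_{11}$, the form used in (5) below [2]. A minimal numerical sketch of this reduction follows; the particular matrices $\mathbf{A}$ and $\mathbf{B}$ are illustrative choices, not taken from this note:

```python
import numpy as np

# Illustrative stable linear system dx/dt = A x + B dw/dt (not the
# harmonic system of this note): x2 drives x1 through a12 = 0.8,
# while x1 does not enter the x2 equation (a21 = 0).
A = np.array([[-1.0,  0.8],
              [ 0.0, -1.0]])
G = np.eye(2)          # G = B B^T, here with B = I

# The stationary covariance solves the Lyapunov equation
#   A Sigma + Sigma A^T + G = 0,
# written as a 4x4 linear system via Kronecker products (row-major vec).
I = np.eye(2)
M = np.kron(A, I) + np.kron(I, A)
Sigma = np.linalg.solve(M, -G.flatten()).reshape(2, 2)

# Linear-case information flows: T_{2->1} = a12 s12 / s11, and vice versa
T21 = A[0, 1] * Sigma[0, 1] / Sigma[0, 0]   # nonzero: x2 is causal to x1
T12 = A[1, 0] * Sigma[0, 1] / Sigma[1, 1]   # zero: x1 is not causal to x2
print(Sigma)        # [[0.66, 0.2], [0.2, 0.5]]
print(T21, T12)     # ~0.242 nats per unit time, 0.0
```

Note the one-way coupling is recovered: the flow vanishes in exactly the direction where no dynamical influence exists, even though the correlation $\sigma_{12} \ne 0$ is symmetric between the two directions.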
For discrete-time mappings, the information flow takes a much more complicated form; see [1]. In the case with only two time series (no dynamical system given), $X_1$ and $X_2$, under the assumption of a linear model with additive noise, the maximum likelihood estimator (MLE) of the rate of information flowing from $X_2$ to $X_1$ is [2]
$$\hat T_{2\to1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2}, \qquad (3)$$
where $C_{ij}$ is the sample covariance between $X_i$ and $X_j$, and $C_{i,dj}$ the sample covariance between $X_i$ and a series derived from $X_j$ using the Euler forward differencing scheme: $\dot X_{j,n} = (X_{j,n+k} - X_{j,n})/(k\Delta t)$, with $k \ge 1$. For details, refer to [2].

Considering the long-standing debate ever since Berkeley (1710) [3] over correlation versus causation, we may rewrite (3) in terms of linear correlation coefficients, which immediately implies [2]: causation implies correlation, but correlation does not imply causation.

The above formalism has been validated with many benchmark systems (e.g., [1]), such as the baker transformation, the Hénon map, the Kaplan-Yorke map, the Rössler system, etc. It has also been successfully applied to the study of many real-world problems, such as those in financial economics (e.g., the "Seven Dwarfs vs. a Giant" problem [4]), earth system science (e.g., the Antarctic mass balance problem [5] and the global warming problem [6]), and neuroscience (e.g., the concussion problem [7]), to name but a few.

Now suppose we have a dynamic event $A$ which drives another event $B$. The former has a harmonic form, leading the latter by a phase of $90^\circ$. That is to say, the time series resulting from the two are in quadrature, so the correlation between them is zero. However, since $A$ causes $B$, the result is apparently in contradiction to the corollary that "causation implies correlation."

The problem can be more formally stated with the harmonic system:
$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) = \mathbf{A}\mathbf{x} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \qquad (4)$$
If the system is initialized with $x_1(0) = 1$, $x_2(0) = 0$, the solution is $x_1 = \cos t$, $x_2 = \sin t$. So the population covariance $\sigma_{12} = \int \cos t \sin t \, dt = 0$ (the integral is taken over one or many periods). This yields an information flow from $x_2$ to $x_1$:
$$T_{2\to1} = a_{12}\frac{\sigma_{12}}{\sigma_{11}} = 0. \qquad (5)$$

Fundamentally, the above problem arises from the fact that this is a deterministic system. In the Granger causality test, this case has been explicitly excluded, as in such a case the trajectories do not form appropriate ensembles in the sample space. A harmonic series occupies only one single point on a Poincaré section, so the total information does not accrue. If the total information does not change, the information flow to $x_1$ must also vanish. However, the vanishing information flow does not mean that there is no influence of $x_2$ on $x_1$. As we argued in Liang (2015), the so-obtained information flow must be normalized, just as covariance needs to be normalized into correlation, for one to assess the causal influence. Here, if the normalizer is zero, the problem becomes an indeterminate form of $0/0$, as in calculus. We may then approach it by taking the limit. Specifically, we may enlarge the sample space slightly, i.e., add some stochasticity to the system, and then take the limit by letting the stochastic perturbation amplitude go to zero.

By Liang (2015), the normalizer for $T_{2\to1}$ is
$$Z_{2\to1} = |T_{2\to1}| + \left|\frac{dH_1^*}{dt}\right| + \left|\frac{dH_1^{\rm noise}}{dt}\right|, \qquad (6)$$
where, on the right hand side, the second term is the contribution from $x_1$ itself, and the third term the contribution from noise. In Liang (2015), it has been shown that $dH_1^*/dt$ is a Lyapunov exponent-like, phase-space stretching rate, and $dH_1^{\rm noise}/dt$ a noise-to-signal ratio. In this problem, noise is not taken into account; but in reality, noise is ubiquitous.
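The singular situation above can be checked directly from data: sampling $x_1 = \cos t$ and $x_2 = \sin t$ over whole periods yields a vanishing sample correlation, and the estimator (3) likewise returns an absolute information flow of essentially zero. A minimal sketch (the sampling choices are illustrative):

```python
import numpy as np

# Sample the harmonic pair x1 = cos t, x2 = sin t over 50 full periods
N, P = 200_000, 50
t = np.linspace(0.0, 2 * np.pi * P, N, endpoint=False)
dt = t[1] - t[0]
x1, x2 = np.cos(t), np.sin(t)

# Series in quadrature: the sample correlation vanishes
r = np.corrcoef(x1, x2)[0, 1]

# MLE of the information flow x2 -> x1, Eq. (3), with k = 1
dx1 = (x1[1:] - x1[:-1]) / dt           # Euler forward difference of x1
y1, y2 = x1[:-1], x2[:-1]               # align with the derivative series
C = np.cov(np.vstack([y1, y2, dx1]))    # covariances among x1, x2, dx1/dt
C11, C22, C12 = C[0, 0], C[1, 1], C[0, 1]
C1d1, C2d1 = C[0, 2], C[1, 2]
T21 = (C11 * C12 * C2d1 - C12**2 * C1d1) / (C11**2 * C22 - C11 * C12**2)

print(r, T21)   # both ~ 0: zero correlation AND zero absolute flow
```

Both quantities are zero (up to discretization error), which is precisely the $0/0$ indeterminacy that motivates the noise perturbation below.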
We may hence view a deterministic system as a limit, or extreme case, as the amplitude of the stochastic perturbation goes to zero. For this case, we add to (4) a stochastic term:
$$\frac{d\mathbf{x}}{dt} = \mathbf{A}\mathbf{x} + \mathbf{B}\dot{\mathbf{w}}, \qquad (7)$$
where $\mathbf{w}$ is a vector of standard Wiener processes. For simplicity, let the perturbation amplitude $\mathbf{B}$ be a constant matrix. Further let $\mathbf{G} = \mathbf{B}\mathbf{B}^T$, with elements $g_{ij} = \sum_k b_{ik} b_{jk}$. Liang (2008) established that
$$\frac{dH_1^*}{dt} = a_{11} = 0, \qquad (8)$$
$$\frac{dH_1^{\rm noise}}{dt} = \frac12 \frac{g_{11}}{\sigma_{11}}. \qquad (9)$$
So in this case, the normalized flow from $x_2$ to $x_1$ is
$$\tau_{2\to1} = \frac{a_{12}\sigma_{12}/\sigma_{11}}{\left|a_{12}\sigma_{12}/\sigma_{11}\right| + 0 + \left|g_{11}/(2\sigma_{11})\right|} = \frac{-\sigma_{12}}{|\sigma_{12}| + g_{11}/2}. \qquad (10)$$
Now for the stochastic equation, the covariance matrix $\mathbf{\Sigma}$ evolves as
$$\frac{d\mathbf{\Sigma}}{dt} = \mathbf{A}\mathbf{\Sigma} + \mathbf{\Sigma}\mathbf{A}^T + \mathbf{B}\mathbf{B}^T = \mathbf{A}\mathbf{\Sigma} + \mathbf{\Sigma}\mathbf{A}^T + \mathbf{G}. \qquad (11)$$
Expanding, this is
$$\frac{d}{dt}\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} -\sigma_{21} & -\sigma_{22} \\ \sigma_{11} & \sigma_{12} \end{bmatrix} + \begin{bmatrix} -\sigma_{12} & \sigma_{11} \\ -\sigma_{22} & \sigma_{21} \end{bmatrix} + \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix}.$$
We hence obtain the following equation set:
$$\frac{d\sigma_{11}}{dt} = -2\sigma_{12} + g_{11}, \qquad \frac{d\sigma_{12}}{dt} = -\sigma_{22} + \sigma_{11} + g_{12}, \qquad \frac{d\sigma_{22}}{dt} = 2\sigma_{12} + g_{22}.$$
Solving, we get
$$\frac{d^2\sigma_{12}}{dt^2} = -4\sigma_{12} + (g_{11} - g_{22} + g_{12}).$$
So the solution is
$$\sigma_{12} = C_1 \cos 2t + C_2 \sin 2t + \frac12 (g_{11} - g_{22} + g_{12})\, t.$$
If $\sigma_{12}(0) = 0$ and $\dot\sigma_{12}(0) = 0$, then the integration constants $C_1 = C_2 = 0$. So
$$\tau_{2\to1} = \frac{-\sigma_{12}}{|\sigma_{12}| + g_{11}/2} = \frac{-1}{1 + \dfrac{g_{11}}{(g_{11} - g_{22} + g_{12})\, t}}.$$
Two cases are distinguished:

Case I: $g_{11} - g_{22} = {\rm const} \ne 0$. Then $\lim_{g_{11}\to 0} \tau_{2\to1} \to -1$.

Case II: $g_{11} - g_{22} = 0$. Then $\lim_{g\to 0} \tau_{2\to1} \to \dfrac{-1}{1 + 1/t}$. As $t$ goes to infinity, $\tau_{2\to1}$ also approaches $-1$.

If initially there exists some covariance, say, $\sigma_{12}(0) = c$, then $C_1 = c$, and hence
$$\tau_{2\to1} = \frac{-1}{1 + \dfrac{g_{11}}{2c \cos 2t + (g_{11} - g_{22} + g_{12})\, t}}.$$
In this case, as $g \to 0$, we always have $\tau_{2\to1} \to -1$. Either way, the relative information flow $\tau_{2\to1}$ approaches $-1$ in the limit of a deterministic system. This is indeed what we expect. So even for this extreme case, there is no contradiction at all for causal inference using information flow.

To summarize, a recent rigorously formulated causality analysis asserts that, in the linear limit, causation implies correlation, while correlation does not necessarily mean causation. In this short note, a case which seemingly violates the assertion is examined. In this case an event $A$ takes a harmonic form (sine/cosine), and generates through some process another event $B$ so that $B$ is always out of phase with $A$, i.e., lags $A$ by $90^\circ$. Obviously $A$ causes $B$, but by computation the correlation between $A$ and $B$ is zero. In this study we show that this is an extreme case, with only one point in the ensemble space, and hence the problem becomes singular. We re-examine the problem by enlarging the ensemble space slightly through adding some noise. A stochastic differential equation is then solved for the corresponding covariances, which allows us to obtain the information flows for the perturbed system. Then, as the noisy perturbation goes to zero, the normalized information flow rate from $A$ to $B$ is established to be 100%, just as one would have expected. So actually no contradiction exists.

One thing that merits mentioning is that, although here it seems that $A$ causes $B$, the normalized information flow rate from $B$ to $A$ is actually also 100%. That is to say, for such a harmonic system with a circular cause-effect relation, it is actually impossible to differentiate causality by simply assessing which takes place first; anyhow, taking a lead by $\pi/2$ is the same as taking a lag by $3\pi/2$. The moral is, for a process that is nonsequential (e.g., that in the nonsequential stochastic control systems), where circular cause and consequence coexist, it is essentially impossible to distinguish a delay from an advance.
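As a closing numerical note, the covariance evolution (11) is easy to integrate directly, and any closed-form solution can be checked against an exact invariant that follows from adding the first and third equations of the set above: $d(\sigma_{11} + \sigma_{22})/dt = g_{11} + g_{22}$, so that, starting from $\mathbf{\Sigma}(0) = 0$, ${\rm trace}\,\mathbf{\Sigma}(t) = (g_{11} + g_{22})\,t$. A minimal sketch (the particular $\mathbf{B}$, and hence $\mathbf{G}$, is an illustrative choice):

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])             # the harmonic system, Eq. (4)
eps = 0.01                              # small perturbation amplitude
B = np.sqrt(eps) * np.array([[1.0, 0.0],
                             [0.5, 1.0]])
G = B @ B.T                             # g11, g12, g22 (positive semi-definite)

# Forward-Euler integration of d(Sigma)/dt = A Sigma + Sigma A^T + G
dt, T = 1e-4, 10.0
Sigma = np.zeros((2, 2))
for _ in range(int(T / dt)):
    Sigma = Sigma + dt * (A @ Sigma + Sigma @ A.T + G)

# Invariant check: since A is skew-symmetric, trace(A Sigma + Sigma A^T) = 0,
# so trace Sigma(t) = (g11 + g22) t holds exactly, step by step.
print(np.trace(Sigma), (G[0, 0] + G[1, 1]) * T)   # both ~ 0.225
```

Because $\mathbf{A}$ here is skew-symmetric, the trace identity holds even for the discretized iteration, which makes it a convenient sanity check for the integration.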
Acknowledgments.
This study was partially supported by the National Science Foundation of China (NSFC) under Grant No. 41975064, and the 2015 Jiangsu Program for Innovation Research and Entrepreneurship Groups.

[1] X.S. Liang, Information flow and causality as rigorous notions ab initio. Phys. Rev. E 94, 052201 (2016).
[2] X.S. Liang, Unraveling the cause-effect relation between time series. Phys. Rev. E 90, 052150 (2014).
[3] G. Berkeley, A Treatise Concerning the Principles of Human Knowledge (1710).
[4] X.S. Liang, Normalizing the causality between time series. Phys. Rev. E 92, 022126 (2015).
[5] S. Vannitsem, Q. Dalaiden, H. Goosse, Testing for dynamical dependence: Application to the surface mass balance over Antarctica. Geophys. Res. Lett. (2019). DOI: 10.1029/2019GL084329.
[6] A. Stips and coauthors, On the causal structure between CO$_2$ and global temperature. Sci. Rep. 6, 21691 (2016). DOI: 10.1038/srep21691.
[7] D.T. Hristopulos, A. Babul, S. Babul, L.R. Brucar, N. Virji-Babul, Disrupted information flow in resting-state in adolescents with sports-related concussion. Front. Hum. Neurosci. 13, 419 (2019). DOI: 10.3389/fnhum.2019.00419.
[8] X.S. Liang, Information flow within stochastic dynamical systems. Phys. Rev. E 78, 031113 (2008).