Mutual information rate and bounds for it
M. S. Baptista, R. M. Rubinger, E. R. V. Junior, J. C. Sartorelli, U. Parlitz, C. Grebogi
Institute for Complex Systems and Mathematical Biology, SUPA, University of Aberdeen, AB24 3UE Aberdeen, United Kingdom
Federal University of Itajubá, Av. BPS 1303, Itajubá, Brazil
Institute of Physics, University of São Paulo, Rua do Matão, Travessa R, 187, 05508-090, São Paulo, Brazil
Biomedical Physics Group, Max Planck Institute for Dynamics and Self-Organization, Am Fassberg 17, 37077 Göttingen, Germany
Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Albertstr. 19, 79104 Freiburg, Germany

(Dated: May 28, 2018)

The amount of information exchanged per unit of time between two nodes in a dynamical network or between two data sets is a powerful concept for analysing complex systems. This quantity, known as the mutual information rate (MIR), is calculated from the mutual information, which is rigorously defined only for random systems. Moreover, the definition of mutual information is based on probabilities of significant events. This work offers a simple alternative way to calculate the MIR in dynamical (deterministic) networks or between two data sets (not fully deterministic), and to calculate its upper and lower bounds, without having to calculate probabilities, but rather in terms of well known and well defined quantities in dynamical systems. As possible applications of our bounds, we study the relationship between synchronisation and the exchange of information in a system of two coupled maps and in experimental networks of coupled oscillators.
I. INTRODUCTION
Shannon's entropy quantifies information [1]. It measures how much uncertainty an observer has about an event being produced by a random system. Another important concept in the theory of information is the mutual information [1]. It measures how much uncertainty an observer has about an event in a random system X after observing an event in a random system Y (or vice-versa).

Mutual information is an important quantity because it quantifies not only linear and non-linear interdependencies between two systems or data sets, but is also a measure of how much information two systems exchange or two data sets share. Due to these characteristics, it became a fundamental quantity to understand the development and function of the brain [2, 3], to characterise [4, 5] and model complex systems [6-8] or chaotic systems, and to quantify the information capacity of a communication system [9]. When constructing a model of a complex system, the first step is to understand which are the most relevant variables to describe its behaviour. Mutual information provides a way to identify those variables [10].

However, the calculation of mutual information in dynamical networks or data sets faces three main difficulties [4, 11-13]. Mutual information is rigorously defined for random memoryless processes only. In addition, its calculation involves probabilities of significant events and a suitable space where probability is calculated. The events need to be significant in the sense that they contain as much information about the system as possible. But defining significant events, for example the fact that a variable has a value within some particular interval, is a difficult task, because the interval that provides significant events is not always known. Finally, data sets have finite size. This prevents one from calculating probabilities correctly. As a consequence, mutual information can often only be calculated with a bias [4, 11-13].

In this work, we show how to calculate the amount of information exchanged per unit of time [Eq. (3)], the so-called mutual information rate (MIR), between two arbitrary nodes (or groups of nodes) in a dynamical network or between two data sets. Each node represents a d-dimensional dynamical system with d state variables. The trajectory of the network considering all the nodes in the full phase space is called the "attractor" and is represented by Σ. We then propose an alternative method, similar to the ones proposed in Refs. [14, 15], to calculate significant upper and lower bounds for the MIR in dynamical networks or between two data sets, in terms of Lyapunov exponents, expansion rates, and the capacity dimension. These quantities can be calculated without the use of probabilistic measures. As possible applications of our bounds calculation, we describe the relationship between synchronisation and the exchange of information in small experimental networks of coupled Double-Scroll circuits.

In the previous works of Refs. [14, 15], we proposed an upper bound for the MIR in terms of the positive conditional Lyapunov exponents of the synchronisation manifold. As a consequence, this upper bound could only be calculated in special complex networks that allow the existence of complete synchronisation. In the present work, the proposed upper bound can be calculated for any system (complex networks and data sets) that admits the calculation of Lyapunov exponents.

We assume that an observer can measure only one scalar time series for each one of two chosen nodes.
These two time series are denoted by X and Y, and they form a bidimensional set Σ_Ω = (X, Y), a projection of the "attractor" onto a bidimensional space denoted by Ω. To calculate the MIR in higher-dimensional projections Ω, see the Supplementary Information. Assume that the space Ω is coarse-grained in a square grid of N² boxes with equal sides ε, so N = 1/ε.

Mutual information is defined in the following way [1]. Given two random variables, X and Y, each one producing events i and j with probabilities P_X(i) and P_Y(j), respectively, and with the joint probability between these events represented by P_XY(i,j), the mutual information is defined as

I_S = H_X + H_Y - H_XY, (1)

where H_X = -Σ_i P_X(i) log[P_X(i)], H_Y = -Σ_j P_Y(j) log[P_Y(j)], and H_XY = -Σ_{i,j} P_XY(i,j) log[P_XY(i,j)]. To simplify our notation for the probabilities, we drop the subindexes X, Y, and XY, by making P_X(i) = P(i), P_Y(j) = P(j), and P_XY(i,j) = P(i,j). When using Eq. (1) to calculate the mutual information between the dynamical variables X and Y, the probabilities appearing in Eq. (1) are defined such that P(i) is the probability of finding points in a column i of the grid, P(j) that of finding points in the row j of the grid, and P(i,j) the probability of finding points where the column i meets the row j of the grid.

The MIR was firstly introduced by Shannon [1] as a "rate of actual transmission" [16] and later more rigorously redefined in Refs. [17, 18]. It represents the mutual information exchanged between two (correlated) dynamical variables per unit of time. To simplify the calculation of the MIR, the two continuous dynamical variables are transformed into two discrete symbolic sequences X and Y. Then, the MIR is defined by

MIR = lim_{n→∞} I_S(n)/n, (2)

where I_S(n) represents the usual mutual information between the two sequences X and Y, calculated by considering words of length n.

The MIR is a fundamental quantity in science. Its maximal value gives the information capacity between any two sources of information (with no need for stationarity, statistical stability, or memorylessness) [19]. Therefore, alternative approaches for its calculation, or for the calculation of bounds for it, are of vital relevance. Due to the limit to infinity in Eq. (2), and because it is defined from probabilities, the MIR is not easy to calculate, especially if one wants to calculate it from (chaotic) trajectories of a large complex network or from data sets. The difficulties faced in estimating the MIR from dynamical systems and networks are similar to the ones faced in the calculation of the Kolmogorov-Sinai entropy, H_KS [20] (Shannon's entropy per unit of time). Because of these difficulties, the upper bound for H_KS proposed by Ruelle [21] in terms of the Lyapunov exponents and valid for smooth dynamical systems (H_KS ≤ Σ_i λ_i^+, where λ_i^+ represent all the positive Lyapunov exponents), or Pesin's equality [22] (H_KS = Σ_i λ_i^+), proved in Ref. [23] to be valid for the large class of systems that possess an SRB measure, became so important in the theory of dynamical systems. Our upper bound [Eq. (13)] is a result equivalent to the work of Ruelle.
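As a concrete illustration of Eq. (1) and of the grid probabilities just described, the following minimal sketch estimates I_S from two scalar time series on an N × N grid. It is not the authors' code: the function name, the use of base-2 logarithms (bits), and the assumption that both series are normalised to [0, 1] are our own choices.

```python
import numpy as np

def mutual_information_grid(x, y, n_boxes):
    """Estimate I_S = H_X + H_Y - H_XY [Eq. (1)] on an
    n_boxes x n_boxes grid covering the unit square."""
    # P(i, j): fraction of points falling in box (i, j) of the grid
    pxy, _, _ = np.histogram2d(x, y, bins=n_boxes, range=[[0, 1], [0, 1]])
    pxy /= pxy.sum()
    px = pxy.sum(axis=1)   # P(i): probability of column i
    py = pxy.sum(axis=0)   # P(j): probability of row j

    def entropy(p):
        p = p[p > 0]       # 0 log 0 = 0 by convention
        return -np.sum(p * np.log2(p))

    return entropy(px) + entropy(py) - entropy(pxy)
```

With a finite number of points this estimator carries the bias discussed above, which is one motivation for the bounds derived in this work.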
II. MAIN RESULTS

One of the main results of this work (whose derivation can be seen in Sec. III B) is to show that, in dynamical networks or data sets with a fast decay of correlation, I_S in Eq. (1) represents the amount of mutual information between X and Y produced within a special time interval T, where T represents the time for the dynamical network (or data sets) to lose its memory of the initial state, or for the correlation to decay to zero. Correlation in this work is not the usual linear correlation, but a non-linear correlation defined in terms of the evolution of spatial probabilities, the quantity C(T) in Sec. III A. Therefore, the mutual information rate (MIR) between the dynamical variables X and Y (or two data sets) can be estimated by

MIR = I_S/T. (3)

In systems that present sensitivity to initial conditions, e.g. chaotic systems, predictions are only possible for times smaller than this time T. This time has other meanings. It is the expected time necessary for a set of points belonging to an ε-square box in Ω to spread over Σ_Ω, and it is of the order of the shortest Poincaré return time for a point to leave a box and return to it [24, 25]. It can be estimated by

T ≈ (1/λ_1) log[1/ε], (4)

where λ_1 is the largest positive Lyapunov exponent measured in Σ_Ω. Chaotic systems present the mixing property (see Sec. III A), and as a consequence the correlation C(t) always decays to zero, surely after an infinitely long time. The correlation of chaotic systems can also decay to zero for sufficiently large but finite t = T (see Supplementary Information). T can be interpreted as the minimum time required for a system to satisfy the conditions to be considered mixing. Some examples of physical systems that are proved to be mixing, and to have an exponentially fast decay of correlation, are nonequilibrium steady states [26], Lorenz gases (models of diffusive transport of light particles in a network of heavier particles) [27], and billiards [28]. An example of a "real world" physical complex system that presents an exponentially fast decay of correlation is plasma turbulence [29]. We do not expect that data coming from a "real world" complex system is rigorously mixing and has an exponentially fast decay of correlation. But we expect that the data has a sufficiently fast decay of correlation (e.g. stretched-exponential decay or polynomially fast decays), implying that the system has sufficiently high sensitivity to initial conditions and, as a consequence, that C(t) ≅ 0 for a reasonably small and finite time t = T.

The other two main results of our work are presented in Eqs. (5) and (7), whose derivations are presented in Sec. III C. The upper bound for the MIR is given by

I_C = λ_1 - λ_2 = λ_1(2 - D), (5)

where λ_1 and λ_2 (positive defined) represent the largest and the second largest Lyapunov exponents measured in Σ_Ω, if both exponents are positive. If the i-th largest exponent is negative, then we set λ_i = 0. If the set Σ_Ω represents a periodic orbit, I_C = 0, and therefore there is no information being exchanged. The quantity D is defined as

D = -log(N_C(t = T))/log(ε), (6)

where N_C(t = T) is the number of boxes that would be covered by fictitious points at time T.
At time t = 0, these fictitious points are confined in an ε-square box. They expand not only exponentially fast in both directions, according to the two positive Lyapunov exponents, but expand forming a compact set, a set with no "holes". At t = T, they spread over Σ_Ω.

The lower bound for the MIR is given by

I_C^l = λ_1(2 - D̃), (7)

where D̃ represents the capacity dimension of the set Σ_Ω:

D̃ = lim_{ε→0} [-log(Ñ_C(ε))/log(ε)], (8)

where Ñ_C represents the number of boxes in Ω that are occupied by points of Σ_Ω.

D is defined in a way similar to the capacity dimension, though it is not the capacity dimension. In fact, D ≤ D̃, because D̃ measures the change in the number of occupied boxes in Ω as the space resolution varies, whereas D measures the relative number of boxes with a certain fixed resolution ε that would be occupied by the fictitious points (in Ω) after being iterated for a time T. As a consequence, the empty space in Ω that is not occupied by Σ_Ω does not contribute to the calculation of D̃, whereas it contributes to the calculation of the quantity D. In addition, N_C ≥ Ñ_C (for any ε), because, while the fictitious points form a compact set expanding with the same ratio as the one with which the real points expand (ratio provided by the Lyapunov exponents), the real set of points Σ_Ω might not occupy many boxes.
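The capacity dimension D̃ of Eq. (8) can be estimated by standard box counting. The sketch below assumes the data are scaled to the unit square and fits the slope of log Ñ_C(ε) against log(ε) over a range of box sizes; the function name and the use of a least-squares fit are our own choices.

```python
import numpy as np

def capacity_dimension(points, epsilons):
    """Box-counting estimate of the capacity dimension [Eq. (8)].
    points: (n, 2) array with coordinates scaled to the unit square."""
    log_eps, log_n = [], []
    for eps in epsilons:
        # index of the eps-box containing each point; count distinct boxes
        occupied = np.unique(np.floor(points / eps), axis=0)
        log_eps.append(np.log(eps))
        log_n.append(np.log(len(occupied)))
    # D~ is minus the slope of log N~_C(eps) versus log(eps)
    slope, _ = np.polyfit(log_eps, log_n, 1)
    return -slope
```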
III. METHODS

A. Mixing, correlation decay and invariant measures

Denote by F^T(x) a mixing transformation that represents how a point x ∈ Σ_Ω is mapped after a time T into Σ_Ω, and let ρ(x) represent the probability of finding a point of Σ_Ω in x (the natural invariant density). Let I′ represent a region in Ω. Then μ(I′) = ∫ρ(x)dx, for x ∈ I′, represents the probability measure of the region I′. Given two square boxes I′_1 ∈ Ω and I′_2 ∈ Ω, if F^T is a mixing transformation, then, for a sufficiently large T, the correlation C(T) = μ[F^{-T}(I′_1) ∩ I′_2] - μ[I′_1]μ[I′_2] decays to zero: the probability of having a point in I′_2 that is mapped to I′_1 is equal to the probability of being in I′_1 times the probability of being in I′_2. That is typically what happens in random processes.

If the measure μ(Σ_Ω) is invariant, then μ[F^{-T}(Σ_Ω)] = μ(Σ_Ω). Mixing and ergodic systems produce measures that are invariant.
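A minimal numerical illustration of this decay of correlation: the sketch below estimates C(t) = μ[F^{-t}(I′_1) ∩ I′_2] - μ[I′_1]μ[I′_2] from a single long trajectory by counting visits. The logistic map at r = 4 is our own choice of mixing test system, not a system used in the paper.

```python
import numpy as np

def correlation_decay(x, t, box1, box2):
    """Estimate C(t) for two intervals I1 = box1 and I2 = box2
    from one long scalar trajectory x."""
    in1 = (x >= box1[0]) & (x < box1[1])
    in2 = (x >= box2[0]) & (x < box2[1])
    # fraction of points now in I2 that are mapped into I1 after t steps
    joint = np.mean(in2[:-t] & in1[t:])
    return joint - np.mean(in1) * np.mean(in2)

# logistic map at r = 4 as a mixing test system
x = np.empty(200000)
x[0] = 0.3
for n in range(len(x) - 1):
    x[n + 1] = 4 * x[n] * (1 - x[n])

for t in (1, 5, 10, 20):
    print(t, correlation_decay(x, t, (0.1, 0.2), (0.4, 0.5)))
```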
B. Derivation of the mutual information rate (MIR) in dynamical networks and data sets

We consider that the dynamical networks or data sets to be analysed present either the mixing property or a fast decay of correlations, and that their probability measure is time invariant. If a system that is mixing for a time interval T is observed (sampled) once every time interval T, then the probabilities generated by these snapshot observations behave as if they were independent, and the system behaves as if it were a random process. This is so because, if a system is mixing for a time interval T, then the correlation C(T) decays to zero for this time interval. For systems that have some decay of correlation, the correlation surely decays to zero after an infinite time interval. But this time interval can also be finite, as shown in the Supplementary Information.

Consider now that we have experimental points and that they are sampled once every time interval T. The probability P̃_XY((i,j) → (k,l)) of the sampled trajectory to follow a given itinerary, for example to fall in the box with coordinates (i,j) and then be iterated to the box (k,l), depends exclusively on the probability of being at the box (i,j), represented by P̃_XY(i,j), and of being at the box (k,l), represented by P̃_XY(k,l). Therefore, for the sampled trajectory, P̃_XY((i,j) → (k,l)) = P̃_XY(i,j) P̃_XY(k,l). Analogously, the probability of the sampled trajectory to fall in the column (or row) i of the grid and then be iterated to the column (or row) j is given by P̃_X(i) P̃_Y(j).

The MIR of the experimental non-sampled trajectory points can be calculated from the mutual information Ĩ_S of the sampled trajectory points that follow itineraries of length n:

MIR = lim_{n→∞} Ĩ_S(n)/(nT). (9)

Due to the absence of correlations of the sampled trajectory points, the mutual information for these points following itineraries of length n can be written as

Ĩ_S(n) = n[H̃_X(n=1) + H̃_Y(n=1) - H̃_XY(n=1)], (10)

where H̃_X(n=1) = -Σ_i P̃_X(i) log[P̃_X(i)], H̃_Y(n=1) = -Σ_j P̃_Y(j) log[P̃_Y(j)], and H̃_XY(n=1) = -Σ_{i,j} P̃_XY(i,j) log[P̃_XY(i,j)], and P̃_X(i), P̃_Y(j), and P̃_XY(i,j) represent the probabilities of the sampled trajectory points to fall in the column i of the grid, in the row j of the grid, and in the box (i,j) of the grid, respectively.

Due to the time invariance of the set Σ_Ω, assumed to exist, the probability measure of the non-sampled trajectory is equal to the probability measure of the sampled trajectory. If a system that has a time-invariant measure is observed (sampled) once every time interval T, the observed set has the same natural invariant density and probability measure as the original set. As a consequence, if Σ_Ω has a time-invariant measure, the probabilities P(i), P(j), and P(i,j) (used to calculate I_S) are equal to P̃_X(i), P̃_Y(j), and P̃_XY(i,j). Consequently, H̃_X(n=1) = H_X, H̃_Y(n=1) = H_Y, and H̃_XY(n=1) = H_XY, and therefore Ĩ_S(n) = nI_S. Substituting into Eq. (9), we finally arrive at

MIR = I_S/T, (11)

where I_S between two nodes is calculated from Eq. (1).
Therefore, in order to calculate the MIR, we need to estimate the time T for which the correlation of the system approaches zero, and the probabilities P(i), P(j), and P(i,j) of the non-sampled experimental points to fall in the column i of the grid, in the row j of the grid, and in the box (i,j) of the grid, respectively.

C. Derivation of the upper (I_C) and lower (I_C^l) bounds for the MIR

Consider that our attractor Σ_Ω is generated by a 2d expanding system that possesses 2 positive Lyapunov exponents λ_1 and λ_2, with λ_1 ≥ λ_2, and Σ_Ω ∈ Ω. Imagine a box whose sides are oriented along the orthogonal basis used to calculate the Lyapunov exponents. Then, points inside the box spread out, after a time interval t, to ε√2 exp(λ_1 t) along the direction from which λ_1 is calculated. At t = T, ε√2 exp(λ_1 T) = L, which provides T in Eq. (4), since L = √2.
2. These points spread after a timeinterval t to ǫ √ λ t along the direction from which λ is calculated. After an interval of time t = T , thesepoints spread out over the set Σ Ω . We require that for t ≤ T , the distance between these points only increases:the system is expanding.Imagine that at t = T , fictitious points initiallyin a square box occupy an area of ǫ √ λ T L =2 ǫ exp ( λ + λ ) T . Then, the number of boxes of sides ǫ that contain fictitious points can be calculated by N C = 2 ǫ exp ( λ + λ ) T / ǫ = exp ( λ + λ ) T . From Eq. (4), N = exp λ T , since N = 1 /ǫ .We denote with a lower-case format, the probabili-ties p ( i ), p ( j ), and p ( i, j ) with which fictitious points occupy the grid in Ω. If these fictitious points spread uni-formly forming a compact set whose probabilities of find-ing points in each fictitious box is equal, then p ( i ) = 1 /N (= N C N C N ), p ( j ) = 1 /N , and p ( i, j ) = 1 /N C . Let us de-note the Shannon’s entropy of the probabilities p ( i, j ), p ( i ) and p ( j ) as h X , h Y , and h XY . The mutual infor-mation of the fictitious trajectories after evolving a timeinterval T can be calculated by I uS = h X + h Y − h XY .Since, p ( i ) = p ( j ) = 1 /N and p ( i, j ) = 1 /N C , then I uS = 2 log ( N ) − log ( N C ). At t = T , we have that N = exp λ T and N C = exp ( λ + λ ) T , leading us to I uS = ( λ − λ ) T . Therefore, defining, I C = I uS /T , wearrive at I C = λ − λ .We defining D as D = − log ( N C ( t = T ))log ( ǫ ) , (12) where N C ( t = T ) being the number of boxes that wouldbe covered by fictitious points at time T . At time t = 0,these fictitious points are confined in an ǫ -square box.They expand not only exponentially fast in both direc-tions according to the two positive Lyapunov exponents,but expand forming a compact set, a set with no “holes”.At t = T , they spread over Σ Ω .Using ǫ = exp − λ T and N C = exp ( λ + λ ) T in Eq. (12),we arrive at D = 1 + λ λ , and therefore, we can write that I C = λ − λ = λ (2 − D ) , (13) To calculate the maximal possible MIR, of a randomindependent process, we assume that the expansion ofpoints is uniform only along the columns and lines of thegrid defined in the space Ω, i.e., P ( i ) = P ( j ) = 1 /N ,(which maximises H X and H Y ), and we allow P ( i, j ) tobe not uniform (minimising H XY ) for all i and j , then I S ( ǫ ) = − ǫ ) + X i,j P ( i, j ) log [ P ( i, j )] . (14) Since T ( ǫ ) = − /λ log ( ǫ ), dividing I S ( ǫ ) by T ( ǫ ), tak-ing the limit of ǫ →
and recalling that the information dimension of the set Σ_Ω in the space Ω is defined as D̃_1 = lim_{ε→0} Σ_{i,j} P(i,j) log[P(i,j)]/log(ε), we obtain that the MIR is given by

I_S/T = λ_1(2 - D̃_1). (15)

Since D̃_1 ≤ D̃ (for any value of ε), then λ_1(2 - D̃_1) ≥ λ_1(2 - D̃), which means that a lower bound for the maximal MIR [provided by Eq. (15)] is given by

I_C^l = λ_1(2 - D̃). (16)

But D ≤ D̃ (for any value of ε), and therefore I_C is an upper bound for I_C^l.

To show why I_C is an upper bound for the maximal possible MIR, assume that the real points of Σ_Ω occupy the space Ω uniformly. If Ñ_C > N, there are many boxes being occupied. It is to be expected that the probability of finding a point in a row or column of the grid is P(i) = P(j) ≅ 1/N, and P(i,j) ≅ 1/Ñ_C. In such a case, MIR ≅ I_C^l, which implies that I_C ≥ MIR. If Ñ_C < N, there are only few boxes being sparsely occupied. The probability of finding a point in a row or column of the grid is P(i) = P(j) ≅ 1/Ñ_C, and P(i,j) ≅ 1/Ñ_C; there are Ñ_C rows and columns being occupied by points in the grid. In such a case, I_S ≅ 2 log(Ñ_C) - log(Ñ_C) ≅ log(Ñ_C). Comparing with I_S^u = 2 log(N) - log(N_C), and since Ñ_C < N and N_C ≥ Ñ_C, we conclude that I_S^u ≥ I_S, which implies that I_C ≥ MIR. Notice that if P(i,j) = p(i,j) = 1/N_C and D̃_1 = D̃ = D, then I_S/T = I_C^l = I_C.

D. Expansion rates
In order to extend our approach to the treatment of data sets coming from networks whose equations of motion are unknown, or to higher-dimensional networks and complex systems which might be neither rigorously chaotic nor fully deterministic, or to experimental data that contain noise and few sampling points, we write our bounds in terms of expansion rates, defined in this work by

e_k(t) = (1/Ñ_C) Σ_{i=1}^{Ñ_C} (1/t) log[L_k^i(t)], (17)

where we consider k = 1, 2. L_1^i(t) measures the largest growth rate of nearby points. In practice, it is calculated by L_1^i(t) = Δ/δ, with δ representing the largest distance between pairs of points in an ε-square box i, and Δ representing the largest distance between pairs of the points that were initially in the ε-square box but have spread out for an interval of time t. L_2^i(t) measures how an area enclosing points grows. In practice, it is calculated by L_2^i(t) = A/ε², with ε² representing the area occupied by points in an ε-square box, and A the area occupied by these points after spreading out for a time interval t. There are Ñ_C boxes occupied by points which are taken into consideration in the calculation of L_k^i(t). An order-k expansion rate, e_k(t), measures on average how a hypercube of dimension k exponentially grows after an interval of time t. So, e_1 measures the largest growth rate of nearby points, a quantity closely related to the largest finite-time Lyapunov exponent [30], and e_2 measures how an area enclosing points grows, a quantity closely related to the sum of the two largest positive Lyapunov exponents. In terms of expansion rates, Eqs. (4) and (13) read T = (1/e_1) log[1/ε] and I_C = e_1(2 - D), respectively, and Eqs. (12) and (16) read D(t) = e_2(t)/e_1(t) and I_C^l = e_1(2 - D̃), respectively.

From the way we have defined expansion rates, we expect that e_k ≤ Σ_{i=1}^k λ_i. Because of the finite time interval and the finite size of the regions of points considered, regions of points that present large derivatives, contributing largely to the Lyapunov exponents, contribute less to the expansion rates. If a system has a constant derivative (hyperbolic) and a constant natural measure, then e_k = Σ_{i=1}^k λ_i.

There are many reasons for using expansion rates in the way we have defined them in order to calculate bounds for the MIR. Firstly, they can easily be estimated experimentally, whereas Lyapunov exponents demand huge computational efforts. Secondly, because of the macroscopic nature of the expansion rates, they might be more appropriate to treat data coming from complex systems that contain large amounts of noise, data whose points are not (arbitrarily) close as formally required for a proper calculation of the Lyapunov exponents. Thirdly, expansion rates can be well defined for data sets containing very few data points: the fewer points a data set contains, the larger the regions of size ε need to be and the shorter the time T is. Finally, expansion rates are defined in a similar way to finite-time Lyapunov exponents, and thus some algorithms used to calculate Lyapunov exponents can be used to calculate our defined expansion rates.
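A sketch of the order-1 expansion rate of Eq. (17), assuming the trajectory is available as point coordinates at times 0 and t, both scaled to the unit square; the brute-force pairwise distances and the function name are our own choices.

```python
import numpy as np

def order1_expansion_rate(points_t0, points_t, eps, t):
    """Order-1 expansion rate e_1(t) of Eq. (17): average of the
    growth rates L_1^i(t) = Delta/delta over all occupied eps-boxes.
    points_t0, points_t: (n, 2) arrays with the same n trajectory
    points at times 0 and t."""
    labels = np.floor(points_t0 / eps).astype(int)
    rates = []
    for box in np.unique(labels, axis=0):
        idx = np.all(labels == box, axis=1)
        if idx.sum() < 2:
            continue                      # need at least a pair of points
        p0, pt = points_t0[idx], points_t[idx]
        # largest pairwise distance inside the box, before and after
        delta = max(np.linalg.norm(a - b) for a in p0 for b in p0)
        Delta = max(np.linalg.norm(a - b) for a in pt for b in pt)
        if delta > 0:
            rates.append(np.log(Delta / delta) / t)
    return np.mean(rates)
```

The order-2 rate e_2 can be obtained analogously by replacing the largest pairwise distance with the area (e.g. of the convex hull) occupied by the points of each box.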
IV. APPLICATIONS

A. MIR and its bounds in two coupled chaotic maps

To illustrate the use of our bounds, we consider the following two bidirectionally coupled maps:

X_{n+1}^{(1)} = 2X_n^{(1)} + ρ(X_n^{(1)})² + σ(X_n^{(2)} - X_n^{(1)}), mod 1,
X_{n+1}^{(2)} = 2X_n^{(2)} + ρ(X_n^{(2)})² + σ(X_n^{(1)} - X_n^{(2)}), mod 1, (18)

where X_n^{(i)} ∈ [0, 1]. If ρ = 0 the map is piecewise-linear, and quadratic otherwise. We are interested in measuring the exchange of information between X^{(1)} and X^{(2)}. The space Ω is a square of sides 1. The Lyapunov exponents measured in the space Ω are the Lyapunov exponents of the set Σ_Ω, the chaotic attractor generated by Eqs. (18).

The quantities I_S/T, I_C, and I_C^l are shown in Fig. 1 as we vary σ, for ρ = 0 (A) and ρ = 0.1 (B).
We calculate I_S using in Eq. (1) the probabilities P(i,j) with which points from a trajectory composed of 2,000,000 samples fall in boxes of sides ε = 1/500, and the probabilities P(i) and P(j) with which the points visit the intervals [(i-1)ε, iε[ of the variable X_n^{(1)} or [(j-1)ε, jε[ of the variable X_n^{(2)}, respectively, for i, j = 1, ..., N. When computing I_S/T, the quantity T was estimated by Eq. (4). Indeed, for most values of σ, I_C ≥ I_S/T and I_C^l ≤ I_S/T.

For σ = 0 there is no coupling, and therefore the two maps are independent of each other. There is no information being exchanged. In fact, I_C = 0 and I_C^l ≅ 0 in both figures, since D = D̃ = 2, meaning that the attractor Σ_Ω fully occupies the space Ω.

FIG. 1: [Color online] Results for two coupled maps. I_S/T [Eq. (11)] as (green online) filled circles, I_C [Eq. (13)] as the (red online) thick line, and I_C^l [Eq. (16)] as the (brown online) crosses. In (A) ρ = 0 and in (B) ρ = 0.1. The units of I_S/T, I_C, and I_C^l are [bits/iteration].
This is a remarkable property of our bounds: they identify that there is no information being exchanged when the two maps are independent. Complete synchronisation is achieved, and I_C is maximal, for σ > 0.5 (A) and
σ ≥ 0.55 (B), a consequence of the fact that D = D̃ = 1 and, therefore, I_C = I_C^l = λ_1. The reason is that in this situation the coupled system is simply the shift map, a map with a constant natural measure; therefore P(i) = P(j) and P(i,j) are constant for all i and j. As usually happens when one estimates the mutual information by partitioning the phase space with a grid of finite resolution and data sets possessing a finite number of points, I_S is typically larger than zero even when there is no information being exchanged (σ = 0). Even when there is complete synchronisation, we find non-zero off-diagonal terms in the matrix for the joint probabilities, causing I_S to be smaller than it should be. Due to numerical errors, X^{(1)} ≅ X^{(2)}, and points that should be occupying boxes with two corners exactly along a diagonal line in the subspace Ω end up occupying boxes located off-diagonal, boxes that have at least three corners off-diagonal. The estimation of the lower bound I_C^l suffers from the same problems.

Our upper bound I_C is calculated assuming that there is a fictitious dynamics expanding points (and producing probabilities) not only exponentially fast but also uniformly. The "experimental" numerical points from Eqs. (18) expand exponentially fast, but not uniformly. Most of the time the trajectory remains in 4 points: (0,0), (1,1), (1,0), (0,1). That is the main reason why I_C is much larger than the estimated real value of the MIR for some coupling strengths. If two nodes in a dynamical network, such as two neurons in a brain, were to behave in the same way the fictitious dynamics does, these nodes would be able to exchange the largest possible amount of information.

We would like to point out that one of the main advantages of calculating upper bounds for the MIR (I_S/T) using Eq. (13), instead of actually calculating I_S/T, is that we can reproduce the curves for I_C using a much smaller number of points (1,000 points) than the 2,000,000 points used to calculate I_S/T. If ρ = 0, I_C = -ln(1 - σ) can be calculated analytically, since λ_1 = ln(2) and λ_2 = ln(2 - 2σ).
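The following sketch reproduces this last analytic comparison numerically for ρ = 0, reusing the mutual_information_grid sketch from the introduction; the initial conditions and trajectory length are arbitrary choices.

```python
import numpy as np

def coupled_shift_maps(sigma, n_steps=200000):
    """Iterate Eq. (18) with rho = 0 and return the two time series."""
    x = np.empty((n_steps, 2))
    x[0] = (0.23, 0.71)                     # arbitrary initial condition
    for n in range(n_steps - 1):
        x1, x2 = x[n]
        x[n + 1, 0] = (2 * x1 + sigma * (x2 - x1)) % 1.0
        x[n + 1, 1] = (2 * x2 + sigma * (x1 - x2)) % 1.0
    return x[:, 0], x[:, 1]

eps = 1 / 500
lam1 = np.log(2)
T = np.log(1 / eps) / lam1                  # Eq. (4), in iterations
for sigma in (0.1, 0.3, 0.45):
    x, y = coupled_shift_maps(sigma)
    mir = mutual_information_grid(x, y, int(1 / eps)) / T    # Eq. (11)
    i_c = -np.log(1 - sigma) / np.log(2)                     # I_C in bits
    print(sigma, mir, i_c)
```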
B. MIR and its bounds in experimental networks of Double-Scroll circuits

We illustrate our approach for the treatment of data sets using a network formed by an inductorless version of the Double-Scroll circuit [31]. We consider four networks of bidirectionally, diffusively coupled circuits. Topology I represents two bidirectionally coupled circuits; Topology II, three circuits coupled in an open-ended array; Topology III, four circuits coupled in an open-ended array; and Topology IV, four circuits coupled in a closed array. We choose two circuits in the different networks (one connection apart) and collect from each circuit a time series of 79980 points, with a sampling rate of 80,000 samples/s.
The measured variable is the voltage across one of the circuit capacitors, which is normalised in order to make the space Ω a square of sides 1. Such normalisation does not alter the quantities that we calculate. The following results provide the exchange of information between these two chosen circuits.

The values of ε and t used to coarse-grain the space Ω and to calculate e_k in Eq. (17) are the ones that minimise |N_C(T, e_2) - Ñ_C(ε)| and at the same time satisfy N_C(T, e_2) ≥ Ñ_C(ε), where N_C(T, e_2) = exp[T e_2(t)] represents the number of fictitious boxes covering the set Σ_Ω in a compact fashion when t = T. This optimisation excludes some non-significant points that would make the expansion rate of the fictitious points much larger than it should be. In other words, we require that e_2 describes well the way most of the points spread. We consider that the t used to calculate e_k in Eq. (17) is the time for points initially in an ε-side box to spread to 0.8L. That guarantees that nearby points in Σ_Ω are expanding in both directions within the time interval [0, T]. Using spreading distances slightly smaller or larger than 0.8L already produces similar results; however, if the points are allowed to spread to distances close to L, the set Σ_Ω might not be only expanding, and T might be overestimated.

I_S has been estimated by the method in Ref. [32]. Since we assume that the space Ω where mutual information is being measured is 2D, we compare our results by considering in the method of Ref. [32] a 2D space formed by the two collected scalar signals. In the method of Ref. [32] the phase space is partitioned into regions that contain 30 points of the continuous trajectory. Since these regions do not have equal areas (as is done to calculate I_C and I_C^l), in order to estimate T we need to imagine a box of sides ε_k such that its area ε_k² contains on average 30 points.
FIG. 2: [Color online] Results for experimental networks of Double-Scroll circuits. On the left-side upper corner, pictograms represent how the circuits (filled circles) are bidirectionally coupled. I_S/T_k as (green online) filled circles, I_C as the (red online) thick line, and I_C^l as the (brown online) squares, for a varying coupling resistance R. The unit of these quantities is (kbits/s). (A) Topology I, (B) Topology II, (C) Topology III, and (D) Topology IV. In all figures, D̃ increases smoothly from 1.25 to 1.95 as R varies from 0.1 kΩ to 5 kΩ. The line on the top of the figure represents the interval of resistance values responsible for inducing almost synchronisation (AS) and phase synchronisation (PS).

The area occupied by the set Σ_Ω is approximately given by ε²Ñ_C, where Ñ_C is the number of occupied boxes. Assuming that the 79980 experimental data points occupy the space Ω uniformly, then on average 30 points would occupy an area of 30ε²Ñ_C/79980. The square root of this area is the side of the imaginary box that would contain 30 points. So, ε_k = sqrt(30Ñ_C/79980) ε. Then, in the following, the "exact" value of the MIR will be considered to be given by I_S/T_k, where T_k is estimated by T_k = -(1/e_1) log(ε_k).

The three main characteristics of the curves for the quantities I_S/T_k, I_C, and I_C^l (appearing in Fig. 2), with respect to the coupling strength, are: (i) as the coupling resistance becomes smaller, the coupling strength connecting the circuits becomes larger, and the level of synchronisation increases, followed by an increase in I_S/T_k, I_C, and I_C^l; (ii) all curves are close; (iii) as expected, for most of the resistance values, I_C > I_S/T_k and I_C^l ≤ I_S/T_k. The two main synchronous phenomena appearing in these networks are almost synchronisation (AS) [33], when the circuits are almost completely synchronous, and phase synchronisation (PS) [34]. For the circuits considered in Fig. 2, AS appears for the interval
R ∈ [0, 3] kΩ and PS appears for the interval R ∈ [3, 5] kΩ.
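A small sketch of this box-size correction (the function name is hypothetical; the constants are the ones quoted above):

```python
import numpy as np

def eps_k_and_T_k(eps, n_occupied, e1, n_points=79980, pts_per_box=30):
    """Side eps_k of an imaginary box containing on average 30 points,
    and the corresponding time T_k = -(1/e_1) log(eps_k)."""
    eps_k = np.sqrt(pts_per_box * n_occupied / n_points) * eps
    return eps_k, -np.log(eps_k) / e1
```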
C. MIR and its upper bound in stochastic systems

To analytically demonstrate that the quantities I_C and I_S/T can be well calculated in stochastic systems, we consider the following stochastic dynamical toy model, illustrated in Fig. 3. In it, points within a small box of sides ε [represented by the filled square in Fig. 3(A)] located in the centre of the subspace Ω are mapped after one iteration of the dynamics to 12 other neighbouring boxes. Some points remain in the initial box. The points that leave the initial box go to 4 boxes along the diagonal line and 8 boxes off-diagonal, along the transverse direction. Boxes along the diagonal are represented by the filled squares in Fig. 3(B) and off-diagonal boxes by filled circles. At the second iteration, the points occupy other neighbouring boxes, as illustrated in Fig. 3(C), and at the time n = T the points do not spread any longer, but are somehow reinjected inside the region of the attractor. We consider that this system is completely stochastic, in the sense that no one can precisely determine the location where an initial condition will be mapped. The only information is that points inside a smaller region are mapped to a larger region.

At the iteration n, there will be N_d = 2^{n+1} + 1 boxes occupied along the diagonal (filled squares in Fig. 3) and N_t = 2n N_d - C(ñ) (filled circles in Fig. 3) boxes occupied off-diagonal (along the transverse direction), where C(ñ) = 0 for ñ ≤ 0 and C(ñ) > 0 for ñ > 0, with ñ = n - (T - α). Here α is a small number of iterations representing the time difference between the time T for the points along the diagonal to reach the boundary of the space Ω and the time for the points along the off-diagonal to reach this boundary. The border effect can be ignored when the expansion along the diagonal direction is much faster than along the transverse direction.

At the iteration n, there will be N_C = 2^{n+1} + 1 + (2^{n+1} + 1)2n - C(ñ) boxes occupied by points. In the following calculations we consider that N_C ≅ 2^{n+1}(1 + 2n). We assume that the subspace Ω is a square whose sides have length 1, and that Σ ∈ Ω, so L = √2. For n > T, the attractor does not grow any longer along the off-diagonal direction. The time n = T for the points to spread over the attractor Σ can be calculated as the time it takes for points to visit all the boxes along the diagonal. Thus, we need to satisfy N_d ε√2 = √2. Ignoring the 1 appearing in the expression for N_d (due to the initial box) in the estimation of the value of T, we arrive at T > log(1/ε)/log(2) - 1. This stochastic system is discrete. In order to take into consideration the initial box in the calculation of T, we pick the first integer that is larger than log(1/ε)/log(2) - 1, leading T to be the largest integer that satisfies

T < -log(ε)/log(2). (19)

FIG. 3: (A) A small box representing a set of initial conditions. After one iteration of the system, the points that leave the initial box in (A) go to 4 boxes along the diagonal line [filled squares in (B)] and 8 boxes off-diagonal, along the transverse direction [filled circles in (B)]. At the second iteration, the points occupy other neighbouring boxes, as illustrated in (C), and after an interval of time n = T the points do not spread any longer (D).

The largest Lyapunov exponent, or the order-1 expansion rate, of this stochastic toy model can be calculated from N_d(n) exp(λ_1) = N_d(n+1), which takes us to

λ_1 = log(2). (20)

Therefore, Eq. (19) can be rewritten as T = -log(ε)/λ_1. The quantity D can be calculated from D = log(N_C)/log(N), with n = T. Neglecting C(ñ) and the 1 appearing in N_C due to the initial box, we have that N_C ≅ 2^{T+1}[1 + 2T]. Substituting in the definition of D, we obtain D = [(1 + T) log(2) + log(1 + 2T)]/(-log(ε)). Using T from Eq. (19), we arrive at

D = 1 + r, (21)

where

r = -log(2)/log(ε) - log(1 + 2T)/log(ε). (22)

Placing D and λ_1 in I_C = λ_1(2 - D) gives us

I_C = log(2)(1 - r). (23)

Let us now calculate I_S/T. Ignoring the border effect, and assuming that the expansion of points is uniform, then P(i,j) = 1/N_C and P(i) = P(j) = 1/N = ε. At the iteration n = T, we have that I_S = -2 log(ε) - log(N_C). Since N_C ≅ 2^{T+1}[1 + 2T], we can write I_S = -2 log(ε) - (1 + T) log(2) - log(1 + 2T). Placing T from Eq. (19) into I_S takes us to I_S = -log(2) - log(ε) - log(1 + 2T). Finally, dividing I_S by T, we arrive at

I_S/T = log(2)[1 + log(2)/log(ε) + log(1 + 2T)/log(ε)] = log(2)(1 - r). (24)

As expected from the way we have constructed this model, Eqs. (24) and (23) are equal, and I_C = I_S/T.

Had we included the border effect in the calculation of I_C (denote the value by I_C^b), we would typically have obtained I_C^b ≥ I_C, since λ_2 calculated considering a finite space Ω would be either smaller than or equal to the value obtained by neglecting the border effect. Had we included the border effect in the calculation of I_S/T (denote the value by I_S^b/T), we would typically expect the probabilities P(i,j) not to be constant, because the points that leave the subspace Ω would be randomly reinjected back into Ω. We would conclude that I_S^b/T ≤ I_S/T. Therefore, had we included the border effect, we would have obtained I_C^b ≥ I_S^b/T.

The way we have constructed this stochastic toy model results in D ≅ 1. This is because the spreading of points along the diagonal direction is much faster than the spreading of points along the off-diagonal transverse direction; in other words, the second largest Lyapunov exponent, λ_2, is close to zero. For stochastic toy models producing a larger λ_2, one could consider that the spreading along the transverse direction is given by N_t = N_d 2^{αn} - C(ñ), with α ∈ [0, 1].
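A numerical check that Eqs. (23) and (24) coincide, using the continuum value T = -log(ε)/log(2) employed in the algebra above (i.e., ignoring the integer rounding of Eq. (19)); the ε values are chosen arbitrarily.

```python
import numpy as np

def toy_model_bounds(eps):
    """Check that Eq. (23) and Eq. (24) agree for the toy model."""
    T = -np.log(eps) / np.log(2)                # continuum value of T
    r = -np.log(2) / np.log(eps) - np.log(1 + 2 * T) / np.log(eps)  # Eq. (22)
    I_C = np.log(2) * (1 - r)                   # Eq. (23)
    N_C = 2 ** (T + 1) * (1 + 2 * T)            # occupied boxes at n = T
    I_S = -2 * np.log(eps) - np.log(N_C)        # uniform P(i,j) = 1/N_C
    return I_C, I_S / T                         # the two should agree

for eps in (2.0 ** -8, 2.0 ** -12):
    print(toy_model_bounds(eps))
```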
D. Expansion rates for noisy data with few sampling points

In terms of the order-1 expansion rate, e_1, our quantities read I_C = e_1(2 - D), T = (1/e_1) log[1/ε], and I_C^l = e_1(2 - D̃). In order to show that our expansion rate can be used to calculate these quantities, we consider that the experimental system is one-dimensional and has a constant probability measure. Additive noise is assumed to be bounded with maximal amplitude η and to have constant density. Our order-1 expansion rate is defined as

e_1(t) = (1/Ñ_C) Σ_{i=1}^{Ñ_C} (1/t) log[L_1^i(t)], (25)

where L_1^i(t) measures the largest growth rate of nearby points. Since all that matters is the largest distance between points, it can be estimated even when the experimental data set has very few data points. Since, in this example, we consider that the experimental noisy points have a constant uniform probability distribution, e_1(t) can be calculated by

e_1(t) = (1/t) log[(Δ + 2η)/(δ + 2η)], (26)

where δ + 2η represents the largest distance between pairs of experimental noisy points in an ε-square box, and Δ + 2η represents the largest distance between pairs of the points that were initially in the ε-square box but have spread out for an interval of time t. The experimental system (without noise) is responsible for making points that are at most δ apart from each other spread to at most Δ apart from each other. These points spread out exponentially fast according to the largest positive Lyapunov exponent λ_1 by

Δ = δ exp(λ_1 t). (27)

Substituting Eq. (27) in (26), and expanding the log to first order, we obtain e_1 = λ_1; therefore, our expansion rate can be used to estimate Lyapunov exponents.
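A minimal numerical illustration of Eqs. (26) and (27), with arbitrarily chosen values of λ_1, δ, and η, showing e_1 approaching λ_1 once the deterministic spreading Δ dominates the noise amplitude η:

```python
import numpy as np

lam1, delta, eta = np.log(2), 0.01, 0.001   # assumed toy values

for t in (2, 4, 6):
    Delta = delta * np.exp(lam1 * t)        # Eq. (27)
    e1 = np.log((Delta + 2 * eta) / (delta + 2 * eta)) / t   # Eq. (26)
    print(t, e1, lam1)
```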
V. SUPPLEMENTARY INFORMATION

A. Decay of correlation and First Poincaré Returns

As rigorously shown in [40], the decay with time of the correlation, C(t), is proportional to the decay with time of the density of the first Poincaré recurrences, ρ(t, ε), which measures the probability with which a trajectory returns to an ε-interval after t iterations. Therefore, if ρ(t, ε) decays with t, for example exponentially fast, C(t) will decay with t exponentially fast as well. The relationship between C(t) and ρ(t) can be simply understood in chaotic systems with one expanding direction (one positive Lyapunov exponent). As shown in [41], the "local" decay of correlation (measured in the ε-interval) is given by C(t, ε) ≤ μ(ε)ρ(t, ε) - μ(ε)², where μ(ε) is the probability measure of a chaotic trajectory visiting the ε-interval. Consider the shift map x_{n+1} = 2x_n, mod 1. For this map, μ(ε) = ε, and there is an infinite number of possible intervals that make C(t, ε) = 0 for a finite t. These intervals are the cells of a Markov partition. As recently demonstrated by [P. Pinto, I. Labouriau, M. S. Baptista], in piecewise-linear systems such as the shift map, if ε is a cell in an order-t Markov partition and ρ(t, ε) > 0,
then ρ(t, ε) = 2^{-t}, and by the way a Markov partition is constructed we have that ε = 2^{-t}. Since ε = μ(ε) = 2^{-t}, we arrive at C(t, ε) ≤ 2^{-2t} - 2^{-2t} = 0. Notice that ε = 2^{-t} can be rewritten as -ln(ε) = t ln(2). Since for this map the largest Lyapunov exponent is λ = ln(2), then t = -(1/λ) ln(ε), which is exactly equal to the quantity T, the time interval responsible for making the system lose its memory of the initial condition, and which can be calculated as the time that makes points inside an initial ε-interval spread over the whole phase space, in this case [0, 1].

B. I_C and I_C^l in larger networks and higher-dimensional subspaces Σ_Ω

Imagine a network formed by K coupled oscillators. Uncoupled, each oscillator possesses a certain number of positive Lyapunov exponents, one zero exponent, and negative ones. Each oscillator has dimension d. Assume that the only information available from the network are two Q-dimensional measurements, or a scalar signal that is reconstructed in a Q-dimensional embedding space. So, the subspace Σ_Ω has dimension 2Q, and each subspace of a node (or group of nodes) has dimension Q. To be consistent with our previous equations, we assume that we measure M_Ω = 2Q positive Lyapunov exponents on the projection Σ_Ω. If M_Ω < 2Q, then in the following equations 2Q should be replaced by M_Ω, naturally assuming that M_Ω ≤ 2Q.

In analogy with the derivation of I_C and I_C^l in a bidimensional projection, we assume that the spreading of initial conditions is uniform in the subspace Ω. Then, P(i) = N^{-Q} represents the probability of finding trajectory points in the Q-dimensional space of one node (or a group of nodes), and P(i,j) = N_C^{-1} represents the probability of finding trajectory points in the 2Q-dimensional composed subspace constructed from two nodes (or two groups of nodes) in the subspace Ω. Additionally, we consider that the hypothetical number of occupied boxes N_C is given by N_C(T) = exp[T(Σ_{i=1}^{2Q} λ_i)]. Then, we have that T = (1/λ_1) log(N), which leads us to

I_C = λ_1(2Q - D). (28)

Similarly to the way we have derived I_C^l in a bidimensional projection, if Σ_Ω has more than 2 positive Lyapunov exponents, then

I_C^l = λ_1(2Q - D̃). (29)

To write Eq. (28) in terms of the positive Lyapunov exponents, we first extend the calculation of the quantity D to higher-dimensional subspaces that have dimensionality 2Q:

D = 1 + Σ_{i=2}^{2Q} λ_i/λ_1, (30)

where λ_1 ≥ λ_2 ≥ λ_3 ≥ ... ≥ λ_{2Q} are the Lyapunov exponents measured on the subspace Ω. To derive this equation we only consider that the hypothetical number of occupied boxes N_C is given by N_C(T) = exp[T(Σ_{i=1}^{2Q} λ_i)]. We then substitute D as a function of these exponents [Eq. (30)] in Eq. (28). We arrive at

I_C = (2Q - 1)λ_1 - Σ_{i=2}^{2Q} λ_i. (31)
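Eq. (31) is straightforward to evaluate once the exponents of the projection are known. A sketch (with a hypothetical function name), which also applies the convention, used throughout this work, of setting negative exponents to zero:

```python
def I_C_highdim(exponents, Q):
    """I_C = (2Q - 1) lambda_1 - sum_{i=2}^{2Q} lambda_i [Eq. (31)],
    from the 2Q largest Lyapunov exponents of the projection."""
    lams = sorted(exponents, reverse=True)[:2 * Q]
    lams = [max(l, 0.0) for l in lams]      # negative exponents set to 0
    return (2 * Q - 1) * lams[0] - sum(lams[1:])
```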
C. I_C as a function of the positive Lyapunov exponents of the network

Consider a network whose attractor Σ possesses M positive Lyapunov exponents, denoted by λ̃_i, i = 1, ..., M. For a typical subspace Ω, λ_1 measured on Ω is equal to the largest Lyapunov exponent of the network. Just for the sake of simplicity, assume that the nodes in the network are sufficiently well connected so that in a typical measurement with a finite number of observations this property holds, i.e., λ̃_1 = λ_1. But if measurements provide λ̃_1 ≫ λ_1, the next arguments apply as well, if one replaces λ̃_1 appearing in the further calculations by the smallest Lyapunov exponent of the network, say λ̃_k, that is still larger than λ_1, and then substitutes λ̃_2 by λ̃_{k+1}, and so on. As before, consider that M_Ω = 2Q. Then, for an arbitrary subspace Ω, Σ_{i=2}^{2Q} λ_i ≤ Σ_{i=2}^{2Q} λ̃_i, since a projection cannot make the Lyapunov exponents larger, but only smaller or equal.

We define

Ĩ_C = (2Q - 1)λ_1 - Σ_{i=2}^{2Q} λ̃_i. (32)

Since Σ_{i=2}^{2Q} λ_i ≤ Σ_{i=2}^{2Q} λ̃_i, it is easy to see that

Ĩ_C ≤ I_C. (33)

So I_C, measured on the subspace Σ_Ω and a function of the 2Q largest positive Lyapunov exponents measured in Σ_Ω, is an upper bound for Ĩ_C, a quantity defined by the 2Q largest positive Lyapunov exponents of the attractor Σ of the network. Therefore, if the Lyapunov exponents of a network are known, the quantity Ĩ_C can be used to estimate how much is the MIR between two measurements of this network, measurements that form the subspace Ω.

Notice that I_C depends on the projection chosen (the subspace Ω) and on its dimension, whereas Ĩ_C depends on the dimension of the subspace Σ_Ω (the number 2Q of positive Lyapunov exponents). The same happens for the mutual information between random variables, which depends on the projection considered.

Equation (32) is important because it allows us to obtain an estimation for the value of I_C analytically. As an example, imagine the following network of coupled maps with a constant Jacobian:

X_{n+1}^{(i)} = 2X_n^{(i)} + σ Σ_{j=1}^{K} A_ij (X_n^{(j)} - X_n^{(i)}), mod 1, (34)
where X ∈ [0, 1] and A represents the adjacency matrix: if node j connects to node i, then A_ij = 1, and A_ij = 0 otherwise.

Assume that the nodes are connected all-to-all. Then, the K positive Lyapunov exponents of this network are λ̃_1 = log(2) and λ̃_i = log 2[1 - σK/2], with i = 2, ..., K. Assume also that the subspace Ω has dimension 2Q, that 2Q positive Lyapunov exponents are observed in this space, and that λ̃_1 = λ_1. Substituting these Lyapunov exponents in Eq. (32), we arrive at
Ĩ_C = -(2Q - 1) log(1 - σK/2). (35)

We conclude that there are two ways for Ĩ_C to increase: either one considers larger measurable subspaces Ω, or one increases the coupling between the nodes. This suggests that the larger the coupling strength is, the more information is exchanged between groups of nodes.

For arbitrary topologies one can also derive analytical formulas for Ĩ_C in this network, since λ̃_i for i > 1 can be calculated from λ̃_2 [42]. One arrives at
λ̃_i(σ) = λ̃_2(ω_i σ/2), (36)

where ω_i is the ith largest eigenvalue (in absolute value) of the Laplacian matrix L_ij = A_ij - δ_ij Σ_j A_ij.
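A sketch combining Eqs. (32) and (36) for the network of Eq. (34): the network exponents are obtained from the Laplacian eigenvalues and from the two-map result λ̃_2(s) = log|2 - 2s| of Sec. IV A. The function name, the example topology, and the zeroing of negative exponents are our own choices.

```python
import numpy as np

def I_C_tilde(A, sigma, Q):
    """~I_C of Eq. (32), using ~lambda_i(sigma) = ~lambda_2(omega_i sigma/2)
    [Eq. (36)] with ~lambda_2(s) = log|2 - 2s| for a pair of maps."""
    L = A - np.diag(A.sum(axis=1))              # Laplacian of the graph
    omega = np.sort(np.abs(np.linalg.eigvalsh(L)))[::-1]
    lam1 = np.log(2.0)                          # ~lambda_1 = log(2)
    transverse = [max(np.log(abs(2 - sigma * w)), 0.0)
                  for w in omega[:2 * Q - 1]]   # 2Q - 1 exponents
    return (2 * Q - 1) * lam1 - sum(transverse)

# all-to-all network of K = 4 maps, observed through Q = 1 measurements
A = np.ones((4, 4)) - np.eye(4)
print(I_C_tilde(A, 0.2, 1))
```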
VI. CONCLUSIONS

Concluding, we have shown a procedure to calculate the mutual information rate (MIR) between two nodes (or groups of nodes) in dynamical networks and data sets that are either mixing, or present fast decay of correlations, or have sensitivity to initial conditions, and we have proposed significant upper (I_C) and lower (I_C^l) bounds for it, in terms of the Lyapunov exponents, the expansion rates, and the capacity dimension. Since our upper bound is calculated from Lyapunov exponents or expansion rates, it can be used to estimate the MIR between data sets that have different sampling rates or experimental resolutions (e.g. the rise of the ocean level and the average temperature of the Earth), or between systems possessing a different number of events. Additionally, Lyapunov exponents can be accurately calculated even when data sets are corrupted by noise of large amplitude (observational additive noise) [37, 38], or when the system generating the data suffers from parameter alterations ("experimental drift") [39]. Our bounds link information (the MIR) and the dynamical behaviour of the system being observed with synchronisation, since the more synchronous two nodes are, the smaller λ_2 and D will be. This link can be of great help in establishing whether two nodes in a dynamical network or in a complex system not only exchange information but also have linear or non-linear interdependences, since the approaches to measure the level of synchronisation between two systems are reasonably well known and are being widely used. If variables are synchronous in a time-lag fashion [34], it was shown in Ref. [16] that the MIR is independent of the delay between the two processes. The upper bound for the MIR could be calculated by measuring the Lyapunov exponents of the network (see Supplementary Information), which are also invariant to time delays between the variables.

Acknowledgments
M. S. Baptista was partially supported by the Northern Research Partnership (NRP) and the Alexander von Humboldt Foundation. M. S. Baptista would like to thank A. Politi for discussions concerning Lyapunov exponents. R. M. Rubinger, E. R. V. Junior, and J. C. Sartorelli thank the Brazilian agencies CAPES, CNPq, FAPEMIG, and FAPESP.

[1] Shannon CE (1948) Bell System Technical Journal 27: 379-423.
[2] Strong SP, Koberle R, de Ruyter van Steveninck RR, Bialek W (1998) Phys. Rev. Lett. 80: 197-200.
[3] Sporns O, Chialvo DR, Kaiser M, Hilgetag CC (2004) Trends in Cognitive Sciences 8: 418-425.
[4] Palus M, Komárek V, Procházka T, et al. (2001) IEEE Engineering in Medicine and Biology Sep/Oct: 65-71.
[5] Donges JF, Zou Y, Marwan N, and Kurths J (2009) Eur. Phys. J. 174: 157-179.
[6] Fraser AM and Swinney HL (1986) Phys. Rev. A 33: 1134-1140.
[7] Kantz H and Schreiber T (2004) Nonlinear Time Series Analysis, Cambridge University Press.
[8] Parlitz U (1998) Nonlinear Time-Series Analysis, in Nonlinear Modelling - Advanced Black-Box Techniques, Kluwer Academic Publishers.
[9] Haykin S (2001) Communication Systems, John Wiley & Sons.
[10] Rossi F, Lendasse A, François D, Wertz V, and Verleysen M (2006) Chemometrics and Intelligent Laboratory Systems 80: 215-226.
[11] Paninski L (2003) Neural Computation 15: 1191-1253.
[12] Steuer R, Kurths J, Daub CO, et al. (2002) Bioinformatics 18: S231-S240.
[13] Papana A, Kugiumtzis D, and Larsson PG (2009) Int. J. Bifurcation and Chaos 19: 4197-4215.
[14] Baptista MS and Kurths J (2008) Phys. Rev. E 77: 026205.
[15] Baptista MS, de Carvalho JX, Hussein MS (2008) PLoS ONE 3: e3479.
[16] Blanc JL, Pezard L, and Lesne A (2011) Phys. Rev. E 84: 036214.
[17] Dobrushin RL (1959) Usp. Mat. Nauk. 14: 3-104; transl.: Amer. Math. Soc. Translations, series 2, 33: 323-438.
[18] Gray RM and Kieffer JC (1980) IEEE Transactions on Information Theory IT-26: 412-421.
[19] Verdú S (1994) IEEE Trans. Information Theory 40: 1147-1157.
[20] Kolmogorov AN (1959) Dokl. Akad. Nauk SSSR 119: 861-864; 124: 754-755.
[21] Ruelle D (1978) Bol. Soc. Bras. Mat. 9: 83-87.
[22] Pesin YaB (1977) Russ. Math. Surveys 32: 55-114.
[23] Ledrappier F and Strelcyn JM (1982) Ergod. Theory Dyn. Syst. 2: 203-219.
[24] Gao JB (1999) Phys. Rev. Lett. 83: 3178-3181.
[25] Baptista MS, Eulalie N, Pinto PRF, et al.