Fingerprints of data compression in EEG sequences
Fernando Araujo Najman 1, Antonio Galves 1, and Claudia D. Vargas 2

1 Instituto de Matemática e Estatística, Universidade de São Paulo, Brazil
[email protected]
2 Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Brazil
Abstract.
It has been classically conjectured that the brain compresses data by assigning probabilistic models to sequences of stimuli. An important issue associated to this conjecture is what class of models is used by the brain to perform its compression task. We address this issue by introducing a new statistical model selection procedure aiming to study the manner by which the brain performs data compression. Our procedure uses context tree models to represent sequences of stimuli and a new projective method for clustering EEG segments. The starting point is an experimental protocol in which EEG data is recorded while a participant is exposed to auditory stimuli generated by a stochastic chain. A simulation study using sequences of stimuli generated by two different context tree models, with EEG segments generated by two distinct algorithms, concludes this article.
Keywords: context tree models · statistical model selection · EEG data analysis · clustering algorithm for functional data

⋆ This work is part of FAPESP project "Research, Innovation and Dissemination Center for Neuromathematics" (FAPESP grant 2013/07699-0), University of São Paulo project "Mathematics, computation, language and the brain", project "Plasticity in the brain after a brachial plexus lesion" (FAPERJ grant E26/010002474/2016) and project "PROINFRA HOSPITALAR" (FINEP grant 18.569-8). Authors A.G. and C.D.V. are partially supported by CNPq fellowships (grants 311719/2016-3 and 309560/2017-9, respectively). Author C.D.V. is also partially supported by a FAPERJ fellowship (CNE grant E-26/202.785/2018). F.A.N. is supported by a PhD fellowship from the Brazilian Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES (grant 88882.377124/2019-01).

1 Introduction

The conjecture that the brain compresses data by assigning probabilistic models to sequences of stimuli can be traced back at least to the nineteenth century [5] (see for instance [8], [11], [2], [7]). Recently, [3] addressed this conjecture using the following combined probabilistic and experimental framework. EEG data is recorded while a participant is exposed to a sequence of auditory stimuli generated by a probabilistic algorithm. The question is whether his/her brain is able to identify statistical regularities in the sequence of stimuli and use them to compress the conveyed information. In this study, context tree models are used as a framework to represent the brain compression mechanism. This is a natural choice, as any stationary stochastic chain of symbols with finite memory can be seen as a context tree model [10]. The rationale behind this approach is the following. Assume that the brain is able to identify the context at each step of the stimuli sequence and that this identification is expressed through an EEG activity which differs from one context to another. The specificity of the EEG activity corresponding to each context means that the EEG segments recorded after occurrences of the same context are independent realizations of the same probability measure on a suitable space of functions. In other terms, each context defines a different probability measure on the set of trajectories which can be realized by an EEG. If this is the case, encoding the sequence of EEG segments by symbols corresponding to the different probability measures produces a compressed stochastic chain conveying essentially the same information.

To address this issue, [3] proposes collecting together all EEG signals recorded after the last stimulus of any sequence of three consecutive acoustic stimuli. Then, using the projective method [1] together with a suitable variant of the Context Algorithm [10], the equality in law of the EEG segments recorded after sequences ending with a common suffix is checked through a statistical test. If the test supports the null assumption of equality, the last element of the sequences with the same suffix is pruned. This procedure is repeated until the test rejects the null assumption. In conclusion, the tree of sequences obtained by the pruning procedure can be compared with the context tree generating the sequence of stimuli.

Applying this methodology to experimental data, [6] found that the trees retrieved from pre-frontal cortex electrodes coincided with those generating the sequences. A major drawback of this approach is the fact that the pruning procedure always produces a tree, since only the laws of the EEG segments recorded after sequences with a common suffix are compared. Therefore, the question about the class of models used by the brain to compress data remains open.

To make a step forward, let us have a closer look at the structure of the sequences of stimuli considered in [3]. These sequences assume values in A = {0, 1, 2}, where each symbol represents a distinct auditory stimulus. One of the sequences considered is the following. We start with the deterministic sequence

2 1 1 2 1 1 2 1 1 . . .

Then, for each symbol 1, we decide either to keep it, with probability (1 − ε), or to replace it by a 0, with probability ε, where ε ∈ [0, 1/2] is a fixed parameter.
This choice is made independently at each symbol 1. Let (X_0, X_1, ...) be the resulting stochastic chain.

This stochastic chain can be generated step by step by an algorithm using only information from the past. To generate X_n, we first look at the last symbol X_{n-1}.

– If X_{n-1} = 2, then X_n = 1 with probability 1 − ε, and X_n = 0 with probability ε.
– If X_{n-1} = 1 or X_{n-1} = 0, then we need to go back one more step:
  ◦ if X_{n-2} = 2, then X_n = 1 with probability 1 − ε, and X_n = 0 with probability ε;
  ◦ if X_{n-2} = 1 or X_{n-2} = 0, then X_n = 2 with probability 1.

The algorithm described above is characterized by two elements:

– a partition τ of the set of all possible sequences of past units;
– a family p of transition probabilities indexed by the elements of τ.

The partition τ described above is given by τ = {00, 10, 20, 01, 11, 21, 2}. The pair (τ, p), having as first element a context tree and as second element a family of transition probabilities indexed by the contexts in τ, is called a probabilistic context tree. A stochastic chain (X_n)_{n≥0} taking values in a finite alphabet A and generated by a probabilistic context tree (τ, p), as described above, is called a context tree model.

The second sequence of stimuli considered in [3] is also a context tree model, having τ = {000, 100, 200, 10, 20, 01, 21, 2}. The pair (τ, p) corresponding to this second context tree model is presented in Figure 2. Following [10], we call context any element of the partition τ. Observe that the partition τ can be represented by a rooted and labeled tree. Figure 1 represents the context tree and the corresponding family of transition probabilities of the first model described above.
Fig. 1. Context tree and family of probability measures corresponding to the model with context tree τ = {00, 10, 20, 01, 11, 21, 2} described in the example of Section 1.

Fig. 2. Context tree and family of probability measures corresponding to the model with context tree τ = {000, 100, 200, 10, 20, 01, 21, 2}, which is used in [3].
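To make the construction above concrete, here is a minimal sketch, in Python, of how the first chain can be simulated. This is illustrative helper code, not the code distributed with this article, and the value of EPSILON is an arbitrary placeholder: any ε ∈ [0, 1/2] fits the description above.

import random

EPSILON = 0.2  # placeholder value; the model only requires epsilon in [0, 1/2]

def next_symbol(past, eps=EPSILON):
    # Draw X_n given the past, following the two-step rule described above.
    if past[-1] == 2:
        return 1 if random.random() < 1 - eps else 0
    # The last symbol is 1 or 0: look one more step back.
    if past[-2] == 2:
        return 1 if random.random() < 1 - eps else 0
    return 2  # after two symbols in {0, 1}, a 2 follows with probability 1

def simulate_chain(n, eps=EPSILON):
    # Generate X_0, ..., X_{n-1}, starting from the admissible past (2, 1).
    x = [2, 1]
    while len(x) < n:
        x.append(next_symbol(x, eps))
    return x[:n]

print("".join(map(str, simulate_chain(30))))  # e.g. 211211201211...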
Given a context tree model, for each n ≥ 0, call C_n the only context in τ which is a suffix of the sequence (..., X_{n-2}, X_{n-1}, X_n). The examples considered in [3] both have the following property: for each n ≥ 1, the context C_n is a suffix of the sequence obtained by the concatenation of the previous context C_{n-1} and the next symbol X_n. This implies that the sequence (C_n)_{n≥0} is a Markov chain of order 1 taking values in the context tree τ. This property suggests an alternative procedure to treat the EEG data recorded using the experimental protocol employed in [3].

2 A Markov chain associated to the context tree τ

We start by introducing a general framework. Let (X_n)_{n≥0} be a stochastic chain of stimuli generated by a probabilistic context tree (τ, p) as in the example described above. Let also Y_n ∈ L²([0, T]) be the EEG segment recorded at a fixed electrode while the participant is exposed to stimulus X_n, where T is the distance in time between the onsets of two successive auditory stimuli.

Let us make the following assumptions.

1. The probabilistic context tree (τ, p) is such that, for each n ≥ 1, C_n is a suffix of the sequence C_{n-1} concatenated with X_n.
2. The law of the EEG segment Y_n is a function of the context C_n. We denote this law Q_{C_n}.
3. If w and w′ are contexts belonging to τ, and Q_w and Q_{w′} are the probability measures on L²([0, T]) associated to w and w′ respectively, then Q_w = Q_{w′} if and only if w = w′.

Let Q_τ be the set of probability measures {Q_w : w ∈ τ}. By the third assumption above, there is a one-to-one correspondence between Q_τ and τ. Ordering the contexts belonging to τ in the lexicographic order, we can represent the set of contexts τ by the set of positive integers {1, ..., |τ|}. Let I*_n be the unique element of the set {1, ..., |τ|} corresponding to the law Q_{C_n}.
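As an illustration of these definitions, the following sketch (hypothetical helper code, not taken from [3]) reads off C_n by suffix matching and encodes it by its lexicographic rank, which is exactly the index I*_n when the assumptions above hold.

TAU_1 = ["00", "10", "20", "01", "11", "21", "2"]  # first context tree of Section 1

def context_at(x, n, tau=TAU_1):
    # C_n: the unique context in tau that is a suffix of (..., X_{n-1}, X_n).
    past = "".join(map(str, x[: n + 1]))
    for w in tau:
        if past.endswith(w):
            return w
    raise ValueError("no context is a suffix of the given past")

def index_sequence(x, tau=TAU_1):
    # I*_n: rank of C_n among the contexts of tau in lexicographic order.
    ranked = sorted(tau)
    # Start at n = 1 so that at least two past symbols are available.
    return [ranked.index(context_at(x, n, tau)) + 1 for n in range(1, len(x))]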
Theorem 1. Under the assumptions presented above, the sequence (I*_n)_{n≥0} is a Markov chain of order 1, taking values in the set {1, ..., |τ|}.

If the sequence of stimuli and the corresponding sequence of EEG segments satisfy the assumptions presented above, then Theorem 1 suggests a new way to look at the sequence (Y_n)_{n≥0} produced under the experimental protocol of [3]. This is the content of the next section.

3 Clustering the EEG segments

Take a positive integer l ≥ h(τ), where h(τ) is the maximal length of the contexts belonging to τ. Given a sample ((X_0, Y_0), ..., (X_N, Y_N)) generated as described above, for all u ∈ A^l let

𝒴^u = {Y_n : n = l − 1, ..., N, (X_{n−l+1}, ..., X_n) = u}.

If the EEG segments collected in 𝒴^u have all been generated by the same probability measure Q_u on L²([0, T]), then the sample 𝒴^u can be used to approximate Q_u, and the approximation improves as the length N of the sample diverges. Therefore, the closeness in a suitable distance between two sets 𝒴^u and 𝒴^{u′}, for different strings u and u′, should give an indication of the closeness between the probability measures Q_u and Q_{u′}. We now implement this idea using the projective method introduced in [1].

We start by generating a realization B = (B(t) : t ∈ [0, T]) of the Brownian bridge. Then we project all the EEG segments (Y_0, ..., Y_N) onto this fixed realization of the Brownian bridge. This is done as follows. For every n = 0, ..., N, the projection of Y_n in the direction B is obtained by the inner product in the Hilbert space L²([0, T]),

R_{n,B} = ∫_0^T B(t) Y_n(t) dt.

For each u ∈ A^l, the projection in the direction B of the set 𝒴^u is naturally defined as

𝒴^u_B = {R_{n,B} : Y_n ∈ 𝒴^u}.

Let F^u_B be the empirical distribution function obtained from the sample of real numbers 𝒴^u_B,

F^u_B(t) = (1/|𝒴^u_B|) Σ_{y ∈ 𝒴^u_B} 1{y ≤ t}, t ∈ ℝ.

Then, for each pair of sequences u and v in A^l, we define the renormalized Kolmogorov–Smirnov distance between the empirical measures obtained from the samples 𝒴^u_B and 𝒴^v_B,

KS(𝒴^u_B, 𝒴^v_B) = √( |𝒴^u_B| |𝒴^v_B| / (|𝒴^u_B| + |𝒴^v_B|) ) sup_{t ∈ ℝ} | ∫_{−∞}^{t} [F^u_B(x) − F^v_B(x)] dx |.

Let B_1, ..., B_M be independent copies of the Brownian bridge. We define the distance between 𝒴^u and 𝒴^v as

D_M(𝒴^u, 𝒴^v) = (1/M) Σ_{i=1}^{M} KS(𝒴^u_{B_i}, 𝒴^v_{B_i}).

For details on the projective method we refer the reader to [3].

For k = 2, ..., |A|^l, we want to partition the set {𝒴^u : u ∈ A^l} as follows. We start by choosing arbitrarily k sequences u_1, ..., u_k such that, for all i and j with i ≠ j, u_i ≠ u_j. These sequences are used as medoids of the first candidate partition, defined as follows: P_{u_j} is the set of all v ∈ A^l such that

D_M(𝒴^{u_j}, 𝒴^v) = min{D_M(𝒴^{u_i}, 𝒴^v) : i = 1, ..., k}.

Now we iterate the procedure. For each j = 1, ..., k, we redefine the medoid

u_j = argmin_{v ∈ P_{u_j}} Σ_{v′ ∈ P_{u_j}, v′ ≠ v} D_M(𝒴^v, 𝒴^{v′}),

and then redefine P_{u_j} as the set of all v ∈ A^l such that

D_M(𝒴^{u_j}, 𝒴^v) = min{D_M(𝒴^{u_i}, 𝒴^v) : i = 1, ..., k}.

The partition 𝒞^k = {𝒞^k_1, ..., 𝒞^k_k} is the limit of the procedure described above. For more details on the algorithm we refer the reader to [9].

We can now use the partition 𝒞^k to encode the sequence of EEG segments (Y_0, ..., Y_N) as follows. Let I^k_n be the index of the cluster containing the EEG segment Y_n.
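The distance D_M can be computed directly from the definitions above. The sketch below is a discretized illustration under the assumption that each EEG segment is stored as a vector of T samples taken at unit time steps, so that the integrals become sums; all function names are ours, not from the repository accompanying this article.

import numpy as np

def brownian_bridge(T, rng):
    # One realization of a Brownian bridge on the grid 1, ..., T.
    w = np.cumsum(rng.normal(size=T))
    t = np.arange(1, T + 1)
    return w - (t / T) * w[-1]

def project(segments, bridge):
    # R_{n,B}: discretized inner product of each segment with the bridge;
    # segments is an array of shape (number of segments, T).
    return segments @ bridge

def ks_statistic(a, b):
    # Renormalized statistic with the integrated difference of empirical CDFs.
    grid = np.sort(np.concatenate([a, b]))
    fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    diff = fa - fb  # step function, constant between consecutive grid points
    integral = np.concatenate([[0.0], np.cumsum(diff[:-1] * np.diff(grid))])
    return np.sqrt(len(a) * len(b) / (len(a) + len(b))) * np.max(np.abs(integral))

def d_m(seg_u, seg_v, m=100, T=250, seed=0):
    # D_M: average of the statistic over m independent Brownian bridges.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(m):
        bridge = brownian_bridge(T, rng)
        total += ks_statistic(project(seg_u, bridge), project(seg_v, bridge))
    return total / m

The matrix of pairwise distances D_M(𝒴^u, 𝒴^v) computed this way can then be fed to any k-medoids scheme of the type analyzed in [9].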
When k = |τ|, Theorem 1 predicts that the sequence (I^k_n)_{n≥0} is a Markov chain with memory of order 1. In the next section we address this prediction using a simulation study.

4 A simulation study

To discuss the performance of the proposed method, we conducted a simulation study following the assumptions in Sections 2 and 3, using different values N = 300, 600, 900 for the length of the simulated sample ((X_0, Y_0), ..., (X_N, Y_N)).

The sequences of stimuli were generated by the two context tree models presented in Section 1, henceforth called (τ_1, p_1) and (τ_2, p_2) respectively, with a fixed value of ε. The EEG segments were generated by two different algorithms. The first one, henceforth called the Average Model Algorithm, can be described as follows.

1. For each context w ∈ τ, choose independently and uniformly 40 real numbers s^w = {s^w_1, ..., s^w_40} in a fixed interval.
2. For each s^w_i, define the function

φ^w_i(t) = sin(2π s^w_i t / 250), t = 1, ..., 250.
3. For each context w, compute the average vector

S_w(t) = (1/40) Σ_{i=1}^{40} φ^w_i(t).

4. For n = h(τ) − 1, ..., N, simulate the EEG segment as

Y_n(t) = S_{C_n}(t) + ξ_n(t),

where C_n is the context present at step n, and ξ_n(s), s = 1, ..., 250, are independent random variables with normal distribution of mean 0 and a fixed variance. Figures 3 and 4 present examples of the sets of random averages {S_w : w ∈ τ}.

The second algorithm used to generate the EEG vector, henceforth called the Auto-Regressive Algorithm, can be described as follows.

– For each context w ∈ τ, choose independently and uniformly 2 real numbers θ^w = {θ^w_0, θ^w_1} in a fixed interval.
– For all n = 1, ..., N, let Y_n(1) = 0.
– For each n = h(τ), ..., N and for each t = 2, ..., 250, simulate the EEG segment as

Y_n(t) = θ^{C_n}_0 + θ^{C_n}_1 Y_n(t − 1) + ξ_n(t),

where C_n is the context present at step n, and ξ_n(t), t = 1, ..., 250, are independent random variables with normal distribution of mean 0 and a fixed variance.
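For the record, here is a sketch of the two generators. The constants below (the sampling interval of the s^w_i, the noise level sigma and the range of the θ^w coefficients, which is supplied by the caller) are placeholders: they are fixed in each simulation, but the exact values are not essential to the structure of the algorithms.

import numpy as np

T = 250  # number of samples per EEG segment, as in the formulas above
rng = np.random.default_rng(1)

def average_model_templates(contexts, n_freq=40, s_max=20.0):
    # S_w: average of n_freq random sinusoids per context (s_max is assumed).
    t = np.arange(1, T + 1)
    templates = {}
    for w in contexts:
        s = rng.uniform(0.0, s_max, size=n_freq)       # the numbers s_i^w
        phis = np.sin(2 * np.pi * np.outer(s, t) / T)  # the functions phi_i^w
        templates[w] = phis.mean(axis=0)               # S_w(t)
    return templates

def average_model_segment(context, templates, sigma=0.1):
    # Y_n(t) = S_{C_n}(t) + xi_n(t), with an assumed noise level sigma.
    return templates[context] + rng.normal(0.0, sigma, size=T)

def ar_segment(context, theta, sigma=0.1):
    # Y_n(1) = 0 and Y_n(t) = theta_0 + theta_1 * Y_n(t-1) + xi_n(t),
    # where theta maps each context to its pair (theta_0, theta_1).
    th0, th1 = theta[context]
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = th0 + th1 * y[t - 1] + rng.normal(0.0, sigma)
    return y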
Fig. 3. The upper image presents one of the random averages S_w, for a fixed w ∈ τ_1, used in the Average Model Algorithm. The lower image presents all the random averages S_w, w ∈ τ_1, used in the same simulation.
For each of the two probabilistic context trees, each of the two EEG generating algorithms and each value N = 300, 600, 900, one hundred samples of length N were simulated. Each simulation was made using a new set of random parameters chosen independently.

The clustering of the data was done with the procedure described in Section 3. For each sample we used M = 5000 independently generated Brownian bridges to perform the projections required to define D_M.

Let (I^1_n)_{n≥h(τ_1)} and (I^2_n)_{n≥h(τ_2)} be the index sequences obtained by the method described above for τ = τ_1 or τ = τ_2, respectively. To identify the order of the chains (I^1_n)_{n≥h(τ_1)} and (I^2_n)_{n≥h(τ_2)} we used a slightly modified version of the statistical model selection procedure SMC introduced in [4].
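The details of the SMC procedure can be found in [4]. As a rough, self-contained stand-in that conveys the flavor of the task, the following sketch (not the SMC procedure, and not the code in our repository) assigns a memory to a finite-alphabet index sequence by the Bayesian Information Criterion.

import math
from collections import Counter

def bic_markov_order(seq, max_order=4):
    # Pick the Markov order of a finite-alphabet sequence by BIC.
    alphabet = sorted(set(seq))
    n = len(seq)
    best_order, best_bic = 0, float("inf")
    for k in range(max_order + 1):
        trans = Counter((tuple(seq[i - k:i]), seq[i]) for i in range(k, n))
        ctx = Counter(tuple(seq[i - k:i]) for i in range(k, n))
        loglik = sum(c * math.log(c / ctx[w]) for (w, _), c in trans.items())
        n_params = len(ctx) * (len(alphabet) - 1)  # free transition probabilities
        bic = -2.0 * loglik + n_params * math.log(n - k)
        if bic < best_bic:
            best_order, best_bic = k, bic
    return best_order

Applied to a sequence of cluster indexes (I^k_n), such a criterion returns 1 when the first-order Markov description dominates.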
All the codes used for the simulations, including the version of the SMC procedure we implemented, are available at https://github.com/FernandoNajman/Fingerprints-of-data-compression-in-EEG-sequences.

Fig. 4. The upper image presents one of the random averages S_w, for a fixed w ∈ τ_2, used in the Average Model Algorithm. The lower image presents all the random averages S_w, w ∈ τ_2, used in the same simulation.

As predicted by Theorem 1, we observed that for the majority of simulated samples generated by the two probabilistic context trees (τ_1, p_1) and (τ_2, p_2) and the two EEG generating algorithms, the SMC procedure identified the retrieved sequences of cluster indexes (I^1_n : n = h(τ_1), ..., N) and (I^2_n : n = h(τ_2), ..., N) as Markov chains of order 1. These results are summarized in Tables 1, 2, 3 and 4. The results obtained from the simulations employing the two probabilistic context trees and the two EEG generators are very similar.

Moreover, even with the smallest value N = 300, around 80% of the obtained sequences of indexes were identified as order 1 Markov chains by the SMC procedure.

Furthermore, as expected, for each probabilistic context tree and each EEG generating algorithm, the number of simulations in which the SMC procedure identified the sequences of cluster indexes as Markov chains of order 1 increases as the length N of the sample increases. This was expected as a consequence of the consistency of the SMC procedure [4]. It is also an indication of the accuracy of the method introduced in the present article to identify the different probability measures generating the EEG segments.

In conclusion, this article presented a new statistical model selection procedure aiming to identify the manner by which the brain performs data compression. It also introduced a new method for clustering functional data which can find use beyond its original neurobiological motivation.
Table 1. Simulation results. For each N = 300, 600, 900, one hundred samples of length N were simulated with sequences of stimuli generated by (τ_1, p_1), and the corresponding EEG segments were produced using the Average Model Algorithm. N(1), N(2), N(3) and N(4) indicate the number of times the statistical selection procedure SMC assigned memory 1, 2, 3 and 4, respectively, to the simulated samples.

          N(1)  N(2)  N(3)  N(4)
N = 300     83     9     7     1
N = 600     91     8     1     0
N = 900    100     0     0     0

Table 2. Simulation results. For each N = 300, 600, 900, one hundred samples of length N were simulated with sequences of stimuli generated by (τ_1, p_1), and the corresponding EEG segments were produced using the Auto-Regressive Algorithm. N(1), N(2), N(3) and N(4) are as in Table 1.

          N(1)  N(2)  N(3)  N(4)
N = 300     85    12     1     2
N = 600     93     4     3     0
N = 900     96     4     0     0

Table 3. Simulation results. For each N = 300, 600, 900, one hundred samples of length N were simulated with sequences of stimuli generated by (τ_2, p_2), and the corresponding EEG segments were produced using the Average Model Algorithm. N(1), N(2), N(3) and N(4) are as in Table 1.

          N(1)  N(2)  N(3)  N(4)
N = 300     77    13     8     2
N = 600     87    11     0     2
N = 900     93     7     0     0

Table 4. Simulation results. For each N = 300, 600, 900, one hundred samples of length N were simulated with sequences of stimuli generated by (τ_2, p_2), and the corresponding EEG segments were produced using the Auto-Regressive Algorithm. N(1), N(2), N(3) and N(4) are as in Table 1.

          N(1)  N(2)  N(3)  N(4)
N = 300     87    11     1     1
N = 600     93     6     1     0
N = 900     94     6     0     0

References
1. Cuesta-Albertos, J.A., Fraiman, R., Ransford, T.: Random projections and goodness-of-fit tests in infinite-dimensional spaces. Bulletin of the Brazilian Mathematical Society, New Series 37(4), 477–501 (2006). https://doi.org/10.1007/s00574-006-0023-0
2. Doya, K., Ishii, S., Pouget, A., Rao, R.P.: Bayesian brain: Probabilistic approaches to neural coding. MIT Press (2007)
3. Duarte, A., Fraiman, R., Galves, A., Ost, G., Vargas, C.D.: Retrieving a context tree from EEG data. Mathematics 7(5), 427 (May 2019). https://doi.org/10.3390/math7050427
4. Galves, A., Galves, C., Garcia, J.E., Garcia, N.L., Leonardi, F.: Context tree selection and linguistic rhythm retrieval from written texts. The Annals of Applied Statistics 6(1), 186–209 (2012)
5. von Helmholtz, H.: Handbuch der physiologischen Optik, vol. III. Leopold Voss (1867). Translated by The Optical Society of America in 1924 from the third German edition (1910) as Treatise on Physiological Optics, Vol. III
6. Hernández, N., Neto, R.M.d.A., Duarte, A., Ost, G., Fraiman, R., Galves, A., Vargas, C.D.: Retrieving the structure of probabilistic sequences of auditory stimuli from EEG data. arXiv preprint arXiv:2001.11502 (2020)
7. Huang, Y., Rao, R.P.: Predictive coding. Wiley Interdisciplinary Reviews: Cognitive Science 2(5), 580–593 (2011)
8. Maheu, M., Dehaene, S., Meyniel, F.: Brain signatures of a multiscale process of sequence learning in humans. eLife 8, e41541 (Feb 2019). https://doi.org/10.7554/eLife.41541
9. Park, H.S., Jun, C.H.: A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications 36(2), 3336–3341 (2009)
10. Rissanen, J.: A universal data compression system. IEEE Transactions on Information Theory 29(5), 656–664 (Sep 1983). https://doi.org/10.1109/TIT.1983.1056741
11. Rubin, J., Ulanovsky, N., Nelken, I., Tishby, N.: The representation of prediction error in auditory cortex. PLoS Computational Biology 12 (2016)