Characterising Alzheimer's Disease with EEG-based Energy Landscape Analysis
Dominik Klepl, Fei He, Min Wu, Daniel J. Blackburn, Matteo De Marco, Ptolemaios Sarrigiannis
Centre for Data Science, Coventry University, Coventry CV1 2JH, UK
Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
Department of Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
Department of Neurophysiology, Royal Devon and Exeter NHS Foundation Trust, Exeter, EX2 5DW, UK
Correspondence to: [email protected]
Alzheimer’s disease (AD) is one of the most common neurodegenerative diseases, with around 50 million patients worldwide. Accessible and non-invasive methods of diagnosing and characterising AD are therefore urgently required. Electroencephalography (EEG) fulfils these criteria and is often used when studying AD. Several features derived from EEG were shown to predict AD with high accuracy, e.g. signal complexity and synchronisation. However, the dynamics of how the brain transitions between stable states have not been properly studied in the case of AD and EEG data. Energy landscape analysis is a method that can be used to quantify these dynamics. This work presents the first application of this method to both AD and EEG. An energy landscape assigns an energy value to each possible state, i.e. pattern of activations across brain regions. The energy is inversely related to the probability of occurrence. By studying the features of the energy landscapes of 20 AD patients and 20 healthy age-matched counterparts, significant differences were found. The dynamics of AD patients’ brain networks were shown to be more constrained - with more local minima, less variation in basin size, and smaller basins. We show that energy landscapes can predict AD with high accuracy, performing significantly better than baseline models.
I. INTRODUCTION
Alzheimer’s Disease (AD) is a neurodegenerative disorder causing neuronal cell death that leads to dementia, and AD accounts for 70% of dementia cases. With nearly 50 million patients, it is the most common neurodegenerative disorder in the world [1]. Although there is no cure for AD, an early and precise diagnosis can be crucial to prevent or delay the progression of dementia and thus to improve the quality of life of AD patients [2]. As several new treatments for AD are undergoing evaluation in clinical trials, sensitive, non-invasive and reproducible biomarkers of brain function are urgently required i) to identify and recruit patients in the prodromal phase of the disease, ii) to be implemented as objective outcome measures and iii) to monitor disease progression and potential response to novel treatments [3]. Since current diagnosis mostly relies on neuroimaging scans and invasive tests, both time-consuming and expensive to perform, low-cost but precise diagnostic methods are required [2]. An EEG-based biomarker that can be used as a surrogate endpoint to study the effects of new therapeutic approaches in AD can become a game-changer in running large pharmaceutical trials.

Resting-state electroencephalography (EEG) is a widely and commonly used method in everyday clinical practice, mainly to provide evidence for the diagnosis, classification and management of patients with epilepsy and various other brain disorders (e.g. dementia). EEG is painless, economical, non-invasive, easy to administer and widely available in most hospitals [2, 4]. EEG measures the changes in electrical currents generated by large populations of cortical neurons. Compared to other neuroimaging methods, EEG provides high temporal resolution on the scale of milliseconds.
Recording EEG at rest is advantageous when examining AD patients as it requires little cooperation and is not stressful. Several characteristic EEG features of AD patients have been documented, such as slowing of signals [5–7], reduced complexity [8–10] and decreased synchronisation [4, 11, 12]. However, previous work mainly analysed individual channels, or pairs of channels [4–7, 11, 12]. In comparison, we use pairwise analysis only as a basis for estimating global properties of the system (or the region of interest), e.g. a group of 10 channels. Rather than claim that our approach achieves better classification accuracy, the focus of this study is to characterise AD from a novel perspective – the global system, network and information-theoretic energy viewpoint.

Although neuron loss and an accumulation of protein aggregates, the beta-amyloid plaques, are a diagnostic hallmark of Alzheimer’s disease, the exact cause of neurodegeneration remains to be elucidated. The early loss of neocortical synapses appears also to play a key role in brain network dysfunction and correlates well with various cognitive deficits [13, 14] and altered functional connectivity [12, 13, 15]. Thus, AD can be viewed as a network disorder.

Aiming to characterise brain network organisation in AD, this study models the EEG using the energy landscape (EL). This approach allows quantifying global characteristics of a dynamic complex system such as the brain. As a powerful emerging method in neuroscience, EL conceptualises the brain signals as a network of distinct states. Each state has an energy, which refers to the negative log probability that the system is in a given state [16]. Note that energy is an information-theoretic measure and has no link to the physical concept of energy. A pairwise maximum entropy model (pMEM) is used to estimate the energy of each state.
Schneidman et al. [17] demonstrated the first application of pMEM and EL to study the dynamics of neural networks on a single-cell level. More recently, EL has been applied to various neuroimaging data such as fMRI [16, 18–20] and MEG [21]. To our knowledge, this study is the first attempt to extend the method to EEG analysis.

The EL method was mostly used to analyse fMRI data, whose characteristics differ from those of EEG. The spatial resolution of fMRI is considerably higher than that of EEG [22], which allowed previous fMRI work to focus on the analysis of specific brain networks with rather accurate anatomical and functional definitions. Therefore, in previous studies, the brain regions were manually selected [16–21]. However, EEG does not allow accurate monitoring of specific networks, as it measures the activity of macro-regions. Unless a structural scan of the brain is available along with the EEG data, which would allow source localisation, it is impossible to perform the analysis on the same level as with fMRI. We overcome the low spatial resolution of EEG by adopting a purely sensor-level, data-driven approach for selecting brain regions to analyse. We use two channel selection methods to optimise predictive accuracy and channel activity while taking into account the nonlinear relationships between the EEG channels (see Methods for details).

The second difference between fMRI and EEG data is the temporal resolution. While fMRI measures the neural activity on a scale of seconds, EEG can measure on a scale of milliseconds [22]. However, this difference does not seem to pose any methodological issues for adapting EL to EEG data.

It has been shown that the EL constructed using pMEM is a powerful, yet relatively simple method to study network connectivity, outperforming all traditional connectivity measures [23]. It has been used to study resting state dynamics [18], multi-stability and state transitions during rest [16], module dynamics [19] and juvenile myoclonic epilepsy [21].
However, there is a lack of studies investigating the EL characteristics of brain disorders, except for epilepsy. This study aims to quantify the differences in the EL of AD patients compared to age-matched healthy participants and to use these differences to automatically classify patients with AD.

II. DATA
This study uses EEG recordings collected from 20 AD patients and 20 healthy participants (HC) under 70 years of age. For a detailed description of the EEG electrode configuration, experimental design and confirmation of diagnosis, see [4]. All AD participants were recruited in the Sheffield Teaching Hospital memory clinic, which focuses mainly on young-onset memory disorders. The participants were diagnosed with AD between 1 month and 2 years prior to data collection. Their diagnosis was determined using evidence from medical history, a battery of psychological tests and neurological and neuroradiological examinations. All of them were in the mild to moderate stage of the disease at the time of recording, with an average Mini Mental State Examination score of 20 (sd = 4). To eliminate alternative causes of dementia, high resolution structural magnetic resonance imaging (MRI) scans of all patients were acquired. Age- and gender-matched HC participants (whose neuropsychological tests and structural MRI scans were normal) were also recruited.

All EEG data were recorded using an XLTEK 128-channel headbox with Ag/AgCl electrodes placed on the scalp and a sampling frequency of 2 kHz. A modified 10-10 system overlapping the 10-20 international system of electrode placement was adopted. A referential montage with a linked earlobe reference was used. The recordings lasted 30 minutes, during which the participants were instructed to rest and not to think about anything specific. Within the 30-minute recording, there were two-minute-long epochs during which the participants had their eyes open (EO) or closed (EC).

All the recordings were reviewed by an experienced neurophysiologist on the XLTEK review station with time-locked video recordings (Optima Medical LTD). For each participant, three 12-second-long artefact-free EO and EC epochs were isolated.
Finally, to avoid volume conduction effects related to the common reference electrodes, the following 23 bipolar channels were created: F8–F4, F7–F3, F4–C4, F3–C3, F4–FZ, FZ–CZ, F3–FZ, T4–C4, T3–C3, C4–CZ, C3–CZ, CZ–PZ, C4–P4, C3–P3, T4–T6, T3–T5, P4–PZ, P3–PZ, T6–O2, T5–O1, P4–O2, P3–O1 and O2–O1.

A. Data preprocessing
Typically, EEG is susceptible to noise. The noise can originate from external sources such as the power line, the participant’s movements, and muscle contraction artefacts. In this study, a Fourier transform filter (FTF) is used to denoise the signals. The removed frequencies were informed by the findings of a meta-analysis on resting-state EEG of AD patients [2]. The following frequencies were removed: 0–0.5 Hz, relating to slow artefacts and eye-blinks; 50 Hz, relating to the power line noise; and 100 Hz and above, which could result from muscle movement. As a result, the cleaned signals are within 0.5 and 100 Hz. Then, we separate the EEG signals into frequency bands (further referred to as bands) to analyse the data in finer detail. We create six bands using the FTF: delta, theta, alpha, beta, gamma and the full band; the amplitude envelope of each band is used in this study.
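As an illustration, the FTF denoising step can be sketched as a hard spectral mask (a minimal sketch; the function name and the example signal are ours, not taken from the study's code):

```python
import numpy as np

def ftf_filter(signal, fs, low, high):
    """Fourier transform filter: zero every FFT coefficient whose
    frequency lies outside [low, high) Hz, then invert the transform."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs >= high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Example: a 10 Hz rhythm contaminated by a 150 Hz (muscle-range) component.
fs = 2000                              # sampling frequency used in the study
t = np.arange(0, 2, 1.0 / fs)
raw = np.sin(2 * np.pi * 10 * t) + 0.7 * np.sin(2 * np.pi * 150 * t)
clean = ftf_filter(raw, fs, 0.5, 100)  # keep only the 0.5-100 Hz range
```

The same masking, applied with band-specific edges, would yield the individual frequency bands.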
FIG. 1: A conceptual schematic of implementing energy landscape analysis on EEG data. A) 10 EEG channels are selected using a combination of filter and wrapper methods. The continuous signal from each channel is then thresholded: values above the mean become 1 and values below the mean become -1. B) The binarised dataset. Each row represents a brain state at a given timepoint. C) The energy of each brain state is estimated using the pairwise maximum entropy model (pMEM). The pMEM estimates the energy using the J and h parameters, which represent the functional connectivity between channels and the base rate of activation, respectively. D) Features of the constructed landscape can be extracted. We count the number of local minima (LM), the standard deviation of basin sizes and the mean energy difference between the LMs and the global minimum (GM), and simulate the duration in the basin of the GM.
III. METHODS
In this study, we analysed the EL of resting EEG signals. The preprocessed data are used to select 10 channels. The channel selection uses an entropy-based method and backward elimination jointly. Then the continuous signals are binarised to construct ELs. The EL models the probability of occurrence of a state by using the concept of energy. The EL can be estimated with the use of the pMEM. In this study, we estimate the pMEM using the gradient ascent method described by [18]. The pMEM is fitted to the binarised EEG signals. The goodness-of-fit is evaluated using an accuracy index r [18, 21]. The advantage of EL analysis is tested by comparing the predictive power of the energy of all states to baseline models based on pairwise connectivity. Multiple features of the ELs are then extracted in order to analyse the differences between AD and HC.

A. Channel selection and signal binarisation
The number of channels needs to be reduced, since the computational cost of constructing the EL increases exponentially with the number of channels. The channels are selected in an automatic, data-driven manner. The channel selection optimises predictive accuracy and channel activity while accounting for nonlinear relationships between the channels. Technically, a filter and a wrapper selection method are implemented [24].

Predictive accuracy is optimised as we aim to investigate the predictive power of the EL. Therefore, the channels providing the highest accuracy are selected. Backward elimination with an SVM is implemented to select these channels. An SVM with 5-fold cross-validation is used, as it can detect both linear and nonlinear patterns in the data. The folds are created so that data from the same participant are kept within the same fold to prevent information leakage. The importance of each channel is evaluated using permutation importance [25]. The backward elimination is initialised with all channels being used to train the SVM. In each iteration, the channel with the lowest importance is removed, and the model is re-trained. This is repeated while the cross-validated accuracy increases.

Channel activity is optimised because constructing the EL from channels with high activity ensures that state transitions are maximised. Such channel selection simulates the manual region selection reported in the literature, as it selects the most interesting channels given the data. An entropy filter method was selected for this purpose. Entropy was used rather than variance, as it remains a stable estimate of activity with any distribution. Kernel density estimation was used to estimate the probability distribution of each channel. The 15 channels with the highest entropy are then selected.

Ten of the channels selected jointly by both methods are chosen as the final channels for each band. If there are more than 10 common channels, the SVM method is preferred.
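The entropy filter can be sketched as follows (a minimal sketch; the plug-in entropy estimator and the function names are our assumptions, not the study's exact implementation):

```python
import numpy as np
from scipy.stats import gaussian_kde

def channel_entropy(channel):
    """Plug-in estimate of differential entropy: fit a Gaussian KDE to the
    channel and average -log p(x) over the observed samples."""
    kde = gaussian_kde(channel)
    return float(-np.mean(np.log(kde(channel))))

def entropy_filter(data, k=15):
    """Return the indices of the k channels with the highest entropy.

    data: (n_channels, n_samples) array of continuous EEG signals."""
    entropies = np.array([channel_entropy(ch) for ch in data])
    return np.argsort(entropies)[::-1][:k]
```

The final channel set would then be the overlap of this ranking with the SVM wrapper ranking, as described above.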
If there are fewer than 10 common channels, the highest-ranking channels in either selection method are selected, with the SVM method being preferred.

Next, the continuous EEG signals from the selected channels are binarised so that values above the mean become 1, and -1 otherwise. As a result, for each EEG recording with N electrodes we obtain a sequence of binary signals {σ_i(1), ..., σ_i(T)}, where T is the number of recorded samples. σ_i(t) = 1 means that the brain region measured by electrode i is active at time t. Thus, the state of the system at time t is represented by an N-dimensional vector s(t) = (σ_1(t), ..., σ_N(t)) ∈ {−1, 1}^N. There are 2^N possible states s_k (k = 1, ..., 2^N).

Previously, the sample mean computed from the whole time series had been used. However, EEG signals are non-stationary, and thus computing means over several windows might be necessary, and the window size needs to be optimised.

B. Pairwise maximum entropy model
The pMEM describes the probability of each state using the Boltzmann distribution and two sets of parameters, h and J. h_i quantifies the baseline activity of the i-th electrode, while J_ij quantifies the interaction between the i-th and j-th electrodes.

We start by calculating the empirical frequency of each state s_k, P_emp(s_k), and then calculate the empirical activation rate of each electrode ⟨σ_i⟩_emp and the pairwise co-occurrence of any two electrodes ⟨σ_i σ_j⟩_emp from

⟨σ_i⟩_emp = (1/T) Σ_{t=1}^{T} σ_i(t)   (1)

⟨σ_i σ_j⟩_emp = (1/T) Σ_{t=1}^{T} σ_i(t) σ_j(t)   (2)

Next, we fit the Boltzmann distribution to P_emp(s_k), given by:

P_pMEM(s_k | h, J) = exp[−E(s_k | h, J)] / Σ_{k'=1}^{2^N} exp[−E(s_{k'} | h, J)]   (3)

where E(s_k) denotes the energy of the state s_k, given by

E(s_k) = − Σ_{i=1}^{N} h_i σ_i(s_k) − Σ_{i=1}^{N} Σ_{j=1, j≠i}^{N} J_ij σ_i(s_k) σ_j(s_k)   (4)

Here σ_i(s_k) indicates the i-th element of the state s_k. h and J are the parameters of the model, described above, that are to be estimated from the data. Based on the maximum entropy principle, the parameters h and J are selected so that ⟨σ_i⟩_emp = ⟨σ_i⟩_mod and ⟨σ_i σ_j⟩_emp = ⟨σ_i σ_j⟩_mod. The activation rate ⟨σ_i⟩_mod and pairwise co-occurrence ⟨σ_i σ_j⟩_mod predicted by the model are given by

⟨σ_i⟩_mod = Σ_{k=1}^{2^N} σ_i(s_k) P_pMEM(s_k | h, J)   (5)

⟨σ_i σ_j⟩_mod = Σ_{k=1}^{2^N} σ_i(s_k) σ_j(s_k) P_pMEM(s_k | h, J)   (6)

We estimate h and J from the data using a gradient ascent scheme [18].
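A minimal sketch of the binarisation and the moment-matching gradient ascent described above (a compact illustration under our own naming; the numerical defaults are placeholders, not the study's settings):

```python
import numpy as np
from itertools import product

def binarise(signals):
    """Threshold each channel at its mean: +1 above the mean, -1 otherwise."""
    return np.where(signals >= signals.mean(axis=1, keepdims=True), 1, -1)

def all_states(n):
    """All 2**n states as rows of a (2**n, n) array of +/-1 entries."""
    return np.array(list(product([-1, 1], repeat=n)))

def pmem_probs(h, J, states):
    """Boltzmann distribution over all states for given h, J (eq. 3-4)."""
    energy = -states @ h - np.einsum('ki,ij,kj->k', states, J, states)
    p = np.exp(-(energy - energy.min()))   # shift for numerical stability
    return p / p.sum()

def fit_pmem(binary, lr=0.1, tol=5e-5, max_iter=50000):
    """Gradient ascent on the pMEM log-likelihood: adjust h and J until the
    model moments (eq. 5-6) match the empirical moments (eq. 1-2)."""
    n, T = binary.shape
    m_emp = binary.mean(axis=1)            # <sigma_i>_emp
    c_emp = binary @ binary.T / T          # <sigma_i sigma_j>_emp
    states = all_states(n)
    h, J = np.zeros(n), np.zeros((n, n))
    for _ in range(max_iter):
        p = pmem_probs(h, J, states)
        dh = m_emp - p @ states                        # moment mismatch
        dJ = c_emp - states.T @ (states * p[:, None])
        np.fill_diagonal(dJ, 0.0)                      # no self-coupling
        h += lr * dh
        J += lr * dJ
        if max(np.abs(dh).max(), np.abs(dJ).max()) < tol:
            break
    return h, J
```

Because `all_states` enumerates 2^n rows, this is only feasible for roughly 10-15 channels, which is why the channel count is reduced first.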
The learning rate is 0.1, with a fixed stopping criterion (5 × 10^−) and a maximum number of iterations (1 × 10^).

C. Goodness-of-fit
Previous studies report several metrics to evaluate how well the pMEM approximates the empirical data and what the contribution of including pairwise interactions in the model is. We use the accuracy index r [18, 21, 23], given by:

r = (D_1 − D_2) / D_1   (7)

where D_2 represents the Kullback-Leibler divergence between the probability distribution estimated with the pMEM (3) and the empirical data, which is given by:

D_2 = Σ_{k=1}^{2^N} P_emp(s_k) log [ P_emp(s_k) / P_pMEM(s_k) ]   (8)

D_1 represents the Kullback-Leibler divergence between the probability distribution estimated with the independent maximum entropy model (iMEM) and the empirical data. The iMEM does not consider any pairwise interactions, i.e. J = 0.

Thus, r represents the contribution of pairwise interactions: r = 1 when the pMEM predicts the empirical distribution without error, and r = 0 when including the pairwise interactions does not contribute to the prediction of the empirical distribution.

D. Constructing energy landscape
After the pMEM model is fitted, (4) can be used to obtain the energy of each of the 2^N possible states s_k. The EL can then be framed as an undirected network of states where edges denote transitions between two states. We assume gradual transitions between the states, so an edge connects two states with a Hamming distance of 1. For example, in a network with N = 3, there is a connection between states [1,1,1] and [-1,1,1], but states [1,1,1] and [-1,-1,1] are not connected. Thus we can represent the states of the EL in an adjacency matrix A:

A(i, j) = 1 if D_H(s_i, s_j) = 1, and 0 otherwise   (9)

where D_H is the Hamming distance between states s_i and s_j. An EL in such a form can be used to extract features. A local minimum (LM) of the EL is a state whose energy is lower than that of all of its neighbouring states. As all LMs of the EL are jointly determined by (9), we exhaustively compare the energy of all states and their neighbours. We also identify the global minimum (GM).

The number of LMs and the mean energy difference between the GM and the LMs are then further analysed. Each LM can be viewed as an attractor with a field of attraction, i.e. a basin. We implement an algorithm for computing the basin size [23]. One state is selected, and we move through the EL to a lower-energy neighbour until an LM is reached. The initial state is then assigned to the basin of the LM where the algorithm stopped. This is repeated for all states. The standard deviation of the basin sizes of all LMs is then computed.

Finally, we use the EL as a generative model to simulate a sequence of state transitions. To simulate the transitions in the EL, we use Markov chain Monte Carlo sampling. Specifically, we implement the Metropolis-Hastings (MH) algorithm [26]. MH sampling is initialised at a random state.
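The minima search, basin assignment and MH sampling can be sketched on an energy vector indexed by integer-coded states (a compact sketch under our own bit encoding; a steepest-descent neighbour rule is assumed for the basin walk):

```python
import numpy as np

def neighbours(k, n):
    """States at Hamming distance 1 from state k (one bit flipped)."""
    return [k ^ (1 << i) for i in range(n)]

def local_minima(E, n):
    """States whose energy is below that of every Hamming-1 neighbour."""
    return [k for k in range(2 ** n)
            if all(E[k] < E[j] for j in neighbours(k, n))]

def basin_of(E, n, k):
    """Descend to lower-energy neighbours until a local minimum is reached."""
    while True:
        best = min(neighbours(k, n), key=lambda j: E[j])
        if E[best] >= E[k]:
            return k
        k = best

def mh_sample(E, n, n_samples=22000, burn_in=2000, seed=0):
    """Metropolis-Hastings walk: propose flipping one uniformly chosen bit,
    accept with probability min(1, exp(E[current] - E[proposal]))."""
    rng = np.random.default_rng(seed)
    cur = int(rng.integers(2 ** n))
    samples = []
    for _ in range(n_samples):
        prop = cur ^ (1 << int(rng.integers(n)))
        if rng.random() < np.exp(min(0.0, E[cur] - E[prop])):
            cur = prop
        samples.append(cur)
    return np.array(samples[burn_in:])
```

Basin sizes follow by counting how many states map to each LM via `basin_of`, and the fraction of MH samples falling in the GM basin estimates the simulated duration there.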
In each iteration, one of the neighbours s_c′ of the current state s_c is selected with probability 1/N. If E(s_c) < E(s_c′), then the system moves to the selected state s_c′ with probability exp[E(s_c) − E(s_c′)]. Otherwise, the probability of moving is 1.

We obtain 22 000 samples using the MH algorithm. The first 2000 samples are removed to minimise the potential effect of the starting state. Using the samples, the time the system spends in the GM basin is computed and further analysed.

E. Experimental design
Besides the parameters of the pMEM, we have two additional parameters that need to be optimised, i.e. the sampling frequency and the window size. The original sampling frequency of the data is 2000 Hz. However, such a high sampling frequency may be unnecessary, as the same amount of information is often retained with smaller sampling frequencies while reducing the computational load. Sampling frequencies were tested from 500 Hz to the original 2000 Hz in increments of 500 Hz.

Several window sizes were also tested for each sampling frequency: 0.1, 0.2, 0.3, 0.4, 0.5, 1, 1.5, 2, 2.5, 3, 3.5 and 4 seconds. As the window sizes are measured in seconds, the actual window size scales with the given sampling frequency.

For all combinations of sampling frequencies, window sizes, bands and conditions, pMEMs are estimated, and two machine learning models are trained: one using normalised values of the J parameters of the pMEM and one using the energy values of all states. A radial-basis kernel SVM and 10-fold cross-validation are used, with samples from the same patient being kept within the same fold. PCA is used to reduce the dimensions while preserving 95% of the variance. PCA is computed for each iteration of cross-validation using only the training data.

In order to select the parameter combination that results in stable performance across both types of models, the average of the area under the receiver operating characteristic curve (AUC) values is used for the selection. The analysis is further performed only on the EL obtained from the selected combination of sampling frequency and window size.

The goodness-of-fit of the pMEM models is then evaluated to verify that any detected differences between groups are not due to differences in the goodness-of-fit of the pMEM. Tests are performed to estimate differences between the groups, bands and conditions.

Next, we test whether EL-based models perform better than a baseline model.
A model trained on pairwise connectivity is used as a baseline, and we argue that the EL should not be analysed unless its performance is superior to that of a simpler connectivity-based model. An SVM and PCA, which retains 95% of the variance, are used. A 100 times repeated 10-fold cross-validation is used to obtain samples from the distribution of performances for each model, which can be used to test for differences between the models.

Finally, the differences in the features extracted from the ELs are analysed. These tests can be viewed as an interpretation of the differences in the ELs learned by the models trained on energy values. These features are compared between groups, conditions and bands using an ANOVA. Any significant fixed effects and interactions are tested with Tukey’s post hoc tests.

IV. RESULTS

A. Channel selection
Fig. 2 shows the selected channels in each band. The selected channels show many similarities across all bands, with posterior and central channels more likely to be selected. There are only a few deviations from this pattern, such as in the alpha and theta bands, where a few frontal channels were also selected.
(Fig. 2 panels: selected channel layouts for the alpha, beta, delta, gamma, theta and full bands.)
FIG. 2: Channels selected by the overlap of the entropy filter and the SVM wrapper methods.
B. Selection of sampling frequency and window size
The predictive performance across multiple sampling frequencies and window sizes was tested (Fig. 3). The AUC was maximised at a sampling frequency of 1500 Hz and a window size of 3.5 seconds (= 5250 samples). The remaining results refer to models with these parameters.
FIG. 3: Cross-validated performance of models (in AUC) across different sampling frequencies and window sizes. The performance for each parameter combination is averaged over all bands, conditions and types of machine learning models.
C. Goodness-of-fit of pMEM

The pMEM was fitted for each participant separately using the gradient ascent based parameter updating described in the previous section. The goodness-of-fit of the pMEM was evaluated using r, and an ANOVA was used to test for differences in r. No significant difference between the groups was found (F(1, ·), p = 0.·), while significant effects of band (F(5, ·), p < 0.·) and condition (F(1, ·), p < 0.·) were detected. The mean r = 0.· (sd = 0.·).

D. Performance against baseline
Samples from the performance distributions of both models were tested for differences between the two types of models. There are significant main effects of model type (F(1, ·), p < 0.·), band (F(5, ·), p < 0.·) and condition (F(1, ·), p < 0.·), as well as their interactions (F(5, ·), p < 0.·). Post hoc comparisons of the two model types within each band (δ, θ, α, β, γ and full) and condition (EC, EO) are shown in Fig. 4: the energy-based models perform significantly better than, or on par with, the connectivity-based baselines.
FIG. 4: Comparison of the performance of energy-based models and baseline models. Bars show the mean value and error bars show the 95% confidence interval of the mean.
E. Performance of energy-based models
Since there is evidence that energy-based models perform significantly better than or the same as connectivity-based models, except for one band during the EO condition, we report the performance of these models obtained by leave-one-patient-out cross-validation. These performances are reported in Fig. 5. The best model achieves 0.94 AUC, 84% sensitivity and 87% specificity.
FIG. 5: Performance of energy-based models trained using leave-one-patient-out cross-validation.
F. Differences in energy landscapes
The number of LMs was computed for each participant, band and condition (Fig. 6A). AD patients have consistently more LMs than HC (F(1, ·), p < 0.·), with significant effects of band (F(5, ·), p < 0.·) and of the interactions of group with band and condition.

The mean energy difference between the GM and the LMs (Fig. 6B) also differs between the groups: in the δ band, AD patients show a significantly higher energy difference, while in the α and β bands their energy difference is significantly lower than that of HC.

The standard deviation of basin sizes (Fig. 6C) is significantly smaller in AD patients than in HC. Finally, the simulated duration in the basin of the GM (Fig. 6D) is significantly shorter in AD patients than in HC.

V. DISCUSSION AND CONCLUSIONS
In this study, for the first time, an EL approach is applied to study changes in the global dynamics of EEG from patients with AD. Furthermore, we have extended the traditional process of estimating the EL to an entirely data-driven approach and proposed adjustments to apply the method to EEG.

The data-driven channel selection approach allows estimating ELs by selecting a subset of channels without relying on any prior assumptions about the regions of interest. The channels selected by our approach (Fig. 2) are mainly in the back portion of the scalp, i.e. the parietal, central and lateral temporal channels. The frontal and occipital regions were rarely selected. The selected areas are in line with the findings of slowing of EEG rhythms in AD, such as the increase of θ power and decrease of β power in parieto-occipital regions [6].

We show that using the EL is worth the additional computation cost, as models trained on energy values result in significantly better performance compared to models trained on pairwise functional connectivity (Fig. 4). Moreover, this suggests that the EEG signals cannot be fully explained as a product of a fully connected network.

Comparison of the ELs revealed that AD cases have consistently more LMs than HC (Fig. 6A). An LM can be viewed as an attractor in the state space, since it is a state with a high probability of occurrence. To interpret the meaning of such a finding in the context of previous research, the LMs should rather be likened to constraints of the system. It is well described that a balanced and temporally precise pattern of synchronisation and desynchronisation is pertinent to cognitive function [27]. The constraints of the system revealed in this work might reflect the inability of AD networks to show such a degree of flexibility. The EL is therefore a quantification of the constraints of a multivariate system.
Thus, increased complexity of the EL means a decrease in the complexity of the underlying signals, which fits well with the findings of previous research showing a decrease in complexity both in single channels and in pairs of channels [6, 8–10]. In other words, the brains of AD patients have fewer degrees of freedom. This finding might be correlated with the early pathology of AD, i.e. the loss of synaptic contacts, correlated with various cognitive decrements [14].

We suggest that the loss of synapses and/or neurons and the subsequent decline in communication between cortical regions [6] might lead to the observation of more LMs, i.e. constraints, in addition to the “normal” LMs, which probably have a functional role. We can assume that these extra LMs in AD do not have any function; instead, they might be responsible for the disruption of information processing in AD [6].

The GM largely determines the shape of the EL, as it is an attractor with a large field of attraction. The difference in energy between the GM and the remaining LMs then quantifies the ease of transition between them. We calculated the average value of these differences (Fig. 6B). In the δ band, AD patients have a higher energy difference, which indicates easier state transitions. Research shows that the power in the δ band increases in AD [5–7]; similarly, our finding suggests less organised state transitions, i.e. more activity. The average energy difference of AD patients is lower in α and β, which could be linked to the decrease of power in these bands [5–7].

The GM largely defines the variation in the sizes of the basins of the LMs. This is because the GM basin is always the largest, which means that calculating the variation of basin sizes essentially measures the similarity of the LMs to the GM in terms of basin size. Our analysis reveals that AD cases have a smaller variation of basin sizes, indicating higher similarity to the GM (Fig. 6C). This result needs to be interpreted together with the number of LMs.
This is given by the definition of how basin size is computed, as each state belongs to one and only one basin. This, in turn, means that if an EL has more LMs, then their basins tend to be smaller, since the number of states remains constant. It can then be concluded that AD cases have smaller basins which are similar to each other. Smaller basins also mean that the states in them are more strongly attracted to their LMs, since they are closer to the centre of the basin. We interpret this as the LMs of AD being stronger constraints compared to HC, thus supporting the account of AD leading to decreased complexity of signals [6, 8–10].

Finally, the simulated duration in the basin of the GM was analysed. We focus only on the GM because, as basin size grows, attractor strength declines - states on the edges of the basin are less affected by the GM than states close to the centre. We demonstrate that AD cases spend a significantly shorter time in the basin of the GM (Fig. 6D). Since AD cases have more LMs that are similar to the GM, the time spent within each basin must be divided more evenly among all LMs. In other words, a large proportion of the states that the AD cases “visit” is predetermined by the LMs. On the other hand, HC cases are constrained mainly by a large but weak basin of the GM and a few additional strong LMs (as evidenced by the higher variation in basin sizes). This result again reinforces the finding that the signals of AD cases are more constrained, i.e. less complex, than those of HC cases.

The reported research has several limitations. First, the selected window size and sampling frequency (i.e. 3.5 s and 1500 Hz) replicate the data with only medium accuracy, as evidenced by the mean goodness-of-fit of the pMEM. A possible cause is the length of the EEG recordings, which might be too short to estimate the pMEM model with higher accuracy. Second, the data need to be binarised by using a threshold [18]. Such simplification leads to a considerable loss of information as compared to the continuous EEG signals.
We attempt to lessen the impact of binarisation by using multiple thresholds instead of a single value. While using multiple thresholds does not resolve the stationarity issue, it seems to capture a larger portion of information, since it improves the predictive accuracy (Fig. 3).

FIG. 6: Average values of features extracted from the EL. Bars show the mean value and error bars show the 95% confidence interval of the mean of: A) number of LMs; B) mean energy difference between LMs and the GM; C) standard deviation of basin sizes; D) simulated duration in the basin of the GM.

Third, fitting the pMEM is computationally expensive, and currently it is only possible to estimate the EL with 10–15 channels [18]. Since the number of possible states increases exponentially with the number of channels, performing the analysis with more channels is computationally infeasible. Previous research relied on manually selecting regions of interest, which might lead to biased selections. Our data-driven channel selection resolves this issue. On the other hand, such channel selection produces a different selection for each band, making band-wise comparisons difficult.

Finally, the sample size used in this study is rather limited. It is sufficient to demonstrate the predictive power of our approach for binary classification, i.e. between AD and HC cases. However, AD patients can differ significantly both in the severity of symptoms and in pathology. Our sample size is not sufficient for studying the differences between AD patients. Future studies should perform the analysis on larger samples and treat the diagnosis problem as regression, i.e. predicting the severity of AD.
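The multi-threshold binarisation idea can be sketched as follows. This is an assumed illustration (toy data, percentile thresholds chosen for the example), not the paper's exact threshold choices: the same signal is binarised at several cut-offs, yielding one binary pattern sequence per threshold, and downstream features can then be pooled across thresholds to soften the information loss of a single cut.

```python
import numpy as np

# Hedged sketch of multi-threshold binarisation (assumed toy data and
# assumed percentile thresholds): each channel is binarised at several
# cut-offs instead of one, producing one binary sequence per threshold.

rng = np.random.default_rng(2)
eeg = rng.normal(size=(8, 1000))           # 8 channels x 1000 samples (toy data)

percentiles = [25, 50, 75]                 # assumed threshold choices
binarised = []
for p in percentiles:
    thr = np.percentile(eeg, p, axis=1, keepdims=True)   # per-channel threshold
    binarised.append((eeg > thr).astype(int))

for p, b in zip(percentiles, binarised):
    print(f"threshold at {p}th percentile: fraction active = {b.mean():.2f}")
```

Lower thresholds mark more samples as "active", so each threshold views the continuous signal at a different level of detail; the actual analysis fits a separate EL per threshold.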
ACKNOWLEDGEMENT
The original data collection was funded by a grant from the Alzheimer's Research UK (ARUK-PPG20114B-25). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

[1] L. C. dos Santos Picanco, P. F. Ozela, M. de Fatima de Brito Brito, A. A. Pinheiro, E. C. Padilha, F. S. Braga, C. H. T. de Paula da Silva, C. B. R. dos Santos, J. M. C. Rosa, and L. I. da Silva Hage-Melim, Alzheimer's disease: A review from the pathophysiology to diagnosis, new perspectives for pharmacological treatment, Current Medicinal Chemistry, 3141 (2018).
[2] R. Cassani, M. Estarellas, R. San-Martin, F. J. Fraga, and T. H. Falk, Systematic review on resting-state EEG for Alzheimer's disease diagnosis and progression assessment, Disease Markers (2018).
[3] C. Laske, H. R. Sohrabi, S. M. Frost, K. López-de-Ipiña, P. Garrard, M. Buscema, J. Dauwels, S. R. Soekadar, S. Mueller, C. Linnemann, et al., Innovative diagnostic tools for early detection of Alzheimer's disease, Alzheimer's & Dementia, 561 (2015).
[4] D. J. Blackburn, Y. Zhao, M. D. Marco, S. M. Bell, F. He, H.-L. Wei, S. Lawrence, Z. C. Unwin, M. Blyth, and J. Angel, A pilot study investigating a novel non-linear measure of eyes open versus eyes closed EEG synchronization in people with Alzheimer's disease and healthy controls, Brain Sciences, 134 (2018).
[5] R. P. Brenner, C. F. R. III, and R. F. Ulrich, Diagnostic efficacy of computerized spectral versus visual EEG analysis in elderly normal, demented and depressed subjects, Electroencephalography and Clinical Neurophysiology, 110 (1988).
[6] J. Jeong, EEG dynamics in patients with Alzheimer's disease, Clinical Neurophysiology, 1490 (2004).
[7] P. Ghorbanian, D. M. Devilbiss, T. Hess, A. Bernstein, A. J. Simon, and H. Ashrafiuon, Exploration of EEG features of Alzheimer's disease using continuous wavelet transform, Medical & Biological Engineering & Computing, 843 (2015).
[8] M. Ahmadlou, H. Adeli, and A. Adeli, New diagnostic EEG markers of the Alzheimer's disease using visibility graph, Journal of Neural Transmission, 1099 (2010).
[9] C. Coronel, H. Garn, M. Waser, M. Deistler, T. Benke, P. Dal-Bianco, G. Ransmayr, S. Seiler, D. Grossegger, and R. Schmidt, Quantitative EEG markers of entropy and auto mutual information in relation to MMSE scores of probable Alzheimer's disease patients, Entropy, 130 (2017).
[10] H. Garn, M. Waser, M. Deistler, T. Benke, P. Dal-Bianco, G. Ransmayr, H. Schmidt, G. Sanin, P. Santer, and G. Caravias, Quantitative EEG markers relate to Alzheimer's disease severity in the Prospective Dementia Registry Austria (PRODEM), Clinical Neurophysiology, 505 (2015).
[11] J. Dauwels, F. Vialatte, T. Musha, and A. Cichocki, A comparative study of synchrony measures for the early diagnosis of Alzheimer's disease based on EEG, NeuroImage, 668 (2010).
[12] O. Vyšata, M. Vališ, A. Procházka, R. Rusina, and L. Pazdera, Linear and nonlinear EEG synchronization in Alzheimer's disease, Neurophysiology, 46 (2015).
[13] E. L. Dennis and P. M. Thompson, Functional brain connectivity using fMRI in aging and Alzheimer's disease, Neuropsychology Review, 49 (2014).
[14] R. D. Terry, Cell death or synaptic loss in Alzheimer disease, Journal of Neuropathology & Experimental Neurology, 1118 (2000).
[15] Z. Fu, A. Caprihan, J. Chen, Y. Du, J. C. Adair, J. Sui, G. A. Rosenberg, and V. D. Calhoun, Altered static and dynamic functional network connectivity in Alzheimer's disease and subcortical ischemic vascular disease: shared and specific brain connectivity abnormalities, Human Brain Mapping, 3203 (2019).
[16] J. Kang, C. Pae, and H.-J. Park, Graph-theoretical analysis for energy landscape reveals the organization of state transitions in the resting-state human cerebral cortex, PLoS One (2019), pmid:31498822.
[17] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek, Weak pairwise correlations imply strongly correlated network states in a neural population, Nature, 1007 (2006).
[18] T. Ezaki, T. Watanabe, M. Ohzeki, and N. Masuda, Energy landscape analysis of neuroimaging data, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 20160287 (2017).
[19] A. Ashourvan, S. Gu, M. G. Mattar, J. M. Vettel, and D. S. Bassett, The energy landscape underpinning module dynamics in the human brain connectome, NeuroImage, 364 (2017).
[20] S. Gu, M. Cieslak, B. Baird, S. F. Muldoon, S. T. Grafton, F. Pasqualetti, and D. S. Bassett, The energy landscape of neurophysiological activity implicit in brain network structure, Scientific Reports, 1 (2018).
[21] D. Krzemiński, N. Masuda, K. Hamandi, K. D. Singh, B. Routley, and J. Zhang, Energy landscape of resting magnetoencephalography reveals fronto-parietal network impairments in epilepsy, Network Neuroscience, 374 (2020).
[22] S. A. Bunge and I. Kahn, Cognition: An overview of neuroimaging techniques, Encyclopedia of Neuroscience, 1063 (2009).
[23] T. Watanabe, S. Hirose, H. Wada, Y. Imai, T. Machida, I. Shirouzu, S. Konishi, Y. Miyashita, and N. Masuda, A pairwise maximum entropy model accurately describes resting-state human brain networks, Nature Communications, 1 (2013).
[24] T. Alotaiby, F. E. Abd El-Samie, S. A. Alshebeili, and I. Ahmad, A review of channel selection algorithms for EEG signal processing, EURASIP Journal on Advances in Signal Processing, 66 (2015).
[25] A. Altmann, L. Toloşi, O. Sander, and T. Lengauer, Permutation importance: a corrected feature importance measure, Bioinformatics, 1340 (2010).
[26] R. McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan, 242 (2016).
[27] A. Schnitzler and J. Gross, Normal and pathological oscillatory communication in the brain, Nature Reviews Neuroscience.