Dynamic Graph Modeling of Simultaneous EEG and Eye-tracking Data for Reading Task Identification
Puneet Mathur, Trisha Mittal, Dinesh Manocha
University of Maryland, College Park
ABSTRACT
We present a new approach, AdaGTCN, for identifying human reader intent from Electroencephalogram (EEG) and Eye movement (EM) data in order to help differentiate between normal reading and task-oriented reading. Understanding the physiological aspects of the reading process (the cognitive load and the reading intent) can help improve the quality of crowd-sourced annotated data. Our method, Adaptive Graph Temporal Convolution Network (AdaGTCN), uses an Adaptive Graph Learning Layer and a Deep Neighborhood Graph Convolution Layer for identifying reading activities using time-locked EEG sequences recorded during word-level eye-movement fixations. The Adaptive Graph Learning Layer dynamically learns the spatial correlations between the EEG electrode signals, while the Deep Neighborhood Graph Convolution Layer exploits temporal features from a dense graph neighborhood to establish the state of the art in reading task identification over other contemporary approaches. We compare our approach with several baselines and report an improvement of 6.29% on the ZuCo 2.0 dataset, along with extensive ablation experiments.

Index Terms — electroencephalography, eye-tracking, graph convolution networks, reading task identification
1. INTRODUCTION
Reading is a complex cognitive task that requires the simultaneous processing of complex visual input across a series of brief fixation pauses and saccadic eye movements, as well as retrieving, updating, and integrating the contents of memory [1, 2, 3]. Although each individual tends to process language in their own distinct style, reading patterns tend to follow an underlying assumption that readers retain fixations on a word according to its cognitive importance in lexical processing [4]. Identifying such patterns can help better model how humans read and perform regular linguistic tasks, and transfer this knowledge to machines to better automate language processing.

Understanding and automated modeling of human reading patterns can help improve manual crowd-sourced annotations in a variety of Natural Language Processing (NLP) tasks, as these annotations are closely intertwined with the reader's intent [5]. While there is no standard practice for quantifying the quality of annotator efforts, these annotations feed heavily into supervised NLP learning setups, affecting the quality of the learned models. Recognizing reading patterns to estimate reading effort also has broader applications in the medical diagnosis of reading impairments such as dyslexia [6], child developmental reading disorder [7], and attention deficit disorder [8].

Hollenstein et al. [5] define two reading paradigms: Natural Sentence Reading (NR) and Task-Specific Reading (TSR). In a natural reading setup, the reader is expected to read sentences without any specific task other than comprehension. On the other hand, task-specific reading focuses on achieving predecided linguistic goals beyond general comprehension, such as relation extraction, sentiment labeling, pronoun resolution, and named entity recognition.
In this work, we focus on identifying a participant's reading activity, normal sentence reading vs. task-specific reading, using simultaneous psychophysiological signals: Electroencephalogram (EEG) and Eye Movement (EM). The process of reading a specific text is considered complex from the perspective of neuroscience, since it involves vision, memory, motor control, and learning, among others. Past research [9] suggests the need for brain Electroencephalography (EEG) signals in studying the cognitive state of a subject for behavior analysis. Similarly, eye-tracking data also helps us better understand human reading patterns and is highly correlated with the cognitive load associated with different stages of text reading. Hence, simultaneous EEG and eye-tracking recordings are a promising route to further understanding word-level brain activity during a reading session.
Main Contributions:
The following are the main contributions of this work:
1. We propose Adaptive Graph Temporal Convolution Network (AdaGTCN), a novel neural architecture composed of an Adaptive Graph Learning layer, Deep Neighborhood Graph Convolution layers, and Dilated Inception layers, combined sequentially for identifying reading activities from synchronous EEG and eye-tracking data provided by the ZuCo 2.0 corpus [5].
2. Our proposed Adaptive Graph Learning Layer utilizes the spatio-temporal relationship between the EEG electrodes to dynamically learn the graph network structure based on EEG sequences during word-level eye-movement fixations, while the Deep Neighborhood Graph Convolution Layers interleaved with Dilated Inception Layers simultaneously exploit message passing between interdependent EEG electrodes while preserving their short-range temporality.
2. RELATED WORK

Understanding Human Reading Behavior: Human reading analysis can directly benefit in determining the annotation complexity of text. [10] conducted a thorough investigation of the behavior patterns in complex reading comprehension to understand how humans allocate their attention during reading comprehension, in an attempt to quantify reading effort.

Utilizing EEG and Eye-tracking Data:
Several previous works have explored EEG and eye-tracking features for analyzing human-information interaction tasks such as movie trailer analysis [11], diagnosis of mild Alzheimer's disease [12], and hazard prediction [13]. These studies concluded that isolated physiological signals miss crucial information that is better captured when the signals are combined. [14] studied the linguistic effects of co-registered eye movements and EEG neural activity in natural sentence reading and showed that such synchronized signals accurately represent lexical processing. This paper is the first attempt at utilizing synchronized EEG and eye-tracking signals for identifying forms of reading. Most prior works are handicapped by one or more challenges when working with EEG and eye-tracking data, either due to a lack of requisite data or of computational techniques. Our work can augment eye-tracking data for English-Chinese sight translation as done in [15].
Modeling EEG and Eye-tracking in Network Architectures: Sequential modeling through recurrent neural networks such as LSTMs has been a popular technique for extracting relevant features from EEG [16, 17] and eye-tracking data [18, 19]. However, these models overlook the functional connectivities between the EEG electrodes and their effect on gaze patterns, leading to spatio-temporal information loss in sequence classification tasks [20]. We overcome this challenge by learning the interdependency between the EEG electrodes as part of the training process. Recently, many graph-based methods have been proposed for extracting spatio-temporal information from EEG signals [21, 22, 23, 24]. However, most of these methods treat the graph network connections as static and do not learn the functional connectivities between the EEG electrodes as part of the training process. We hypothesize that learning the graph structure in an online fashion helps the model exploit the latent spatial interdependencies in the brain signals without relying on external biases about brain graph modeling.
3. PROBLEM FORMULATION
We formally define the problem statement and the input and output data setup. For every sentence reading session, we have two signals: eye fixation data and the EEG signals.
Eye Fixation Data:
This data comprises a sequence of horizontal-axis gaze location entries for all individual fixations recorded while the reader fixates on the sequence of words $w_{[0,n]}$ in a single sentence, where the $n$ words may not necessarily be visited in their natural linguistic order.

EEG Data: The EEG signals have $p$ nodes (a fixed number of electrodes corresponding to each frequency band). In a given sentence, the EEG signals recorded during the fixation duration $\Delta t_i$ for word $w_i$ are represented as $e^{g,i}_{p \times \Delta t_i}$, and their mean value across the time interval $\Delta t_i$ is given by $\hat{e}^{g,i}_p$. Given a participant reading a sentence in a reading session $S$ consisting of $n$ word fixations, the EEG time series is represented as $z = [\hat{e}^{g,1}_p, \hat{e}^{g,2}_p, \ldots, \hat{e}^{g,n}_p]$. We aim to learn a function $f(z) = y$, where $y = 1$ if $S$ is task-specific reading and $y = 0$ for natural reading.
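Concretely, the construction of $z$ from per-word fixation segments can be sketched as follows. This is an illustrative sketch, not the released code: `build_session_features` is a hypothetical helper, and the shapes are toy values.

```python
import numpy as np

def build_session_features(word_segments):
    """Build the EEG time series z for one reading session.

    word_segments: list of n arrays, each of shape (p, dt_i), holding the
    EEG signals of p band-electrode channels recorded during the fixation
    on word w_i (dt_i samples). Returns z of shape (n, p): the mean EEG
    value per channel over each fixation interval.
    """
    return np.stack([seg.mean(axis=1) for seg in word_segments])

# Hypothetical session: 4 word fixations, 8 channels, variable durations.
rng = np.random.default_rng(0)
segments = [rng.normal(size=(8, dt)) for dt in (12, 7, 20, 15)]
z = build_session_features(segments)
print(z.shape)  # (4, 8)
```

A classifier $f(z)$ then maps this fixed-width feature sequence to the binary NR/TSR label.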
4. OUR APPROACH
In this section, we describe the individual components of the proposed Adaptive Graph Temporal Convolution Network (AdaGTCN), as illustrated in Figure 1.
EEG signals are long temporal sequences with long stretches of noisy artifacts. Thus, we extract the EEG signals corresponding to the First Fixation Duration (FFD), the duration of a fixation on the prevailing word, for each word in the reading sequence. Each electrode signal per channel is broken down into 8 frequency bands: $\theta_1$ (4–6 Hz), $\theta_2$ (6.5–8 Hz), $\alpha_1$ (8.5–10 Hz), $\alpha_2$ (10.5–13 Hz), $\beta_1$ (13.5–18 Hz), $\beta_2$ (18.5–30 Hz), $\gamma_1$ (30.5–40 Hz), and $\gamma_2$ (40–49.5 Hz). Hence, for each word fixated by a participant in the reading sequence, we utilize the mean EEG signal values per channel in eight different frequency bands.

The pre-processed data is fed into an AGL layer that learns the adjacency matrix at training time by randomly initializing node embeddings of multiple subsets of the graph and forming unweighted directed edges between the closest node pairs. Let a subset of nodes $v_i$ be randomly sampled from the pool of all input nodes $V$. The input temporal node feature matrix $N$ of size $p \times n$ is passed through a linear layer with parameters $\theta_i$ followed by a tanh non-linearity. Additionally, each of the temporal node feature matrices is regulated by a saturation coefficient $\omega$ in order to provide a dropout effect. This keeps the graph sparse and reduces the probability of dense hub formation. The result of these operations is a sparse feature matrix $X_i$, as shown in Equation 1:

$$X_i = \tanh(\omega N \theta_i) \quad (1)$$

Repeating the process for the other $k$ node partitions, we obtain $X_i,\ i \in \{1, 2, \ldots, k\}$. Each transformed sparse feature matrix $X_i$ is multiplied by every other $X_j$ with $j \neq i$ and the products are summed to form a partial adjacency matrix. Further, the sum of the products of the other sparse feature matrices with their transposes ($X_j X_j^T$), scaled by a matrix regularization constant $\lambda$, is subtracted from the partial adjacency matrix to regularize the complete adjacency matrix candidate $A_i$ by minimizing its diagonal component. The complete adjacency matrix candidate is passed through a fully connected layer ($\phi$) with a ReLU non-linearity:

$$A_i = \tanh\Big(\sum_{j \neq i}^{k} X_i X_j^T - \lambda \sum_{j}^{k} X_j X_j^T\Big) \quad (2)$$

$$M_i = \mathrm{ReLU}(\phi(A_i)) \quad (3)$$

Equations 2 and 3 are repeated for all $k$ node partitions. This stochastic sampling process assigns $k$ potential edges $e_{i,j}$ from vertex $u_i$ to $u_j$ for each vertex pair $(u_i, u_j)$. We select the top-$k$ edge connections such that the top-$k$ candidates are set to 1 while the rest of the node pairs are set to zero. The sampled edge weights correspond to discrete variables, in which case the edge weights in the Adaptive Graph Learning Layer cannot be updated via backpropagation. To convert these weights into a continuous probabilistic distribution and enable gradient computation, we employ the Gumbel-softmax reparametrization technique [25], given by $w_{i,j} = \mathrm{softmax}((e_{i,j} + q)/\tau)$, where $q$ is a random vector whose components are independent and identically distributed, and $\tau$ is the softmax temperature that controls sampling smoothness. The process outlined above learns stable node relationships over the training period. The hyperparameters $\omega$ and $\lambda$ are adjusted over the course of gradient backpropagation as new training data updates the model.

Fig. 1: Our network AdaGTCN consists of the Adaptive Graph Learning (AGL) Layer, which takes in the sequential time series input corresponding to each node and outputs a multilayered graph adjacency matrix. This is fed into the Deep Neighborhood Graph Convolution (DN-GCN) layer, followed by the Dilated Inception Temporal Convolution (DI-TCN) layer. The DN-GCN and DI-TCN modules both have 16 output channels. The output module consists of two dense layers with 32 and 16 output channels, respectively, followed by a one-neuron layer normalized through Softmax to predict the class probabilities.
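A minimal sketch of the AGL layer described above, assuming $k = 2$ node partitions; the class name, hyperparameter defaults, and tensor shapes are illustrative choices of ours, not the paper's, and PyTorch's built-in `gumbel_softmax` stands in for the reparametrization step:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphLearning(nn.Module):
    """Sketch of the AGL layer (Eqs. 1-3) for k=2 node partitions."""
    def __init__(self, num_nodes, seq_len, embed_dim=16,
                 omega=0.5, lam=0.5, topk=4, tau=0.5):
        super().__init__()
        # One linear map theta_i per node partition (Eq. 1).
        self.theta = nn.ModuleList(
            nn.Linear(seq_len, embed_dim, bias=False) for _ in range(2))
        self.phi = nn.Linear(num_nodes, num_nodes)          # FC layer (Eq. 3)
        self.omega, self.lam, self.topk, self.tau = omega, lam, topk, tau

    def forward(self, N):                                   # N: (p, n)
        X = [torch.tanh(self.omega * th(N)) for th in self.theta]   # Eq. 1
        # Cross-product term minus the regularizing self-product (Eq. 2).
        A = torch.tanh(X[0] @ X[1].T - self.lam * X[1] @ X[1].T)
        M = F.relu(self.phi(A))                                     # Eq. 3
        # Keep the top-k candidate edges per node, zero out the rest.
        mask = torch.zeros_like(M)
        idx = M.topk(self.topk, dim=-1).indices
        mask.scatter_(-1, idx, 1.0)
        logits = M * mask
        # Gumbel-softmax reparametrization: differentiable edge weights.
        return F.gumbel_softmax(logits, tau=self.tau, hard=False, dim=-1)

p, n = 8, 32                                   # toy node/sequence sizes
W = AdaptiveGraphLearning(p, n)(torch.randn(p, n))
print(W.shape)  # torch.Size([8, 8])
```

Each row of `W` is a soft distribution over outgoing edges for one node, so gradients flow through the sampled graph during training.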
EEG electrode connectivities attempt to simulate a miniature version of the human brain graph. The small-world assumption of brain graph theory [26] describes the nerve connections in the human brain as a combination of non-random clustering with short path lengths. This assumption is critical in designing brain graph networks as a multi-layered graph structure that propagates information both in the immediate node locality and the deeper neighborhood. However, past studies have shown that increasing the depth of a GCN converges it to the random walk's limit distribution [27]. This inevitably results in information saturation, where the hidden states converge to a single point and are skipped in the message passing of deeper layers. The traditional vanilla Graph Convolution (GCN) layer [28] is given by $H^{(K+1)} = \sigma(\hat{A} H^{(K)} W^{(K)})$, where $H^{(K)}$ and $H^{(K+1)}$ are the input and output activations of layer $K$, and $\hat{A} = D^{-\frac{1}{2}}(A + I_n)D^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix with self-connections. $W^{(K)}$ is a trainable weight matrix such that $W^{(0)}$ is obtained as output from the AGL layer. The message passing algorithm in a GCN acts as a simple neighborhood averaging operator that replaces each row in the feature matrix by the average of its neighbors. Inspired by [29], we modify the vanilla GCN layer to introduce the Deep Neighborhood Graph Convolution (DN-GCN) layer, which recursively propagates neighborhood information from deep layers selectively over spatially dependent nodes. We extend the vanilla GCN layer with additional inputs from recursively deeper GCN layers to obtain:

$$H^{(K+1)} = \sigma\Big(\sum_{l=1}^{K} \beta_l \hat{A} H^{(l)} W^{(l)}\Big),$$

where $\beta_l$ is the depth regularization coefficient, selected such that $\sum_{l=1}^{K} \beta_l = 1$. This modification retains a proportion of the hidden states from each of the previous layers during the propagation step, preserving locality while exploring a deeper neighborhood at the same time.
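The DN-GCN update above can be sketched as follows; the toy graph, the $\beta$ values, and the weight initialization are illustrative (the paper's actual coefficient values are not reproduced here):

```python
import torch

def dn_gcn_layer(H, A_hat, weights, betas):
    """Sketch of the DN-GCN update:
    H^{K+1} = sigma(sum_l beta_l * A_hat @ H_l @ W_l),
    where H_l comes from recursively deeper propagation (A_hat^l @ H)
    and the depth regularization coefficients sum to 1."""
    assert abs(sum(betas) - 1.0) < 1e-6
    H_l, out = H, 0.0
    for beta, W in zip(betas, weights):
        H_l = A_hat @ H_l              # one more hop of neighborhood averaging
        out = out + beta * (H_l @ W)   # weighted contribution of depth l
    return torch.relu(out)

p, d = 8, 16                                    # toy node/feature sizes
off = (torch.rand(p, p) > 0.7).float()
off.fill_diagonal_(0)
A = off + torch.eye(p)                          # adjacency with self-loops
D_inv_sqrt = torch.diag(A.sum(1).rsqrt())
A_hat = D_inv_sqrt @ A @ D_inv_sqrt             # renormalization trick
H = torch.randn(p, d)
Ws = [torch.randn(d, d) * 0.1 for _ in range(2)]
out = dn_gcn_layer(H, A_hat, Ws, betas=[0.7, 0.3])
print(out.shape)  # torch.Size([8, 16])
```

Because each depth contributes a fixed fraction $\beta_l$, deeper hops enrich the representation without letting the averaging operator wash out local structure.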
However, there can be cases where an EEG electrode node has no spatial dependency on its neighbors due to its peripheral location [5]. To handle such extreme cases, we add a feature selector term $S_l$. For layers that do not have any spatial dependency on the $l$-th layer, it is possible that $S_l = 0$, except for $l = K$, in order to preserve the information flowing through the self-connection acquired by the re-normalization trick of vanilla GCN layers. The final representation of the DN-GCN layer is given by

$$H^{(K+1)} = \sigma\Big(\sum_{l=1}^{K} \beta_l \hat{A} H^{(l)} W^{(l)} S_l\Big).$$

The EEG signals may have temporal dependencies spanning multiple ranges. Graph convolution layers can successfully model spatial dependencies but fall short of extracting time-varying patterns in long sequences. 1-D convolutional layers can help extract temporal patterns; however, they are limited by a receptive field that grows only linearly with the depth of the network, requiring a deeper network that is harder to optimize due to vanishing gradients. Dilated convolutions [30], on the other hand, provide exponentially expanding receptive fields without losing resolution. Hence, dilated convolution layers are a good choice for extracting temporal patterns from the input time series in our model. However, choosing the right filter size is difficult, as the temporal dependencies arising from reading task difficulty vary widely. A very large filter size may miss critical recurring patterns, while a smaller-than-required size may overfit the model.

$$\bar{x}_d = x \otimes f_{1 \times d}(t) = \int_{-\infty}^{\infty} f_{1 \times d}(s)\, x(t - r \cdot s)\, ds \quad (4)$$

Inspired by [31], we adopt a Dilated Inception Network in our proposed architecture, which uses multiple parallel dilated convolutions with different filter sizes to enrich the diversity of receptive fields in the feature maps. We experiment with various filters $f_{1 \times d},\ d \in \{1, 2, \ldots, D\}$, going up to $D = 12$.
Given $x$ as the input 1-D sequence, the output of the dilated convolution with dilation factor $r$, $\bar{x}_d$, obtained by convolution with filter $f_{1 \times d}$, is represented by Equation 4. The transformed inputs across all dilation layers are truncated to the same length according to the largest filter and concatenated across the channel dimension to form the output of the DI-TCN layer, given by $\bar{x} = \mathrm{concat}(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_D)$.

Table 1: Quantitative Results. We compare with unimodal and multimodal baselines and perform ablation experiments, showing an improvement of 6.29% in accuracy.

Method                             F1     Accuracy (%)
Unimodal baselines
  k-NN                             0.478  51.55
  EEG-LSTM                         0.524  52.78
  EM-LSTM                          0.550  54.22
  EEG-GCN                          0.582  59.15
  EEG-GCN + Attention Pooling      0.614  59.75
  EEG-GCN + Hierarchical Pooling   0.621  60.56
Multimodal baselines
  EEG-LSTM + EM-LSTM               0.640  62.33
  EEG-GCN + EM-LSTM                0.659  63.50
Ablations
  AdaGTCN w/o DI-TCN               0.652  64.12
  AdaGTCN w/o DN-GCN               0.633  63.72
  AdaGTCN w/o AGL                  0.675  66.20
AdaGTCN (Ours)                     0.695  69.79
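The DI-TCN idea, parallel dilated convolutions, truncation to a common length, and channel-wise concatenation, can be sketched as follows; the filter widths, dilation factor, and channel counts are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class DilatedInception(nn.Module):
    """Sketch of a DI-TCN block: parallel dilated 1-D convolutions with
    different filter widths; outputs are truncated to the shortest length
    (set by the largest filter) and concatenated along channels."""
    def __init__(self, c_in, c_out, widths=(2, 3, 6, 7), dilation=2):
        super().__init__()
        assert c_out % len(widths) == 0
        self.convs = nn.ModuleList(
            nn.Conv1d(c_in, c_out // len(widths), kernel_size=w,
                      dilation=dilation)
            for w in widths)

    def forward(self, x):                       # x: (batch, c_in, time)
        outs = [conv(x) for conv in self.convs]
        t = min(o.size(-1) for o in outs)       # align to largest filter
        return torch.cat([o[..., -t:] for o in outs], dim=1)

x = torch.randn(1, 16, 64)                      # toy multichannel sequence
y = DilatedInception(16, 16)(x)
print(y.shape)  # torch.Size([1, 16, 52])
```

Each branch sees a different effective receptive field, so short and long recurring temporal patterns are captured in a single layer rather than by stacking many plain convolutions.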
5. EXPERIMENTS AND RESULTS

Fig. 2: Qualitative Analysis. Left: Eye fixations on words with respect to time. Middle & Right: Mean EEG values across the $\alpha$ and $\beta$ frequency bands for each word as it appears in the sentence. Comparison between natural reading (NR) and task-specific reading (TSR) for the sentence "Henry Ford, with his son Edsel, founded the Ford Foundation in 1936 as a philanthropic organization with a goal to promote human welfare."

Dataset:
The ZuCo 2.0 dataset [5] provides simultaneous EEG and eye-tracking data for participants reading sentences under both the natural reading and task-specific reading paradigms. The data of 12 participants is split into train, validation, and test sets.

Training Hyperparameters: We summarize the tuned hyperparameters as follows: dropout $\delta$, learning rate $\eta$, batch size $b$, and the number of training epochs. Adam was employed for optimizing the cross-entropy loss of the model. The proposed model was trained in PyTorch on an NVIDIA RTX 2080Ti. The input word-level fixation-segmented EEG signal was padded to the maximum input length across the dataset. The hyperparameters of the Adaptive Graph Learning layer, the saturation coefficient ($\omega$), the matrix regularization constant ($\lambda$), and the Softmax smoothing temperature ($\tau$), are all set to the same value for best performance. The number of top-$k$ edge connections in the same layer is chosen experimentally. Empirically, the depth of the node feature aggregation is $K = 2$, with the depth regularization coefficients ($\beta_1$, $\beta_2$) set accordingly in the DN-GCN layer. The filter sizes $f_{1 \times d}$ are experimentally found to be most effective for small $D$. Layer Normalization is applied after each graph convolution module.

Quantitative Results:
We compare with several baselines on the ZuCo 2.0 dataset and report the average F1 score and accuracy in Table 1. Unimodal baselines include methods that process EEG and EM independently, either using recurrent neural networks such as LSTMs [16] or graph convolutional networks (GCNs) similar to [21]. In contrast, multimodal baselines perform late fusion to process both the EEG and the EM signals. Our proposed AdaGTCN model outperforms these baselines by a significant margin of +6.29% in accuracy. To motivate the importance of the Adaptive Graph Learning (AGL) layer, the Dilated Inception Temporal ConvNet (DI-TCN) layer, and the Deep Neighborhood GCN (DN-GCN) layer, we perform a series of ablation experiments, also shown in Table 1. Our model also generates sparser graphs, a significant reduction in the number of parameters compared to the best-performing graph structure obtained by [17] (average node degree = 8, total edges = 3524).

Qualitative Results: We analyze the eye movement across words and the EEG signals for one particular reading sequence in Figure 2. In the first plot, it can be observed that the eye movement in natural reading is coherent with the reading sequence ordering, in contrast to task-specific reading. The subject tends to have multiple fixations and digressions on words such as names ("Henry Ford"), dates ("1936"), and relations ("son"). The $\alpha$ and $\beta$ frequency ranges of the EEG are relatively passive during natural reading, although sudden spikes can be observed in task-specific reading.
6. CONCLUSION
We propose Adaptive Graph Temporal Convolution Network (AdaGTCN) for identifying normal reading vs. task-specific annotation reading using simultaneous word-level eye-fixation segmented EEG signals. We motivate the advantages of learning the spatial graph structure formed by interdependent EEG electrodes at training time while exploiting the temporal patterns from a dense graph neighborhood. We demonstrate the benefits of the AdaGTCN model through competitive performance on the ZuCo 2.0 dataset and benchmark relevant design choices for future signal processing applications on co-registered physiological data. Future research will aim to leverage semi-supervised and unsupervised methods that do not rely on large amounts of annotated data.
7. ACKNOWLEDGEMENT
This work was supported in part by ARO Grants W911NF1910069 and W911NF1910315, and by Adobe.

REFERENCES

[1] O. Dimigen, W. Sommer, A. Hohlfeld, A. M. Jacobs, and R. Kliegl, "Co-registration of eye movements and EEG in natural reading: Analyses & review,"
The Mind Research Repository, 2014.
[2] R. Kliegl, A. Nuthmann, and R. Engbert, "Tracking the mind during reading: the influence of past, present, and future words on fixation durations," Journal of Experimental Psychology: General, vol. 135, no. 1, pp. 12–35, 2006.
[3] R. Radach and A. Kennedy, "Theoretical perspectives on eye movements in reading: Past controversies, current issues, and an agenda for future research," European Journal of Cognitive Psychology, vol. 16, pp. 3–26, 2004.
[4] T. Horowitz-Kraus, C. DiCesare, and A. W. Kiefer, "Longer fixation times during reading are correlated with decreased connectivity in cognitive-control brain regions during rest in children," Mind, Brain, and Education, vol. 12, no. 1, pp. 49–60, 2018.
[5] N. Hollenstein, M. Troendle, C. Zhang, and N. Langer, "ZuCo 2.0: A dataset of physiological recordings during natural reading and annotation," arXiv preprint arXiv:1912.00903, 2019.
[6] S. Syal and M. Torppa, "Task-avoidant behaviour and dyslexia: A follow-up from Grade 2 to age 20," Dyslexia, 2019.
[7] E. Cavalli, P. Colé, C. Pattamadilok, J. Badier, and J. C. Ziegler, "Spatiotemporal reorganization of the reading network in adult dyslexia," Cortex, vol. 92, pp. 204–221, 2017.
[8] M. Kofler, J. A. Spiegel, E. F. Soto, L. N. Irwin, E. L. Wells, and K. E. Austin, "Do working memory deficits underlie reading problems in attention-deficit/hyperactivity disorder (ADHD)?," Journal of Abnormal Child Psychology, vol. 47, pp. 433–446, 2019.
[9] D. Dvořák, A. Shang, S. Abdel-baki, W. Suzuki, and A. Fenton, "Cognitive behavior classification from scalp EEG signals," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, pp. 729–739, 2018.
[10] Y. Zheng, J. Mao, Y. Liu, Z. Ye, M. Zhang, and S. Ma, "Human behavior inspired machine reading comprehension," in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019.
[11] J.-P. Tauscher, M. Mustafa, and M. Magnor, "Comparative analysis of three different modalities for perception of artifacts in videos," ACM Transactions on Applied Perception (TAP), vol. 14, no. 4, pp. 1–12, 2017.
[12] M. Moghadami, S. Moghimi, A. Moghimi, G. R. Malekzadeh, and J. S. Fadardi, "The investigation of simultaneous EEG and eye tracking characteristics during fixation task in mild Alzheimer's disease," Clinical EEG and Neuroscience, 2020.
[13] L. V. Kulke, J. Atkinson, and O. Braddick, "Neural differences between covert and overt attention studied using EEG with simultaneous remote eye tracking," Frontiers in Human Neuroscience, vol. 10, p. 592, 2016.
[14] O. Dimigen, W. Sommer, A. Hohlfeld, A. M. Jacobs, and R. Kliegl, "Coregistration of eye movements and EEG in natural reading: analyses and review," Journal of Experimental Psychology: General, vol. 140, no. 4, p. 552, 2011.
[15] W. Su and D. Li, "Identifying translation problems in English-Chinese sight translation: An eye-tracking experiment," The Information Society, vol. 14, pp. 110–134, 2019.
[16] S. Kuanar, V. Athitsos, N. Pradhan, A. Mishra, and K. R. Rao, "Cognitive analysis of working memory load from EEG, by a deep recurrent neural network," IEEE, 2018, pp. 2576–2580.
[17] S. Jang, S. Moon, and J. Lee, "EEG-based video identification using graph signal modeling and graph convolutional neural network," pp. 3066–3070, 2018.
[18] S. D. Sims, V. Putnam, and C. Conati, "Predicting confusion from eye-tracking data with recurrent neural networks," arXiv preprint arXiv:1906.11211, 2019.
[19] F. Koochaki and L. Najafizadeh, "Eye gaze-based early intent prediction utilizing CNN-LSTM," IEEE, 2019, pp. 1310–1313.
[20] D. Zhang, L. Yao, K. Chen, S. Wang, X. Chang, and Y. Liu, "Making sense of spatio-temporal preserving representations for EEG-based human intention recognition," IEEE Transactions on Cybernetics, 2019.
[21] S. Jang, S.-E. Moon, and J. Lee, "EEG-based video identification using graph signal modeling and graph convolutional neural network," in Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 3066–3070.
[22] Z. Jia, Y. Lin, X. Cai, H. Chen, H. Gou, and J. Wang, "SST-EmotionNet: Spatial-spectral-temporal based attention 3D dense network for EEG emotion recognition," in Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 2909–2917.
[23] P. Zhong, D. Wang, and C. Miao, "EEG-based emotion recognition using regularized graph neural networks," IEEE Transactions on Affective Computing, 2020.
[24] D. Zeng, K. Huang, C. Xu, H. Shen, and Z. Chen, "Hierarchy graph convolution network and tree classification for epileptic detection on electroencephalography signals," IEEE Transactions on Cognitive and Developmental Systems, 2020.
[25] E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-softmax," arXiv preprint arXiv:1611.01144, 2016.
[26] D. S. Bassett and E. T. Bullmore, "Small-world brain networks revisited," The Neuroscientist, vol. 23, no. 5, pp. 499–516, 2017.
[27] J. Klicpera, A. Bojchevski, and S. Günnemann, "Predict then propagate: Graph neural networks meet personalized PageRank," arXiv preprint arXiv:1810.05997, 2018.
[28] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[29] S. Abu-El-Haija, B. Perozzi, A. Kapoor, N. Alipourfard, K. Lerman, H. Harutyunyan, G. Ver Steeg, and A. Galstyan, "MixHop: Higher-order graph convolutional architectures via sparsified neighborhood mixing," arXiv preprint arXiv:1905.00067, 2019.
[30] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[31] S. Yang, G. Lin, Q. Jiang, and W. Lin, "A dilated inception network for visual saliency prediction,"