CogniFNN: A Fuzzy Neural Network Framework for Cognitive Word Embedding Evaluation
A Preprint
Xinping Liu
School of ICT, University of Tasmania, Sandy Bay, TAS 7005
[email protected]

Zehong Cao∗
School of ICT, University of Tasmania, Sandy Bay, TAS 7005
[email protected]

Son Tran
School of ICT, University of Tasmania, Newnham, TAS 7248
[email protected]
September 25, 2020

Abstract
Word embeddings can reflect semantic representations, and embedding quality can be comprehensively evaluated against cognitive data sources recorded during natural human reading. In this paper, we propose the CogniFNN framework, the first attempt at using fuzzy neural networks to extract non-linear and non-stationary characteristics for the evaluation of English word embeddings against corresponding cognitive datasets. In our experiments, we used 15 human cognitive datasets across three modalities: EEG, fMRI, and eye-tracking, and selected the mean squared error and multiple hypotheses testing as metrics to evaluate the proposed framework. Compared to the recent pioneering framework, CogniFNN showed smaller prediction errors for both context-independent (GloVe) and context-sensitive (BERT) word embeddings, and achieved higher significance ratios against randomly generated word embeddings. Our findings suggest that the CogniFNN framework can provide a more accurate and comprehensive evaluation of cognitive word embeddings, and will potentially benefit further word embedding evaluation on extrinsic natural language processing tasks.
Introduction

Distributional word representations trained on large-scale corpora are widely used in modern natural language processing (NLP) systems, which aim to describe the meaning of words and sentences with vectorized representations (Turney and Pantel, 2010). Recent studies (Peters et al., 2018; Devlin et al., 2019; Yang et al., 2019) reported state-of-the-art word embedding performance on various NLP tasks, and attention has turned to how to accurately compare the performance of different word embeddings. However, Tsvetkov et al. (2015) and Chiu et al. (2016) demonstrated that, even for the same word embedding, most existing evaluation methods do not produce consistently correlated results between intrinsic and extrinsic evaluation. Therefore, evaluating the performance of word embeddings with a unified metric remains challenging in NLP tasks.

Hollenstein et al. (2019) proposed a new evaluation framework called CogniVal, which applies traditional neural networks for regression and considers both intrinsic and extrinsic measurements based on human cognitive data sources collected during natural language processing across three modalities: electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and eye-tracking. CogniVal can be regarded as the pioneering multi-modal cognitive word embedding evaluation framework: it evaluates vectorized word embeddings by predicting how well they reflect the semantic representations in cognitive data recorded while humans process natural language.

However, the CogniVal framework neglects some characteristics of human physiological signals. Specifically, all three modalities of cognitive data used in its experiments (EEG, fMRI, and eye-tracking) feature non-stationary and non-linear dynamics (Penny and Henson, 2006; Zhu et al., 2006). Inspired by Zekri et al. (2008) and Bodyanskiy and Vynokurova (2013), we assume that neural networks and fuzzy systems, as computational intelligence methods, are suitable tools for modelling expert knowledge and for handling uncertain non-linear processes or non-stationary time series in dynamic systems, because the approximate reasoning characteristics of fuzzy systems provide a practical model for handling uncertainty and disturbances in real data for complex hybrid non-linear or non-stationary problems (Kharazihai Isfahani et al., 2019). For this reason, we propose a fuzzy neural network (FNN) framework for evaluating word embeddings with cognitive datasets, named CogniFNN, which is expected to enhance the quality of word embedding evaluation with cognitive data sources (i.e. more accurate predictions of cognitive language processing signals from word embeddings) and to achieve a higher ratio of significant results against random word embeddings.

∗ Corresponding author.
Contributions
The main contributions of our study are as follows:

• We developed a new cognitive word embedding evaluation framework called CogniFNN, which is the first attempt at using fuzzy neural networks to evaluate word embeddings against 15 multi-modal human physiological data sources.

• Compared to the recent pioneering cognitive word embedding evaluation framework, CogniVal, our proposed CogniFNN framework achieves smaller prediction errors for both context-independent (GloVe) and context-sensitive (BERT) word embeddings, providing a more accurate word embedding evaluation with cognitive data sources.

• Using random word embeddings as a baseline, our proposed CogniFNN framework achieves higher significance ratios across the hypotheses on most data sources compared to CogniVal, providing a more comprehensive word embedding evaluation with cognitive data sources.
Related Work

Mitchell et al. (2008) introduced a neural computational model to predict fMRI activation when subjects are presented with word stimuli. Following this work, Babaeian Jelodar et al. (2010) proposed a similar model but used a different word representation (WordNet) to resolve the ambiguity issues in the fMRI dataset and improve the accuracy of processing cognition-language data. Later, Wehbe et al. (2014) conducted an extensive study evaluating brain activation patterns at the sentence level rather than for isolated words, and Fernandino et al. (2015) proposed a multiple regression model with sensory-motor experience-based attributes as word vector elements to predict neural activation patterns for lexical concepts. Moreover, Søgaard (2016) used eye-tracking data, another modality of cognitive data, alongside fMRI data to evaluate word embeddings against continuous text stimuli.

More recently, with the success of neural network based approaches for learning word representations, studying whether word embedding models partially simulate how the human brain processes natural language has become a trend. Anderson et al. (2017) proposed a deep convolutional neural network model to evaluate the prediction of brain activation patterns, using Word2Vec embeddings to compare text-based word representations with image-based models. However, the lack of proper training data is a significant reason why evaluating vector-space word embedding models with human cognitive data has not yet become widespread (Bakarov, 2018): the related works mentioned above mainly focus on a single modality of recorded signals from a small individual cognitive data source, without the generality needed for a word embedding evaluation framework. To address this problem, Hollenstein et al. (2019) developed CogniVal, a neural network based regression model that pioneered predicting cognitive language processing data across multiple modalities of recorded human signals (EEG, fMRI, and eye-tracking). CogniVal evaluates how well embeddings can predict human language processing data across these modalities, counteracting the noisiness of the data.

However, these approaches mostly focus on the collection or integration of related cognitive datasets, and none of them attempts to address the non-linear and non-stationary nature of these signals. Hence, in this work we developed the CogniFNN framework to extract non-linear and non-stationary characteristics of human language processing-related physiological signals. To the best of our knowledge, this is the first attempt at using fuzzy neural networks to improve the comprehensive evaluation of cognitive word embeddings.
The CogniFNN Framework

For an accurate and comprehensive evaluation of cognitive word embeddings, we evaluated the vectorized word representations generated by embedding language models against the corresponding cognitive data across three modalities (EEG, fMRI, and eye-tracking) using our proposed CogniFNN framework.

The CogniFNN architecture consists of five layers: the input layer, the fuzzy layer, the normalized layer, the weighted layer, and the final output layer. The fuzzy layer is based on the ellipsoidal basis function (EBF) (Leng et al., 2005) and the Takagi-Sugeno (TS) fuzzy model (Takagi and Sugeno, 1985). We train the framework with r input dimensions using softmax activation, u neurons in each of the fuzzy, normalized, and weighted layers, and a final output layer of n neurons with linear activation, where r is the number of dimensions of a word vector from a given embedding type (for instance, r = 1024 when the input word vector comes from the pre-trained BERT-large model) and n varies with the dimension of the cognitive data to be predicted. The value of n equals the dimension of the cognitive feature, e.g. the number of electrodes in the EEG data sources or the number of voxels in the fMRI data sources, while n = 1 when the fuzzy neural network predicts single eye-tracking features (Hollenstein et al., 2019). Figure 1 illustrates the architecture of this fuzzy neural network, and the pseudocode is shown in Algorithm 1.

Figure 1: The architecture of the CogniFNN framework.

In the first layer (the input layer), each neuron represents an input variable x_i (i = 1, 2, ..., r), where r is the dimension of the word vectors extracted from the selected word embedding. To accommodate as many dimensions as possible in the FNN model, softmax activation is employed in the input layer.

In the second layer (the fuzzy layer), each neuron computes a T-norm of Gaussian fuzzy membership functions (MFs) (Leng et al., 2005), representing the premise of a fuzzy rule, and the output of each fuzzy neuron is the product of the grades of its MFs:

μ_ij = exp[ −(softmax(x_i) − c_ij)² / (2σ_ij²) ]    (1)

φ_j = ∏_{i=1}^{r} μ_ij    (2)

where μ_ij denotes the i-th MF in the j-th neuron, forming the premise of fuzzy rule j (j = 1, 2, ..., u), and u is the total number of neurons. c_ij and σ_ij are the center and width of the i-th MF in the j-th neuron, respectively, and φ_j is the output of the j-th neuron of the fuzzy layer, i.e. the product of its MFs.
In the third layer (the normalized layer), the number of neurons is the same as in the fuzzy layer, and the output of each neuron is computed as:

ψ_j = φ_j / Σ_{k=1}^{u} φ_k    (3)

where u is the total number of neurons.

In the fourth layer (the weighted layer), each neuron has two inputs: the output of the corresponding neuron in the previous normalized layer (i.e. ψ_j), and the weighted bias w_j:

A_j = [a_{j0}, a_{j1}, ..., a_{jr}]    (4)

B = [1, x_1, x_2, ..., x_r]^T    (5)

w_j = A_j · B = a_{j0} + a_{j1} x_1 + ... + a_{jr} x_r    (6)

where A_j is the set of parameters associated with the consequent of fuzzy rule j, and B is the bias vector of the weighted layer. The output of each neuron in the weighted layer is:

f_j = w_j ψ_j    (7)

In the fifth layer (the final output layer), each neuron represents an output variable y_t (t = 1, 2, ..., n), where n is the dimension of the cognitive features of the cognitive data source. Linear activation is used for the regression, and each output variable (predicted cognitive feature) is computed as:

y_t = linear( Σ_{j=1}^{u} f_j )    (8)

Finally, the predicted results are compared with the ground-truth cognitive data by computing the mean squared error (MSE), averaged over all predicted words.
Algorithm 1 CogniFNN: A Fuzzy Neural Network Framework for Cognitive Word Embedding Evaluation

Require: word vector x = [x_1, x_2, ..., x_r] with r dimensions
Ensure: predicted corresponding cognitive features y = [y_1, y_2, ..., y_n] with n dimensions

function FNN_PREDICTION(x, r, n)
    x_i ← softmax(x_i)                                       ▷ input layer (softmax activation)
    μ_ij ← exp[−(x_i − c_ij)² / (2σ_ij²)], i = 1, ..., r, j = 1, ..., u
        ▷ μ_ij is the i-th Gaussian membership function (MF) of neuron j, the premise of fuzzy rule j;
        ▷ c_ij and σ_ij are the center and width of the i-th MF in the j-th neuron
    φ_j ← ∏_{i=1}^{r} μ_ij, j = 1, ..., u                    ▷ fuzzy-layer output: the product of MFs
    ψ_j ← φ_j / Σ_{k=1}^{u} φ_k                              ▷ normalized-layer output; u is the total number of neurons
    B ← [1, x_1, x_2, ..., x_r]^T                            ▷ bias vector of the weighted layer
    A_j ← [a_{j0}, a_{j1}, ..., a_{jr}]                      ▷ consequent parameters of fuzzy rule j
    w_j ← A_j · B = a_{j0} + a_{j1}x_1 + ... + a_{jr}x_r     ▷ weighted bias: the consequent of rule j
    f_j ← w_j ψ_j                                            ▷ weighted-layer output
    y_t ← linear(Σ_{j=1}^{u} f_j)                            ▷ output-layer value (linear activation)
    return y = [y_1, y_2, ..., y_n]
end function

Experiments

In this study, the evaluation frameworks were assessed on the most representative pre-trained word embedding models, GloVe and BERT, representing context-independent and context-sensitive embeddings respectively, against 15 cognitive data sources across three modalities: EEG, fMRI, and eye-tracking.
Word embeddings
A total of two word embedding models, representing context-independent and context-sensitive embeddings respectively, were used in our experiment as inputs to CogniFNN:

• GloVe (Pennington et al., 2014), Global Vectors for Word Representation, is designed to directly capture the global statistics of a corpus. GloVe provides embeddings in four different dimensions (see Table 1 for an overview), trained on aggregated global word-word co-occurrence statistics from a corpus of 6 billion tokens.
• BERT (Devlin et al., 2019) provides contextual, bidirectional word representations, pre-training deep bidirectional representations from unlabeled text conditioned jointly on left and right contexts. The pre-trained BERT model therefore requires only an additional output layer to be fine-tuned into up-to-date models for various natural language processing tasks. It is worth mentioning that the pre-training of BERT was carried out on a large corpus of unlabeled text: the entire English Wikipedia (2.5 billion words) and a book corpus (800 million words). The word representations used in our experiment are retrieved from the second-to-last hidden layer of the BERT-base and BERT-large models, respectively.

Embeddings   Dim.    Number of neurons
GloVe        50      [ , 26, 20, 5]
GloVe        100     [ , 30]
GloVe        200     [ , 50]
GloVe        300     [ , 50]
BERT         768     [ , 400, 200]
BERT         1024    [ , 600, 200]

Table 1: Overview of the word embeddings evaluated with CogniFNN. The last column shows the grid-search space for the number of neurons in the hidden layer; the optimal parameters are highlighted in bold.

Cognitive data sources
A total of 15 cognitive data sources across three modalities (EEG, fMRI, and eye-tracking) were used in our experiment:

• EEG - 4 data sources: ZUCO (Hollenstein et al., 2018), NATURAL SPEECH (Broderick et al., 2018), N400 (Broderick et al., 2018), and UCL (Frank et al., 2015) were selected as EEG datasets, collected while subjects read sentences or listened to natural speech.

• fMRI - 4 data sources: HARRY POTTER (Wehbe et al., 2014), ALICE (Brennan et al., 2016), PEREIRA (Pereira et al., 2018), and NOUNS (Mitchell et al., 2008) were selected as fMRI datasets with 1000 voxels; one fMRI scan covers multiple words under continuous stimuli such as natural reading or story listening.

• Eye-tracking - 7 data sources: DUNDEE (Kennedy et al., 2003), UCL (Frank et al., 2013), CFILT-SARCASM (Mishra et al., 2016), CFILT-SCANPATH (Mishra and Bhattacharyya, 2018), PROVO (Luke and Christianson, 2018), GECO (Cop et al., 2017), and ZUCO (Hollenstein et al., 2018) were selected as eye-tracking datasets. These 7 datasets were collected during normal, self-paced or natural reading, and each provides different eye-tracking features: first fixation duration, first pass duration, mean fixation duration, total fixation duration, and number of fixations.

Baselines

1) CogniVal framework (Hollenstein et al., 2019): the recent pioneering framework for cognitive English word embedding evaluation, based on a traditional neural network regression model, i.e. a three-layer multiple regression model that predicts human cognitive features from the corresponding word embeddings.
2) Random embeddings: random vectors for each word, with the same dimensionality as the corresponding evaluated pre-trained word embeddings and without specific contextual information. These random word embeddings are used as baselines for multiple hypotheses testing in the CogniVal and CogniFNN frameworks.
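A random baseline of this kind can be generated in a few lines; the vocabulary, seed, and function name below are illustrative assumptions, not part of either framework:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_embeddings(vocab, dim):
    # One random vector per word, matching the dimensionality of the
    # pre-trained embedding under evaluation (e.g. dim=300 for GloVe-300).
    return {word: rng.standard_normal(dim) for word in vocab}

baseline = random_embeddings(["brain", "language", "reading"], 300)
```

Because the vectors carry no semantic structure, any embedding that genuinely encodes meaning should predict the cognitive features significantly better than this baseline.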
Evaluation Metrics

1) Cognitive data prediction: the predicted outputs of the CogniVal framework or our proposed CogniFNN framework are compared with the ground-truth cognitive features of the cognitive data sources. Prediction errors are calculated as the mean squared error (MSE), averaged over all predicted words.
2) Multiple hypotheses testing: a hypothesis compares the combination of an embedding type and a cognitive data source against the random word embedding. The Wilcoxon signed-rank test (Dror et al., 2018a) is performed for each hypothesis, and the conservative Bonferroni correction is applied to counteract the multiple hypotheses. The global null hypothesis is rejected if p < α/N, where α = 0.05 and N is the number of hypotheses (Dror et al., 2018b) (i.e. N = 4 for EEG, N = 59 for fMRI, and N = 42 for eye-tracking).

We tune our models on the development set and use a grid search to determine the optimal parameters. The loss function optimizes the MSE, and we use the Adam optimizer with a learning rate of 0.001. Five-fold cross-validation was performed for every model, i.e. 4/5 of the data is used for training and 1/5 for testing. We select the Gaussian membership degree (i.e. the premise threshold) among [0.0677, , 0.2031, 0.2708], the percentage of samples among [0.65, 0.75, 0.85, ], and the batch size among [4, 8, 16, 32, 64, ]; the optimal parameters are highlighted in bold. The number of neurons is selected individually for each combination of cognitive data source and embedding type; see Table 1 for its search space. The predicted results are measured with the MSE, averaged over all predicted words. We also optimize the initial width (σ = 4. for all MFs in neurons) to correct the deviation from the optimal value.

Results

In this section, we present the evaluation results of our proposed CogniFNN versus the CogniVal framework on GloVe (50, 100, 200, and 300 dimensions) and BERT (768 and 1024 dimensions) word embeddings against the 15 cognitive datasets mentioned in Section 4.1. We also present the outcomes of multiple statistical significance tests, where each hypothesis compares against the random word embeddings in either the CogniFNN or the CogniVal framework. Based on the performance metrics, we observe the following:
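The two metrics can be sketched as follows, assuming per-word predictions are already available; the function and variable names are ours, not from CogniVal or CogniFNN:

```python
import numpy as np
from scipy.stats import wilcoxon

def per_word_mse(predicted, target):
    # predicted, target: (num_words, n) arrays of cognitive features;
    # returns one MSE per word, averaged over the n feature dimensions.
    return ((predicted - target) ** 2).mean(axis=1)

def significant_ratio(embedding_errors, random_errors, alpha=0.05):
    # One hypothesis per (embedding, dataset) combination: paired per-word
    # MSEs of the evaluated embedding vs. the random baseline, tested with
    # the Wilcoxon signed-rank test under a Bonferroni-corrected
    # threshold alpha / N.
    N = len(embedding_errors)
    hits = 0
    for emb, rnd in zip(embedding_errors, random_errors):
        _, p = wilcoxon(emb, rnd)
        if p < alpha / N:
            hits += 1
    return hits, N
```

The ratio `hits / N` corresponds to the "ratio of significant results" reported in Table 4; the Bonferroni division by N makes each individual test stricter as the number of hypotheses grows.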
Mean squared errors
Tables 2 and 3 show the context-independent (GloVe) and context-sensitive (BERT) word embedding evaluations, with the prediction errors (MSEs) of both the CogniVal and CogniFNN frameworks on the 15 cognitive data sources. Compared to the CogniVal framework, our proposed CogniFNN framework achieved smaller MSEs on GloVe and on part of the BERT word embeddings, meaning CogniFNN better predicts EEG, fMRI, and eye-tracking cognitive features and thus provides a more accurate word embedding evaluation with cognitive datasets. The averaged MSEs also show that the CogniFNN framework handles cognitive evaluations better overall, with smaller averaged MSEs for both GloVe and BERT word embeddings relative to CogniVal.
Embeddings               GloVe-50             GloVe-100            GloVe-200            GloVe-300
Modality-Dataset         CogniVal  CogniFNN   CogniVal  CogniFNN   CogniVal  CogniFNN   CogniVal  CogniFNN
EEG-N400                 0.067
EEG-NATURAL SPEECH       0.013
EEG-ZUCO                 0.009
EEG-UCL                  0.031
fMRI-HARRY POTTER        0.005
fMRI-NOUNS               0.204
fMRI-ALICE               0.036
fMRI-PEREIRA             0.044
Eye-Tracking-GECO        0.010
Eye-Tracking-ZUCO        0.008
Eye-Tracking-PROVO       0.031
Eye-Tracking-DUNDEE      0.010
Eye-Tracking-SARCASM     0.016
Eye-Tracking-SCANPATH    0.023
Eye-Tracking-UCL         0.044
Average                  0.037
Table 2: Absolute mean squared errors (MSEs) across all cognitive data sources for the context-independent word embeddings (GloVe). The prefix of each data source indicates its modality (EEG, fMRI, or eye-tracking), and the last row shows the averaged MSE over all cognitive data sources for each word embedding.
Significant results
Table 4 shows the ratios of significant results under the Bonferroni correction to the total number of hypotheses comparing random and GloVe/BERT word embeddings. A higher ratio indicates a more comprehensive word embedding evaluation against the cross-modality cognitive datasets. The majority of results from both the CogniVal and CogniFNN frameworks are significantly better than the random word embeddings, but our proposed CogniFNN framework achieved higher significance ratios on GloVe and part of the BERT word embeddings. Overall, the CogniFNN framework improved the total ratio of significant results from 431/630 to 498/630.
Embeddings               BERT-base             BERT-large
Modality-Dataset         CogniVal  CogniFNN    CogniVal  CogniFNN
EEG-N400                 0.014
EEG-NATURAL SPEECH       0.005     0.005       0.007     0.007
EEG-ZUCO                 0.006     0.006       0.006     0.007
EEG-UCL                  0.006     0.016       0.010
fMRI-HARRY POTTER        0.001     0.012       0.001     0.012
fMRI-NOUNS               0.042
fMRI-ALICE               0.001     0.007       0.001     0.005
fMRI-PEREIRA             0.012
Eye-Tracking-GECO        0.007
Eye-Tracking-ZUCO        0.003     0.004       0.004     0.005
Eye-Tracking-PROVO       0.006     0.012       0.006     0.015
Eye-Tracking-DUNDEE      0.008
Eye-Tracking-SARCASM     0.009     0.011       0.011     0.011
Eye-Tracking-SCANPATH    0.006
Eye-Tracking-UCL         0.033
Average                  0.011     0.011       0.013
Table 3: Absolute mean squared errors (MSEs) across all cognitive data sources for the context-sensitive word embeddings (BERT). The prefix of each data source indicates its modality (EEG, fMRI, or eye-tracking), and the last row shows the averaged MSE over all cognitive data sources for each word embedding.
Modalities     Embeddings              CogniVal   CogniFNN (Ours)
EEG            Random/GloVe-50         0/4
               Random/GloVe-100        0/4
               Random/GloVe-200        1/4
               Random/GloVe-300        1/4
               Random/BERT-base        4/4        4/4
               Random/BERT-large       4/4        4/4
fMRI           Random/GloVe-50         6/59
               Random/GloVe-100        1/59
               Random/GloVe-200        34/59
               Random/GloVe-300        31/59
               Random/BERT-base        59/59      54/59
               Random/BERT-large       59/59      52/59
Eye-Tracking   Random/GloVe-50         30/42
               Random/GloVe-100        33/42
               Random/GloVe-200        42/42      42/42
               Random/GloVe-300        42/42      42/42
               Random/BERT-base        42/42      40/42
               Random/BERT-large       42/42      40/42
Total          Random/(GloVe + BERT)   431/630    498/630
Table 4: The ratio of significant results under the Bonferroni correction to the total number of hypotheses between a random baseline and GloVe/BERT word embeddings.
Conclusion

In this paper, we proposed the CogniFNN framework, which uses fuzzy neural networks to exploit the non-linear and non-stationary characteristics of physiological signals to improve the evaluation of word embeddings against cognitive datasets recorded while subjects comprehended natural language (English). Our findings showed that CogniFNN achieved smaller prediction errors and higher significance ratios for both context-independent (GloVe) and context-sensitive (BERT) word embeddings against 15 cognitive data sources across EEG, fMRI, and eye-tracking. Our contributions offer a useful evaluation strategy for the exhaustive investigation of word embedding evaluation with corresponding cognitive features.
References
Peter D. Turney and Patrick Pantel. From frequency to meaning: Vector space models of semantics. Computation and Language, arXiv:1003.1141, 2010. URL http://arxiv.org/abs/1003.1141.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. Computation and Language, arXiv:1802.05365, 2018. URL https://arxiv.org/abs/1802.05365.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. Computation and Language, arXiv:1810.04805, 2019. URL https://arxiv.org/abs/1810.04805.

Yinfei Yang, Yuan Zhang, Chris Tar, and Jason Baldridge. PAWS-X: A cross-lingual adversarial dataset for paraphrase identification. Computation and Language, arXiv:1908.11828, 2019. URL https://arxiv.org/abs/1908.11828.

Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Guillaume Lample, and Chris Dyer. Evaluation of word vector representations by subspace alignment. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2049–2054, 2015.

Billy Chiu, Anna Korhonen, and Sampo Pyysalo. Intrinsic evaluation of word vectors fails to predict extrinsic performance. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pages 1–6, 2016.

Nora Hollenstein, Antonio de la Torre, Nicolas Langer, and Ce Zhang. CogniVal: A framework for cognitive word embedding evaluation. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 538–549, 2019.

William Penny and Richard Henson. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier, London, UK, 2006.

Zhiwei Zhu, Qiang Ji, and K. P. Bennett. Nonlinear eye gaze mapping function estimation via support vector regression. In , volume 1, pages 1132–1135, 2006.

Maryam Zekri, Saeed Sadri, and Farid Sheikholeslam. Adaptive fuzzy wavelet network control design for nonlinear systems. Fuzzy Sets and Systems, 159(20):2668–2695, 2008. doi: 10.1016/j.fss.2008.02.008.

Y. Bodyanskiy and O. Vynokurova. Hybrid adaptive wavelet-neuro-fuzzy system for chaotic time series identification. Information Sciences, 220:170–179, 2013. doi: 10.1016/j.ins.2012.07.044.

Mohsen Kharazihai Isfahani, Maryam Zekri, Hamid Reza Marateb, and Miguel Angel Mañanas. Fuzzy jump wavelet neural network based on rule induction for dynamic nonlinear system identification with real data applications. PLOS ONE, 14(12):1–26, 2019. doi: 10.1371/journal.pone.0224075.

Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, and Marcel Adam Just. Predicting human brain activity associated with the meanings of nouns. Science, 320:1191–1195, 2008.

Ahmad Babaeian Jelodar, Mehrdad Alizadeh, and Shahram Khadivi. WordNet based features for predicting brain activity associated with meanings of nouns. In Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, pages 18–26, 2010.

Leila Wehbe, Ashish Vaswani, Kevin Knight, and Tom Mitchell. Aligning context-based statistical models of language with brain activity during reading. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 233–243, 2014.

Leonardo Fernandino, Colin J. Humphries, Mark S. Seidenberg, William L. Gross, Lisa L. Conant, and Jeffrey R. Binder. Predicting brain activation patterns associated with individual lexical concepts based on five sensory-motor attributes. Neuropsychologia, 76:17–26, 2015.

Anders Søgaard. Evaluating word embeddings with fMRI and eye-tracking. In Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP, pages 116–121. Association for Computational Linguistics, 2016.

Andrew J. Anderson, Douwe Kiela, Stephen Clark, and Massimo Poesio. Visually grounded and textual semantic models differentially decode brain activity associated with concrete and abstract nouns. Transactions of the Association for Computational Linguistics, 5:17–26, 2017.

Amir Bakarov. Can eye movement data be used as ground truth for word embeddings evaluation? CoRR, abs/1804.08749, 2018. URL http://arxiv.org/abs/1804.08749.

Gang Leng, Thomas Martin McGinnity, and Girijesh Prasad. An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network. Fuzzy Sets and Systems, 150(2):211–243, 2005.

T. Takagi and M. Sugeno. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, SMC-15(1):116–132, 1985.

Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.

Nora Hollenstein, Jonathan Rotsztejn, Marius Troendle, Andreas Pedroni, Ce Zhang, and Nicolas Langer. ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data, 5, 2018.

Michael P. Broderick, Andrew J. Anderson, Giovanni M. Di Liberto, Michael J. Crosse, and Edmund C. Lalor. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Current Biology, 28(5):803–809, 2018.

Stefan L. Frank, Leun J. Otten, Giulia Galli, and Gabriella Vigliocco. The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140:1–11, 2015.

Jonathan R. Brennan, Edward P. Stabler, Sarah E. Van Wagenen, Wen-Ming Luh, and John T. Hale. Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language, 157-158:81–94, 2016.

Francisco Pereira, Bin Lou, Brianna Pritchett, Samuel Ritter, Samuel J. Gershman, Nancy Kanwisher, Matthew Botvinick, and Evelina Fedorenko. Toward a universal decoder of linguistic meaning from brain activation. Nature Communications, 9:963, 2018.

Alan Kennedy, Robin Hill, and Joel Pynte. The Dundee corpus. In Proceedings of the 12th European Conference on Eye Movement, 2003.

Stefan L. Frank, Irene Fernandez Monsalve, Robin L. Thompson, and Gabriella Vigliocco. Reading time data for evaluating broad-coverage models of English sentence processing. Behavior Research Methods, 45(4):1182–1190, 2013.

Abhijit Mishra, Diptesh Kanojia, and Pushpak Bhattacharyya. Predicting readers' sarcasm understandability by modeling gaze behavior. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 3747–3753. AAAI Press, 2016.

Abhijit Mishra and Pushpak Bhattacharyya. Scanpath Complexity: Modeling Reading/Annotation Effort Using Gaze Information. Springer, Singapore, 2018.

Steven G. Luke and Kiel Christianson. The Provo corpus: A large eye-tracking corpus with predictability norms. Behavior Research Methods, 50(2):826–833, 2018.

Uschi Cop, Nicolas Dirix, Denis Drieghe, and Wouter Duyck. Presenting GECO: An eyetracking corpus of monolingual and bilingual sentence reading. Behavior Research Methods, 49(2):602–615, 2017.

Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart. Replicability analysis for natural language processing: Testing significance with multiple datasets. Transactions of the Association for Computational Linguistics, 5:471–486, 2018a.

Rotem Dror, Gili Baumer, Segev Shlomov, and Roi Reichart. The hitchhiker's guide to testing statistical significance in natural language processing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1383–1392, Melbourne, Australia, 2018b. doi: 10.18653/v1/P18-1128.