End-to-End Models for the Analysis of System 1 and System 2 Interactions based on Eye-Tracking Data
Alessandro Rossi, Sara Ermini, Dario Bernabini, Dario Zanca, Marino Todisco, Alessandro Genovese, Antonio Rizzo
DISPOC, University of Siena; DSMCN, University of Siena; AIDILAB, Siena, Italy
Abstract
While theories postulating a dual cognitive system take hold, quantitative confirmations are still needed to understand and identify interactions or conflict events between the two systems. Eye movements are among the most direct markers of individual attentive load and may serve as an important proxy of information. In this work we propose a computational method, within a modified visual version of the well-known Stroop test, for the identification of different tasks and potential conflict events between the two systems through the collection and processing of data related to eye movements. A statistical analysis shows that the selected variables can characterize the variation of attentive load within different scenarios. Moreover, we show that Machine Learning techniques allow us to distinguish between different tasks with good classification accuracy and to investigate the gaze dynamics in more depth.
Keywords:
System 1 and System 2, Eye-tracking, Data Analysis, Machine Learning, Classification, Stroop test
Introduction

Viewing is a complex activity, involving cognitive aspects both conscious and unconscious. It manifests itself through motor behavior that acquires salient information in the form of light radiation. When observing static images, this attentive activity exhibits rapid eye movements called saccades, occurring between the so-called fixations. During fixations, the eye remains still and information is sampled. It is well known that the cognitive load of individual tasks may influence eye movement statistics (McMains & Kastner, 2009; Connor, Egeth, & Yantis, 2004; Mathôt, 2018), and in particular variables such as average fixation duration, saccade length or saccade velocity, among others. For this reason, it seems reasonable to define techniques based on eye-tracking data in order to recognize recurring patterns related to visual attention and to identify the task that the subject is performing (Klingner, 2010; Zagermann, Pfeil, & Reiterer, 2016). Indeed, it has already been observed that the variation of attentive load within different tasks affects eye movements (Castelhano, Mack, & Henderson, 2009; Tanaka, Inuzuka, & Hirayama, 2019).

In this work we analyzed the viewing behaviour of subjects involved in a Stroop test (Stroop, 1935) while performing two different visual tasks, naming and reading, in order to explore possible effects on human attention. The exploratory patterns are expressed through variables related to eye fixations and saccadic movements, since both are influenced by processing difficulty (Pollatsek, Rayner, & Balota, 1986). Each task requires a different attention load from the subject: reading is performed as a fast and automatic process, while naming the color of a word written in an ink color mismatching its semantics is a slow, conscious activity (Kahneman, 2011).
The delay in naming the colors of incompatible words has been described as a conflict between System 1 and System 2 (Kahneman, 2011). This phenomenon is well known in experimental psychology and several methods have been developed to test and measure it (Jensen, 1965; Dalrymple-Alford & Budayr, 1966; Bench et al., 1993). To this aim, we set up a visual version of the Stroop test during which we recorded the eye movements of 64 subjects, following the experimental protocol defined in (Megherbi, Elbro, Oakhill, Segui, & New, 2018). The experiment involved two different tasks, defined as Naming and Reading, and two conditions, defined as "With Interference" and "Without Interference".

The goals are (1) to verify the presence of recurrent visual behavioural patterns for different tasks and conditions through a statistical data analysis, and (2) to generate automatic models which are able to identify the task or condition in which the subject is currently involved.

The paper is organized as follows. The section "Method" describes the experimental protocol set up for stimulus presentation and data collection. In the section "Experiments" we provide a detailed description of the data pre-processing, the Machine Learning techniques and the metrics used to evaluate the results. Finally, in the "Conclusions" we discuss the results and suggest directions for future work.

Method

We set up a visual Stroop test to record the eye movements of 64 Italian subjects (32 females and 32 males, average age = 30.2 ±) under four conditions:

• Reading Without Interference (RWoI) - Participants had to read the words on screen. The words "ROSSO" ("red"), "GIALLO" ("yellow"), "VERDE" ("green") and "BLU" ("blue") were all colored black.

• Reading With Interference (RWI) - Participants had to read the words on screen. The words "ROSSO", "GIALLO", "VERDE" and "BLU" were coloured red, yellow, green and blue, with a mismatch between the shade used and the meaning of the word (e.g. "ROSSO" was never coloured red).
• Naming Without Interference (NWoI) - Participants had to name the color of the words on screen. In this case, the Latin letters were replaced by pseudo-letters constructed to match the physical properties of real letters (height, number of pixels, and contiguous pixels) by reconfiguring their original characteristics (Megherbi et al., 2018). The pseudo-words were colored red, green, yellow and blue.

• Naming With Interference (NWI) - Participants had to name the color of the words on screen. The composition of the screen followed the same principles used for the construction of the RWI condition.

The presentation order of the conditions was randomized, balancing the set of possibilities among participants. Eye movements were recorded by an EyeLink Portable Duo set to a 500 Hz sampling rate in Head-Free mode. Images were presented on a 17-inch display (1920 x 1080 pixels), placed perpendicularly in front of the participant at a distance from the eyes ranging from 46 to 52 cm. The experiment was executed individually only once, and subjects unable to correctly read words and instructions on the screen were discarded. The light in the room and the external noise were controlled by the experimenter. Written instructions were presented to the participants, followed by an oral explanation. The experiment was preceded by an initial unrecorded trial and a standard 5-point calibration. Between the instructions and the stimulus screens, a white screen containing a circular trigger located at the top-left corner of the task image was presented. Each trial began when the participant fixated the trigger for at least 100 ms and was completed when the participant pressed the space key. During the execution, the experimenter annotated in an Excel spreadsheet any relevant information regarding the experience of the subject and possible technical issues (e.g. if the first calibration failed).

Experiments

Variables about eye movements were extracted by the software released with the eye-tracker device.
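The balanced randomization of condition order described above can be sketched as follows. This is a hypothetical illustration, not the authors' materials: every permutation of the four conditions is cycled through so that the possible orders stay as evenly distributed as possible across the 64 participants.

```python
# Illustrative sketch of balanced condition-order randomization.
# Names and the cycling strategy are assumptions, not the original protocol.
from itertools import permutations, cycle

CONDITIONS = ("RWoI", "RWI", "NWoI", "NWI")

def balanced_orders(n_participants):
    """Assign each participant one of the 24 possible condition orders,
    cycling through them so that counts stay as equal as possible."""
    all_orders = cycle(sorted(permutations(CONDITIONS)))
    return [next(all_orders) for _ in range(n_participants)]

orders = balanced_orders(64)
# With 64 participants and 24 possible orders, each order is used 2 or 3 times.
```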
However, we found that a further cleaning process was necessary to improve the data quality. Fixations falling outside the areas of interest were discarded, since they could be due either to an instrumental artifact or to subject activity not related to the task.

Table 1: Significance p-values generated by the one-way ANOVA on fixation variables when comparing three pairs of tasks: NWoI vs. RWoI, NWI vs. NWoI and RWI vs. RWoI.

                          NWoI vs. RWoI   NWI vs. NWoI   RWI vs. RWoI
Number of fixations            0.285402       0.000002       0.088730
Average fixation length        0.000052       0.027428       0.015745
Maximum fixation length        0.000606       0.035081       0.008078
Horizontal regressions         0.079062       0.000005       0.144568
Vertical regressions           0.632652       0.000001       0.214819

For each subject and stimulus (NWI, NWoI, RWI, RWoI), a set of statistical features related to eye movements was extracted:

• Number of fixations: total number of fixations.

• Average fixation length: the average duration among all the fixations.

• Maximum fixation length: the maximum duration among all the fixations.

• Horizontal/Vertical regressions: the number of times the eyes step backward in their horizontal/vertical path (assumed left to right and top to bottom, respectively), excluding changes of line in the horizontal count.

• Up/Down/Left/Right frequency: the count of saccadic movements in each direction, normalized by the total number of saccades.

• Minimum/Average/Maximum saccade duration: statistics about the duration of each saccade.

• Minimum/Average/Maximum saccade velocity: statistics about the estimated velocity of each saccade.

• Minimum/Average/Maximum saccade amplitude: statistics about the amplitude of each saccade (in degrees of visual angle).

• Minimum/Average/Maximum saccade distance: statistics about the distance of each saccade (in degrees of visual angle).
• Minimum/Average/Maximum saccade slope: statistics about the slope of each saccade with respect to the horizontal axis.

A first statistical hypothesis test was performed with a standard one-way ANOVA (Fisher, 1992). The purpose was to assess the representativeness of the variables within different tasks and conditions. For each variable we analyzed the differences in the generated statistical distributions when comparing two different conditions, in order to verify the confidence that the considered measurement is sampled from two different populations in the compared groups. The test was repeated for three comparisons: NWoI vs. RWoI, NWI vs. NWoI and RWI vs. RWoI. Since the difference between the Naming and Reading tasks is proven in the literature (Kahneman, 2011), the first test serves as a sort of control for the whole experimental setting. However, we believe that exploring connections with visual attention and eye movements could be of interest too. The second and third comparisons are the focus of the present work, aimed at assessing whether the different attention levels required to perform two different tasks can be captured by variables related to eye movements. In Table 1 we report the significance values p obtained in each comparison involving the fixation variables. As can be seen, the differences between the Naming and Reading tasks are well represented by statistics about the duration of fixations (both average and maximum). In the second test, the effect of interference in Naming is strongly expressed by the number of fixations and the eye regressions on both axes.
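As a concrete illustration of the fixation-based features listed above, a minimal sketch (not the authors' code) of how they could be computed from a sequence of fixations follows. Each fixation is assumed to be a tuple (x, y, duration_ms), and the 50-pixel line-height threshold used to exclude line changes is an assumption.

```python
# Minimal sketch of computing the fixation features described in the text.
# The (x, y, duration_ms) layout and the line-height threshold are assumptions.

def fixation_features(fixations):
    """Return fixation statistics: counts, durations, and regressions."""
    durations = [d for _, _, d in fixations]
    xs = [x for x, _, _ in fixations]
    ys = [y for _, y, _ in fixations]

    # Horizontal regression: gaze steps backward (right to left), excluding
    # line changes, approximated here as large downward jumps (> 50 px).
    horizontal_regressions = sum(
        1 for i in range(1, len(xs))
        if xs[i] < xs[i - 1] and ys[i] - ys[i - 1] < 50
    )
    # Vertical regression: gaze steps upward against the top-to-bottom path.
    vertical_regressions = sum(
        1 for i in range(1, len(ys)) if ys[i] < ys[i - 1]
    )
    return {
        "n_fixations": len(fixations),
        "avg_fixation_length": sum(durations) / len(durations),
        "max_fixation_length": max(durations),
        "horizontal_regressions": horizontal_regressions,
        "vertical_regressions": vertical_regressions,
    }

feats = fixation_features([(100, 200, 180), (300, 200, 250), (150, 200, 400)])
# The third fixation moves left on the same line: one horizontal regression.
```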
These results appear to be in agreement with the literature, since saccadic regressions are found to be more frequent and larger when the reader encounters difficulties (Pollatsek et al., 1986; Murray & Kennedy, 1988). The results of these first two tests were also confirmed by general interviews with the subjects, in which they reported perceiving the Naming task as highly counterintuitive, especially in the presence of interference. On the other hand, they confirmed perceiving the Reading task as more trivial, with little additional difficulty introduced by interference. This perception is also in agreement with our results, since the p-values for RWI vs. RWoI are in general higher with respect to the other tests. Nevertheless, we observed that variables related to saccadic movements produce a lower level of significance when considered alone. Indeed, the minimum significance value p = .005 was achieved by the Average Saccade Duration when comparing NWI vs. NWoI, and only in a few other cases did we obtain a comparably low p-value. Even though we analyze fixations, their positions and, hence, possible regressions are directly related to saccadic movements.
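The per-variable comparison above can be reproduced with a standard one-way ANOVA; with two groups this is equivalent to a two-sided t-test. The arrays below are synthetic stand-ins for the per-subject values of a single variable (e.g. number of fixations) under two conditions, not the study's data.

```python
# One-way ANOVA comparing one eye-movement variable across two conditions.
# The samples are synthetic stand-ins, not the study's data.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
nwi = rng.normal(loc=60, scale=8, size=64)   # e.g. NWI: more fixations
nwoi = rng.normal(loc=45, scale=8, size=64)  # e.g. NWoI

f_stat, p_value = f_oneway(nwi, nwoi)
# A small p-value indicates the two condition means differ significantly.
```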
[Figure 2: F1-scores achieved by each classifier on the four feature sets: Fix (features related to fixations), Saccades (features related to saccades), Fix+Saccades (features related to fixations and saccades), Fix+Saccades-norm (features related to fixations and saccades with subject-wise normalization). The dashed red line indicates the score of the random baseline.]

We repeated the same tests investigated in the statistical analysis by setting up three separate binary classification tests: NWoI vs. RWoI, NWI vs. NWoI and RWI vs. RWoI. We avoided a global 4-class test since the dynamics of the tasks are too complex to be modeled with such a small number of samples. We exploited the Scikit-learn (Pedregosa et al., 2011) Python software package to test four different classifiers (Bishop, 2006):

i. Logistic Regression (Logistic) is a statistical model that in its basic form uses a logistic function, applied to a weighted average of the input features, to model a binary dependent variable (the model prediction);

ii. Support Vector Machines (SVM) are supervised learning models used for binary classification. SVMs can learn non-linear separation surfaces by means of the so-called kernel trick, implicitly mapping their inputs into a high-dimensional feature space;

iii. Random Forests (RF) are an ensemble of decision trees based on bootstrapping. Different models are trained on subsets of samples and the final decision is taken by majority voting;

iv.
Artificial Neural Networks (ANN) are a well-known class of learning algorithms inspired by biological neural networks; they are based on a collection of units or nodes, called artificial neurons, connected by edges which represent the flow of information; the edges are in fact numbers and represent the parameters of the model, typically learned by back-propagation of an error signal with respect to the target.

Our goal is to demonstrate that interactions between the two cognitive systems, System 1 and System 2, affect the gaze dynamics and can be detected by machine learning algorithms based on eye-related features. In particular, we aim to show that classification performance is consistently better than a random baseline, in order to prove that the populations are intrinsically distinct. We performed a 5-fold Cross-Validation for each classifier and computed the average of the achieved F1-scores. This should guarantee that the results do not depend on the choice of the test set, even if the relatively high variability observed depends on the small size of the test set (a single misclassified sample heavily affects the results).

To improve the performance of the Machine Learning algorithms, features were normalized in [0, 1] (we found this method to slightly outperform z-normalization in our case). In addition, to investigate in more depth the information expressed by the features, we generated four sets of variables that we tested independently.

• Fix.
It was composed of the variables extracted from fixations, after filtering out fixations shorter than 200 ms.

• Saccades. It was composed of the variables extracted from saccades, aimed at capturing gaze dynamics and visual exploration schemes.

• Fix+Saccades. It was composed of both fixation and saccade features.

• Fix+Saccades-norm. Since variables related to eye movements are characterized by strong inter-subject variability, for each subject we computed and subtracted the mean of each variable to generate this set of features. This process was aimed at centering the variable distributions of each subject around zero, bringing different subjects onto a more comparable scale.

The F1-scores achieved in the tests are reported in Fig. 2. All classifier-feature pairs score significantly above the random baseline, and in the best cases above 0.8. These results corroborate the statistical analysis and the hypothesis that the attention level influences the gaze dynamics. Furthermore, these connections can be captured by an automatic classifier even at a small scale (i.e. with few training samples). Interestingly, a global trend is observed when considering the different sets of input features. Combining features about fixations and saccades improves the performance of each classifier compared with the case in which the feature sets are used separately. This result also connects the attentive load to different exploration strategies of the visual scene. As already said, backward saccades are connected to more complex types of reasoning, typical of System 2, driven by the need to re-analyze or re-sample already visited portions of the scene (Pollatsek et al., 1986; Murray & Kennedy, 1988). Moreover, a strong improvement was achieved by applying the subject-wise normalization. This confirms that the analyzed scenario is highly affected by personal behaviors, but we showed that these effects can be mitigated by the application of standard statistical techniques.
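The subject-wise normalization used for Fix+Saccades-norm can be sketched as follows: for each subject, the per-subject mean of every variable is subtracted, centering each subject's distributions around zero. The column names and values below are illustrative, not the study's data.

```python
# Sketch of subject-wise mean-centering (Fix+Saccades-norm).
# Column names and values are illustrative, not the study's data.
import pandas as pd

df = pd.DataFrame({
    "subject": [1, 1, 1, 1, 2, 2, 2, 2],
    "n_fixations": [50.0, 62.0, 48.0, 60.0, 30.0, 42.0, 28.0, 40.0],
    "avg_saccade_velocity": [210.0, 250.0, 200.0, 240.0, 150.0, 190.0, 140.0, 180.0],
})

feature_cols = ["n_fixations", "avg_saccade_velocity"]
# Subtract each subject's own mean from every feature column.
df[feature_cols] = df.groupby("subject")[feature_cols].transform(
    lambda col: col - col.mean()
)
# Each subject's features now have zero mean, so subjects with very
# different baselines become directly comparable.
```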
In general, we observe that the Random Forests achieved considerably worse performance with respect to the other algorithms, sometimes even close to the random baseline. This could be due to the fact that decision trees are not capable of extracting high-level correlations among variables, but above all to the fact that the random sub-sampling negatively emphasizes the inter-subject variability.
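The evaluation protocol described above (min-max scaling to [0, 1] inside a pipeline, 5-fold cross-validation, average F1-score per classifier) can be sketched with Scikit-learn as follows. The feature matrix here is synthetic, and the specific hyperparameters are assumptions rather than the study's settings.

```python
# Sketch of the evaluation protocol: four classifiers, min-max scaling,
# 5-fold CV, average F1. Data and hyperparameters are stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in for the real feature matrix: 128 trials, 20 eye-movement features.
X, y = make_classification(n_samples=128, n_features=20, random_state=0)

classifiers = {
    "Logistic": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    pipe = make_pipeline(MinMaxScaler(), clf)  # scale features to [0, 1]
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
```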
Conclusions

The experimental results support the hypothesis (1) that different attentive loads produce recurrent visual behaviors that can be characterized by a statistical analysis of variables related to eye fixations. Furthermore, these patterns can be modeled (2) by data-driven Machine Learning algorithms which are able to identify, with reasonable accuracy, the different conditions in which individuals are involved. We showed that situations of conflict between System 1 and System 2 are captured by the gaze data and the statistical variables analyzed. Combining features related to both fixations and saccades increases the accuracy of the classifiers. This suggests that subjects implement task-specific schemes to regulate their gaze dynamics across different tasks. We found the exploited normalization technique useful for addressing the wide inter-subject variability and improving the comparison among different individuals. However, these issues could be addressed more effectively by a large-scale data collection, to obtain more versatile Machine Learning models and more reliable results.

Future research directions could include the integration into the analysis of data related to the pupillary response, since it has already been proven to be connected to attentive and cognitive load (Klingner, 2010; Mathôt, 2018). This could help explain in more depth the connections between visual attention and eye movements, but also help develop more robust practical scenarios. Indeed, similar analyses turn out to be useful in applications such as monitoring the attentive state of drivers (Palinko, Kun, Shyrokov, & Heeman, 2010) or understanding truth-telling and deception (Wang, Spezio, & Camerer, 2010).
References
Bench, C., Frith, C., Grasby, P., Friston, K., Paulesu, E., Frackowiak, R., & Dolan, R. J. (1993). Investigations of the functional anatomy of attention using the Stroop test. Neuropsychologia, (9), 907–922.

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

Castelhano, M. S., Mack, M. L., & Henderson, J. M. (2009). Viewing task influences eye movement control during active scene perception. Journal of Vision, (3), 6–6.

Connor, C. E., Egeth, H. E., & Yantis, S. (2004). Visual attention: bottom-up versus top-down. Current Biology, (19), R850–R852.

Dalrymple-Alford, E., & Budayr, B. (1966). Examination of some aspects of the Stroop color-word test. Perceptual and Motor Skills, (3 suppl), 1211–1214.

Devlin, S. J., Gnanadesikan, R., & Kettenring, J. R. (1975). Robust estimation and outlier detection with correlation coefficients. Biometrika, (3), 531–545.

Fisher, R. A. (1992). Statistical methods for research workers. In Breakthroughs in Statistics (pp. 66–70). Springer.

Jensen, A. R. (1965). Scoring the Stroop test. Acta Psychologica, (5), 398–408.

Kahneman, D. (2011). Thinking, fast and slow. Macmillan.

Klingner, J. M. (2010). Measuring cognitive load during visual tasks by combining pupillometry and eye tracking. Unpublished doctoral dissertation, Stanford University, Palo Alto, CA.

Liversedge, S. P., & Findlay, J. M. (2000). Saccadic eye movements and cognition. Trends in Cognitive Sciences, (1), 6–14.

Majaranta, P., & Räihä, K.-J. (2002). Twenty years of eye typing: systems and design issues. In Proceedings of the 2002 Symposium on Eye Tracking Research & Applications (pp. 15–22).

Mathôt, S. (2018). Pupillometry: Psychology, physiology, and function. Journal of Cognition, (1).

McConkie, G. W., & Dyre, B. P. (2000). Eye fixation durations in reading: Models of frequency distributions. In Reading as a Perceptual Process (pp. 683–700). Elsevier.

McMains, S. A., & Kastner, S. (2009). Visual attention. Encyclopedia of Neuroscience, 4296–4302.

Megherbi, H., Elbro, C., Oakhill, J., Segui, J., & New, B. (2018). The emergence of automaticity in reading: Effects of orthographic depth and word decoding ability on an adjusted Stroop measure. Journal of Experimental Child Psychology, 652–663.

Mjolsness, E., & DeCoste, D. (2001). Machine learning for science: state of the art and future prospects. Science, (5537), 2051–2055.

Murray, W. S., & Kennedy, A. (1988). Spatial coding in the processing of anaphor by good and poor readers: Evidence from eye movement analyses. The Quarterly Journal of Experimental Psychology Section A, (4), 693–718.

Oquendo, M., Baca-Garcia, E., Artes-Rodriguez, A., Perez-Cruz, F., Galfalvy, H., Blasco-Fontecilla, H., … Duan, N. (2012). Machine learning and data mining: strategies for hypothesis generation. Molecular Psychiatry, (10), 956–959.

Palinko, O., Kun, A. L., Shyrokov, A., & Heeman, P. (2010). Estimating cognitive load using remote eye tracking in a driving simulator. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications (pp. 141–144).

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … others (2011). Scikit-learn: Machine learning in Python.