End-to-End Models for the Analysis of System 1 and System 2 Interactions based on Eye-Tracking Data
Alessandro Rossi, Sara Ermini, Dario Bernabini, Dario Zanca, Marino Todisco, Alessandro Genovese, Antonio Rizzo
DISPOC, University of Siena; DSMCN, University of Siena; AIDILAB, Siena, Italy
Abstract
While theories postulating a dual cognitive system take hold, quantitative confirmations are still needed to understand and identify interactions or conflict events between the two systems. Eye movements are among the most direct markers of individual attentive load and may serve as an important proxy of information. In this work we propose a computational method, within a modified visual version of the well-known Stroop test, for the identification of different tasks and potential conflict events between the two systems through the collection and processing of data related to eye movements. A statistical analysis shows that the selected variables can characterize the variation of attentive load within different scenarios. Moreover, we show that Machine Learning techniques allow us to distinguish between different tasks with good classification accuracy and to investigate the gaze dynamics in more depth.
Keywords:
System 1 and System 2, Eye-tracking, Data Analysis, Machine Learning, Classification, Stroop test
Introduction

Viewing is a complex activity, involving cognitive aspects both conscious and unconscious. It manifests itself through motor behavior that acquires salient information in the form of light radiation. When observing static images, this attentive activity exhibits rapid eye movements called saccades, occurring between the so-called fixations. During fixations, the eye remains still and information is sampled. It is well known that the cognitive load of individual tasks may influence eye movement statistics (McMains & Kastner, 2009; Connor, Egeth, & Yantis, 2004; Mathôt, 2018), and in particular variables such as average fixation duration, saccade length or saccade velocity, among others. For this reason, it seems reasonable to define techniques based on eye-tracking data in order to recognize recurring patterns related to visual attention and to identify the task that the subject is performing (Klingner, 2010; Zagermann, Pfeil, & Reiterer, 2016). Indeed, it has already been observed that the variation of attentive load within different tasks affects eye movements (Castelhano, Mack, & Henderson, 2009; Tanaka, Inuzuka, & Hirayama, 2019).

In this work we analyzed the viewing behaviour of subjects involved in a Stroop test (Stroop, 1935) while performing two different visual tasks, naming and reading, in order to explore possible effects on human attention. The exploratory patterns are expressed through variables related to eye fixations and saccadic movements, since both are influenced by processing difficulty (Pollatsek, Rayner, & Balota, 1986). Each task requires a different attention load from the subject: reading is performed as a fast and automatic process, while naming the color of a word written in an ink color mismatching its semantics is a slow, conscious activity (Kahneman, 2011).
The delay in naming the colors of incompatible words has been described as a conflict between System 1 and System 2 (Kahneman, 2011). This phenomenon is well known in experimental psychology and several methods have been developed to test and measure it (Jensen, 1965; Dalrymple-Alford & Budayr, 1966; Bench et al., 1993). To this aim, we set up a visual version of the Stroop test during which we recorded the eye movements of 64 subjects, following the experimental protocol defined in (Megherbi, Elbro, Oakhill, Segui, & New, 2018). The experiment involved two different tasks, defined as Naming and Reading, and two conditions, defined as "With Interference" and "Without Interference".

The goals are (1) to verify the presence of recurrent visual behavioural patterns for different tasks and conditions through a statistical data analysis, and (2) to generate automatic models which are able to identify the task or condition in which the subject is currently involved.

The paper is organized as follows. The section "Method" describes the experimental protocol set up for stimulus presentation and data collection. In the section "Experiments" we provide a detailed description of the data pre-processing, the Machine Learning techniques and the metrics used to evaluate the results. Finally, in the "Conclusions" we discuss the results and suggest directions for future work.

Method

We set up a visual Stroop test to record the eye movements of 64 Italian subjects (32 females and 32 males, average age = 30.2 ±) under four conditions:

• Reading Without Interference (RWoI) - Participants had to read the words on screen. The words "ROSSO" ("red"), "GIALLO" ("yellow"), "VERDE" ("green") and "BLU" ("blue") were all colored black.

• Reading With Interference (RWI) - Participants had to read the words on screen. The words "ROSSO", "GIALLO", "VERDE" and "BLU" were coloured red, yellow, green and blue, with a mismatch between the shade used and the meaning of the word (e.g. "ROSSO" was never coloured red).
• Naming Without Interference (NWoI) - Participants had to name the color of the words on screen. In this case, the Latin letters were replaced by pseudo-letters constructed to match the physical properties of real letters (height, number of pixels, and contiguous pixels) by reconfiguring their original characteristics (Megherbi et al., 2018). The pseudo-words were colored red, green, yellow and blue.

• Naming With Interference (NWI) - Participants had to name the color of the words on screen. The composition of the screen followed the same principles used for the construction of the RWI condition.

The presentation order of the conditions was randomized, balancing the set of possibilities among participants. Eye movements were recorded by an EyeLink Portable Duo set to a 500 Hz sampling rate in Head-Free mode. Images were presented on a 17-inch display (1920 x 1080 pixels), placed perpendicularly in front of the participant at a distance from the eyes ranging from 46 to 52 cm. The experiment was executed individually only once, and subjects unable to correctly read words and instructions on the screen were discarded. The light in the room and the external noise were controlled by the experimenter. Written instructions were presented to the participants, followed by an oral explanation. The experiment was preceded by an initial unrecorded trial and a standard 5-point calibration. Between the instructions and the stimulus screens, a white screen containing a circular trigger located at the top-left corner of the task image was presented. Each trial began when the participant fixated the trigger for at least 100 ms and was completed when the participant pressed the space key. During the execution, the experimenter annotated in an Excel spreadsheet any relevant information regarding the experience of the subject and possible technical issues (e.g. if the first calibration failed).

Experiments

Variables about eye movements were extracted by the software released with the eye-tracker device.
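The balanced randomization of condition order described above can be sketched as follows. This is a hypothetical illustration, not the authors' materials: every permutation of the four conditions is cycled through so that the possible orders stay as evenly distributed as possible across the 64 participants.

```python
# Illustrative sketch of balanced condition-order randomization.
# Names and the cycling strategy are assumptions, not the original protocol.
from itertools import permutations, cycle

CONDITIONS = ("RWoI", "RWI", "NWoI", "NWI")

def balanced_orders(n_participants):
    """Assign each participant one of the 24 possible condition orders,
    cycling through them so that counts stay as equal as possible."""
    all_orders = cycle(sorted(permutations(CONDITIONS)))
    return [next(all_orders) for _ in range(n_participants)]

orders = balanced_orders(64)
# With 64 participants and 24 possible orders, each order is used 2 or 3 times.
```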
However, we found that a further cleaning process was necessary to improve the data quality. Fixations falling outside the areas of interest were discarded, since they could be due either to an instrumental artifact or to subject activity not related to the task.

Table 1: Significance p-values generated by the one-way ANOVA on fixation variables when comparing three pairs of tasks: NWoI vs. RWoI, NWI vs. NWoI and RWI vs. RWoI.

                          NWoI vs. RWoI   NWI vs. NWoI   RWI vs. RWoI
Number of fixations            0.285402       0.000002       0.088730
Average fixation length        0.000052       0.027428       0.015745
Maximum fixation length        0.000606       0.035081       0.008078
Horizontal regressions         0.079062       0.000005       0.144568
Vertical regressions           0.632652       0.000001       0.214819

For each subject and stimulus (NWI, NWoI, RWI, RWoI), a set of statistical features related to eye movements was extracted:

• Number of fixations: total number of fixations.

• Average fixation length: the average duration among all the fixations.

• Maximum fixation length: the maximum duration among all the fixations.

• Horizontal/Vertical regressions: the number of times the eyes step backward in their horizontal/vertical path (assumed left to right and top to bottom, respectively), excluding changes of line in the horizontal count.

• Up/Down/Left/Right frequency: the count of saccadic movements in each direction, normalized by the total number of saccades.

• Minimum/Average/Maximum saccade duration: statistics about the duration of each saccade.

• Minimum/Average/Maximum saccade velocity: statistics about the estimated velocity of each saccade.

• Minimum/Average/Maximum saccade amplitude: statistics about the amplitude of each saccade (in degrees of visual angle).

• Minimum/Average/Maximum saccade distance: statistics about the distance of each saccade (in degrees of visual angle).
• Minimum/Average/Maximum saccade slope: statistics about the slope of each saccade with respect to the horizontal axis.

A first statistical hypothesis test was performed with a standard one-way ANOVA (Fisher, 1992). The purpose was to assess the representativeness of the variables within different tasks and conditions. For each variable we analyzed the differences in the generated statistical distributions when comparing two different conditions, in order to verify the confidence that the considered measurement is sampled from two different populations in the compared groups. The test was repeated for three comparisons: NWoI vs. RWoI, NWI vs. NWoI and RWI vs. RWoI. Since the difference between the Naming and Reading tasks is proven in the literature (Kahneman, 2011), the first test serves as a sort of control for the whole experimental setting. However, we believe that exploring connections with visual attention and eye movements could be of interest too. The second and third comparisons are the focus of the present work, aimed at assessing whether the different attention levels required to perform two different tasks can be captured by variables related to eye movements. In Table 1 we report the significance values p obtained in each comparison involving the fixation variables. As can be seen, the differences between the Naming and Reading tasks are well represented by statistics about the duration of fixations (both average and maximum). In the second test, the effect of interference in Naming is strongly expressed by the number of fixations and the eye regressions on both axes.
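As a concrete illustration of the fixation-based features listed above, a minimal sketch (not the authors' code) of how they could be computed from a sequence of fixations follows. Each fixation is assumed to be a tuple (x, y, duration_ms), and the 50-pixel line-height threshold used to exclude line changes is an assumption.

```python
# Minimal sketch of computing the fixation features described in the text.
# The (x, y, duration_ms) layout and the line-height threshold are assumptions.

def fixation_features(fixations):
    """Return fixation statistics: counts, durations, and regressions."""
    durations = [d for _, _, d in fixations]
    xs = [x for x, _, _ in fixations]
    ys = [y for _, y, _ in fixations]

    # Horizontal regression: gaze steps backward (right to left), excluding
    # line changes, approximated here as large downward jumps (> 50 px).
    horizontal_regressions = sum(
        1 for i in range(1, len(xs))
        if xs[i] < xs[i - 1] and ys[i] - ys[i - 1] < 50
    )
    # Vertical regression: gaze steps upward against the top-to-bottom path.
    vertical_regressions = sum(
        1 for i in range(1, len(ys)) if ys[i] < ys[i - 1]
    )
    return {
        "n_fixations": len(fixations),
        "avg_fixation_length": sum(durations) / len(durations),
        "max_fixation_length": max(durations),
        "horizontal_regressions": horizontal_regressions,
        "vertical_regressions": vertical_regressions,
    }

feats = fixation_features([(100, 200, 180), (300, 200, 250), (150, 200, 400)])
# The third fixation moves left on the same line: one horizontal regression.
```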
These results appear to be in agreement with the literature, since saccadic regressions are found to be more frequent and larger when the reader encounters difficulties (Pollatsek et al., 1986; Murray & Kennedy, 1988). The results of these first two tests were also confirmed by general interviews with the subjects, in which they reported perceiving the Naming task as highly counterintuitive, especially in the presence of interference. On the other hand, they confirmed perceiving the Reading task as more trivial, with little additional difficulty introduced by interference. This perception is also in agreement with our results, since the p-values for RWI vs. RWoI are in general higher with respect to the other tests. Nevertheless, we observed that variables related to saccadic movements produce a lower level of significance when considered alone. Indeed, the minimum significance value p = .005 was achieved by the Average Saccade Duration when comparing NWI vs. NWoI, and only in a few other cases did we obtain a comparably low p-value. Even though we analyze fixations, their positions and, hence, possible regressions are directly related to saccadic movements.
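The per-variable comparison above can be reproduced with a standard one-way ANOVA; with two groups this is equivalent to a two-sided t-test. The arrays below are synthetic stand-ins for the per-subject values of a single variable (e.g. number of fixations) under two conditions, not the study's data.

```python
# One-way ANOVA comparing one eye-movement variable across two conditions.
# The samples are synthetic stand-ins, not the study's data.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
nwi = rng.normal(loc=60, scale=8, size=64)   # e.g. NWI: more fixations
nwoi = rng.normal(loc=45, scale=8, size=64)  # e.g. NWoI

f_stat, p_value = f_oneway(nwi, nwoi)
# A small p-value indicates the two condition means differ significantly.
```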
[Figure 2: F1-scores achieved by each classifier on the four feature sets: Fix (features related to fixations), Saccades (features related to saccades), Fix+Saccades (features related to fixations and saccades), Fix+Saccades-norm (features related to fixations and saccades with subject-wise normalization). The dashed red line indicates the score of the random baseline.]

We repeated the same tests investigated in the statistical analysis by setting up three separate binary classification tests: NWoI vs. RWoI, NWI vs. NWoI and RWI vs. RWoI. We avoided a global 4-class test since the dynamics of the tasks are too complex to be modeled with such a small number of samples. We exploited the Scikit-learn (Pedregosa et al., 2011) Python software package to test four different classifiers (Bishop, 2006):

i. Logistic Regression (Logistic) is a statistical model that in its basic form uses a logistic function, applied to a weighted average of the input features, to model a binary dependent variable (the model prediction);

ii. Support Vector Machines (SVM) are supervised learning models used for binary classification. SVMs can learn non-linear separation surfaces by means of the so-called kernel trick, implicitly mapping their inputs into a high-dimensional feature space;

iii. Random Forests (RF) are an ensemble of decision trees based on bootstrapping. Different models are trained on subsets of samples and the final decision is taken by majority voting;

iv.
Artificial Neural Networks (ANN) are a well-known class of learning algorithms inspired by biological neural networks; they are based on a collection of units or nodes, called artificial neurons, connected by edges which represent the flow of information; the edges are in fact numbers and represent the parameters of the model, typically learned by back-propagation of an error signal with respect to the target.

Our goal is to demonstrate that interactions between the two cognitive systems, System 1 and System 2, affect the gaze dynamics and can be detected by machine learning algorithms based on eye-related features. In particular, we aim to show that classification performance is consistently better than a random baseline, in order to prove that the populations are intrinsically distinct. We performed a 5-fold Cross-Validation for each classifier and computed the average of the achieved F1-scores. This should guarantee that the results do not depend on the choice of the test set, even if the relatively high variability observed depends on the small size of the test set (a single misclassified sample heavily affects the results).

To improve the performance of the Machine Learning algorithms, features were normalized in [0, 1] (we found this method to slightly outperform z-normalization in our case). In addition, to investigate in more depth the information expressed by the features, we generated four sets of variables that we tested independently.

• Fix.
It was composed of the variables extracted from fixations, after filtering out fixations shorter than 200 ms.

• Saccades. It was composed of the variables extracted from saccades, aimed at capturing gaze dynamics and visual exploration schemes.

• Fix+Saccades. It was composed of both fixation and saccade features.

• Fix+Saccades-norm. Since variables related to eye movements are characterized by strong inter-subject variability, for each subject we computed and subtracted the mean of each variable to generate this set of features. This process was aimed at centering the variable distributions of each subject around zero, bringing different subjects onto a more comparable scale.

The F1-scores achieved in the tests are reported in Fig. 2. All classifier-feature pairs score significantly above the random baseline, and in the best cases above 0.8. These results corroborate the statistical analysis and the hypothesis that the attention level influences the gaze dynamics. Furthermore, these connections can be captured by an automatic classifier even at a small scale (i.e. with few training samples). Interestingly, a global trend is observed when considering the different sets of input features. Combining features about fixations and saccades improves the performance of each classifier compared with the case in which the feature sets are used separately. This result also connects the attentive load to different exploration strategies of the visual scene. As already said, backward saccades are connected to more complex types of reasoning, typical of System 2, driven by the need to re-analyze or re-sample already visited portions of the scene (Pollatsek et al., 1986; Murray & Kennedy, 1988). Moreover, a strong improvement was achieved by applying the subject-wise normalization. This confirms that the analyzed scenario is highly affected by personal behaviors, but we showed that these effects can be mitigated by the application of standard statistical techniques.
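The subject-wise normalization used for Fix+Saccades-norm can be sketched as follows: for each subject, the per-subject mean of every variable is subtracted, centering each subject's distributions around zero. The column names and values below are illustrative, not the study's data.

```python
# Sketch of subject-wise mean-centering (Fix+Saccades-norm).
# Column names and values are illustrative, not the study's data.
import pandas as pd

df = pd.DataFrame({
    "subject": [1, 1, 1, 1, 2, 2, 2, 2],
    "n_fixations": [50.0, 62.0, 48.0, 60.0, 30.0, 42.0, 28.0, 40.0],
    "avg_saccade_velocity": [210.0, 250.0, 200.0, 240.0, 150.0, 190.0, 140.0, 180.0],
})

feature_cols = ["n_fixations", "avg_saccade_velocity"]
# Subtract each subject's own mean from every feature column.
df[feature_cols] = df.groupby("subject")[feature_cols].transform(
    lambda col: col - col.mean()
)
# Each subject's features now have zero mean, so subjects with very
# different baselines become directly comparable.
```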
In general, we observe that the Random Forests achieved considerably worse performance with respect to the other algorithms, sometimes even close to the random baseline. This could be due to the fact that decision trees are not capable of extracting high-level correlations among variables, but above all to the fact that the random sub-sampling negatively emphasizes the inter-subject variability.
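The evaluation protocol described above (min-max scaling to [0, 1] inside a pipeline, 5-fold cross-validation, average F1-score per classifier) can be sketched with Scikit-learn as follows. The feature matrix here is synthetic, and the specific hyperparameters are assumptions rather than the study's settings.

```python
# Sketch of the evaluation protocol: four classifiers, min-max scaling,
# 5-fold CV, average F1. Data and hyperparameters are stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in for the real feature matrix: 128 trials, 20 eye-movement features.
X, y = make_classification(n_samples=128, n_features=20, random_state=0)

classifiers = {
    "Logistic": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    pipe = make_pipeline(MinMaxScaler(), clf)  # scale features to [0, 1]
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
```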
Conclusions

The experimental results support the hypothesis (1) that different attentive loads produce recurrent visual behaviors that can be characterized by a statistical analysis of variables related to eye fixations. Furthermore, these patterns can be modeled (2) by data-driven Machine Learning algorithms which are able to identify, with reasonable accuracy, the different conditions in which individuals are involved. We showed that situations of conflict between System 1 and System 2 are captured by the gaze data and the statistical variables analyzed. Combining features related to both fixations and saccades increases the accuracy of the classifiers. This suggests that subjects implement task-specific schemes to regulate their gaze dynamics across different tasks. We found the exploited normalization technique useful for addressing the wide inter-subject variability and improving the comparison among different individuals. However, these issues could be addressed more effectively by a large-scale data collection, to obtain more versatile Machine Learning models and more reliable results.

Future research directions could include the integration into the analysis of data related to the pupillary response, since it has already been proven to be connected to attentive and cognitive load (Klingner, 2010; Mathôt, 2018). This could help explain in more depth the connections between visual attention and eye movements, but also help develop more robust practical scenarios. Indeed, similar analyses turn out to be useful in applications such as monitoring the attentive state of drivers (Palinko, Kun, Shyrokov, & Heeman, 2010) or understanding truth-telling and deception (Wang, Spezio, & Camerer, 2010).
References
Bench, C., Frith, C., Grasby, P., Friston, K., Paulesu, E., Frackowiak, R., & Dolan, R. J. (1993). Investigations of the functional anatomy of attention using the Stroop test. Neuropsychologia, (9), 907–922.

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

Castelhano, M. S., Mack, M. L., & Henderson, J. M. (2009). Viewing task influences eye movement control during active scene perception. Journal of Vision, (3), 6–6.

Connor, C. E., Egeth, H. E., & Yantis, S. (2004). Visual attention: bottom-up versus top-down. Current Biology, (19), R850–R852.

Dalrymple-Alford, E., & Budayr, B. (1966). Examination of some aspects of the Stroop color-word test. Perceptual and Motor Skills, (3 suppl), 1211–1214.

Devlin, S. J., Gnanadesikan, R., & Kettenring, J. R. (1975). Robust estimation and outlier detection with correlation coefficients. Biometrika, (3), 531–545.

Fisher, R. A. (1992). Statistical methods for research workers. In Breakthroughs in Statistics (pp. 66–70). Springer.

Jensen, A. R. (1965). Scoring the Stroop test. Acta Psychologica, (5), 398–408.

Kahneman, D. (2011). Thinking, fast and slow. Macmillan.

Klingner, J. M. (2010). Measuring cognitive load during visual tasks by combining pupillometry and eye tracking. Unpublished doctoral dissertation, Stanford University, Palo Alto, CA.

Liversedge, S. P., & Findlay, J. M. (2000). Saccadic eye movements and cognition. Trends in Cognitive Sciences, (1), 6–14.

Majaranta, P., & Räihä, K.-J. (2002). Twenty years of eye typing: systems and design issues. In Proceedings of the 2002 Symposium on Eye Tracking Research & Applications (pp. 15–22).

Mathôt, S. (2018). Pupillometry: Psychology, physiology, and function. Journal of Cognition, (1).

McConkie, G. W., & Dyre, B. P. (2000). Eye fixation durations in reading: Models of frequency distributions. In Reading as a Perceptual Process (pp. 683–700). Elsevier.

McMains, S. A., & Kastner, S. (2009). Visual attention. Encyclopedia of Neuroscience, 4296–4302.

Megherbi, H., Elbro, C., Oakhill, J., Segui, J., & New, B. (2018). The emergence of automaticity in reading: Effects of orthographic depth and word decoding ability on an adjusted Stroop measure. Journal of Experimental Child Psychology, 652–663.

Mjolsness, E., & DeCoste, D. (2001). Machine learning for science: state of the art and future prospects. Science, (5537), 2051–2055.

Murray, W. S., & Kennedy, A. (1988). Spatial coding in the processing of anaphor by good and poor readers: Evidence from eye movement analyses. The Quarterly Journal of Experimental Psychology Section A, (4), 693–718.

Oquendo, M., Baca-Garcia, E., Artes-Rodriguez, A., Perez-Cruz, F., Galfalvy, H., Blasco-Fontecilla, H., … Duan, N. (2012). Machine learning and data mining: strategies for hypothesis generation. Molecular Psychiatry, (10), 956–959.

Palinko, O., Kun, A. L., Shyrokov, A., & Heeman, P. (2010). Estimating cognitive load using remote eye tracking in a driving simulator. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications (pp. 141–144).

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … others (2011). Scikit-learn: Machine learning in Python.