Assessing Levels of Attention using Low Cost Eye Tracking
Per Bækgaard*, Michael Kai Petersen, and Jakob Eg Larsen

Cognitive Systems, Department of Applied Mathematics and Computer Science
Technical University of Denmark, Building 321, DK-2800 Kgs. Lyngby, Denmark
{pgba,mkai,jaeg}@dtu.dk

Abstract.
The emergence of mobile eye trackers embedded in next generation smartphones or VR displays will make it possible to trace not only what objects we look at but also the level of attention in a given situation. Exploring whether we can quantify the engagement of a user interacting with a laptop, we apply mobile eye tracking in an in-depth study over 2 weeks with nearly 10,000 observations to assess pupil size changes related to the attentional aspects of alertness, orientation and conflict resolution. Visually presenting conflicting cues and targets, we hypothesize that it is feasible to measure the allocated effort when responding to confusing stimuli. Although such experiments are normally carried out in a lab, we are able to differentiate between sustained alertness and complex decision making even with low cost eye tracking "in the wild". From a quantified self perspective of individual behavioral adaptation, the correlations between pupil size and the task dependent reaction times and error rates may in the longer term provide a foundation for modifying smartphone content and interaction according to the user's perceived level of attention.
Keywords: Eye Tracking, Attention Network
This is an author-generated preprint. To be published in the HCI International 2016 Conference Proceedings. The final publication will be available at Springer via http://dx.doi.org/TODO
Low cost eye trackers which can be embedded in next generation smartphones will enable the design of cognitive interfaces that adapt to the user's perceived level of attention. Even when "in the wild", and no longer constrained to fixed lab setups, mobile eye tracking provides novel opportunities for continuous self-tracking of our ability to perform a variety of tasks across a number of different contexts.

* Acknowledgment: This work is supported in part by the Innovation Fund Denmark through the project Eye Tracking for Mobile Devices.

Interacting with a smartphone screen requires attention, which in turn involves different networks in the brain related to alertness, spatial orientation and conflict resolution [20]. These aspects can be separated by flanker-type experiments with differently cued, sometimes conflicting, prompts. Depending on whether the task involves fixating the eyes on an unexpected part of the screen, or resolving the direction of an arrow surrounded by distracting stimuli, different parts of the attention network will be activated, in turn resulting in varying reaction times [7].

The dilation and constriction of the pupil is not only triggered by changes in light and fixation but also reflects fluctuations in arousal networks in the brain [13], which from a quantified self perspective may enable us to assess whether we are sufficiently concentrated when interacting with the screens of smartphones or laptops while carrying out our daily tasks. Likewise, the pupil size increases when we face an unexpected uncertainty [1], physically apply force by flexing muscles, or motivationally have to decide whether the outcome of a task justifies the required effort [23]. Thus, when we perform specific actions, the cognitive load involved can be estimated using eye tracking.
The pupil dilates if the task requires a shift from sustained tonic alertness and orientation to more complex decision making, in turn triggering a phasic component caused by the release of norepinephrine neurotransmitters in the brain [2], [8], which may reflect both the increased energization as well as the unexpected uncertainty related to the task [1]. Whereas these results have typically been obtained under controlled lab conditions, in the present study we explore the feasibility of assessing a user's level of attention "in the wild" using mobile eye tracking.

This longitudinal study was performed repeatedly over the course of two weeks in September-October 2015. Two male right-handed subjects, A and B (of average age 56), each performed a session very similar to the Attention Network Test (ant) [7] approximately twice every weekday, resulting in 16 and 17 complete datasets respectively, totaling 9,504 individual reaction time tests. The experiment ran "in the wild" in typical office environments off a conventional MacBook Pro 13" (2013 model with Retina screen) with an Eye Tribe Eye Tracker connected to it. The ant used here is implemented in PsychoPy [18] and is available on github [4]. Simultaneously, eye tracking data is recorded at 60 Hz and timestamped for synchronization through the Eye Tracker API [21] via the PeyeTribe [3] interface.

Before the actual experimental procedure starts, a calibration of the Eye Tracker is performed. The experiment contains an initial trial run that the user may select to abort, after which 3 rounds of 2 · 48 conditioned reaction time tests follow (Fig. 1). Each test is conditioned on one of 3 targets (Incongruent, Neutral or Congruent) and one of 4 cues (No Cue, Center Cue, Double Cue or Spatial Cue). At the start of each test, a fixation cross appears, and after a random delay of 0.4−1.6s the user is presented with a cue (when present for the particular condition). 0.5s later the target appears, with either incongruent, neutral or congruent flankers. The user is instructed to hit a button on the left or right side of the keyboard with his left or right hand, depending on the direction of the central arrow of the target, which appears above or below the initially centered fixation cross. Half the targets appear above and half below the fixation cross, and left/right pointing central arrows also appear evenly distributed. The resulting reaction time, from target presentation to first registered keypress, is logged together with the conditions of the individual test, whether the user hit the correct left/right key or not, and a common timestamp. For further details on the ant please see [7]. Each test takes approximately 4s to perform. With 2 · 48 tests per round and 3 rounds per session, the 33 sessions total the 9,504 individual reaction time tests.

Fig. 1. The Attention Network Test procedure used here: every 4 seconds, a cue (one of 4 conditions, top left) precedes a target (one of 3 congruency conditions, top right), to which the participant responds by pressing a key according to the central arrow. The reaction time differences between cue and congruency conditions form the basis for calculating the latencies of the alertness, orientation and conflict resolution networks.

The reaction times for each experiment, for which the user responded correctly within 1.7s, form the basis of the attention network latencies:

t(alertness) = t(no cue) − t(double cue)
t(orientation) = t(center cue) − t(spatial cue)
t(conflict resolution) = t(incongruent) − t(congruent)

where t(cond) = (1/N) Σ_{i | i = cond} t_i is the mean reaction time over the N tests matching the given condition.

Linear pupil size and inter-pupil distance data can be somewhat "noisy" when recording in office conditions. After epoch'ing to the corresponding cue times of the individual tests, invalid/missing data from blink-affected periods are removed, and a Hampel [9] filter with a centered window is then applied to remove remaining outliers deviating more than a few σ from the local median. Data is then downsampled to 100ms resolution using a windowed averaging filter, and scaled proportionally to the value at epoch start (cue presentation), so that the resulting pupil dilations represent relative change vs the pupil size at cue presentation.
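The latency definitions above translate directly into code. The sketch below assumes a per-test log with hypothetical column and condition names ('cue', 'target', 'rt', 'correct'); the response limit default is likewise an assumption, not the study's actual file layout:

```python
import pandas as pd

def ant_latencies(df, rt_limit=1.7):
    """Derive the three attention network latencies from a log of individual
    tests. df needs columns 'cue', 'target', 'rt' (seconds) and 'correct';
    only correct responses faster than rt_limit are used."""
    ok = df[df['correct'] & (df['rt'] < rt_limit)]
    t_cue = ok.groupby('cue')['rt'].mean()     # mean RT per cue condition
    t_tgt = ok.groupby('target')['rt'].mean()  # mean RT per congruency condition
    return {
        'alertness':           t_cue['no_cue'] - t_cue['double_cue'],
        'orientation':         t_cue['center_cue'] - t_cue['spatial_cue'],
        'conflict_resolution': t_tgt['incongruent'] - t_tgt['congruent'],
    }
```

Grouping by condition and differencing the means mirrors the three subtraction formulas above; incorrect or too-slow responses are excluded before averaging.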
This last part was done to compensate for varying environmental luminosity changes and, to some degree, to offset any effect from immediately preceding reaction time tests and to compensate for accidental head position drift. Time-locked averaging is then done by grouping data from similar conditions within each experiment, from which the group-mean relative pupil dilations can be derived. The data received from the eye tracker is uncalibrated and cannot easily be referenced to a metric measurement.

At the same time, the inter-pupil distance is calculated, to ensure that pupil size changes are not the accidental result of moving the head slightly during the experiment. Additionally, a "baseline" experiment has been performed, recording eye tracking data in a condition where no action can be taken by the user and no arrow-heads are visible on the targets, but which is otherwise presented under similar conditions, in order to rule out that the recorded pupil dilations would be the result of (small) luminosity changes caused by the presented cues and targets, or of slightly changing accommodation between the focus points of the cue and the target. The inter-pupil distance variation was found to be significantly smaller than the pupil size variation (typically by an order of magnitude).

Table 1 shows the aggregate Overall Mean Reaction- and Attention Network timings for each subject A and B, with estimates of the variation over the weeks. The figures are not significantly different from what is found in [7]; the Mean rt reported here is slightly higher than the estimated 512ms in the reference, whereas the alertness, orientation and conflict resolution latencies are slightly lower than or similar to the 47ms, 51ms and 84ms reported there.

Table 1. Average Reaction- and Attention Network Times over all correctly replied experiments for the two week period for either subject (the variation over the period is given as the estimated ± Sample Standard Deviation of the aggregate values), in milliseconds.

Subject  Mean rt    Alert     Orient    Conflict
A        577 (±54)  27 (±21)  22 (±18)  85 (±…)
B        … (±55)    35 (±17)  49 (±15)  81 (±…)

There are, however, behavioural variations in reaction time throughout the weeks. Fig. 2 shows the variation of the derived ant timings throughout the experimental period, and the relative error rate for each experiment. The variations appear to be statistically significant, as can be estimated from the standard error of the mean (the shaded area), and may reflect underlying states of varying levels of attention, fatigue and motivation. To sum up the behavioral results, A shows a somewhat increasing trend in error rate related to the objective task performance, whereas B shows a diminishing difference between the three estimated measures of conflict resolution, spatial orientation and alertness reaction time.

Fig. 2. Attention Network Timing over all sessions in the two week period. Conflict Resolution (Red) is slower than Alertness (Green) and Orientation (Blue). A (Left) shows an increasing error rate trend (Solid); Conflict Resolution for B gradually approaches the other latencies. Both A and B have large variations over time, pointing to varying levels of attention, fatigue and motivation.
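The pupil preprocessing described earlier (blink removal, Hampel filtering, downsampling to 100ms and scaling relative to the value at cue presentation) can be sketched as follows. This is a minimal sketch assuming a pandas Series indexed by time since cue onset; the window length and threshold defaults are assumptions, not the study's actual parameters:

```python
import numpy as np
import pandas as pd

def hampel(series, half_window=5, n_sigmas=3.0):
    """Mark points deviating more than n_sigmas (MAD-scaled) from the
    rolling median as invalid (NaN)."""
    w = 2 * half_window + 1
    med = series.rolling(w, center=True, min_periods=1).median()
    mad = (series - med).abs().rolling(w, center=True, min_periods=1).median()
    sigma = 1.4826 * mad  # MAD -> standard deviation under normality
    out = series.copy()
    out[(series - med).abs() > n_sigmas * sigma] = np.nan
    return out

def preprocess_epoch(pupil):
    """pupil: Series of raw pupil sizes indexed by time since cue onset.
    Returns the relative change (%) vs the value at cue presentation,
    outlier-filtered and downsampled to 100 ms."""
    clean = hampel(pupil)
    down = clean.resample('100ms').mean()        # windowed averaging to 100 ms
    return 100.0 * (down / down.iloc[0] - 1.0)   # relative to epoch start
```

Blink-affected samples that are already NaN are simply ignored by the windowed averaging, and any remaining spikes are dropped by the Hampel step before the relative scaling.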
The group-mean relative linear pupil dilations for each of the 3 congruency conditions are illustrated in Fig. 3.

Fig. 3. Averaged left-eye pupil dilations for each session, coloured according to congruency (A (Left) and B). The all-session average is shown in bold, with the shaded area representing the standard error of the mean. The average incongruent (Red) pupil dilation is stronger than the others, indicating a higher cognitive load.

Pupil dilation responses are all epoch'ed to the cue (at time 0ms) and target presentation (time 500ms). A small and slow pupil dilation onset is seen shortly after cue presentation. Fig. 4 shows the relative variation of the left pupil size (Blue) vs the median value over a selected period that covers 48 reaction time tests, in this case for B, for two different experiments. Test-related pupil dilation responses, which occur every 4 seconds, are not immediately visible in this graph due to random noise and a relatively strong longer-periodic variation over 20-60 seconds. The Green curve shows the relative variation of the inter-pupil distance, with variations an order of magnitude smaller than the pupil size changes.
Fig. 4. Filtered pupil size plots; 48-test long sections of two experiments (B, left eye). The relative inter-pupil distance (Green) indicates stable eye-to-screen distances.
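As a sanity-check sketch (not the study's actual code), the head-stability argument can be expressed by comparing relative variations of pupil size and inter-pupil distance; the order-of-magnitude factor mirrors the difference reported above, and both function names are illustrative:

```python
import numpy as np

def relative_variation(x):
    """Per-sample deviation (%) from the series median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    return 100.0 * (x - med) / med

def head_position_stable(pupil, ipd, factor=10.0):
    """True if the inter-pupil distance varies at least `factor` times less
    than the pupil size, i.e. the recorded pupil changes are unlikely to be
    caused by head movements toward or away from the screen."""
    return factor * np.std(relative_variation(ipd)) <= np.std(relative_variation(pupil))
```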
Fig. 5 shows the area under the pupil dilation curve between 1.5−2.5s after cue (1.0−2.0s after target) for each experiment, serving as a very rough indicator of the relative cognitive load caused by the tests. From these, a δ(incon) value can also be calculated by subtracting the congruent value from the incongruent one. It is seen that both A and B have larger pupil dilation responses for the initial two experiments, after which the level is lower. For B it remains at lower levels, indicating a training effect. For A the pattern is less clear, with possibly an increased load towards the end of the two week period. A frequency domain analysis of the signal shows, however, a distinct peak at 0.25 Hz, corresponding to the tests repeating every 4 seconds.

Fig. 5. Area under left-eye pupil dilation curves [1.5s, 2.5s after cue] for each session. A (Left) and B show initial training effects; only A, however, shows an increasing trend in cognitive load for the remaining sessions.
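Both the area-under-curve load index and the frequency analysis are straightforward with numpy. In this sketch the integration window defaults mirror the 1.5−2.5s interval above, and the function names are illustrative rather than the study's own:

```python
import numpy as np

def dilation_area(rel, t, t0=1.5, t1=2.5):
    """Trapezoidal area under the relative dilation curve rel(t) for
    t0 <= t <= t1 (seconds after cue), a rough index of evoked load."""
    m = (t >= t0) & (t <= t1)
    y, x = rel[m], t[m]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the strongest non-DC spectral component; a test
    sequence repeating every 4 s should show up as a 0.25 Hz peak."""
    spec = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(freqs[np.argmax(spec)])
```

The δ(incon) value mentioned above would then simply be the difference between `dilation_area` applied to the incongruent and to the congruent group-mean curves.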
In order to verify how well pupil dilations allow predicting the class of congruency condition, the 3 within-experiment averaged pupil dilation responses from each subject were ordered in each of the 6 possible permutations of the 3 congruency conditions. A neural-network type classifier was then trained to identify which of the 3 averaged pupil dilations was the incongruent one.
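The paper does not detail the network architecture, so the following is one possible reading of the permutation set-up, sketched with scikit-learn; `make_dataset`, the layer size and the curve layout are illustrative assumptions:

```python
import numpy as np
from itertools import permutations
from sklearn.neural_network import MLPClassifier

def make_dataset(incon, neu, con):
    """Each row of incon/neu/con is one block-averaged dilation curve.
    Every block triple is expanded into all 6 orderings; the label is the
    position (0-2) of the incongruent curve in that ordering."""
    X, y = [], []
    for i in range(len(incon)):
        curves = {'i': incon[i], 'n': neu[i], 'c': con[i]}
        for order in permutations('inc'):
            X.append(np.concatenate([curves[k] for k in order]))
            y.append(order.index('i'))
    return np.array(X), np.array(y)

# a small neural-network type classifier, as described in the text
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
```

Fitting on one subset of blocks and scoring on held-out blocks then gives a test error rate to compare against the 2/3 chance level of guessing the incongruent position.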
Fig. 6. Test error rates vs. the number of trials in each averaged block, for subjects A and B, with the chance level indicated.

Fig. 6 shows the resulting test error rate vs. the number of averaged experimental tests, dividing the 96 equal-condition responses of each experiment into groups of 96, 48, 32 or 24 tests, and using a test/train split of 0.9/0.1. The performance is clearly above chance level (66.7% error rate).

Correlating response times and pupil reactions

Table 2 shows the Pearson Correlation Coefficients for all combinations of Attention Network- and Reaction-Times, Pupil Dilation metrics and Time-of-Day for each subject, as they vary over the two week period. As the data sets are small (16 and 17 sets), caution is needed when judging the significance levels (p-values).
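Such a correlation table, with a p-value accompanying each coefficient, can be sketched with scipy as follows; the metric names in the usage are placeholders for the per-session values listed above:

```python
import numpy as np
from scipy import stats

def correlation_table(metrics):
    """metrics: dict mapping a metric name to its array of per-session values.
    Returns {(a, b): (pearson_r, p_value)} for every metric pair, so that
    significance can be judged alongside each coefficient."""
    names = list(metrics)
    table = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r, p = stats.pearsonr(metrics[a], metrics[b])
            table[(a, b)] = (float(r), float(p))
    return table
```

With only 16 or 17 sessions per subject, the p-values returned here are exactly why the caution noted above is needed: even sizeable coefficients can fail to reach significance at such sample sizes.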
Table 2. Pearson correlation coefficients between key metrics for A (Top) and B (Bottom). A shows a negative correlation between mean reaction time and error rate ("speed-accuracy tradeoff"). B (as opposed to A) shows a correlation between pupil dilations and error rate, possibly indicating a different response to varying levels of fatigue or motivation; additionally, alertness (and partly orientation) may inversely correlate with pupil dilations. Both show the expected correlations between pupil dilation metrics. Rows and columns comprise the Attention Network- and Reaction-Times (Alert, Orient, Conflict, µ(RT)), the Pupil Dilation metrics (Incon, Neutral, Con, δ(Incon)), Time-of-Day (ToD) and Errors; significance levels are marked ∗, † and ‡.

With some variation between subjects, pupil dilation responses appear correlated. Subject A shows a correlation between orientation and conflict resolution timings, which is not seen at all for B. A may also have some correlation between mean reaction time and orientation resp. conflict resolution timings, which again are not quite as present for B. Subject B shows correlations between alertness timing and the incongruent, neutral and δ(incon) pupil dilations, as well as a correlation between orientation timing and congruent pupil dilations; these are not present for A. Also, there are indications of a correlation between time of day and mean reaction time; the experiments for B were spread out over larger parts of the day than for A, which might explain why this is not seen for A. [7] reported correlations between the conflict resolution timing and the mean reaction time over a large group of people. As such, the conditions are not similar to the within-person variation, but it is worth pointing out that a similar correlation is partly present for A and cannot be ruled out for B.

Using low cost portable eye tracking to measure the variations in pupil size, we were able to differentiate and predict whether users were engaged in more complex decision making or merely maintaining a general alertness when interacting with a laptop, over nearly 10,000 tests. A parallel single-experiment study [5], repeating the experimental setup with nearly 10,000 additional tests over 18 more subjects, has confirmed that similar significant pupil response differences characterize the contrasts between incongruent versus neutral or congruent task conditions.

In the present study, we found a significant difference based on the left-eye pupil size for the conflict resolution task in contrast to the attentional network components of alertness and re-orientation, but not between these two latter tasks. These results may reflect findings in other studies indicating that the phasic component in attention is predominantly triggered by tasks requiring a decision, whereas tonic alertness may suffice for solving less demanding tasks like responding to visual cues or re-orienting attention to an unexpected part of the screen [2], as seen in the "baseline" experiment, where no decision needs to be made and no motor cortex activation takes place.

From a quantified self perspective of individual behaviour, using mobile eye tracking to assess levels of engagement, the relations between pupil size (a possible quantification of the cognitive load) and error rate/reaction time (a quantification of the objective task performance) indicate individual differences in the subjects' behavioural adaptation to the attentional tasks. A is apparently coping with the cognitive load by trading off speed and accuracy to optimize performance, as indicated by the lack of correlation between pupil size and either of the performance related measures. However, for B the correlation between pupil size and accuracy may suggest a behaviour characterized by applying more effort to the task when the number of errors increases.

As we have in this study only used the pupil size as a measure of attention, without considering the spatial density of fixations or the speed of saccadic eye movements that could entail further information, we suggest that mobile eye tracking may not only enable us to assess the effort required when undertaking a variety of tasks in an everyday context, but could in the longer term also provide a foundation for continuously adapting the content and interaction of smartphones and laptops based on our perceived level of attention.
References

[1] Ang, Y.S., Manohar, S., Apps, M.A.J.: Commentary: Noradrenaline and Dopamine Neurons in the Reward/Effort Trade-off: A Direct Electrophysiological Comparison in Behaving Monkeys. Frontiers in Behavioral Neuroscience 9(November), 310 (2015)
[2] Aston-Jones, G., Cohen, J.D.: An Integrative Theory of Locus Coeruleus-Norepinephrine Function: Adaptive Gain and Optimal Performance. Annual Review of Neuroscience 28(1), 403–450 (2005)
[3] Bækgaard, P.: Simple python interface to the Eye Tribe eye tracker (2015), https://github.com/baekgaard/peyetribe/
[4] Bækgaard, P.: Attention Network Test implemented in PsychoPy (2016), https://github.com/baekgaard/ant
[5] Bækgaard, P., Petersen, M.K., Larsen, J.E.: Differentiating attentional network components using mobile eye tracking. In preparation (2016)
[6] Beatty, J.: Task-evoked pupillary responses, processing load, and the structure of processing resources (1982)
[7] Fan, J., McCandliss, B.D., Sommer, T., Raz, A., Posner, M.I.: Testing the Efficiency and Independence of Attentional Networks. Journal of Cognitive Neuroscience 14(3), 340–347 (2002)
[8] Gabay, S., Pertzov, Y., Henik, A.: Orienting of attention, pupil size, and the norepinephrine system. Attention, Perception & Psychophysics 73(1), 123–129 (2011)
[9] Hampel, F.R.: The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association 69(346), 383–393 (1974)
[10] Holmqvist, K.: Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford University Press (2011)
[11] Hunter, J.D.: Matplotlib: A 2D graphics environment. Computing in Science and Engineering 9(3), 99–104 (2007)
[12] Hyönä, J., Tommola, J., Alaja, A.M.: Pupil Dilation as a Measure of Processing Load in Simultaneous Interpretation and Other Language Tasks. The Quarterly Journal of Experimental Psychology Section A 48(3), 598–612 (1995)
[13] Joshi, S., Li, Y., Kalwani, R.M., Gold, J.I.: Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex. Neuron 89(1), 221–234 (2016), http://dx.doi.org/10.1016/j.neuron.2015.11.028
[14] Laeng, B., Ørbo, M., Holmlund, T., Miozzo, M.: Pupillary Stroop effects. Cognitive Processing 12(1), 13–21 (2011)
[15] McKinney, W.: Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, 51–56 (2010), http://conference.scipy.org/proceedings/scipy2010/mckinney.html
[16] Oliphant, T.E.: SciPy: Open source scientific tools for Python. Computing in Science and Engineering 9, 10–20 (2007)
[17] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É.: Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2012), http://arxiv.org/abs/1201.0490
[18] Peirce, J.W.: PsychoPy - Psychophysics software in Python. Journal of Neuroscience Methods 162(1-2), 8–13 (2007), http://dx.doi.org/10.1016/j.jneumeth.2006.11.017
[19] Pérez, F., Granger, B.E.: IPython: a System for Interactive Scientific Computing. Computing in Science and Engineering 9(3), 21–29 (2007), http://ipython.org
[20] Posner, M.I.: Attentional networks and consciousness. Frontiers in Psychology 3(MAR), 1–4 (2012)
[21] The Eye Tribe: The Eye Tribe API Reference, http://dev.theeyetribe.com/api/
[22] Van Der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: A structure for efficient numerical computation. Computing in Science and Engineering 13(2), 22–30 (2011)
[23] Varazzani, C., San-Galli, A., Gilardeau, S., Bouret, S.: Noradrenaline and Dopamine Neurons in the Reward/Effort Trade-Off: A Direct Electrophysiological Comparison in Behaving Monkeys. Journal of Neuroscience 35(20), 7866–7877 (2015)