Assessing Levels of Attention using Low Cost Eye Tracking
Per Bækgaard*, Michael Kai Petersen, and Jakob Eg Larsen

Cognitive Systems, Department of Applied Mathematics and Computer Science
Technical University of Denmark, Building 321, DK-2800 Kgs. Lyngby, Denmark
{pgba,mkai,jaeg}@dtu.dk

Abstract.
The emergence of mobile eye trackers embedded in next generation smartphones or VR displays will make it possible to trace not only what objects we look at but also the level of attention in a given situation. Exploring whether we can quantify the engagement of a user interacting with a laptop, we apply mobile eye tracking in an in-depth study over 2 weeks with nearly 10,000 observations to assess pupil size changes related to the attentional aspects of alertness, orientation and conflict resolution. Visually presenting conflicting cues and targets, we hypothesize that it is feasible to measure the allocated effort when responding to confusing stimuli. Although such experiments are normally carried out in a lab, we are able to differentiate between sustained alertness and complex decision making even with low cost eye tracking "in the wild". From a quantified self perspective of individual behavioral adaptation, the correlations between pupil size and the task dependent reaction times and error rates may in the longer term provide a foundation for modifying smartphone content and interaction according to the user's perceived level of attention.
Keywords: Eye Tracking, Attention Network
This is an author-generated preprint. To be published in the HCI International 2016 Conference Proceedings. The final publication will be available at Springer via http://dx.doi.org/TODO
Low cost eye trackers which can be embedded in next generation smartphones will enable the design of cognitive interfaces that adapt to the user's perceived level of attention. Even when "in the wild", and no longer constrained to fixed lab setups, mobile eye tracking provides novel opportunities for continuous self-tracking of our ability to perform a variety of tasks across a number of different contexts.

* Acknowledgment: This work is supported in part by the Innovation Fund Denmark through the project Eye Tracking for Mobile Devices.

Interacting with a smartphone screen requires attention, which in turn involves different networks in the brain related to alertness, spatial orientation and conflict resolution [20]. These aspects can be separated by flanker-type experiments with differently cued, sometimes conflicting, prompts. Depending on whether the task involves fixating the eyes on an unexpected part of the screen, or resolving the direction of an arrow surrounded by distracting stimuli, different parts of the attention network will be activated, in turn resulting in varying reaction times [7].

The dilation and constriction of the pupil is not only triggered by changes in light and fixation but also reflects fluctuations in arousal networks in the brain [13], which from a quantified self perspective may enable us to assess whether we are sufficiently concentrated when interacting with the screens of smartphones or laptops while carrying out our daily tasks. Likewise, the pupil size increases when we face an unexpected uncertainty [1], physically apply force by flexing muscles, or motivationally have to decide whether the outcome of a task justifies the required effort [23]. Thus, when we perform specific actions, the cognitive load involved can be estimated using eye tracking.
The pupil dilates if the task requires a shift from sustained tonic alertness and orientation to more complex decision making, in turn triggering a phasic component caused by the release of norepinephrine neurotransmitters in the brain [2], [8], which may reflect both the increased energization as well as the unexpected uncertainty related to the task [1]. Whereas these results have typically been obtained under controlled lab conditions, in the present study we explore the feasibility of assessing a user's level of attention "in the wild" using mobile eye tracking.

This longitudinal study was performed repeatedly over the course of two weeks in September-October 2015. Two male right-handed subjects, A and B (of average age 56), each performed a session very similar to the Attention Network Test (ant) [7] approximately twice every weekday, resulting in 16 and 17 complete datasets respectively, totaling 9,504 individual reaction time tests. The experiment ran "in the wild" in typical office environments off a conventional MacBook Pro 13" (2013 model with Retina screen) with an Eye Tribe Eye Tracker connected to it. The ant used here is implemented in PsychoPy [18] and is available on github [4]. Simultaneously, eye tracking data is recorded at 60 Hz and timestamped for synchronization through the Eye Tracker API [21] via the PeyeTribe [3] interface.

Before the actual experimental procedure starts, a calibration of the Eye Tracker is performed. The experiment contains an initial trial run that the user may select to abort, after which 3 rounds of 2 · 48 conditioned reaction time tests follow (Fig. 1). Each test is conditioned on one of 3 targets (Incongruent, Neutral or Congruent) and one of 4 cues (No Cue, Center Cue, Double Cue or Spatial Cue). At the start of each test, a fixation cross appears, and after a random delay of 0.4−1.6s the user is presented with a cue (when present for the particular condition). 0.5s later the target appears, with either incongruent, neutral or congruent flankers. The user is instructed to hit a button on the left or right side of the keyboard with his left or right hand, depending on the direction of the central arrow of the target, which appears above or below the initially centered fixation cross. Half the targets appear above and half below the fixation cross, and left/right pointing central arrows also appear evenly distributed. The resulting reaction time, from target presentation to first registered keypress, is logged together with the conditions of the individual test, whether the user hit the correct left/right key or not, and a common timestamp. For further details on the ant please see [7]. Each test takes approximately 4s to perform. With 2 · 48 tests per round and 3 rounds per session, the 33 sessions total the 9,504 individual reaction time tests.

Fig. 1. The Attention Network Test procedure used here: every 4 seconds, a cue (one of 4 conditions, top left) precedes a target (one of 3 congruency conditions, top right), to which the participant responds by pressing a key according to the central arrow. The reaction time differences between cue and congruency conditions form the basis for calculating the latencies of the alertness, orientation and conflict resolution networks.

The reaction times for each experiment, for which the user responded correctly within 1.7s, form the basis of the attention network latencies:

t(alertness) = t(no cue) − t(double cue)
t(orientation) = t(center cue) − t(spatial cue)
t(conflict resolution) = t(incongruent) − t(congruent)

where t(cond) = (1/N) Σ_{i | i = cond} t_i is the mean reaction time over the N tests matching the given condition.

Linear pupil size and inter-pupil distance data can be somewhat "noisy" when recording in office conditions. After epoch'ing to the corresponding cue times of the individual tests, invalid/missing data from blink-affected periods are removed, and a Hampel [9] filter with a centered window is then applied to remove remaining outliers deviating more than a few σ from the local median. Data is then downsampled to 100ms resolution using a windowed averaging filter, and scaled proportionally to the value at epoch start (cue presentation), so that the resulting pupil dilations represent relative change vs the pupil size at cue presentation.
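The latency definitions above translate directly into code. The sketch below assumes a per-test log with hypothetical column and condition names ('cue', 'target', 'rt', 'correct'); the response limit default is likewise an assumption, not the study's actual file layout:

```python
import pandas as pd

def ant_latencies(df, rt_limit=1.7):
    """Derive the three attention network latencies from a log of individual
    tests. df needs columns 'cue', 'target', 'rt' (seconds) and 'correct';
    only correct responses faster than rt_limit are used."""
    ok = df[df['correct'] & (df['rt'] < rt_limit)]
    t_cue = ok.groupby('cue')['rt'].mean()     # mean RT per cue condition
    t_tgt = ok.groupby('target')['rt'].mean()  # mean RT per congruency condition
    return {
        'alertness':           t_cue['no_cue'] - t_cue['double_cue'],
        'orientation':         t_cue['center_cue'] - t_cue['spatial_cue'],
        'conflict_resolution': t_tgt['incongruent'] - t_tgt['congruent'],
    }
```

Grouping by condition and differencing the means mirrors the three subtraction formulas above; incorrect or too-slow responses are excluded before averaging.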
This last part was done to compensate for varying environmental luminosity changes and, to some degree, to offset any effect from immediately preceding reaction time tests and to compensate for accidental head position drift. Time-locked averaging is then done by grouping data from similar conditions within each experiment, from which the group-mean relative pupil dilations can be derived. The data received from the eye tracker is uncalibrated and cannot easily be referenced to a metric measurement.

At the same time, the inter-pupil distance is calculated, to ensure that pupil size changes are not the accidental result of moving the head slightly during the experiment. Additionally, a "baseline" experiment has been performed, recording eye tracking data in a condition where no action can be taken by the user and no arrow-heads are visible on the targets, but which is otherwise presented under similar conditions, in order to rule out that the recorded pupil dilations would be the result of (small) luminosity changes caused by the presented cues and targets, or of slightly changing accommodation between the focus points of the cue and the target. The inter-pupil distance variation was found to be significantly smaller than the pupil size variation (typically by an order of magnitude).

Table 1 shows the aggregate Overall Mean Reaction- and Attention Network timings for each subject A and B, with estimates of the variation over the weeks. The figures are not significantly different from what is found in [7]; the Mean rt reported here is slightly higher than the estimated 512ms in the reference, whereas the alertness, orientation and conflict resolution latencies are slightly lower than or similar to the 47ms, 51ms and 84ms reported there.

Table 1. Average Reaction- and Attention Network Times over all correctly replied experiments for the two week period for either subject (the variation over the period is given as the estimated ± Sample Standard Deviation of the aggregate values), in milliseconds.

Subject  Mean rt    Alert     Orient    Conflict
A        577 (±54)  27 (±21)  22 (±18)  85 (±…)
B        … (±55)    35 (±17)  49 (±15)  81 (±…)

There are, however, behavioural variations in reaction time throughout the weeks. Fig. 2 shows the variation of the derived ant timings throughout the experimental period, and the relative error rate for each experiment. The variations appear to be statistically significant, as can be estimated from the standard error of the mean (the shaded area), and may reflect underlying states of varying levels of attention, fatigue and motivation. To sum up the behavioral results, A shows a somewhat increasing trend in error rate related to the objective task performance, whereas B shows a diminishing difference between the three estimated measures of conflict resolution, spatial orientation and alertness reaction time.

Fig. 2. Attention Network Timing over all sessions in the two week period. Conflict Resolution (Red) is slower than Alertness (Green) and Orientation (Blue). A (Left) shows an increasing error rate trend (Solid); Conflict Resolution for B gradually approaches the other latencies. Both A and B have large variations over time, pointing to varying levels of attention, fatigue and motivation.
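The pupil preprocessing described earlier (blink removal, Hampel filtering, downsampling to 100ms and scaling relative to the value at cue presentation) can be sketched as follows. This is a minimal sketch assuming a pandas Series indexed by time since cue onset; the window length and threshold defaults are assumptions, not the study's actual parameters:

```python
import numpy as np
import pandas as pd

def hampel(series, half_window=5, n_sigmas=3.0):
    """Mark points deviating more than n_sigmas (MAD-scaled) from the
    rolling median as invalid (NaN)."""
    w = 2 * half_window + 1
    med = series.rolling(w, center=True, min_periods=1).median()
    mad = (series - med).abs().rolling(w, center=True, min_periods=1).median()
    sigma = 1.4826 * mad  # MAD -> standard deviation under normality
    out = series.copy()
    out[(series - med).abs() > n_sigmas * sigma] = np.nan
    return out

def preprocess_epoch(pupil):
    """pupil: Series of raw pupil sizes indexed by time since cue onset.
    Returns the relative change (%) vs the value at cue presentation,
    outlier-filtered and downsampled to 100 ms."""
    clean = hampel(pupil)
    down = clean.resample('100ms').mean()        # windowed averaging to 100 ms
    return 100.0 * (down / down.iloc[0] - 1.0)   # relative to epoch start
```

Blink-affected samples that are already NaN are simply ignored by the windowed averaging, and any remaining spikes are dropped by the Hampel step before the relative scaling.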
The group-mean relative linear pupil dilations for each of the 3 congruency conditions are illustrated in Fig. 3.

Fig. 3. Averaged left-eye pupil dilations for each session, coloured according to congruency (A (Left) and B). The all-session average is shown in bold, with the shaded area representing the standard error of the mean. The average incongruent (Red) pupil dilation is stronger than the others, indicating a higher cognitive load.

Pupil dilation responses are all epoch'ed to the cue (at time 0ms) and target presentation (time 500ms). A small and slow pupil dilation onset is seen shortly after cue presentation. Fig. 4 shows the relative variation of the left pupil size (Blue) vs the median value over a selected period that covers 48 reaction time tests, in this case for B, for two different experiments. Test-related pupil dilation responses, which occur every 4 seconds, are not immediately visible in this graph due to random noise and a relatively strong longer-periodic variation over 20-60 seconds. The Green curve shows the relative variation of the inter-pupil distance, with variations an order of magnitude smaller than the pupil size changes.
Fig. 4. Filtered pupil size plots; 48-test long sections of two experiments (B, left eye). The relative inter-pupil distance (Green) indicates stable eye-to-screen distances.
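As a sanity-check sketch (not the study's actual code), the head-stability argument can be expressed by comparing relative variations of pupil size and inter-pupil distance; the order-of-magnitude factor mirrors the difference reported above, and both function names are illustrative:

```python
import numpy as np

def relative_variation(x):
    """Per-sample deviation (%) from the series median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    return 100.0 * (x - med) / med

def head_position_stable(pupil, ipd, factor=10.0):
    """True if the inter-pupil distance varies at least `factor` times less
    than the pupil size, i.e. the recorded pupil changes are unlikely to be
    caused by head movements toward or away from the screen."""
    return factor * np.std(relative_variation(ipd)) <= np.std(relative_variation(pupil))
```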
Fig. 5 shows the area under the pupil dilation curve between 1.5−2.5s after cue (1.0−2.0s after target) for each experiment, serving as a very rough indicator of the relative cognitive load caused by the tests. From these, a δ(incon) value can also be calculated by subtracting the congruent value from the incongruent one. It is seen that both A and B have larger pupil dilation responses for the initial two experiments, after which the level is lower. For B it remains at lower levels, indicating a training effect. For A the pattern is less clear, with possibly an increased load towards the end of the two week period. A frequency domain analysis of the signal shows, however, a distinct peak at 0.25 Hz, corresponding to the tests repeating every 4 seconds.

Fig. 5. Area under left-eye pupil dilation curves [1.5s, 2.5s after cue] for each session. A (Left) and B show initial training effects; only A, however, shows an increasing trend in cognitive load for the remaining sessions.
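Both the area-under-curve load index and the frequency analysis are straightforward with numpy. In this sketch the integration window defaults mirror the 1.5−2.5s interval above, and the function names are illustrative rather than the study's own:

```python
import numpy as np

def dilation_area(rel, t, t0=1.5, t1=2.5):
    """Trapezoidal area under the relative dilation curve rel(t) for
    t0 <= t <= t1 (seconds after cue), a rough index of evoked load."""
    m = (t >= t0) & (t <= t1)
    y, x = rel[m], t[m]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the strongest non-DC spectral component; a test
    sequence repeating every 4 s should show up as a 0.25 Hz peak."""
    spec = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(freqs[np.argmax(spec)])
```

The δ(incon) value mentioned above would then simply be the difference between `dilation_area` applied to the incongruent and to the congruent group-mean curves.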
In order to verify how well pupil dilations allow predicting the class of congruency condition, the 3 within-experiment averaged pupil dilation responses from each subject were ordered in each of the 6 possible permutations of the 3 congruency conditions. A neural-network type classifier was then trained to identify which of the 3 averaged pupil dilations was the incongruent one.
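The paper does not detail the network architecture, so the following is one possible reading of the permutation set-up, sketched with scikit-learn; `make_dataset`, the layer size and the curve layout are illustrative assumptions:

```python
import numpy as np
from itertools import permutations
from sklearn.neural_network import MLPClassifier

def make_dataset(incon, neu, con):
    """Each row of incon/neu/con is one block-averaged dilation curve.
    Every block triple is expanded into all 6 orderings; the label is the
    position (0-2) of the incongruent curve in that ordering."""
    X, y = [], []
    for i in range(len(incon)):
        curves = {'i': incon[i], 'n': neu[i], 'c': con[i]}
        for order in permutations('inc'):
            X.append(np.concatenate([curves[k] for k in order]))
            y.append(order.index('i'))
    return np.array(X), np.array(y)

# a small neural-network type classifier, as described in the text
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
```

Fitting on one subset of blocks and scoring on held-out blocks then gives a test error rate to compare against the 2/3 chance level of guessing the incongruent position.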
Fig. 6. Test error rates vs. the number of trials in each averaged block, for subjects A and B, with the chance level indicated.

Fig. 6 shows the resulting test error rate vs. the number of averaged experimental tests, dividing the 96 equal-condition responses of each experiment into groups of 96, 48, 32 or 24 tests, and using a test/train split of 0.9/0.1. The performance is clearly above chance level (66.7% error rate).

Correlating response times and pupil reactions

Table 2 shows the Pearson Correlation Coefficients for all combinations of Attention Network- and Reaction-Times, Pupil Dilation metrics and Time-of-Day for each subject, as they vary over the two week period. As the data sets are small (16 and 17 sets), caution is needed when judging the significance levels (p-values).
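Such a correlation table, with a p-value accompanying each coefficient, can be sketched with scipy as follows; the metric names in the usage are placeholders for the per-session values listed above:

```python
import numpy as np
from scipy import stats

def correlation_table(metrics):
    """metrics: dict mapping a metric name to its array of per-session values.
    Returns {(a, b): (pearson_r, p_value)} for every metric pair, so that
    significance can be judged alongside each coefficient."""
    names = list(metrics)
    table = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r, p = stats.pearsonr(metrics[a], metrics[b])
            table[(a, b)] = (float(r), float(p))
    return table
```

With only 16 or 17 sessions per subject, the p-values returned here are exactly why the caution noted above is needed: even sizeable coefficients can fail to reach significance at such sample sizes.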
Table 2. Pearson correlation coefficients between key metrics for A (Top) and B (Bottom). A shows a negative correlation between mean reaction time and error rate ("speed-accuracy tradeoff"). B (as opposed to A) shows a correlation between pupil dilations and error rate, possibly indicating a different response to varying levels of fatigue or motivation; additionally, alertness (and partly orientation) may inversely correlate with pupil dilations. Both show the expected correlations between pupil dilation metrics. Rows and columns comprise the Attention Network- and Reaction-Times (Alert, Orient, Conflict, µ(RT)), the Pupil Dilation metrics (Incon, Neutral, Con, δ(Incon)), Time-of-Day (ToD) and Errors; significance levels are marked ∗, † and ‡.

With some variation between subjects, pupil dilation responses appear correlated. Subject A shows a correlation between orientation and conflict resolution timings, which is not seen at all for B. A may also have some correlation between mean reaction time and orientation resp. conflict resolution timings, which again are not quite as present for B. Subject B shows correlations between alertness timing and the incongruent, neutral and δ(incon) pupil dilations, as well as a correlation between orientation timing and congruent pupil dilations; these are not present for A. Also, there are indications of a correlation between time of day and mean reaction time; the experiments for B were spread out over larger parts of the day than for A, which might explain why this is not seen for A. [7] reported correlations between the conflict resolution timing and the mean reaction time over a large group of people. As such, the conditions are not similar to the within-person variation, but it is worth pointing out that a similar correlation is partly present for A and cannot be ruled out for B.

Using low cost portable eye tracking to measure the variations in pupil size, we were able to differentiate and predict whether users were engaged in more complex decision making or merely maintaining a general alertness when interacting with a laptop, over nearly 10,000 tests. A parallel single-experiment study [5], repeating the experimental setup with nearly 10,000 additional tests over 18 more subjects, has confirmed that similar significant pupil response differences characterize the contrasts between incongruent versus neutral or congruent task conditions.

In the present study, we found a significant difference based on the left-eye pupil size for the conflict resolution task in contrast to the attentional network components of alertness and re-orientation, but not between these two latter tasks. These results may reflect findings in other studies indicating that the phasic component in attention is predominantly triggered by tasks requiring a decision, whereas tonic alertness may suffice for solving less demanding tasks like responding to visual cues or re-orienting attention to an unexpected part of the screen [2], as seen in the "baseline" experiment, where no decision needs to be made and no motor cortex activation takes place.

From a quantified self perspective of individual behaviour, using mobile eye tracking to assess levels of engagement, the relations between pupil size (a possible quantification of the cognitive load) and error rate/reaction time (a quantification of the objective task performance) indicate individual differences in the subjects' behavioural adaptation to the attentional tasks. A is apparently coping with the cognitive load by trading off speed and accuracy to optimize performance, as indicated by the lack of correlation between pupil size and either of the performance related measures. However, for B the correlation between pupil size and accuracy may suggest a behaviour characterized by applying more effort to the task when the number of errors increases.

As we have in this study only used the pupil size as a measure of attention, without considering the spatial density of fixations or the speed of saccadic eye movements that could entail further information, we suggest that mobile eye tracking may not only enable us to assess the effort required when undertaking a variety of tasks in an everyday context, but could in the longer term also provide a foundation for continuously adapting the content and interaction of smartphones and laptops based on our perceived level of attention.
References

[1] Ang, Y.S., Manohar, S., Apps, M.A.J.: Commentary: Noradrenaline and Dopamine Neurons in the Reward/Effort Trade-off: A Direct Electrophysiological Comparison in Behaving Monkeys. Frontiers in Behavioral Neuroscience 9(November), 310 (2015)
[2] Aston-Jones, G., Cohen, J.D.: An Integrative Theory of Locus Coeruleus-Norepinephrine Function: Adaptive Gain and Optimal Performance. Annual Review of Neuroscience 28(1), 403–450 (2005)
[3] Bækgaard, P.: Simple python interface to the Eye Tribe eye tracker (2015), https://github.com/baekgaard/peyetribe/
[4] Bækgaard, P.: Attention Network Test implemented in PsychoPy (2016), https://github.com/baekgaard/ant
[5] Bækgaard, P., Petersen, M.K., Larsen, J.E.: Differentiating attentional network components using mobile eye tracking. In preparation (2016)
[6] Beatty, J.: Task-evoked pupillary responses, processing load, and the structure of processing resources (1982)
[7] Fan, J., McCandliss, B.D., Sommer, T., Raz, A., Posner, M.I.: Testing the Efficiency and Independence of Attentional Networks. Journal of Cognitive Neuroscience 14(3), 340–347 (2002)
[8] Gabay, S., Pertzov, Y., Henik, A.: Orienting of attention, pupil size, and the norepinephrine system. Attention, Perception & Psychophysics 73(1), 123–129 (2011)
[9] Hampel, F.R.: The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association 69(346), 383–393 (1974)
[10] Holmqvist, K.: Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford University Press (2011)
[11] Hunter, J.D.: Matplotlib: A 2D graphics environment. Computing in Science and Engineering 9(3), 99–104 (2007)
[12] Hyönä, J., Tommola, J., Alaja, A.M.: Pupil Dilation as a Measure of Processing Load in Simultaneous Interpretation and Other Language Tasks. The Quarterly Journal of Experimental Psychology Section A 48(3), 598–612 (1995)
[13] Joshi, S., Li, Y., Kalwani, R.M., Gold, J.I.: Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex. Neuron 89(1), 221–234 (2016), http://dx.doi.org/10.1016/j.neuron.2015.11.028
[14] Laeng, B., Ørbo, M., Holmlund, T., Miozzo, M.: Pupillary Stroop effects. Cognitive Processing 12(1), 13–21 (2011)
[15] McKinney, W.: Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, 51–56 (2010), http://conference.scipy.org/proceedings/scipy2010/mckinney.html
[16] Oliphant, T.E.: SciPy: Open source scientific tools for Python. Computing in Science and Engineering 9, 10–20 (2007)
[17] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É.: Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2012), http://arxiv.org/abs/1201.0490
[18] Peirce, J.W.: PsychoPy - Psychophysics software in Python. Journal of Neuroscience Methods 162(1-2), 8–13 (2007), http://dx.doi.org/10.1016/j.jneumeth.2006.11.017
[19] Pérez, F., Granger, B.E.: IPython: a System for Interactive Scientific Computing. Computing in Science and Engineering 9(3), 21–29 (2007), http://ipython.org
[20] Posner, M.I.: Attentional networks and consciousness. Frontiers in Psychology 3(MAR), 1–4 (2012)
[21] The Eye Tribe: The Eye Tribe API Reference, http://dev.theeyetribe.com/api/
[22] Van Der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: A structure for efficient numerical computation. Computing in Science and Engineering 13(2), 22–30 (2011)
[23] Varazzani, C., San-Galli, A., Gilardeau, S., Bouret, S.: Noradrenaline and Dopamine Neurons in the Reward/Effort Trade-Off: A Direct Electrophysiological Comparison in Behaving Monkeys. Journal of Neuroscience 35(20), 7866–7877 (2015)