Control of fixation duration during visual search task execution
A.Y.Vasilyev
Queen Mary University of London, Mile End Road, E1 4NS
Abstract
We study the ability of a human observer to control fixation duration during the execution of visual search tasks. We conducted eye-tracking experiments with natural and synthetic images and found that fixation duration depends on the difficulty of the task and on the lengths of the preceding and succeeding saccades. To explain these effects, we developed a novel control model of human eye movements that incorporates continuous-time decision making, observation and belief-state updating. The model is based on a Partially Observable Markov Decision Process with a lag in observation and saccade execution that accounts for the delay between the eye and the cortex. We validated the computational model by comparing the statistical properties of simulated and experimental eye-movement trajectories.
Keywords: fixation duration, visual search, reinforcement learning
1. Introduction
Eye movements are essential actions of the human oculomotor system that overcome the limited acuity outside the foveal region. Saccades, high-velocity gaze shifts, direct high-acuity foveal vision to the most informative locations of a scene. Saccades are followed by fixations, during which fixational eye movements are generated to maintain high-resolution perceptual input. These processes are crucial to the natural behavioural objectives of humans and animals. However, relatively little is known about the general scheme by which fixation duration and location are controlled. Humans exhibit high variability in fixation duration, which depends on many factors, including the visual task and the health of the observer. For example, patients with Age-related Macular Degeneration show significantly longer fixation durations than normal controls [1]. Normal controls, in turn, show an increase in average fixation duration as the difficulty of a visual search task increases [2, 3, 4].

Previously, it was suggested that the control of fixation duration is involuntary, and that a mechanism estimates the time required to discriminate the target [2, 5]. Later it was argued [6] that fixation duration is influenced both by a global control, which defines the prior distribution of fixation duration, and by a local control, which inhibits the next saccade if visual processing is not completed. This principle of mixed control was incorporated into saccade timing models, which successfully explain the statistics of fixation duration during the observation of natural scenes [7], visual search [8] and reading [9]. Despite the remarkable success of saccade timing models in reproducing eye-movement behaviour, they leave unexplained why the neural implementation of this complex framework is beneficial for human observers in the first place.
These models capture the dynamics of saccade programming in detail, but they do not reflect the process of task execution itself, because the status of task execution does not influence the models' internal variables. For example, in the ICAT model [8] the visual search is not terminated when the target item is fixated and processed by foveal vision. Because perception is not incorporated into saccade timing models, target objects are processed in the same way as distractors; these models therefore cannot judge whether the observer has located the target. We argue that saccade timing models do not explain the execution of a visual search task, but instead describe the dynamics of visual attention in an array of distractors.

Preprint submitted to Elsevier, April 24, 2020. arXiv [q-bio.NC].

Meanwhile, control models of human eye movements [10, 11, 12] interpret visual behaviour from the point of view of optimality and can incorporate mixed control of fixation duration as a mere consequence of their architecture. The global control of fixation duration is related to the overall task requirements [8] and influences fixation duration throughout the execution of the episode. In control models, prior knowledge of the properties of the stimulus image and the target object(s) defines the policy of the observer, which governs its eye-movement behaviour and remains unchanged during task execution. Control models therefore incorporate global control through learning and implementing the policy. The local control, which is related to decision making based solely on the processing of visual input during the fixation [6], is implemented in control models as the integration of visual input into the belief state and the output of a policy decision based on the updated belief state.
The only component that is absent from control models, but required for the control of fixation duration, is operation at a frequency higher than the frequency of saccadic events.

To study the adaptivity of fixation duration, we extended the computational model presented in our previous work [12] by introducing continuous observation and decision making. In such a model the observer can terminate the current fixation at any moment. The optimal observer therefore faces a trade-off between the quality of the extracted visual information and the time cost of the fixation. We expect that the extended computational model, subject to temporal constraints, will provide a quantitative explanation of the increase in average fixation duration with decreasing detectability of the target object observed in our psychophysical experiment. Furthermore, the noisy observation model [13] reformulated in continuous time is identical to evidence accumulation models such as LATEST [14], and we expect that this computational model will produce a similar distribution of fixation durations.

Modelling decision making in continuous time allows us to take into account the influence of saccade latency. It is well known that saccade programming is executed in two stages: a labile and a non-labile stage [15, 9]. During the first stage the observer can cancel the execution of a saccade planned according to the initial decision and initiate a saccade in a new direction. Continuous perception and decision making can describe this process and take into account the influence of the duration of the non-labile stage.

In the future, the reformulation of the computational model of eye movements in continuous time will allow us to add fixational eye movements during fixation intervals and to study their influence on the observer's decision making. Previously, control models of eye movements [13, 16] did not take into account the presence of drift and microsaccades during fixation intervals.
The occurrence of a microsaccade shortly before saccade onset increases the saccadic reaction time in response tasks [17], which indicates that fixational eye movements influence saccade programming. Furthermore, the primary role of saccades in counteracting perceptual fading [18] indicates their importance for the extraction of visual information during the fixation.
2. Extension of computational model
We developed a novel computational model of human eye movements that incorporates continuous-time decision making, observation and belief-state updating. We motivate this extension by the experimental evidence of longer fixation durations in lower-visibility conditions [1].

2.2. Observation and inference

According to the literature, saccade programming is a two-stage process that consists of a labile and a non-labile stage [15, 9]. The labile stage is the first stage of saccade programming, during which the initial saccade command can be cancelled in favour of a saccade to another location. The saccade to the next location is executed after the non-labile stage. Visual input is active during both the labile and the non-labile stages and is suppressed during the execution of a saccade. Therefore, visual input received during the non-labile stage of saccade programming can be used for decision making only during the next fixation.

We model this effect as a time lag $\tau$ in the interaction between the observer and the environment, whose duration equals the length of the non-labile stage. In our model the interaction between observer and environment is modelled as an exchange of "messages" with frequency $\Omega = 1/\Theta_{int}$, where $\Theta_{int}$ is called the resolution interval. The resolution interval introduces the discretization required by the numerical solver and has no biological meaning. At time step $n$ the observer receives the observation $S_n$ and the vector of the actual fixation location $x_n$, which are used to infer the belief state $b_n$. In our model the belief state is a probability distribution over potential target locations, and the observation is a stochastic noise signal at each location. The belief state is used for decision making, i.e. the estimation of the next intended location to fixate, $\tilde{x}_{n+1} = \pi(b_n)$.
However, this decision is not executed immediately but with a delay defined by the resolution interval and the time lag in interaction: $\mathrm{lag} = \lceil \tau/\Theta_{int} \rceil$, where $\lceil\cdot\rceil$ is the ceiling function. Therefore, at step $n$ the environment receives the delayed intended location $\tilde{x}_{n-\mathrm{lag}+1}$ from the observer, which is used to execute the saccade: $x_{n+1} = \alpha(\tilde{x}_{n-\mathrm{lag}+1})$. At this stage the world state is updated and the gaze is transferred to the new location. The environment emits the observation $S_{n+\mathrm{lag}}$, which becomes available to the observer only after $\mathrm{lag}$ steps. If at that moment the environment is still executing the previous saccade command $\tilde{x}_{n-\mathrm{lag}}$, the new saccade signal is ignored and the environment emits a null observation. The time lag in interaction $\tau$ is set to … ms, the experimental value of the delay between the eye and V1 [19, 20].

Figure 1: At time step $n$ the observer receives the observation $S_n$ and the vector of the actual fixation location $x_n$, which are used to infer the belief state $b_n$. This belief state is used for decision making, i.e. the estimation of the next intended location to fixate, $\tilde{x}_{n+1} = \pi(b_n)$. This decision is not executed immediately, but with a delay defined as $\mathrm{lag} = \lceil \tau/\Theta_{int} \rceil$. On the same time step $n$, the environment receives the delayed intended location $\tilde{x}_{n-\mathrm{lag}+1}$ from the observer, which is used to execute the saccade: $x_{n+1} = \alpha(\tilde{x}_{n-\mathrm{lag}+1})$. At this stage the world state is updated and the gaze is transferred to the new location. The environment emits the observation $S_{n+\mathrm{lag}}$, which becomes available to the observer only after $\mathrm{lag}$ steps. If at that moment the environment is still executing the previous saccade command $\tilde{x}_{n-\mathrm{lag}}$, the new saccade signal is ignored and the environment emits a null observation.
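The timing of the observer-environment exchange described above can be sketched in Python as a delay line. This is a minimal sketch under stated assumptions, not the author's code: `interaction_lag` implements $\mathrm{lag} = \lceil \tau/\Theta_{int} \rceil$, while `DelayedChannel` is a hypothetical helper that hands each message to the receiver exactly `lag` steps after it was sent and returns `None` (a null message) until the pipeline fills. The value of $\tau$ in the demo is illustrative only, since the text does not give it here.

```python
import math
from collections import deque


def interaction_lag(tau_ms, theta_int_ms):
    """lag = ceil(tau / Theta_int): the delay measured in resolution intervals."""
    return math.ceil(tau_ms / theta_int_ms)


class DelayedChannel:
    """Hypothetical helper: a FIFO that delivers each message `lag` steps late.

    It can model both directions of the exchange: decisions travelling from
    the observer to the environment and observations travelling back.
    Until the pipeline has filled, it delivers None (a null message).
    """

    def __init__(self, lag):
        self._queue = deque([None] * lag)

    def step(self, message):
        """Send `message` at the current step and receive the one due now."""
        self._queue.append(message)
        return self._queue.popleft()


if __name__ == "__main__":
    lag = interaction_lag(tau_ms=50.0, theta_int_ms=16.0)  # illustrative tau only
    channel = DelayedChannel(lag)
    for n in range(6):
        print(n, channel.step(("decision", n)))
```

A second instance of the same delay line can carry observations back to the observer, which mirrors the statement that $S_{n+\mathrm{lag}}$ becomes available only `lag` steps after it is emitted.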
We extend the previous computational model [12] by introducing a continuous-time Gaussian white-noise signal $\dot{S}(t, x_t, u_\ell)$ at each location $u_\ell$, for the current fixation location $x_t$ at time $t$ of the world state, which satisfies the following:

1. $E[\dot{S}(t, x_t, u_\ell)] = \delta(|u_\ell - u^\star|)/\Theta$;
2. $\mathrm{Cov}[\dot{S}(t_1, x_{t_1}, u_{\ell_1}), \dot{S}(t_2, x_{t_2}, u_{\ell_2})] = \delta(t_1 - t_2)\,\delta_{\ell_1,\ell_2} / \left(\Theta\, V^2(u_{\ell_1} - x_{t_1})\right)$,

where $u^\star$ is the location of the target object, $V$ is the visibility map, and $\Theta = 250$ ms is the average fixation duration in the double fixation task [10].

The observer receives this signal integrated over time intervals of finite duration $\Delta t$:
$$S_n = S(t_n, x_{t_n}, u_\ell) = \int_{t_n}^{t_n+\Delta t} \dot{S}(t, x_t, u_\ell)\,dt.$$
According to the conditions above, these signals are Gaussian random variables with the following means and standard deviations:
$$E[S_n] = \Delta t\,\delta(|u_\ell - u^\star|)/\Theta \tag{1}$$
$$\sigma[S_n] = \frac{\sqrt{\Delta t}}{\sqrt{\Theta}\,V(u_\ell - x_n)} \tag{2}$$
After receiving the observation signal $S_n$, Bayesian inference is used to update the belief state:
$$b_{n,\ell} \propto p(S_n \mid x_{t_n})\, b_{n-1,\ell}, \tag{3}$$
where the likelihood is evaluated under the hypothesis that the target is located at $u_\ell$.

Our next assumption is that the observer is effectively blind during the execution of saccades; therefore, the signal from time intervals that contain saccades should be noisier or be neglected. For each time interval we estimate the total time $\Delta t_s$ during which the gaze is fixed. We then use equations 1 and 2, with $\Delta t = \Delta t_s$, to generate the observation, and equation 3 to estimate the belief state after receiving the observation from this time interval. If the saccade lasts longer than the whole time step, the environment does not yield an observation.

2.3. Decision making and action execution

After inferring the belief state $b_n$, the observer decides where to fixate next according to the policy:
$$\tilde{x}_{n+1} \leftarrow \pi(b_n). \tag{4}$$
The command to execute the next saccade is received by the environment with the delay $\mathrm{lag} = \lceil \tau/\Theta_{int} \rceil$.
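The observation model of Eqs. (1)-(2) and the belief update of Eq. (3) can be sketched as follows. The sketch assumes a discrete set of candidate locations, a target present at exactly one of them, and a precomputed list `vis` holding $V(u_\ell - x_n)$ for the current fixation; the function names are hypothetical. Under these assumptions the likelihood of "target at $u_\ell$" differs from the other hypotheses only through the factor for location $\ell$, so the update reduces to a per-location log-likelihood ratio. Passing the fixed time $\Delta t_s$ as `delta_t`, and skipping the update when it is zero, mirrors the treatment of saccades described above.

```python
import math
import random


def sample_observation(vis, target, delta_t, theta, rng):
    """Draw the integrated signal S_n at every location (Eqs. 1-2).

    vis[l]  -- V(u_l - x_n), visibility (d') of location l from the current fixation
    target  -- index of the true target location u*
    delta_t -- time (ms) the gaze was fixed during this step
    theta   -- normalising duration Theta (250 ms in the text)
    """
    obs = []
    for l, v in enumerate(vis):
        mean = delta_t / theta if l == target else 0.0      # Eq. (1)
        sd = math.sqrt(delta_t) / (math.sqrt(theta) * v)    # Eq. (2)
        obs.append(rng.gauss(mean, sd))
    return obs


def update_belief(belief, obs, vis, delta_t, theta):
    """Bayesian update of Eq. (3) over 'target at u_l' hypotheses."""
    if delta_t <= 0:                # gaze never fixed: null observation, no update
        return list(belief)
    mu = delta_t / theta            # signal mean at the target location
    log_post = []
    for b, s, v in zip(belief, obs, vis):
        var = delta_t / (theta * v * v)
        # Log-likelihood ratio of "target here" vs "no target here"; the
        # factors from all other locations are shared by every hypothesis.
        llr = (mu * s - 0.5 * mu * mu) / var
        log_post.append(math.log(b) + llr if b > 0 else -math.inf)
    m = max(log_post)               # log-sum-exp normalisation for stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

With $\Delta t = \Theta$ the ratio of signal mean to noise at a location equals $V$, which is why the visibility map can be read as a d' for one average fixation.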
If, at the moment the command arrives, the environment is not executing the previous saccade, the next saccade (or fixation) is executed as:
$$x_{n+1} = \begin{cases} \tilde{x}_{n+1}, & \text{if } \tilde{x}_{n+1} = x_n \\ \alpha(\tilde{x}_{n+1}), & \text{otherwise} \end{cases} \tag{5}$$
where $\alpha$ is the execution function:
$$\alpha(\tilde{x}_{n+1}) = \tilde{x}_{n+1} + \zeta_n, \tag{6}$$
and $\zeta_n$ is a Gaussian-distributed random error with zero mean and standard deviation $\nu$ defined in [9]:
$$\nu = \zeta_0 + \zeta_1 \|\tilde{x}_{n+1} - x_n\|. \tag{7}$$
The error of saccade execution thus grows linearly with the intended saccade amplitude $\|\tilde{x}_{n+1} - x_n\|$, given in degrees; the parameter values are $\zeta_0 = 0.87$ deg and $\zeta_1 = 0.\ldots$ (from [9]). The time $t$ of the world state is updated as:
$$t_{n+1} = t_n + \Theta_{int}. \tag{8}$$
The fixation position is updated after the saccade command is executed. The time required to execute the saccade command $x_n$ is:
$$\Theta_{sc}(n) = \Theta_{mv}(n) + \Theta_{st}(n), \tag{9}$$
where $\Theta_{mv}(n)$ is the time required to execute the movement,
$$\Theta_{mv}(n) = \tau_{mv}\,\|x_n - x_{n-1}\|^{0.\ldots}, \tag{10}$$
and $\Theta_{st}(n)$ is the time required to mechanically stop the eye movement,
$$\Theta_{st}(n) = \tau_{st}\,\|x_n - x_{n-1}\|. \tag{11}$$
The constants $\tau_{mv} = 21\ \mathrm{ms\cdot deg^{-0.\ldots}}$ and $\tau_{st} = 6$ ms/deg come from the eye-tracking data of [21, 22, 23] and [24, 25], respectively. If $\Theta_{st}(n) + \Theta_{mv}(n) > m\,\Theta_{int}$, where $m$ is an integer, the next $m$ saccade commands are cancelled. Furthermore, an eye movement is triggered only if the decision where to fixate next differs from the current location, $\tilde{x}_{n+1} \neq x_n$, according to equation 5. Saccades therefore do not start at every time step, and the $k$-th saccade starts at time step $n_k \geq k$. The length of the $k$-th saccade equals:
$$A_k = \|x_{n_k} - x_{n_{k-1}}\|, \tag{12}$$
and the duration of the $k$-th fixation equals:
$$\Theta_k = t_{n_{k+1}} - t_{n_k} - \Theta_{sc}(n_k), \tag{13}$$
which means that the $k$-th fixation starts after the $k$-th saccade is completely finished. Figure 2 explains the notation used in this section.

Figure 2: The sequence of saccades with endpoints $(x_{n_{k-1}}, x_{n_k}, x_{n_{k+1}})$ and the corresponding fixation durations $(\Theta_k, \Theta_{k+1})$.

Given the initial world state $S$, we define the value function for policy $\mu$ as the expectation of the return $R$:
$$V^{\mu}(S) = E[R \mid \mu, S]. \tag{14}$$
The random variable $R$ denotes the return and is defined by:
$$R \equiv -\sum_{n=0}^{N-1} \Theta_{int} = -N\,\Theta_{int}, \tag{15}$$
where $N$ is the total number of steps in the episode.
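The execution and timing rules of Eqs. (5)-(13) and the return of Eq. (15) can be sketched with a few small functions. This is an illustration rather than the author's implementation: the 2-D saccade error is assumed to be isotropic with standard deviation $\nu$ on each axis, `gamma` is a hypothetical name for the exponent of Eq. (10) (its value is not given here), and all parameter values are supplied by the caller.

```python
import math
import random


def execute_saccade(intended, current, zeta0, zeta1, rng):
    """Landing position (deg) of a saccade, Eqs. (5)-(7).

    Assumes an isotropic 2-D Gaussian error with s.d. nu on each axis.
    """
    if intended == current:                       # Eq. (5): keep fixating
        return current
    amplitude = math.dist(intended, current)      # intended saccade amplitude
    nu = zeta0 + zeta1 * amplitude                # Eq. (7)
    return (intended[0] + rng.gauss(0.0, nu),     # Eq. (6)
            intended[1] + rng.gauss(0.0, nu))


def saccade_time(amplitude, tau_mv, gamma, tau_st):
    """Theta_sc = movement time + stopping time, Eqs. (9)-(11), in ms."""
    return tau_mv * amplitude ** gamma + tau_st * amplitude


def blocked_commands(theta_sc, theta_int):
    """Largest integer m with theta_sc > m * theta_int: commands to cancel."""
    return max(0, math.ceil(theta_sc / theta_int) - 1)


def fixation_durations(start_steps, sacc_times, theta_int):
    """Eq. (13) with t_n = n * theta_int.

    start_steps[k] is n_k, the step at which saccade k starts, and
    sacc_times[k] is Theta_sc(n_k); fixation k runs from the end of
    saccade k to the start of saccade k+1.
    """
    return [(start_steps[k + 1] - start_steps[k]) * theta_int - sacc_times[k]
            for k in range(len(start_steps) - 1)]


def episode_return(n_steps, theta_int):
    """Eq. (15): every elapsed resolution interval costs theta_int."""
    return -n_steps * theta_int
```

For instance, with $\Theta_{int} = 16$ ms a saccade lasting 40 ms cancels the next two commands, since 40 > 2 × 16 but not 40 > 3 × 16.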
We start with the representation of the stochastic policy $\mu$ [11]:
$$\mu(D, p) = \frac{\exp(f(D, p))}{\sum_l \exp(f(l, p))}, \tag{16}$$
where $f(D, p)$ is the decision preference function, which indicates the tendency to choose decision $D$ in belief state $p$. In this study we limit the search for $f(D, p)$ to convolutions of the belief state $p$, owing to the shift invariance of the visual search process [11]:
$$f(D, p) = \sum_l K(D - l)\, p(l). \tag{17}$$
Our task is to find the kernel function $K$ that corresponds to the policy maximizing the value function $V^{\mu}$:
$$K^{\ast} = \arg\max_K V^{\mu(K)}(S) \tag{18}$$
for any starting world state $S$. The policy $\mu(K^{\ast})$ is called the optimal policy of gaze allocation. We approach problem (18) with the Policy Gradients with Parameter-based Exploration algorithm [26]. Table 1 lists the values of all parameters used in the simulations of the computational model.
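The convolutional softmax policy of Eqs. (16)-(17) can be sketched on an $N \times N$ grid as below. The kernel is passed as a hypothetical callable `kernel(dx, dy)` returning $K$ for a displacement; locations outside the grid are simply not summed over, which is an assumed boundary treatment. Learning $K$ itself with PGPE [26] is not shown.

```python
import math
import random


def preferences(belief, kernel):
    """Eq. (17): f(D, p) = sum_l K(D - l) p(l) on an N x N grid.

    belief -- N x N nested lists, the belief state p
    kernel -- hypothetical callable kernel(dx, dy) returning K for a displacement
    """
    n = len(belief)
    return [[sum(kernel(di - li, dj - lj) * belief[li][lj]
                 for li in range(n) for lj in range(n))
             for dj in range(n)]
            for di in range(n)]


def policy(belief, kernel):
    """Eq. (16): softmax of the preferences over all candidate fixations."""
    f = preferences(belief, kernel)
    m = max(v for row in f for v in row)          # subtract max for stability
    w = [[math.exp(v - m) for v in row] for row in f]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


def sample_decision(belief, kernel, rng):
    """Draw the next intended fixation cell (i, j) from the stochastic policy."""
    probs = policy(belief, kernel)
    n = len(belief)
    cells = [(i, j) for i in range(n) for j in range(n)]
    weights = [probs[i][j] for i, j in cells]
    return rng.choices(cells, weights=weights, k=1)[0]
```

Because $f$ depends on $D$ only through the displacement $D - l$, a single kernel serves every fixation position, which is the shift invariance the text relies on.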
3. Psychophysical experiments
In this section we describe the eye-tracking experiments of visual search with natural and synthetic images.
A group of five participants with normal or corrected-to-normal vision took part in the experiment. All participants were postgraduate students of Queen Mary University of London. The participants were aware of the experimental settings and completed 10 minutes of training sessions with four different experimental conditions, each corresponding to a certain value of the RMS contrast of the background noise. The experiments were approved by the ethics committee of Queen Mary University of London, and informed consent was obtained.
We used a DELL P2210 22" LCD monitor (resolution … × …, refresh rate 60 Hz) driven by a Dell Precision laptop for all experiments. The movements of the right eye were recorded with an SMI-500 eye tracker at a sampling frequency of … Hz. The eye tracker was mounted on the monitor. Matlab Psychtoolbox [28] was used to run the experiments and generate the stimulus images. Saccades and fixations were detected automatically by the software provided by SMI with the dispersion-based algorithm I-DT [29]. The dispersion-based algorithm assigns gaze samples to the same fixation if the samples are located within a region of size … for a minimal fixation duration of … ms.

Meaning | Symbol | Value | Source
RMS contrast of background noise | $e_n$ | (0.1, 0.15, 0.2, 0.25) | Set
RMS contrast of target | $e_t$ | … | …
saccade error, constant term | $\zeta_0$ | 0.87 deg | [9]
saccade error, amplitude coefficient | $\zeta_1$ | … | [9]
average fixation duration in double fixation task | $\Theta$ | 250 ms | [10]
saccade duration coefficient | $\tau_{sac}$ | … ms·deg$^{-0.\ldots}$ | [27]
time to stop the saccade: slope | $\Theta_{\ldots,fix}$ | … ms/deg | [24]
resolution interval | $\Theta_{int}$ | (128, …, …, …, …) ms | Set
size of stimulus image | $S$ | 15 × 15 deg | Set
dimensionality of computational grid | $N \times N$ | … | Set

Table 1: The parameters of the computational model used in our simulations, including those that were set experimentally.
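The dispersion-threshold (I-DT) fixation detection mentioned above can be sketched as follows. This is a generic version of the algorithm described in [29], not SMI's software: the dispersion threshold and the minimal duration are left as parameters because their values are not given here, and dispersion is assumed to be measured as (max x - min x) + (max y - min y).

```python
import math


def idt_fixations(xs, ys, sample_ms, max_dispersion, min_duration_ms):
    """Dispersion-threshold identification (I-DT) of fixations.

    xs, ys          -- gaze samples (e.g. in degrees) at a fixed sampling interval
    sample_ms       -- sampling interval in ms
    max_dispersion  -- dispersion threshold, same units as xs/ys
    min_duration_ms -- minimal fixation duration
    Returns (start, end_exclusive, centroid_x, centroid_y) tuples.
    """
    def dispersion(a, b):
        wx, wy = xs[a:b], ys[a:b]
        return (max(wx) - min(wx)) + (max(wy) - min(wy))

    window = max(1, math.ceil(min_duration_ms / sample_ms))
    fixations, i, n = [], 0, len(xs)
    while i + window <= n:
        j = i + window
        if dispersion(i, j) <= max_dispersion:
            # grow the window while the samples stay within the threshold
            while j < n and dispersion(i, j + 1) <= max_dispersion:
                j += 1
            cx = sum(xs[i:j]) / (j - i)
            cy = sum(ys[i:j]) / (j - i)
            fixations.append((i, j, cx, cy))
            i = j
        else:
            i += 1                  # drop the first sample and slide the window
    return fixations
```

Samples not covered by any returned window are treated as belonging to saccades (or noise).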
We used the database of 1204 natural images from [30] and the procedure of [31] to estimate the d-prime (the value of the visibility map at the centre of the visual field) for a windowed 4 cycles-per-degree sine-wave grating with a width of …, which was the target object in our visual search experiments. We estimated the mean d-prime of randomly placed 4 cpd sine-wave gratings for each image and sorted all 415 images according to this mean value. Next, the sorted array of images was divided into five equal bands of d-prime (83 images each), so that the images were sorted by the detectability of the target object. We then computed the average d-prime for each band. For each image of a band we selected the target location whose d-prime estimate was closest to the band average and generated the image with the target at this location. In this way we formed five bands of difficulty containing images with close values of d-prime. The average d-prime values for the five bands are 8.53, 3.98, 2.89, 2.24 and 1.57. Figure 3 shows stimulus images for the first and the fifth bands of difficulty.

The second type of stimuli were 1/f noise images as described in the original experiment [10]. The 1/f noise was generated in a square region of the screen spanning a visual angle of 15 × 15 deg. The target was a sine grating framed by a symmetric raised cosine and appeared at a random location within the square region. The experiments were conducted for one level of target RMS contrast, $e_t = 0.\ldots$, and several levels of 1/f noise RMS contrast, $e_n \in \{0.1, 0.15, 0.2, 0.25\}$.

3.4. Procedure

Five participants with normal or corrected-to-normal vision performed the visual search task on the generated images. We used the SMI eye tracker to record the participants' eye movements. The visual search experiment with natural images consisted of five blocks corresponding to the five bands, with 2 minutes of rest between blocks. Each search trial started with a fixation cross displayed in the centre of the screen for 1 second. The participants were instructed to find the target object in the image as quickly as possible after the stimulus image was displayed. They were asked to press the "Next" button once they had fixated the target location. If a participant fixated a location further than 2 degrees from the target, we considered the trial unsuccessful and excluded it from further analysis. The number of unsuccessful trials grows with the difficulty of the band: 3.1%, 5.3%, 8.3%, 11.2% and 17.1% for the five bands of d-prime, respectively.

The psychophysical experiments of visual search [12] were repeated for a constant target contrast $e_t = 0.\ldots$ and noise RMS contrasts $e_n = 0.1, 0.15, 0.2, 0.25$. Five normal controls performed 240 trials of the visual search task.

(a) Stimulus images corresponding to the first band of d-prime. (b) Stimulus images corresponding to the fifth band of d-prime.
Figure 3: We used the database of natural images from [30] to create the stimulus images for the visual search experiment. The target object was a windowed 4 cycles-per-degree sine-wave grating. We estimated the mean d-prime of randomly placed 4 cpd sine-wave gratings for each image and sorted all 415 images according to the mean value. Next, the sorted array of images was divided into five equal bands of d-prime. The target object in stimulus images from the first band (top) is the easiest to detect, whereas the fifth band consists of the most difficult stimulus images in the experiment.
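The construction of the d-prime bands described above can be sketched as follows. The input format (a hypothetical mapping from image id to the d-prime values of candidate target locations) and the definition of the band average as the mean of the per-image means are assumptions; the d-prime estimation procedure of [31] itself is not reproduced.

```python
def make_bands(image_dprimes, n_bands=5):
    """Sort images by mean d' and build equal-sized difficulty bands.

    image_dprimes -- hypothetical mapping: image id -> list of d' values of
                     candidate target locations in that image
    Returns a list of (band_average, [(image_id, chosen_location_index), ...]),
    ordered from the easiest (highest d') to the hardest band.
    """
    means = {img: sum(d) / len(d) for img, d in image_dprimes.items()}
    order = sorted(means, key=means.get, reverse=True)
    size = len(order) // n_bands    # e.g. 415 images // 5 bands = 83 per band
    bands = []
    for b in range(n_bands):
        members = order[b * size:(b + 1) * size]
        band_avg = sum(means[img] for img in members) / len(members)
        chosen = []
        for img in members:
            d = image_dprimes[img]
            # target location whose d' is closest to the band average
            loc = min(range(len(d)), key=lambda k: abs(d[k] - band_avg))
            chosen.append((img, loc))
        bands.append((band_avg, chosen))
    return bands
```

Choosing, in every image, the location closest to the band average is what keeps the d-prime values within a band close to each other.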
4. Statistics of fixation duration
In this section we discuss the statistics of fixation duration in human eye movements and in trajectories simulated with the learned policy. Both policy learning and simulations were conducted with the parameters presented in Table 1 and a resolution interval of … ms. Figure 4 shows the change of fixation duration with difficulty for human observers and the simulated agent for synthetic images (left) and natural images (right). For the experimental values, we found a significant difference between the mean fixation durations for different values of the RMS contrast of the background noise in the case of synthetic images. Using the Bonferroni correction for multiple comparisons, we found that the mean fixation durations differ across all difficulty levels ($p < \ldots$) in the psychophysical experiments for both types of stimulus.

We found a significant difference between the values of fixation duration predicted by the model and those measured in the experiments. The computational model cannot explain the heavy tail of the experimental distribution at any difficulty level (see Figures 5 and B.9). We assume that the reason for this mismatch is that our model does not take into account fixational eye movements (FEMs): drift and microsaccades. FEMs change the position of the gaze during the fixation, which allows additional visual information to be extracted without initiating a saccade. This favours longer fixations and a lower saccade frequency. Nevertheless, the model correctly predicts the change of fixation duration with difficulty: if the 35 ms difference between simulated and experimental fixation durations, which the model cannot explain, is subtracted from the experimental values, they are not significantly different from the simulated values. Another reason for the mismatch between model simulations and human behaviour is the assumed preference of saccade initiation for a certain mean rate [7, 6, 9], which originates from rhythmic activity in the central nervous system [32].
Because our approach is phenomenological, it uses the Bayesian framework [33] to model processing within the central nervous system, neglecting its temporal aspects. A more detailed theoretical model of the neural system is required to account for the preference in saccade initiation rate.

Table 2: Pearson correlation and p-values for the visual search task on synthetic images (columns: noise contrast; preceding fixation, R and p-value; preceding saccade, R and p-value). The durations of fixations preceding (succeeding) the saccades are negatively (positively) correlated with the lengths of the outgoing (incoming) saccades.
4.2. Interaction with saccade length

Another piece of evidence for the control of fixation duration during visual search is the correlation between fixation duration and the lengths of both preceding and succeeding saccades. We estimated the Pearson correlation coefficients and p-values for the psychophysical experiments with synthetic and natural images and present them in Tables 2 and 3, respectively. The durations of fixations preceding (succeeding) the saccades are negatively (positively) correlated with the lengths of the outgoing (incoming) saccades. In the case of synthetic images the p-value exceeds the significance level of 0.05 only for the RMS contrast value of …, and in the case of natural images the p-value exceeds the significance level only for the first difficulty band. In the lower-difficulty cases human observers execute the visual search task faster [13], which results in fewer eye movements. The lower number of sample points did not allow us to confirm the hypothesis at the 0.05 significance level.

Figure 4: Fixation duration in simulated eye-movement trajectories and experimental data (y-axis: duration, ms; right panel x-axis: difficulty band; legend: extended model, human). The computational model significantly underestimates fixation duration in humans for both synthetic images (left) and natural images (right).

Figure 5: Distributions of fixation duration in the experiment on synthetic images and in simulations of the extended model (panels: RMS of noise = 0.1, 0.15, 0.2, 0.25; y-axis: frequency; legend: human, extended model). The computational model cannot explain the heavy tail of the experimental distribution at any difficulty level; we assume that the reason for this mismatch is that our model does not take into account fixational eye movements (FEMs), which favour longer fixation durations.

Table 3: Pearson correlation and p-values for the visual search task on natural images (columns: difficulty band; preceding fixation, R and p-value; preceding saccade, R and p-value). The durations of fixations preceding (succeeding) the saccades are negatively (positively) correlated with the lengths of the outgoing (incoming) saccades.

Our next goal is to compare the dependency of fixation duration on the lengths of preceding and succeeding saccades between simulations and experiment. Before the regression analysis, we find an optimal transformation of the data with the multivariate Box-Cox transformation [34, 35]. The Box-Cox transformation of a variable $x$ is defined as:
$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log(x), & \text{if } \lambda = 0 \end{cases} \tag{19}$$
The aim of the multivariate Box-Cox transformation is to find parameters $\lambda$ such that the transformed observations $x^{(\lambda)}$ satisfy the full normality assumption, i.e. are independent and normally distributed. We apply the Box-Cox transformation to the experimental observations $(\Theta^{(\lambda_1)}, A^{(\lambda_2)})$ to find parameters $\lambda_1$ and $\lambda_2$ such that the observations satisfy the multivariate assumption $(\Theta^{(\lambda_1)}, A^{(\lambda_2)}) \sim \mathcal{N}(\mu, \Sigma)$, where $\mu$ and $\Sigma$ are the mean and the covariance matrix, respectively. We present the results of this analysis in Tables C.4 and C.5. Because the inferred parameters are close to zero, we apply a log-log transformation to both variables.

Next, we assume a power-law relationship between saccade length and fixation duration, $\Theta = pA^q$, because of the previously reported non-linear relationship between the main saccadic characteristics [27]. In the log-transformed data the coefficients $p$ and $q$ correspond to the intercept and the slope of the linear relationship $\log\Theta = q\log A + \log p$. We perform the log-transformation due to the convenience of s…

We log-transformed our data and formed the vectors $\log(\Theta_i)$, $\log(\Delta A_i)$ of fixation durations and outgoing saccade lengths, and $\log(\Theta_{i+1})$, $\log(\Delta A_i)$ of fixation durations and incoming saccade lengths. We computed the coefficients of a robust linear fit with the RANSAC method [36] for the log-transformed vectors, together with their principal components.
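Eq. (19), the pairing of fixations with outgoing and incoming saccades, and the Pearson coefficient reported in Tables 2 and 3 can be sketched as follows. The function names are hypothetical, and `paired_log_series` assumes that `amplitudes[i]` is the saccade between fixation `i` and fixation `i + 1`. Estimating the multivariate $\lambda$ of [34, 35] is not shown; for a single variable, `scipy.stats.boxcox` can estimate $\lambda$ by maximum likelihood.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of Eq. (19) for a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def paired_log_series(durations, amplitudes):
    """Pair fixations with their outgoing and incoming saccades (log scale).

    Assumes amplitudes[i] is the saccade between fixation i and fixation i+1.
    Returns (log Theta_i, log A_i) pairs for outgoing saccades and
    (log Theta_{i+1}, log A_i) pairs for incoming saccades.
    """
    if len(durations) != len(amplitudes) + 1:
        raise ValueError("expected one more fixation than saccades")
    outgoing = [(math.log(durations[i]), math.log(a)) for i, a in enumerate(amplitudes)]
    incoming = [(math.log(durations[i + 1]), math.log(a)) for i, a in enumerate(amplitudes)]
    return outgoing, incoming


def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)
```

In this notation, a negative `pearson_r(outgoing)` and a positive `pearson_r(incoming)` correspond to the pattern described in the table captions.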
RANSAC was used because of its robustness to outliers. The least-squares distance was chosen as the measure of error. The maximal residual for a data point to be classified as an inlier was set to the median absolute deviation of the target values. On each iteration of RANSAC the linear model was fitted to the data points identified as inliers. The RANSAC iterations stop when the number of iterations reaches the maximum ($N = 1000$). Figures C.10 and D.11 show scatter plots of the vectors $\log(\Theta_i)$, $\log(\Delta A_i)$ ($\log(\Theta_{i+1})$, $\log(\Delta A_i)$), which correspond to the relation between the preceding (succeeding) fixation duration and the succeeding (preceding) saccade length for synthetic images. Figures D.12 and D.13 show the same relation between saccade length and fixation duration in the simulated data. In both experimental and simulated data the regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains …% of the variability in the experimental data and …% of the variability on average in the simulated data (see Figures D.14 and D.15 for natural images).

The red (black) lines in Figures 6 and 7 show the slope and intercept of the linear dependency between log-transformed saccade length and log-transformed fixation duration as a function of the RMS contrast of the noise (synthetic images) or the difficulty band (natural images) for simulated (experimental) eye movements. The slopes of the linear dependency between fixation duration and preceding saccade length are positive for both simulated and experimental eye movements, for both natural and synthetic images, which is consistent with the literature. It was previously shown that fixation duration is correlated with the length of the preceding saccade during the execution of fixation tasks. Salthouse et al. [22] hypothesized that the eyes cannot be stopped immediately after the execution of a saccade, and that the faster the preceding eye movement, the longer the time interval required to stop it.
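The RANSAC procedure described above can be sketched without external libraries as follows. Assumptions: minimal samples of two points, an inlier test on the squared residual against the squared MAD threshold, and a least-squares refit on the inliers of each trial, keeping the model with the most inliers. Library implementations such as scikit-learn's `RANSACRegressor` differ in details, so this is an illustration of the technique rather than the exact fit used for Figures 6 and 7.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def ransac_line(points, max_trials=1000, seed=0):
    """RANSAC line fit; the inlier threshold is the MAD of the target (y) values."""
    rng = random.Random(seed)
    ys = [y for _, y in points]
    med = statistics.median(ys)
    threshold = statistics.median([abs(y - med) for y in ys])
    best_model, best_count = None, 0
    for _ in range(max_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                      # degenerate sample: no unique line
        slope, intercept = fit_line([(x1, y1), (x2, y2)])
        inliers = [(x, y) for x, y in points
                   if (y - (slope * x + intercept)) ** 2 <= threshold ** 2]
        if len(inliers) > best_count and len({x for x, _ in inliers}) > 1:
            best_model, best_count = fit_line(inliers), len(inliers)   # refit on inliers
    return best_model if best_model is not None else fit_line(points)
```

Applied to pairs such as $(\log \Delta A_i, \log \Theta_i)$, the returned slope and intercept play the roles of $q$ and $\log p$ in the power-law model.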
4.2 Interaction with saccade length
[Figure 6 panels: slope and intercept of linear model, preceding fixation; slope and intercept of linear model, preceding saccade; horizontal axis: RMS contrast of noise; legend: extended model, human]
Figure 6: The slopes and intercepts of the linear approximation of the log-transformed fixation duration and saccade length time series in the case of synthetic images. The slopes of the linear dependency between fixation duration and preceding saccade length are positive for both simulated and experimental eye-movements in the case of synthetic images, which is consistent with the literature. On the other hand, the slopes of the linear dependency between fixation duration and succeeding saccade length obtained with the RANSAC method are negative, which has not been previously discussed in the literature.
[Figure 7 panels: slope and intercept of linear model, preceding fixation; slope and intercept of linear model, preceding saccade; horizontal axis: difficulty band (1 to 5); legend: extended model, human]
Figure 7: The slopes and intercepts of the linear approximation of the log-transformed fixation duration and saccade length time series in the case of natural images. The slopes of the linear dependency between fixation duration and preceding saccade length are positive for both simulated and experimental eye-movements in the case of natural images, which is consistent with the literature. On the other hand, the slopes of the linear dependency between fixation duration and succeeding saccade length obtained with the RANSAC method are negative, which has not been previously discussed in the literature.
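The pairing of fixations with the saccades before and after them, on which Figures 6 and 7 and the Appendix D scatter plots rely, can be illustrated with a short sketch. As a hypothetical data layout not given in the source, we assume a trajectory is stored as a list of fixation durations Θ_1, ..., Θ_n and a list of saccade lengths ΔA_1, ..., ΔA_{n-1}, where ΔA_i leaves fixation i and lands on fixation i + 1; `log_pairs` is an illustrative name. Each saccade then yields one pair for the fixation that precedes it and one for the fixation that follows it.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # (log saccade length, log fixation duration)


def log_pairs(durations_ms: List[float], amplitudes_deg: List[float]):
    """Split one trajectory into preceding- and succeeding-fixation pairs.

    Hypothetical layout: durations_ms[i] is fixation Θ_i and
    amplitudes_deg[i] is saccade ΔA_i from fixation i to fixation i + 1.
    """
    if len(amplitudes_deg) != len(durations_ms) - 1:
        raise ValueError("expected one saccade between consecutive fixations")
    preceding_fix: List[Pair] = []   # (log ΔA_i, log Θ_i): fixation before saccade i
    succeeding_fix: List[Pair] = []  # (log ΔA_i, log Θ_{i+1}): fixation after saccade i
    for i, amp in enumerate(amplitudes_deg):
        log_amp = math.log(amp)
        preceding_fix.append((log_amp, math.log(durations_ms[i])))
        succeeding_fix.append((log_amp, math.log(durations_ms[i + 1])))
    return preceding_fix, succeeding_fix
```

Keeping saccade length on the first coordinate matches the axes of the scatter plots (saccade length horizontal, fixation duration vertical), so either list can be passed directly to a line-fitting routine.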
5. Conclusion
Our goal was to study the ability of a human observer to control fixation duration during the execution of a visual search task. We conducted eye-tracking experiments with natural and synthetic images and found the following consistent patterns in visual behaviour: a steady increase in average fixation duration with difficulty, and a dependency of fixation duration on the lengths of the preceding and succeeding saccades. In order to explain these effects, we presented an extension of the eye-movement model that introduces continuous-time observation and decision making. The model was simulated with discretized time steps of progressively decreasing duration in order to demonstrate the convergence of trajectory statistics with resolution. We have shown that the basic characteristics of the simulated time series converge as the resolution increases, which means that the decision-making process is invariant to the time step once the resolution reaches 16 ms. This allows us to perform reinforcement learning for different visibility maps corresponding to various difficulty conditions.

We used the model to simulate eye-movement time series and compared them with experimental ones. We found a significant difference between the predicted and experimental values of fixation duration. However, the model correctly predicts the increase in these values with difficulty. We assume that the reason for the mismatch is that our model does not take into account fixational eye-movements (FEMs), namely drift and microsaccades, which favour longer fixation durations. Although adding FEMs to the model is not challenging in itself, it requires a finer computational grid, since the amplitude of FEMs within a single time step is much lower than that of a saccade. To investigate the effect of FEMs on the extraction of visual information during fixation, we may require a grid cell size of 0.01 deg and a time step of … ms [37]. Since our current simulations with a grid size of … deg and a time step of … ms require several hours to learn the optimal policy, it is not feasible to introduce FEMs. The second reason for the mismatch between model simulations and human behaviour is the limitation of the current approach in modelling processes within the central nervous system. Our computational model is formulated in terms of a PO-MDP, which relies on a Bayesian framework [33] for inference of world states. Our approach is agnostic about the temporal aspects of neural processing and therefore neglects the influence of rhythmic activity in the central nervous system on the dynamics of visual attention and eye-movements [32, 38]. We expect, for instance, a bias of the saccade rate towards the theta-wave frequency [39], which cannot be explained by phenomenological control models of visual behaviour.

We studied the dependency between fixation duration and the lengths of the preceding and succeeding saccades. The time series of fixation duration and saccade length were log-transformed because of their skewed distributions. We used the RANSAC method to find the coefficients of the linear model and found that the slope is positive for saccades preceding the fixation. This effect was previously found in a fixation task by Salthouse et al. [22] and has a simple mechanical explanation. On the other hand, we found that the slope is negative for outgoing saccades, which was discussed in
References
[1] S. Van der Stigchel, R. A. Bethlehem, B. P. Klein, T. T. Berendschot, T. Nijboer, S. O. Dumoulin, Macular degeneration affects eye movement behavior during visual search, Frontiers in Psychology 4 (2013) 579.
[2] I. T. C. Hooge, C. J. Erkelens, Control of fixation duration in a simple search task, Perception & Psychophysics 58 (1996) 969–976.
[3] J. D. Gould, Eye movements during visual search and memory search, Journal of Experimental Psychology 98 (1973) 184.
[4] K. Moffitt, Evaluation of the fixation duration in visual search, Attention, Perception, & Psychophysics 27 (1980) 370–372.
[5] I. T. C. Hooge, C. J. Erkelens, Adjustment of fixation duration in visual search, Vision Research 38 (1998) 1295–IN4.
[6] R. Engbert, A. Longtin, R. Kliegl, A dynamical model of saccade generation in reading based on spatially distributed lexical processing, Vision Research 42 (2002) 621–636.
[7] A. Nuthmann, T. J. Smith, R. Engbert, J. M. Henderson, CRISP: a computational model of fixation durations in scene viewing, Psychological Review 117 (2010) 382.
[8] H. A. Trukenbrod, R. Engbert, ICAT: A computational model for the adaptive control of fixation durations, Psychonomic Bulletin & Review 21 (2014) 907–934.
[9] R. Engbert, A. Nuthmann, E. M. Richter, R. Kliegl, SWIFT: a dynamical model of saccade generation during reading, Psychological Review 112 (2005) 777.
[10] J. Najemnik, W. S. Geisler, Optimal eye movement strategies in visual search, Nature 434 (2005) 387–391.
Appendix A. Convergence of basic characteristics
First, we demonstrate how the values of three main characteristics (saccade length, fixation duration and response time) change with time resolution. We consider the case e_n = 0.…, e_t = 0.… and estimate the visibility map according to [10]. We learn the policy with PGPE for several resolution intervals Θ_int = (128, 64, 32, 16, 8) ms in order to demonstrate the convergence of the basic characteristics with increasing resolution. The procedure of learning with PGPE is described in our previous work [12], and the current implementation uses the same hyper-parameter values.

We simulated … episodes of the PO-MDP for each time resolution and the corresponding policy to compute the mean and standard error of the characteristics. Figure A.8 shows that the change of saccade length with resolution is not significant after a resolution of … ms (Student's t-test, p > …). At the same time, fixation duration decreases with resolution until it reaches … ms, after which the change is not significant (p > …). Execution time significantly decreases from … s to … s, and after reaching a resolution of … ms there is no significant change (p > …). We observed convergence of the three basic characteristics for all experimental conditions.
Appendix B. Distributions of fixation duration in experiments on natural images
Appendix C. Coefficients of Box-Cox transformation
Appendix D. Scatter plots of log-transformed data
[Figure A.8 panels: saccade length, deg; fixation duration, ms; execution time, s; each plotted against resolution, ms (128, 64, 32, 16, 8)]
Figure A.8: Convergence of basic characteristics. We simulated … episodes of the PO-MDP for each time resolution and the corresponding policy to compute the mean and standard error of the characteristics. The change of saccade length with resolution is not significant (left). Fixation duration decreases with resolution until it reaches … ms, and then the change is not significant. Execution time significantly decreases from … s to … s, and after reaching a resolution of … ms there is no significant change.
Visual search on synthetic images
Noise contrast | Preceding fixation: λ | Preceding fixation: standard error | Preceding saccade: λ | Preceding saccade: standard error
Table C.4: Estimated coefficients of the Box-Cox transformation and corresponding standard errors for the experimental data in the case of synthetic images.
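Tables C.4 and C.5 report Box-Cox coefficients λ. To illustrate what such a coefficient is, the sketch below estimates λ by maximizing the profile log-likelihood of the Box-Cox transform over a grid, a standard maximum-likelihood approach. The grid range and resolution and the function names are our assumptions, and standard errors are not computed. A λ close to 0 corresponds to the logarithm, the transformation applied to fixation durations and saccade lengths in the main text. In practice, `scipy.stats.boxcox` returns the maximum-likelihood λ when no λ is supplied.

```python
import math
import statistics


def boxcox(y, lam):
    """Box-Cox transform of a positive value y (log transform when lam == 0)."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def boxcox_loglik(data, lam):
    """Profile log-likelihood of lam under a normal model (constants dropped)."""
    z = [boxcox(y, lam) for y in data]
    var = statistics.pvariance(z)  # maximum-likelihood estimate of the variance
    n = len(data)
    # The second term is the log-Jacobian of the transform.
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(y) for y in data)


def boxcox_lambda(data, lo=-2.0, hi=2.0, steps=401):
    """Grid-search maximum-likelihood estimate of the Box-Cox lambda."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda lam: boxcox_loglik(data, lam))
```

For data whose logarithm is symmetric, the estimate lands at λ = 0, that is, at the log transform.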
Visual search on real images
Difficulty band | Preceding fixation: λ | Preceding fixation: standard error | Preceding saccade: λ | Preceding saccade: standard error
Table C.5: Estimated coefficients of the Box-Cox transformation and corresponding standard errors for the experimental data in the case of natural images.
[Figure B.9 panels: frequency versus duration of fixation, ms (0 to 1000), five panels; legend: human, extended model]
Figure B.9: The distribution of fixation duration in the experiment on natural images and the corresponding simulations of the extended model. As in the case of synthetic images, the computational model cannot explain the heavy tail of the experimental distributions, because fixational eye-movements were not taken into account.
[Figure C.10 panels: preceding fixation duration vs saccade lengths, RMS = 0.1, 0.15, 0.2, 0.25; axes: saccade lengths, deg; fixation duration, ms]
Figure C.10: Scatter plots of log-transformed data: preceding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit in the case of experiments on synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains … of the variability in the experimental data for the case of synthetic images and preceding fixation.
[Figure D.11 panels: succeeding fixation duration vs saccade lengths, RMS = 0.1, 0.15, 0.2, 0.25; axes: saccade lengths, deg; fixation duration, ms]
Figure D.11: Scatter plots of log-transformed data: succeeding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit for the preceding saccade in the case of experiments on synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains … of the variability in the experimental data for the case of synthetic images and succeeding fixation.
[Figure D.12 panels: preceding fixation duration vs saccade length, RMS = 0.1, 0.15, 0.2, 0.25; axes: length of succeeding saccade, log scale, deg; duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2]
Figure D.12: Scatter plots of log-transformed data: preceding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit for the preceding fixation in the case of simulations for synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains … of the variability in the simulations for the case of synthetic images and preceding fixation.
[Figure D.13 panels: succeeding fixation duration vs saccade length, RMS = 0.1, 0.15, 0.2, 0.25; axes: length of preceding saccade, log scale, deg; duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2]
Figure D.13: Scatter plots of log-transformed data: succeeding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit in the case of simulations for synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains … of the variability in the simulations for the case of synthetic images and succeeding fixation.
[Figure D.14 panels: preceding fixation duration vs saccade lengths, bands 1 to 5; axes: saccade lengths, deg; fixation duration, ms]
Figure D.14: Scatter plots of log-transformed data: preceding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit in the case of experiments on natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains … of the variability in the experimental data for the case of natural images.
[Figure D.15 panels: succeeding fixation duration vs saccade lengths, bands 1 to 5; axes: saccade lengths, deg; fixation duration, ms; legend: data, robust fit, PC1, PC2]
Figure D.15: Scatter plots of log-transformed data: succeeding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit for the preceding saccade in the case of experiments on natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains … of the variability in the experimental data for the case of natural images and succeeding fixation.
[Figure D.16 panels: preceding fixation duration vs saccade length, 1st to 5th band; axes: length of succeeding saccade, log scale, deg; duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2]
Figure D.16: Scatter plots of log-transformed data: preceding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit for the preceding fixation in the case of simulations for natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains … of the variability in the simulations for the case of natural images and preceding fixation.
[Figure D.17 panels: succeeding fixation duration vs saccade length, 1st to 5th band; axes: length of preceding saccade, log scale, deg; duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2]
Figure D.17: Scatter plots of log-transformed data: succeeding fixation duration (vertical axis) and saccade length (horizontal axis) on a logarithmic scale, principal components and linear approximation by robust fit for the preceding saccade in the case of simulations for natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains 57%