Control of fixation duration during visual search task execution

Abstract

We study the ability of human observers to control fixation duration during the execution of visual search tasks. We conducted eye-tracking experiments with natural and synthetic images and found that fixation duration depends on the difficulty of the task and on the lengths of the preceding and succeeding saccades. To explain this, we developed a novel control model of human eye movements that incorporates continuous-time decision making, observation and update of the belief state. The model is based on a Partially Observable Markov Decision Process with a delay in observation and saccade execution, which accounts for the delay between eye and cortex. We validated the computational model by comparing statistical properties of simulated and experimental eye-movement trajectories.


A.Y.Vasilyev

Queen Mary University of London, Mile End Road, E1 4NS


Keywords: fixation duration, visual search, reinforcement learning

1. Introduction

Eye movements are essential actions of the human oculomotor system that overcome the limited acuity outside the foveal region. Saccades, high-velocity gaze shifts, direct high-acuity foveal vision to the most informative locations of a scene. Saccades are followed by fixations, during which fixational eye movements are generated in order to maintain high-resolution perceptual input. The implementation of these processes is crucial to the natural behavioural objectives of humans and animals. However, relatively little is known about the general scheme of control of fixation duration and location. Humans exhibit high variability in fixation duration, which depends on many factors, including the visual task and the health condition of the observer. For example, patients with Age-related Macular Degeneration demonstrate significantly longer fixation durations than normal controls [1]. Meanwhile, normal controls also demonstrate an increase of average fixation duration with increasing difficulty of the visual search task [2, 3, 4].

Previously, it was suggested that control of fixation duration is involuntary, and that there is a mechanism that estimates the time needed for discrimination of the target [2, 5]. Later it was argued [6] that fixation duration is influenced both by a global control that defines the prior distribution of fixation duration and by a local control that inhibits the next saccade if visual processing is not completed. This principle of mixed control was incorporated into saccade timing models, which successfully explain the statistics of fixation duration during observation of natural scenes [7], visual search [8] and reading [9]. Despite the remarkable achievements of saccade timing models in reproducing eye-movement behaviour, they leave unexplained why the neural implementation of this complex framework is beneficial for human observers in the first place.
These models perfectly capture the dynamics of saccade programming, but they do not reflect the process of task execution itself, because the status of task execution does not influence the model's internal variables. For example, in the ICAT model [8] the visual search is not terminated when the target item is fixated and processed by foveal vision. Because perception is not incorporated into saccade timing models, target objects are processed in the same way as distractors; therefore these models cannot judge whether the observer has located the target. We argue that saccade timing models do not explain the execution of a visual search task, but instead describe the dynamics of visual attention in an array of distractors.

Preprint submitted to Elsevier, April 24, 2020. arXiv: [q-bio.NC], Apr

Meanwhile, control models of human eye movements [10, 11, 12] interpret visual behaviour from an optimality point of view and can incorporate mixed control of fixation duration as a mere consequence of their architecture. The global control of fixation duration is related to the overall task requirements [8] and influences fixation duration throughout the execution of the episode. In control models, prior knowledge of the properties of the stimulus image and target object(s) defines the policy of the observer, which governs its eye-movement behaviour and remains unchanged during task execution. Therefore, control models incorporate the global control through learning and implementation of the policy. The local control, which is related to decision making based solely on processing of visual input during the fixation [6], is implemented in control models as integration of visual input into the belief state and output of the policy (decision) based on the updated belief state. The only component that is absent in control models but required for control of fixation duration is operation at a frequency higher than the frequency of saccadic events.

In order to study the adaptivity of fixation duration, we extended the computational model presented in our previous work [12] by introducing continuous observation and decision making.
In such a model the observer is able to terminate the current fixation at any moment in time. The optimal observer therefore faces a trade-off between the quality of the extracted visual information and the time cost of fixation. We expect that an extended computational model subject to temporal constraints will provide a quantitative explanation of the increase of average fixation duration with decreasing detectability of the target object in our psychophysical experiment. Furthermore, the noisy observation model [13] reformulated in continuous time is identical to evidence accumulation models such as LATEST [14], and we expect this computational model to produce a similar distribution of fixation duration.

Modelling decision making in continuous time allows us to take into account the influence of saccade latency. It is well known that saccade programming is executed in two stages: a labile and a non-labile stage [15, 9]. During the first stage the observer can cancel the execution of a saccade made according to the initial decision and initiate a saccade in a new direction. Continuous perception and decision making can describe this process and take into account the influence of the duration of the non-labile stage.

Reformulating the computational model of eye movements in continuous time will, in the future, allow us to add fixational eye movements during fixation intervals and to study their influence on the decision making of the observer. Previous control models of eye movements [13, 16] did not take into account the presence of drift and microsaccades during fixation intervals. A microsaccade occurring shortly before saccade onset results in an increase of saccade reaction time [17] in a response task, which indicates the influence of fixational eye movements on saccade programming. Furthermore, the primary role of saccades in counteracting perceptual fading [18] indicates their importance in the process of extracting visual information during the fixation.

2. Extension of computational model

We developed a novel computational model of human eye movements that incorporates continuous-time decision making, observation and update of the belief state. We motivate this extension by experimental evidence of longer fixation duration in lower visibility conditions [1].

According to the literature, saccade programming is assumed to be a two-stage process that consists of labile and non-labile stages [15, 9]. The labile stage is the first stage of saccade programming, during which the initial saccade command can be cancelled in favour of a saccade to another location. The saccade to the next location is executed after the non-labile stage. Visual input is active during both the labile and non-labile stages and is suppressed during the execution of a saccade. Therefore, the visual input received during the non-labile stage of saccade programming can be used for decision making only during the next fixation.

We model this effect as a time lag $\tau$ in the interaction between the observer and the environment, with a duration equal to the length of the non-labile stage. In our model the interaction between observer and environment is modelled as an exchange of "messages" with a certain frequency $\Omega = 1/\Theta_{int}$, where $\Theta_{int}$ is called the resolution interval. The resolution interval introduces the discretization required by the numerical solver and has no biological meaning. At time step $n$ the observer receives the observation $S_n$ and the vector of the actual fixation location $x_n$, which are used for inference of the belief state $b_n$. In our model the belief state is a probability distribution over potential target locations, and the observation is a stochastic noise signal at each location. The belief state is used for decision making, that is, estimation of the next intended location to fixate: $x_{n+1} = \pi(b_n)$. However, this decision is not executed immediately, but with a delay defined by the resolution interval and the time lag in interaction: $lag = \lceil \tau / \Theta_{int} \rceil$, where $\lceil \cdot \rceil$ is the ceiling function. Therefore, the environment receives the delayed message of decision $x_{n-lag+1}$ from the observer, which is used for execution of the saccade: $x = \alpha(x)$. At this stage the world state is updated, and the gaze is transferred to the new location. The environment emits the observation $S_{n+lag}$, which will be available to the observer only after a number of steps defined by $lag$. If at that moment the environment is still executing the previous saccade command $x_{n-lag}$, the new saccade signal is neglected, and the environment emits a null observation. The time lag in interaction $\tau$ is chosen as ms, which is an experimental value for the delay between eye and V1 [19, 20].

Figure 1: At time step $n$ the observer receives the observation $S_n$ and the vector of the actual fixation location $x_n$, which are used for inference of the belief state $b_n$. This belief state is used for decision making, that is, estimation of the next intended location to fixate: $x_{n+1} = \pi(b_n)$. This decision is not executed immediately, but with a delay defined as $lag = \lceil \tau / \Theta_{int} \rceil$. On the same time step $n$, the environment receives the delayed message of decision $x_{n-lag+1}$ from the observer, which is used for execution of the saccade: $x = \alpha(x)$. At this stage the world state is updated, and the gaze is transferred to the new location. The environment emits the observation $S_{n+lag}$, which will be available to the observer only after a number of steps defined by $lag$. If at that moment the environment is still executing the previous saccade command $x_{n-lag}$, the new saccade signal is neglected, and the environment emits a null observation.

2.2 Observation and inference
We extend the previous computational model [12] by introducing a continuous-time Gaussian white noise signal $\dot{S}(t, x_t, u_\ell)$ at each location $u_\ell$, for each current fixation location $x_t$ at time $t$ of the world state, which satisfies the following conditions:

1. $E[\dot{S}(t, x_t, u_\ell)] = \delta(|u_\ell - u_\star|) / \Theta$
2. $E[\dot{S}(t_1, x_t, u_{\ell_1}), \dot{S}(t_2, x_t, u_{\ell_2})] = \delta_{t_1,t_2}\,\delta_{\ell_1,\ell_2} / (\Theta\, V^2(u_\ell - x_t))$

where $u_\star$ is the location of the target object, $V$ is the visibility map, and $\Theta = 250$ ms is the average fixation duration in the double fixation task [10]. The observation at time step $n$ is this signal integrated over a time interval of finite duration $\Delta t$:

$$S_n = S(t_n, x_{t_n}, u_\ell) = \int_{t_n}^{t_n + \Delta t} \dot{S}(t, x_t, u_\ell)\, dt$$

According to the conditions above, these signals are Gaussian random variables with the following means and standard deviations:

$$E[S_n] = \Delta t\, \delta(|u_\ell - u_\star|) / \Theta \tag{1}$$

$$\sigma[S_n] = \frac{\sqrt{\Delta t}}{\sqrt{\Theta}\, V(u_\ell - x_n)} \tag{2}$$

After receiving the observation signal $S_n$, Bayesian inference is used to update the belief state:

$$b_{n,\ell} \propto p(S_n \mid x_{t_n})\, b_{n-1,\ell} \tag{3}$$

Our next assumption is that the observer is effectively blind during the execution of saccades; therefore, the signal from time intervals that contain saccades should be noisier or be neglected. For each time interval we estimate the total time $\Delta t_s$ during which the gaze is fixed. Then we use equations 1 and 2 to generate the observation and equation 3 to estimate the belief state after receiving the observation from this time interval. If the saccade's duration is longer than the whole time step, the environment does not yield an observation.

2.3 Decision making and action execution

After the inference of the belief state $b_n$, the observer decides where to fixate next according to the policy:

$$x_{n+1} \leftarrow \pi(b_n) \tag{4}$$

The command to execute the next saccade is received by the environment with a delay defined by $lag = \lceil \tau / \Theta_{int} \rceil$.
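The observation and inference steps of equations 1 to 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-dimensional grid, the `visibility` function and its parameters, and the assumption that observations at different locations are independent are all choices made here for illustration. Under that independence assumption the likelihood in equation 3 reduces, up to a factor common to all locations, to a Gaussian likelihood ratio at each location.

```python
import math
import random

THETA = 250.0  # ms, mean fixation duration in the double fixation task [10]


def visibility(eccentricity, v0=3.0, half_width=4.0):
    """Hypothetical visibility map V: d-prime decays with eccentricity (deg)."""
    return v0 / (1.0 + eccentricity / half_width)


def sample_observation(locations, fixation, target, dt, rng):
    """Draw S_n at every location with the mean and sd of equations 1 and 2."""
    observation = []
    for u in locations:
        mean = dt / THETA if u == target else 0.0
        sd = math.sqrt(dt) / (math.sqrt(THETA) * visibility(abs(u - fixation)))
        observation.append(rng.gauss(mean, sd))
    return observation


def update_belief(belief, observation, locations, fixation, dt):
    """Equation 3: multiply the prior by the likelihood and renormalise."""
    signal = dt / THETA
    log_posterior = []
    for b, s, u in zip(belief, observation, locations):
        variance = dt / (THETA * visibility(abs(u - fixation)) ** 2)
        # Log-likelihood ratio of "target at u" against "no target at u";
        # factors shared by all hypotheses cancel in the normalisation.
        llr = (s * signal - 0.5 * signal ** 2) / variance
        log_posterior.append(math.log(max(b, 1e-300)) + llr)
    top = max(log_posterior)
    weights = [math.exp(v - top) for v in log_posterior]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
locations = list(range(-7, 8))            # 1-D grid of locations in deg (assumption)
belief = [1.0 / len(locations)] * len(locations)
for _ in range(50):                       # 50 steps of 16 ms with the gaze held at 0
    s_n = sample_observation(locations, fixation=0, target=2, dt=16.0, rng=rng)
    belief = update_belief(belief, s_n, locations, fixation=0, dt=16.0)
print(max(zip(belief, locations)))        # most probable location and its belief
```

Working in log space keeps the update numerically stable when many short time steps are accumulated within one fixation.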
If the environment is not executing the previous saccade at the moment the command arrives, the next saccade (or fixation) is executed as:

$$x_{n+1} = \begin{cases} x_{n+1}, & \text{if } x_{n+1} = x_n \\ \alpha(x_{n+1}), & \text{otherwise} \end{cases} \tag{5}$$

Here $\alpha(D)$ is the execution function:

$$x_{n+1} = \alpha(x_{n+1}) = x_{n+1} + \zeta_n \tag{6}$$

In equations 5 and 6 the left-hand side is the executed fixation location, and $x_{n+1}$ on the right-hand side is the intended location from equation 4. $\zeta_n$ is a Gaussian-distributed random error with zero mean and standard deviation $\nu$ defined in [9]:

$$\nu = \zeta_0 + \zeta_1 \|x_{n+1} - x_n\| \tag{7}$$

The error of saccade execution increases linearly with the intended saccade amplitude $\|x_{n+1} - x_n\|$ given in degrees; the parameter values are $\zeta_0 = 0.87$ deg, $\zeta_1 = 0.$ (from [9]). The time $t$ of the world state is updated in the following way:

$$t_{n+1} = t_n + \Theta_{int} \tag{8}$$

The position of fixation is updated after the saccade command is executed. The time required to execute the saccade command $x_n$ is defined as:

$$\Theta_{sc}(n) = \Theta_{mv}(n) + \Theta_{st}(n) \tag{9}$$

where $\Theta_{mv}(n)$ is the time required to execute the movement:

$$\Theta_{mv}(n) = \tau_{mv} \|x_n - x_{n-1}\|^{.} \tag{10}$$

and $\Theta_{st}(n)$ is the time required to mechanically stop the eye movement:

$$\Theta_{st}(n) = \tau_{st} \|x_n - x_{n-1}\| \tag{11}$$

The constants $\tau_{mv} = 21$ ms·deg$^{-.}$ and $\tau_{st} = 6$ ms/deg come from eye-tracking data from the experiments of [21, 22, 23] and [24, 25], respectively. If $\Theta_{st}(n) + \Theta_{mv}(n) > m\,\Theta_{int}$, where $m$ is an integer, the next $m$ saccade commands are cancelled. Furthermore, an eye movement is triggered only if the decision where to fixate next does not match the current location, $x_{n+1} \neq x_n$, according to equation 5. This means that saccades do not start at every time step, and the $k$-th saccade starts at time step $n_k \geq k$. Therefore the length of the $k$-th saccade equals:

$$A_k = \|x_{n_k} - x_{n_{k-1}}\| \tag{12}$$

and the duration of the corresponding $k$-th fixation equals:

$$\Theta_k = t_{n_{k+1}} - t_{n_k} - \Theta_{sc}(n_k) \tag{13}$$

which means that the $k$-th fixation starts after the $k$-th saccade is completely finished. Figure 2 explains the notation used in this section.

Figure 2: The sequence of saccades with endpoints $(x_{n_{k-1}}, x_{n_k}, x_{n_{k+1}})$ and the corresponding fixation durations $(\Theta_k, \Theta_{k+1})$.

Given the initial world state $S$, we define the value function for policy $\mu$ as the expectation of the return $R$:

$$V^{\mu}(S) = E[R \mid \mu, S] \tag{14}$$

The random variable $R$ denotes the return and is defined by:

$$R \equiv -\sum_{n=0}^{N} \Theta_{int} = -N\,\Theta_{int} \tag{15}$$

where $N$ is the total number of steps in the episode.
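The timing rules of this section (the command delay, the execution error of equations 5 to 7 and the saccade duration of equations 9 to 11) can be sketched in a few lines of Python. This is an illustrative sketch in one dimension, not the authors' code. The slope `ZETA_1` and the exponent `EXPONENT` of the movement time are placeholder values chosen here for illustration, and `cancelled_commands` reads the cancellation rule as "the largest integer $m$ with $m\,\Theta_{int} < \Theta_{sc}$", which is an interpretation of the text.

```python
import math
import random

ZETA_0 = 0.87    # deg, constant term of the execution error (equation 7)
ZETA_1 = 0.1     # slope of the execution error: placeholder value (assumption)
TAU_MV = 21.0    # ms, movement-time coefficient (equation 10)
TAU_ST = 6.0     # ms/deg, stopping-time coefficient (equation 11)
EXPONENT = 0.4   # exponent of the movement time: placeholder value (assumption)


def lag_steps(tau, theta_int):
    """Delay of a message in time steps: lag = ceil(tau / Theta_int)."""
    return math.ceil(tau / theta_int)


def execute(intended, current, rng):
    """Equations 5 to 7: hold the gaze, or land near the intended location."""
    if intended == current:
        return current
    nu = ZETA_0 + ZETA_1 * abs(intended - current)
    return intended + rng.gauss(0.0, nu)


def saccade_time(amplitude):
    """Equations 9 to 11: movement time plus time to stop the eye, in ms."""
    return TAU_MV * amplitude ** EXPONENT + TAU_ST * amplitude


def cancelled_commands(amplitude, theta_int):
    """Largest integer m such that the saccade lasts longer than m steps."""
    return max(0, math.ceil(saccade_time(amplitude) / theta_int) - 1)


rng = random.Random(0)
landing = execute(intended=5.0, current=0.0, rng=rng)
print(lag_steps(60.0, 16.0), landing, saccade_time(5.0), cancelled_commands(5.0, 16.0))
```

With a short resolution interval a single saccade spans several time steps, so most commands issued while the eye is in flight are discarded; this is why the number of saccades is smaller than the number of steps.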
We start with the representation of the stochastic policy $\mu$ [11]:

$$\mu(D, p) = \frac{\exp(f(D, p))}{\sum_{l} \exp(f(l, p))} \tag{16}$$

where $f(D, p)$ is the decision preference function, which indicates the tendency to choose decision $D$ in belief state $p$. In this study we limit the search for $f(D, p)$ to convolutions of the belief state $p$, owing to the shift-invariance of the visual search process [11]:

$$f(D, p) = \sum_{l} K(D - l)\, p(l) \tag{17}$$

Our task is to find the kernel function $K$ that corresponds to the policy maximizing the value function $V^{\mu}$:

$$K^{*} = \arg\max_{K} V^{\mu(K)}(S) \tag{18}$$

for any starting world state $S$. The policy $\mu(K^{*})$ is called the optimal policy of gaze allocation. We approach problem 18 with the Policy Gradient Parameter Exploration algorithm [26]. Table 1 shows the values of all parameters used in simulations of the computational model.
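Equations 16 and 17 can be made concrete with a small Python sketch. It assumes a one-dimensional grid of locations and a kernel stored as a dictionary from offsets to weights; both are simplifications introduced here, and the kernel values are arbitrary rather than learned.

```python
import math
import random


def preference(kernel, belief):
    """Equation 17: f(D, p) = sum_l K(D - l) p(l) on a 1-D grid."""
    n = len(belief)
    return [sum(kernel.get(d - l, 0.0) * belief[l] for l in range(n))
            for d in range(n)]


def policy(kernel, belief):
    """Equation 16: softmax of the decision preferences."""
    f = preference(kernel, belief)
    top = max(f)                       # subtract the maximum for stability
    weights = [math.exp(v - top) for v in f]
    total = sum(weights)
    return [w / total for w in weights]


def sample_decision(kernel, belief, rng):
    """Draw the next intended fixation location from the stochastic policy."""
    probabilities = policy(kernel, belief)
    return rng.choices(range(len(belief)), weights=probabilities, k=1)[0]


kernel = {-1: 1.0, 0: 5.0, 1: 1.0}     # arbitrary illustrative kernel (assumption)
belief = [0.1, 0.8, 0.1]
print(policy(kernel, belief))
print(sample_decision(kernel, belief, random.Random(0)))
```

Because the preference is a convolution, the same kernel weights apply wherever the belief mass sits, so the search over policies reduces to a search over the kernel values.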

3. Psychophysical experiments

In this section we describe the eye-tracking experiments of visual search with natural and synthetic images.

A group of five participants with normal or corrected-to-normal vision participated in the experiment. All participants were postgraduate students of Queen Mary University of London. The participants were aware of the experimental settings and completed 10 minutes of training sessions with four different experimental conditions, each corresponding to a certain value of the RMS contrast of the background noise. The experiments were approved by the ethics committee of Queen Mary University of London, and informed consent was obtained.

We used a DELL P2210 22" LCD monitor (resolution × , refresh rate 60 Hz) driven by a Dell Precision laptop for all experiments. The eye movements of the right eye were registered using the SMI-500 eye tracker with a sampling frequency of Hz. The eye tracker was mounted on the monitor. Matlab Psychtoolbox [28] was used to run the experiments and generate the stimulus images. Saccades and fixations were detected automatically by software provided by SMI with the dispersion-based algorithm I-DT [29]. The dispersion-based algorithm associates gaze samples with the same fixation if the samples are located within a region of size . for the minimal fixation duration of ms.

Meaning | Symbol | Value | Source
RMS contrast of background noise | $e_n$ | (0.1, 0.15, 0.2, 0.25) | Set
RMS contrast of target | $e_t$ | |
saccade execution error: constant term | $\zeta_0$ | 0.87 deg | [9]
saccade execution error: slope | $\zeta_1$ | | [9]
average fixation duration in double fixation task | $\Theta$ | 250 ms | [10]
saccade duration coefficient | $\tau_{sac}$ | ms·deg$^{-.}$ | [27]
time to stop the saccade: slope | $\Theta_{,fix}$ | ms/deg | [24]
resolution interval | $\Theta_{int}$ | (128, , , , ) ms | Set
size of stimulus image | $S$ | 15 × 15 deg | Set
dimensionality of computational grid | $N \times N$ | × | Set

Table 1: The parameters of the computational model used in our simulations, including those that were set experimentally.
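The dispersion-based detection described above can be sketched as follows. This is the textbook form of I-DT, in which dispersion is the sum of the horizontal and vertical extents of the window; it is not the SMI implementation, whose details may differ, and the threshold arguments are left as parameters to be chosen by the user.

```python
def dispersion(window):
    """Dispersion of a window of (x, y) samples: x extent plus y extent."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion, min_samples):
    """Return fixations as (start, end) sample indices, end exclusive.

    samples        list of (x, y) gaze positions at a fixed sampling rate
    max_dispersion dispersion threshold, same units as the samples
    min_samples    minimal fixation duration expressed in samples
    """
    fixations = []
    start = 0
    n = len(samples)
    while start + min_samples <= n:
        end = start + min_samples
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the samples stay within the threshold.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            fixations.append((start, end))
            start = end
        else:
            start += 1                 # slide the window by one sample
    return fixations


gaze = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5
print(detect_fixations(gaze, max_dispersion=1.0, min_samples=3))
```

Samples that fall between two detected fixations are then attributed to saccades, which gives the saccade lengths and fixation durations analysed later.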

We used the database of 1204 natural images from [30] and the procedure [31] for estimating d-prime, the value of the visibility map in the center of the visual field, for a 4 cycles per degree windowed sine-wave grating with a width of . , which was the target object in our visual search experiments. We estimated the mean of the d-primes of randomly placed 4 cpd sine-wave gratings for each image and sorted all 415 images according to the mean value. Next, the sorted array of images was divided into 5 equal bands of d-prime. In this way we sorted our images by the level of detectability of the target object. We then computed the average d-prime for each band. For each image of a band we selected the target location whose estimated d-prime is closest to the band average, and generated the image with this target location. Thus we formed 5 bands of difficulty with images that have close values of d-prime. The values of average d-prime for the 5 bands are: 8.53, 3.98, 2.89, 2.24, 1.57. Figure 3 shows stimulus images for the first and the fifth bands of difficulty.

The second type of stimuli were 1/f noise images described in the original experiment [10]. The 1/f noise was generated on a square region of the screen, which spans a visual angle of 15 × 15 deg. The target was a sine grating framed by a symmetric raised cosine. The target appeared randomly at any possible location on the stimulus image within the square region. The experiments were conducted for one level of RMS contrast of the target, $e_t = 0.$, and several levels of 1/f noise RMS contrast, $e_n \in (0.1, 0.15, 0.2, 0.25)$.

3.4 Procedure

Five participants with normal or corrected-to-normal vision performed the visual search task on the generated images. We used the SMI eye tracker to record the eye movements of the participants. The visual search experiment consisted of five blocks corresponding to the five bands, with 2 minutes of rest between blocks. Each search trial started with the display of a fixation cross in the center of the screen for 1 second. The participants were instructed to find the target object in the image as soon as possible after the stimulus image was displayed on the screen. The participants were asked to press the "Next" button once they had stopped their eyes on the target location. If a participant stopped their eyes further than 2 degrees from the target location, we considered the trial unsuccessful and excluded it from further analysis. The number of unsuccessful trials grows with the difficulty of the band: 3.1%, 5.3%, 8.3%, 11.2% and 17.1% for the corresponding five bands of d-prime.

The psychophysical experiments of visual search [12] were repeated for constant target contrast $\epsilon_t = 0.$ and noise RMS contrast $\epsilon_n = 0.1, 0.15, 0.2, 0.25$. Five normal controls performed 240 trials of the visual search task.

(a) Stimulus images corresponding to the first band of d-prime. (b) Stimulus images corresponding to the fifth band of d-prime.

Figure 3: We used the database of natural images from [30] to create the stimulus images for the visual search experiment. The target object was a 4 cycles per degree windowed sine-wave grating. We estimated the mean of the d-primes of randomly placed 4 cpd sine-wave gratings for each image and sorted all 415 images according to the mean value. Next, the sorted array of images was divided into 5 equal bands of d-prime. The target object on stimulus images from the first band (top) is the least difficult to detect, whereas the fifth band consists of the most difficult stimulus images in the experiment.
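The construction of the difficulty bands for natural images can be sketched as below. The data layout (a dictionary from image identifier to candidate target locations with their estimated d-primes) is hypothetical, and two details are assumptions made for this sketch: the band average is taken over the per-image means, and images that do not fit into equal bands are left out.

```python
def make_bands(image_dprimes, n_bands=5):
    """Sort images by mean d-prime and pick one target location per image.

    image_dprimes maps an image id to a list of (location, d_prime) pairs,
    one pair per randomly placed candidate target.
    """
    mean_dprime = {image: sum(d for _, d in candidates) / len(candidates)
                   for image, candidates in image_dprimes.items()}
    # Band 1 holds the most detectable targets, so sort in descending order.
    ordered = sorted(mean_dprime, key=mean_dprime.get, reverse=True)
    size = len(ordered) // n_bands
    bands = []
    for b in range(n_bands):
        members = ordered[b * size:(b + 1) * size]
        band_mean = sum(mean_dprime[image] for image in members) / len(members)
        targets = {}
        for image in members:
            # Keep the candidate whose d-prime is closest to the band average.
            location, _ = min(image_dprimes[image],
                              key=lambda c: abs(c[1] - band_mean))
            targets[image] = location
        bands.append({"mean_dprime": band_mean, "targets": targets})
    return bands


example = {
    "a": [((0, 0), 8.0), ((1, 1), 6.0)],
    "b": [((0, 0), 5.0), ((2, 2), 7.0)],
    "c": [((0, 0), 2.0), ((1, 0), 1.0)],
    "d": [((3, 3), 1.0), ((4, 4), 3.0)],
}
for band in make_bands(example, n_bands=2):
    print(band)
```

Choosing the location closest to the band average narrows the spread of detectability within a band, so that fixation statistics can be compared between bands rather than between individual images.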

4. Statistics of fixation duration

In this section we discuss the statistics of fixation duration in human eye movements and in trajectories simulated with the learned policy. Both the learning of the policy and the simulations were conducted with the parameters presented in Table 1 and a resolution interval of ms. Figure 4 shows the change of fixation duration with difficulty for the human observers and the simulated agent in the case of synthetic images (left) and natural images (right). For the experimental values, we found a significant difference between the means of fixation duration for different values of the RMS contrast of background noise in the case of synthetic images. Using the Bonferroni method for multiple comparisons, we found that the mean fixation duration differs across all difficulty levels ($p < .$) in the psychophysical experiments for both types of stimulus.

We found a significant difference between the values of fixation duration predicted by the model and those measured in experiments. The computational model cannot explain the heavy tail in the experimental distribution for any level of difficulty (see Figures 5 and B.9). We assume that the reason for this mismatch is that our model does not take into account the fixational eye movements (FEMs): drift and microsaccades. FEMs change the position of gaze during the fixation, which allows additional visual information to be extracted without initiating a saccade. This favors longer fixations and a lower saccade frequency. However, the model correctly predicts the change of fixation duration with difficulty. If we assume that the model cannot explain 35 ms of the difference between simulated and experimental fixation duration and subtract this difference from the experimental values, they are not significantly different from the simulated values. Another reason for the mismatch between model simulations and human behaviour is the assumed preference of saccade initiation towards a certain mean rate [7, 6, 9], which originates from rhythmic activity in the central nervous system [32].
Because our approach is phenomenological, it uses the Bayesian framework [33] to model the processing within the central nervous system, neglecting its temporal aspects. A more detailed theoretical model of the neural system is required to account for the preference in saccade initiation rate.

Visual search on synthetic images
Case (noise contrast) | Preceding fixation: R | Preceding fixation: p-value | Preceding saccade: R | Preceding saccade: p-value

Table 2: Pearson correlation and p-values for the visual search task on synthetic images. The time series of durations of fixations (succeeding) preceding the saccades are (positively) negatively correlated with the time series of lengths of (incoming) outgoing saccades.

4.2 Interaction with saccade length

Another piece of evidence of control of fixation duration during the visual search process is the correlation between the time series of lengths of both preceding and succeeding saccades and fixation duration. We estimated the Pearson correlation coefficients and p-values for both psychophysical experiments, with synthetic and natural images, and present them in Tables 2 and 3, respectively. The time series of durations of fixations preceding the saccades are negatively correlated with the time series of lengths of outgoing saccades, and the durations of succeeding fixations are positively correlated with the lengths of incoming saccades. In the case of synthetic images the p-value exceeds the significance level . only for the RMS contrast value of . , and in the case of natural images the p-value exceeds the significance level for the first difficulty band. In the lower difficulty cases human observers execute the visual search task faster [13], which results in a smaller number of eye movements. The lower number of sample points did not allow us to confirm the hypothesis at the significance level 0.05.

[Figure 4 panels: Duration, ms; x-axis of the right panel: Difficulty band (1 to 5); curves: extended model, human]

Figure 4: The fixation duration in simulated eye-movement trajectories and experimental data. The computational model significantly underestimates the values of fixation duration in humans for both synthetic images (left) and natural images (right).

[Figure 5 panels: Frequency histograms for RMS of noise = 0.1, 0.15, 0.2 and 0.25; curves: human, extended model]

Figure 5: The distributions of fixation duration in the experiment on synthetic images and in simulations of the extended model. The computational model cannot explain the heavy tail in the experimental distribution for any level of difficulty, and we assume that the reason for this mismatch is that our model does not take into account the fixational eye movements (FEMs), which favour longer fixation duration.

Difficulty band | R | p-value | R | p-value

Table 3: Pearson correlation and p-values for the visual search task on natural images. The time series of durations of fixations (succeeding) preceding the saccades are (positively) negatively correlated with the time series of lengths of (incoming) outgoing saccades.

Our next goal is to compare the dependency of fixation duration on the lengths of preceding and succeeding saccades between simulations and experiment. Before the regression analysis, we find an optimal transformation of the data with the multivariate Box Cox transformation [34, 35]. The Box Cox transformation of a variable $x$ is defined as:

$$x(\lambda) = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log(x), & \text{if } \lambda = 0 \end{cases} \tag{19}$$

The aim of the multivariate Box Cox transformation is to find a parameter $\lambda$ such that the transformed observations $x^{\lambda}$ satisfy the full normality assumption, i.e. they are independent and normally distributed. We apply the Box Cox transformation to the experimental observations $(\Theta^{\lambda_1}, A^{\lambda_2})$ to find parameters $\lambda_1$ and $\lambda_2$ such that the observations satisfy the multivariate assumption $(\Theta^{\lambda_1}, A^{\lambda_2}) \sim N(\mu, \Sigma)$, where $\mu$ and $\Sigma$ are the mean and covariance matrix, respectively. We present the result of the analysis in Tables C.4 and C.5. We apply the log-log transformation of the data for both variables because the inferred parameters are close to zero.

Next, we assume a power-law relationship between saccade length and fixation duration, $\Theta = pA^{q}$, owing to the previously reported non-linear relationship between the main saccadic characteristics [27]. In the log-transformed data the coefficients $p$ and $q$ correspond to the intercept and slope of the linear relationship $\log \Theta = q \log A + \log p$. We log-transformed our data and formed the vectors $\log(\Theta_i), \log(\Delta A_i)$ of fixation durations and outgoing saccade lengths and $\log(\Theta_{i+1}), \log(\Delta A_i)$ of fixation durations and incoming saccade lengths. We computed the coefficients of a robust linear fit using the RANSAC method [36] for the log-transformed vectors, together with the principal components.
The RANSAC method was used because of its robustness to outliers. The least-squares distance was chosen as the measure of error. The maximal residual for a data point to be classified as an inlier was chosen as the median absolute deviation of the target values. On each iteration of RANSAC the linear model was fitted to the data points identified as inliers. The iterations of RANSAC stop when the number of iterations reaches the maximum ($N = 1000$). Figures C.10 and D.11 show scatter plots for the vectors $\log(\Theta_i), \log(\Delta A_i)$ ($\log(\Theta_{i+1}), \log(\Delta A_i)$), which correspond to the relation between preceding (succeeding) fixation duration and succeeding (preceding) saccade length in the case of synthetic images. Figures D.12 and D.13 demonstrate the same relation for saccade lengths and fixation duration in simulated data. In both experimental and simulated data the regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains of the variability in experimental data and of the variability on average in simulated data (see Figures D.14 and D.15 for the case of natural images).

The red (black) lines in Figures 6 and 7 show the slope and intercept of the linear dependency between log-transformed saccade length and log-transformed fixation duration as a function of the RMS contrast of noise for simulated (experimental) eye movements. The slopes of the linear dependency between fixation duration and preceding saccades are positive for both simulated and experimental eye movements, for both natural and synthetic images, which is consistent with the literature. Previously it was shown that fixation duration is correlated with preceding saccade length during the execution of fixation tasks. Salthouse et al. [22] hypothesized that the eyes cannot be stopped immediately after the execution of a saccade, and that the faster the preceding eye movement, the longer the time interval required to stop it.
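The robust power-law fit described above can be sketched with the standard library alone. The sketch is a simplified RANSAC written for illustration and not the implementation used in the study: it samples two points per iteration, uses the median absolute deviation of the log-durations as the inlier threshold, and refits by least squares on the largest inlier set. The function names and the example data are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def ransac_power_law(lengths, durations, n_iter=1000, seed=0):
    """Robust fit of Theta = p * A**q in log-log space; returns (p, q)."""
    xs = [math.log(a) for a in lengths]
    ys = [math.log(t) for t in durations]
    median_y = statistics.median(ys)
    # Inlier threshold: median absolute deviation of the target values.
    threshold = statistics.median(abs(y - median_y) for y in ys)
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        i, j = rng.sample(range(len(xs)), 2)
        if xs[i] == xs[j]:
            continue                   # a vertical line cannot be fitted
        slope, intercept = fit_line([xs[i], xs[j]], [ys[i], ys[j]])
        inliers = [k for k in range(len(xs))
                   if abs(ys[k] - (slope * xs[k] + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    q, log_p = fit_line([xs[k] for k in best_inliers],
                        [ys[k] for k in best_inliers])
    return math.exp(log_p), q


saccade_lengths = [1.0, 2.0, 4.0, 8.0, 16.0, 3.0]          # deg, made-up data
fixation_durations = [200.0 * a ** 0.1 for a in saccade_lengths[:5]] + [2000.0]
print(ransac_power_law(saccade_lengths, fixation_durations))
```

The single outlying fixation in the example is excluded from the final fit, which is the behaviour that motivates using RANSAC on heavy-tailed fixation durations.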
[Figure 6 panels: slope and intercept of the linear model for the preceding fixation and for the preceding saccade, plotted against RMS contrast of noise (0.1, 0.15, 0.2, 0.25); curves: extended model, human]

Figure 6: The slopes and intercepts of the linear approximation of the log-transformed time series of fixation duration and saccade length in the case of synthetic images. The slopes of the linear dependency between fixation duration and preceding saccades are positive for both simulated and experimental eye movements in the case of synthetic images, which is consistent with the literature. On the other hand, the values of the slope of the linear dependency between fixation duration and succeeding saccade length obtained with the RANSAC method are negative, which was not previously discussed in the literature.

[Figure 7 panels: slope and intercept of the linear model for the preceding fixation and for the preceding saccade, plotted against difficulty band (1 to 5); curves: extended model, human]

Figure 7: The slopes and intercepts of the linear approximation of the log-transformed time series of fixation duration and saccade length in the case of natural images. The slopes of the linear dependency between fixation duration and preceding saccades are positive for both simulated and experimental eye movements in the case of natural images, which is consistent with the literature. On the other hand, the values of the slope of the linear dependency between fixation duration and succeeding saccade length obtained with the RANSAC method are negative, which was not previously discussed in the literature.

5. Conclusion

Our goal was to study the ability of human observers to control fixation duration during the execution of a visual search task. We conducted eye-tracking experiments with natural and synthetic images and found the following consistent patterns in visual behaviour: a steady increase in average fixation duration with difficulty, and a dependency of fixation duration on the lengths of the preceding and succeeding saccades. In order to explain these effects, we presented an extension of the eye-movement model that introduces continuous-time observation and decision making. The simulations of the model were made with discretized time steps of progressively decreasing duration in order to demonstrate the convergence of the statistics of trajectories with resolution. We have shown that the basic characteristics of the simulated time series converge as the resolution increases, which means that the decision-making process is invariant to the time step after reaching a resolution of 16 ms. This allows us to perform reinforcement learning for different visibility maps corresponding to various difficulty conditions.

We used the model to simulate eye-movement time series and to compare them with experimental ones. We found a significant difference between predicted and experimental values of fixation duration. However, the model is capable of correctly predicting the increase in values with difficulty. We assume that the reason for the mismatch is that our model doesn't take into account fixational eye-movements (FEMs), namely drift and microsaccades, which favour longer fixation duration. Although adding FEMs to the model is not challenging in itself, it requires a finer computational grid, since the amplitude of FEMs is much lower (within the single time step) than that of a saccade. To investigate the effect of FEMs on the extraction of visual information during fixation, we may require a grid cell size of .01 deg and a time step of [...] ms [37]. Since our current simulations with a grid size of [...] deg and a time step of [...] ms require several hours to learn the optimal policy, it is not feasible to add FEMs. The second reason for the mismatch between model simulations and human behaviour is the limitation of the current approach in modelling processes within the central nervous system. Our computational model is formulated in terms of a PO-MDP, which relies on the Bayesian framework [33] for inference of world states. Our approach is agnostic about temporal aspects of neural processing and therefore neglects the influence of rhythmic activity in the central nervous system on the dynamics of visual attention and eye-movements [32, 38]. We expect the scenario of a bias of saccade rate towards theta-wave frequency [39], which can't be explained by phenomenological control models of visual behaviour.

We studied the dependency between fixation duration and the lengths of the preceding and succeeding saccades. The time series of fixation duration and saccade length were log-transformed due to their skewed distributions. We used the RANSAC method to find the coefficients of the linear model, and found that the slope of the linear model is positive for saccades preceding the fixation. This effect was previously found in a fixation task by Salthouse et al. [22], and has a simple mechanical explanation. On the other hand, we found that the slope is negative for the case of outgoing saccades, which was discussed in

References

[1] S. Van der Stigchel, R. A. Bethlehem, B. P. Klein, T. T. Berendschot, T. Nijboer, S. O. Dumoulin, Macular degeneration affects eye movement behavior during visual search, Frontiers in psychology 4 (2013) 579.
[2] I. T. C. Hooge, C. J. Erkelens, Control of fixation duration in a simple search task, Perception & Psychophysics 58 (1996) 969–976.
[3] J. D. Gould, Eye movements during visual search and memory search, Journal of experimental psychology 98 (1973) 184.
[4] K. Moffitt, Evaluation of the fixation duration in visual search, Attention, Perception, & Psychophysics 27 (1980) 370–372.
[5] I. T. C. Hooge, C. J. Erkelens, Adjustment of fixation duration in visual search, Vision research 38 (1998) 1295–IN4.
[6] R. Engbert, A. Longtin, R. Kliegl, A dynamical model of saccade generation in reading based on spatially distributed lexical processing, Vision research 42 (2002) 621–636.
[7] A. Nuthmann, T. J. Smith, R. Engbert, J. M. Henderson, CRISP: a computational model of fixation durations in scene viewing, Psychological review 117 (2010) 382.
[8] H. A. Trukenbrod, R. Engbert, ICAT: A computational model for the adaptive control of fixation durations, Psychonomic bulletin & review 21 (2014) 907–934.
[9] R. Engbert, A. Nuthmann, E. M. Richter, R. Kliegl, SWIFT: a dynamical model of saccade generation during reading, Psychological review 112 (2005) 777.
[10] J. Najemnik, W. S. Geisler, Optimal eye movement strategies in visual search, Nature 434 (2005) 387–391.


Appendix A. Convergence of basic characteristics

First of all, we demonstrate how the values of three main characteristics (length of saccade, fixation duration and response time) change with time resolution. We consider the case $e_n = 0.[...]$, $e_t = 0.[...]$ and estimate the visibility map according to [10]. We learn the policy for several resolution intervals $\Theta_{int} = (128, 64, 32, 16, 8)$ ms with PGPE in order to demonstrate the convergence of the basic characteristics with increase of resolution. The procedure of learning with PGPE is described in our previous work, and the current implementation has the same values of hyper-parameters [12].

We simulated [...] episodes of PO-MDP for each time resolution and corresponding policy to compute the mean and standard error of the characteristics. Figure A.8 shows that the difference in saccade length with resolution is not significant after a resolution of [...] ms (Student's t-test, $p > [...]$). At the same time, fixation duration decreases with resolution until it reaches [...] ms, and then the change is not significant ($p > [...]$). Execution time significantly decreases from [...] sec to [...] sec, and after reaching a resolution of [...] ms there is no significant change ($p > [...]$). We observed convergence of the three basic characteristics for all experimental conditions.

[Figure: three panels showing saccade length (deg), fixation duration (ms) and execution time (sec); horizontal axis: resolution, ms (128, 64, 32, 16, 8).]

Figure A.8: Convergence of basic characteristics. We simulated [...] episodes of PO-MDP for each time resolution and corresponding policy to compute the mean and standard error of the characteristics. The change of saccade length with resolution is not significant (left). Fixation duration decreases with resolution until it reaches [...] ms, and then the change is not significant. Execution time significantly decreases from [...] sec to [...] sec, and after reaching a resolution of [...] ms there is no significant change.

Appendix B. Distributions of fixation duration in experiments on natural images

[Figure: five panels showing frequency (0 to 0.006) against duration of fixation, ms (0 to 1000); legend: human, extended model.]

Figure B.9: The distribution of fixation duration in the experiment on natural images and the corresponding simulations of the extended model. As in the case of synthetic images, the computational model can't explain the heavy tail in the experimental distributions, because fixational eye-movements were not taken into account.

Appendix C. Coefficients of Box-Cox transformation

[Table: visual search on synthetic images. Columns: case (noise contrast); lambdas and standard errors for the preceding fixation; lambdas and standard errors for the preceding saccade.]

Table C.4: Predicted coefficients of the Box-Cox transformation and corresponding standard errors for the experimental data in the case of synthetic images.

[Table: visual search on real images. Columns: case (difficulty band); lambdas and standard errors for the preceding fixation; lambdas and standard errors for the preceding saccade.]

Table C.5: Predicted coefficients of the Box-Cox transformation and corresponding standard errors for the experimental data in the case of natural images.

Appendix D. Scatter plots of log-transformed data

[Figure: four panels, preceding fixation duration (ms) vs saccade lengths (deg), for RMS = 0.1, 0.15, 0.2 and 0.25.]

Figure C.10: Scatter plots of log-transformed data: preceding fixation duration (vertical axis) and saccade lengths (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit in the case of experiments on synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains [...] of the variability in experimental data for the case of synthetic images and preceding fixation.
[Figure: four panels, succeeding fixation duration (ms) vs saccade lengths (deg), for RMS = 0.1, 0.15, 0.2 and 0.25.]

Figure D.11: Scatter plots of log-transformed data: preceding fixation duration (vertical axis) and saccade lengths (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit for preceding fixation in the case of experiments on synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains [...] of the variability in experimental data for the case of synthetic images and preceding fixation.

[Figure: four panels, preceding fixation duration vs saccade length, for RMS = 0.1, 0.15, 0.2 and 0.25; horizontal axis: length of succeeding saccade, log scale, deg; vertical axis: duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2.]

Figure D.12: Scatter plot of log-transformed data: succeeding fixation duration (vertical axis) and saccade length (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit for preceding fixation in the case of simulations for synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains [...] of the variability in simulations for the case of synthetic images and succeeding fixation.

[Figure: four panels, succeeding fixation duration vs saccade length, for RMS = 0.1, 0.15, 0.2 and 0.25; horizontal axis: length of preceding saccade, log scale, deg; vertical axis: duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2.]

Figure D.13: Scatter plot of log-transformed data: succeeding fixation duration (vertical axis) and saccade length (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit in the case of simulations for synthetic images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains [...] of the variability in simulations for the case of synthetic images and succeeding fixation.

[Figure: five panels, preceding fixation duration (ms) vs saccade lengths (deg), for difficulty bands 1 to 5.]

Figure D.14: Scatter plots of log-transformed data: preceding fixation duration (vertical axis) and saccade lengths (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit in the case of experiments on natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains [...] of the variability in experimental data for the case of natural images.

[Figure: five panels, succeeding fixation duration (ms) vs saccade lengths (deg), for difficulty bands 1 to 5; legend: data, robust fit, PC1, PC2.]

Figure D.15: Scatter plots of log-transformed data: succeeding fixation duration (vertical axis) and saccade lengths (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit for preceding saccade in the case of experiments on natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains [...] of the variability in experimental data for the case of natural images and succeeding fixation.

[Figure: five panels, preceding fixation duration vs saccade length, for the 1st to 5th difficulty bands; horizontal axis: length of succeeding saccade, log scale, deg; vertical axis: duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2.]

Figure D.16: Scatter plot of log-transformed data: preceding fixation duration (vertical axis) and saccade lengths (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit for preceding fixation in the case of simulation for natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains [...] of the variability in simulations for the case of natural images and preceding fixation.

[Figure: five panels, succeeding fixation duration vs saccade length, for the 1st to 5th difficulty bands; horizontal axis: length of preceding saccade, log scale, deg; vertical axis: duration of fixation, log scale, ms; legend: robust fit, PCA1, PCA2.]

Figure D.17: Scatter plot of log-transformed data: succeeding fixation duration (vertical axis) and saccade lengths (horizontal axis) in logarithmic scale, principal components and linear approximation by robust fit for preceding saccade in the case of simulation for natural images. The regression line of the robust fit is close to horizontal and coincides with the first principal component, which explains 57%