Automatic selection of eye tracking variables in visual categorization in adults and infants
Samuel Rivera, Catherine A. Best, Hyungwook Yim, Dirk B. Walther, Vladimir M. Sloutsky, Aleix M. Martinez
Department of Electrical and Computer Engineering, The Ohio State University; Department of Psychology, The Ohio State University; Center for Cognitive Science, The Ohio State University; Department of Psychology, University of Toronto
Visual categorization and learning of visual categories exhibit early onset; however, the underlying mechanisms of early categorization are not well understood. The main limiting factor for examining these mechanisms is the short duration of infant cooperation (10-15 minutes), which leaves little room for multiple test trials. With its tight link to visual attention, eye tracking is a promising method for getting access to the mechanisms of category learning. But how should researchers decide which aspects of the rich eye tracking data to focus on? To date, eye tracking variables are generally handpicked, which may introduce bias into the analysis. Here, we propose an automated method for selecting eye tracking variables based on analyses of their usefulness in discriminating learners from non-learners of visual categories. We presented infants and adults with a category learning task and tracked their eye movements. We then extracted an over-complete set of eye tracking variables encompassing durations, probabilities, latencies, and the order of fixations and saccadic eye movements. We compared three statistical techniques for identifying those variables among this large set that are useful for discriminating learners from non-learners: ANOVA ranking, Bayes ranking, and L1 regularized logistic regression. We found remarkable agreement between these methods in identifying a small set of discriminant variables. Moreover, the top eye tracking variables allow us to identify category learners among adults and 6- to 8-month-old infants.

Keywords: eye tracking, infant category learning, eye tracking variables
Introduction
Categorization is the process of forming an equivalence class, such that discriminable entities elicit a common representation and/or a common response. While category learning exhibits early onset (Quinn, Eimas, & Rosenkrantz, 1993), relatively little is known about the underlying mechanism and the development of early categorization. The primary reason is the limited duration of infants' cooperation, yielding only a small number of data points per participant. These limitations have restricted researchers' ability to answer fundamental questions about categorization in infants: How do infants learn a category? And does this process undergo development?

Analyses of eye movements may help solve some of these problems: eye movements are tightly linked to visual attention (see Rayner, 1998, for a review), and they yield multiple (albeit not necessarily independent) data points even for relatively short trial durations. Therefore, analyses of eye movements can provide critical information about how attention allocation changes in the course of category learning. However, eye tracking produces a large amount of data, and it is not clear a priori which (if any) components of eye movement are related to category learning. As a result, eye tracking researchers are free to choose from a large set of variables without a common set of principles for deciding which or how many variables to analyze. In the following, we review the infant category learning eye tracking literature in order to substantiate this claim. Given the limited number of eye tracking categorization studies with infants, we take a broader approach and review studies that examined categorization, object completion, and visual attention.

One variable that has been used across a variety of tasks is saccade latency (Amso & Johnson, 2006; Johnson, Amso, & Slemmer, 2003). For example, Johnson, Amso, and Slemmer (2003) examined whether learning affects object representations in infancy. Four- and six-month-old infants were presented with an object that moved behind an occluder and then reemerged on the other side of the occluder. The researchers reasoned that if babies maintain the existence of the occluded object, they should anticipate the object to reemerge from the occluder. In this case, participants should exhibit a faster eye movement to the point of reemergence than if they do not anticipate the object. In two other studies (Amso & Johnson, 2005, 2008), researchers used saccade latency to examine the development of visual selection. Participants (3-, 6-, and 9-month-olds and adults) were presented with a Spatial Negative Priming (SNP) paradigm. On a given trial they were shown an attention-grabbing target in Location 1 and a discreet distracter in Location 2. On the next trial, they were either shown the target in Location 2 (a negative priming probe) or in Location 3 (a control trial). SNP was inferred from greater saccade latency on the probe trials than on the control trials.

Other potentially informative variables are (a) frequencies of fixations per unit of time within one or more Areas of Interest (AOIs), (b) dwell times within one or more AOIs, and (c) frequencies of saccades within and between AOIs (Johnson, Slemmer, & Amso, 2004). In one study, Johnson and colleagues (2004) examined the development of object unity perception in infancy using both behavioral and eye tracking data.
In the task, participants were habituated to a rod moving behind an occluder. After participants habituated, the occluder was removed to reveal either a broken rod or a complete rod. Infants who perceived the rod as moving behind the occluder as a coherent object, indicated by a recovery of looking after habituation, were identified as perceivers. Participants who perceived a broken rod did not recover their looking after habituation and were considered non-perceivers. The authors then examined eye tracking data for perceivers and non-perceivers using the eye tracking variables described above.

Perhaps the most frequently used eye tracking variable is fixation location. Researchers have relied on this variable across a variety of tasks, including object completion (Amso & Johnson, 2006; Johnson, Davidow, Hall-Haro, & Frank, 2008), understanding other people's actions (Falck-Ytter, Gredebäck, & von Hofsten, 2006), and a variety of categorization and category learning tasks (Best, Robinson, & Sloutsky, 2010; McMurray & Aslin, 2004; Quinn, Doran, Reiss, & Hoffman, 2009). For example, Quinn et al. (2009) examined categorization of cats and dogs in 6- to 7-month-olds. They found that when items were presented in the canonical upright position, categorization accuracy was associated with a high proportion of looking to the head, whereas when items were presented in an inverted position, categorization was associated with a large proportion of looking to the body. Best et al. (2010) presented 16- to 24-month-olds with a category learning task. Categories included artificial items that had shapes in four locations, with two of the shapes being category relevant (i.e., present in all members of the category, but not in non-members) and two being irrelevant (i.e., exhibiting both within- and between-category variability). The researchers examined the proportion of fixations to category-relevant features and its change in the course of familiarization. A summary of the reviewed studies is presented in Table 1.

Several conclusions can be drawn from this brief review. First, multiple eye tracking variables have been used across studies to examine infants' learning. Second, although all these variables make intuitive sense, no formal selection process for these variables has been defined. This poses several concerns and questions. Namely, since different variables are used in different studies, do these variables correlate and thus provide redundant information? If not, why should any one variable be used instead of another? Should the variables be selected based on the specific categorization task, or should a fixed subset of eye tracking variables be used across all studies? Can we define a principled way of determining which variables to analyze in a given category learning study? The current study defines a methodology to address these questions and concerns.

Our approach was as follows. We extracted a large set of possible variables from the adult or infant gaze sequence during a categorization task (e.g., fixations, saccades, gaze sequences). Some of these variables have been used in analyzing categorization experiments, whereas others were new. Our goal was to use the power of statistics and machine learning to identify the eye tracking variables that best predict category learning in adults and subsequently in infants.
The significant contribution of this work is that it provides a systematic methodology for identifying eye tracking variables that are linked to category learning, thus allowing researchers to better understand category learning from eye tracking data. Furthermore, our results retrospectively validate the use of several variables from the eye tracking studies mentioned above.

Methods
Participants
Three category learning experiments were conducted: two focused on adults and one on infants. Twenty-four adults participated in Experiment 1. Forty-six adults who did not participate in Experiment 1 participated in Experiment 2. All adult participants had normal or corrected-to-normal vision and were undergraduate students at The Ohio State University participating for course credit.

In Experiment 3, fifteen 6- to 8-month-old infants participated. Parents provided written consent upon arrival at the laboratory. All parents reported their infants to be developing typically and in good health.
Materials
Category members were flower-like objects with six petals. An example object is shown in Fig. 1(a), with the petals enumerated for clarity. There were four different categories, each defined by a single petal having a distinguishing color and shape. Specifically, the category defining features were category A: a pink triangle at position 4; category B: a blue semi-circle at position 4; category C: an orange square at position 6; and category D: a yellow pentagon at position 6. Each object was uniquely associated with one category. That is, no one object exhibited the defining features for two or more categories. Stimuli were displayed on the computer screen subtending approximate horizontal and vertical visual angles of 11°. The eccentricity of the stimuli subtended an approximate horizontal visual angle of 14.4° and an approximate vertical visual angle of 11.5°.

During all three experiments, the participants' eye gaze was recorded using a Tobii T60 eye tracker (Falls Church, VA) at a sampling rate of 60 Hz while they sat approximately 60 cm away from the display screen.
Table 1
Comparison of previous eye tracking variables.

Source | Task | Eye Tracking Variable
Johnson et al., PNAS, 2003 | Object completion | Saccade latency
Amso and Johnson, Developmental Psychology, 2006 | Object completion | Proportion fixation to AOI
Johnson et al., Infancy, 2004 | Object completion | Fixation frequency, dwell time, saccade frequency
Johnson et al., Developmental Psychology, 2008 | Object completion | Proportion fixation to AOI
Falck-Ytter et al., Nature Neuroscience, 2006 | Goal perception | Proportion fixation to AOI, AOI fixation time
Amso and Johnson, Cognition, 2005 | Visual search | Saccade latency
Amso and Johnson, Infancy, 2008 | Visual search | Saccade latency
Quinn et al., Child Development, 2009 | Categorization | Proportion fixation to AOI
McMurray and Aslin, Infancy, 2004 | Category learning | Proportion fixation to AOI
Best, Robinson, and Sloutsky, Proceedings of the Cognitive Science Society, 2010 | Category learning | Proportion fixation to AOI
Figure 1. (a) Category object; (b) AOI example. Image (a) is an example category object used in the eye tracking study, with the Areas of Interest (AOIs) enumerated. Numbers were not displayed to the participants. Image (b) illustrates the concept of AOIs. Each stick figure is divided into 3 AOIs containing the head, torso, and legs. The relevant AOI for gender discrimination is bracketed in red. Only the head AOI is relevant because the other AOIs are the same across both categories.
Experiment 1 - Adult supervised
To validate the efficacy of the approach before applying it to infants, adult participants were tested. In Experiment 1, participants were instructed to look for a single distinguishing feature prior to the start of the experiment. Previous research suggests that this hint (i.e., a form of supervised learning) has large consequences with respect to how quickly participants learn to classify the objects, especially when there are few overlapping features (Kloos & Sloutsky, 2008).

The experiment had 8 blocks, where each block consisted of 8 learning trials followed by 4 testing trials. In a learning trial, a category member was displayed in the center of the screen, one at a time, for a fixed duration. In a testing trial, two objects were displayed at equal and opposite horizontal visual angles from the center of the screen. Test stimuli were displayed on the screen until the participant made a decision via key press about which stimulus was a member of the learned category. The left/right position of the test stimuli was counterbalanced. A randomly located fixation point (cross-hair) directed the participant's gaze to a position on the monitor in between trials. The to-be-learned category remained the same for the first 4 blocks. A second to-be-learned category was introduced in the final 4 blocks without notice to the participant. If the experiment started with a category defined by the petal at position 4 (category A or B), the second category was defined by the petal at position 6 (category C or D), and vice-versa. Using categories having definitive features at different positions provided a mechanism to verify the reproducibility of the variables determined most important.

Figure 2. Illustration of a category pair image with the AOIs labeled (numbered 1-14). Numbers were not shown to participants.
Experiment 2 - Adult unsupervised
The procedure in Experiment 2 (unsupervised condition) was identical to that in Experiment 1 except that participants did not receive supervision (i.e., no hint was provided) about the category structure.
Experiment 3 - Infant supervised
The infant experiment was conceptually similar to Experiment 1, but was methodologically adapted for infants by using a familiarization paradigm. To aid infant learning, category exemplars were shown in pairs on each trial. This was also done so that the presentation of stimuli in the learning and testing phases had an identical layout. An example with labeled AOIs is shown in Fig. 2. Furthermore, there was only a supervised condition, in which the infants were presented with a pre-trial fixation video of synchronized sound and motion (e.g., a looming flower petal with a corresponding whistle sound) to draw their attention to the single category-relevant feature. It should be noted that no unsupervised condition was conducted with infants because previous developmental research suggests supervision is necessary for young children to learn categories with a sparse category structure (Kloos & Sloutsky, 2008). Once the infant looked at the fixation video, the learning trial commenced. Infants had to accumulate 3 seconds of looking at the category exemplar pairs. Whenever an infant looked away, an attention-grabbing fixation was presented until the infant reconnected with the images on the screen. After accumulating 3 seconds of looking at the stimulus pair, the supervisory fixation video was again presented, followed by another learning image pair. This procedure was repeated for 8 blocks with 8 learning pairs per block.
In the testing phase, a novel category member was paired with a novel non-category member, as in the adult experiments. The standard assumption is that an infant can discriminate between the category and non-category objects if he or she displays a novelty or familiarity preference. There were two test trials per block, in which a novel exemplar from the learned category was paired with a novel exemplar from a novel category. Test trials were presented for a fixed duration of 6 seconds, and the left/right position of familiar and novel category objects was counterbalanced.

Collecting and filtering eye tracking data
Eye movements were monitored during object viewing with the Tobii T60 eye tracker. The system tracks eye movements by illuminating the eye with infrared light and capturing the corneal reflection at a frequency of 60 Hz (i.e., every 16.6 ms). As the eye moves, the angle between the pupil and the corneal reflection changes, allowing the x-y coordinates of the gaze position to be measured over time.

Unfortunately, the gaze data contain noise, missing data, and micro-saccades, which makes identifying true fixations and saccades difficult. Therefore, we processed these data using MATLAB-based software created in our laboratory by the first author. The raw eye tracking data from every experimental block were filtered using a Kalman filter (Murphy, 2004) before extracting the variables of interest. The eye gaze data from the left and right eyes were filtered separately. The average of the filtered data from the left and right eyes yielded the mean eye gaze data, which were used in the current analyses.
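As an illustration of this preprocessing step, the sketch below smooths one eye's gaze trace with a constant-velocity Kalman filter. This is a minimal sketch, not the laboratory's MATLAB implementation: the function name, the noise scales q and r, and the handling of missing samples are all illustrative assumptions.

```python
import numpy as np

def kalman_smooth_gaze(xy, dt=1 / 60.0, q=50.0, r=5.0):
    """Smooth a (T, 2) array of raw gaze coordinates with a
    constant-velocity Kalman filter. q and r are illustrative
    process/measurement noise scales, not the paper's values.
    Assumes the first sample is valid."""
    A = np.array([[1, 0, dt, 0],   # state transition over [x, y, vx, vy]
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],    # only the x-y position is observed
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)              # process noise covariance
    R = r * np.eye(2)              # measurement noise covariance
    x = np.array([xy[0, 0], xy[0, 1], 0.0, 0.0])
    P = np.eye(4)
    out = np.empty_like(xy, dtype=float)
    for t, z in enumerate(xy):
        # predict
        x = A @ x
        P = A @ P @ A.T + Q
        # update, skipping missing (NaN) samples
        if np.all(np.isfinite(z)):
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            x = x + K @ (z - H @ x)
            P = (np.eye(4) - K @ H) @ P
        out[t] = x[:2]
    return out
```

Left- and right-eye traces would each be passed through such a filter separately and then averaged, as described above.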
Labeling the Data

The eye movement sequences during the learning phase of the experiment aid in understanding category learning, while the sequences during the testing phase aid our understanding of category use. Before applying our methodology to understand these processes, however, the eye tracking data from both the learning and testing phases of the experiments were labeled as learner (class 1), non-learner (class 0), or indeterminate (class 2). Indeterminate samples were not analyzed.
Adult Labels: Intuitively, labels for adult data are readily identified based on the accuracy of the responses during the testing phase. An uninterrupted string of correct responses during the testing phase suggests that the participant has learned the category. Each adult experimental block yielded 12 eye movement sequences. These correspond to eye movements during the presentation of 8 exemplar images during the learning phase and 4 test images during the testing phase. Adult participants had 4 blocks of learning and discriminating the same category before switching to a new category. This amounted to 32 samples of the learning phase and 16 samples of the testing phase for each category per participant. The 16 samples from the testing phase were associated with a 16-digit binary string, called the response string. This data structure shows performance over the first and last 4 blocks of the experiment. A one identifies a correct response, while a zero denotes an incorrect response on the associated test trial. An example is shown in Fig. 3.

Cat. A: 1001010101111111
Cat. C: 1001011111111111
Figure 3. Illustration of the response strings for one subject. Ones encode correct category discriminations, while zeros encode incorrect responses. The first row shows the accuracy over the first four blocks (presentation of the first category), while the second row shows accuracy over the last four blocks (presentation of the second category). The class labels (learner or non-learner) are determined separately for each row, because the category condition is different for each row.

We labeled each 16-digit response string separately as follows. We expect a learner's response string to contain a series of ones beginning within the string and terminating at the end of the response string. This pattern indicates that at some point the participant learned the category and correctly discriminated the category from that point on. A participant who has not learned the category (a non-learner) would select one of the two stimuli by chance on each trial. A non-learner could get lucky and achieve a series of correct guesses. In order to determine whether a participant is a learner or a non-learner, we need to establish a criterion that allows us to reject chance as the cause of a series of ones. The question we need to answer is how many ones we should expect from a learner. We address this problem by assessing how likely it is that we see a sequence of M consecutive ones in a binary response string of length R = 16. Under the null hypothesis, the participant does not know the category label and selects one of the stimuli by chance, giving her a 50% chance of correctly guessing the category member. Each sequence is equally likely given this assumption, so the probability of guessing at least M right in a row is at most the number of sequences having M ones in a row, (R − M + 1) × 2^(R−M), divided by the total number of binary sequences of length R, 2^R. This yields the probability p = (R − M + 1)/2^M. For R = 16, the criterion run length satisfying p < .01 is M = 10 (p ≈ .0068). The position at which this criterion run of ones began defined the point of learning (POL). Test phase and learning phase samples before the POL were labeled as non-learner, while the samples after the POL were labeled as learner. The learning phase samples from the block associated with the POL were labeled as indeterminate, because it was unclear at exactly which trial during the block the category was learned.

If the learning criterion was not achieved, we then identified the remaining non-learner and indeterminate samples. We first labeled correct responses at the end of the response string as indeterminate. Those samples did not meet the learning criterion, but might be attributable to learning late in the experiment. The remaining samples were labeled as non-learner. Approximately 8% of the adult eye track samples were labeled indeterminate.

Infant Labels: Obviously, infants are not able to respond by keyboard to identify a category object. Instead, we used a variant of the preferential looking paradigm to determine whether an infant could discriminate between novel exemplars of a familiar category object and a novel category object. Recall that the preferential looking paradigm assumes that infants who consistently look more at one class of stimuli when shown two classes of stimuli are able to discriminate between the two classes. This means that if the infant consistently looks longer at the learned category object (or the novel category object), then he or she is assumed to be discriminating between the familiar and novel categories.

Given this paradigm, we labeled each infant's gaze data by blocks. Each block consisted of two test phase samples. We determined novelty preference as the ratio of total looking time to the novel category object to the total looking time to the novel category object plus the familiar category object. We sorted the mean novelty preference for each block according to its absolute difference from 0.5.
The third of the blocks with mean novelty preference closest to 0.5 were labeled as non-learner, the third farthest from 0.5 were labeled as learner, and the remaining third were labeled as indeterminate.
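To make the learner criterion concrete, the sketch below computes the chance probability p = (R − M + 1)/2^M, derives the criterion run length for R = 16, and scans a response string for the point of learning. The function names are ours, and the criterion length is derived from the formula above rather than taken from the original analysis code.

```python
def run_probability(R, M):
    """Upper bound on the chance that a guessing participant produces
    at least M consecutive correct responses in a string of length R."""
    return (R - M + 1) / 2.0 ** M

def criterion_run_length(R=16, alpha=0.01):
    """Smallest run length M whose chance probability falls below alpha."""
    return next(M for M in range(1, R + 1) if run_probability(R, M) < alpha)

def point_of_learning(responses, M):
    """Index where a run of at least M correct responses begins and
    continues uninterrupted to the end of the string; None if absent."""
    R = len(responses)
    for i in range(R - M + 1):
        if all(responses[i:]):
            return i
    return None

M = criterion_run_length()           # M = 10 for R = 16, p ~ .0068
pol = point_of_learning([1] * 16, M)  # 0: a learner from the first test trial
```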
Variable List

We compiled an over-complete list of eye tracking variables. We began with the fundamental variables, fixations and saccades. Fixations occur when eye gaze is maintained at a single position for at least 100 ms. They were identified using the dispersion threshold algorithm of Salvucci and Goldberg (2000). Saccades are rapid eye movements that move the eye gaze between points of fixation. To be considered a saccade, the eye movement needed to exceed the smooth pursuit velocity of 30° per second, or 0.5° per sample at 60 Hz (Stampe, 1993). The fixations and saccades were determined with respect to a specific AOI within an object. AOIs are regions of an object image or scene that can be grouped in some meaningful way, such as color uniformity or the structural nature of the object. The AOIs can further be described as relevant or non-relevant, based on their role in determining object category membership. Fig. 1(b) illustrates this concept for the stick-figure gender category. In this toy example only the head is relevant for category membership, because the torso and legs are identical across stick-figures and thus do not help one to discriminate gender.

These fundamental eye tracking variables were combined in various ways to derive a larger set of variables. Our variable list is defined as follows:

1. AOI fixation percentage describes the percentage of time fixated at the different AOIs during a trial. All non-AOI fixations were discarded in this and all of the variables defined. For an image with q AOIs, this variable was encoded as a q-dimensional feature vector with a value for each AOI. The fixation percentages were normalized so that they sum to 1, unless there were no fixations at AOIs. In that case, all percentages were set to 0.

2. Relevant AOI fixation density is a scalar value between zero and one which describes the percentage of the total fixation time spent at the relevant AOI(s).

3. AOI fixation sequence describes the sequence of AOI fixations during one trial. We limited this sequence to seven fixations, starting with trial onset (not counting fixations to the fixation mark). We encoded a fixation sequence of f fixations over q AOIs as a q × f binary matrix, where each column of the matrix had a 1 in the position corresponding to the AOI which was fixated, and zero otherwise. If there were fewer than f fixations, the last columns were set to 0. This binary encoding of the fixation sequence allowed us to describe any sequence of fixations without imposing an ordering of the AOIs. In addition, the fixation sequence was represented as a sequence of relevant and non-relevant AOI fixations. This representation yielded a 2 × f binary matrix, in which each column had a 1 in the first row if a relevant AOI was fixated or a 1 in the second row if a non-relevant AOI was fixated. If there were fewer than f fixations, the last columns were set to 0. The analysis showed that the latter representation was more informative in some cases. Note that it was necessary to use a pair of binary variables to encode each fixation of the latter representation because it allowed for three cases: fixation at a relevant AOI, fixation at a non-relevant AOI, and fewer than f fixations. The number of fixations to consider as well as the start position were determined using cross validation (CV). In cross validation, the training data are separated into k partitions, and for each partition, samples are classified using a classifier that is trained with the remaining k − 1 partitions; the parameter value yielding the highest average classification accuracy is selected.

4. Duration of fixations in sequence describes the duration of each fixation in the sequence described by variable 3. This variable was encoded by an f-dimensional vector.

5. Total distance traveled by eye is a scalar describing the total distance traveled by the eye gaze during a trial.

6. Histogram of fixation distances to relevant AOI describes how much time is spent fixated near or far from the relevant AOI(s). A histogram with h bins and an image with r relevant AOIs yielded an h × r dimensional matrix. Each column corresponds to a different relevant AOI, and each row corresponds to a particular range of distances from that AOI. The entries define the percentage of time fixated at the distance ranges, so each column sums to 1. If no fixations occurred, all values were set to 0. The number of bins was determined using CV. The bins corresponding to AOI 4 are illustrated in Fig. 4.

7. Number of unique AOIs visited is a scalar describing the total number of unique AOIs fixated during a trial. AOI revisits were not counted as new.

8. Saccade sequence is similar to variable 3 but describes the sequence of AOI saccades during one trial. All saccades whose targets were not AOIs were discarded in this and all of the variables defined. The sequence was limited to seven saccades, starting at the first saccade. The number of saccades to consider as well as the start saccade were determined using CV. We encoded a saccade sequence of s saccades over q AOIs as a q × s binary matrix. Each column of the matrix had a 1 in the position corresponding to the AOI which was the target of the saccade, and zero otherwise. If there were fewer than s saccades, the last column(s) were set to 0. In addition, the saccade sequence was represented as a sequence of saccades to relevant and non-relevant AOIs. This representation yielded a 2 × s binary matrix, with each column containing a 1 in the first row if saccading to a relevant AOI or a 1 in the second row if saccading to a non-relevant AOI. If there were fewer than s saccades, the last column(s) were set to 0.

9. Relative number of saccades to an AOI is the saccade analogue of variable 1 and describes the relative number of saccades to the AOIs during one eye movement sequence. An image with q AOIs yielded a q-dimensional feature vector with each entry counting the number of saccade targets at the corresponding AOI. The vector was normalized by the sum of all entries such that the entries added to 1, unless there were no saccades. In that case, all entries were set to 0.

10. Fixation latency to relevant AOI describes the delay before fixating at a relevant AOI during an eye movement sequence. It was encoded as a scalar between 0 and 1, with 0 corresponding to fixating a relevant AOI immediately and 1 describing a sequence with no fixation on a relevant AOI. The value was computed as the start time of the first relevant AOI fixation divided by the total eye track time.

11. Saccade latency to relevant AOI describes the delay before a saccade to a relevant AOI. It was also encoded as a scalar between 0 and 1, defined by the end time of the first saccade to a relevant AOI divided by the total eye gaze time.

Thus, eye movements were represented by a feature vector x = (x_1, x_2, ..., x_d)^T whose d entries correspond to the variables described. Each feature x_i was normalized to zero mean and unit variance over the entire dataset. In addition, each x was associated with a class label, y ∈ {0, 1}. For clarity, features denote the entries of the feature vector which encode the eye tracking variables, while variables correspond to the measures of eye tracking enumerated above. Therefore, d is much larger than 11, because encoding certain variables requires multiple feature values. Note that d was the same for all feature vectors corresponding to images having the same number of AOIs and relevant AOIs, because a fixed number of fixations and saccades were analyzed.

In the case of a single category object having one relevant AOI, variable 2 is identical to one of the values of variable 1. Therefore, after extracting all variables from the gaze data of all participants, we did a simple redundancy check to eliminate cases of identical valued features. For features x_i, x_j to be identical, they must mirror each other over all feature vectors for a particular category condition and within either the learning or testing phase. In addition, the information encoded by several of these features overlaps. This over-complete representation allows us to find the encoding that is best suited to describe the categorization task. To this end, we performed variable selection on this over-complete set.
Figure 4. Illustration of the histogram bins for distance to AOI 4, with bins numbered: (a) single object, (b) object pair. Variable 6 describes the percentage of time fixating within each bin for each relevant AOI. Bin sizes were determined using CV.
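To make the encodings concrete, the sketch below constructs the q × f binary matrix of variable 3 and the fixation latency scalar of variable 10 from a list of AOI fixations. Function and argument names are ours, and AOIs are indexed from zero for convenience.

```python
import numpy as np

def aoi_fixation_sequence(fix_aois, q, f=7):
    """Variable 3: q x f binary matrix; column j has a 1 in the row of
    the AOI fixated j-th. Columns beyond the last fixation stay 0."""
    m = np.zeros((q, f), dtype=int)
    for j, aoi in enumerate(fix_aois[:f]):
        m[aoi, j] = 1          # AOIs indexed 0..q-1 here
    return m

def fixation_latency(fix_onsets, fix_aois, relevant, total_time):
    """Variable 10: start time of the first relevant-AOI fixation divided
    by the total trial time; 1 if no relevant AOI is ever fixated."""
    for onset, aoi in zip(fix_onsets, fix_aois):
        if aoi in relevant:
            return onset / total_time
    return 1.0

seq = aoi_fixation_sequence([3, 3, 5, 0], q=6)   # 4 fixations over 6 AOIs
lat = fixation_latency([0.2, 0.5, 0.9], [1, 3, 3], {3}, total_time=1.5)
```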
Variable Selection
Our goal was to identify the subset of variables from the set defined above that best separates the classes: category learners and non-learners. This was achieved using ANOVA feature selection by ranking, Naive Bayes Ranking (NBR), and L1 logistic regression (L1-LR).
ANOVA feature selection relies on a standard hypothesis test on each feature of x. Specifically, let x_i denote the i-th feature of x. Using a dataset of eye tracking feature vectors and the associated class labels, we performed a two-tailed t-test of the null hypothesis, which states that samples of x_i coming from classes 1 and 0 are independent random samples from normal distributions with equal means, µ_{i1} and µ_{i0}, respectively. The alternative says that the class means are different. We calculated the test statistic and the corresponding p-value. A low p-value means the null hypothesis is rejected with confidence. Since the goal was to find the variables which best separate the classes, the feature with the lowest p-value was ranked as best. The p-values were calculated for all features x_i, i = 1, ..., d, and the features were ranked from best to worst according to increasing p-values.

Naive Bayes Ranking (NBR) assumes that if the labeled feature vectors can be accurately classified given a single feature, x_i, then that feature separates the two classes well. In essence, the classification accuracy is a surrogate for the class separability achieved by the particular feature. Therefore, the features are ranked from best to worst according to decreasing classification accuracy.

The Bayes classifier assigns a sample, x, to the class having the highest posterior probability. More formally, assume that the class-conditional density functions of a feature given its class, p(x_i | y), are modeled as normally distributed with mean and variance µ_{iy} and σ_{iy}², respectively. Then, by applying the Bayes formula, the posterior probability of class y is P(y | x_i) = p(x_i | y) P(y) / p(x_i), where P(y) is the prior of class y and p(x_i) is a scale factor which ensures that the probabilities sum to 1. In this work, we set P(y = 1) = P(y = 0) = 0.5, corresponding to the assumption that a priori a sample is equally likely to come from a learner as from a non-learner.
The scale factor is the same for both classes, so it can be omitted in the classification rule. Finally, the predicted class label, ŷ, is given by

ŷ = arg max_{j ∈ {0,1}} p(x_i | y = j) P(y = j).   (1)

L1 Logistic Regression (L1-LR) is a linear classifier model which returns the probability that a sample belongs to a particular class. It accomplishes this by modeling the natural logarithm of the ratio, or odds, of the two class probabilities as a linear function of x. More formally,

ln( p(y = 1 | x) / (1 − p(y = 1 | x)) ) = w^T x − b,   (2)

where ln denotes the natural logarithm. The two class probabilities are then given by

p(y = 1 | x) = 1 / (1 + exp(−w^T x + b)),
p(y = 0 | x) = exp(−w^T x + b) / (1 + exp(−w^T x + b)).

The parameters, w and b, are estimated via Maximum Likelihood (ML) estimation. A regularization term λ is introduced to penalize large elements of w. Using an L1-norm regularizer yields a sparse model. More formally, the regularized ML objective is

ŵ = arg max_{w,b} Σ_{i=1}^{N} log P(y_i | x_i) − λ ||w||_1,   (3)

where (x_i, y_i), i = 1, ..., N, are the full feature vectors and their associated labels, λ is a user-determined, real-valued, positive regularization parameter, and ||·||_1 denotes the L1-norm. Increasing the value of λ results in more elements of w being shrunk to zero, i.e., a sparser model. Variable selection is performed by increasing the value of λ until a desired number of elements of w are non-zero. The elements of x corresponding to the non-zero elements of w are the top ranked variables. These top ranked variables can then be sorted from best to worst by sorting the corresponding entries of w in order of descending absolute magnitude. We use the L1-LR implementation of Schmidt (2011).

Each method results in a ranking of the features, x_i, from best to worst. If we collect the indices of the t top ranked features in the vector k = (k_1, k_2, ..., k_t)^T, then after feature selection x = (x_{k_1}, x_{k_2}, ..., x_{k_t})^T.
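A minimal sketch of the three rankings is given below, assuming a feature matrix X (samples × d, already z-scored) and labels y in {0, 1}. It uses scipy and scikit-learn stand-ins rather than the MATLAB implementations cited above, and, for brevity, the NBR accuracy is computed on the training data rather than on held-out data.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def anova_rank(X, y):
    """Rank features by ascending two-sample t-test p-value."""
    p = np.array([stats.ttest_ind(X[y == 1, i], X[y == 0, i]).pvalue
                  for i in range(X.shape[1])])
    return np.argsort(p)

def nbr_rank(X, y):
    """Rank features by descending single-feature Gaussian Bayes
    classification accuracy, with equal class priors."""
    acc = [GaussianNB(priors=[0.5, 0.5]).fit(X[:, [i]], y).score(X[:, [i]], y)
           for i in range(X.shape[1])]
    return np.argsort(acc)[::-1]

def l1lr_rank(X, y, C=0.1):
    """Rank features by |w| of an L1-penalized logistic regression;
    C is the inverse of the regularization strength lambda."""
    w = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y).coef_[0]
    return np.argsort(np.abs(w))[::-1]
```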
Linear Classification

Once the important variables were identified, we used them to classify the gaze data as having originated from a learner or a non-learner. This required that we train a classifier to distinguish between the two classes of data. Recall that each eye movement sequence resulted in a feature vector, or sample, x. A classifier defines a decision rule for predicting whether a sample is from class 0 or class 1. A linear classifier was used because of its ease of interpretation (Martinez & Zhu, 2005): the absolute model weights give the relative importance of the eye tracking variables. We illustrate this in Fig. 5 with a 2-dimensional linear classifier model specified by w and b.

Figure 5. Illustration of a linear classifier. w is the normal vector of the hyperplane which separates the feature space into two decision regions, and b is the distance from the origin to the hyperplane. The blue circles represent samples from class 1, while the green squares represent samples from class 0. All but one of the blue circles lie on the positive side of the hyperplane and are classified correctly.

In this model, w is the normal vector of the hyperplane which separates the feature space into two decision regions, and b is the distance from the origin to the hyperplane (i.e., the offset). All samples x above the hyperplane are assigned to class 1, while the samples below are assigned to class 0. Data samples x lying on the boundary satisfy w^T x − b = 0.
Therefore, samples are classified according to the sign of w^T x − b. In the example of Fig. 5, the second entry of w has the larger absolute magnitude, so the second dimension, x_2, is more informative for classification. Note that in our case the feature space has not two but up to 334 dimensions, depending on the cut-off for variable selection.

Several varieties of linear classifiers exist. In this work, we used the Bayes classifier with equal covariances, L1-LR, and the Support Vector Machine (SVM) algorithm.
Bayes with equal covariances (Bayes): When both classes are assumed to be multivariate normally distributed with the same covariance Σ, means µ_1 and µ_0, and equal priors, the Bayes classifier decision boundary is a hyperplane given by w = Σ^{−1}(µ_1 − µ_0) and b = w^T(µ_1 + µ_0)/2 (Duda, Hart, & Stork, 2001).
L1 Logistic Regression (L1-LR): Recall that L1-LR yields the probability that a sample belongs to a particular class. It uses the model of Equation (2), where w defines the normal of the hyperplane and the sign of w^T x − b determines the class label.
Support Vector Machine (SVM): SVM is a linear classifier which maximizes the margin between the two classes of data (Burges, 1998). In the case that the training samples are perfectly separable by a hyperplane, we can find w and b such that the data satisfy the following constraints:

x_i^T w − b ≥ 1 for y_i = 1,   (4)
x_i^T w − b ≤ −1 for y_i = 0.   (5)

Essentially, these constraints specify that the samples from the different classes reside on opposite sides of the decision boundary. The margin between the classes, defined by 2/||w||_2, where ||·||_2 denotes the L2-norm, is then maximized subject to the above constraints. The dual formulation of the constrained optimization problem results in a quadratic program for w and b. In the case that samples from each class are not linearly separable, a penalty is introduced to penalize the amount by which a sample falls on the wrong side of the hyperplane. Again, the dual formulation results in a quadratic program for w and b. We used the implementation of Chang and Lin (2001).
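Of the three classifiers, the Bayes classifier with equal covariances has the simplest closed form. The sketch below computes the hyperplane from the class statistics under the stated assumptions (a pooled covariance estimate, equal priors); the names are illustrative.

```python
import numpy as np

def bayes_equal_cov(X, y):
    """Closed-form linear discriminant: w = inv(Sigma)(mu1 - mu0),
    b = w^T (mu1 + mu0) / 2, assuming equal class priors."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    X0, X1 = X[y == 0] - mu0, X[y == 1] - mu1
    sigma = (X0.T @ X0 + X1.T @ X1) / (len(X) - 2)   # pooled covariance
    w = np.linalg.solve(sigma, mu1 - mu0)
    b = w @ (mu1 + mu0) / 2
    return w, b

def predict(X, w, b):
    """Assign class 1 to samples on the positive side of the hyperplane."""
    return (X @ w - b > 0).astype(int)
```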
Classification Accuracy

The classification accuracy used for adults was the leave-one-subject-out cross-validation (LOSO-CV) accuracy. In LOSO-CV, the samples belonging to one participant are sequestered, and the remaining samples are used to train the classifier. The sequestered samples are then classified with the learned classifier, and the procedure is repeated for every participant in the database. The total number of correctly classified samples divided by the total number of samples is the LOSO-CV accuracy.

The classification accuracy used for infants was the leave-one-experiment-block-out cross-validation (LOBO-CV) accuracy. This alternative accuracy measure makes more effective use of the eye movement data when the sample size is very small. In LOBO-CV, the samples belonging to one experiment block are sequestered, and the remaining samples are used to train the classifier. The sequestered samples are then classified with the learned classifier, and the procedure is repeated for every block in the database. The total number of correctly classified samples divided by the total number of samples is the LOBO-CV accuracy.
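A sketch of the LOSO-CV loop is shown below, assuming a per-sample array of subject IDs; scikit-learn's LeaveOneGroupOut implements exactly this split. Passing block IDs instead of subject IDs as the groups yields the LOBO-CV variant used for the infants.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC

def loso_cv_accuracy(X, y, subject_ids, make_clf=lambda: LinearSVC()):
    """Train on all subjects but one, classify the held-out subject's
    samples, and pool correct classifications over all folds."""
    correct = 0
    for train, test in LeaveOneGroupOut().split(X, y, groups=subject_ids):
        clf = make_clf().fit(X[train], y[train])
        correct += (clf.predict(X[test]) == y[test]).sum()
    return correct / len(y)
```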
Results

Adult Experiment
We first labeled the adult trials as category learner or non-learner. This resulted in 728 learning class samples and 1,256 non-learning class samples for the learning phase, and 473 learning class samples and 601 non-learning class samples for the testing phase in the category A or B learning condition. There were 496 learning class samples and 1,568 non-learning class samples for the learning phase, and 323 learning class samples and 717 non-learning class samples for the testing phase in the category C or D learning condition. The indeterminate samples were not used in any of the experiments. We then extracted the eye tracking variables from each trial's gaze sequence. Each labeled data sample resulted in a 182-dimensional feature vector for the learning phase samples and a 334-dimensional feature vector for the testing phase samples.
Figure 6. Leave-one-subject-out cross-validation accuracy for adult subjects as a function of the number of top ranked variables used for classification. The first two rows show results for the learning phase of the experiments (categories AB and CD, respectively). The last two rows show the results for the testing phase of the experiments (categories AB and CD, respectively). ANOVA, NBR, and L1-LR correspond to ANOVA feature selection, Naive Bayes feature selection, and L1 penalized logistic regression feature selection, respectively. AB and CD correspond to category object A or B and C or D, respectively. In almost all cases, the classification accuracy was near the maximum after including very few features and did not change much when including more. Chance level is plotted as the accuracy resulting from classifying each sample as the most common class.

We applied the variable selection algorithms to identify the most important variables for separating learners from non-learners, and validated those variables using the three linear classifiers. The LOSO-CV accuracy is reported as a function of the number of top features used for classification in Fig. 6. Recall that the features encode the eye tracking variables. The results show that a very small number of features yields a high classification rate, and including more features does not improve the accuracy.

The stable performance beyond just a few features suggests that a small number of variables is sufficient for discriminating learners and non-learners. The top five variables for ANOVA, NBR, and L1-LR are listed in Table 2. We boldfaced the variables that were consistently ranked in the top five across both the category A or B and category C or D conditions and all feature selection algorithms. We underlined variables that were consistently ranked in the top five by at least two of the three feature selection algorithms and across both the category A or B and category C or D conditions. Note that AOI 4 for the category A or B condition is equivalent to AOI 6 in the category C or D condition. The consistent top variables in the learning condition were latency to a fixation at the relevant AOI, density of fixations at the relevant AOI, and first fixation at the relevant AOI. The top variables in the testing condition were the first, second, and third fixations.
Table 2
Adult Experiment: Variables determined most relevant during the category learning and category discrimination phases of the adult experiment. Boldface entries show variables that were consistently determined most relevant using all feature selection algorithms and on two separate category object conditions. Underlined entries show variables that were determined most relevant by at least two feature selection algorithms and across both category conditions. ANOVA, NBR, and L1-LR correspond to the different feature selection algorithms. AOI 4 is relevant in the category A or B condition, and corresponds to AOI 6 in the category C or D condition. Shorthand: fixation (fix), saccade (sac), relevant (rel), density (den), latency (lat), distance histogram bin (DHB).

A or B, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | Lat to rel AOI fix | Lat to rel AOI fix | Den of fix at AOI 4
2 | Den of fix at AOI 4 | Den of fix at AOI 4 | Lat to rel AOI fix
3 | AOI 4, DHB 2 | AOI 4, DHB 2 | 2nd fix at AOI 4
4 | 2nd fix at AOI 4 | 2nd fix at AOI 4 | 5th fix at AOI 4
5 | 1st fix at AOI 4 | 1st fix at AOI 4 | 3rd fix at rel AOI

C or D, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | Lat to rel AOI fix | Den of fix at AOI 6 | Den of fix at AOI 6
2 | Den of fix at AOI 6 | Lat to rel AOI fix | Lat to rel AOI fix
3 | AOI 6, DHB 5 | AOI 6, DHB 2 | 1st fix at rel AOI
4 | 1st fix at AOI 6 | 1st fix at AOI 6 | 1st sac to rel AOI
5 | 1st fix at non-rel AOI | 2nd fix at AOI 6 | Den of fix at AOI 1

A or B, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | 3rd fix at non-rel AOI | 2nd fix at non-rel AOI | 2nd fix at non-rel AOI
2 | 2nd fix at non-rel AOI | 2nd sac to non-rel AOI | 3rd fix at non-rel AOI
3 | 2nd sac to non-rel AOI | 1st fix at non-rel AOI | 1st fix at non-rel AOI
4 | 1st fix at non-rel AOI | Duration of 3rd fix | 2nd sac to non-rel AOI
5 | Number AOIs fixated | 3rd fix at non-rel AOI | 1st sac to non-rel AOI

C or D, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | 4th fix at non-rel AOI | Rel AOI fix density | 2nd sac to non-rel AOI
2 | 3rd fix at non-rel AOI | 1st fix at non-rel AOI | 1st fix at non-rel AOI
3 | 2nd sac to non-rel AOI | Den of fix at AOI 13 | Rel AOI fix density
4 | Number AOIs fixated | 1st fix at rel AOI | 2nd fix at non-rel AOI
5 | 3rd sac to non-rel AOI | 2nd fix at non-rel AOI | 3rd fix at non-rel AOI
Infant Experiment
We first labeled the infant trials as category learner or non-learner. This amounted to 135 learning class samples and 137 non-learning class samples for the learning phase, and 40 learning class samples and 40 non-learning class samples for the testing phase in the category A or B learning condition. The category C or D learning condition resulted in 139 learning class samples and 127 non-learning class samples for the learning phase, and 40 learning class samples and 40 non-learning class samples for the testing phase.
Figure 7. Leave-one-experimental-block-out cross-validation accuracy for infant subjects as a function of the number of top ranked variables used for classification. We use the same conventions as in Fig. 6.
As in the adult experiment, the indeterminate samples were not used. After labeling the data and extracting the variables from each gaze sequence, each sample resulted in a 334-dimensional feature vector for both the learning and testing phase samples.

The three linear classifiers discussed above were applied to determine the LOBO-CV accuracy as a function of the number of top features selected by the three different feature selection algorithms. The results are shown in Fig. 7, where we see that classifying infants requires significantly more variables than in the adult case. This is to be expected because of the diffuse looking pattern typical of babies. The top infant variables are shown in Table 3. The underlined entries were consistently selected by at least two feature selection algorithms and across both category conditions. The consistent top variables in the learning and testing conditions were density of fixations and DHB, which describes the density of fixations at different distances from the relevant AOI(s). The fourth fixation was also relevant in the testing condition.
The above results raise a new question: how similar are the attention models of adults and infants? Specifically, since the infant data are so noisy, can we use the adult model to improve on the infant one? To test this, we used the adult SVM classifier model, trained with the top five variables from ANOVA, to predict whether infants were learners or non-learners.
Table 3
Infant Experiment: Variables determined most relevant during the category learning and category discrimination phases of the infant experiment. The consistently selected variables are underlined. We use the same conventions as Table 2.

A or B, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | Den of fix at AOI 10 | Den of fix at AOI 2 | Den of fix at AOI 10
2 | 3rd fix at AOI 10 | Den of fix at AOI 10 | 1st sac to AOI 5
3 | AOI 11, DHB 5 | AOI 11, DHB 5 | Den of fix at AOI 1
4 | Den of sac to AOI 10 | AOI 11, DHB 35 | 2nd fix at AOI 1
5 | AOI 4, DHB 20 | AOI 11, DHB 22 | Den of fix at AOI 2
6 | 2nd fix at AOI 10 | Den of sac to AOI 2 | AOI 11, DHB 5
7 | 2nd fix at AOI 1 | Den of fix at AOI 1 | AOI 11, DHB 22
8 | Den of fix at AOI 1 | Den of fix at AOI 9 | 2nd sac to AOI 3
9 | Den of fix at AOI 2 | AOI 4, DHB 7 | AOI 11, DHB 16
10 | AOI 11, DHB 22 | AOI 4, DHB 12 | Den of fix at AOI 3

C or D, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | AOI 13, DHB 5 | 4th fix at AOI 2 | AOI 13, DHB 5
2 | AOI 6, DHB 21 | 3rd sac to AOI 2 | AOI 6, DHB 21
3 | Den of fix at AOI 13 | 1st fix at AOI 5 | Den of fix at AOI 13
4 | AOI 13, DHB 2 | 3rd sac to non-rel AOI | 3rd sac to AOI 10
5 | 3rd sac to non-rel AOI | 3rd fix at AOI 6 | 4th fix at AOI 10
6 | 1st fix at AOI 14 | 3rd fix at AOI 9 | 3rd sac to non-rel AOI
7 | 1st fix at non-rel AOI | AOI 6, DHB 21 | 4th fix at AOI 5
8 | Den of fix at AOI 14 | AOI 13, DHB 5 | 1st fix at AOI 14
9 | AOI 6, DHB 8 | 1st fix at non-rel AOI | 5th fix at AOI 3
10 | 4th fix at AOI 5 | 4th fix at AOI 4 | 4th sac to AOI 1

A or B, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | 6th fix at non-rel AOI | 6th fix at non-rel AOI | 6th fix at non-rel AOI
2 | AOI 4, DHB 20 | AOI 4, DHB 20 | AOI 4, DHB 8
3 | AOI 4, DHB 8 | Den of fix at AOI 10 | 4th fix at AOI 7
4 | 4th fix at AOI 7 | Den of sac to AOI 10 | Den of fix at AOI 13
5 | Den of sac to AOI 10 | 4th fix at AOI 7 | AOI 4, DHB 20
6 | AOI 4, DHB 10 | AOI 4, DHB 8 | AOI 11, DHB 16
7 | Den of fix at AOI 10 | AOI 4, DHB 10 | 7th fix at AOI 7
8 | 7th fix at AOI 10 | Number unique AOIs fixated | AOI 4, DHB 10
9 | 1st sac to AOI 10 | 1st fix at AOI 13 | Den of sac to AOI 10
10 | 1st fix at AOI 1 | 2nd fix at AOI 14 | Den of fix at AOI 1

C or D, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | Den of fix at AOI 7 | Den of fix at AOI 7 | 3rd sac to AOI 3
2 | 3rd sac to AOI 3 | 3rd fix at AOI 7 | Den of fix at AOI 7
3 | 1st fix at AOI 7 | 6th fix at AOI 7 | 4th fix at AOI 10
4 | 4th fix at AOI 10 | Duration of 2nd fix | 4th fix at AOI 12
5 | 3rd fix at AOI 7 | AOI 13, DHB 2 | 2nd fix at AOI 1
6 | 6th fix at AOI 7 | 3rd sac to AOI 3 | 2nd sac to AOI 8
7 | 2nd fix at AOI 1 | 1st fix at AOI 7 | 2nd fix at AOI 2
8 | 2nd fix at AOI 2 | 4th fix at AOI 7 | 6th fix at AOI 8
9 | 1st sac to AOI 10 | 4th fix at AOI 10 | 2nd sac to AOI 1
10 | AOI 13, DHB 12 | AOI 6, DHB 7 | Den of fix at AOI 1
This was done only for the testing phase, because the testing phase images for adults and infants are similar, so the extracted variables correspond. Infants were classified with 49% accuracy in the category A or B condition and with 50% accuracy in the category C or D condition. This chance-level performance of the adult model in identifying infant learners suggests that adults and infants attend to category objects differently. The remaining challenge is to examine the generality of this finding by testing a broader set of categories.

Discussion
The analysis demonstrates that the proposed method of variable selection is viable. We can predict whether adults have learned a category based on a very small number of top ranked eye tracking variables. Furthermore, there is strong agreement between the different ranking approaches about which variables are most important. Specifically, the consistently top ranked variables in the learning condition were latency to a fixation at the relevant AOI, density of fixations at the relevant AOI, and first fixation at the relevant AOI. The consistently top ranked variables in the testing condition were the first, second, and third fixations. These results suggest that during learning, adult category learners focus their attention on the relevant category features. The results also suggest that adult category learners make discrimination judgments within the first few fixations.

The infant data analysis also demonstrated that we can predict category learning, but doing so requires a larger number of variables. Again, there was agreement between the different ranking approaches about which variables are most important. The consistent top variables in the learning and testing conditions describe the fixation density at different areas of the image. The fourth fixation was also relevant in the testing condition. These results suggest that for infants, the pattern of fixations over the entire object is more informative than the amount of time spent fixating the relevant AOI. Therefore, it appears that whereas category learning in adults is marked by focused attention to category-relevant features, category learning in infants is marked by more diffuse attention coupled with exploration of multiple areas of interest. Finally, we showed that the adult model does not predict infant category learning. We address these findings in the following sections.

Why were the best variables different for infants and adults?

There is an important difference between the variable selection results of the adult experiment versus the infant experiment. Namely, while adult learners are readily identified with a small set of variables emphasizing early looks at the relevant AOI(s), infant learners are better identified based on their pattern of fixating over the trial. We propose an explanation based on the goals of adult versus infant participants. Although the experimental stimuli were the same for adults and infants, there were fundamental differences in the design of the experiments. Namely, the objectives during the experiment were different for adults versus infants. In the case of adults, the participants were given a particular task: learn how to identify a member of this category from a set of exemplars, then identify a member of that category from a pair of objects. Therefore, the adults' goal was to learn the category object as quickly as possible given the limited number of training examples, such that discrimination could be performed accurately during the testing phase. Given this goal, it was reasonable that the consistently selected variables were associated with relevant AOI fixation density as well as early looks (see Table 2).

In the case of infants, we used sound and motion to draw the infant's attention to the relevant AOI in hopes that he or she learned to identify the category object.
Then, we assumed that if the category was learned, the infant would show a preference for either the learned category or the novel category during the discrimination phase. To this uncertainty, we ought to add the large amount of random movement in the infant's gaze. As we see in our results, a larger set of variables is required to reliably distinguish learners from non-learners. In addition, while fixation density is important, the emphasis is not on fixating the relevant AOI.
Conclusion
We have developed a methodology for automatically determining eye tracking variables that are relevant to understanding category learning and discrimination processes. Previous research has relied on ad-hoc techniques to determine which variables should be analyzed. Instead, we used statistical methods to find the important variables within an over-complete set of variables.

The efficacy of the approach was verified with an adult and an infant categorization study. The variables determined most relevant for adults emphasize looking at the relevant AOI(s) longer, and earlier, during the categorization tasks. This result is satisfying for two reasons: 1) it is expected that category learners quickly focus their efforts on the relevant AOI(s), and 2) these variables coincide with the variables proportion fixation time and relative priority of previous eye tracking category learning studies such as Rehder and Hoffman (2005). The variables determined most relevant for infants emphasize the overall pattern of fixating the object. This result is also satisfying because infants are expected to explore objects.

Note that the important variables were verified for the task and stimuli described here. Altering these parameters may result in different important variables. By comparing the important variables among different tasks and stimuli, we can further dissociate which eye tracking variables are linked to specific processes during categorization.

Acknowledgments
This research was partially supported by NIH grant R01 EY-020834 to AM, NSF grant BCS-0720135 and NIH grant R01 HD-056105 to VS, and a Seed Grant from the Center for Cognitive Science (CCS) at OSU to DBW, VS, and AM. SR was partially supported by a fellowship from the CCS.
References
Amso, D., & Johnson, S. P. (2005). Selection and inhibition in infancy: Evidence from the spatial negative priming paradigm. Cognition, (2), B27–B36.
Amso, D., & Johnson, S. P. (2006). Learning by selection: Visual search and object perception in young infants. Developmental Psychology, (6), 1236–1245.
Amso, D., & Johnson, S. P. (2008). Development of visual selection in 3- to 9-month-olds: Evidence from saccades to previously ignored locations. Infancy, 675–686.
Best, C. A., Robinson, C. W., & Sloutsky, V. M. (2010). The effect of labels on visual attention: An eye tracking study. Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 1846–1851.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 121–167.
Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: A library for support vector machines [Computer software manual]. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification (2nd ed.). Wiley-Interscience.
Falck-Ytter, T., Gredebäck, G., & von Hofsten, C. (2006). Infants predict other people's action goals. Nature Neuroscience, (7), 878–879.
Johnson, S. P., Amso, D., & Slemmer, J. A. (2003). Development of object concepts in infancy: Evidence for early learning in an eye-tracking paradigm. Proceedings of the National Academy of Sciences, (18), 10568–10573.
Johnson, S. P., Davidow, J., Hall-Haro, C., & Frank, M. C. (2008). Development of perceptual completion originates in information acquisition. Developmental Psychology, (5), 1214–1224.
Johnson, S. P., Slemmer, J. A., & Amso, D. (2004). Where infants look determines how they see: Eye movements and object perception performance in 3-month-olds. Infancy, (2), 185–201.
Kloos, H., & Sloutsky, V. M. (2008). What's behind different kinds of kinds: Effects of statistical density on learning and representation of categories. Journal of Experimental Psychology: General, (1), 52–72.
Martinez, A. M., & Zhu, M. (2005). Where are linear feature extraction methods applicable? IEEE Transactions on Pattern Analysis and Machine Intelligence, (12), 1934–1944.
McMurray, B., & Aslin, R. N. (2004). Anticipatory eye movements reveal infants' auditory and visual categories. Infancy, (2), 203–229.
Murphy, K. (2004). Kalman filter toolbox for Matlab [Computer software].
Quinn, P. C., Doran, M. M., Reiss, J. E., & Hoffman, J. E. (2009). Time course of visual attention in infant categorization of cats versus dogs: Evidence for a head bias as revealed through eye tracking. Child Development, (1), 151–161.
Quinn, P. C., Eimas, P. D., & Rosenkrantz, S. L. (1993). Evidence for representations of perceptually similar natural categories by 3-month-old and 4-month-old infants. Perception, (4), 463–475.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, (3), 372–422.
Rehder, B., & Hoffman, A. B. (2005). Eyetracking and selective attention in category learning. Cognitive Psychology, 1–41.
Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications (pp. 71–78). New York, NY, USA.
Schmidt, M. (2011). L1General: MATLAB code for solving L1-regularization problems [Computer software].
Stampe, D. M. (1993). Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems. Behavior Research Methods, Instruments, & Computers, 25.