Automatic selection of eye tracking variables in visual categorization in adults and infants
Samuel Rivera, Catherine A. Best, Hyungwook Yim, Dirk B. Walther, Vladimir M. Sloutsky, Aleix M. Martinez
Department of Electrical and Computer Engineering, The Ohio State University; Department of Psychology, The Ohio State University; Center for Cognitive Science, The Ohio State University; Department of Psychology, University of Toronto
Visual categorization and learning of visual categories exhibit early onset; however, the underlying mechanisms of early categorization are not well understood. The main limiting factor for examining these mechanisms is the short duration of infant cooperation (10-15 minutes), which leaves little room for multiple test trials. With its tight link to visual attention, eye tracking is a promising method for getting access to the mechanisms of category learning. But how should researchers decide which aspects of the rich eye tracking data to focus on? To date, eye tracking variables are generally handpicked, which may introduce bias into the analysis. Here, we propose an automated method for selecting eye tracking variables based on analyses of their usefulness in discriminating learners from non-learners of visual categories. We presented infants and adults with a category learning task and tracked their eye movements. We then extracted an over-complete set of eye tracking variables encompassing durations, probabilities, latencies, and the order of fixations and saccadic eye movements. We compared three statistical techniques for identifying those variables among this large set that are useful for discriminating learners from non-learners: ANOVA ranking, Bayes ranking, and L1 regularized logistic regression. We found remarkable agreement between these methods in identifying a small set of discriminant variables. Moreover, the top eye tracking variables allow us to identify category learners among adults and 6- to 8-month-old infants.

Keywords: eye tracking, infant category learning, eye tracking variables
Introduction
Categorization is the process of forming an equivalence class, such that discriminable entities elicit a common representation and/or a common response. While category learning exhibits early onset (Quinn, Eimas, & Rosenkrantz, 1993), relatively little is known about the underlying mechanism and the development of early categorization. The primary reason is the limited duration of infants' cooperation, yielding only a small number of data points per participant. These limitations have restricted researchers' ability to answer fundamental questions about categorization in infants: How do infants learn a category? And does this process undergo development?

Analyses of eye movements may help solve some of these problems: eye movements are tightly linked to visual attention (see Rayner, 1998, for a review), and they yield multiple (albeit not necessarily independent) data points even for relatively short trial durations. Therefore, analyses of eye movements can provide critical information about how attention allocation changes in the course of category learning. However, eye tracking produces a large amount of data, and it is not clear a priori which (if any) components of eye movement are related to category learning. As a result, eye tracking researchers are free to choose from a large set of variables without a common set of principles for deciding which or how many variables to analyze. In the following, we review the infant category learning eye tracking literature in order to substantiate this claim. Given the limited number of eye tracking categorization studies with infants, we take a broader approach and review studies that examined categorization, object completion, and visual attention.

One variable that has been used across a variety of tasks is saccade latency (Amso & Johnson, 2006; Johnson, Amso, & Slemmer, 2003). For example, Johnson, Amso, and Slemmer (2003) examined whether learning affects object representations in infancy. Four- and six-month-old infants were presented with an object that moved behind an occluder and then reemerged on the other side of the occluder. The researchers reasoned that if babies maintain the existence of the occluded object, they should anticipate the object to reemerge from the occluder. In this case, participants should exhibit a faster eye movement to the point of reemergence than if they do not anticipate the object. In two other studies (Amso & Johnson, 2005, 2008), researchers used saccade latency to examine the development of visual selection. Participants (3-, 6-, and 9-month-olds and adults) were presented with a Spatial Negative Priming (SNP) paradigm. On a given trial they were shown an attention-grabbing target in Location 1 and a discreet distracter in Location 2. On the next trial, they were either shown the target in Location 2 (a negative priming probe) or in Location 3 (a control trial). SNP was inferred from greater saccade latency on the probe trials than on the control trials.

Other potentially informative variables are (a) frequencies of fixations per unit of time within one or more Areas of Interest (AOIs), (b) dwell times within one or more AOIs, and (c) frequencies of saccades within and between AOIs (Johnson, Slemmer, & Amso, 2004). In one study, Johnson and colleagues (2004) examined the development of object unity perception in infancy using both behavioral and eye tracking data.
In the task, participants were habituated to a rod moving behind an occluder. After participants habituated, the occluder was removed to reveal either a broken rod or a complete rod. Infants who perceived the rod as moving behind the occluder as a coherent object, indicated by a recovery of looking after habituation, were identified as perceivers. Participants who perceived a broken rod did not recover their looking after habituation and were considered non-perceivers. The authors then examined eye tracking data for perceivers and non-perceivers using the eye tracking variables described above.

Perhaps the most frequently used eye tracking variable is fixation location. Researchers have relied on this variable across a variety of tasks, including object completion (Amso & Johnson, 2006; Johnson, Davidow, Hall-Haro, & Frank, 2008), understanding other people's actions (Falck-Ytter, Gredebäck, & von Hofsten, 2006), and a variety of categorization and category learning tasks (Best, Robinson, & Sloutsky, 2010; McMurray & Aslin, 2004; Quinn, Doran, Reiss, & Hoffman, 2009). For example, Quinn et al. (2009) examined categorization of cats and dogs in 6- to 7-month-olds. They found that when items were presented in the canonical upright position, categorization accuracy was associated with a high proportion of looking to the head, whereas when items were presented in an inverted position, categorization was associated with a large proportion of looking to the body. Best et al. (2010) presented 16- to 24-month-olds with a category learning task. Categories included artificial items that had shapes in four locations, with two of the shapes being category relevant (i.e., present in all members of the category, but not in non-members) and two being irrelevant (i.e., exhibiting both within- and between-category variability). The researchers examined the proportion of fixations to category-relevant features and its change in the course of familiarization. A summary of the reviewed studies is presented in Table 1.

Several conclusions can be drawn from this brief review. First, multiple eye tracking variables have been used across studies to examine infants' learning. Second, although all these variables make intuitive sense, no formal selection process for these variables has been defined. This poses several concerns and questions. Namely, since different variables are used in different studies, do these variables correlate and thus provide redundant information? If not, why should any one variable be used instead of another? Should the variables be selected based on the specific categorization task, or should a fixed subset of eye tracking variables be used across all studies? Can we define a principled way of determining which variables to analyze in a given category learning study? The current study defines a methodology to address these questions and concerns.

Our approach was as follows. We extracted a large set of possible variables from the adult or infant gaze sequence during a categorization task (e.g., fixations, saccades, gaze sequences). Some of these variables have been used in analyzing categorization experiments, whereas others were new. Our goal was to use the power of statistics and machine learning to identify the eye tracking variables that best predict category learning in adults and subsequently in infants.
The significant contribution of this work is that it provides a systematic methodology for identifying eye tracking variables that are linked to category learning, thus allowing researchers to better understand category learning from eye tracking data. Furthermore, our results retrospectively validate the use of several variables from the eye tracking studies mentioned above.

Methods
Participants
Three category learning experiments were conducted: two focused on adults and one on infants. Twenty-four adults participated in Experiment 1. Forty-six adults who did not participate in Experiment 1 participated in Experiment 2. All adult participants had normal or corrected-to-normal vision and were undergraduate students at The Ohio State University participating for course credit.

In Experiment 3, fifteen 6- to 8-month-old infants participated. Parents provided written consent upon arrival at the laboratory. All parents reported their infants to be developing typically and in good health.
Materials
Category members were flower-like objects with six petals. An example object is shown in Fig. 1(a), with the petals enumerated for clarity. There were four different categories, each defined by a single petal having a distinguishing color and shape. Specifically, the category defining features were category A: a pink triangle at position 4; category B: a blue semi-circle at position 4; category C: an orange square at position 6; and category D: a yellow pentagon at position 6. Each object was uniquely associated with one category. That is, no one object exhibited the defining features for two or more categories. Stimuli were displayed on the computer screen subtending approximate horizontal and vertical visual angles of 11°. The eccentricity of the stimuli subtended an approximate horizontal visual angle of 14.4° and an approximate vertical visual angle of 11.5°.

During all three experiments, the participants' eye gaze was recorded using a Tobii T60 eye tracker (Falls Church, VA) at a sampling rate of 60 Hz while they sat approximately 60 cm away from the display screen.
Table 1
Comparison of previous eye tracking variables.

Source | Task | Eye Tracking Variable
Johnson et al., PNAS, 2003 | Object completion | Saccade latency
Amso and Johnson, Developmental Psychology, 2006 | Object completion | Proportion fixation to AOI
Johnson et al., Infancy, 2004 | Object completion | Fixation frequency, dwell time, saccade frequency
Johnson et al., Developmental Psychology, 2008 | Object completion | Proportion fixation to AOI
Falck-Ytter et al., Nature Neuroscience, 2006 | Goal perception | Proportion fixation to AOI, AOI fixation time
Amso and Johnson, Cognition, 2005 | Visual search | Saccade latency
Amso and Johnson, Infancy, 2008 | Visual search | Saccade latency
Quinn et al., Child Development, 2009 | Categorization | Proportion fixation to AOI
McMurray and Aslin, Infancy, 2004 | Category learning | Proportion fixation to AOI
Best, Robinson, and Sloutsky, Proceedings of the Cognitive Science Society, 2010 | Category learning | Proportion fixation to AOI
Figure 1. (a) Category object; (b) AOI example. Image (a) is an example category object used in the eye tracking study, with the Areas of Interest (AOIs) enumerated. Numbers were not displayed to the participants. Image (b) illustrates the concept of AOIs. Each stick figure is divided into 3 AOIs containing the head, torso, and legs. The relevant AOI for gender discrimination is bracketed in red. Only the head AOI is relevant because the other AOIs are the same across both categories.
Experiment 1 - Adult supervised
To validate the efficacy of the approach before applying it to infants, adult participants were tested. In Experiment 1, participants were instructed to look for a single distinguishing feature prior to the start of the experiment. Previous research suggests that this hint (i.e., a form of supervised learning) has large consequences with respect to how quickly participants learn to classify the objects, especially when there are few overlapping features (Kloos & Sloutsky, 2008).

The experiment had 8 blocks, where each block consisted of 8 learning trials followed by 4 testing trials. In a learning trial, a category member was displayed in the center of the screen, one at a time, for a fixed duration. In a testing trial, two objects were displayed at equal and opposite horizontal visual angles from the center of the screen. Test stimuli were displayed on the screen until the participant made a decision via key press about which stimulus was a member of the learned category. The left/right position of the test stimuli was counterbalanced. A randomly located fixation point (cross-hair) directed the participant's gaze to a position on the monitor in between trials. The to-be-learned category remained the same for the first 4 blocks. A second to-be-learned category was introduced in the final 4 blocks without notice to the participant. If the experiment started with a category defined by the petal at position 4 (category A or B), the second category was defined by the petal at position 6 (category C or D), and vice-versa. Using categories having definitive features at different positions provided a mechanism to verify the reproducibility of the variables determined most important.

Figure 2. Illustration of a category pair image with the AOIs labeled (numbered 1-14). Numbers were not shown to participants.
Experiment 2 - Adult unsupervised
The procedure in Experiment 2 (unsupervised condition) was identical to that in Experiment 1 except that participants did not receive supervision (i.e., no hint was provided) about the category structure.
Experiment 3 - Infant supervised
The infant experiment was conceptually similar to Experiment 1, but was methodologically adapted for infants by using a familiarization paradigm. To aid infant learning, category exemplars were shown in pairs on each trial. This was also done so that the presentation of stimuli in the learning and testing phases had an identical layout. An example with labeled AOIs is shown in Fig. 2. Furthermore, there was only a supervised condition, in which the infants were presented with a pre-trial fixation video of synchronized sound and motion (e.g., a looming flower petal with a corresponding whistle sound) to draw their attention to the single category-relevant feature. It should be noted that no unsupervised condition was conducted with infants because previous developmental research suggests supervision is necessary for young children to learn categories with a sparse category structure (Kloos & Sloutsky, 2008). Once the infant looked at the fixation video, the learning trial commenced. Infants had to accumulate 3 seconds of looking at the category exemplar pairs. Whenever an infant looked away, an attention-grabbing fixation was presented until the infant reconnected with the images on the screen. After accumulating 3 seconds of looking at the stimulus pair, the supervisory fixation video was again presented, followed by another learning image pair. This procedure was repeated for 8 blocks with 8 learning pairs per block.
In the testing phase, a novel category member was paired with a novel non-category member, as in the adult experiments. The standard assumption is that an infant can discriminate between the category and non-category objects if he or she displays a novelty or familiarity preference. There were two test trials per block, in which a novel exemplar from the learned category was paired with a novel exemplar from a novel category. Test trials were presented for a fixed duration of 6 seconds, and the left/right position of familiar and novel category objects was counterbalanced.

Collecting and filtering eye tracking data
Eye movements were monitored during object viewing with the Tobii T60 eye tracker. The system tracks eye movements by illuminating the eye with infrared light and capturing the corneal reflection at a frequency of 60 Hz (i.e., every 16.6 ms). As the eye moves, the angle between the pupil and the corneal reflection changes, allowing the x-y coordinates of the gaze position to be measured over time.

Unfortunately, the gaze data contain noise, missing data, and micro-saccades, which makes identifying true fixations and saccades difficult. Therefore, we processed these data using MATLAB-based software created in our laboratory by the first author. The raw eye tracking data from every experimental block were filtered using a Kalman filter (Murphy, 2004) before extracting the variables of interest. The eye gaze data from the left and right eyes were filtered separately. The average of the filtered data from the left and right eyes yielded the mean eye gaze data, which were used in the current analyses.
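As an illustration of this preprocessing step, the sketch below smooths one eye's gaze trace with a constant-velocity Kalman filter. This is a minimal sketch, not the laboratory's MATLAB implementation: the function name, the noise scales q and r, and the handling of missing samples are all illustrative assumptions.

```python
import numpy as np

def kalman_smooth_gaze(xy, dt=1 / 60.0, q=50.0, r=5.0):
    """Smooth a (T, 2) array of raw gaze coordinates with a
    constant-velocity Kalman filter. q and r are illustrative
    process/measurement noise scales, not the paper's values.
    Assumes the first sample is valid."""
    A = np.array([[1, 0, dt, 0],   # state transition over [x, y, vx, vy]
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],    # only the x-y position is observed
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)              # process noise covariance
    R = r * np.eye(2)              # measurement noise covariance
    x = np.array([xy[0, 0], xy[0, 1], 0.0, 0.0])
    P = np.eye(4)
    out = np.empty_like(xy, dtype=float)
    for t, z in enumerate(xy):
        # predict
        x = A @ x
        P = A @ P @ A.T + Q
        # update, skipping missing (NaN) samples
        if np.all(np.isfinite(z)):
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            x = x + K @ (z - H @ x)
            P = (np.eye(4) - K @ H) @ P
        out[t] = x[:2]
    return out
```

Left- and right-eye traces would each be passed through such a filter separately and then averaged, as described above.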
Labeling the Data

The eye movement sequences during the learning phase of the experiment aid in understanding category learning, while the sequences during the testing phase aid our understanding of category use. Before applying our methodology to understand these processes, however, the eye tracking data from both the learning and testing phases of the experiments were labeled as learner (class 1), non-learner (class 0), or indeterminate (class 2). Indeterminate samples were not analyzed.
Adult Labels: Intuitively, labels for adult data are readily identified based on the accuracy of the responses during the testing phase. An uninterrupted string of correct responses during the testing phase suggests that the participant has learned the category. Each adult experimental block yielded 12 eye movement sequences. These correspond to eye movements during the presentation of 8 exemplar images during the learning phase and 4 test images during the testing phase. Adult participants had 4 blocks of learning and discriminating the same category before switching to a new category. This amounted to 32 samples of the learning phase and 16 samples of the testing phase for each category per participant. The 16 samples from the testing phase were associated with a 16-digit binary string, called the response string. This data structure shows performance over the first and last 4 blocks of the experiment. A one identifies a correct response, while a zero denotes an incorrect response on the associated test trial. An example is shown in Fig. 3.

Cat. A: 1001010101111111
Cat. C: 1001011111111111
Figure 3. Illustration of the response strings for one subject. Ones encode correct category discriminations, while zeros encode incorrect responses. The first row shows the accuracy over the first four blocks (presentation of the first category), while the second row shows accuracy over the last four blocks (presentation of the second category). The class labels (learner or non-learner) are determined separately for each row, because the category condition is different for each row.

We labeled each 16-digit response string separately as follows. We expect a learner's response string to contain a series of ones beginning within the string and terminating at the end of the response string. This pattern indicates that at some point the participant learned the category and correctly discriminated the category from that point on. A participant who has not learned the category (a non-learner) would select one of the two stimuli by chance on each trial. A non-learner could get lucky and achieve a series of correct guesses. In order to determine whether a participant is a learner or a non-learner, we need to establish a criterion that allows us to reject chance as the cause of a series of ones. The question we need to answer is how many ones we should expect from a learner. We address this problem by assessing how likely it is that we see a sequence of M consecutive ones in a binary response string of length R = 16. Under the null hypothesis, the participant does not know the category label and selects one of the stimuli by chance, giving her a 50% chance of correctly guessing the category member. Each sequence is equally likely given this assumption, so the probability of guessing at least M right in a row is at most the number of sequences having M ones in a row, (R − M + 1) × 2^(R−M), divided by the total number of binary sequences of length R, 2^R. This yields the probability p = (R − M + 1)/2^M. For R = 16, the criterion run length satisfying p < .01 is M = 10 (p ≈ .0068). The position at which this criterion run of ones began defined the point of learning (POL). Test phase and learning phase samples before the POL were labeled as non-learner, while the samples after the POL were labeled as learner. The learning phase samples from the block associated with the POL were labeled as indeterminate, because it was unclear at exactly which trial during the block the category was learned.

If the learning criterion was not achieved, we then identified the remaining non-learner and indeterminate samples. We first labeled correct responses at the end of the response string as indeterminate. Those samples did not meet the learning criterion, but might be attributable to learning late in the experiment. The remaining samples were labeled as non-learner. Approximately 8% of the adult eye track samples were labeled indeterminate.

Infant Labels: Obviously, infants are not able to respond by keyboard to identify a category object. Instead, we used a variant of the preferential looking paradigm to determine whether an infant could discriminate between novel exemplars of a familiar category object and a novel category object. Recall that the preferential looking paradigm assumes that infants who consistently look more at one class of stimuli when shown two classes of stimuli are able to discriminate between the two classes. This means that if the infant consistently looks longer at the learned category object (or the novel category object), then he or she is assumed to be discriminating between the familiar and novel categories.

Given this paradigm, we labeled each infant's gaze data by blocks. Each block consisted of two test phase samples. We determined novelty preference as the ratio of total looking time to the novel category object to the total looking time to the novel category object plus the familiar category object. We sorted the mean novelty preference for each block according to its absolute difference from 0.5.
The third of the blocks with mean novelty preference closest to 0.5 were labeled as non-learner, the third farthest from 0.5 were labeled as learner, and the remaining third were labeled as indeterminate.
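To make the learner criterion concrete, the sketch below computes the chance probability p = (R − M + 1)/2^M, derives the criterion run length for R = 16, and scans a response string for the point of learning. The function names are ours, and the criterion length is derived from the formula above rather than taken from the original analysis code.

```python
def run_probability(R, M):
    """Upper bound on the chance that a guessing participant produces
    at least M consecutive correct responses in a string of length R."""
    return (R - M + 1) / 2.0 ** M

def criterion_run_length(R=16, alpha=0.01):
    """Smallest run length M whose chance probability falls below alpha."""
    return next(M for M in range(1, R + 1) if run_probability(R, M) < alpha)

def point_of_learning(responses, M):
    """Index where a run of at least M correct responses begins and
    continues uninterrupted to the end of the string; None if absent."""
    R = len(responses)
    for i in range(R - M + 1):
        if all(responses[i:]):
            return i
    return None

M = criterion_run_length()           # M = 10 for R = 16, p ~ .0068
pol = point_of_learning([1] * 16, M)  # 0: a learner from the first test trial
```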
Variable List

We compiled an over-complete list of eye tracking variables. We began with the fundamental variables, fixations and saccades. Fixations occur when eye gaze is maintained at a single position for at least 100 ms. They were identified using the dispersion threshold algorithm of Salvucci and Goldberg (2000). Saccades are rapid eye movements that move the eye gaze between points of fixation. To be considered a saccade, the eye movement needed to exceed the smooth pursuit velocity of 30° per second, or 0.5° per sample at 60 Hz (Stampe, 1993). The fixations and saccades were determined with respect to a specific AOI within an object. AOIs are regions of an object image or scene that can be grouped in some meaningful way, such as color uniformity or the structural nature of the object. The AOIs can further be described as relevant or non-relevant, based on their role in determining object category membership. Fig. 1(b) illustrates this concept for the stick-figure gender category. In this toy example only the head is relevant for category membership, because the torso and legs are identical across stick-figures and thus do not help one to discriminate gender.

These fundamental eye tracking variables were combined in various ways to derive a larger set of variables. Our variable list is defined as follows:

1. AOI fixation percentage describes the percentage of time fixated at the different AOIs during a trial. All non-AOI fixations were discarded in this and all of the variables defined. For an image with q AOIs, this variable was encoded as a q-dimensional feature vector with a value for each AOI. The fixation percentages were normalized so that they sum to 1, unless there were no fixations at AOIs. In that case, all percentages were set to 0.

2. Relevant AOI fixation density is a scalar value between zero and one which describes the percentage of the total fixation time spent at the relevant AOI(s).

3. AOI fixation sequence describes the sequence of AOI fixations during one trial. We limited this sequence to seven fixations, starting with trial onset (not counting fixations to the fixation mark). We encoded a fixation sequence of f fixations over q AOIs as a q × f binary matrix, where each column of the matrix had a 1 in the position corresponding to the AOI which was fixated, and zero otherwise. If there were fewer than f fixations, the last columns were set to 0. This binary encoding of the fixation sequence allowed us to describe any sequence of fixations without imposing an ordering of the AOIs. In addition, the fixation sequence was represented as a sequence of relevant and non-relevant AOI fixations. This representation yielded a 2 × f binary matrix, in which each column had a 1 in the first row if a relevant AOI was fixated or a 1 in the second row if a non-relevant AOI was fixated. If there were fewer than f fixations, the last columns were set to 0. The analysis showed that the latter representation was more informative in some cases. Note that it was necessary to use a pair of binary variables to encode each fixation of the latter representation because it allowed for three cases: fixation at a relevant AOI, fixation at a non-relevant AOI, and fewer than f fixations. The number of fixations to consider as well as the start position were determined using cross validation (CV). In cross validation, the training data are separated into k partitions, and for each partition, samples are classified using a classifier that is trained with the remaining k − 1 partitions; the parameter value yielding the highest average classification accuracy is selected.

4. Duration of fixations in sequence describes the duration of each fixation in the sequence described by variable 3. This variable was encoded by an f-dimensional vector.

5. Total distance traveled by eye is a scalar describing the total distance traveled by the eye gaze during a trial.

6. Histogram of fixation distances to relevant AOI describes how much time is spent fixated near or far from the relevant AOI(s). A histogram with h bins and an image with r relevant AOIs yielded an h × r dimensional matrix. Each column corresponds to a different relevant AOI, and each row corresponds to a particular range of distances from that AOI. The entries define the percentage of time fixated at the distance ranges, so each column sums to 1. If no fixations occurred, all values were set to 0. The number of bins was determined using CV. The bins corresponding to AOI 4 are illustrated in Fig. 4.

7. Number of unique AOIs visited is a scalar describing the total number of unique AOIs fixated during a trial. AOI revisits were not counted as new.

8. Saccade sequence is similar to variable 3 but describes the sequence of AOI saccades during one trial. All saccades whose targets were not AOIs were discarded in this and all of the variables defined. The sequence was limited to seven saccades, starting at the first saccade. The number of saccades to consider as well as the start saccade were determined using CV. We encoded a saccade sequence of s saccades over q AOIs as a q × s binary matrix. Each column of the matrix had a 1 in the position corresponding to the AOI which was the target of the saccade, and zero otherwise. If there were fewer than s saccades, the last column(s) were set to 0. In addition, the saccade sequence was represented as a sequence of saccades to relevant and non-relevant AOIs. This representation yielded a 2 × s binary matrix, with each column containing a 1 in the first row if saccading to a relevant AOI or a 1 in the second row if saccading to a non-relevant AOI. If there were fewer than s saccades, the last column(s) were set to 0.

9. Relative number of saccades to an AOI is the saccade analogue of variable 1 and describes the relative number of saccades to the AOIs during one eye movement sequence. An image with q AOIs yielded a q-dimensional feature vector with each entry counting the number of saccade targets at the corresponding AOI. The vector was normalized by the sum of all entries such that the entries added to 1, unless there were no saccades. In that case, all entries were set to 0.

10. Fixation latency to relevant AOI describes the delay before fixating at a relevant AOI during an eye movement sequence. It was encoded as a scalar between 0 and 1, with 0 corresponding to fixating a relevant AOI immediately and 1 describing a sequence with no fixation on a relevant AOI. The value was computed as the start time of the first relevant AOI fixation divided by the total eye track time.

11. Saccade latency to relevant AOI describes the delay before a saccade to a relevant AOI. It was also encoded as a scalar between 0 and 1, defined by the end time of the first saccade to a relevant AOI divided by the total eye gaze time.

Thus, eye movements were represented by a feature vector x = (x_1, x_2, ..., x_d)^T whose d entries correspond to the variables described. Each feature x_i was normalized to zero mean and unit variance over the entire dataset. In addition, each x was associated with a class label, y ∈ {0, 1}. For clarity, features denote the entries of the feature vector which encode the eye tracking variables, while variables correspond to the measures of eye tracking enumerated above. Therefore, d is much larger than 11, because encoding certain variables requires multiple feature values. Note that d was the same for all feature vectors corresponding to images having the same number of AOIs and relevant AOIs, because a fixed number of fixations and saccades were analyzed.

In the case of a single category object having one relevant AOI, variable 2 is identical to one of the values of variable 1. Therefore, after extracting all variables from the gaze data of all participants, we did a simple redundancy check to eliminate cases of identical valued features. For features x_i, x_j to be identical, they must mirror each other over all feature vectors for a particular category condition and within either the learning or testing phase. In addition, the information encoded by several of these features overlaps. This over-complete representation allows us to find the encoding that is best suited to describe the categorization task. To this end, we performed variable selection on this over-complete set.
Figure 4. Illustration of the histogram bins for distance to AOI 4, with bins numbered: (a) single object, (b) object pair. Variable 6 describes the percentage of time fixating within each bin for each relevant AOI. Bin sizes were determined using CV.
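To make the encodings concrete, the sketch below constructs the q × f binary matrix of variable 3 and the fixation latency scalar of variable 10 from a list of AOI fixations. Function and argument names are ours, and AOIs are indexed from zero for convenience.

```python
import numpy as np

def aoi_fixation_sequence(fix_aois, q, f=7):
    """Variable 3: q x f binary matrix; column j has a 1 in the row of
    the AOI fixated j-th. Columns beyond the last fixation stay 0."""
    m = np.zeros((q, f), dtype=int)
    for j, aoi in enumerate(fix_aois[:f]):
        m[aoi, j] = 1          # AOIs indexed 0..q-1 here
    return m

def fixation_latency(fix_onsets, fix_aois, relevant, total_time):
    """Variable 10: start time of the first relevant-AOI fixation divided
    by the total trial time; 1 if no relevant AOI is ever fixated."""
    for onset, aoi in zip(fix_onsets, fix_aois):
        if aoi in relevant:
            return onset / total_time
    return 1.0

seq = aoi_fixation_sequence([3, 3, 5, 0], q=6)   # 4 fixations over 6 AOIs
lat = fixation_latency([0.2, 0.5, 0.9], [1, 3, 3], {3}, total_time=1.5)
```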
Variable Selection
Our goal was to identify the subset of variables from the set defined above that best separates the classes: category learners and non-learners. This was achieved using ANOVA feature selection by ranking, Naive Bayes Ranking (NBR), and L1 logistic regression (L1-LR).
ANOVA feature selection relies on a standard hypothesis test on each feature of x. Specifically, let x_i denote the i-th feature of x. Using a dataset of eye tracking feature vectors and the associated class labels, we performed a two-tailed t-test of the null hypothesis, which states that samples of x_i coming from classes 1 and 0 are independent random samples from normal distributions with equal means, µ_{i1} and µ_{i0}, respectively. The alternative says that the class means are different. We calculated the test statistic and the corresponding p-value. A low p-value means the null hypothesis is rejected with confidence. Since the goal was to find the variables which best separate the classes, the feature with the lowest p-value was ranked as best. The p-values were calculated for all features x_i, i = 1, ..., d, and the features were ranked from best to worst according to increasing p-values.

Naive Bayes Ranking (NBR) assumes that if the labeled feature vectors can be accurately classified given a single feature, x_i, then that feature separates the two classes well. In essence, the classification accuracy is a surrogate for the class separability achieved by the particular feature. Therefore, the features are ranked from best to worst according to decreasing classification accuracy.

The Bayes classifier assigns a sample, x, to the class having the highest posterior probability. More formally, assume that the class-conditional density functions of a feature given its class, p(x_i | y), are modeled as normally distributed with mean and variance µ_{iy} and σ_{iy}², respectively. Then, by applying the Bayes formula, the posterior probability of class y is P(y | x_i) = p(x_i | y) P(y) / p(x_i), where P(y) is the prior of class y and p(x_i) is a scale factor which ensures that the probabilities sum to 1. In this work, we set P(y = 1) = P(y = 0) = 0.5, corresponding to the assumption that a priori a sample is equally likely to come from a learner as from a non-learner.
The scale factor is the same for both classes, so it can be omitted in the classification rule. Finally, the predicted class label, ŷ, is given by

ŷ = arg max_{j ∈ {0,1}} p(x_i | y = j) P(y = j).   (1)

L1 Logistic Regression (L1-LR) is a linear classifier model which returns the probability that a sample belongs to a particular class. It accomplishes this by modeling the natural logarithm of the ratio, or odds, of the two class probabilities as a linear function of x. More formally,

ln( p(y = 1 | x) / (1 − p(y = 1 | x)) ) = w^T x − b,   (2)

where ln denotes the natural logarithm. The two class probabilities are then given by

p(y = 1 | x) = 1 / (1 + exp(−w^T x + b)),
p(y = 0 | x) = exp(−w^T x + b) / (1 + exp(−w^T x + b)).

The parameters, w and b, are estimated via Maximum Likelihood (ML) estimation. A regularization term λ is introduced to penalize large elements of w. Using an L1-norm regularizer yields a sparse model. More formally, the regularized ML objective is

ŵ = arg max_{w,b} Σ_{i=1}^{N} log P(y_i | x_i) − λ ||w||_1,   (3)

where (x_i, y_i), i = 1, ..., N, are the full feature vectors and their associated labels, λ is a user-determined, real-valued, positive regularization parameter, and ||·||_1 denotes the L1-norm. Increasing the value of λ results in more elements of w being shrunk to zero, i.e., a sparser model. Variable selection is performed by increasing the value of λ until a desired number of elements of w are non-zero. The elements of x corresponding to the non-zero elements of w are the top ranked variables. These top ranked variables can then be sorted from best to worst by sorting the corresponding entries of w in order of descending absolute magnitude. We use the L1-LR implementation of Schmidt (2011).

Each method results in a ranking of the features, x_i, from best to worst. If we collect the indices of the t top ranked features in the vector k = (k_1, k_2, ..., k_t)^T, then after feature selection x = (x_{k_1}, x_{k_2}, ..., x_{k_t})^T.
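A minimal sketch of the three rankings is given below, assuming a feature matrix X (samples × d, already z-scored) and labels y in {0, 1}. It uses scipy and scikit-learn stand-ins rather than the MATLAB implementations cited above, and, for brevity, the NBR accuracy is computed on the training data rather than on held-out data.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def anova_rank(X, y):
    """Rank features by ascending two-sample t-test p-value."""
    p = np.array([stats.ttest_ind(X[y == 1, i], X[y == 0, i]).pvalue
                  for i in range(X.shape[1])])
    return np.argsort(p)

def nbr_rank(X, y):
    """Rank features by descending single-feature Gaussian Bayes
    classification accuracy, with equal class priors."""
    acc = [GaussianNB(priors=[0.5, 0.5]).fit(X[:, [i]], y).score(X[:, [i]], y)
           for i in range(X.shape[1])]
    return np.argsort(acc)[::-1]

def l1lr_rank(X, y, C=0.1):
    """Rank features by |w| of an L1-penalized logistic regression;
    C is the inverse of the regularization strength lambda."""
    w = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y).coef_[0]
    return np.argsort(np.abs(w))[::-1]
```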
Linear Classification

Once the important variables were identified, we used them to classify the gaze data as having originated from a learner or a non-learner. This required that we train a classifier to distinguish between the two classes of data. Recall that each eye movement sequence resulted in a feature vector, or sample, x. A classifier defines a decision rule for predicting whether a sample is from class 0 or class 1. A linear classifier was used because of its ease of interpretation (Martinez & Zhu, 2005): the absolute model weights give the relative importance of the eye tracking variables. We illustrate this in Fig. 5 with a 2-dimensional linear classifier model specified by w and b.

Figure 5. Illustration of a linear classifier. w is the normal vector of the hyperplane which separates the feature space into two decision regions, and b is the distance from the origin to the hyperplane. The blue circles represent samples from class 1, while the green squares represent samples from class 0. All but one of the blue circles lie on the positive side of the hyperplane and are classified correctly.

In this model, w is the normal vector of the hyperplane which separates the feature space into two decision regions, and b is the distance from the origin to the hyperplane (i.e., the offset). All samples x above the hyperplane are assigned to class 1, while the samples below are assigned to class 0. Data samples x lying on the boundary satisfy w^T x − b = 0.
Therefore, samples are classified according to the sign of w^T x − b. In the example of Fig. 5, the second entry of w has the larger absolute magnitude, so the second dimension, x_2, is more informative for classification. Note that in our case the feature space has not two but up to 334 dimensions, depending on the cut-off for variable selection.

Several varieties of linear classifiers exist. In this work, we used the Bayes classifier with equal covariances, L1-LR, and the Support Vector Machine (SVM) algorithm.
Bayes with equal covariances (Bayes): When both classes are assumed to be multivariate normally distributed with the same covariance Σ, means µ_1 and µ_0, and equal priors, the Bayes classifier decision boundary is a hyperplane given by w = Σ^{−1}(µ_1 − µ_0) and b = w^T(µ_1 + µ_0)/2 (Duda, Hart, & Stork, 2001).
L1 Logistic Regression (L1-LR): Recall that L1-LR yields the probability that a sample belongs to a particular class. It uses the model of Equation (2), where w defines the normal of the hyperplane and the sign of w^T x − b determines the class label.
Support Vector Machine (SVM): SVM is a linear classifier which maximizes the margin between the two classes of data (Burges, 1998). In the case that the training samples are perfectly separable by a hyperplane, we can find w and b such that the data satisfy the following constraints:

x_i^T w − b ≥ 1 for y_i = 1,   (4)
x_i^T w − b ≤ −1 for y_i = 0.   (5)

Essentially, these constraints specify that the samples from the different classes reside on opposite sides of the decision boundary. The margin between the classes, defined by 2/||w||_2, where ||·||_2 denotes the L2-norm, is then maximized subject to the above constraints. The dual formulation of the constrained optimization problem results in a quadratic program for w and b. In the case that samples from each class are not linearly separable, a penalty is introduced to penalize the amount by which a sample falls on the wrong side of the hyperplane. Again, the dual formulation results in a quadratic program for w and b. We used the implementation of Chang and Lin (2001).
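Of the three classifiers, the Bayes classifier with equal covariances has the simplest closed form. The sketch below computes the hyperplane from the class statistics under the stated assumptions (a pooled covariance estimate, equal priors); the names are illustrative.

```python
import numpy as np

def bayes_equal_cov(X, y):
    """Closed-form linear discriminant: w = inv(Sigma)(mu1 - mu0),
    b = w^T (mu1 + mu0) / 2, assuming equal class priors."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    X0, X1 = X[y == 0] - mu0, X[y == 1] - mu1
    sigma = (X0.T @ X0 + X1.T @ X1) / (len(X) - 2)   # pooled covariance
    w = np.linalg.solve(sigma, mu1 - mu0)
    b = w @ (mu1 + mu0) / 2
    return w, b

def predict(X, w, b):
    """Assign class 1 to samples on the positive side of the hyperplane."""
    return (X @ w - b > 0).astype(int)
```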
Classification Accuracy

The classification accuracy used for adults was the leave-one-subject-out cross-validation (LOSO-CV) accuracy. In LOSO-CV, the samples belonging to one participant are sequestered, and the remaining samples are used to train the classifier. The sequestered samples are then classified with the learned classifier, and the procedure is repeated for every participant in the database. The total number of correctly classified samples divided by the total number of samples is the LOSO-CV accuracy.

The classification accuracy used for infants was the leave-one-experiment-block-out cross-validation (LOBO-CV) accuracy. This alternative accuracy measure makes more effective use of the eye movement data when the sample size is very small. In LOBO-CV, the samples belonging to one experiment block are sequestered, and the remaining samples are used to train the classifier. The sequestered samples are then classified with the learned classifier, and the procedure is repeated for every block in the database. The total number of correctly classified samples divided by the total number of samples is the LOBO-CV accuracy.
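A sketch of the LOSO-CV loop is shown below, assuming a per-sample array of subject IDs; scikit-learn's LeaveOneGroupOut implements exactly this split. Passing block IDs instead of subject IDs as the groups yields the LOBO-CV variant used for the infants.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC

def loso_cv_accuracy(X, y, subject_ids, make_clf=lambda: LinearSVC()):
    """Train on all subjects but one, classify the held-out subject's
    samples, and pool correct classifications over all folds."""
    correct = 0
    for train, test in LeaveOneGroupOut().split(X, y, groups=subject_ids):
        clf = make_clf().fit(X[train], y[train])
        correct += (clf.predict(X[test]) == y[test]).sum()
    return correct / len(y)
```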
Results

Adult Experiment
We first labeled the adult trials as category learner or non-learner. This resulted in 728 learning class samples and 1,256 non-learning class samples for the learning phase, and 473 learning class samples and 601 non-learning class samples for the testing phase in the category A or B learning condition. There were 496 learning class samples and 1,568 non-learning class samples for the learning phase, and 323 learning class samples and 717 non-learning class samples for the testing phase in the category C or D learning condition. The indeterminate samples were not used in any of the experiments. We then extracted the eye tracking variables from each trial's gaze sequence. Each labeled data sample resulted in a 182-dimensional feature vector for the learning phase samples and a 334-dimensional feature vector for the testing phase samples.
Figure 6. Leave-one-subject-out cross-validation accuracy for adult subjects as a function of the number of top ranked variables used for classification. The first two rows show results for the learning phase of the experiments (categories AB and CD, respectively). The last two rows show the results for the testing phase of the experiments (categories AB and CD, respectively). ANOVA, NBR, and L1-LR correspond to ANOVA feature selection, Naive Bayes feature selection, and L1 penalized logistic regression feature selection, respectively. AB and CD correspond to category object A or B and C or D, respectively. In almost all cases, the classification accuracy was near the maximum after including very few features and did not change much when including more. Chance level is plotted as the accuracy resulting from classifying each sample as the most common class.

We applied the variable selection algorithms to identify the most important variables for separating learners from non-learners, and validated those variables using the three linear classifiers. The LOSO-CV accuracy is reported as a function of the number of top features used for classification in Fig. 6. Recall that the features encode the eye tracking variables. The results show that a very small number of features yields a high classification rate, and including more features does not improve the accuracy.

The stable performance beyond just a few features suggests that a small number of variables is sufficient for discriminating learners and non-learners. The top five variables for ANOVA, NBR, and L1-LR are listed in Table 2. We boldfaced the variables that were consistently ranked in the top five across both the category A or B and category C or D conditions and all feature selection algorithms. We underlined variables that were consistently ranked in the top five by at least two of the three feature selection algorithms and across both the category A or B and category C or D conditions. Note that AOI 4 for the category A or B condition is equivalent to AOI 6 in the category C or D condition. The consistent top variables in the learning condition were latency to a fixation at the relevant AOI, density of fixations at the relevant AOI, and first fixation at the relevant AOI. The top variables in the testing condition were the first, second, and third fixations.
Table 2
Adult Experiment: Variables determined most relevant during the category learning and category discrimination phases of the adult experiment. Boldface entries show variables that were consistently determined most relevant using all feature selection algorithms and on two separate category object conditions. Underlined entries show variables that were determined most relevant by at least two feature selection algorithms and across both category conditions. ANOVA, NBR, and L1-LR correspond to the different feature selection algorithms. AOI 4 is relevant in the category A or B condition, and corresponds to AOI 6 in the category C or D condition. Shorthand: fixation (fix), saccade (sac), relevant (rel), density (den), latency (lat), distance histogram bin (DHB).

A or B, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | Lat to rel AOI fix | Lat to rel AOI fix | Den of fix at AOI 4
2 | Den of fix at AOI 4 | Den of fix at AOI 4 | Lat to rel AOI fix
3 | AOI 4, DHB 2 | AOI 4, DHB 2 | 2nd fix at AOI 4
4 | 2nd fix at AOI 4 | 2nd fix at AOI 4 | 5th fix at AOI 4
5 | 1st fix at AOI 4 | 1st fix at AOI 4 | 3rd fix at rel AOI

C or D, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | Lat to rel AOI fix | Den of fix at AOI 6 | Den of fix at AOI 6
2 | Den of fix at AOI 6 | Lat to rel AOI fix | Lat to rel AOI fix
3 | AOI 6, DHB 5 | AOI 6, DHB 2 | 1st fix at rel AOI
4 | 1st fix at AOI 6 | 1st fix at AOI 6 | 1st sac to rel AOI
5 | 1st fix at non-rel AOI | 2nd fix at AOI 6 | Den of fix at AOI 1

A or B, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | 3rd fix at non-rel AOI | 2nd fix at non-rel AOI | 2nd fix at non-rel AOI
2 | 2nd fix at non-rel AOI | 2nd sac to non-rel AOI | 3rd fix at non-rel AOI
3 | 2nd sac to non-rel AOI | 1st fix at non-rel AOI | 1st fix at non-rel AOI
4 | 1st fix at non-rel AOI | Duration of 3rd fix | 2nd sac to non-rel AOI
5 | Number AOIs fixated | 3rd fix at non-rel AOI | 1st sac to non-rel AOI

C or D, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | 4th fix at non-rel AOI | Rel AOI fix density | 2nd sac to non-rel AOI
2 | 3rd fix at non-rel AOI | 1st fix at non-rel AOI | 1st fix at non-rel AOI
3 | 2nd sac to non-rel AOI | Den of fix at AOI 13 | Rel AOI fix density
4 | Number AOIs fixated | 1st fix at rel AOI | 2nd fix at non-rel AOI
5 | 3rd sac to non-rel AOI | 2nd fix at non-rel AOI | 3rd fix at non-rel AOI
Infant Experiment
We first labeled the infant trials as category learner or non-learner. This amounted to 135 learning class samples and 137 non-learning class samples for the learning phase, and 40 learning class samples and 40 non-learning class samples for the testing phase in the category A or B learning condition. The category C or D learning condition resulted in 139 learning class samples and 127 non-learning class samples for the learning phase, and 40 learning class samples and 40 non-learning class samples for the testing phase.
Figure 7. Leave-one-experimental-block-out cross-validation accuracy for infant subjects as a function of the number of top ranked variables used for classification. We use the same conventions as in Fig. 6.
As in the adult experiment, the indeterminate samples were not used. After labeling the data and extracting the variables from each gaze sequence, each sample resulted in a 334-dimensional feature vector for both the learning and testing phase samples.

The three linear classifiers discussed above were applied to determine the LOBO-CV accuracy as a function of the number of top features selected by the three different feature selection algorithms. The results are shown in Fig. 7, where we see that classifying infants requires significantly more variables than in the adult case. This is to be expected because of the diffuse looking pattern typical of babies. The top infant variables are shown in Table 3. The underlined entries were consistently selected by at least two feature selection algorithms and across both category conditions. The consistent top variables in the learning and testing conditions were density of fixations and DHB, which describes the density of fixations at different distances from the relevant AOI(s). The fourth fixation was also relevant in the testing condition.
The above results raise a new question: how similar are the attention models of adults and infants? Specifically, since the infant data are so noisy, can we use the adult model to improve on the infant one? To test this, we used the adult SVM classifier model, trained with the top five variables from ANOVA, to predict whether infants were learners or non-learners.
Table 3
Infant Experiment: Variables determined most relevant during the category learning and category discrimination phases of the infant experiment. The consistently selected variables are underlined. We use the same conventions as Table 2.

A or B, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | Den of fix at AOI 10 | Den of fix at AOI 2 | Den of fix at AOI 10
2 | 3rd fix at AOI 10 | Den of fix at AOI 10 | 1st sac to AOI 5
3 | AOI 11, DHB 5 | AOI 11, DHB 5 | Den of fix at AOI 1
4 | Den of sac to AOI 10 | AOI 11, DHB 35 | 2nd fix at AOI 1
5 | AOI 4, DHB 20 | AOI 11, DHB 22 | Den of fix at AOI 2
6 | 2nd fix at AOI 10 | Den of sac to AOI 2 | AOI 11, DHB 5
7 | 2nd fix at AOI 1 | Den of fix at AOI 1 | AOI 11, DHB 22
8 | Den of fix at AOI 1 | Den of fix at AOI 9 | 2nd sac to AOI 3
9 | Den of fix at AOI 2 | AOI 4, DHB 7 | AOI 11, DHB 16
10 | AOI 11, DHB 22 | AOI 4, DHB 12 | Den of fix at AOI 3

C or D, Learning condition
Rank | ANOVA | NBR | L1-LR
1 | AOI 13, DHB 5 | 4th fix at AOI 2 | AOI 13, DHB 5
2 | AOI 6, DHB 21 | 3rd sac to AOI 2 | AOI 6, DHB 21
3 | Den of fix at AOI 13 | 1st fix at AOI 5 | Den of fix at AOI 13
4 | AOI 13, DHB 2 | 3rd sac to non-rel AOI | 3rd sac to AOI 10
5 | 3rd sac to non-rel AOI | 3rd fix at AOI 6 | 4th fix at AOI 10
6 | 1st fix at AOI 14 | 3rd fix at AOI 9 | 3rd sac to non-rel AOI
7 | 1st fix at non-rel AOI | AOI 6, DHB 21 | 4th fix at AOI 5
8 | Den of fix at AOI 14 | AOI 13, DHB 5 | 1st fix at AOI 14
9 | AOI 6, DHB 8 | 1st fix at non-rel AOI | 5th fix at AOI 3
10 | 4th fix at AOI 5 | 4th fix at AOI 4 | 4th sac to AOI 1

A or B, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | 6th fix at non-rel AOI | 6th fix at non-rel AOI | 6th fix at non-rel AOI
2 | AOI 4, DHB 20 | AOI 4, DHB 20 | AOI 4, DHB 8
3 | AOI 4, DHB 8 | Den of fix at AOI 10 | 4th fix at AOI 7
4 | 4th fix at AOI 7 | Den of sac to AOI 10 | Den of fix at AOI 13
5 | Den of sac to AOI 10 | 4th fix at AOI 7 | AOI 4, DHB 20
6 | AOI 4, DHB 10 | AOI 4, DHB 8 | AOI 11, DHB 16
7 | Den of fix at AOI 10 | AOI 4, DHB 10 | 7th fix at AOI 7
8 | 7th fix at AOI 10 | Number unique AOIs fixated | AOI 4, DHB 10
9 | 1st sac to AOI 10 | 1st fix at AOI 13 | Den of sac to AOI 10
10 | 1st fix at AOI 1 | 2nd fix at AOI 14 | Den of fix at AOI 1

C or D, Testing condition
Rank | ANOVA | NBR | L1-LR
1 | Den of fix at AOI 7 | Den of fix at AOI 7 | 3rd sac to AOI 3
2 | 3rd sac to AOI 3 | 3rd fix at AOI 7 | Den of fix at AOI 7
3 | 1st fix at AOI 7 | 6th fix at AOI 7 | 4th fix at AOI 10
4 | 4th fix at AOI 10 | Duration of 2nd fix | 4th fix at AOI 12
5 | 3rd fix at AOI 7 | AOI 13, DHB 2 | 2nd fix at AOI 1
6 | 6th fix at AOI 7 | 3rd sac to AOI 3 | 2nd sac to AOI 8
7 | 2nd fix at AOI 1 | 1st fix at AOI 7 | 2nd fix at AOI 2
8 | 2nd fix at AOI 2 | 4th fix at AOI 7 | 6th fix at AOI 8
9 | 1st sac to AOI 10 | 4th fix at AOI 10 | 2nd sac to AOI 1
10 | AOI 13, DHB 12 | AOI 6, DHB 7 | Den of fix at AOI 1
This was done only for the testing phase, because the testing phase images for adults and infants are similar, so the extracted variables correspond. Infants were classified with 49% accuracy in the category A or B condition and with 50% accuracy in the category C or D condition. This chance-level performance of the adult model in identifying infant learners suggests that adults and infants attend to category objects differently. The remaining challenge is to examine the generality of this finding by testing a broader set of categories.

Discussion
The analysis demonstrates that the proposed method of variable selection is viable. We can predict whether adults have learned a category based on a very small number of top ranked eye tracking variables. Furthermore, there is strong agreement between the different ranking approaches about which variables are most important. Specifically, the consistently top ranked variables in the learning condition were latency to a fixation at the relevant AOI, density of fixations at the relevant AOI, and first fixation at the relevant AOI. The consistently top ranked variables in the testing condition were the first, second, and third fixations. These results suggest that during learning, adult category learners focus their attention on the relevant category features. The results also suggest that adult category learners make discrimination judgments within the first few fixations.

The infant data analysis also demonstrated that we can predict category learning, but doing so requires a larger number of variables. Again, there was agreement between the different ranking approaches about which variables are most important. The consistent top variables in the learning and testing conditions describe the fixation density at different areas of the image. The fourth fixation was also relevant in the testing condition. These results suggest that for infants, the pattern of fixations over the entire object is more informative than the amount of time spent fixating the relevant AOI. Therefore, it appears that whereas category learning in adults is marked by focused attention to category-relevant features, category learning in infants is marked by more diffuse attention coupled with exploration of multiple areas of interest. Finally, we showed that the adult model does not predict infant category learning. We address these findings in the following sections.

Why were the best variables different for infants and adults?

There is an important difference between the variable selection results of the adult experiment versus the infant experiment. Namely, while adult learners are readily identified with a small set of variables emphasizing early looks at the relevant AOI(s), infant learners are better identified based on their pattern of fixating over the trial. We propose an explanation based on the goals of adult versus infant participants. Although the experimental stimuli were the same for adults and infants, there were fundamental differences in the design of the experiments. Namely, the objectives during the experiment were different for adults versus infants. In the case of adults, the participants were given a particular task: learn how to identify a member of this category from a set of exemplars, then identify a member of that category from a pair of objects. Therefore, the adults' goal was to learn the category object as quickly as possible given the limited number of training examples, such that discrimination could be performed accurately during the testing phase. Given this goal, it was reasonable that the consistently selected variables were associated with relevant AOI fixation density as well as early looks (see Table 2).

In the case of infants, we used sound and motion to draw the infant's attention to the relevant AOI in hopes that he or she learned to identify the category object.
Then, we assumed that if the category was learned, the infant would show a preference for either the learned category or the novel category during the discrimination phase. To this uncertainty, we ought to add the large amount of random movement in the infant's gaze. As we see in our results, a larger set of variables is required to reliably distinguish learners from non-learners. In addition, while fixation density is important, the emphasis is not on fixating the relevant AOI.
Conclusion
We have developed a methodology for automatically determining eye tracking variables that are relevant to understanding category learning and discrimination processes. Previous research has relied on ad-hoc techniques to determine which variables should be analyzed. Instead, we used statistical methods to find the important variables within an over-complete set of variables.

The efficacy of the approach was verified with an adult and an infant categorization study. The variables determined most relevant for adults emphasize looking at the relevant AOI(s) longer, and earlier, during the categorization tasks. This result is satisfying for two reasons: 1) it is expected that category learners quickly focus their efforts on the relevant AOI(s), and 2) these variables coincide with the variables proportion fixation time and relative priority of previous eye tracking category learning studies such as Rehder and Hoffman (2005). The variables determined most relevant for infants emphasize the overall pattern of fixating the object. This result is also satisfying because infants are expected to explore objects.

Note that the important variables were verified for the task and stimuli described here. Altering these parameters may result in different important variables. By comparing the important variables among different tasks and stimuli, we can further dissociate which eye tracking variables are linked to specific processes during categorization.

Acknowledgments
This research was partially supported by NIH grant R01 EY-020834 to AM, NSF grant BCS-0720135 and NIH grant R01 HD-056105 to VS, and a Seed Grant from the Center for Cognitive Science (CCS) at OSU to DBW, VS, and AM. SR was partially supported by a fellowship from the CCS.
References
Amso, D., & Johnson, S. P. (2005). Selection and inhibition in infancy: Evidence from the spatial negative priming paradigm. Cognition, (2), B27–B36.
Amso, D., & Johnson, S. P. (2006). Learning by selection: Visual search and object perception in young infants. Developmental Psychology, (6), 1236–1245.
Amso, D., & Johnson, S. P. (2008). Development of visual selection in 3- to 9-month-olds: Evidence from saccades to previously ignored locations. Infancy, 675–686.
Best, C. A., Robinson, C. W., & Sloutsky, V. M. (2010). The effect of labels on visual attention: An eye tracking study. Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 1846–1851.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 121–167.
Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: A library for support vector machines [Computer software manual]. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification (2nd ed.). Wiley-Interscience.
Falck-Ytter, T., Gredebäck, G., & von Hofsten, C. (2006). Infants predict other people's action goals. Nature Neuroscience, (7), 878–879.
Johnson, S. P., Amso, D., & Slemmer, J. A. (2003). Development of object concepts in infancy: Evidence for early learning in an eye-tracking paradigm. Proceedings of the National Academy of Sciences, (18), 10568–10573.
Johnson, S. P., Davidow, J., Hall-Haro, C., & Frank, M. C. (2008). Development of perceptual completion originates in information acquisition. Developmental Psychology, (5), 1214–1224.
Johnson, S. P., Slemmer, J. A., & Amso, D. (2004). Where infants look determines how they see: Eye movements and object perception performance in 3-month-olds. Infancy, (2), 185–201.
Kloos, H., & Sloutsky, V. M. (2008). What's behind different kinds of kinds: Effects of statistical density on learning and representation of categories. Journal of Experimental Psychology: General, (1), 52–72.
Martinez, A. M., & Zhu, M. (2005). Where are linear feature extraction methods applicable? IEEE Transactions on Pattern Analysis and Machine Intelligence, (12), 1934–1944.
McMurray, B., & Aslin, R. N. (2004). Anticipatory eye movements reveal infants' auditory and visual categories. Infancy, (2), 203–229.
Murphy, K. (2004). Kalman filter toolbox for Matlab [Computer software].
Quinn, P. C., Doran, M. M., Reiss, J. E., & Hoffman, J. E. (2009). Time course of visual attention in infant categorization of cats versus dogs: Evidence for a head bias as revealed through eye tracking. Child Development, (1), 151–161.
Quinn, P. C., Eimas, P. D., & Rosenkrantz, S. L. (1993). Evidence for representations of perceptually similar natural categories by 3-month-old and 4-month-old infants. Perception, (4), 463–475.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, (3), 372–422.
Rehder, B., & Hoffman, A. B. (2005). Eyetracking and selective attention in category learning. Cognitive Psychology, 1–41.
Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications (pp. 71–78). New York, NY, USA.
Schmidt, M. (2011). L1General: MATLAB code for solving L1-regularization problems [Computer software].
Stampe, D. M. (1993). Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems. Behavior Research Methods, Instruments, & Computers, 25.