Eleonora Vig
Xerox
Publications
Featured research published by Eleonora Vig.
Spatial Vision | 2009
Eleonora Vig; Michael Dorr; Erhardt Barth
We deal with the analysis of eye movements made on natural movies in free-viewing conditions. Saccades are detected and used to label two classes of movie patches as attended and non-attended. Machine learning techniques are then used to determine how well the two classes can be separated, i.e., how predictable saccade targets are. Although very simple saliency measures are used and then averaged to obtain just one average value per scale, the two classes can be separated with an ROC score of around 0.7, which is higher than previously reported results. Moreover, predictability is analysed for different representations to obtain indirect evidence for the likelihood of a particular representation. It is shown that the predictability correlates with the local intrinsic dimension in a movie.
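As a rough illustration of the separability analysis described above, the following Python sketch trains a simple classifier on hypothetical attended and non-attended movie patches, each described by one averaged saliency value per scale, and reports an ROC score. The synthetic data, number of scales, and classifier choice are placeholder assumptions, not the authors' actual setup.

    # Sketch: how well do simple per-scale saliency averages separate
    # attended from non-attended movie patches? (illustrative only)
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Hypothetical data: one averaged saliency value per scale (4 scales here)
    # for patches labelled via saccade targets (1 = attended, 0 = non-attended).
    X_attended = rng.normal(loc=1.0, scale=1.0, size=(500, 4))
    X_control  = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
    X = np.vstack([X_attended, X_control])
    y = np.concatenate([np.ones(500), np.zeros(500)])

    # Separate the two classes and report the ROC score (the paper reports
    # roughly 0.7 on real eye-movement data).
    clf = SVC(kernel="rbf", probability=True)
    scores = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    print("ROC AUC:", roc_auc_score(y, scores))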
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012
Eleonora Vig; Michael Dorr; Thomas Martinetz; Erhardt Barth
Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.
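A minimal sketch of what a spatiotemporal multiscale representation of a grayscale video can look like, as a stand-in for the multiresolution representations mentioned above; the smoothing and subsampling parameters are arbitrary assumptions.

    # Sketch: a tiny spatiotemporal multiresolution pyramid for a grayscale clip.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def spatiotemporal_pyramid(video, n_levels=3):
        """video: array of shape (frames, height, width), grayscale."""
        levels = [video.astype(float)]
        for _ in range(n_levels - 1):
            v = gaussian_filter(levels[-1], sigma=(1.0, 1.0, 1.0))  # smooth t, y, x
            levels.append(v[::2, ::2, ::2])                         # subsample time and space
        return levels

    clip = np.random.rand(32, 64, 64)          # hypothetical 32-frame clip
    for i, level in enumerate(spatiotemporal_pyramid(clip)):
        print("level", i, level.shape)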
Computer Vision and Pattern Recognition | 2016
Saumya Jetley; Naila Murray; Eleonora Vig
Most saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs and may additionally incorporate top-down cues using face or text detection. Data-driven methods for training saliency models using eye-fixation data are increasingly popular, particularly with the introduction of large-scale datasets and deep architectures. However, current methods in this latter paradigm use loss functions designed for classification or regression tasks whereas saliency estimation is evaluated on topographical maps. In this work, we introduce a new saliency map model which formulates a map as a generalized Bernoulli distribution. We then train a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions. We show in extensive experiments the effectiveness of such loss functions over standard ones on four public benchmark datasets, and demonstrate improved performance over state-of-the-art saliency methods.
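The following sketch illustrates the core idea of treating a saliency map as a probability distribution: a softmax over pixel scores paired with a distance between distributions (KL divergence shown here). It is an illustration only, not the exact losses or architecture from the paper, and all data is synthetic.

    # Sketch: softmax over map scores + a distribution distance to the
    # ground-truth fixation distribution.
    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max())
        return e / e.sum()

    def kl_divergence(p, q, eps=1e-12):
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    raw_scores = np.random.randn(48, 64)            # hypothetical network output
    pred = softmax(raw_scores.ravel())              # generalized Bernoulli map

    fix_counts = np.zeros(48 * 64)
    fix_counts[np.random.choice(48 * 64, size=30)] += 1   # hypothetical fixations
    target = fix_counts / fix_counts.sum()

    print("KL(target || prediction):", kl_divergence(target, pred))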
European Conference on Computer Vision | 2016
César Roberto De Souza; Adrien Gaidon; Eleonora Vig; Antonio M. López
Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty of acquiring and learning from large quantities of video data. Deep learning, although a breakthrough for image classification and showing promise for videos, has still not clearly superseded action recognition methods using hand-crafted features, even when training on massive datasets. In this paper, we introduce hybrid video classification architectures based on carefully designed unsupervised representations of hand-crafted spatio-temporal features classified by supervised deep networks. As we show in our experiments on five popular benchmarks for action recognition, our hybrid model combines the best of both worlds: it is data-efficient (trained on 150 to 10000 short clips) and yet improves significantly on the state of the art, including recent deep models trained on millions of manually labelled images and videos.
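A toy sketch of the hybrid idea: encode hand-crafted local descriptors with an unsupervised model and classify the resulting clip-level encoding with a small supervised network. A soft-assignment histogram stands in here for the Fisher vector used in the paper, and all descriptors and labels are synthetic.

    # Sketch: unsupervised encoding of hand-crafted descriptors + supervised classifier.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n_clips, descr_per_clip, d = 60, 200, 16

    # Hypothetical hand-crafted spatio-temporal descriptors per clip (two toy classes).
    clips = [rng.normal(size=(descr_per_clip, d)) + c % 2 for c in range(n_clips)]
    labels = np.array([c % 2 for c in range(n_clips)])

    gmm = GaussianMixture(n_components=8, random_state=0)
    gmm.fit(np.vstack(clips))                               # unsupervised codebook

    def encode(descriptors):
        return gmm.predict_proba(descriptors).mean(axis=0)  # clip-level encoding

    X = np.array([encode(c) for c in clips])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    clf.fit(X, labels)
    print("training accuracy:", clf.score(X, labels))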
Journal of Field Robotics | 2014
Michael Milford; Eleonora Vig; Walter J. Scheirer; David Cox
For robots operating in outdoor environments, a number of factors, including weather, time of day, rough terrain, high speeds, and hardware limitations, make vision-based simultaneous localization and mapping with current techniques infeasible, largely because of image blur and/or underexposure, especially on smaller platforms and low-cost hardware. In this paper, we present novel visual place-recognition and odometry techniques that address the challenges posed by low lighting, perceptual change, and low-cost cameras. Our primary contribution is a novel two-step algorithm that combines fast low-resolution whole image matching with a higher-resolution patch-verification step, as well as image saliency methods that simultaneously improve performance and decrease computing time. The algorithms are demonstrated using consumer cameras mounted on a small vehicle in a mixed urban and vegetated environment and a car traversing highway and suburban streets, at different times of day and night and in various weather conditions. The algorithms achieve reliable mapping over the course of a day, both when incrementally incorporating new visual scenes from different times of day into an existing map, and when using a static map comprising visual scenes captured at only one point in time. Using the two-step place-recognition process, we demonstrate for the first time single-image, error-free place recognition at recall rates above 50% across a day-night dataset without prior training or utilization of image sequences. This place-recognition performance enables topologically correct mapping across day-night cycles.
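A minimal sketch of the two-step matching idea described above: rank database images by fast whole-image comparison at low resolution, then verify the best candidates with a higher-resolution patch. Resolutions, patch size, and the sum-of-absolute-differences score below are illustrative assumptions, not the paper's parameters.

    # Sketch: low-resolution whole-image matching + high-resolution patch verification.
    import numpy as np

    def downsample(img, factor):
        return img[::factor, ::factor]

    def sad(a, b):
        return float(np.abs(a.astype(float) - b.astype(float)).mean())

    def match_place(query, database, n_candidates=3, patch=32):
        # Step 1: rank database images by low-resolution whole-image SAD.
        q_low = downsample(query, 8)
        ranked = sorted(range(len(database)),
                        key=lambda i: sad(q_low, downsample(database[i], 8)))
        # Step 2: verify the top candidates with a central high-resolution patch.
        cy, cx = query.shape[0] // 2, query.shape[1] // 2
        q_patch = query[cy - patch:cy + patch, cx - patch:cx + patch]
        best = min(ranked[:n_candidates],
                   key=lambda i: sad(q_patch,
                                     database[i][cy - patch:cy + patch,
                                                 cx - patch:cx + patch]))
        return best

    db = [np.random.randint(0, 255, (240, 320)) for _ in range(10)]
    print("best match index:", match_place(db[4] + 5, db))   # brightened copy of image 4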
International Conference on Robotics and Automation | 2014
Michael Milford; Walter J. Scheirer; Eleonora Vig; Arren Glover; Oliver Baumann; Jason B. Mattingley; David Cox
In this paper we present a novel, condition-invariant place recognition algorithm inspired by recent discoveries in human visual neuroscience. The algorithm combines intolerant but fast low-resolution whole-image matching with highly tolerant, sub-image patch matching processes. The approach does not require prior training and works on single images, alleviating the need for either a velocity signal or image sequence, differentiating it from current state-of-the-art methods. We conduct an exhaustive set of experiments evaluating the relationship between place recognition performance and computational resources using part of the challenging Alderley sunny day - rainy night dataset, which has only previously been solved by integrating over 320-frame-long image sequences. We achieve recall rates of up to 51% at 100% precision, matching places that have undergone drastic perceptual change while rejecting match hypotheses between highly aliased images of different places. Human trials demonstrate that the performance is approaching human capability. The results provide a new benchmark for single-image, condition-invariant place recognition.
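For reference, the headline metric quoted above (recall at 100% precision) can be computed from per-query match scores and ground-truth correctness as in the sketch below; the scores and labels are synthetic placeholders.

    # Sketch: recall at 100% precision from match confidences.
    import numpy as np

    def recall_at_full_precision(scores, correct):
        """scores: higher = more confident match; correct: per-query ground truth."""
        order = np.argsort(-np.asarray(scores))
        correct = np.asarray(correct)[order]
        recall = 0.0
        for k in range(1, len(correct) + 1):
            if correct[:k].all():                  # still at 100% precision
                recall = k / len(correct)          # fraction of query places matched
            else:
                break
        return recall

    scores  = [0.9, 0.85, 0.7, 0.6, 0.4, 0.3]
    correct = [True, True, True, False, True, True]
    print("recall at 100% precision:", recall_at_full_precision(scores, correct))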
Cognitive Computation | 2011
Eleonora Vig; Michael Dorr; Thomas Martinetz; Erhardt Barth
A less studied component of gaze allocation in dynamic real-world scenes is the time lag of eye movements in responding to dynamic attention-capturing events. Despite the vast amount of research on anticipatory gaze behaviour in natural situations, such as action execution and observation, little is known about the predictive nature of eye movements when viewing different types of natural or realistic scene sequences. In the present study, we quantify the degree of anticipation during the free viewing of dynamic natural scenes. The cross-correlation analysis of image-based saliency maps with an empirical saliency measure derived from eye movement data reveals the existence of predictive mechanisms responsible for a near-zero average lag between dynamic changes of the environment and the responding eye movements. We also show that the degree of anticipation is reduced when moving away from natural scenes by introducing camera motion, jump cuts, and film editing.
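The lag analysis above can be illustrated as follows: cross-correlate a model saliency time course with the empirical, eye-movement-derived saliency time course and read off the lag of the correlation peak. The signals below are synthetic and constructed to have near-zero lag, consistent with the result reported above.

    # Sketch: lag of peak cross-correlation between model and empirical saliency.
    import numpy as np

    fps = 30
    t = np.arange(300)
    model_saliency = np.sin(2 * np.pi * t / 60.0)            # hypothetical signal
    empirical_saliency = np.roll(model_saliency, 0)           # zero-lag case
    empirical_saliency += 0.1 * np.random.randn(len(t))       # observation noise

    a = model_saliency - model_saliency.mean()
    b = empirical_saliency - empirical_saliency.mean()
    xcorr = np.correlate(b, a, mode="full")
    lags = np.arange(-len(t) + 1, len(t))
    lag_frames = lags[np.argmax(xcorr)]
    print("lag of peak correlation: %d frames (%.0f ms)" % (lag_frames, 1000 * lag_frames / fps))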
International Conference on Image Processing | 2012
Eleonora Vig; Michael Dorr; David Cox
Local spatiotemporal descriptors are being successfully used as a powerful video representation for action recognition. Particularly competitive recognition performance is achieved when these descriptors are densely sampled on a regular grid; in contrast to existing approaches that are based on features at interest points, dense sampling captures more contextual information, albeit at high computational cost. We here combine advantages of both dense and sparse sampling. Once descriptors are extracted on a dense grid, we prune them either randomly or based on a sparse saliency mask of the underlying video. The method is evaluated using two state-of-the-art algorithms on the challenging Hollywood2 benchmark. Classification performance is maintained with as little as 30% of descriptors, while more modest saliency-based pruning of descriptors yields improved performance. With roughly 80% of descriptors of the Dense Trajectories model, we outperform all previously reported methods, obtaining a mean average precision of 59.5%.
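A minimal sketch of the pruning step described above: keep a densely sampled descriptor only if its location falls inside a sparse saliency mask, or prune at random at a matched rate as a baseline. The descriptors, positions, and mask are synthetic placeholders.

    # Sketch: saliency-based vs. random pruning of dense descriptors.
    import numpy as np

    rng = np.random.default_rng(0)
    n_desc, dim = 10000, 96
    descriptors = rng.normal(size=(n_desc, dim))              # dense descriptors
    positions = rng.integers(0, 240, size=(n_desc, 2))        # (y, x) per descriptor

    saliency_mask = np.zeros((240, 240), dtype=bool)          # sparse saliency mask
    saliency_mask[80:160, 80:160] = True                      # hypothetical salient region

    keep_saliency = saliency_mask[positions[:, 0], positions[:, 1]]
    keep_random = rng.random(n_desc) < keep_saliency.mean()   # matched pruning rate

    print("kept by saliency:", keep_saliency.sum(), "of", n_desc)
    print("kept at random:  ", keep_random.sum(), "of", n_desc)
    pruned = descriptors[keep_saliency]                       # descriptors passed to the classifier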
International Conference on Artificial Neural Networks | 2010
Eleonora Vig; Michael Dorr; Thomas Martinetz; Erhardt Barth
We investigate the extent to which eye movements in natural dynamic scenes can be predicted with a simple model of bottom-up saliency, which learns on different visual representations to discriminate between salient and less salient movie regions. Our image representations, the geometrical invariants of the structure tensor, are computed on multiple scales of an anisotropic spatio-temporal multiresolution pyramid. Eye movement data is used to label video locations as salient. For each location, low-dimensional features are extracted on the multiscale representations and used to train a classifier. The quality of the predictor is tested on a large test set of eye movement data and compared with the performance of two state-of-the-art saliency models on this data set. The proposed model demonstrates significant improvement - mean ROC score of 0.665 - over the selected baseline models with ROC scores of 0.625 and 0.635.
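The geometrical invariants mentioned above can be computed per voxel from the 3-D spatiotemporal structure tensor, as in the sketch below (H from the trace, S from the sum of 2x2 principal minors, K from the determinant). The smoothing scale and the toy video are arbitrary assumptions.

    # Sketch: invariants H, S, K of the spatiotemporal structure tensor.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    video = np.random.rand(16, 64, 64)                 # (t, y, x) toy clip
    gt, gy, gx = np.gradient(video)

    def smooth(a):
        return gaussian_filter(a, sigma=2.0)           # local averaging window

    Jtt, Jyy, Jxx = smooth(gt * gt), smooth(gy * gy), smooth(gx * gx)
    Jty, Jtx, Jyx = smooth(gt * gy), smooth(gt * gx), smooth(gy * gx)

    H = Jtt + Jyy + Jxx                                # trace
    S = (Jtt * Jyy - Jty**2) + (Jtt * Jxx - Jtx**2) + (Jyy * Jxx - Jyx**2)
    K = (Jtt * (Jyy * Jxx - Jyx**2)
         - Jty * (Jty * Jxx - Jyx * Jtx)
         + Jtx * (Jty * Jyx - Jyy * Jtx))              # determinant

    print("invariant maps:", H.shape, S.shape, K.shape)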
Visual Cognition | 2012
Michael Dorr; Eleonora Vig; Erhardt Barth
We here study the predictability of eye movements when viewing high-resolution natural videos. We use three recently published gaze data sets that contain a wide range of footage, from scenes of almost still-life character to professionally made, fast-paced advertisements and movie trailers. Intersubject gaze variability differs significantly between data sets, with variability being lowest for the professional movies. We then evaluate three state-of-the-art saliency models on these data sets. A model that is based on the invariants of the structure tensor and that combines very generic, sparse video representations with machine learning techniques outperforms the two reference models; performance is further improved for two data sets when the model is extended to a perceptually inspired colour space. Finally, a combined analysis of gaze variability and predictability shows that eye movements on the professionally made movies are the most coherent (due to implicit gaze-guidance strategies of the movie directors), yet the least predictable (presumably due to the frequent cuts). Our results highlight the need for standardized benchmarks to comparatively evaluate eye movement prediction algorithms.
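One simple way to quantify the intersubject gaze coherence discussed above is a leave-one-subject-out analysis: score each viewer's fixations against a map built from the remaining viewers and average the ROC values. The fixation data in this sketch is synthetic, and the map construction is a generic choice rather than the paper's exact procedure.

    # Sketch: leave-one-subject-out ROC as a coherence measure.
    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    h, w, n_subjects = 48, 64, 8
    # Hypothetical fixation positions per subject, loosely clustered.
    fixations = [np.clip(rng.normal([24, 32], 6, size=(20, 2)).astype(int),
                         0, [h - 1, w - 1]) for _ in range(n_subjects)]

    def fixation_map(fix_list):
        m = np.zeros((h, w))
        for fx in fix_list:
            m[fx[:, 0], fx[:, 1]] += 1
        return gaussian_filter(m, sigma=2.0)

    aucs = []
    for s in range(n_subjects):
        others = fixation_map([f for i, f in enumerate(fixations) if i != s])
        scores = others.ravel()
        labels = np.zeros(h * w)
        labels[np.ravel_multi_index((fixations[s][:, 0], fixations[s][:, 1]), (h, w))] = 1
        aucs.append(roc_auc_score(labels, scores))
    print("mean leave-one-subject-out ROC:", float(np.mean(aucs)))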