
Publication


Featured research published by Stefano Alletto.


Computer Vision and Pattern Recognition | 2014

From Ego to Nos-Vision: Detecting Social Relationships in First-Person Views

Stefano Alletto; Giuseppe Serra; Simone Calderara; Francesco Solera; Rita Cucchiara

In this paper we present a novel approach to detecting groups in ego-vision scenarios. People in the scene are tracked through the video sequence, and their head pose and 3D location are estimated. Based on the concept of the f-formation, we use orientation and distance to define an inherently social pairwise feature that describes the affinity of a pair of people in the scene. We then apply a correlation clustering algorithm that merges pairs of people into socially related groups. Because social interactions shift over time, and orientations and distances can take on different meanings in different contexts, we learn the weight vector of the correlation clustering using Structural SVMs. We extensively test our approach on two publicly available datasets, showing encouraging results when detecting groups from first-person camera views.
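
To make the pairwise feature concrete, here is a minimal Python sketch of an orientation-and-distance affinity between two tracked people, scored with a learned weight vector and merged greedily. The feature layout, the weights, and the greedy merge are illustrative assumptions; the paper learns the weights with a Structural SVM and solves a full correlation clustering.

```python
import numpy as np

def pairwise_social_feature(pos_a, pos_b, yaw_a, yaw_b):
    """Distance/orientation feature for one pair, inspired by f-formations.

    pos_*: 2D ground-plane positions as np.array([x, y]) (metres);
    yaw_*: head yaw (radians). The exact layout is an assumption.
    """
    d = np.linalg.norm(pos_a - pos_b)
    # Angle each person must turn to face the other (wrapped to [-pi, pi]).
    to_b = np.arctan2(pos_b[1] - pos_a[1], pos_b[0] - pos_a[0])
    to_a = np.arctan2(pos_a[1] - pos_b[1], pos_a[0] - pos_b[0])
    facing_a = abs(np.angle(np.exp(1j * (yaw_a - to_b))))
    facing_b = abs(np.angle(np.exp(1j * (yaw_b - to_a))))
    return np.array([d, facing_a + facing_b, 1.0])  # constant bias term

def detect_groups(positions, yaws, w):
    """Toy stand-in for correlation clustering: union pairs whose learned
    affinity w @ feature is positive, then read off connected components."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            f = pairwise_social_feature(positions[i], positions[j],
                                        yaws[i], yaws[j])
            if float(w @ f) > 0:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

With negative weights on distance and mutual facing angle and a positive bias, nearby people oriented toward each other score positive and end up in the same group.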


Pattern Recognition | 2015

Understanding social relationships in egocentric vision

Stefano Alletto; Giuseppe Serra; Simone Calderara; Rita Cucchiara

Understanding mutual interactions between people is a key component of recognizing social behavior, but it strongly depends on a personal point of view, which makes it difficult to model a priori. We propose adopting the unique first-person perspective of head-mounted cameras (ego-vision) to promptly detect people's interactions in different social contexts. The proposal relies on a complete and reliable system that extracts people's head pose by combining landmarks and shape descriptors in a temporally smoothed HMM framework. Interactions are then detected through supervised clustering on mutual head orientations and interpersonal distances, exploiting a structural learning framework that adjusts the clustering measure to the specific scenario. Our solution is flexible enough to capture interactions regardless of the number of individuals involved and their level of acquaintance, in contexts with a variable degree of social involvement. The proposed system shows competitive performance on both publicly available ego-vision datasets and ad hoc benchmarks built from real-life situations.

Highlights:
- A head pose estimation method designed to work in ego-vision scenarios.
- A 3D people localization method that works without any camera calibration.
- Social group estimation via supervised correlation clustering and Structural SVM.
- A state-of-the-art tracking evaluation applied to first-person videos.


International Conference on Pattern Recognition | 2014

Head Pose Estimation in First-Person Camera Views

Stefano Alletto; Giuseppe Serra; Simone Calderara; Rita Cucchiara

In this paper we present a new method for real-time head pose estimation in ego-vision scenarios, a key step in understanding social interactions. To robustly detect heads under changing aspect ratio, scale, and orientation, we use and extend the Hough-Based Tracker, which allows us to follow every subject in the scene simultaneously. In an ego-vision scenario where a group interacts in a discussion, each subject's head orientation is likely to remain focused for a while on the person who has the floor. To encode this behavior we include a stateful Hidden Markov Model that enforces temporal coherence on the predicted pose across the video sequence. We extensively test our approach on several indoor and outdoor ego-vision videos with strong illumination variations, demonstrating its validity and outperforming recent state-of-the-art approaches.
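
As an illustration of how an HMM can enforce temporal coherence on per-frame pose estimates, the sketch below runs a Viterbi pass over discretized yaw bins. The bins, the transition matrix, and the uniform prior are assumptions for illustration, not the paper's exact stateful formulation.

```python
import numpy as np

def viterbi_smooth(frame_probs, transition, prior=None):
    """Smooth per-frame head-pose class probabilities with an HMM Viterbi pass.

    frame_probs: (T, K) per-frame posteriors over K discretized yaw bins.
    transition:  (K, K) pose transition matrix (an illustrative assumption).
    Returns the most likely pose-bin sequence of length T.
    """
    T, K = frame_probs.shape
    log_e = np.log(frame_probs + 1e-12)
    log_t = np.log(transition + 1e-12)
    p0 = prior if prior is not None else np.full(K, 1.0 / K)

    score = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = np.log(p0 + 1e-12) + log_e[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_t   # (prev bin, next bin)
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_e[t]

    path = np.empty(T, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

A strongly diagonal transition matrix encodes the observation above: a subject's head tends to stay fixed on the current speaker for several consecutive frames.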


Sensors | 2016

Exploring Architectural Details Through a Wearable Egocentric Vision Device

Stefano Alletto; Davide Abati; Giuseppe Serra; Rita Cucchiara

Augmented user experiences in the cultural heritage domain are in increasing demand from the digital-native tourists of the 21st century. In this paper, we propose a novel solution that assists the visitor during an outdoor tour of a cultural site using the unique first-person perspective of wearable cameras. In particular, the approach exploits computer vision techniques to retrieve architectural details, proposing a robust descriptor based on the covariance of local features. Using a lightweight wearable board, the solution can localize the user with respect to the 3D point cloud of the historical landmark and provide information about the details the user is currently looking at. Experimental results validate the method both in terms of accuracy and computational effort. Furthermore, user evaluation based on real-world experiments shows that the proposal is deemed effective in enriching a cultural experience.
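
A minimal sketch of a covariance-based region descriptor, assuming per-pixel or per-keypoint feature vectors have already been extracted. Comparing covariances through their matrix logarithm is one common choice; the paper's exact construction may differ.

```python
import numpy as np
from scipy.linalg import logm

def covariance_descriptor(features):
    """Region covariance descriptor from local feature vectors.

    features: (N, d) array of per-pixel/per-keypoint features for one region
              (N >= 2). Returns the upper triangle of the matrix logarithm of
              the covariance, so descriptors can be compared with a plain
              Euclidean distance (an illustrative sketch).
    """
    # Small ridge keeps the covariance positive definite for logm.
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    log_cov = logm(cov).real
    return log_cov[np.triu_indices_from(log_cov)]
```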


IEEE Intelligent Vehicles Symposium | 2017

Learning where to attend like a human driver

Andrea Palazzi; Francesco Solera; Simone Calderara; Stefano Alletto; Rita Cucchiara

Despite the advent of autonomous cars, it is likely, at least in the near future, that human attention will still play a central role as a guarantee of legal responsibility during the driving task. In this paper we study the dynamics of the driver's gaze and use it as a proxy for understanding the underlying attentional mechanisms. First, we build our analysis around two questions: where is the driver looking, and at what? Second, we model the driver's gaze by training a coarse-to-fine convolutional network on short sequences extracted from the DR(eye)VE dataset. Experimental comparison against different baselines reveals that the driver's gaze can indeed be learned to some extent, despite i) being highly subjective and ii) having only one driver's gaze available for each sequence due to the irreproducibility of the scene. Finally, we advocate a new assisted-driving paradigm that suggests to the driver, without intervening, where she should focus her attention.
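
A rough PyTorch sketch of the coarse-to-fine idea: a coarse branch predicts a low-resolution attention map from a downsampled frame, and a fine branch refines it at full resolution. Layer sizes, depths, and the single-frame input are illustrative assumptions and do not reproduce the paper's network, which operates on short sequences.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineGaze(nn.Module):
    """Toy coarse-to-fine gaze-map predictor (illustrative only)."""

    def __init__(self):
        super().__init__()
        # Coarse branch: low-resolution map from a 4x-downsampled frame.
        self.coarse = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )
        # Fine branch: refines the upsampled coarse map using the full frame.
        self.fine = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, frame):                       # frame: (B, 3, H, W)
        coarse = self.coarse(F.interpolate(frame, scale_factor=0.25))
        coarse_up = F.interpolate(coarse, size=frame.shape[-2:],
                                  mode="bilinear", align_corners=False)
        # Concatenate RGB with the upsampled coarse prediction and refine.
        return self.fine(torch.cat([frame, coarse_up], dim=1))
```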


Computer Vision and Pattern Recognition | 2016

Body Part Based Re-Identification from an Egocentric Perspective

Federica Fergnani; Stefano Alletto; Giuseppe Serra; Joaquim De Mira; Rita Cucchiara

With the spread of wearable cameras, many consumer applications, ranging from social tagging to video summarization, would greatly benefit from people re-identification methods capable of dealing with the egocentric perspective. First-person camera views present such a unique setting that traditional re-identification methods yield poor performance when applied to this scenario. In this paper, we present a simple but effective solution that overcomes the limitations of traditional approaches by dividing people images into meaningful body parts. Furthermore, by taking into account where human gaze lands when trying to recognize a person, we devise a meaningful way to weight the contributions of the different body parts. Experimental results validate the proposal on a novel egocentric re-identification dataset, the first of its kind, showing a significant performance increase over the current state of the art on egocentric sequences.
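
The part-weighting idea can be sketched as a weighted sum of per-part distances. The part layout and the weight values below are hypothetical placeholders, not the paper's measured gaze statistics.

```python
import numpy as np

def part_based_distance(desc_a, desc_b, gaze_weights):
    """Weighted distance between two people, each described per body part.

    desc_*: dict mapping part name -> feature vector (same parts in both).
    gaze_weights: dict mapping part name -> weight derived from how often
    human gaze lands on that part when recognizing people.
    """
    return sum(w * np.linalg.norm(desc_a[part] - desc_b[part])
               for part, w in gaze_weights.items())

# Hypothetical weights: heads and torsos attract most gaze during recognition.
GAZE_WEIGHTS = {"head": 0.4, "torso": 0.35, "legs": 0.25}
```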


ACM Multimedia | 2016

Motion Segmentation using Visual and Bio-mechanical Features

Stefano Alletto; Giuseppe Serra; Rita Cucchiara

Egocentric wearable devices are becoming increasingly widespread among both the academic community and the general public. For this reason, methods capable of automatically segmenting video based on the recorder's motion patterns are gaining attention. These devices offer the unique opportunity of combining high-quality video recordings with multimodal sensor readings. Significant efforts have been made to analyze either the video stream recorded by these devices or the bio-mechanical sensor information, but the integration of the two sources has not been fully addressed, and the real capabilities of these devices are not yet exploited. In this paper, we present a solution that segments a video sequence into motion activities by introducing a novel data fusion technique based on the covariance of visual and bio-mechanical features. The experimental results are promising and show that the proposed integration strategy outperforms results achieved from a single source alone.
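
One way to read the covariance-based fusion is sketched below: per-frame visual and bio-mechanical features are stacked, and a joint covariance descriptor is computed per temporal window so that cross-stream correlations enter the descriptor. The windowing and the boundary rule are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def fused_window_descriptor(visual, biomech):
    """Joint covariance of stacked visual and bio-mechanical features.

    visual:  (T, dv) per-frame visual features (e.g. motion statistics).
    biomech: (T, db) per-frame sensor readings (e.g. accelerometer axes).
    """
    joint = np.hstack([visual, biomech])      # (T, dv + db)
    cov = np.cov(joint, rowvar=False)
    return cov[np.triu_indices_from(cov)]     # flatten the upper triangle

def motion_boundaries(windows, threshold):
    """Hypothetical segmentation rule: mark a boundary wherever consecutive
    window descriptors differ by more than a threshold."""
    descs = np.stack([fused_window_descriptor(v, b) for v, b in windows])
    jumps = np.linalg.norm(np.diff(descs, axis=0), axis=1)
    return np.where(jumps > threshold)[0] + 1
```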


Computer Vision and Image Understanding | 2017

Video registration in egocentric vision under day and night illumination changes

Stefano Alletto; Giuseppe Serra; Rita Cucchiara

Highlights:
- Image registration typically relies on SIFT keypoints, whose matching fails under night/day illumination changes.
- We propose a novel embedding formulation that accounts for illumination changes.
- The embedding space includes temporal and spatial constraints.
- Our video registration method obtains state-of-the-art results under lighting changes.

With the spread of wearable devices and head-mounted cameras, a wide range of applications requiring precise user localization is now possible. In this paper we propose to treat the problem of obtaining the user's position with respect to a known environment as a video registration problem. Video registration, i.e. the task of aligning an input video sequence to a pre-built 3D model, relies on matching local keypoints extracted from the query sequence to a 3D point cloud. The overall registration performance is strictly tied to the quality of this 2D-3D matching, and can degrade under steep changes in lighting such as those between day and night. To effectively register an egocentric video sequence under these conditions, we tackle the source of the problem: the matching process. To overcome the shortcomings of standard matching techniques, we introduce a novel embedding space that obtains robust matches by jointly taking into account local descriptors, their spatial arrangement, and their temporal robustness. The proposal is evaluated on unconstrained egocentric video sequences, both in terms of matching quality and resulting registration performance, using different 3D models of historical landmarks. The results show that the proposed method outperforms state-of-the-art registration algorithms, in particular when dealing with the challenges of night and day sequences.
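
As a toy illustration of an embedding that jointly encodes appearance, spatial arrangement, and temporal robustness, the sketch below simply augments a local descriptor with normalized image coordinates and a stability score before nearest-neighbour matching. The weights and layout are assumptions; the paper's actual embedding formulation is more elaborate.

```python
import numpy as np

def joint_embedding(descriptor, xy, frame_size, stability,
                    alpha=0.5, beta=0.5):
    """Append normalized image coordinates and a temporal-stability score
    (e.g. how many consecutive frames the keypoint was re-detected) to a
    local appearance descriptor. alpha/beta are illustrative weights."""
    xy_norm = np.asarray(xy, dtype=float) / np.asarray(frame_size, dtype=float)
    return np.concatenate([np.asarray(descriptor, dtype=float),
                           alpha * xy_norm,
                           [beta * float(stability)]])
```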


International Conference on Image Analysis and Processing | 2015

Egocentric Object Tracking: An Odometry-Based Solution

Stefano Alletto; Giuseppe Serra; Rita Cucchiara

Tracking objects moving around a person is one of the key steps in human visual augmentation: we could estimate their location when they leave our field of view, or know their position, distance, or velocity, to name a few possibilities. This is no easy task: in this paper, we show how current state-of-the-art visual tracking algorithms fail when challenged with a first-person sequence recorded by a wearable camera attached to a moving user. We propose an evaluation that highlights these algorithms' limitations and, accordingly, develop a novel approach based on visual odometry and 3D localization that overcomes many issues typical of egocentric vision. We implement our algorithm on a wearable board and evaluate its robustness, showing in preliminary experiments an increase in tracking performance of nearly 20% compared to current state-of-the-art techniques.
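
The core geometric intuition behind an odometry-based tracker can be sketched as follows: once visual odometry provides the current camera pose, an object's last known 3D position can be re-projected to predict where it should appear, even after large ego-motion. This pinhole-projection sketch is illustrative and omits the paper's full pipeline.

```python
import numpy as np

def project(point_3d, R, t, K):
    """Project a 3D world point into the image with a pinhole camera:
    rotation R (3x3), translation t (3,), intrinsics K (3x3)."""
    p_cam = R @ point_3d + t          # world -> camera coordinates
    p_img = K @ p_cam                 # camera -> homogeneous pixel coords
    return p_img[:2] / p_img[2]       # perspective division

def predict_object_location(last_point_3d, odometry_pose, K):
    """Predict where a momentarily static object re-appears after ego-motion,
    given the camera pose estimated by visual odometry (illustrative)."""
    R, t = odometry_pose
    return project(np.asarray(last_point_3d, dtype=float), R, t, K)
```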


International Conference on Augmented Reality, Virtual Reality and Computer Graphics | 2016

Optimizing Image Registration for Interactive Applications

Riccardo Gasparini; Stefano Alletto; Giuseppe Serra; Rita Cucchiara

With the spread of wearable and mobile devices, demand for interactive augmented reality applications is constantly growing. Among the different possibilities, we focus on the cultural heritage domain, where a key step in developing applications for augmented cultural experiences is precise localization of the user, i.e. the 6-degree-of-freedom pose of the camera acquiring the images used by the application. Current state-of-the-art methods perform this task by extracting local descriptors from a query image and exhaustively matching them to a sparse 3D model of the environment. While this procedure achieves good localization performance, the vast search space involved in retrieving 2D-3D correspondences often makes it infeasible for real-time and interactive settings. In this paper we therefore propose to quantize descriptors to reduce the search space, and to employ multiple KD-trees combined with principal component analysis for dimensionality reduction to enable an efficient search. We show experimentally that our solution can halve the computational requirements of the correspondence search relative to the state of the art while maintaining similar accuracy.
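
A simplified sketch of the efficiency idea: compress the model descriptors with PCA and answer nearest-neighbour queries from a KD-tree. The paper additionally quantizes descriptors and uses multiple trees, so this single-tree version only illustrates the search-space reduction; the descriptor dimensions and ratio-test threshold below are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.decomposition import PCA

def build_index(model_descriptors, n_components=32):
    """Reduce model descriptors (e.g. 128-D SIFT) with PCA and index them
    in a KD-tree for fast 2D-3D correspondence search."""
    pca = PCA(n_components=n_components).fit(model_descriptors)
    tree = cKDTree(pca.transform(model_descriptors))
    return pca, tree

def match(query_descriptors, pca, tree, ratio=0.8):
    """Lowe-style ratio test on the two nearest reduced-dimension neighbours.

    Returns the matched model indices and the surviving query indices."""
    q = pca.transform(query_descriptors)
    dist, idx = tree.query(q, k=2)
    keep = dist[:, 0] < ratio * dist[:, 1]
    return idx[keep, 0], np.where(keep)[0]
```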

Collaboration


Dive into Stefano Alletto's collaborations.

Top Co-Authors

Rita Cucchiara (University of Modena and Reggio Emilia)
Giuseppe Serra (University of Modena and Reggio Emilia)
Simone Calderara (University of Modena and Reggio Emilia)
Davide Abati (University of Modena and Reggio Emilia)
Francesco Solera (University of Modena and Reggio Emilia)
Andrea Palazzi (University of Modena and Reggio Emilia)