
Publication


Featured research published by Lorenzo Porzi.


ACM Multimedia | 2013

A smart watch-based gesture recognition system for assisting people with visual impairments

Lorenzo Porzi; Stefano Messelodi; Carla Maria Modena; Elisa Ricci

Modern mobile devices provide several functionalities, and new ones are being added at a breakneck pace. Unfortunately, browsing the menu and accessing the functions of a mobile phone is not a trivial task for visually impaired users. Low-vision people typically rely on screen readers and voice commands. However, depending on the situation, screen readers are not ideal because blind people may need their hearing for safety, and automatic recognition of voice commands is challenging in noisy environments. Novel smart watch technologies provide an interesting opportunity to design new forms of user interaction with mobile phones. We present our first work towards the realization of a system, based on the combination of a mobile phone and a smart watch for gesture control, for assisting low-vision people during daily life activities. More specifically, we propose a novel approach for gesture recognition which is based on global alignment kernels and is shown to be effective in the challenging scenario of user-independent recognition. This method is used to build a gesture-based user interaction module and is embedded into a system targeted at visually impaired users, which will also integrate several other modules. We present two of them: one for identifying wet floor signs, the other for automatic recognition of predefined logos.
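The global alignment kernel at the heart of this method lends itself to a compact implementation. Below is a minimal sketch, assuming each gesture is recorded as a 3-axis accelerometer sequence; the Gaussian bandwidth and the SVM setup are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC

def global_alignment_kernel(x, y, sigma=1.0):
    """Global alignment kernel between two multivariate time series
    x: (n, d) and y: (m, d). Unlike DTW, it sums the scores of all
    monotonic alignments instead of keeping only the best one."""
    n, m = len(x), len(y)
    # Local Gaussian similarity between every pair of samples.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * sigma ** 2))
    # DP over alignments: M[i, j] accumulates all paths reaching (i, j).
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i, j] = k[i - 1, j - 1] * (M[i - 1, j] + M[i, j - 1] + M[i - 1, j - 1])
    return M[n, m]

def gram_matrix(seqs_a, seqs_b, sigma=1.0):
    return np.array([[global_alignment_kernel(a, b, sigma) for b in seqs_b]
                     for a in seqs_a])

# Hypothetical usage: train_seqs is a list of (n_i, 3) accelerometer
# recordings, train_labels the gesture classes.
# K = gram_matrix(train_seqs, train_seqs)
# clf = SVC(kernel="precomputed").fit(K, train_labels)
# predictions = clf.predict(gram_matrix(test_seqs, train_seqs))
```

In practice the recursion is usually run in log space to avoid numerical underflow on long sequences.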


ACM Multimedia | 2015

Predicting and Understanding Urban Perception with Convolutional Neural Networks

Lorenzo Porzi; Samuel Rota Bulò; Bruno Lepri; Elisa Ricci

A city's visual appearance plays a central role in shaping human perception of and response to the surrounding urban environment. For example, the visual qualities of urban spaces affect the psychological states of their inhabitants and can induce negative social outcomes. Hence, it becomes critically important to understand people's perceptions and evaluations of urban spaces. Previous works have demonstrated that algorithms can be used to predict high-level attributes of urban scenes (e.g. safety, attractiveness, uniqueness), accurately emulating human perception. In this paper we propose a novel approach for predicting the perceived safety of a scene from Google Street View images. In contrast to previous works, we formulate the problem of learning to predict high-level judgments as a ranking task and we employ a Convolutional Neural Network (CNN), significantly improving the accuracy of predictions over previous methods. Interestingly, the proposed CNN architecture relies on a novel pooling layer, which makes it possible to automatically discover the most important areas of the images for predicting the concept of perceived safety. An extensive experimental evaluation, conducted on the publicly available Place Pulse dataset, demonstrates the advantages of the proposed approach over state-of-the-art methods.
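The ranking formulation can be sketched as a Siamese setup trained on pairwise comparisons ("which scene looks safer?") with a margin ranking loss. The tiny network below is a placeholder, not the paper's architecture, and it uses plain global pooling where the paper introduces its custom pooling layer.

```python
import torch
import torch.nn as nn

class SafetyScorer(nn.Module):
    """Stand-in CNN mapping a street-view image to a scalar safety score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # placeholder for the paper's pooling layer
        )
        self.score = nn.Linear(64, 1)

    def forward(self, x):
        return self.score(self.features(x).flatten(1)).squeeze(1)

model = SafetyScorer()
rank_loss = nn.MarginRankingLoss(margin=1.0)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Each training item is a pair (img_a, img_b) plus y = +1 if a was judged
# safer than b, -1 otherwise; the tensors below are shape placeholders.
img_a, img_b = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
y = torch.randint(0, 2, (8,)).float() * 2 - 1
loss = rank_loss(model(img_a), model(img_b), y)
loss.backward(); opt.step()
```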


IEEE Transactions on Multimedia | 2016

Learning Personalized Models for Facial Expression Analysis and Gesture Recognition

Gloria Zen; Lorenzo Porzi; Enver Sangineto; Elisa Ricci; Nicu Sebe

Facial expression and gesture recognition algorithms are key enabling technologies for human-computer interaction (HCI) systems. State-of-the-art approaches for automatic detection of body movements and analysis of emotions from facial features rely heavily on advanced machine learning algorithms. Most of these methods are designed for the average user, but the “one-size-fits-all” assumption ignores diversity in cultural background, gender, ethnicity, and personal behavior, and limits their applicability in real-world scenarios. A possible solution is to build personalized interfaces, which practically implies learning person-specific classifiers and usually collecting a significant amount of labeled samples for each novel user. As data annotation is a tedious and time-consuming process, in this paper we present a framework for personalizing classification models which does not require labeled target data. Personalization is achieved by devising a novel transfer learning approach. Specifically, we propose a regression framework which exploits auxiliary (source) annotated data to learn the relation between person-specific sample distributions and the parameters of the corresponding classifiers. Then, when considering a new target user, the classification model is computed by simply feeding the associated (unlabeled) sample distribution into the learned regression function. We evaluate the proposed approach in different applications: pain recognition and action unit detection using visual data, and gesture classification using inertial measurements, demonstrating the generality of our method with respect to different input data types and basic classifiers. We also show the advantages of our approach in terms of accuracy and computational time, both with respect to user-independent approaches and to previous personalization techniques.
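A toy sketch of the distribution-to-classifier regression idea follows: describe each source user's data by a distribution statistic, fit a per-user linear classifier, then learn a ridge regression from statistics to classifier weights; a new user's classifier is predicted from unlabeled data alone. Using the feature mean as the distribution descriptor is my simplifying assumption, not the paper's exact formulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def personalize(source_users, target_X):
    """source_users: list of (X_u, y_u) labeled per-user datasets.
    target_X: unlabeled samples from the new target user."""
    stats, weights = [], []
    for X_u, y_u in source_users:
        clf = LogisticRegression().fit(X_u, y_u)
        stats.append(X_u.mean(axis=0))  # crude distribution descriptor (assumption)
        weights.append(np.hstack([clf.coef_.ravel(), clf.intercept_]))
    # Learn the mapping: distribution statistics -> classifier parameters.
    reg = Ridge(alpha=1.0).fit(np.array(stats), np.array(weights))
    w = reg.predict(target_X.mean(axis=0, keepdims=True)).ravel()
    coef, intercept = w[:-1], w[-1]
    # Personalized decision function for the target user, no labels needed.
    return lambda X: (X @ coef + intercept > 0).astype(int)
```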


Workshop on Environmental Energy and Structural Monitoring Systems | 2012

Visual-inertial tracking on Android for Augmented Reality applications

Lorenzo Porzi; Elisa Ricci; Thomas A. Ciarfuglia; Michele Zanin

Augmented Reality (AR) aims to enhance a person's vision of the real world with useful information about the surrounding environment. Amongst all the possible applications, AR systems can be very useful as visualization tools for structural and environmental monitoring. While the large majority of AR systems run on a laptop or on a head-mounted device, the advent of smartphones has created new opportunities. One of the most important functionalities of an AR system is the ability of the device to self-localize. This can be achieved through visual odometry, a very challenging task for smartphones. Indeed, in most available smartphone AR applications, self-localization is achieved through GPS and/or inertial sensors. Developing an AR system on a mobile phone thus also poses new challenges due to the limited amount of computational resources. In this paper we describe the development of an egomotion estimation algorithm for an Android smartphone. We also present an approach based on an Extended Kalman Filter for improving localization accuracy by integrating information from inertial sensors. The implemented solution achieves a localization accuracy comparable to the PC implementation while running on an Android device.
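The sensor-fusion step can be illustrated with a bare-bones filter on a position/velocity state, where inertial accelerations drive the prediction and visual-odometry positions serve as measurements. All noise values below are placeholders; with this linear model the filter reduces to a plain Kalman filter, whereas the full system also estimates orientation, which is where the "extended" part comes in.

```python
import numpy as np

class VisualInertialKF:
    """Position/velocity filter: IMU acceleration drives predict(),
    visual odometry position corrects in update()."""
    def __init__(self, dt):
        self.x = np.zeros(6)                     # [px py pz vx vy vz]
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)          # p += v * dt
        self.B = np.vstack([0.5 * dt**2 * np.eye(3), dt * np.eye(3)])
        self.Q = 1e-3 * np.eye(6)                # process noise (placeholder)
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.R = 1e-2 * np.eye(3)                # VO measurement noise (placeholder)

    def predict(self, accel):
        """accel: gravity-compensated IMU acceleration in world frame."""
        self.x = self.F @ self.x + self.B @ accel
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, vo_position):
        """vo_position: 3D position estimate from visual odometry."""
        y = vo_position - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
```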


International Conference on Distributed Smart Cameras | 2014

Learning Contours for Automatic Annotations of Mountains Pictures on a Smartphone

Lorenzo Porzi; Samuel Rota Bulò; Paolo Valigi; Oswald Lanz; Elisa Ricci

In the last few years the ubiquity and computational power of modern smartphones, together with the significant progress made in wireless broadband technologies, have made Augmented Reality (AR) technically feasible on consumer devices. In this paper we present an AR application for mobile phones that augments pictures of mountainous landscapes with geo-referenced data (e.g. the names of peaks, positions of mountain huts, or hiking tracks). Our application is based on a novel approach for image-to-world registration, which exploits different information collected with on-board sensors. First, GPS and inertial sensors are used to compute a rough estimate of the device position and orientation; then visual cues are exploited to refine it. Specifically, a new learning-based contour detection method based on Random Ferns is used to extract visible mountain profiles from a picture, which are then aligned to synthetic ones obtained from Digital Elevation Models. This solution guarantees increased accuracy with respect to previous works based only on sensors or on standard edge detection and filtering algorithms. An experimental evaluation conducted on a large set of manually aligned photographs demonstrates that the proposed registration method is both accurate in reconstructing camera position and orientation, and computationally efficient when implemented on a smartphone.
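A random fern classifier of the kind used for contour detection can be sketched as a set of binary pixel-pair tests whose outcomes index a table of class counts. Patch size, number of tests, and the training interface below are illustrative, not the paper's configuration.

```python
import numpy as np

class RandomFern:
    """One fern: a fixed set of binary pixel-pair tests on a grayscale
    patch. The joint test outcome indexes a table of per-class counts."""
    def __init__(self, patch_size=16, n_tests=8, rng=None):
        rng = rng or np.random.default_rng()
        # Random pairs of (row, col) offsets inside the patch.
        self.pairs = rng.integers(0, patch_size, size=(n_tests, 2, 2))
        # Laplace-smoothed counts: row 0 = off-contour, row 1 = on-contour.
        self.counts = np.ones((2, 2 ** n_tests))

    def _index(self, patch):
        bits = [patch[a0, a1] < patch[b0, b1]
                for (a0, a1), (b0, b1) in self.pairs]
        return int("".join("1" if b else "0" for b in bits), 2)

    def train(self, patches, labels):
        """labels: 1 if the patch is centered on a mountain contour."""
        for p, y in zip(patches, labels):
            self.counts[y, self._index(p)] += 1

    def prob_contour(self, patch):
        c = self.counts[:, self._index(patch)]
        return c[1] / c.sum()

# An ensemble of ferns averages the per-fern probabilities; the resulting
# contour map is then aligned against profiles rendered from a DEM.
```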


International Conference on Robotics and Automation | 2017

Learning Depth-Aware Deep Representations for Robotic Perception

Lorenzo Porzi; Samuel Rota Bulò; Adrian Penate-Sanchez; Elisa Ricci; Francesc Moreno-Noguer

Exploiting RGB-D data by means of convolutional neural networks (CNNs) is at the core of a number of robotics applications, including object detection, scene semantic segmentation, and grasping. Most existing approaches, however, exploit RGB-D data by simply considering depth as an additional input channel for the network. In this paper we show that the performance of deep architectures can be boosted by introducing DaConv, a novel, general-purpose CNN block which exploits depth to learn scale-aware feature representations. We demonstrate the benefits of DaConv on a variety of robotics-oriented tasks, involving affordance detection, object coordinate regression, and contour detection in RGB-D images. In each of these experiments we show the potential of the proposed block and how it can be readily integrated into existing CNN architectures.
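The exact DaConv design is given in the paper; as a rough sketch of the underlying idea, namely using depth to select among filters tuned to different scales, one can blend parallel convolution branches with per-pixel weights predicted from the depth map. This is my simplified interpretation, not the published block.

```python
import torch
import torch.nn as nn

class DepthAwareBlock(nn.Module):
    """Sketch of depth-conditioned convolution: two branches with
    different dilations (receptive-field scales) are mixed per-pixel
    by gates computed from depth. Simplified interpretation, not the
    published DaConv block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fine = nn.Conv2d(in_ch, out_ch, 3, padding=1, dilation=1)
        self.coarse = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)
        self.gate = nn.Sequential(            # per-pixel branch weights from depth
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1), nn.Softmax(dim=1))

    def forward(self, x, depth):
        g = self.gate(depth)                  # (B, 2, H, W), sums to 1 per pixel
        return g[:, :1] * self.fine(x) + g[:, 1:] * self.coarse(x)

# block = DepthAwareBlock(64, 64)
# y = block(torch.randn(2, 64, 32, 32), torch.randn(2, 1, 32, 32))
```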


International Conference on 3D Vision | 2015

Matchability Prediction for Full-Search Template Matching Algorithms

Adrian Penate-Sanchez; Lorenzo Porzi; Francesc Moreno-Noguer

While recent approaches have shown that it is possible to do template matching by exhaustively scanning the parameter space, the resulting algorithms are still quite demanding. In this paper we alleviate the computational load of these algorithms by proposing an efficient approach for predicting the matchability of a template before the matching is actually performed. This avoids large amounts of unnecessary computation. We learn the matchability of templates by using dense convolutional neural network descriptors that do not require ad-hoc criteria to characterize a template. By using deep-learned descriptions of patches we are able to predict matchability over the whole image quite reliably. We also show that no scene-specific training data is required to solve problems like panorama stitching, which usually require data from the scene in question. Due to the highly parallelizable nature of this task, we offer an efficient technique with a negligible computational cost at test time.
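The filtering idea can be sketched in a few lines: score every candidate template with a cheap predictor built on top of CNN descriptors, and run the expensive full-search matcher only on templates predicted to be matchable. The logistic-regression predictor and the threshold are placeholders for the paper's learned model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_matchability(descriptors, matched):
    """descriptors: (N, D) CNN descriptors of training templates;
    matched: (N,) 1 if full-search matching succeeded for that template."""
    return LogisticRegression(max_iter=1000).fit(descriptors, matched)

def select_templates(predictor, candidate_descriptors, threshold=0.5):
    """Keep only the templates the predictor deems matchable, so the
    costly exhaustive search runs on a fraction of the candidates."""
    p = predictor.predict_proba(candidate_descriptors)[:, 1]
    return np.flatnonzero(p >= threshold)
```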


International Conference on Image Analysis and Processing | 2017

Just DIAL: DomaIn Alignment Layers for Unsupervised Domain Adaptation

Fabio Maria Carlucci; Lorenzo Porzi; Barbara Caputo; Elisa Ricci; Samuel Rota Bulò

The empirical fact that classifiers trained on given data collections perform poorly when tested on data acquired in different settings is theoretically explained in domain adaptation through a shift among the distributions of the source and target domains. Alleviating the domain shift problem, especially in the challenging setting where no labeled data are available for the target domain, is paramount for having visual recognition systems working in the wild. As the problem stems from a shift among distributions, intuitively one should try to align them. In the literature, this has resulted in a stream of works attempting to align the feature representations learned from the source and target domains. Here we take a different route. Rather than introducing regularization terms aiming to promote the alignment of the two representations, we act at the distribution level through the introduction of DomaIn Alignment Layers (DIAL), able to match the observed source and target data distributions to a reference one. Thorough experiments on three different public benchmarks confirm the power of our approach.
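DIAL's alignment can be sketched as domain-specific batch normalization: the same layer keeps separate statistics for source and target batches, mapping both toward a shared reference distribution while the affine parameters stay shared. A minimal PyTorch rendering, under those assumptions:

```python
import torch
import torch.nn as nn

class DomainAlignmentLayer(nn.Module):
    """Normalizes source and target activations with their own batch
    statistics, so both domains are mapped toward a common reference
    distribution; the affine parameters are shared across domains."""
    def __init__(self, num_features):
        super().__init__()
        self.bn_source = nn.BatchNorm2d(num_features, affine=False)
        self.bn_target = nn.BatchNorm2d(num_features, affine=False)
        self.gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))

    def forward(self, x, domain):
        h = self.bn_source(x) if domain == "source" else self.bn_target(x)
        return self.gamma * h + self.beta

# Dropped into a network in place of standard batch norm: labeled source
# batches and unlabeled target batches each pass through their own branch.
```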


ACM Multimedia | 2016

A Deeply-Supervised Deconvolutional Network for Horizon Line Detection

Lorenzo Porzi; Samuel Rota Bulò; Elisa Ricci

Automatic skyline detection from mountain pictures is an important task in many applications, such as web image retrieval, augmented reality, and autonomous robot navigation. Recent works addressing the problem of Horizon Line Detection (HLD) demonstrated that learning-based boundary detection techniques are more accurate than traditional filtering methods. In this paper we introduce a novel approach for skyline detection, which adheres to a learning-based paradigm and exploits the representational power of deep architectures to improve horizon line detection accuracy. Unlike previous works, we explore a novel deconvolutional architecture, which introduces intermediate levels of supervision to support the learning process. Our experiments, conducted on a publicly available dataset, confirm that the proposed method outperforms previous learning-based HLD techniques by reducing the number of spurious edge pixels.
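The deep-supervision idea can be sketched as side prediction heads attached to intermediate decoder stages, each contributing its own pixel-wise loss. Layer sizes and the loss weighting below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSupDecoder(nn.Module):
    """Toy deconvolutional decoder with a skyline-probability head at
    every upsampling stage; all heads are supervised during training."""
    def __init__(self, in_ch=256):
        super().__init__()
        chans = [in_ch, 128, 64, 32]
        self.ups = nn.ModuleList(
            nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1)
            for c_in, c_out in zip(chans[:-1], chans[1:]))
        self.heads = nn.ModuleList(nn.Conv2d(c, 1, 1) for c in chans[1:])

    def forward(self, feats):
        side_outputs, x = [], feats
        for up, head in zip(self.ups, self.heads):
            x = F.relu(up(x))
            side_outputs.append(head(x))      # intermediate supervision point
        return side_outputs

def deep_supervision_loss(side_outputs, target):
    """target: (B, 1, H, W) binary skyline mask; every side output is
    upsampled to full resolution and penalized with BCE."""
    return sum(
        F.binary_cross_entropy_with_logits(
            F.interpolate(o, size=target.shape[-2:], mode="bilinear",
                          align_corners=False), target)
        for o in side_outputs)
```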


International Conference on Image Analysis and Processing | 2015

i-Street: Detection, Identification, Augmentation of Street Plates in a Touristic Mobile Application

Stefano Messelodi; Carla Maria Modena; Lorenzo Porzi; Paul Chippendale

Smartphone technology, with embedded cameras, sensors, and powerful computational resources, has made mobile Augmented Reality possible. In this paper we present i-Street, an Android touristic application whose aim is to detect, identify, and read the street plates in a video stream, and then to estimate their relative pose in order to accurately augment them with virtual overlays. The system was successfully tested in the historical centre of Grenoble (France), proving robust to outdoor illumination conditions and to device pose variance. The average identification rate in realistic laboratory tests was about 82%; the remaining cases were rejected with no false positives.
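The augmentation step can be sketched with standard OpenCV building blocks: match local features between a reference plate image and the camera frame, estimate a homography, and warp the overlay into place. The ORB/RANSAC choices and variable names are illustrative, not necessarily what i-Street uses internally.

```python
import cv2
import numpy as np

def locate_plate(reference, frame, min_matches=12):
    """Find a known street plate in a grayscale camera frame; returns the
    homography mapping reference-plate pixels into the frame, or None."""
    orb = cv2.ORB_create(1000)
    kp_r, des_r = orb.detectAndCompute(reference, None)
    kp_f, des_f = orb.detectAndCompute(frame, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_r, des_f), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None                           # plate not identified: reject
    src = np.float32([kp_r[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

# Once H is known, a virtual overlay (e.g. a translated street name) can be
# warped with cv2.warpPerspective(overlay, H, frame.shape[1::-1]) and blended.
```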

Collaboration


Dive into Lorenzo Porzi's collaborations.

Top Co-Authors

Francesc Moreno-Noguer

Spanish National Research Council


Barbara Caputo

Sapienza University of Rome


Oswald Lanz

Fondazione Bruno Kessler
