Publications


Featured research published by Ludovico Minto.


Archive | 2016

Time-of-Flight and Structured Light Depth Cameras

Pietro Zanuttigh; Giulio Marin; Carlo Dal Mutto; Fabio Dominio; Ludovico Minto; Guido M. Cortelazzo

This book provides a comprehensive overview of the key technologies and applications related to new cameras that have brought 3D data acquisition to the mass market. It covers both the theoretical principles behind the acquisition devices and the practical implementation aspects of the computer vision algorithms needed for the various applications. Real data examples are used to show the performance of the various algorithms. The performance and limitations of depth camera technology are explored, along with an extensive review of the most effective methods for addressing challenges in common applications. Applications covered in specific detail include scene segmentation, 3D scene reconstruction, human pose estimation and tracking, and gesture recognition. This book offers students, practitioners and researchers the tools necessary to explore the potential uses of depth data in light of the expanding number of devices available for sale. It explores the impact of these devices on the rapidly growing field of depth-based computer vision.


Eurographics Italian Chapter Conference | 2015

Exploiting Silhouette Descriptors and Synthetic Data for Hand Gesture Recognition

Alvise Memo; Ludovico Minto; Pietro Zanuttigh

This paper proposes a novel real-time hand gesture recognition scheme explicitly targeted at depth data. The hand silhouette is first extracted from the acquired data and two ad-hoc feature sets are then computed from this representation. The first is based on the local curvature of the hand contour, while the second represents the thickness of the hand region close to each contour point using a distance transform. The two feature sets are rearranged in a three-dimensional data structure representing the values of the two features at each contour location, and this representation is fed into a multi-class Support Vector Machine. The classifier is trained on a synthetic dataset generated with an ad-hoc rendering system developed for the purposes of this work. This approach allows fast construction of the training set without the need to manually acquire large training datasets. Experimental results on real data show that the approach achieves 90% accuracy on a typical hand gesture recognition dataset with very limited computational resources.
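
The pipeline lends itself to a compact illustration. Below is a minimal Python sketch, not the authors' code, of the two silhouette feature sets (a turning-angle curvature proxy along the contour and a distance-transform thickness value) fed to a multi-class SVM; the contour ordering, the toy random masks, and the gesture labels are placeholders for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from sklearn.svm import SVC

def contour_features(mask, n_points=64):
    """Per-contour-point curvature proxy and thickness from a binary silhouette."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    # Crude contour ordering by angle around the centroid (placeholder for a
    # proper contour-following step).
    c = pts.mean(axis=0)
    order = np.argsort(np.arctan2(pts[:, 1] - c[1], pts[:, 0] - c[0]))
    contour = pts[order][np.linspace(0, len(pts) - 1, n_points).astype(int)]

    # Feature set 1: turning angle between consecutive contour segments.
    d1 = np.roll(contour, -1, axis=0) - contour
    d2 = contour - np.roll(contour, 1, axis=0)
    ang = np.arctan2(d1[:, 1], d1[:, 0]) - np.arctan2(d2[:, 1], d2[:, 0])
    curvature = (ang + np.pi) % (2 * np.pi) - np.pi

    # Feature set 2: hand thickness near each contour point from the distance transform.
    dist = distance_transform_edt(mask)
    thickness = dist[contour[:, 1].astype(int), contour[:, 0].astype(int)]
    return np.concatenate([curvature, thickness])

# Toy training set: random masks standing in for rendered synthetic hand silhouettes.
rng = np.random.default_rng(0)
masks = [(rng.random((64, 64)) > 0.5).astype(np.uint8) for _ in range(20)]
X = np.stack([contour_features(m) for m in masks])
y = rng.integers(0, 4, size=len(masks))        # 4 hypothetical gesture classes

clf = SVC(kernel="rbf").fit(X, y)              # multi-class SVM (one-vs-one)
print(clf.predict(X[:3]))
```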


European Conference on Computer Vision | 2016

Scene Segmentation Driven by Deep Learning and Surface Fitting

Ludovico Minto; Giampaolo Pagnutti; Pietro Zanuttigh

This paper proposes a joint color and depth segmentation scheme exploiting together geometrical clues and a learning stage. The approach starts from an initial over-segmentation based on spectral clustering. The input data is also fed to a Convolutional Neural Network (CNN), producing a per-pixel descriptor vector for each scene sample. An iterative merging procedure is then used to recombine the segments into the regions corresponding to the various objects and surfaces. The proposed algorithm starts by considering all the adjacent segments and computing a similarity metric according to the CNN features. The pairs of segments with the highest similarity are considered for merging. Finally, the algorithm uses a NURBS surface fitting scheme on the segments to determine whether the selected pairs correspond to a single surface. The comparison with state-of-the-art methods shows that the proposed method provides an accurate and reliable scene segmentation.
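
A minimal sketch of the merging stage described above, assuming a structure rather than reproducing the authors' implementation: adjacent segments are ranked by the similarity of their (here synthetic) CNN descriptors and merged only if a surface fit over their joint points has a low residual. A least-squares plane stands in for the NURBS fitting used in the paper.

```python
import numpy as np

def plane_fit_residual(points):
    """RMS residual of a least-squares plane z = ax + by + c fitted to (N, 3) points."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coef, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return np.sqrt(np.mean((A @ coef - points[:, 2]) ** 2))

def merge_segments(descriptors, points3d, adjacency, fit_thresh=0.01):
    """descriptors: segment id -> mean CNN feature vector
    points3d:    segment id -> (N, 3) array of 3D points
    adjacency:   set of (i, j) pairs of adjacent segment ids."""
    merged = {i: {i} for i in descriptors}

    def sim(i, j):                               # cosine similarity of descriptors
        a, b = descriptors[i], descriptors[j]
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    # Consider the most similar adjacent pairs first; merge only if the joint
    # points are well explained by a single fitted surface.
    for i, j in sorted(adjacency, key=lambda p: -sim(*p)):
        joint = np.vstack([points3d[i], points3d[j]])
        if plane_fit_residual(joint) < fit_thresh:
            merged[i] |= merged[j]
            merged[j] = merged[i]
    return merged

# Toy example: segments 0 and 1 lie on the same plane, segment 2 on a different one.
rng = np.random.default_rng(1)
patch = lambda z: np.c_[rng.random((50, 2)), np.full(50, z)]
points3d = {0: patch(0.0), 1: patch(0.0), 2: patch(1.0)}
descriptors = {0: np.array([1.0, 0.0]), 1: np.array([0.9, 0.1]), 2: np.array([0.0, 1.0])}
print(merge_segments(descriptors, points3d, {(0, 1), (1, 2)}))   # 0 and 1 merged
```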


Archive | 2016

Operating Principles of Structured Light Depth Cameras

Pietro Zanuttigh; Giulio Marin; Carlo Dal Mutto; Fabio Dominio; Ludovico Minto; Guido M. Cortelazzo

This chapter uses the camera virtualization approach to offer a unified treatment of various structured light depth cameras. Readers will learn how these cameras can differ in characteristics like the number of cameras, baseline, position of the projector, and projected patterns. We also present the fundamentals of illuminator design, the most critical component of structured light depth cameras, using the concept of uniqueness, and explore its implementation by wavelength, range, time, and space multiplexing. Various non-idealities of structured light depth cameras are also explained. In the last part of the chapter, the theoretical ideas are applied to an analysis of the most popular structured light depth camera products on the market, like the PrimeSense camera used in the Kinect v1 and more recent products such as the Intel RealSense F200 and R200.
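
As a quick illustration of the triangulation principle underlying such devices (not taken from the chapter text): once a projected pattern element is uniquely matched in the camera image, depth follows from the disparity in a rectified setup. The focal length and baseline below are made-up values.

```python
focal_px = 575.0     # camera focal length in pixels (assumed)
baseline_m = 0.075   # camera-projector baseline in meters (assumed)

def depth_from_disparity(disparity_px):
    """Depth in meters for a matched pattern element with the given disparity."""
    return focal_px * baseline_m / disparity_px

for d in (20.0, 40.0, 80.0):
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d):.3f} m")
```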


International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications | 2018

Deep Learning for 3D Shape Classification based on Volumetric Density and Surface Approximation Clues

Ludovico Minto; Pietro Zanuttigh; Giampaolo Pagnutti

This paper proposes a novel approach for the classification of 3D shapes exploiting surface and volumetric clues inside a deep learning framework. The proposed algorithm uses three different data representations. The first is a set of depth maps obtained by rendering the 3D object. The second is a novel volumetric representation obtained by counting the number of filled voxels along each direction. Finally, NURBS surfaces are fitted over the 3D object and surface curvature parameters are selected as the third representation. All three data representations are fed to a multi-branch Convolutional Neural Network. Each branch processes a different data source and produces a feature vector by using convolutional layers of progressively reduced resolution. The extracted feature vectors are fed to a linear classifier that combines the outputs to obtain the final predictions. Experimental results on the ModelNet dataset show that the proposed approach obtains state-of-the-art performance.
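
The volumetric representation lends itself to a short sketch. The following is my reading of the abstract, not the released code: from a binary occupancy grid, count the filled voxels along each axis to obtain three 2D density maps, one per CNN branch.

```python
import numpy as np

def directional_density_maps(voxels):
    """voxels: (D, H, W) binary occupancy grid.
    Returns three 2D maps counting filled voxels along each of the three axes."""
    along_w = voxels.sum(axis=2)   # (D, H): filled voxels per line along the W axis
    along_h = voxels.sum(axis=1)   # (D, W)
    along_d = voxels.sum(axis=0)   # (H, W)
    return along_w, along_h, along_d

# Toy occupancy grid: a solid 8x8x8 cube centered in a 32^3 volume.
grid = np.zeros((32, 32, 32), dtype=np.uint8)
grid[12:20, 12:20, 12:20] = 1
maps = directional_density_maps(grid)
print([m.shape for m in maps], [int(m.max()) for m in maps])   # each max is 8
```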


IET Computer Vision | 2017

Segmentation and semantic labelling of RGBD data with convolutional neural networks and surface fitting

Giampaolo Pagnutti; Ludovico Minto; Pietro Zanuttigh

We present an approach for segmentation and semantic labelling of RGBD data exploiting together geometrical cues and deep learning techniques. An initial over-segmentation is performed using spectral clustering and a set of non-uniform rational B-spline (NURBS) surfaces is fitted on the extracted segments. Then a convolutional neural network (CNN) receives as input colour and geometry data together with the surface fitting parameters. The network is made of nine convolutional stages followed by a softmax classifier and produces a vector of descriptors for each sample. In the next step, an iterative merging algorithm recombines the output of the over-segmentation into larger regions matching the various elements of the scene. The pairs of adjacent segments with the highest similarity according to the CNN features are candidates for merging, and the surface fitting accuracy is used to detect which pairs of segments belong to the same surface. Finally, a set of labelled segments is obtained by combining the segmentation output with the descriptors from the CNN. Experimental results show that the proposed approach outperforms state-of-the-art methods and provides an accurate segmentation and labelling.
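
A rough PyTorch sketch of the network shape described above: nine convolutional stages over colour, geometry and surface-fitting channels followed by a softmax classifier. The channel counts, kernel sizes and number of classes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class RGBDSegNet(nn.Module):
    def __init__(self, in_channels=9, n_classes=13, width=64):
        super().__init__()
        stages = []
        c = in_channels
        for _ in range(9):             # nine convolutional stages
            stages += [nn.Conv2d(c, width, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            c = width
        self.features = nn.Sequential(*stages)            # descriptor extractor
        self.classifier = nn.Conv2d(width, n_classes, 1)  # softmax classifier head

    def forward(self, x):
        feats = self.features(x)                              # per-sample descriptors
        return torch.softmax(self.classifier(feats), dim=1)   # class probabilities

# Example input: colour (3) + geometry (3) + surface fitting parameters (3) channels.
net = RGBDSegNet()
probs = net(torch.randn(1, 9, 64, 64))
print(probs.shape)   # torch.Size([1, 13, 64, 64])
```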


Archive | 2016

Scene Segmentation Assisted by Depth Data

Pietro Zanuttigh; Giulio Marin; Carlo Dal Mutto; Fabio Dominio; Ludovico Minto; Guido M. Cortelazzo

Segmentation, or detecting scene elements within an image, can be drastically simplified by combining depth and color data. This approach delivers segmentation tools that outperform techniques based on color alone. This chapter shows how consumer depth camera data can be used for three different tasks. The first is video matting, the separation of foreground objects from the background. The second is scene segmentation, the partitioning of color images and depth maps into different regions corresponding to scene elements. The third is semantic segmentation, the task of segmenting the framed scene and associating each segment with a specific category of object. We present various algorithms and methodologies for both single frames and color and depth video sequences.
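
A toy illustration (not from the book) of why depth helps segmentation: clustering pixels jointly on colour and depth separates objects that share a colour but sit at different distances. The k-means formulation and the depth weighting factor are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_color_depth(rgb, depth, n_segments=4, depth_weight=3.0):
    """rgb: (H, W, 3) in [0, 1]; depth: (H, W) in meters.
    Returns an (H, W) map of segment labels from joint colour/depth clustering."""
    h, w, _ = rgb.shape
    d = depth_weight * (depth - depth.mean()) / (depth.std() + 1e-9)
    feats = np.concatenate([rgb.reshape(-1, 3), d.reshape(-1, 1)], axis=1)
    labels = KMeans(n_clusters=n_segments, n_init=10, random_state=0).fit_predict(feats)
    return labels.reshape(h, w)

# Two regions with identical colour but different depths end up in different segments.
rgb = np.ones((32, 32, 3)) * 0.5
depth = np.where(np.arange(32)[None, :] < 16, 1.0, 3.0) * np.ones((32, 32))
labels = segment_color_depth(rgb, depth, n_segments=2)
print(np.unique(labels[:, :16]), np.unique(labels[:, 16:]))   # two distinct labels
```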


Archive | 2016

3D Scene Reconstruction from Depth Camera Data

Pietro Zanuttigh; Giulio Marin; Carlo Dal Mutto; Fabio Dominio; Ludovico Minto; Guido M. Cortelazzo

Obtaining a 3D model of the world around us is a challenging task, but consumer depth cameras are poised to make 3D reconstruction available to everyone. The distinct characteristics of data provided by these cameras require a rethinking of the algorithms and procedures for 3D reconstruction, since 3D modeling from consumer depth camera data requires the registration and fusion of many noisy frames. In this chapter, we discuss several approaches targeted to depth cameras. Solutions for pre-processing, pairwise and global registration, as well as fusion of views, are presented, including the KinectFusion approach and its extension to dynamic scenes. The related simultaneous localization and mapping problem (RGB-D SLAM) is also briefly touched upon.
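
A small sketch of the basic building block behind depth-based reconstruction, using the standard pinhole model rather than any specific method from the chapter: back-project a depth map into a point cloud, then move it into a global frame with an estimated camera pose before fusion.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map in meters to an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) samples

def transform(points, R, t):
    """Apply a rigid transform, e.g. a pose from pairwise or global registration."""
    return points @ R.T + t

depth = np.full((120, 160), 1.5)       # toy flat surface 1.5 m from the camera
pts = depth_to_points(depth, fx=200.0, fy=200.0, cx=80.0, cy=60.0)
R, t = np.eye(3), np.array([0.0, 0.0, 0.5])   # assumed camera pose for this frame
print(transform(pts, R, t).mean(axis=0))       # cloud centred about 2.0 m along z
```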


Archive | 2016

Face Detection Coupling Texture, Color and Depth Data

Loris Nanni; Alessandra Lumini; Ludovico Minto; Pietro Zanuttigh

In this chapter, we propose an ensemble of face detectors for maximizing the number of true positives found by the system. Unfortunately, combining different face detectors increases the number of false positives along with the true positives. To overcome this difficulty, several methods for reducing false positives are proposed and tested. The different filtering steps are based on the characteristics of the depth map in the subwindows of the whole image that contain the candidate faces. The simplest criterion, for instance, is to filter a candidate face region by considering its size in metric units.
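
The metric-size filter can be sketched in a few lines. The intrinsics and the plausible width range below are assumptions: the candidate window is converted from pixels to meters using the median depth inside it, and windows whose physical width is implausible for a face are rejected.

```python
import numpy as np

def plausible_face(depth_roi, box_width_px, fx=575.0,
                   min_width_m=0.10, max_width_m=0.25):
    """depth_roi: depth values in meters inside the candidate window.
    box_width_px: window width in pixels.
    Returns True if the metric width of the window is plausible for a face."""
    z = np.median(depth_roi[depth_roi > 0])   # robust depth of the candidate
    width_m = box_width_px * z / fx           # pinhole model: pixels -> meters
    return min_width_m <= width_m <= max_width_m

rng = np.random.default_rng(2)
near_face = rng.normal(0.8, 0.01, size=(40, 40))   # candidate ~0.8 m away
far_blob = rng.normal(4.0, 0.05, size=(40, 40))    # same pixel size, but 4 m away
print(plausible_face(near_face, 100))   # True:  about 0.14 m wide
print(plausible_face(far_blob, 100))    # False: about 0.70 m wide
```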


Archive | 2016

Human Pose Estimation and Tracking

Pietro Zanuttigh; Giulio Marin; Carlo Dal Mutto; Fabio Dominio; Ludovico Minto; Guido M. Cortelazzo

Human pose estimation and tracking is one of the most intriguing yet challenging applications of consumer depth cameras. After an overview of common human hand and body models, we introduce approaches for pose recovery from a single frame, starting from the popular method based on Random Decision Forests proposed by Shotton et al. and used for the Microsoft Kinect. Various pose tracking approaches are then presented to recover the human pose configuration over time from a sequence of frames. We discuss some of the main solutions available today, including algorithms based on numerical optimization methods, filtering approaches, and recent advances based on Markov Random Fields.
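
A sketch of the depth-comparison features behind the Random Decision Forest approach of Shotton et al. summarized above: each feature compares the depth at two offsets around a pixel, scaled by the pixel's own depth for approximate depth invariance. The offsets, labels and the small random forest below are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def depth_feature(depth, px, offset_u, offset_v, background=10.0):
    """f = d(px + u/d(px)) - d(px + v/d(px)) for one pixel px = (row, col)."""
    h, w = depth.shape
    d0 = depth[px]

    def probe(offset):
        r = int(px[0] + offset[0] / d0)
        c = int(px[1] + offset[1] / d0)
        if 0 <= r < h and 0 <= c < w:
            return depth[r, c]
        return background              # off-image probes read as far background

    return probe(offset_u) - probe(offset_v)

rng = np.random.default_rng(3)
depth = rng.uniform(1.0, 3.0, size=(48, 48))         # toy depth image
offsets = [((rng.uniform(-30, 30), rng.uniform(-30, 30)),
            (rng.uniform(-30, 30), rng.uniform(-30, 30))) for _ in range(16)]

pixels = [(int(r), int(c)) for r, c in rng.integers(5, 43, size=(200, 2))]
X = np.array([[depth_feature(depth, p, u, v) for u, v in offsets] for p in pixels])
y = rng.integers(0, 3, size=len(pixels))   # stand-in per-pixel body-part labels

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(forest.predict(X[:5]))
```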
