Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Lorenzo Torresani is active.

Publication


Featured research published by Lorenzo Torresani.


International Conference on Computer Vision | 2015

Learning Spatiotemporal Features with 3D Convolutional Networks

Du Tran; Lubomir D. Bourdev; Rob Fergus; Lorenzo Torresani; Manohar Paluri

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets, 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets, and 3) Our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. In addition, the features are compact: they achieve 52.8% accuracy on the UCF101 dataset with only 10 dimensions, and are also very efficient to compute thanks to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use.
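The key idea above is that a 3x3x3 kernel mixes information across space *and* time, unlike a 2D kernel applied frame by frame. A minimal numpy sketch of one such 3D convolution (not the C3D network itself; the clip size and kernel are illustrative):

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Slide a 3D kernel over a (T, H, W) clip; 'valid' padding, stride 1."""
    t, h, w = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# A tiny 8-frame "video" and one 3x3x3 kernel: a temporal-difference filter
# that responds only to change over time, something no 2D kernel can express.
clip = np.random.rand(8, 16, 16)
kernel = np.zeros((3, 3, 3))
kernel[0, 1, 1], kernel[2, 1, 1] = -1.0, 1.0
response = conv3d_valid(clip, kernel)
print(response.shape)   # (6, 14, 14): the output is itself a spatiotemporal volume
```

In C3D such kernels are learned end-to-end and stacked homogeneously through the whole network; this sketch only shows the operation each layer performs.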


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors

Lorenzo Torresani; Aaron Hertzmann; Christoph Bregler

This paper describes methods for recovering time-varying shape and motion of nonrigid 3D objects from uncalibrated 2D point tracks. For example, given a video recording of a talking person, we would like to estimate the 3D shape of the face at each instant and learn a model of facial deformation. Time-varying shape is modeled as a rigid transformation combined with a nonrigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed, and thus additional assumptions about deformations are required. We first suggest restricting shapes to lie within a low-dimensional subspace and describe estimation algorithms. However, this restriction alone is insufficient to constrain reconstruction. To address these problems, we propose a reconstruction method using a Probabilistic Principal Components Analysis (PPCA) shape model and an estimation algorithm that simultaneously estimates 3D shape and motion for each instant, learns the PPCA model parameters, and robustly fills in missing data points. We then extend the model to represent temporal dynamics in object shape, allowing the algorithm to robustly handle severe cases of missing data.
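The low-dimensional subspace assumption is what makes the problem tractable: every frame's 3D shape is a linear combination of a few basis shapes, so the stacked shape matrix has bounded rank. A minimal numpy sketch of that constraint (dimensions are illustrative; this is not the paper's PPCA/EM algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
P, K, T = 30, 3, 50                      # points, basis shapes, frames

basis = rng.standard_normal((K, 3 * P))  # K basis shapes, flattened (x, y, z)
coeffs = rng.standard_normal((T, K))     # per-frame deformation coefficients

# Each frame's 3D shape is a linear combination of the K basis shapes.
shapes = coeffs @ basis                  # (T, 3P)

# The subspace model caps the rank of the stacked shape matrix at K,
# which constrains the otherwise ill-posed reconstruction.
print(np.linalg.matrix_rank(shapes))     # 3
```

The PPCA model in the paper additionally places a Gaussian prior on the coefficients, which is what allows missing tracks to be filled in probabilistically.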


European Conference on Computer Vision | 2008

Feature Correspondence Via Graph Matching: Models and Global Optimization

Lorenzo Torresani; Vladimir Kolmogorov; Carsten Rother

In this paper we present a new approach for establishing correspondences between sparse image features related by an unknown non-rigid mapping and corrupted by clutter and occlusion, such as points extracted from a pair of images containing a human figure in distinct poses. We formulate this matching task as an energy minimization problem by defining a complex objective function of the appearance and the spatial arrangement of the features. Optimization of this energy is an instance of graph matching, which is in general an NP-hard problem. We describe a novel graph matching optimization technique, which we refer to as dual decomposition (DD), and demonstrate on a variety of examples that this method outperforms existing graph matching algorithms. In the majority of our examples DD is able to find the global minimum within a minute. The ability to globally optimize the objective allows us to accurately learn the parameters of our matching model from training examples. We show on several matching tasks that our learned model yields results superior to those of state-of-the-art methods.
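The objective combines a unary term (appearance cost of each putative match) with a pairwise term (distortion of the spatial arrangement). A toy numpy sketch of such an energy, minimized here by brute force on a tiny instance (the paper's dual decomposition exists precisely because brute force is infeasible at real sizes; point positions stand in for appearance descriptors):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n = 4
src = rng.random((n, 2))                           # feature locations, image 1
perm_true = rng.permutation(n)
dst = src[perm_true] + 1e-3 * rng.random((n, 2))   # noisy, reordered copies

def energy(assign):
    """Unary appearance cost + pairwise geometric-distortion cost."""
    unary = sum(np.linalg.norm(src[i] - dst[assign[i]]) for i in range(n))
    pair = sum(abs(np.linalg.norm(src[i] - src[j]) -
                   np.linalg.norm(dst[assign[i]] - dst[assign[j]]))
               for i in range(n) for j in range(i + 1, n))
    return unary + pair

best = min(permutations(range(n)), key=energy)     # exhaustive search: toy only
print(best)   # recovers the true correspondence on this tiny instance
```

The pairwise term is what makes the problem a (generally NP-hard) graph matching instance rather than a simple linear assignment.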


Computer Vision and Pattern Recognition | 2001

Tracking and modeling non-rigid objects with rank constraints

Lorenzo Torresani; Danny B. Yang; Eugene J. Alexander; Christoph Bregler

This paper presents a novel solution for flow-based tracking and 3D reconstruction of deforming objects in monocular image sequences. A non-rigid 3D object undergoing rotation and deformation can be effectively approximated using a linear combination of 3D basis shapes. This puts a bound on the rank of the tracking matrix. The rank constraint is used to achieve robust and precise low-level optical flow estimation without prior knowledge of the 3D shape of the object. The bound on the rank is also exploited to handle occlusion at the tracking level leading to the possibility of recovering the complete trajectories of occluded/disoccluded points. Following the same low-rank principle, the resulting flow matrix can be factored to get the 3D pose, configuration coefficients, and 3D basis shapes. The flow matrix is factored in an iterative manner, looping between solving for pose, configuration, and basis shapes. The flow-based tracking is applied to several video sequences and provides the input to the 3D non-rigid reconstruction task. Additional results on synthetic data and comparisons to ground truth complete the experiments.
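Under an orthographic camera, K linear basis shapes bound the rank of the stacked 2D tracking matrix by 3K, which is the constraint the tracker exploits. A numpy sketch of that bound (dimensions illustrative; the actual method uses it to regularize optical flow, not to generate data):

```python
import numpy as np

rng = np.random.default_rng(2)
P, K, F = 40, 2, 60                         # points, basis shapes, frames

basis = rng.standard_normal((3 * K, P))     # K 3D basis shapes, stacked

W = np.zeros((2 * F, P))                    # stacked 2D point tracks
for f in range(F):
    R = np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]  # orthographic camera
    c = rng.standard_normal(K)              # configuration coefficients
    M = np.hstack([c[k] * R for k in range(K)])           # (2, 3K) motion
    W[2 * f:2 * f + 2] = M @ basis

# The linear-basis model bounds rank(W) by 3K, so low-level flow can be
# constrained, and occluded tracks filled in, without knowing the 3D shape.
print(np.linalg.matrix_rank(W))             # 6  (= 3K)
```

Factoring W back into motion, coefficients, and basis shapes, as the paper does iteratively, is the 3D reconstruction step.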


Computer Vision and Pattern Recognition | 2015

DeepEdge: A multi-scale bifurcated deep network for top-down contour detection

Gedas Bertasius; Jianbo Shi; Lorenzo Torresani

Contour detection has been a fundamental component in many image segmentation and object detection systems. Most previous work utilizes low-level features such as texture or saliency to detect contours and then uses them as cues for a higher-level task such as object detection. However, we claim that recognizing objects and predicting contours are two mutually related tasks. Contrary to traditional approaches, we show that we can invert the commonly established pipeline: instead of detecting contours with low-level cues for a higher-level recognition task, we exploit object-related features as high-level cues for contour detection.


International Conference on Computer Vision | 2009

Weakly supervised discriminative localization and classification: a joint learning process

Minh Hoai Nguyen; Lorenzo Torresani; Fernando De la Torre; Carsten Rother

Visual categorization problems, such as object classification or action recognition, are increasingly often approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and then the maximum classifier score is used for the class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manually annotated with masks or bounding boxes. The reliance on time-consuming human labeling effectively limits the application of these methods to problems involving very few categories. Furthermore, the human selection of the masks introduces arbitrary biases (e.g. in terms of window size and location) which may be suboptimal for classification. In this paper we propose a novel method for learning a discriminative subwindow classifier from examples annotated with binary labels indicating the presence of an object or action of interest, but not its location. During training, our approach simultaneously localizes the instances of the positive class and learns a subwindow SVM to recognize them. We extend our method to classification of time series by presenting an algorithm that localizes the most discriminative set of temporal segments in the signal. We evaluate our approach on several datasets for object and action recognition and show that it achieves results similar to, and in many cases superior to, those obtained with full supervision.
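The detection strategy described above scores every candidate subwindow and takes the maximum; the argmax then doubles as the predicted localization, even though only image-level labels were available. A toy numpy sketch of that scoring step (the classifier here is a fixed brightness detector, not the learned subwindow SVM):

```python
import numpy as np

rng = np.random.default_rng(3)

def image_score(image, w, win=4):
    """Score every win x win subwindow with a linear classifier; the image
    score is the max, and the argmax is the predicted localization."""
    H, W = image.shape
    best, best_ij = -np.inf, None
    for i in range(H - win + 1):
        for j in range(W - win + 1):
            s = w @ image[i:i+win, j:j+win].ravel()
            if s > best:
                best, best_ij = s, (i, j)
    return best, best_ij

# A "positive" image: background noise plus a bright 4x4 blob at (5, 7).
img = 0.1 * rng.standard_normal((12, 12))
img[5:9, 7:11] += 1.0
w = np.ones(16)                    # toy detector: responds to bright patches
score, loc = image_score(img, w)
print(loc)                         # localizes the blob with no box annotation
```

Training alternates between this localization step on the positive images and refitting the classifier on the currently selected windows, which is the joint learning process the title refers to.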


NATO ASI | 1996

2D Deformable Models for Visual Speech Analysis

Tarcisio Coianiz; Lorenzo Torresani; Bruno Caprile

A scheme for describing the mouth of a speaker in color image sequences is proposed which is based on a parametric 2D model of the lips. Key information for parameter estimation is extracted from chrominance analysis. A detailed description of the techniques employed is given, and some preliminary results are shown.


European Conference on Computer Vision | 2002

Space-Time Tracking

Lorenzo Torresani; Christoph Bregler

We propose a new tracking technique that is able to capture non-rigid motion by exploiting a space-time rank constraint. Most tracking methods use a prior model in order to deal with challenging local features. The model usually has to be trained on carefully hand-labeled example data before the tracking algorithm can be used. Our new model-free tracking technique can overcome such limitations. We achieve this by redefining the problem: instead of first training a model and then tracking the model parameters, we derive trajectory constraints first, and then estimate the model. This reduces the search space significantly and allows for better feature disambiguation that would not be possible with traditional trackers. We demonstrate that sampling in the trajectory space, instead of in the space of shape configurations, allows us to track challenging footage without use of prior models.


Workshop on Mobile Computing Systems and Applications | 2012

WalkSafe: a pedestrian safety app for mobile phone users who walk and talk while crossing roads

Tianyu Wang; Giuseppe Cardone; Antonio Corradi; Lorenzo Torresani; Andrew T. Campbell

Research in social science has shown that mobile phone conversations distract users, presenting a significant impact to pedestrian safety; for example, a mobile phone user deep in conversation while crossing a street is generally more at risk than other pedestrians not engaged in such behavior. We propose WalkSafe, an Android smartphone application that aids people who walk and talk, improving the safety of pedestrian mobile phone users. WalkSafe uses the back camera of the mobile phone to detect vehicles approaching the user, alerting the user of a potentially unsafe situation; more specifically, WalkSafe (i) uses machine learning algorithms implemented on the phone to detect the front views and back views of moving vehicles and (ii) exploits phone APIs to save energy by running the vehicle detection algorithm only during active calls. We present our initial design, implementation and evaluation of the WalkSafe app, which is capable of real-time detection of the front and back views of cars, indicating cars approaching or moving away from the user, respectively. WalkSafe is implemented on Android phones and alerts the user of unsafe conditions using sound and vibration from the phone. WalkSafe is available on Android Market.


Computer Vision and Pattern Recognition | 2012

Meta-class features for large-scale object categorization on a budget

Alessandro Bergamo; Lorenzo Torresani

In this paper we introduce a novel image descriptor enabling accurate object categorization even with linear models. Akin to the popular attribute descriptors, our feature vector comprises the outputs of a set of classifiers evaluated on the image. However, unlike traditional attributes which represent hand-selected object classes and predefined visual properties, our features are learned automatically and correspond to “abstract” categories, which we name meta-classes. Each meta-class is a super-category obtained by grouping a set of object classes such that, collectively, they are easy to distinguish from other sets of categories. By using “learnability” of the meta-classes as criterion for feature generation, we obtain a set of attributes that encode general visual properties shared by multiple object classes and that are effective in describing and recognizing even novel categories, i.e., classes not present in the training set. We demonstrate that simple linear SVMs trained on our meta-class descriptor significantly outperform the best known classifier on the Caltech256 benchmark. We also present results on the 2010 ImageNet Challenge database where our system produces results approaching those of the best systems, but at a much lower computational cost.
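The descriptor is simply the vector of meta-class classifier outputs, so any image can be represented compactly and classified with a linear model. A toy numpy sketch (the meta-class groupings here are fixed by hand and the classifiers are ridge regressions, whereas the paper learns the groupings by their "learnability" and trains proper classifiers):

```python
import numpy as np

rng = np.random.default_rng(4)
C, D, N = 8, 16, 40                      # classes, feature dim, samples/class
means = rng.standard_normal((C, D))
X = np.vstack([m + 0.3 * rng.standard_normal((N, D)) for m in means])
y = np.repeat(np.arange(C), N)

# Toy "meta-classes": group the 8 classes into 4 super-categories.
meta = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5]), np.array([6, 7])]

def ridge_classifier(X, t, lam=1.0):
    """One linear classifier per meta-class, fit by ridge regression."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ t)

W = np.column_stack([ridge_classifier(X, np.isin(y, g).astype(float))
                     for g in meta])

# The meta-class descriptor of an image is the vector of classifier outputs:
descriptor = X @ W                       # (N*C, 4): compact, linear-model ready
print(descriptor.shape)
```

Because the descriptor encodes general properties shared across classes, it remains informative even for categories absent from the training set, which is the budget-friendly property the abstract emphasizes.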

Collaboration


Dive into Lorenzo Torresani's collaborations.

Top Co-Authors


Gedas Bertasius

University of Pennsylvania


Jianbo Shi

University of Pennsylvania
