Ricardo Silveira Cabral

Publication

Featured research published by Ricardo Silveira Cabral.

European Conference on Computer Vision | 2012

Robust Regression

Dong Huang; Ricardo Silveira Cabral; Fernando De la Torre

Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features (X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015

Matrix Completion for Weakly-Supervised Multi-Label Image Classification

Ricardo Silveira Cabral; Fernando De la Torre; João Paulo Costeira; Alexandre Bernardino

In the last few years, image classification has become an incredibly active research topic, with widespread applications. Most methods for visual recognition are fully supervised, as they make use of bounding boxes or pixelwise segmentations to locate objects of interest. However, this type of manual labeling is time-consuming and error-prone, and it has been shown that manual segmentations are not necessarily the optimal spatial enclosure for object classifiers. This paper proposes a weakly-supervised system for multi-label image classification. In this setting, training images are annotated with a set of keywords describing their contents, but the visual concepts are not explicitly segmented in the images. We formulate weakly-supervised image classification as a low-rank matrix completion problem. Compared to previous work, our proposed framework has three advantages: (1) Unlike existing solutions based on multiple-instance learning methods, our model is convex. We propose two alternative algorithms for matrix completion specifically tailored to visual data, and prove their convergence. (2) Unlike existing discriminative methods, our algorithm is robust to labeling errors, background noise and partial occlusions. (3) Our method can potentially be used for semantic segmentation. Experimental validation on several data sets shows that our method outperforms state-of-the-art classification algorithms, while effectively capturing the appearance of each class.
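As a rough illustration of what low-rank matrix completion means in practice, the sketch below uses generic singular value thresholding in a soft-impute style loop. It is not one of the two algorithms proposed in the paper; the function names `svt` and `complete`, the parameter `lam`, and the iteration count are assumptions made for this example only.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: shrink every singular value of A by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * s_shrunk) @ Vt

def complete(M, observed, lam=0.1, n_iters=200):
    """Generic soft-impute style completion (illustrative, not the paper's method).

    M        : 2-D array; values at unobserved positions are ignored.
    observed : boolean mask, True where M holds a known entry.
    lam      : shrinkage applied to the singular values at each step.
    """
    Z = np.zeros_like(M, dtype=float)
    for _ in range(n_iters):
        # Keep the known entries and fill the unknown ones with the current estimate.
        filled = np.where(observed, M, Z)
        # Pull the filled matrix toward low rank.
        Z = svt(filled, lam)
    return Z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.outer(rng.normal(size=6), rng.normal(size=5))  # rank-1 matrix
    mask = rng.random(truth.shape) < 0.7                      # about 70% observed
    estimate = complete(truth, mask, lam=0.05, n_iters=500)
    print(np.round(estimate, 2))
```

In a weakly-supervised setting, one could imagine stacking image features and keyword labels into a single matrix and treating the unknown labels as the missing entries; how the paper actually builds and solves that matrix is described in the publication itself, not in this sketch.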

Computer Vision and Pattern Recognition | 2014

Piecewise Planar and Compact Floorplan Reconstruction from Images

Ricardo Silveira Cabral; Yasutaka Furukawa

This paper presents a system to reconstruct piecewise planar and compact floorplans from images, which are then converted to high-quality texture-mapped models for free-viewpoint visualization. There are two main challenges in image-based floorplan reconstruction. The first is the lack of 3D information that can be extracted from images by Structure from Motion and Multi-View Stereo, as indoor scenes abound with non-diffuse and homogeneous surfaces plus clutter. The second challenge is the need for a sophisticated regularization technique that enforces piecewise planarity, to suppress clutter and yield high-quality texture-mapped models. Our technical contributions are twofold. First, we propose a novel structure classification technique to classify each pixel into one of three regions (floor, ceiling, and wall), which provide 3D cues even from a single image. Second, we cast floorplan reconstruction as a shortest path problem on a specially crafted graph, which enables us to enforce piecewise planarity. Besides producing compact piecewise planar models, this formulation allows us to directly control the number of vertices (i.e., density) of the output mesh. We evaluate our system on real indoor scenes, and show that our texture-mapped mesh models provide compelling free-viewpoint visualization experiences, when compared against the state-of-the-art and ground truth.
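To make the shortest-path idea concrete, here is a minimal sketch of Dijkstra's algorithm in which every edge on the path also pays a constant penalty, so that paths with fewer vertices are preferred. The graph layout, the `vertex_penalty` parameter, and the function name are assumptions for illustration; the paper's specially crafted graph and its cost terms are not reproduced here.

```python
import heapq

def shortest_path(graph, source, target, vertex_penalty=0.0):
    """Dijkstra over graph = {node: [(neighbor, cost), ...]} with non-negative costs.

    Each traversed edge pays its own cost plus vertex_penalty (a hypothetical
    knob), so a larger penalty favors paths that pass through fewer vertices.
    Returns (path, total_cost), or (None, inf) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, []):
            nd = d + cost + vertex_penalty
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]

if __name__ == "__main__":
    # Toy graph: a detailed route a-b-c-d and a coarse shortcut a-d.
    graph = {
        "a": [("b", 1.0), ("d", 3.5)],
        "b": [("c", 1.0)],
        "c": [("d", 1.0)],
    }
    print(shortest_path(graph, "a", "d"))                      # detailed route
    print(shortest_path(graph, "a", "d", vertex_penalty=1.0))  # coarse shortcut
```

With no penalty the search follows the cheaper detailed route; raising the penalty switches it to the shortcut with fewer vertices, which loosely mirrors how a path-based formulation can trade fidelity against the density of the output.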

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Robust Regression

Dong Huang; Ricardo Silveira Cabral; Fernando De la Torre

Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features (X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR.

Asian Conference on Computer Vision | 2014

Multi-label Discriminative Weakly-Supervised Human Activity Recognition and Localization

Ehsan Adeli Mosabbeb; Ricardo Silveira Cabral; Fernando De la Torre; Mahmood Fathy

Activity recognition in video has become increasingly important due to its many applications, ranging from in-home elder care, surveillance, and human-computer interaction to automatic sports commentary. To date, most approaches to video rely on fully supervised settings that require time-consuming and error-prone manual labeling. Moreover, existing supervised approaches are typically tailored for classification, not detection problems (the spatial and temporal support of the action has to be detected). Recently, weakly-supervised learning (WSL) approaches were able to learn discriminative classifiers while localizing the action in space and/or time using weak labels. However, existing approaches for WSL provide coarse localization in terms of spatial regions or spatio-temporal volumes. Moreover, it is unclear how to extend current approaches to the multi-label case that is common in practical applications. This paper proposes a matrix completion approach to the problem of WSL for multi-label learning for video. Our approach localizes non-rectangular spatio-temporal discriminative regions that are inferred by clustering regions of common texture and motion features. We illustrate how our approach improves on existing WSL and supervised learning techniques on three standard databases: Hollywood, UCF sports, and MSR-II.

IEEE Transactions on Image Processing | 2016

Feature and Region Selection for Visual Learning

Ji Zhao; Liantao Wang; Ricardo Silveira Cabral; Fernando De la Torre

Visual learning problems, such as object classification and action recognition, are typically approached using extensions of the popular bag-of-words (BoW) model. Despite its great success, it is unclear what visual features the BoW model is learning. Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition. To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., support vector machine). There are four main benefits of our approach: 1) our approach accommodates non-linear additive kernels, such as the popular χ² and intersection kernels; 2) our approach is able to handle both regions in images and spatio-temporal regions in videos in a unified way; 3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; and 4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results on PASCAL VOC 2007, MSR Action Dataset II and YouTube illustrate the benefits of our approach.

Computer Vision and Pattern Recognition | 2016

Motion from Structure (MfS): Searching for 3D Objects in Cluttered Point Trajectories

Jayakorn Vongkulbhisal; Ricardo Silveira Cabral; Fernando De la Torre; João Paulo Costeira

Object detection has been a long-standing problem in computer vision, and state-of-the-art approaches rely on the use of sophisticated features and/or classifiers. However, these learning-based approaches heavily depend on the quality and quantity of labeled data, and do not generalize well to extreme poses or textureless objects. In this work, we explore the use of 3D shape models to detect objects in videos in an unsupervised manner. We call this problem Motion from Structure (MfS): given a set of point trajectories and a 3D model of the object of interest, find a subset of trajectories that correspond to the 3D model and estimate its alignment (i.e., compute the motion matrix). MfS is related to Structure from Motion (SfM) and motion segmentation problems: unlike SfM, the structure of the object is known but the correspondence between the trajectories and the object is unknown; unlike motion segmentation, the MfS problem incorporates 3D structure, providing robustness to tracking mismatches and outliers. Experiments illustrate how our MfS algorithm outperforms alternative approaches on both synthetic data and real videos extracted from YouTube.

Neural Information Processing Systems | 2011