Roberto Javier López-Sastre

Publication

Featured research published by Roberto Javier López-Sastre.

International Conference on Computer Vision | 2011

Deformable part models revisited: A performance evaluation for object category pose estimation

Roberto Javier López-Sastre; Tinne Tuytelaars; Silvio Savarese

Deformable Part Models (DPMs) as introduced by Felzenszwalb et al. have shown remarkably good results for category-level object detection. In this paper, we explore whether they are also well suited for the related problem of category-level object pose estimation. To this end, we extend the original DPM so as to improve its accuracy in object category pose estimation and design novel and more effective learning strategies. We benchmark the methods using various publicly available data sets. Provided that the training data is sufficiently balanced and clean, our method outperforms the state-of-the-art.

European Conference on Computer Vision | 2016

Towards Perspective-Free Object Counting with Deep Learning

Daniel Oñoro-Rubio; Roberto Javier López-Sastre

In this paper we address the problem of counting object instances in images. Our models are able to precisely estimate the number of vehicles in a traffic jam, or to count the people in a very crowded scene. Our first contribution is a novel convolutional neural network solution, named Counting CNN (CCNN). Essentially, the CCNN is formulated as a regression model where the network learns how to map the appearance of image patches to their corresponding object density maps. Our second contribution is a scale-aware counting model, the Hydra CNN, able to estimate object densities in very crowded scenarios where no geometric information about the scene can be provided. Hydra CNN learns a multiscale non-linear regression model which uses a pyramid of image patches extracted at multiple scales to perform the final density prediction. We report an extensive experimental evaluation, using up to three different object counting benchmarks, where we show that our solutions achieve state-of-the-art performance.
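The density-map formulation mentioned in this abstract can be illustrated with a small sketch. It is not the CCNN itself: it only shows the kind of regression target such a model could be trained on, and how a count is read back from a density map by summing it. The abstract does not say how the target maps are built, so the Gaussian placed on each dot annotation, the `sigma` value, and the function names `density_map` and `count_from_density` are all assumptions made for illustration.

```python
import math

def density_map(points, height, width, sigma=2.0):
    """Build a density map whose sum equals the number of annotated objects.

    Assumption: each object is marked by one (row, col) dot inside the image
    and is spread with a Gaussian of standard deviation `sigma`.
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for (py, px) in points:
        # Gaussian weights for the in-bounds pixels around the annotation.
        weights = {}
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                d2 = (y - py) ** 2 + (x - px) ** 2
                weights[(y, x)] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        # Normalise so every object contributes exactly one unit of mass,
        # even when its Gaussian is clipped by the image border.
        for (y, x), w in weights.items():
            dmap[y][x] += w / total
    return dmap

def count_from_density(dmap):
    """The object count is the integral (here, the sum) of the density map."""
    return sum(sum(row) for row in dmap)

# Hypothetical dot annotations for three objects in a 16x32 image.
dots = [(5, 5), (10, 20), (0, 0)]
dmap = density_map(dots, height=16, width=32)
estimated = count_from_density(dmap)
```

Under this convention a regression network that predicts the density values for a patch yields the patch count by summation, with no explicit detection step.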

Eurographics | 2012

SHREC'12 track: generic 3D shape retrieval

Bo Li; Afzal Godil; Masaki Aono; X. Bai; Takahiko Furuya; L. Li; Roberto Javier López-Sastre; Henry Johan; Ryutarou Ohbuchi; Carolina Redondo-Cabrera; Atsushi Tatsuma; Tomohiro Yanagimachi; S. Zhang

Generic 3D shape retrieval is a fundamental research area in the field of content-based 3D model retrieval. The aim of this track is to measure and compare the performance of generic 3D shape retrieval methods implemented by different participants around the world. The track is based on a new generic 3D shape benchmark, which contains 1200 triangle meshes that are evenly divided into 60 categories. In this track, 5 groups submitted 16 runs, and their retrieval accuracies were evaluated using 7 commonly used performance metrics.

Computer Vision and Pattern Recognition | 2012

SURFing the point clouds: Selective 3D spatial pyramids for category-level object recognition

Carolina Redondo-Cabrera; Roberto Javier López-Sastre; Javier Acevedo-Rodríguez; Saturnino Maldonado-Bascón

This paper proposes a novel approach to recognize object categories in point clouds. By quantizing 3D SURF local descriptors, computed on partial 3D shapes extracted from the point clouds, a vocabulary of 3D visual words is generated. Using this codebook, we build a Bag-of-Words representation in 3D, which is used in conjunction with an SVM classifier. We also introduce the 3D Spatial Pyramid Matching Kernel, which works by partitioning a working volume into fine sub-volumes, and computing a hierarchical weighted sum of histogram intersections at each level of the pyramid structure. With the aim of increasing both the classification accuracy and the computational efficiency of the kernel, we propose selective hierarchical volume decomposition strategies, based on representative and discriminative (sub-)volume selection processes, which drastically reduce the size of the pyramid to be considered. Results on the challenging large-scale RGB-D object dataset show that our kernels significantly outperform the state-of-the-art results by using a single 3D shape feature type extracted from individual depth images.
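The hierarchical weighted sum of histogram intersections described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' kernel: the level weights are borrowed from the standard 2D spatial pyramid matching formulation and are an assumption here, feature coordinates are assumed to be normalised to the unit cube, and the selective sub-volume strategies are not modelled. The names `cell_histograms`, `intersection`, and `pyramid_kernel` are hypothetical.

```python
from collections import Counter

def cell_histograms(features, level):
    """Count visual words per sub-volume at a level with 2**level cells per axis.

    `features` is a list of (x, y, z, word) tuples with coordinates in [0, 1].
    """
    n = 2 ** level
    hist = Counter()
    for (x, y, z, word) in features:
        cell = (min(int(x * n), n - 1),
                min(int(y * n), n - 1),
                min(int(z * n), n - 1))
        hist[(cell, word)] += 1
    return hist

def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[key]) for key, count in h1.items())

def pyramid_kernel(f1, f2, levels=2):
    """Weighted sum of per-level intersections; finer levels weigh more.

    Assumed weights: 1/2**L at level 0 and 1/2**(L - l + 1) at level l >= 1.
    """
    total = 0.0
    for level in range(levels + 1):
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        total += weight * intersection(cell_histograms(f1, level),
                                       cell_histograms(f2, level))
    return total

# Two toy shapes that share a word in the same corner of the volume
# but place their second word in different sub-volumes.
shape_a = [(0.1, 0.1, 0.1, 0), (0.9, 0.9, 0.9, 1)]
shape_b = [(0.1, 0.1, 0.1, 0), (0.1, 0.9, 0.9, 1)]
k_ab = pyramid_kernel(shape_a, shape_b)
k_aa = pyramid_kernel(shape_a, shape_a)
```

Here `k_ab` is lower than `k_aa` because the second word only matches at the coarsest level, where the whole volume is a single cell.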

Computer Vision and Image Understanding | 2011

Towards a more discriminative and semantic visual vocabulary

Roberto Javier López-Sastre; Tinne Tuytelaars; Francisco Javier Acevedo-Rodríguez; Saturnino Maldonado-Bascón

We present a novel method for constructing a visual vocabulary that takes into account the class labels of images, thus resulting in better recognition performance and more efficient learning. Our method consists of two stages: Cluster Precision Maximisation (CPM) and Adaptive Refinement. In the first stage, a Reciprocal Nearest Neighbours (RNN) clustering algorithm is guided towards class representative visual words by maximising a new cluster precision criterion. As we are able to optimise the vocabulary without the need for expensive cross-validation, the overall training time is significantly reduced without a negative impact on the results. Next, an adaptive threshold refinement scheme is proposed with the aim of increasing vocabulary compactness while at the same time improving the recognition rate and further increasing the representativeness of the visual words for category-level object recognition. This is a correlation clustering based approach, which works as a meta-clustering and optimises the cut-off threshold for each cluster separately. In the experiments we analyse the recognition rate of different vocabularies for a subset of the Caltech 101 dataset, showing how RNN in combination with CPM selects the optimal codebooks, and how the clustering refinement step succeeds in further increasing the recognition rate.

Expert Systems With Applications | 2010

A decision support system for the automatic management of keep-clear signs based on support vector machines and geographic information systems

Sergio Lafuente-Arroyo; Sancho Salcedo-Sanz; Saturnino Maldonado-Bascón; José Antonio Portilla-Figueras; Roberto Javier López-Sastre

This paper presents a decision support system for automatic keep-clear signs management. The system consists of several modules. First of all, an acquisition module obtains images using a vehicle equipped with two recording cameras. A recognition module, which is based on Support Vector Machines (SVMs), analyzes each image and decides if there is a keep-clear sign in it. The images with keep-clear signs are included in a Geographical Information System (GIS) database. Finally, in the management module, the data in the GIS are compared with the council database in order to decide actions such as repairing or repositioning signs, detecting possible fraud, etc. We present the first tests of the system in a Spanish city (Meco, Madrid), where the system is being tested for its application in the near future.

International Work-Conference on the Interplay Between Natural and Artificial Computation | 2007

Automatic Control of Video Surveillance Camera Sabotage

Pedro Gil-Jiménez; Roberto Javier López-Sastre; Philip Siegmann; Javier Acevedo-Rodríguez; Saturnino Maldonado-Bascón

One of the main characteristics of a video surveillance system is its reliability. To this end, the images captured by the video cameras must be an accurate representation of the scene. Unfortunately, some activities can make the cameras fail to operate properly, distorting in some way the images which are going to be processed. When these activities are voluntary, they are usually called sabotage, which includes partial or total occlusion of the lens, image defocus or change of the field of view. In this paper, we analyze the different kinds of sabotage that could be done to a video surveillance system, and develop some algorithms to detect them. The experimental results show good performance in the detection of sabotage situations, while keeping a very low false alarm probability.

Computers & Graphics | 2013

Evaluating 3D spatial pyramids for classifying 3D shapes

Roberto Javier López-Sastre; A. García-Fuertes; Carolina Redondo-Cabrera; Francisco Javier Acevedo-Rodríguez; Saturnino Maldonado-Bascón

This paper focuses on the problem of 3D shape categorization. For a given set of training 3D shapes, a 3D shape recognition system must be able to predict the class label for a test 3D shape. We introduce a novel discriminative approach for recognizing 3D shape categories which is based on a 3D Spatial Pyramid (3DSP) decomposition. 3D local descriptors are first extracted from the 3D shapes and then quantized in order to build a 3D visual vocabulary for characterizing the shapes. Our approach repeatedly subdivides a cube inscribed in the 3D shape, and computes a weighted sum of histograms of visual word occurrences at increasingly fine sub-volumes. Additionally, we integrate this pyramidal representation with different types of kernels, such as the Histogram Intersection Kernel and the extended Gaussian Kernel with χ² distance. Finally, we perform a thorough evaluation on different publicly available datasets, defining an elaborate experimental setup to be used for establishing further comparisons among different 3D shape categorization methods.
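The two kernels named in this abstract have standard textbook definitions, sketched below for plain histograms. This is an illustration under assumptions rather than the paper's implementation: the chi-squared distance is taken with the common 1/2 factor, the `scale` parameter of the extended Gaussian kernel is a free normaliser (often set to the mean distance over the training set), and the function names are hypothetical.

```python
import math

def histogram_intersection(h1, h2):
    """Histogram Intersection Kernel: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def chi2_distance(h1, h2):
    """Chi-squared distance, 0.5 * sum((a - b)^2 / (a + b)), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)

def extended_gaussian_kernel(h1, h2, scale=1.0):
    """Extended Gaussian kernel: exp(-D(h1, h2) / scale) with D the chi-squared distance."""
    return math.exp(-chi2_distance(h1, h2) / scale)

# Two toy L1-normalised visual-word histograms.
h_a = [0.5, 0.5, 0.0]
h_b = [0.25, 0.25, 0.5]
hik = histogram_intersection(h_a, h_b)
d_chi2 = chi2_distance(h_a, h_b)
k_self = extended_gaussian_kernel(h_a, h_a)
```

Either function can be evaluated on every pair of training histograms to fill a precomputed kernel matrix for an SVM.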

IEEE Transactions on Instrumentation and Measurement | 2008

Fundaments in Luminance and Retroreflectivity Measurements of Vertical Traffic Signs Using a Color Digital Camera

Philip Siegmann; Roberto Javier López-Sastre; Pedro Gil-Jiménez; Sergio Lafuente-Arroyo; Saturnino Maldonado-Bascón

This paper is a study of the influences of the different parameters which affect the photometric evaluation of light-emitting surfaces (due to reflection or self-emission) when a conventional color digital camera is used. The overall purpose of this paper is to evaluate the luminance and the reflectivity of vertical traffic signs with the camera in order to provide an automatic recognition of deteriorated reflective sheeting material of which the traffic signs were made. This paper describes how the A/D converter output signal given by a pixel of the digital camera can be related to the luminance and the reflectivity of the corresponding surface element whose image is formed on that pixel. Thus, each surface element of the traffic sign's surface can be separately evaluated. By photometrically calibrating the camera, we have been able to prove this relationship in our experiments.

British Machine Vision Conference | 2014

All together now: Simultaneous Detection and Continuous Pose Estimation using a Hough Forest with Probabilistic Locally Enhanced Voting.

Carolina Redondo-Cabrera; Roberto Javier López-Sastre; Tinne Tuytelaars

Simultaneous object detection and pose estimation is a challenging task in computer vision. In this paper, we tackle the problem using Hough Forests. Unlike most methods in the literature, we focus on the problem of continuous pose estimation. Moreover, we aim for a probabilistic output. We first introduce a new pose purity criterion for splitting a node during the forest training. Second, we propose the concept of Probabilistic Locally Enhanced Voting (PLEV), a novel regression strategy which consists in modulating the regression with a kernel density estimation to consolidate the votes in a local region near the maxima detected in the Hough space. And third, we propose a pose-based backprojection strategy to improve the bounding box estimation. With these three additions, we show that our Hough Forest can achieve state-of-the-art results without needing 3D CAD models. We present a quite versatile method, showing results for different categories (cars as well as faces) and for different modalities (RGB as well as depth images).
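The idea of consolidating votes with a kernel density estimate can be illustrated in isolation. The sketch below is not PLEV and does not involve a Hough Forest: it only shows a generic Gaussian kernel density estimate over continuous pose votes, with the mode taken as the pose estimate. The use of azimuth angles in degrees, the wrap-around handling, the `bandwidth` value, and the names `angular_diff`, `pose_density`, and `mode_of_votes` are all assumptions.

```python
import math

def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def pose_density(angle, votes, bandwidth=5.0):
    """Unnormalised Gaussian kernel density of the votes at `angle`."""
    return sum(math.exp(-angular_diff(angle, v) ** 2 / (2.0 * bandwidth ** 2))
               for v in votes)

def mode_of_votes(votes, bandwidth=5.0, step=1):
    """Evaluate the density on a regular grid and return the densest angle."""
    grid = range(0, 360, step)
    return max(grid, key=lambda a: pose_density(a, votes, bandwidth))

# Hypothetical pose votes: three agree around 90 degrees, one is an outlier.
votes = [88.0, 90.0, 92.0, 270.0]
estimate = mode_of_votes(votes)
```

Because the density is smooth, nearby votes reinforce each other while the isolated vote at 270 degrees has little influence on the estimate, which is the intuition behind using a density rather than a raw vote count.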
