Francesc Moreno-Noguer

Publication

Featured research published by Francesc Moreno-Noguer.

International Journal of Computer Vision | 2009

EPnP: An Accurate O(n) Solution to the PnP Problem

Vincent Lepetit; Francesc Moreno-Noguer; Pascal Fua

We propose a non-iterative solution to the PnP problem (the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences) whose computational complexity grows linearly with n. This is in contrast to state-of-the-art methods that are O(n^5) or even O(n^8), without being more accurate. Our method is applicable for all n ≥ 4 and properly handles both planar and non-planar configurations. Our central idea is to express the n 3D points as a weighted sum of four virtual control points. The problem then reduces to estimating the coordinates of these control points in the camera referential, which can be done in O(n) time by expressing these coordinates as a weighted sum of the eigenvectors of a 12×12 matrix and solving a small constant number of quadratic equations to pick the right weights. Furthermore, if maximal precision is required, the output of the closed-form solution can be used to initialize a Gauss-Newton scheme, which improves accuracy with a negligible amount of additional time. The advantages of our method are demonstrated by thorough testing on both synthetic and real data.
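The control-point idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the control points (the origin plus three unit axes), the helper names, and the test transform are all assumptions chosen for illustration. The sketch only shows why the weights are useful: because they sum to one, the same weights that describe a point in world coordinates also describe it in camera coordinates after a rigid motion.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def barycentric_weights(point, controls):
    """Weights w with sum(w) = 1 and sum(w_j * controls[j]) = point."""
    # Three coordinate equations plus the constraint that the weights sum to 1.
    a = [[float(c[k]) for c in controls] for k in range(3)] + [[1.0] * 4]
    return solve_linear(a, list(point) + [1.0])

def combine(weights, controls):
    """Weighted sum of the control points."""
    return tuple(sum(w * c[k] for w, c in zip(weights, controls)) for k in range(3))

def rigid(point, angle, t):
    """Hypothetical world-to-camera motion: rotate about z, then translate."""
    x, y, z = point
    ca, sa = math.cos(angle), math.sin(angle)
    return (ca * x - sa * y + t[0], sa * x + ca * y + t[1], z + t[2])

# Assumed control points: the origin and the three unit axes.
controls_world = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
point_world = (0.3, -1.2, 2.5)
alphas = barycentric_weights(point_world, controls_world)

# Move everything into the camera frame and rebuild the point from the same weights.
motion = (0.7, (0.5, -0.2, 4.0))
controls_cam = [rigid(c, *motion) for c in controls_world]
point_cam = rigid(point_world, *motion)
rebuilt = combine(alphas, controls_cam)
```

Here `rebuilt` agrees with `point_cam` up to floating-point error, which is why the unknowns of the pose problem can be reduced to the twelve camera-frame coordinates of the four control points, regardless of n.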

International Conference on Computer Vision | 2007

Accurate Non-Iterative O(n) Solution to the PnP Problem

Francesc Moreno-Noguer; Vincent Lepetit; Pascal Fua

We propose a non-iterative solution to the PnP problem (the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences) whose computational complexity grows linearly with n. This is in contrast to state-of-the-art methods that are O(n^5) or even O(n^8), without being more accurate. Our method is applicable for all n ≥ 4 and properly handles both planar and non-planar configurations. Our central idea is to express the n 3D points as a weighted sum of four virtual control points. The problem then reduces to estimating the coordinates of these control points in the camera referential, which can be done in O(n) time by expressing these coordinates as a weighted sum of the eigenvectors of a 12×12 matrix and solving a small constant number of quadratic equations to pick the right weights. The advantages of our method are demonstrated by thorough testing on both synthetic and real data.
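The reduction described in this abstract can be written out compactly. The symbols below (weights α, control points c, calibration matrix K, matrix M, eigenvectors v, coefficients β) do not appear in the abstract itself and are assumed notation; the block is a sketch of the formulation, not a full derivation.

```latex
% Each 3D point is a weighted sum of four control points; the weights are
% the same in the world frame (w) and in the camera frame (c).
\[
\mathbf{p}_i^{w} = \sum_{j=1}^{4} \alpha_{ij}\,\mathbf{c}_j^{w}, \qquad
\mathbf{p}_i^{c} = \sum_{j=1}^{4} \alpha_{ij}\,\mathbf{c}_j^{c}, \qquad
\sum_{j=1}^{4} \alpha_{ij} = 1 .
\]

% Projection of point i to pixel (u_i, v_i) with projective depth w_i.
\[
w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}
  = \mathbf{K} \sum_{j=1}^{4} \alpha_{ij}\,\mathbf{c}_j^{c} .
\]

% Eliminating w_i leaves two linear equations per point in the 12 unknowns.
\[
\mathbf{M}\,\mathbf{x} = \mathbf{0}, \qquad
\mathbf{M} \in \mathbb{R}^{2n \times 12}, \qquad
\mathbf{x} = \bigl[\mathbf{c}_1^{c\top},\ \mathbf{c}_2^{c\top},\ \mathbf{c}_3^{c\top},\ \mathbf{c}_4^{c\top}\bigr]^{\top} .
\]

% The solution is a combination of eigenvectors of the 12x12 matrix M^T M.
\[
\mathbf{x} = \sum_{k=1}^{N} \beta_k\,\mathbf{v}_k .
\]

% The coefficients are fixed by quadratic constraints: a rigid motion
% preserves the distances between control points.
\[
\bigl\|\mathbf{c}_i^{c} - \mathbf{c}_j^{c}\bigr\|^{2}
  = \bigl\|\mathbf{c}_i^{w} - \mathbf{c}_j^{w}\bigr\|^{2} .
\]
```

Forming the 12×12 matrix takes one pass over the n correspondences, which is where the linear complexity comes from; every later step works on a problem of constant size.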

International Conference on Computer Vision | 2015

Discriminative Learning of Deep Convolutional Feature Point Descriptors

Edgar Simo-Serra; Eduard Trulls; Luis Ferraz; Iasonas Kokkinos; Pascal Fua; Francesc Moreno-Noguer

Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs by combining stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. By using the L2 distance during both training and testing we develop 128-D descriptors whose Euclidean distances reflect patch similarity, and which can be used as a drop-in replacement for any task involving SIFT. We demonstrate consistent performance gains over the state of the art, and the descriptors generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes. Our descriptors are efficient to compute and amenable to modern GPUs, and are publicly available.
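The combination of an L2-based pair loss with mining of hard pairs can be sketched on toy data. This is a minimal illustration under stated assumptions: the hinge-style loss, the margin value, the function names, and the 2-D "descriptors" are all hypothetical stand-ins and are not taken from the paper, where the descriptors are 128-D outputs of a CNN.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pair_loss(distance, corresponding, margin=1.0):
    """Assumed hinge-style loss on the L2 distance.

    Corresponding pairs are penalised by their distance; non-corresponding
    pairs are penalised only when they are closer than the margin.
    """
    if corresponding:
        return distance
    return max(0.0, margin - distance)

def mine_hard_pairs(pairs, keep, margin=1.0):
    """Return the indices of the `keep` pairs with the largest loss."""
    scored = [
        (pair_loss(l2_distance(a, b), same, margin), index)
        for index, (a, b, same) in enumerate(pairs)
    ]
    scored.sort(reverse=True)
    return [index for _, index in scored[:keep]]

# Toy 2-D descriptors: (descriptor_a, descriptor_b, is_corresponding).
pairs = [
    ((0.0, 0.0), (0.1, 0.0), True),   # easy positive: already close
    ((0.0, 0.0), (3.0, 4.0), True),   # hard positive: should be close but is far
    ((0.0, 0.0), (0.3, 0.4), False),  # hard negative: inside the margin
    ((0.0, 0.0), (6.0, 8.0), False),  # easy negative: far apart, zero loss
]
hard = mine_hard_pairs(pairs, keep=2)
```

With these values `hard` contains the indices of the hard positive and the hard negative, so a training step restricted to the mined pairs would spend its effort on the examples the current descriptors get wrong.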

International Conference on Computer Graphics and Interactive Techniques | 2007

Active refocusing of images and videos

Francesc Moreno-Noguer; Peter N. Belhumeur; Shree K. Nayar

We present a system for refocusing images and videos of dynamic scenes using a novel, single-view depth estimation method. Our method for obtaining depth is based on the defocus of a sparse set of dots projected onto the scene. In contrast to other active illumination techniques, the projected pattern of dots can be removed from each captured image and its brightness easily controlled in order to avoid under- or over-exposure. The depths corresponding to the projected dots and a color segmentation of the image are used to compute an approximate depth map of the scene with clean region boundaries. The depth map is used to refocus the acquired image after the dots are removed, simulating realistic depth of field effects. Experiments on a wide variety of scenes, including close-ups and live action, demonstrate the effectiveness of our method.
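One simple way to picture how sparse dot depths and a color segmentation can yield a depth map with clean region boundaries is to give every pixel of a segment a single depth derived from the dots that fall inside it. The sketch below does this with a per-segment median. It is an assumption made for illustration, not the procedure used in the paper, and the function names and toy data are hypothetical.

```python
from statistics import median

def segment_depths(labels, dots):
    """Median dot depth per segment.

    labels: 2-D list of segment ids, one per pixel.
    dots:   list of (row, col, depth) measurements at projected dots.
    """
    per_segment = {}
    for row, col, depth in dots:
        per_segment.setdefault(labels[row][col], []).append(depth)
    return {segment: median(values) for segment, values in per_segment.items()}

def approximate_depth_map(labels, dots, fallback=None):
    """Fill every pixel with the depth of its segment (fallback if no dot hit it)."""
    depth_of = segment_depths(labels, dots)
    return [[depth_of.get(segment, fallback) for segment in row] for row in labels]

# Toy segmentation with three regions and five dot measurements.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
]
dots = [(0, 0, 1.0), (1, 1, 1.5), (0, 3, 3.0), (2, 2, 3.5), (1, 2, 3.25)]
depth_map = approximate_depth_map(labels, dots)
```

Because depth changes only where the segment label changes, the depth boundaries coincide with the color boundaries. Segments that contain no dot stay at the fallback value and would need to be filled from neighbouring regions in a fuller treatment.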

European Conference on Computer Vision | 2008

Closed-Form Solution to Non-rigid 3D Surface Registration

Mathieu Salzmann; Francesc Moreno-Noguer; Vincent Lepetit; Pascal Fua

We present a closed-form solution to the problem of recovering the 3D shape of a non-rigid inelastic surface from 3D-to-2D correspondences. This lets us detect and reconstruct such a surface by matching individual images against a reference configuration, which is in contrast to all existing approaches that require initial shape estimates and track deformations from image to image.

Computer Vision and Pattern Recognition | 2012

Single image 3D human pose estimation from noisy observations

Edgar Simo-Serra; Arnau Ramisa; Guillem Alenyà; Carme Torras; Francesc Moreno-Noguer

Markerless 3D human pose detection from a single image is a severely underconstrained problem because different 3D poses can have similar image projections. In order to handle this ambiguity, current approaches rely on prior shape models that can only be correctly adjusted if 2D image features are accurately detected. Unfortunately, although current 2D part detector algorithms have shown promising results, they are not yet accurate enough to guarantee a complete disambiguation of the 3D inferred shape. In this paper, we introduce a novel approach for estimating 3D human pose even when observations are noisy. We propose a stochastic sampling strategy to propagate the noise from the image plane to the shape space. This provides a set of ambiguous 3D shapes, which are virtually indistinguishable from their image projections. Disambiguation is then achieved by imposing kinematic constraints that guarantee the resulting pose resembles a 3D human shape. We validate the method on a variety of situations in which state-of-the-art 2D detectors yield either inaccurate estimations or partly miss some of the body parts.

Computer Vision and Pattern Recognition | 2013

A Joint Model for 2D and 3D Pose Estimation from a Single Image

Edgar Simo-Serra; Ariadna Quattoni; Carme Torras; Francesc Moreno-Noguer

We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes is detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian framework that integrates a generative model based on latent variables and discriminative 2D part detectors based on HOGs, and perform inference using evolutionary algorithms. Experiments on real data demonstrate competitive results and the ability of our methodology to provide accurate 2D and 3D pose estimations even when the 2D detectors are inaccurate.

International Conference on Robotics and Automation | 2012

Using depth and appearance features for informed robot grasping of highly wrinkled clothes

Arnau Ramisa; Guillem Alenyà; Francesc Moreno-Noguer; Carme Torras

Detecting grasping points is a key problem in cloth manipulation. Most current approaches follow a multiple re-grasp strategy for this purpose, in which clothes are sequentially grasped from different points until one of them yields a desired configuration. In this paper, by contrast, we circumvent the need for multiple re-graspings by building a robust detector that identifies the grasping points, generally in one single step, even when clothes are highly wrinkled. In order to handle the large variability a deformed cloth may have, we build a Bag-of-Features-based detector that combines appearance and 3D geometry features. An image is scanned using a sliding window with a linear classifier, and the candidate windows are refined using a non-linear SVM and a “grasp goodness” criterion to select the best grasping point. We demonstrate our approach by detecting collars in deformed polo shirts, using a Kinect camera. Experimental results show good performance of the proposed method not only in identifying the same trained textile object part under severe deformations and occlusions, but also the corresponding part in other clothes, exhibiting a degree of generalization.

European Conference on Computer Vision | 2008

Pose Priors for Simultaneously Solving Alignment and Correspondence

Francesc Moreno-Noguer; Vincent Lepetit; Pascal Fua

Estimating a camera pose given a set of 3D-object and 2D-image feature points is a well understood problem when correspondences are given. However, when such correspondences cannot be established a priori, one must simultaneously compute them along with the pose. Most current approaches to solving this problem are too computationally intensive to be practical. An interesting exception is the SoftPosit algorithm, which looks for the solution as the minimum of a suitable objective function. It is arguably one of the best algorithms, but its iterative nature means it can fail in the presence of clutter, occlusions, or repetitive patterns. In this paper, we propose an approach that overcomes this limitation by taking advantage of the fact that, in practice, some prior on the camera pose is often available. We model it as a Gaussian Mixture Model that we progressively refine by hypothesizing new correspondences. This rapidly reduces the number of potential matches for each 3D point and lets us explore the pose space more thoroughly than SoftPosit at a similar computational cost. We will demonstrate the superior performance of our approach on both synthetic and real data.

Computer Vision and Pattern Recognition | 2015

Neuroaesthetics in fashion: Modeling the perception of fashionability

Edgar Simo-Serra; Sanja Fidler; Francesc Moreno-Noguer; Raquel Urtasun

In this paper, we analyze the fashion of clothing on a large social website. Our goal is to learn and predict how fashionable a person looks in a photograph and suggest subtle improvements the user could make to improve her/his appeal. We propose a Conditional Random Field model that jointly reasons about several fashionability factors such as the type of outfit and garments the user is wearing, the type of the user, the photograph's setting (e.g., the scenery behind the user), and the fashionability score. Importantly, our model is able to give rich feedback to the user, conveying which garments or even scenery she/he should change in order to improve fashionability. We demonstrate that our joint approach significantly outperforms a variety of intelligent baselines. We additionally collected a novel heterogeneous dataset with 144,169 user posts containing diverse image, textual and meta information which can be exploited for our task. We also provide a detailed analysis of the data, showing different outfit trends and fashionability scores across the globe and across a span of 6 years.
