
Publication


Featured research published by Diego Thomas.


IEEE International Conference on Automatic Face and Gesture Recognition | 2015

Real-time multi-view facial landmark detector learned by the structured output SVM

Michal Uricar; Vojtech Franc; Diego Thomas; Akihiro Sugimoto; Václav Hlaváč

While the problem of facial landmark detection has recently received considerable attention in the computer vision community, most methods deal only with near-frontal views, and only a few truly multi-view detectors are available that are capable of detection over a wide range of yaw angles (e.g. φ ∈ (−90°, 90°)). We describe a multi-view facial landmark detector based on Deformable Part Models, which treats simultaneous landmark detection and viewing-angle estimation within a structured output classification framework. We present an easily extensible and flexible framework that provides real-time performance on "in the wild" images, evaluated on the challenging "Annotated Facial Landmarks in the Wild" database. We show that our detector achieves better results than the current state of the art in terms of localization error.
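
The joint estimation can be pictured as a search over discretized yaw bins and candidate landmark placements. The sketch below is a deliberately simplified, hypothetical illustration (random "templates" standing in for learned structured-SVM weights, and no pairwise deformation terms), not the detector described in the paper:

```python
import numpy as np

# Toy illustration of joint landmark detection and view estimation as a
# structured argmax: for every view (yaw bin) v we score every candidate
# landmark placement and keep the configuration with the highest total score.
# The per-view templates are random here; in a real detector they would be
# learned (e.g. with a structured output SVM) and the score would also include
# pairwise deformation terms, which this sketch omits.

rng = np.random.default_rng(0)

H, W_img = 32, 32            # feature-map size (stand-in for HOG features)
n_landmarks = 5              # e.g. eye corners, nose tip, mouth corners
views = ["-60", "0", "+60"]  # discretized yaw bins

feat = rng.standard_normal((H, W_img))            # a single-channel "feature map"
templates = rng.standard_normal((len(views), n_landmarks, H, W_img))

def score_view(v):
    """Best placement score for each landmark under view v (unary terms only)."""
    responses = templates[v] * feat               # per-pixel template response
    per_landmark = responses.reshape(n_landmarks, -1)
    best_idx = per_landmark.argmax(axis=1)        # best pixel per landmark
    best_val = per_landmark.max(axis=1)
    coords = np.column_stack(np.unravel_index(best_idx, (H, W_img)))
    return best_val.sum(), coords

scores = [score_view(v) for v in range(len(views))]
best_view = int(np.argmax([s for s, _ in scores]))
total, landmarks = scores[best_view]
print("estimated yaw bin:", views[best_view])
print("landmark positions (row, col):\n", landmarks)
```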


International Conference on Computer Vision | 2013

A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera

Diego Thomas; Akihiro Sugimoto

Updating a global 3D model with live RGB-D measurements has proven successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm were introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is memory-expensive when constructing and updating the global model, and as a consequence does not scale well to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and nevertheless achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes significantly reduces the size of the scene representation, allowing us to generate a global textured 3D model with lower memory requirements while remaining accurate and easy to update with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction while remaining scalable to large indoor scenes.
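
To give an intuition for why a plane-based representation is so much cheaper than a voxel grid, the following sketch (an illustration under simplified assumptions, not the paper's implementation) projects a noisy point cloud onto a single known plane and stores only per-cell deviations:

```python
import numpy as np

# A planar surface stored as a 2D grid attached to its supporting plane costs
# O(res * res) cells, whereas a TSDF voxel grid covering the same region costs
# O(res * res * res) voxels.

rng = np.random.default_rng(1)

# Plane z = 0 with a small bump in the middle, sampled as a noisy point cloud.
xy = rng.uniform(-1.0, 1.0, size=(5000, 2))
z = 0.05 * np.exp(-((xy ** 2).sum(axis=1)) / 0.1) + rng.normal(0, 0.002, 5000)
points = np.column_stack([xy, z])

res = 64                                   # cells per side of the 2D map
height_map = np.zeros((res, res))
weight_map = np.zeros((res, res))          # how many samples fell in each cell

# Project each point onto the plane (drop z) and average the deviations per cell.
cols = np.clip(((points[:, 0] + 1) / 2 * res).astype(int), 0, res - 1)
rows = np.clip(((points[:, 1] + 1) / 2 * res).astype(int), 0, res - 1)
np.add.at(height_map, (rows, cols), points[:, 2])
np.add.at(weight_map, (rows, cols), 1.0)
height_map = np.divide(height_map, weight_map, out=height_map, where=weight_map > 0)

voxels = res * res * res                   # what an equivalent TSDF grid would need
print(f"2D plane map cells: {res * res},  equivalent voxel grid: {voxels}")
print("max stored deviation from the plane:", float(height_map.max()))
```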


Computer Vision and Image Understanding | 2011

Robustly registering range images using local distribution of albedo

Diego Thomas; Akihiro Sugimoto

We propose a robust method for registering overlapping range images of a Lambertian object under a rough estimate of the illumination. Because reflectance properties are invariant to changes in illumination, albedo is a promising cue for registering range images of Lambertian objects that lack discriminative geometric features under variable illumination. Our method uses adaptive regions to model the local distribution of albedo, which enables us to stably extract reliable attributes for each point despite errors in the illumination estimate. We use a level-set method to grow robust, adaptive regions that define these attributes. A similarity metric between two attributes is also defined to match points in the overlapping area, and remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of the proposed method.
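
The Lambertian assumption is what makes albedo recoverable at all: observed intensity factors into albedo times shading. The snippet below illustrates that inversion on synthetic data (the normals, light direction and clamping threshold are made up for the example; the paper's region-growing and matching steps are not shown):

```python
import numpy as np

# Under the Lambertian model the observed intensity is I = albedo * max(n . l, 0),
# so given a rough illumination direction l and per-pixel normals n estimated
# from the range image, a per-pixel albedo can be recovered that is (up to
# estimation error) independent of the lighting -- which is what makes it usable
# as a matching attribute between range images taken under different illumination.

rng = np.random.default_rng(2)

h, w = 8, 8
normals = rng.standard_normal((h, w, 3))
normals /= np.linalg.norm(normals, axis=2, keepdims=True)
true_albedo = rng.uniform(0.2, 0.9, (h, w))

light = np.array([0.3, 0.2, 1.0])
light /= np.linalg.norm(light)

shading = np.clip(normals @ light, 0.0, None)          # n . l, clamped at 0
intensity = true_albedo * shading                       # synthetic Lambertian image

# Invert the model where the surface is actually lit.
eps = 1e-3
recovered = np.where(shading > eps, intensity / np.maximum(shading, eps), 0.0)

lit = shading > eps
print("max albedo error on lit pixels:",
      float(np.abs(recovered[lit] - true_albedo[lit]).max()))
```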


International Conference on Computer Vision | 2013

Compact and Accurate 3-D Face Modeling Using an RGB-D Camera: Let's Open the Door to 3-D Video Conference

Pavan Kumar Anasosalu; Diego Thomas; Akihiro Sugimoto

We present a method for producing an accurate and compact 3-D face model in real time using a low-cost RGB-D sensor such as the Kinect camera. We extend and use Bump Images for highly accurate and low-memory 3-D reconstruction of the human face. Bump Images are generated by representing the Cartesian coordinates of points on the face in a spherical coordinate system whose origin is the center of the head. After initialization, the Bump Images are updated in real time with every RGB-D frame with respect to the current viewing direction and head pose, which are estimated using a frame-to-global-model registration strategy. While the high accuracy of the representation allows us to recover fine details, its low memory use opens new applications of consumer depth cameras such as 3-D video conferencing. We validate our approach by quantitatively comparing our result with that obtained by a commercial high-resolution laser scanner, and we discuss the potential of the proposed method for a 3-D video conferencing application at existing internet speeds.
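
The core of the Bump Image idea can be sketched in a few lines: express every head point in spherical coordinates around the head center and rasterize the radius into a small 2D image. The image size, fusion rule and synthetic "head" below are assumptions for illustration only:

```python
import numpy as np

# Points on the head are expressed in spherical coordinates around the head
# centre, and the radius is stored in a 2D image indexed by the two angles.
# The whole face geometry then fits in one small image that is cheap to update
# per frame and cheap to transmit.

rng = np.random.default_rng(3)

# Synthetic "head": points on a unit sphere with small radial detail (the bumps).
n = 20000
dirs = rng.standard_normal((n, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
radius = 1.0 + 0.02 * np.sin(8 * dirs[:, 0]) * np.cos(8 * dirs[:, 1])
points = dirs * radius[:, None]
center = np.zeros(3)                       # head centre (estimated in practice)

p = points - center
r = np.linalg.norm(p, axis=1)
theta = np.arccos(np.clip(p[:, 2] / r, -1, 1))    # polar angle in [0, pi]
phi = np.arctan2(p[:, 1], p[:, 0])                # azimuth in (-pi, pi]

H, W = 128, 256
rows = np.clip((theta / np.pi * H).astype(int), 0, H - 1)
cols = np.clip(((phi + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)

bump = np.zeros((H, W))
count = np.zeros((H, W))
np.add.at(bump, (rows, cols), r)                  # accumulate radii per cell
np.add.at(count, (rows, cols), 1.0)
bump = np.divide(bump, count, out=bump, where=count > 0)

print("bump image shape:", bump.shape, " bytes (float32):", bump.size * 4)
```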


Intelligent Robots and Systems | 2013

Learning to discover objects in RGB-D images using correlation clustering

Michael Firman; Diego Thomas; Simon J. Julier; Akihiro Sugimoto

We introduce a method to discover objects from RGB-D image collections that does not require the user to specify the number of objects expected to be found. We propose a probabilistic formulation to find pairwise similarity between image segments, using a classifier trained on labelled pairs from the recently released RGB-D Object Dataset. We then use a correlation clustering solver both to find the optimal clustering of all the segments in the collection and to recover the number of clusters. Unlike traditional supervised learning methods, our training data need not be of the same class or category as the objects we expect to discover. We show that this parameter-free supervised clustering method outperforms traditional clustering methods.
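
Correlation clustering takes signed pairwise weights and lets the number of clusters emerge from the optimization. The toy example below uses hand-made weights and a greedy merging heuristic rather than the learned classifier and the solver used in the paper:

```python
import numpy as np
from itertools import combinations

# Positive weights mean "same object", negative mean "different object".
# Exact correlation clustering is NP-hard; this greedy agglomeration merges
# clusters while a merge still has positive total weight, which is a common
# cheap approximation.

# 6 segments; the weights encode that {0,1,2} and {3,4} belong together
# and segment 5 is on its own.
W = np.array([
    [ 0,  3,  2, -2, -3, -1],
    [ 3,  0,  4, -1, -2, -2],
    [ 2,  4,  0, -3, -1, -1],
    [-2, -1, -3,  0,  5, -2],
    [-3, -2, -1,  5,  0, -1],
    [-1, -2, -1, -2, -1,  0],
], dtype=float)

clusters = [{i} for i in range(len(W))]

def merge_gain(a, b):
    """Total signed weight between two clusters; merge only if positive."""
    return sum(W[i, j] for i in a for j in b)

while True:
    best = max(
        ((merge_gain(a, b), ia, ib)
         for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)),
        default=(0, None, None),
    )
    gain, ia, ib = best
    if gain <= 0:
        break
    clusters[ia] |= clusters[ib]
    del clusters[ib]

print("discovered clusters:", clusters)   # expected: {0,1,2}, {3,4}, {5}
```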


Computer Vision and Pattern Recognition | 2016

Augmented Blendshapes for Real-Time Simultaneous 3D Head Modeling and Facial Motion Capture

Diego Thomas; Rin-ichiro Taniguchi

We propose a method to build animated 3D head models in real time using a consumer-grade RGB-D camera. Our framework is the first to provide simultaneously comprehensive facial motion tracking and a detailed 3D model of the user's head. Anyone's head can be instantly reconstructed and their facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is otherwise free to move, talk and change facial expression at will. Facial motion is tracked using a blendshape representation, while fine geometric details are captured using a Bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on the fly and in real time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling on a wide range of subjects with various head poses and facial expressions. The proposed method offers interesting possibilities for animation production and 3D video telecommunications.
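
The blendshape part of the representation is a standard linear model: each frame's face is the neutral mesh plus a weighted sum of expression displacements, so tracking reduces to estimating a small weight vector per frame. A minimal sketch with made-up shapes (the paper's Bump-image detail layer and the tracking optimization are omitted):

```python
import numpy as np

# An expression is the neutral face plus a weighted sum of per-expression
# displacement fields stored as deltas from the neutral mesh.

rng = np.random.default_rng(4)

n_vertices = 1000
neutral = rng.standard_normal((n_vertices, 3))

# Two toy blendshapes (e.g. "mouth open", "left eyebrow raised").
blend_deltas = rng.standard_normal((2, n_vertices, 3)) * 0.05

def pose_face(weights):
    """Return the mesh for a given blendshape weight vector (values in [0, 1])."""
    w = np.asarray(weights)[:, None, None]
    return neutral + (w * blend_deltas).sum(axis=0)

frame_a = pose_face([0.0, 0.0])   # neutral expression
frame_b = pose_face([0.8, 0.3])   # mostly "mouth open", slightly "eyebrow raised"

print("max vertex displacement between the two frames:",
      float(np.abs(frame_b - frame_a).max()))
```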


International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission | 2012

Robust Simultaneous 3D Registration via Rank Minimization

Diego Thomas; Yasuyuki Matsushita; Akihiro Sugimoto

We present a robust and accurate 3D registration method for a dense sequence of depth images taken from unknown viewpoints. Our method simultaneously estimates multiple extrinsic parameters of the depth images to obtain a registered full 3D model of the scanned scene. By arranging the depth measurements in matrix form, we formulate the problem as the simultaneous estimation of multiple extrinsics, a low-rank matrix corresponding to the aligned depth images, and a sparse error matrix. Unlike previous approaches that use sequential or heuristic global registration, our solution method uses an advanced convex optimization technique to obtain a robust solution via rank minimization. To achieve accurate computation, we develop a depth projection method with minimal sensitivity to sampling by reading projected depth values in the input depth images. We demonstrate the effectiveness of the proposed method through extensive experiments and compare it with previous standard techniques.
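
The low-rank-plus-sparse structure at the heart of the formulation can be illustrated independently of the registration itself. The sketch below uses a generic principal component pursuit solver (ADMM with singular-value and soft thresholding) on synthetic data; it is not the optimization scheme used in the paper:

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def robust_pca(D, n_iter=500):
    """Split D into a low-rank part L and a sparse part S (ADMM on D = L + S)."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(D).sum()
    Y = np.zeros_like(D)          # Lagrange multipliers
    S = np.zeros_like(D)
    for _ in range(n_iter):
        L = svd_shrink(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        Y = Y + mu * (D - L - S)
    return L, S

# Synthetic "aligned depth measurements": rank-2 structure plus sparse outliers.
rng = np.random.default_rng(5)
L_true = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 20))
S_true = np.zeros_like(L_true)
mask = rng.random(L_true.shape) < 0.05
S_true[mask] = rng.uniform(-5, 5, mask.sum())
D = L_true + S_true

L_hat, S_hat = robust_pca(D)
print("rank of recovered L:", np.linalg.matrix_rank(L_hat, tol=1e-3))
print("low-rank recovery error:", float(np.abs(L_hat - L_true).max()))
```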


International Journal of Computer Vision | 2017

Parametric Surface Representation with Bump Image for Dense 3D Modeling Using an RGB-D Camera

Diego Thomas; Akihiro Sugimoto

When constructing a dense 3D model of an indoor static scene from a sequence of RGB-D images, the choice of the 3D representation (e.g. 3D mesh, point cloud or implicit function) is of crucial importance. In the last few years, the volumetric truncated signed distance function (TSDF) and its extensions have become popular in the community and are largely used for dense 3D modelling with RGB-D sensors. However, as this representation is voxel based, it offers few possibilities for manipulating and/or editing the constructed 3D model, which limits its applicability. In particular, the amount of data required to maintain the volumetric TSDF rapidly becomes huge, which limits portability. Moreover, simplifications (such as mesh extraction and surface simplification) significantly reduce the accuracy of the 3D model (especially in the color space), and editing the 3D model is difficult. We propose a novel compact, flexible and accurate 3D surface representation based on parametric surface patches augmented by geometric and color texture images. Simple parametric shapes such as planes are roughly fitted to the input depth images, and the deviations of the 3D measurements from the fitted parametric surfaces are fused into a geometric texture image (called the Bump image). Confidence and color texture images are also built. Our 3D scene representation is accurate yet memory efficient, and updating or editing the 3D model becomes trivial since it reduces to manipulating 2D images. Our experimental results demonstrate the advantages of the proposed 3D representation through a concrete indoor scene reconstruction application.
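
Because the representation is image-based, integrating a new depth frame reduces to per-pixel arithmetic on the Bump and confidence images. The following sketch shows one plausible weighted running-average fusion rule (an assumption for illustration; the paper's exact weighting and patch handling may differ):

```python
import numpy as np

# Each parametric patch carries a Bump image of deviations from the patch and a
# confidence image; fusing one frame is a per-cell weighted running average,
# i.e. pure 2D image arithmetic.

res = 4
bump = np.zeros((res, res))        # current fused deviation from the patch
conf = np.zeros((res, res))        # accumulated confidence (here: sample count)

def integrate(new_bump, new_conf):
    """Fuse one frame's observations into the patch's Bump/confidence images."""
    global bump, conf
    total = conf + new_conf
    fused = (conf * bump + new_conf * new_bump) / np.maximum(total, 1e-9)
    bump = np.where(total > 0, fused, bump)
    conf = total

rng = np.random.default_rng(6)
true_surface = rng.uniform(-0.01, 0.01, (res, res))   # ground-truth deviations

# Integrate several noisy frames; the fused Bump image converges to the truth.
for _ in range(50):
    observed = true_surface + rng.normal(0, 0.005, (res, res))
    integrate(observed, np.ones((res, res)))

print("fused error:", float(np.abs(bump - true_surface).max()))
```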


European Conference on Computer Vision | 2014

A Two-Stage Strategy for Real-Time Dense 3D Reconstruction of Large-Scale Scenes

Diego Thomas; Akihiro Sugimoto

The frame-to-global-model approach is widely used for accurate 3D modeling from sequences of RGB-D images. Because no perfect camera tracking system yet exists, the accumulation of small errors generated when registering and integrating successive RGB-D images causes deformations of the 3D model being built. In particular, the deformations become significant when the scale of the scene to model is large. To tackle this problem, we propose a two-stage strategy to build a detailed large-scale 3D model with minimal deformations: the first stage creates accurate small-scale 3D scenes in real time from short subsequences of RGB-D images, while the second stage re-organises all the results from the first stage in a geometrically consistent manner to reduce deformations as much as possible. By employing planar patches as the 3D scene representation, our proposed method runs in real time and builds accurate 3D models with minimal deformations even for large-scale scenes. Our experiments using real data confirm the effectiveness of the proposed method.
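
The benefit of the second stage is easiest to see on a toy loop: chaining noisy relative motions accumulates drift, and re-positioning the local results distributes that drift away. The translation-only example below is purely illustrative and is not the geometric optimization used in the paper:

```python
import numpy as np

# First stage: rigid local fragments linked by noisy relative motions.
# Second stage (toy version): when the trajectory closes a loop, the accumulated
# drift is the non-zero sum of the relative motions around the loop; correct the
# fragment poses by distributing that drift evenly along the chain.

rng = np.random.default_rng(7)

# Ground truth: 6 fragments placed around a loop that returns to the start.
true_rel = np.array([[1, 0], [1, 0], [0, 1], [-1, 0], [-1, 0], [0, -1]], float)
noisy_rel = true_rel + rng.normal(0, 0.05, true_rel.shape)   # per-pair registration noise

# First stage: chain the noisy relative motions to get fragment positions.
poses = np.vstack([[0, 0], np.cumsum(noisy_rel, axis=0)])    # 7 poses; last should equal first

drift = poses[-1] - poses[0]                                 # loop-closure error
print("accumulated drift before correction:", drift)

# Second stage: spread the drift evenly along the loop.
n = len(poses) - 1
correction = np.outer(np.arange(n + 1) / n, drift)
poses_corrected = poses - correction

print("drift after correction:", poses_corrected[-1] - poses_corrected[0])
```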


Computer Vision and Image Understanding | 2017

Modeling large-scale indoor scenes with rigid fragments using RGB-D cameras

Diego Thomas; Akihiro Sugimoto

Hand-held consumer depth cameras have become a commodity tool for constructing 3D models of indoor environments in real time. Recently, many methods have been proposed to fuse low-quality depth images into a single dense, high-fidelity 3D model. Nonetheless, dealing with large-scale scenes remains a challenging problem. In particular, the accumulation of small errors due to imperfect camera localization becomes significant at large scale and results in dramatic deformations of the built 3D model. These deformations have to be corrected whenever possible (for example, when a loop exists). To facilitate such correction, we use a structured 3D representation in which points are clustered into several planar patches that compose the scene. We then propose a two-stage framework to build a detailed large-scale 3D model in real time. The first stage (the local mapping) generates local structured 3D models with rigidity constraints from short subsequences of RGB-D images. The second stage (the global mapping) aggregates all local 3D models into a single global model in a geometrically consistent manner. Thanks to our structured 3D representation, minimizing deformations of the global model reduces to re-positioning the planar patches of the local models, which allows efficient yet accurate computation. Our experiments using real data confirm the effectiveness of our proposed method.

Collaboration


Dive into Diego Thomas's collaborations.

Top Co-Authors

Akihiro Sugimoto

National Institute of Informatics

Václav Hlaváč

Czech Technical University in Prague

Michal Uřičář

Czech Technical University in Prague