Publication


Featured research published by Dan Casas.


International Conference on Computer Graphics and Interactive Techniques | 2017

VNect: real-time 3D human pose estimation with a single RGB camera

Dushyant Mehta; Srinath Sridhar; Oleksandr Sotnychenko; Helge Rhodin; Mohammad Shafiei; Hans-Peter Seidel; Weipeng Xu; Dan Casas; Christian Theobalt

We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control; thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low-quality commodity RGB cameras.
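The pipeline described above pairs per-frame CNN joint predictions with a model-based fitting step. As a rough illustration of one ingredient of such a pipeline, the sketch below (not the authors' code; the joint count, intrinsics, and smoothing constant are assumptions) lifts root-relative 3D joint predictions to a global pose by aligning their perspective projection with 2D detections, then applies simple temporal smoothing.

```python
# A minimal sketch of VNect-style global pose recovery; all values are synthetic.
import numpy as np
from scipy.optimize import least_squares

def project(points_3d, K):
    """Perspective projection of Nx3 camera-space points with intrinsics K."""
    p = points_3d @ K.T
    return p[:, :2] / p[:, 2:3]

def fit_global_translation(joints_rel, joints_2d, K, t_init):
    """Find the root translation whose reprojection of the relative 3D pose
    best matches the 2D detections (one term of a skeleton-fitting energy)."""
    def residuals(t):
        return (project(joints_rel + t, K) - joints_2d).ravel()
    return least_squares(residuals, t_init).x

def smooth(prev_pose, new_pose, alpha=0.8):
    """Exponential smoothing, a simple stand-in for temporal filtering."""
    return alpha * new_pose + (1.0 - alpha) * prev_pose

if __name__ == "__main__":
    K = np.array([[1000.0, 0, 320], [0, 1000.0, 240], [0, 0, 1]])
    joints_rel = np.random.randn(17, 3) * 0.3          # root-relative 3D joints (m)
    t_true = np.array([0.1, -0.2, 3.0])                # synthetic ground-truth root
    joints_2d = project(joints_rel + t_true, K)        # synthetic 2D detections
    t = fit_global_translation(joints_rel, joints_2d, K, np.array([0.0, 0.0, 2.0]))
    print("recovered root translation:", t)
```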


ACM Transactions on Graphics | 2016

Reconstruction of Personalized 3D Face Rigs from Monocular Video

Pablo Garrido; Michael Zollhöfer; Dan Casas; Levi Valgaerts; Kiran Varanasi; Patrick Pérez; Christian Theobalt

We present a novel approach for the automatic creation of a personalized high-quality 3D face rig of an actor from just monocular video data (e.g., vintage movies). Our rig is based on three distinct layers that allow us to model the actor’s facial shape as well as capture his person-specific expression characteristics at high fidelity, ranging from coarse-scale geometry to fine-scale static and transient detail on the scale of folds and wrinkles. At the heart of our approach is a parametric shape prior that encodes the plausible subspace of facial identity and expression variations. Based on this prior, a coarse-scale reconstruction is obtained by means of a novel variational fitting approach. We represent person-specific idiosyncrasies, which cannot be represented in the restricted shape and expression space, by learning a set of medium-scale corrective shapes. Fine-scale skin detail, such as wrinkles, is captured from video via shading-based refinement, and a generative detail formation model is learned. Both the medium- and fine-scale detail layers are coupled with the parametric prior by means of a novel sparse linear regression formulation. Once reconstructed, all layers of the face rig can be conveniently controlled by a low number of blendshape expression parameters, as widely used by animation artists. We show captured face rigs and their motions for several actors filmed in different monocular video formats, including legacy footage from YouTube, and demonstrate how they can be used for 3D animation and 2D video editing. Finally, we evaluate our approach qualitatively and quantitatively and compare to related state-of-the-art methods.
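To make the layered-rig idea concrete, the sketch below (assumed shapes and names, not the paper's code, and a simplification of its sparse-regression coupling) shows how a coarse parametric expression layer and person-specific corrective shapes can both be driven by the same low-dimensional blendshape weights when evaluating the rig.

```python
# A minimal sketch of evaluating a layered blendshape face rig; data is synthetic.
import numpy as np

def evaluate_rig(neutral, expr_basis, corrective_basis, weights):
    """
    neutral:          (V, 3) neutral face vertices
    expr_basis:       (K, V, 3) coarse parametric expression blendshapes (deltas)
    corrective_basis: (K, V, 3) person-specific corrective shapes (deltas)
    weights:          (K,) blendshape expression parameters
    """
    coarse = np.tensordot(weights, expr_basis, axes=1)         # (V, 3)
    corrective = np.tensordot(weights, corrective_basis, axes=1)
    return neutral + coarse + corrective

if __name__ == "__main__":
    V, K = 5000, 25
    rng = np.random.default_rng(0)
    face = evaluate_rig(rng.standard_normal((V, 3)),
                        rng.standard_normal((K, V, 3)) * 0.01,
                        rng.standard_normal((K, V, 3)) * 0.001,
                        rng.random(K))
    print(face.shape)  # (5000, 3)
```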


International Conference on Computer Graphics and Interactive Techniques | 2015

Avatar reshaping and automatic rigging using a deformable model

Andrew W. Feng; Dan Casas; Ari Shapiro

3D scans of human figures have become widely available through online marketplaces and have become relatively easy to acquire using commodity scanning hardware. In addition to static uses of such 3D models, such as 3D printed figurines or rendered 3D still imagery, there are numerous uses for an animated 3D character that uses such 3D scan data. In order to effectively use such models as dynamic 3D characters, the models must be properly rigged before they are animated. In this work, we demonstrate a method to automatically rig a 3D mesh by matching a set of morphable models against the 3D scan. Once the morphable model has been matched against the 3D scan, the skeleton position and skinning attributes are then copied, resulting in a skinning and rigging that is similar in quality to the original hand-rigged model. In addition, the use of a morphable model allows us to reshape and resize the 3D scan according to approximate human proportions. Thus, a human 3D scan can be modified to be taller, shorter, fatter or skinnier. Such manipulations of the 3D scan are useful both for social science research and for visualization applications such as fitness, body image, plastic surgery and the like.
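Once the skeleton and skinning attributes have been transferred from the fitted morphable model to the scan, posing the mesh reduces to standard linear blend skinning. The sketch below is a generic illustration of that step with an assumed data layout, not the authors' pipeline.

```python
# A minimal sketch of linear blend skinning with transferred weights; data is synthetic.
import numpy as np

def linear_blend_skinning(rest_vertices, skin_weights, bone_transforms):
    """
    rest_vertices:   (V, 3) scan vertices in the rest pose
    skin_weights:    (V, B) per-vertex bone weights (rows sum to 1)
    bone_transforms: (B, 4, 4) per-bone transforms relative to the rest pose
    """
    V = rest_vertices.shape[0]
    homo = np.hstack([rest_vertices, np.ones((V, 1))])            # (V, 4)
    per_bone = np.einsum('bij,vj->vbi', bone_transforms, homo)    # (V, B, 4)
    blended = np.einsum('vb,vbi->vi', skin_weights, per_bone)     # (V, 4)
    return blended[:, :3]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    verts = rng.standard_normal((2000, 3))
    weights = rng.random((2000, 4)); weights /= weights.sum(1, keepdims=True)
    xforms = np.tile(np.eye(4), (4, 1, 1))        # identity pose keeps the rest pose
    posed = linear_blend_skinning(verts, weights, xforms)
    print(np.allclose(posed, verts))              # True
```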


European Conference on Computer Vision | 2016

General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues

Helge Rhodin; Nadia Robertini; Dan Casas; Christian Richardt; Hans-Peter Seidel; Christian Theobalt

Markerless motion capture algorithms require a 3D body model with properly personalized skeleton dimensions and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods consider model personalization a different problem and use manual or semi-automatic model initialization, which greatly reduces applicability. In this paper, we propose a fully automatic algorithm that jointly creates a rigged actor model commonly used for animation – skeleton, volumetric shape, appearance, and optionally a body surface – and estimates the actor’s motion from multi-view video input only. The approach is rigorously designed to work on footage of general outdoor scenes recorded with very few cameras and without background subtraction. Our method uses a new image formation model with analytic visibility and analytically differentiable alignment energy. For reconstruction, 3D body shape is approximated as a Gaussian density field. For pose and shape estimation, we minimize a new edge-based alignment energy inspired by volume ray casting in an absorbing medium. We further propose a new statistical human body model that represents the body surface, volumetric Gaussian density, and variability in skeleton shape. Given any multi-view sequence, our method jointly optimizes the pose and shape parameters of this model fully automatically in a spatiotemporal way.
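The density-based image formation model above relies on the fact that line integrals of isotropic Gaussians along a ray have a closed form, which keeps the alignment energy analytically differentiable. The sketch below is a simplified illustration of that building block (isotropic Gaussians, full-line integral, made-up parameters), not the paper's energy.

```python
# A minimal sketch of ray transmittance through a Gaussian body density; values are synthetic.
import numpy as np

def ray_density_integral(origin, direction, centers, sigmas, weights):
    """Closed-form integral of a Gaussian-mixture density along the full ray line."""
    d = direction / np.linalg.norm(direction)
    to_c = centers - origin                                  # (G, 3)
    t0 = to_c @ d                                            # closest-point parameters
    perp2 = np.maximum(np.sum(to_c ** 2, axis=1) - t0 ** 2, 0.0)
    return np.sum(weights * np.sqrt(2 * np.pi) * sigmas
                  * np.exp(-perp2 / (2 * sigmas ** 2)))

def transmittance(origin, direction, centers, sigmas, weights, absorption=1.0):
    """Fraction of light surviving the absorbing Gaussian medium along the ray."""
    return np.exp(-absorption *
                  ray_density_integral(origin, direction, centers, sigmas, weights))

if __name__ == "__main__":
    centers = np.array([[0.0, 0.0, 3.0], [0.2, 0.5, 3.5]])  # two body blobs
    sigmas = np.array([0.2, 0.15])
    weights = np.array([1.0, 0.8])
    o, d = np.zeros(3), np.array([0.0, 0.0, 1.0])
    print(transmittance(o, d, centers, sigmas, weights))     # < 1: the ray is attenuated
```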


European Conference on Computer Vision | 2016

Real-Time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Srinath Sridhar; Franziska Mueller; Michael Zollhöfer; Dan Casas; Antti Oulasvirta; Christian Theobalt

Real-time simultaneous tracking of hands manipulating and interacting with external objects has many potential applications in augmented reality, tangible computing, and wearable computing. However, due to difficult occlusions, fast motions, and uniform hand appearance, jointly tracking hand and object pose is more challenging than tracking either of the two separately. Many previous approaches resort to complex multi-camera setups to remedy the occlusion problem and often employ expensive segmentation and optimization steps that make real-time tracking impossible. In this paper, we propose a real-time solution that uses a single commodity RGB-D camera. The core of our approach is a 3D articulated Gaussian mixture alignment strategy tailored to hand-object tracking that allows fast pose optimization. The alignment energy uses novel regularizers to address occlusions and hand-object contacts. For added robustness, we guide the optimization with discriminative part classification of the hand and segmentation of the object. We conducted extensive experiments on several existing datasets and introduce a new annotated hand-object dataset. Quantitative and qualitative results show the key advantages of our method: speed, accuracy, and robustness.
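The core quantity behind Gaussian mixture alignment is the overlap between a model mixture (primitives attached to the articulated hand and object) and an observation mixture built from the depth data; the product of two isotropic Gaussians integrates in closed form, which is what makes the energy cheap to evaluate. The sketch below illustrates that overlap with an assumed parameterization; it is not the paper's full energy with its regularizers.

```python
# A minimal sketch of Gaussian-mixture overlap as an alignment score; data is synthetic.
import numpy as np

def gaussian_overlap(mu_a, sig_a, mu_b, sig_b):
    """Integral over R^3 of the product of two isotropic Gaussians."""
    var = sig_a ** 2 + sig_b ** 2
    d2 = np.sum((mu_a - mu_b) ** 2)
    return (2 * np.pi * var) ** -1.5 * np.exp(-d2 / (2 * var))

def mixture_alignment(model_mus, model_sigs, obs_mus, obs_sigs):
    """Total overlap between model and observation mixtures; an alignment energy
    would maximize this (or minimize its negative) over the hand/object pose."""
    total = 0.0
    for mu_a, sig_a in zip(model_mus, model_sigs):
        for mu_b, sig_b in zip(obs_mus, obs_sigs):
            total += gaussian_overlap(mu_a, sig_a, mu_b, sig_b)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    model_mus, obs_mus = rng.standard_normal((8, 3)), rng.standard_normal((30, 3))
    model_sigs, obs_sigs = np.full(8, 0.03), np.full(30, 0.02)
    print(mixture_alignment(model_mus, model_sigs, obs_mus, obs_sigs))
```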


International Conference on Computer Vision | 2017

Real-Time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor

Franziska Mueller; Dushyant Mehta; Oleksandr Sotnychenko; Srinath Sridhar; Dan Casas; Christian Theobalt

We present an approach for real-time, robust and accurate hand pose estimation from moving egocentric RGB-D cameras in cluttered real environments. Existing methods typically fail for hand-object interactions in cluttered scenes imaged from egocentric viewpoints—common for virtual or augmented reality applications. Our approach uses two subsequently applied Convolutional Neural Networks (CNNs) to localize the hand and regress 3D joint locations. Hand localization is achieved by using a CNN to estimate the 2D position of the hand center in the input, even in the presence of clutter and occlusions. The localized hand position, together with the corresponding input depth value, is used to generate a normalized cropped image that is fed into a second CNN to regress relative 3D hand joint locations in real time. For added accuracy, robustness and temporal stability, we refine the pose estimates using a kinematic pose tracking energy. To train the CNNs, we introduce a new photorealistic dataset that uses a merged reality approach to capture and synthesize large amounts of annotated data of natural hand interaction in cluttered scenes. Through quantitative and qualitative evaluation, we show that our method is robust to self-occlusion and occlusions by objects, particularly in moving egocentric perspectives.
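The hand-off between the two CNNs described above hinges on a small intermediate step: the 2D hand-center estimate and its depth value define a crop that is depth-normalized before being fed to the 3D regression network. The sketch below illustrates that step with assumed crop size and normalization constants; it is not the authors' implementation.

```python
# A minimal sketch of depth-normalized hand cropping; the depth map is synthetic.
import numpy as np

def normalized_hand_crop(depth_map, center_uv, crop_px=128, depth_range_m=0.3):
    """Crop around the detected hand center and normalize depth relative to it."""
    u, v = int(center_uv[0]), int(center_uv[1])
    h, w = depth_map.shape
    half = crop_px // 2
    # Clamp the crop window to the image bounds.
    top, bottom = max(0, v - half), min(h, v + half)
    left, right = max(0, u - half), min(w, u + half)
    crop = depth_map[top:bottom, left:right].astype(np.float32)
    center_depth = depth_map[v, u]
    # Express depth relative to the hand center and squash to [-1, 1].
    crop = np.clip((crop - center_depth) / depth_range_m, -1.0, 1.0)
    # Pad back to a fixed size if the window was clamped at an image border.
    out = np.zeros((crop_px, crop_px), dtype=np.float32)
    out[:crop.shape[0], :crop.shape[1]] = crop
    return out

if __name__ == "__main__":
    depth = np.full((480, 640), 1.2, dtype=np.float32)   # synthetic flat background
    depth[200:260, 300:360] = 0.6                        # a "hand" closer to the camera
    patch = normalized_hand_crop(depth, center_uv=(330, 230))
    print(patch.shape, patch.min(), patch.max())
```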


International Conference on Computer Graphics and Interactive Techniques | 2016

EgoCap: egocentric marker-less motion capture with two fisheye cameras

Helge Rhodin; Christian Richardt; Dan Casas; Eldar Insafutdinov; Mohammad Shafiei; Hans-Peter Seidel; Bernt Schiele; Christian Theobalt

Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort with marker suits, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion with an inside-in setup, i.e., without external sensors. This makes capture independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. Therefore, we propose a new method for real-time, marker-less, and egocentric motion capture: estimating the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset - an optical inside-in method, so to speak. This allows full-body motion capture in general indoor and outdoor scenes, including crowded scenes with many people nearby, which enables reconstruction in larger-scale activities. Our approach combines the strength of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. It is particularly useful in virtual reality to freely roam and interact, while seeing the fully motion-captured virtual body.
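A generative pose estimation framework for fisheye views must reason about how body parts project into strongly distorted images. The sketch below shows a generic equidistant fisheye projection with made-up calibration values (it is not EgoCap's camera model), only to illustrate how such a projection differs from a pinhole: points 90 degrees off the optical axis still land inside the image.

```python
# A minimal sketch of an equidistant fisheye projection, r = f * theta; values are synthetic.
import numpy as np

def equidistant_fisheye_project(points_cam, f, cx, cy):
    """Project Nx3 camera-space points by the angle they make with the optical axis."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    theta = np.arctan2(np.sqrt(x**2 + y**2), z)       # angle from the optical axis
    phi = np.arctan2(y, x)                            # azimuth in the image plane
    r = f * theta
    return np.stack([cx + r * np.cos(phi), cy + r * np.sin(phi)], axis=1)

if __name__ == "__main__":
    pts = np.array([[0.0, 0.0, 1.0],     # on the optical axis -> image center
                    [1.0, 0.0, 0.0]])    # 90 degrees off-axis, still imaged
    print(equidistant_fisheye_project(pts, f=300.0, cx=512.0, cy=512.0))
```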


International Conference on 3D Vision | 2016

Model-Based Outdoor Performance Capture

Nadia Robertini; Dan Casas; Helge Rhodin; Hans-Peter Seidel; Christian Theobalt

We propose a new model-based method to accurately reconstruct human performances captured outdoors in a multi-camera setup. Starting from a template of the actor model, we introduce a new unified implicit representation for both articulated skeleton tracking and non-rigid surface shape refinement. Our method fits the template to unsegmented video frames in two stages: first, the coarse skeletal pose is estimated, and subsequently non-rigid surface shape and body pose are jointly refined. Particularly for surface shape refinement we propose a new combination of 3D Gaussians designed to align the projected model with likely silhouette contours without explicit segmentation or edge detection. We obtain reconstructions of much higher quality in outdoor settings than existing methods, and show that we are on par with state-of-the-art methods on indoor scenes for which they were designed.
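One building block behind the silhouette-alignment idea is that each 3D body Gaussian projects to an approximately 2D Gaussian in the image, so their sum forms a smooth model "coverage" image whose boundary can be driven toward likely silhouette contours during optimization. The sketch below illustrates that projection with a pinhole approximation and assumed intrinsics; it is not the paper's contour energy.

```python
# A minimal sketch of splatting 3D body Gaussians into a 2D density image; data is synthetic.
import numpy as np

def splat_gaussians(centers, sigmas, K, image_shape):
    """Render the sum of projected isotropic Gaussians into a density image."""
    h, w = image_shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros((h, w))
    fx, fy = K[0, 0], K[1, 1]
    for c, s in zip(centers, sigmas):
        u = K[0, 2] + fx * c[0] / c[2]          # projected center (u, v)
        v = K[1, 2] + fy * c[1] / c[2]
        s2d = fx * s / c[2]                     # approximate projected radius
        density += np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * s2d ** 2))
    return density

if __name__ == "__main__":
    K = np.array([[500.0, 0, 160], [0, 500.0, 120], [0, 0, 1]])
    centers = np.array([[0.0, 0.0, 3.0], [0.1, -0.3, 3.0]])   # two body blobs
    img = splat_gaussians(centers, np.array([0.15, 0.2]), K, (240, 320))
    print(img.shape, img.max())
```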


Intelligent Virtual Agents | 2015

A Platform for Building Mobile Virtual Humans

Andrew W. Feng; Anton Leuski; Stacy Marsella; Dan Casas; Sin-Hwa Kang; Ari Shapiro

We describe an authoring framework for developing virtual humans for mobile applications. The framework abstracts many elements needed for virtual human generation and interaction, such as the rapid development of nonverbal behavior, lip syncing to speech, dialogue management, access to speech transcription services, and access to mobile sensors such as the microphone, gyroscope and location components.
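As a purely hypothetical illustration of how the components enumerated above might be composed (none of these class or method names come from the paper or its framework), the sketch below wires speech transcription, dialogue management, and behavior generation into a single interaction loop.

```python
# A hypothetical composition sketch; all names are invented for illustration only.
class MobileVirtualHuman:
    def __init__(self, transcriber, dialogue_manager, behavior_generator, renderer):
        self.transcriber = transcriber                 # speech-to-text service client
        self.dialogue_manager = dialogue_manager       # maps user utterance -> reply
        self.behavior_generator = behavior_generator   # lip-sync + nonverbal behavior
        self.renderer = renderer                       # plays audio and animation

    def on_microphone_audio(self, audio_chunk):
        """One turn of interaction: transcribe, choose a reply, animate, play."""
        text = self.transcriber.transcribe(audio_chunk)
        reply = self.dialogue_manager.respond(text)
        animation = self.behavior_generator.generate(reply)
        self.renderer.play(reply, animation)
```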


Computer Animation and Social Agents | 2016

Rapid Photorealistic Blendshape Modeling from RGB-D Sensors

Dan Casas; Andrew W. Feng; Oleg Alexander; Graham Fyffe; Paul E. Debevec; Ryosuke Ichikari; Hao Li; Kyle Olszewski; Evan A. Suma; Ari Shapiro

Creating and animating realistic 3D human faces is an important element of virtual reality, video games, and other areas that involve interactive 3D graphics. In this paper, we propose a system to generate photorealistic 3D blendshape-based face models automatically using only a single consumer RGB-D sensor. The capture and processing requires no artistic expertise to operate, takes 15 seconds to capture and generate a single facial expression, and approximately 1 minute of processing time per expression to transform it into a blendshape model. Our main contributions include a complete end-to-end pipeline for capturing and generating photorealistic blendshape models automatically and a registration method that solves dense correspondences between two face scans by utilizing facial landmark detection and optical flow. We demonstrate the effectiveness of the proposed method by capturing different human subjects with a variety of sensors and puppeteering their 3D faces with real-time facial performance retargeting. The rapid nature of our method allows for just-in-time construction of a digital face. To that end, we also integrated our pipeline with a virtual reality facial performance capture system that allows dynamic embodiment of the generated faces despite partial occlusion of the user's real face by the head-mounted display.
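The real-time facial performance retargeting mentioned at the end typically amounts to recovering per-frame expression weights for the captured blendshape model. The sketch below shows one common way to do that, a non-negative least-squares fit of the blendshape deltas to the observed face geometry; shapes and solver choice are assumptions, not the paper's method.

```python
# A minimal sketch of solving blendshape weights from tracked geometry; data is synthetic.
import numpy as np
from scipy.optimize import nnls

def solve_blendshape_weights(neutral, blendshapes, observed):
    """
    neutral:     (V, 3) neutral face of the captured model
    blendshapes: (K, V, 3) expression deltas of the captured model
    observed:    (V, 3) tracked face geometry for the current frame
    Returns K non-negative weights minimizing ||neutral + B w - observed||.
    """
    K = blendshapes.shape[0]
    A = blendshapes.reshape(K, -1).T            # (3V, K)
    b = (observed - neutral).ravel()            # (3V,)
    weights, _ = nnls(A, b)
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    V, K = 1000, 10
    neutral = rng.standard_normal((V, 3))
    shapes = rng.standard_normal((K, V, 3))
    w_true = rng.random(K)
    observed = neutral + np.tensordot(w_true, shapes, axes=1)
    print(np.allclose(solve_blendshape_weights(neutral, shapes, observed), w_true))
```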

Collaboration


Dive into Dan Casas's collaborations.

Top Co-Authors

Andrew W. Feng

University of Southern California

Ari Shapiro

University of Southern California
