Helge Rhodin | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Helge Rhodin is active.

Explore More

Publication

Featured researches published by Helge Rhodin.

international conference on computer graphics and interactive techniques | 2017

VNect: real-time 3D human pose estimation with a single RGB camera

Dushyant Mehta; Srinath Sridhar; Oleksandr Sotnychenko; Helge Rhodin; Mohammad Shafiei; Hans-Peter Seidel; Weipeng Xu; Dan Casas; Christian Theobalt

We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control---thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our methods accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.

international conference on 3d vision | 2014

Real-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model

Srinath Sridhar; Helge Rhodin; Hans-Peter Seidel; Antti Oulasvirta; Christian Theobalt

Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time. The main contributions include a new generative tracking method which employs an implicit hand shape representation based on Sum of Anisotropic Gaussians (SAG), and a pose fitting energy that is smooth and analytically differentiable making fast gradient based pose optimization possible. This shape representation, together with a full perspective projection model, enables more accurate hand modeling than a related baseline method from literature. Our method achieves better accuracy than previous methods and runs at 25 fps. We show these improvements both qualitatively and quantitatively on publicly available datasets.

european conference on computer vision | 2016

General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues

Helge Rhodin; Nadia Robertini; Dan Casas; Christian Richardt; Hans-Peter Seidel; Christian Theobalt

Markerless motion capture algorithms require a 3D body with properly personalized skeleton dimension and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods consider model personalization a different problem and use manual or semi-automatic model initialization, which greatly reduces applicability. In this paper, we propose a fully automatic algorithm that jointly creates a rigged actor model commonly used for animation – skeleton, volumetric shape, appearance, and optionally a body surface – and estimates the actor’s motion from multi-view video input only. The approach is rigorously designed to work on footage of general outdoor scenes recorded with very few cameras and without background subtraction. Our method uses a new image formation model with analytic visibility and analytically differentiable alignment energy. For reconstruction, 3D body shape is approximated as a Gaussian density field. For pose and shape estimation, we minimize a new edge-based alignment energy inspired by volume ray casting in an absorbing medium. We further propose a new statistical human body model that represents the body surface, volumetric Gaussian density, and variability in skeleton shape. Given any multi-view sequence, our method jointly optimizes the pose and shape parameters of this model fully automatically in a spatiotemporal way.

Computer Graphics Forum | 2014

Interactive motion mapping for real-time character control

Helge Rhodin; James Tompkin; Kwang In Kim; Kiran Varanasi; Hans-Peter Seidel; Christian Theobalt

It is now possible to capture the 3D motion of the human body on consumer hardware and to puppet in real time skeleton‐based virtual characters. However, many characters do not have humanoid skeletons. Characters such as spiders and caterpillars do not have boned skeletons at all, and these characters have very different shapes and motions. In general, character control under arbitrary shape and motion transformations is unsolved ‐ how might these motions be mapped? We control characters with a method which avoids the rigging‐skinning pipeline — source and target characters do not have skeletons or rigs. We use interactively‐defined sparse pose correspondences to learn a mapping between arbitrary 3D point source sequences and mesh target sequences. Then, we puppet the target character in real time. We demonstrate the versatility of our method through results on diverse virtual characters with different input motion controllers. Our method provides a fast, flexible, and intuitive interface for arbitrary motion mapping which provides new ways to control characters for real‐time animation.

international conference on computer graphics and interactive techniques | 2016

EgoCap: egocentric marker-less motion capture with two fisheye cameras

Helge Rhodin; Christian Richardt; Dan Casas; Eldar Insafutdinov; Mohammad Shafiei; Hans-Peter Seidel; Bernt Schiele; Christian Theobalt

Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort with marker suits, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion with an inside-in setup, i.e. without external sensors. This makes capture independent of a confined volume, but requires substantial, often constraining, and hard to set up body instrumentation. Therefore, we propose a new method for real-time, marker-less, and egocentric motion capture: estimating the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset - an optical inside-in method, so to speak. This allows full-body motion capture in general indoor and outdoor scenes, including crowded scenes with many people nearby, which enables reconstruction in larger-scale activities. Our approach combines the strength of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. It is particularly useful in virtual reality to freely roam and interact, while seeing the fully motion-captured virtual body.

international conference on 3d vision | 2016

Model-Based Outdoor Performance Capture

Nadia Robertini; Dan Casas; Helge Rhodin; Hans-Peter Seidel; Christian Theobalt

We propose a new model-based method to accurately reconstruct human performances captured outdoors in a multi-camera setup. Starting from a template of the actor model, we introduce a new unified implicit representation for both, articulated skeleton tracking and non-rigid surface shape refinement. Our method fits the template to unsegmented video frames in two stages - first, the coarse skeletal pose is estimated, and subsequently non-rigid surface shape and body pose are jointly refined. Particularly for surface shape refinement we propose a new combination of 3D Gaussians designed to align the projected model with likely silhouette contours without explicit segmentation or edge detection. We obtain reconstructions of much higher quality in outdoor settings than existing methods, and show that we are on par with state-of-the-art methods on indoor scenes for which they were designed.

international conference on computer vision | 2015

A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation

Helge Rhodin; Nadia Robertini; Christian Richardt; Hans-Peter Seidel; Christian Theobalt

Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a big challenge, since the visibility function that indicates if a surface point is seen from a camera can often not be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. In contrast to previous methods, this yields smooth, analytically differentiable, and efficient to optimize pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.

international conference on computer graphics and interactive techniques | 2015

Generalizing wave gestures from sparse examples for real-time character control

Helge Rhodin; James Tompkin; Kwang In Kim; Edilson de Aguiar; Hanspeter Pfister; Hans-Peter Seidel; Christian Theobalt

Motion-tracked real-time character control is important for games and VR, but current solutions are limited: retargeting is hard for non-human characters, with locomotion bound to the sensing volume; and pose mappings are ambiguous with difficult dynamic motion control. We robustly estimate wave properties ---amplitude, frequency, and phase---for a set of interactively-defined gestures by mapping user motions to a low-dimensional independent representation. The mapping separates simultaneous or intersecting gestures, and extrapolates gesture variations from single training examples. For animations such as locomotion, wave properties map naturally to stride length, step frequency, and progression, and allow smooth transitions from standing, to walking, to running. Interpolating out-of-phase locomotions is hard, e.g., quadruped legs between walks and runs switch phase, so we introduce a new time-interpolation scheme to reduce artifacts. These improvements to real-time motion-tracked character control are important for common cyclic animations. We validate this in a user study, and show versatility to apply to part- and full-body motions across a variety of sensors.

international conference on computer graphics and interactive techniques | 2018

MonoPerfCap: Human Performance Capture from Monocular Video

Weipeng Xu; Avishek Chatterjee; Michael Zollhoefer; Helge Rhodin; Dushyant Mehta; Hans-Peter Seidel; Christian Theobalt

We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.

european conference on computer vision | 2018

Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation

Helge Rhodin; Mathieu Salzmann; Pascal Fua

Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. While weakly-supervised methods require less supervision, by utilizing 2D poses or multi-view imagery without annotations, they still need a sufficiently large set of samples with 3D annotations for learning to succeed. In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without annotations. To this end, we use an encoder-decoder that predicts an image from one viewpoint given an image from another viewpoint. Because this representation encodes 3D geometry, using it in a semi-supervised setting makes it easier to learn a mapping from it to 3D human pose. As evidenced by our experiments, our approach significantly outperforms fully-supervised methods given the same amount of labeled data, and improves over other semi-supervised methods while using as little as 1% of the labeled data.

Explore More