Publication


Featured research published by Yebin Liu.


IEEE Transactions on Visualization and Computer Graphics | 2010

A Point-Cloud-Based Multiview Stereo Algorithm for Free-Viewpoint Video

Yebin Liu; Qionghai Dai; Wenli Xu

This paper presents a robust multiview stereo (MVS) algorithm for free-viewpoint video. Our MVS scheme is fully point-cloud-based and consists of three stages: point cloud extraction, merging, and meshing. To guarantee reconstruction accuracy, point clouds are first extracted according to a stereo matching metric which is robust to noise, occlusion, and lack of texture. Visual hull information, frontier points, and implicit points are then detected and fused with point fidelity information in the merging and meshing steps. All aspects of our method are designed to counteract potential challenges in MVS data sets for accurate and complete model reconstruction. Experimental results on both static and motion MVS data sets demonstrate that our technique is among the most competitive of current algorithms under sparse viewpoint setups.
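
The abstract does not spell out the matching metric itself; purely to illustrate the kind of photoconsistency score such point-cloud MVS pipelines build on, here is a minimal normalized cross-correlation (NCC) sketch in Python (the paper's robustified metric is not reproduced):

    import numpy as np

    def ncc(patch_a, patch_b, eps=1e-8):
        """Normalized cross-correlation of two equally sized image patches.

        Returns a score in [-1, 1]; higher means a better photometric match.
        """
        a = patch_a.astype(np.float64).ravel()
        b = patch_b.astype(np.float64).ravel()
        a -= a.mean()
        b -= b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))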


Computer Vision and Pattern Recognition | 2011

Markerless motion capture of interacting characters using multi-view image segmentation

Yebin Liu; Carsten Stoll; Juergen Gall; Hans-Peter Seidel; Christian Theobalt

We present a markerless motion capture approach that reconstructs the skeletal motion and detailed time-varying surface geometry of two closely interacting people from multi-view video. Due to ambiguities in feature-to-person assignments and frequent occlusions, it is not feasible to directly apply single-person capture approaches to the multi-person case. We therefore propose a combined image segmentation and tracking approach to overcome these difficulties. A new probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Thereafter, a single-person markerless motion and surface capture approach can be applied to each individual, either one-by-one or in parallel, even under strong occlusions. We demonstrate the performance of our approach on several challenging multi-person motions, including dance and martial arts, and also provide a reference dataset for multi-person motion capture with ground truth.
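
A minimal sketch of the unique pixel-to-person assignment, assuming per-person likelihood maps are already available (the shape-and-appearance model producing them is the paper's actual contribution; the names `likelihoods` and `background_level` are hypothetical):

    import numpy as np

    def assign_pixels(likelihoods, background_level=0.1):
        """Assign each pixel to its most likely person, or 0 for background.

        likelihoods: array of shape (num_persons, H, W) with per-person
        probabilities per pixel; a stand-in for the paper's model output.
        """
        best = likelihoods.argmax(axis=0)            # most likely person index
        confident = likelihoods.max(axis=0) > background_level
        return np.where(confident, best + 1, 0)      # labels 1..N, 0 = background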


IEEE Transactions on Visualization and Computer Graphics | 2011

Fusing Multiview and Photometric Stereo for 3D Reconstruction under Uncalibrated Illumination

Chenglei Wu; Yebin Liu; Qionghai Dai; Bennett Wilburn

We propose a method to obtain a complete and accurate 3D model from multiview images captured under a variety of unknown illuminations. Based on recent results showing that for Lambertian objects, general illumination can be approximated well using low-order spherical harmonics, we develop a robust alternating approach to recover surface normals. Surface normals are initialized using a multi-illumination multiview stereo algorithm, then refined using a robust alternating optimization method based on the ℓ1 metric. Erroneous normal estimates are detected using a shape prior. Finally, the computed normals are used to improve the preliminary 3D model. The reconstruction system achieves watertight and robust 3D reconstruction while neither requiring manual interactions nor imposing any constraints on the illumination. Experimental results on both real world and synthetic data show that the technique can acquire accurate 3D models for Lambertian surfaces, and even tolerates small violations of the Lambertian assumption.
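
The low-order spherical-harmonics result makes lighting estimation linear once normals are fixed. Below is a minimal sketch of that single step, using the standard 9-term real SH basis and an ordinary least-squares fit; the paper's robust ℓ1 alternating optimization and outlier handling are not reproduced:

    import numpy as np

    def sh_basis(normals):
        """Second-order (9-term) real spherical-harmonics basis.

        normals: (N, 3) unit vectors; returns an (N, 9) basis matrix.
        """
        x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
        one = np.ones_like(x)
        return np.stack([
            0.282095 * one,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z ** 2 - 1.0),
            1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),
        ], axis=1)

    # With known normals and observed Lambertian intensities, the nine lighting
    # coefficients follow from a linear least-squares fit:
    # coeffs, *_ = np.linalg.lstsq(sh_basis(normals), intensities, rcond=None)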


European Conference on Computer Vision | 2012

Performance capture of interacting characters with handheld kinects

Genzhi Ye; Yebin Liu; Nils Hasler; Xiangyang Ji; Qionghai Dai; Christian Theobalt

We present an algorithm for marker-less performance capture of interacting humans using only three hand-held Kinect cameras. Our method reconstructs human skeletal poses, deforming surface geometry and camera poses for every time step of the depth video. Skeletal configurations and camera poses are found by solving a joint energy minimization problem which optimizes the alignment of RGBZ data from all cameras, as well as the alignment of human shape templates to the Kinect data. The energy function is based on a combination of geometric correspondence finding, implicit scene segmentation, and correspondence finding using image features. Only the combination of geometric and photometric correspondences and the integration of human pose and camera pose estimation enables reliable performance capture with only three sensors. As opposed to previous performance capture methods, our algorithm succeeds on general uncontrolled indoor scenes with potentially dynamic background, and it succeeds even if the cameras are moving.
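
A toy sketch of the joint solve described above: skeleton and camera parameters are stacked into one vector and refined against a weighted sum of energy terms. The quadratic stand-ins below only mimic the structure; the paper's actual geometric, photometric, and segmentation terms are far richer:

    import numpy as np
    from scipy.optimize import minimize

    def joint_energy(params, terms, weights):
        """Weighted sum of energy terms evaluated at the stacked parameters."""
        return sum(w * t(params) for w, t in zip(weights, terms))

    # Toy quadratic stand-ins for the geometric, photometric, and segmentation
    # energies; x0 stands for the stacked skeleton + camera parameters.
    terms = [lambda p: np.sum((p - 1.0) ** 2),
             lambda p: np.sum(p ** 2),
             lambda p: np.sum((p + 0.5) ** 2)]
    x0 = np.zeros(4)
    result = minimize(joint_energy, x0, args=(terms, (1.0, 0.5, 0.5)))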


International Conference on Computer Graphics and Interactive Techniques | 2011

Video-based characters: creating new human performances from a multi-view video database

Feng Xu; Yebin Liu; Carsten Stoll; James Tompkin; Gaurav Bharaj; Qionghai Dai; Hans-Peter Seidel; Jan Kautz; Christian Theobalt

We present a method to synthesize plausible video sequences of humans according to user-defined body motions and viewpoints. We first capture a small database of multi-view video sequences of an actor performing various basic motions. This database needs to be captured only once and serves as the input to our synthesis algorithm. We then apply a marker-less model-based performance capture approach to the entire database to obtain pose and geometry of the actor in each database frame. To create novel video sequences of the actor from the database, a user animates a 3D human skeleton with novel motion and viewpoints. Our technique then synthesizes a realistic video sequence of the actor performing the specified motion based only on the initial database. The first key component of our approach is a new efficient retrieval strategy to find appropriate spatio-temporally coherent database frames from which to synthesize target video frames. The second key component is a warping-based texture synthesis approach that uses the retrieved most-similar database frames to synthesize spatio-temporally coherent target video frames. For instance, this enables us to easily create video sequences of actors performing dangerous stunts without them being placed in harm's way. We show through a variety of result videos and a user study that we can synthesize realistic videos of people, even if the target motions and camera views are different from the database content.
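
A minimal sketch of the retrieval component, assuming database poses are stored as flat joint-position vectors (a simplification; the paper's similarity measure also accounts for viewpoint and spatio-temporal coherence):

    import numpy as np

    def retrieve(query_pose, db_poses, k=5):
        """Return indices of the k database frames with the closest pose.

        query_pose: (D,) flattened joint positions; db_poses: (N, D).
        """
        dists = np.linalg.norm(db_poses - query_pose[None, :], axis=1)
        return np.argsort(dists)[:k]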


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Markerless Motion Capture of Multiple Characters Using Multiview Image Segmentation

Yebin Liu; Juergen Gall; Carsten Stoll; Qionghai Dai; Hans-Peter Seidel; Christian Theobalt

Capturing the skeletal motion and detailed time-varying surface geometry of multiple, closely interacting people is a very challenging task, even in a multicamera setup, due to frequent occlusions and ambiguities in feature-to-person assignments. To address this task, we propose a framework that exploits multiview image segmentation. To this end, a probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Given the articulated template models of each person and the labeled pixels, a combined optimization scheme, which splits the skeleton pose optimization problem into a local one and a lower dimensional global one, is applied to each individual in turn, followed by surface estimation to capture detailed nonrigid deformations. We show on various sequences that our approach can capture the 3D motion of humans accurately even if they move rapidly, if they wear wide apparel, and if they are engaged in challenging multiperson motions, including dancing, wrestling, and hugging.
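
A rough sketch of the split optimization, alternating a low-dimensional global solve (the root transform) with per-joint local refinements of the same fitting energy. `E`, `root`, and `joints` are hypothetical stand-ins for the paper's energy and pose parameterization:

    import numpy as np
    from scipy.optimize import minimize

    def optimize_pose(E, root, joints, n_rounds=3):
        """Alternate global (root) and local (per-joint) minimization of E."""
        for _ in range(n_rounds):
            root = minimize(lambda r: E(r, joints), root).x   # global step
            for j in range(len(joints)):                      # local steps
                def e_j(a, j=j):
                    q = joints.copy()
                    q[j] = a[0]
                    return E(root, q)
                joints[j] = minimize(e_j, [joints[j]]).x[0]
        return root, joints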


Computer Vision and Pattern Recognition | 2009

Continuous depth estimation for multi-view stereo

Yebin Liu; Xun Cao; Qionghai Dai; Wenli Xu

Depth-map merging approaches have become more and more popular in multi-view stereo (MVS) because of their flexibility and superior performance. The quality of the depth maps used for merging is vital for accurate 3D reconstruction. While traditional depth map estimation has been performed in a discrete manner, we suggest the use of a continuous counterpart. In this paper, we first integrate silhouette information and the epipolar constraint into the variational method for continuous depth map estimation. Then, several depth candidates are generated based on a multiple starting scales (MSS) framework. From these candidates, refined depth maps for each view are synthesized according to a patch-based NCC (normalized cross-correlation) metric. Finally, the multiview depth maps are merged to produce 3D models. Our algorithm excels at detail capture and produces one of the most accurate results among current algorithms for sparse MVS datasets according to the Middlebury benchmark. Additionally, our approach shows outstanding robustness and accuracy in the free-viewpoint video scenario.
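
One ingredient sketched in isolation: choosing, per pixel, the depth candidate with the best photoconsistency score (array names here are hypothetical; the scores would come from the patch-based NCC metric named in the abstract):

    import numpy as np

    def select_depth(depths, scores):
        """Per-pixel best depth among candidates.

        depths, scores: arrays of shape (C, H, W), one slice per candidate.
        """
        best = scores.argmax(axis=0)                      # (H, W) winner index
        return np.take_along_axis(depths, best[None], axis=0)[0]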


International Conference on Computer Vision | 2011

Shading-based dynamic shape refinement from multi-view video under general illumination

Chenglei Wu; Kiran Varanasi; Yebin Liu; Hans-Peter Seidel; Christian Theobalt

We present an approach to add true fine-scale spatio-temporal shape detail to dynamic scene geometry captured from multi-view video footage. Our approach exploits shading information to recover the millimeter-scale surface structure, but in contrast to related approaches succeeds under general unconstrained lighting conditions. Our method starts off from a set of multi-view video frames and an initial series of reconstructed coarse 3D meshes that lack any surface detail. In a spatio-temporal maximum a posteriori probability (MAP) inference framework, our approach first estimates the incident illumination and the spatially-varying albedo map on the mesh surface for every time instant. Thereafter, albedo and illumination are used to estimate the true geometric detail visible in the images and add it to the coarse reconstructions. The MAP framework uses weak temporal priors on lighting, albedo and geometry which improve reconstruction quality yet allow for temporal variations in the data.
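
In MAP form, the per-frame estimate maximizes a posterior of roughly this shape (the symbols here are illustrative, not the paper's notation):

    P(L, A, G | I) ∝ P(I | L, A, G) · P(L) · P(A) · P(G)

where I denotes the multi-view frames, L the incident illumination, A the albedo map, and G the refined geometry, with the weak temporal priors on lighting, albedo, and geometry entering through the P(·) terms.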


International Conference on Computer Graphics and Interactive Techniques | 2014

Spatial-spectral encoded compressive hyperspectral imaging

Xing Lin; Yebin Liu; Jiamin Wu; Qionghai Dai

This paper proposes a novel compressive hyperspectral (HS) imaging approach that allows high-resolution HS images to be captured in a single image. The proposed architecture comprises three key components: a spatial-spectral encoded optical camera design, over-complete HS dictionary learning, and sparsity-constrained computational reconstruction. Our spatial-spectral encoded sampling scheme provides a higher degree of randomness in the measured projections than previous compressive HS imaging approaches, and a robust nonlinear sparse reconstruction method is employed to recover the HS images from the coded projections with higher performance. To exploit the sparsity constraint on natural HS images for computational reconstruction, an over-complete HS dictionary is learned to represent the HS images more sparsely than previous representations. We validate the proposed approach on both synthetic and real captured data, and show successful recovery of HS images for both indoor and outdoor scenes. In addition, we demonstrate other applications of the over-complete HS dictionary and sparse coding techniques, including 3D HS image compression and denoising.
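
The abstract names the optimization only as a robust nonlinear sparse reconstruction; as a generic illustration of the underlying ℓ1-regularized problem, here is a minimal iterative soft-thresholding (ISTA) sketch, with A standing in for the product of the sensing operator and the learned dictionary:

    import numpy as np

    def ista(A, y, lam=0.1, n_iter=200):
        """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by soft thresholding.

        x holds the sparse dictionary coefficients of the HS image.
        """
        step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            g = x - step * A.T @ (A @ x - y)        # gradient step on data term
            x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
        return x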


International Conference on Computer Graphics and Interactive Techniques | 2014

Intrinsic video and applications

Genzhi Ye; Elena Garces; Yebin Liu; Qionghai Dai; Diego Gutierrez

We present a method to decompose a video into its intrinsic components of reflectance and shading, plus a number of related example applications in video editing such as segmentation, stylization, material editing, recolorization and color transfer. Intrinsic decomposition is an ill-posed problem, which becomes even more challenging in the case of video due to the need for temporal coherence and the potentially large memory requirements of a global approach. Additionally, user interaction should be kept to a minimum in order to ensure efficiency. We propose a probabilistic approach, formulating a Bayesian Maximum a Posteriori problem to drive the propagation of clustered reflectance values from the first frame, and defining additional constraints as priors on the reflectance and shading. We explicitly leverage temporal information in the video by building a causal-anticausal, coarse-to-fine iterative scheme, and by relying on optical flow information. We impose no restrictions on the input video, and show examples representing a varied range of difficult cases. Our method is the first one designed explicitly for video; moreover, it naturally ensures temporal consistency, and compares favorably against the state of the art in this regard.
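
The model underlying the decomposition is the per-pixel factorization I = R · S into reflectance and shading. The trivial direction is shown below; the paper's contribution is estimating R coherently across the whole video, which this sketch does not attempt:

    import numpy as np

    def shading_from_reflectance(frame, reflectance, eps=1e-6):
        """Given a frame and its reflectance estimate, recover shading.

        Uses I = R * S per pixel and channel, so S = I / R.
        """
        return frame / np.maximum(reflectance, eps)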

Collaboration


Dive into Yebin Liu's collaboration.

Top Co-Authors

Lu Fang

University of Science and Technology of China

Gaochang Wu

Northeastern University
