Publication


Featured research published by Shunsuke Saito.


European Conference on Computer Vision | 2016

Real-Time Facial Segmentation and Performance Capture from RGB Input

Shunsuke Saito; Tianye Li; Hao Li

We introduce the concept of unconstrained real-time 3D facial performance capture through explicit semantic segmentation in the RGB input. To ensure robustness, cutting-edge supervised learning approaches rely on large training datasets of face images captured in the wild. While impressive tracking quality has been demonstrated for faces that are largely visible, any occlusion due to hair, accessories, or hand-to-face gestures would result in significant visual artifacts and loss of tracking accuracy. The modeling of occlusions has been mostly avoided due to its immense space of appearance variability. To address this curse of high dimensionality, we perform tracking in unconstrained images assuming non-face regions can be fully masked out. Along with recent breakthroughs in deep learning, we demonstrate that pixel-level facial segmentation is possible in real time by repurposing convolutional neural networks designed originally for general semantic segmentation. We develop an efficient architecture based on a two-stream deconvolution network with complementary characteristics, and introduce carefully designed training samples and data augmentation strategies for improved segmentation accuracy and robustness. We adopt a state-of-the-art regression-based facial tracking framework with segmented face images as training data, and demonstrate accurate and uninterrupted facial performance capture in the presence of extreme occlusion and even side views. Furthermore, the resulting segmentation can be directly used to composite partial 3D face models on the input images and enable seamless facial manipulation tasks, such as virtual make-up or face replacement.
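The abstract does not spell out the two-stream deconvolution design; as a rough, hypothetical sketch of the general idea (two upsampling branches with complementary characteristics whose outputs are fused into a per-pixel face/non-face mask), a PyTorch-style network might look like the following, with all layer sizes invented for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

class TwoStreamSegNet(nn.Module):
    """Illustrative two-stream deconvolution network for binary
    face / non-face segmentation (layer sizes are hypothetical)."""
    def __init__(self):
        super().__init__()
        # Shared convolutional encoder (downsamples by 4x).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Stream A: learned upsampling via transposed convolutions
        # (sharper but noisier boundaries).
        self.deconv_stream = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Stream B: fixed bilinear upsampling followed by convolution
        # (smoother, more conservative masks).
        self.interp_stream = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 16, 3, padding=1), nn.ReLU(),
        )
        # Fuse the complementary streams into a 2-class score map.
        self.classifier = nn.Conv2d(32, 2, 1)

    def forward(self, x):
        feats = self.encoder(x)
        fused = torch.cat([self.deconv_stream(feats),
                           self.interp_stream(feats)], dim=1)
        return self.classifier(fused)   # (N, 2, H, W) logits

# Usage: a per-pixel face mask for a batch of RGB frames.
net = TwoStreamSegNet()
mask = net(torch.rand(1, 3, 128, 128)).argmax(dim=1)  # (1, 128, 128)
```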


Biochimica et Biophysica Acta | 2009

Plasmid DNA-encapsulating liposomes: Effect of a spacer between the cationic head group and hydrophobic moieties of the lipids on gene expression efficiency

Yosuke Obata; Shunsuke Saito; Naoya Takeda; Shinji Takeoka

We have synthesized a series of cationic amino acid-based lipids having a spacer between the cationic head group and hydrophobic moieties and examined the influence of the spacer on a liposome gene delivery system. As a comparable spacer, a hydrophobic spacer with a hydrocarbon chain composed of 0, 3, 5, 7, or 11 carbons, and a hydrophilic spacer with an oxyethylene chain (10 carbon and 3 oxygen molecules) were investigated. Plasmid DNA (pDNA)-encapsulating liposomes were prepared by mixing an ethanol solution of the lipids with an aqueous solution of pDNA. The zeta potentials and cellular uptake efficiency of the cationic liposomes containing each synthetic lipid were almost equivalent. However, the cationic lipids with the hydrophobic spacer were subject to fuse with biomembrane-mimicking liposomes. 1,5-Dihexadecyl-N-lysyl-N-heptyl-l-glutamate, having a seven carbon atom spacer, exhibited the highest fusogenic potential among the synthetic lipids. Increased fusion potential correlated with enhanced gene expression efficiency. By contrast, an oxyethylene chain spacer showed low gene expression efficiency. We conclude that a hydrophobic spacer between the cationic head group and hydrophobic moieties is a key component for improving pDNA delivery.


International Conference on Computer Graphics and Interactive Techniques | 2016

High-fidelity facial and speech animation for VR HMDs

Kyle Olszewski; Joseph J. Lim; Shunsuke Saito; Hao Li

Significant challenges currently prohibit expressive interaction in virtual reality (VR). Occlusions introduced by head-mounted displays (HMDs) make existing facial tracking techniques intractable, and even state-of-the-art techniques used for real-time facial tracking in unconstrained environments fail to capture subtle details of the user's facial expressions that are essential for compelling speech animation. We introduce a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions. Using a monocular camera attached to an HMD, we record multiple subjects performing various facial expressions and speaking several phonetically-balanced sentences. These images are used with artist-generated animation data corresponding to these sequences to train a convolutional neural network (CNN) to regress images of a user's mouth region to the parameters that control a digital avatar. To make training this system more tractable, we use audio-based alignment techniques to map images of multiple users making the same utterance to the corresponding animation parameters. We demonstrate that this approach is also feasible for tracking the expressions around the user's eye region with an internal infrared (IR) camera, thereby enabling full facial tracking. This system requires no user-specific calibration, uses easily obtainable consumer hardware, and produces high-quality animations of speech and emotional expressions. Finally, we demonstrate the quality of our system on a variety of subjects and evaluate its performance against state-of-the-art real-time facial tracking techniques.
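As an illustration of the regression step described above, and not the authors' actual architecture, the following hypothetical PyTorch snippet maps a mouth-region crop to a vector of avatar animation parameters and trains it against audio-aligned reference parameters; the network shape, crop size, and parameter count are all assumptions:

```python
import torch
import torch.nn as nn

N_PARAMS = 40  # hypothetical number of avatar animation parameters

# Small CNN mapping a single-channel mouth crop to animation parameters.
regressor = nn.Sequential(
    nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 256), nn.ReLU(),
    nn.Linear(256, N_PARAMS),
)

# Training pairs: (mouth image, reference animation parameters), where the
# parameters would come from audio-aligned artist animation as in the paper.
images = torch.rand(8, 1, 64, 64)
targets = torch.rand(8, N_PARAMS)

optimizer = torch.optim.Adam(regressor.parameters(), lr=1e-4)
optimizer.zero_grad()
loss = nn.functional.mse_loss(regressor(images), targets)
loss.backward()
optimizer.step()
```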


Computer Vision and Pattern Recognition | 2017

Photorealistic Facial Texture Inference Using Deep Neural Networks

Shunsuke Saito; Lingyu Wei; Liwen Hu; Koki Nagano; Hao Li

We present a data-driven inference method that can synthesize a photorealistic texture map of a complete 3D face model given a partial 2D view of a person in the wild. After an initial estimation of shape and low-frequency albedo, we compute a high-frequency partial texture map, without the shading component, of the visible face area. To extract the fine appearance details from this incomplete input, we introduce a multi-scale detail analysis technique based on mid-layer feature correlations extracted from a deep convolutional neural network. We demonstrate that fitting a convex combination of feature correlations from a high-resolution face database can yield a semantically plausible facial detail description of the entire face. A complete and photorealistic texture map can then be synthesized by iteratively optimizing for the reconstructed feature correlations. Using these high-resolution textures and a commercial rendering framework, we can produce high-fidelity 3D renderings that are visually comparable to those obtained with state-of-the-art multi-view face capture systems. We demonstrate successful face reconstructions from a wide range of low resolution input images, including those of historical figures. In addition to extensive evaluations, we validate the realism of our results using a crowdsourced user study.
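The "feature correlations" mentioned above are Gram-matrix-style correlations of mid-layer CNN activations; a minimal sketch of fitting a convex combination of database correlations to those of a partial input, using made-up feature shapes and a softmax parameterization of the weights, could look like this:

```python
import torch

def gram(features):
    """Feature correlation (Gram) matrix of a CxHxW feature map."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.t() / (h * w)

# Hypothetical mid-layer CNN features: one partial input, K database faces.
partial_feat = torch.rand(64, 32, 32)
db_feats = [torch.rand(64, 32, 32) for _ in range(10)]

G_target = gram(partial_feat)
G_db = torch.stack([gram(f) for f in db_feats])        # (K, C, C)

# Fit a convex combination of database correlations to the partial input
# (softmax keeps the weights non-negative and summing to one).
logits = torch.zeros(len(db_feats), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    w = torch.softmax(logits, dim=0)
    blended = (w[:, None, None] * G_db).sum(dim=0)
    loss = ((blended - G_target) ** 2).mean()
    loss.backward()
    opt.step()
```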


International Conference on Computer Graphics and Interactive Techniques | 2015

Computational bodybuilding: anatomically-based modeling of human bodies

Shunsuke Saito; Zi-Ye Zhou; Ladislav Kavan

We propose a method to create a wide range of human body shapes from a single input 3D anatomy template. Our approach is inspired by biological processes responsible for human body growth. In particular, we simulate growth of skeletal muscles and subcutaneous fat using physics-based models which combine growth and elasticity. Together with a tool to edit proportions of the bones, our method allows us to achieve a desired shape of the human body by directly controlling hypertrophy (or atrophy) of every muscle and enlargement of fat tissues. We achieve near-interactive run times by utilizing a special quasi-statics solver (Projective Dynamics) and by crafting a volumetric discretization which results in accurate deformations without an excessive number of degrees of freedom. Our system is intuitive to use and the resulting human body models are ready for simulation using existing physics-based animation methods, because we deform not only the surface, but also the entire volumetric model.
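The paper's solver and volumetric discretization are far more involved, but the local/global flavor of a Projective Dynamics style quasi-static step, with "growth" folded into the rest state, can be illustrated on a toy spring chain; the 2D chain, stiffness, and growth factor below are invented for illustration:

```python
import numpy as np

# Minimal quasi-static local/global iteration in the spirit of Projective
# Dynamics: a chain of springs whose rest lengths are scaled by a per-edge
# "growth" factor (all constants are illustrative, not from the paper).
n = 10
x = np.stack([np.linspace(0.0, 1.0, n), np.zeros(n)], axis=1)  # vertices
edges = [(i, i + 1) for i in range(n - 1)]
rest = np.full(len(edges), 1.0 / (n - 1))
growth = np.full(len(edges), 1.5)     # 50% "hypertrophy" of each segment
k = 1.0                               # uniform spring stiffness

# Global system matrix (weighted graph Laplacian with vertex 0 pinned);
# it stays constant across iterations and could be prefactored.
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += k; L[j, j] += k
    L[i, j] -= k; L[j, i] -= k
L[0, :] = 0.0; L[0, 0] = 1.0          # pin vertex 0

for _ in range(50):
    # Local step: project each edge onto its grown rest length.
    b = np.zeros_like(x)
    for e, (i, j) in enumerate(edges):
        d = x[j] - x[i]
        d = d / np.linalg.norm(d) * rest[e] * growth[e]
        b[i] -= k * d
        b[j] += k * d
    b[0] = x[0]                       # keep the pinned vertex in place
    # Global step: one linear solve updates all vertices at once.
    x = np.linalg.solve(L, b)
```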


ACM Transactions on Graphics | 2017

Avatar digitization from a single image for real-time rendering

Liwen Hu; Shunsuke Saito; Lingyu Wei; Koki Nagano; Jaewoo Seo; Jens Fursund; Iman Sadeghi; Carrie Sun; Yen-Chun Chen; Hao Li

We present a fully automatic framework that digitizes a complete 3D head with hair from a single unconstrained image. Our system offers a practical and consumer-friendly end-to-end solution for avatar personalization in gaming and social VR applications. The reconstructed models include secondary components (eyes, teeth, tongue, and gums) and provide animation-friendly blendshapes and joint-based rigs. While the generated face is a high-quality textured mesh, we propose a versatile and efficient polygonal strips (polystrips) representation for the hair. Polystrips are suitable for an extremely wide range of hairstyles and textures and are compatible with existing game engines for real-time rendering. In addition to integrating state-of-the-art advances in facial shape modeling and appearance inference, we propose a novel single-view hair generation pipeline, based on 3D-model and texture retrieval, shape refinement, and polystrip patching optimization. The performance of our hairstyle retrieval is enhanced using a deep convolutional neural network for semantic hair attribute classification. Our generated models are visually comparable to state-of-the-art game characters designed by professional artists. For real-time settings, we demonstrate the flexibility of polystrips in handling hairstyle variations, as opposed to conventional strand-based representations. We further show the effectiveness of our approach on a large number of images taken in the wild, and how compelling avatars can be easily created by anyone.
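As a loose sketch of the hairstyle-retrieval idea mentioned above (a CNN predicts semantic hair attributes and the closest database hairstyle is retrieved), the snippet below uses a hypothetical attribute set, network, and database; none of it reflects the authors' actual pipeline:

```python
import torch
import torch.nn as nn

# Hypothetical semantic hair attributes used for retrieval.
ATTRS = ["short", "long", "curly", "straight", "bangs", "bun"]

# Small CNN predicting multi-label attribute scores from a photo.
attr_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, len(ATTRS)), nn.Sigmoid(),
)

photo = torch.rand(1, 3, 224, 224)
query = attr_net(photo)                         # (1, 6) attribute scores

# Hair database: precomputed attribute vectors for each 3D hair model;
# retrieve the hairstyle whose attributes are closest to the query.
db_attrs = torch.rand(500, len(ATTRS))
best = torch.cdist(query, db_attrs).argmin().item()
print(f"retrieved hairstyle id: {best}")
```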


Symposium on Computer Animation | 2017

Production-level facial performance capture using deep convolutional neural networks

Samuli Laine; Tero Karras; Timo Aila; Antti Herva; Shunsuke Saito; Ronald Yu; Hao Li; Jaakko Lehtinen

We present a real-time deep learning framework for video-based facial performance capture: the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5-10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films involving realistic digital doubles of actors and potentially hours of animated dialogue per character. We compare our results with several state-of-the-art monocular real-time facial capture techniques and demonstrate compelling animation inference in challenging areas such as eyes and lips.


Computer Graphics Forum | 2017

Multi-View Stereo on Consistent Face Topology

Graham Fyffe; Koki Nagano; L. Huynh; Shunsuke Saito; Jay Busch; Andrew Jones; Hao Li; Paul E. Debevec

We present a multi-view stereo reconstruction technique that directly produces a complete high-fidelity head model with consistent facial mesh topology. While existing techniques decouple shape estimation and facial tracking, our framework jointly optimizes for stereo constraints and consistent mesh parameterization. Our method is therefore free from drift and fully parallelizable for dynamic facial performance capture. We produce highly detailed facial geometries with artist-quality UV parameterization, including secondary elements such as eyeballs, mouth pockets, nostrils, and the back of the head. Our approach consists of deforming a common template model to match multi-view input images of the subject, while satisfying cross-view, cross-subject, and cross-pose consistencies using a combination of 2D landmark detection, optical flow, and surface and volumetric Laplacian regularization. Since the flow is never computed between frames, our method is trivially parallelized by processing each frame independently. Accurate rigid head pose is extracted using a PCA-based dimension reduction and denoising scheme. We demonstrate high-fidelity performance capture results with challenging head motion and complex facial expressions around eye and mouth regions. While the quality of our results is on par with the current state-of-the-art, our approach can be fully parallelized, does not suffer from drift, and produces face models with production-quality mesh topologies.
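The full objective combines stereo, optical flow, landmark, and surface/volumetric Laplacian terms; as a heavily reduced illustration of the Laplacian-regularized template deformation at its core, the following toy least-squares fit moves a few "landmark" vertices toward detected positions while preserving the template's Laplacian coordinates (mesh, landmarks, and weights are all placeholders):

```python
import numpy as np

# Toy template mesh: random vertex positions and chain connectivity
# standing in for a real facial mesh.
V = np.random.rand(100, 3)                   # template vertex positions
edges = [(i, i + 1) for i in range(99)]      # stand-in connectivity

# Uniform graph Laplacian of the template.
L = np.zeros((100, 100))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

# A handful of "detected" landmark targets (perturbed template vertices).
landmark_ids = [0, 25, 50, 75, 99]
landmark_pos = V[landmark_ids] + 0.1 * np.random.randn(5, 3)
w_fit, w_smooth = 10.0, 1.0

# Stack soft landmark constraints and Laplacian preservation into one
# linear least-squares system A x ~ b and solve for deformed vertices.
S = np.zeros((len(landmark_ids), 100))
S[np.arange(len(landmark_ids)), landmark_ids] = 1.0
A = np.vstack([w_fit * S, w_smooth * L])
b = np.vstack([w_fit * landmark_pos, w_smooth * (L @ V)])
X, *_ = np.linalg.lstsq(A, b, rcond=None)    # deformed template vertices
```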


International Conference on Computer Graphics and Interactive Techniques | 2014

Pose-independent garment transfer

Fumiya Narita; Shunsuke Saito; Takuya Kato; Tsukasa Fukusato; Shigeo Morishima

Dressing virtual characters is necessary for many applications such as film and games. However, modeling clothing for characters is a significant bottleneck, because it requires manual effort to design the clothing, position it correctly on the body, and adjust the fit. Therefore, even if we wish to design similar-looking clothing for characters that have very different poses and shapes, we would need to repeat the tedious process practically from scratch. To address this, we propose a method for automatic, design-preserving transfer of clothing between characters in various poses and shapes. As shown in the results, our system enables us to automatically generate a clothing model for a target character.


International Conference on Computer Graphics and Interactive Techniques | 2017

Pinscreen: creating performance-driven avatars in seconds

Hao Li; Shunsuke Saito; Lingyu Wei; Iman Sadeghi; Liwen Hu; Jaewoo Seo; Koki Nagano; Jens Fursund; Yen-Chun Chen; Stephen Chen

With this fully automatic framework for creating a complete 3D avatar from a single unconstrained image, users can upload any photograph through a simple web interface and obtain a high-quality head model, including animation-friendly blendshapes and joint-based rigs, within seconds. The system digitizes the entire model using a textured-mesh representation for the head and volumetric strips for the hair. Several animation examples are instantly generated for preview purposes, and the model can be loaded into Unity for immediate performance capture using a webcam.

Collaboration


Dive into Shunsuke Saito's collaborations.

Top Co-Authors

Hao Li, University of Southern California
Kyle Olszewski, University of Southern California
Lingyu Wei, University of Southern California
Liwen Hu, University of Southern California
Ronald Yu, University of Southern California
Andrew Jones, University of Colorado Boulder