Accurate Face Rig Approximation with Deep Differential Subspace Reconstruction
STEVEN L. SONG∗, Blue Sky Studios
WEIQI SHI∗, Yale University
MICHAEL REED, Blue Sky Studios
Fig. 1. Rest pose, ground truth, our method, and difference. Our rig approximation method learns localized shape information in differential coordinates and, separately, a subspace for mesh reconstruction.
To be suitable for film-quality animation, rigs for character deformation must fulfill a broad set of requirements. They must be able to create highly stylized deformation, allow a wide variety of controls to permit artistic freedom, and accurately reflect the design intent. Facial deformation is especially challenging due to its nonlinearity with respect to the animation controls and its additional precision requirements, which often leads to highly complex face rigs that are not generalizable to other characters. This lack of generality creates a need for approximation methods that encode the deformation in simpler structures. We propose a rig approximation method that addresses these issues by learning localized shape information in differential coordinates and, separately, a subspace for mesh reconstruction. The use of differential coordinates produces a smooth distribution of errors in the resulting deformed surface, while the learned subspace provides constraints that reduce the low frequency error in the reconstruction. Our method can reconstruct both face and body deformations with high fidelity and does not require a set of well-posed animation examples, as we demonstrate with a variety of production characters.

∗Authors contributed equally.
CCS Concepts: • Computing methodologies → Machine learning; Animation.

Additional Key Words and Phrases: rigging, deep learning, facial animation
ACM Reference Format:
Steven L. Song, Weiqi Shi, and Michael Reed. 2020. Accurate Face Rig Approximation with Deep Differential Subspace Reconstruction.
ACM Trans. Graph.
39, 4, Article 34 (July 2020), 12 pages. https://doi.org/10.1145/3386569.3392491
1 INTRODUCTION

Film-quality character rigs rely on a complex hierarchy of procedural deformers, driven by a large number of animation controls, that map to the deformation of the vertices of a character's surface mesh. Because the characters are subject to high aesthetic standards, and the rigs are the primary means by which the animators interact with them, the rigs themselves have strict performance requirements: the character's skin must behave predictably and precisely over the entire range of control, which for animated characters can be extreme because of the caricatured design and motion.
Rigs for facial animation typically have much more complex behavior than body rigs, and require additional precision due to their importance in conveying the most crucial aspects of communication and expression. To offer artistic freedom, the face rig is usually a complex structure containing a large number of numerical controls. Unlike the joint-based controls commonly used for a character's body, these numerical controls are globally defined and cooperatively influence the transformation of each vertex, making facial deformation highly nonlinear and expensive to compute.

In production it is often desirable to reuse the same rig behavior for different purposes in different environments: for example, transferring the rig to a simulation application for crowd simulation, to a game engine for VR production, or to a renderer for render-time manipulation. Unfortunately it is often not viable to take the original rig to other packages, because a visually matching reimplementation is required per deformer per package. Similarly, simulation-based rigs (e.g., muscle systems) provide complex behavior that is desirable in many production situations, but their lack of interactive response discourages their adoption. These issues can be addressed by a rig approximation method if it has the following characteristics: a simple universal structure, high accuracy, and good performance. A neural network approach automatically meets the first requirement, as the same network can approximate varying nonlinear functions with different sets of weights. Neural networks can also provide benefits with batch evaluation; for example, crowd characters, which can often reuse the same nonlinear deformation with different scaling factors, can be batch evaluated if driven by a neural network. Much of the work in this area, on moving from the typical rig deformer "stack" to a neural representation, has focused on run-time performance, e.g., [Bailey et al. 2018].

In contrast, our work directly addresses the importance of accuracy as experienced in the film production environment. In this paper we introduce a new learning-based solution to accurately capture facial deformation for characters using differential coordinates and a network architecture designed for that space. Similar to other work, we assume that the deformation has both a linear and a nonlinear component that can be separated. The linear deformation is not the focus of this paper, since its contribution to facial deformation is limited and many linear skinning solutions have been proposed [Kavan et al. 2008; Kavan and Žára 2005]. Instead, we focus on learning the nonlinear component, which applies equally well to both face and body rig approximation, as we show in our results.

At run time our method takes as input animation controls defined as a set of artist-level rig parameters, and computes the deformation as vertex displacements from the rest pose. During the offline training process, we use vectorized features generated from rig parameters, and the labels are differential coordinates calculated from the localized nonlinear deformation of the original rig. The differential coordinates have the advantages of a sparse mesh representation and embedded neighbor-vertex information, which contribute to the learning of local surface deformation.
However, the transformation between coordinates is ill-conditioned and non-invertible, and so we introduce a separate subspace to improve the conditioning of the reconstruction. This subspace is determined by artist-specified "anchor points", selected from the original mesh at features that are significant to the character's expressive ability. Our method conducts separate subspace training to learn how these anchor points deform, using a split network structure.

We qualitatively and quantitatively evaluate our method on multiple production-quality facial rigs. Experimental results show our method can predict accurate facial deformation with minimal visual difference from the ground truth. We show our method extends to body deformation, where it compares favorably with existing solutions. Additionally, we show how using anchor points improves the reconstruction by reducing the low frequency error introduced in the differential training.
2 RELATED WORK

Skinning techniques can be roughly divided into physics-based [Kim et al. 2017; Si et al. 2014], example-based [Loper et al. 2015; Mukai and Kuriyama 2016], and geometry-based methods. We focus here on geometry-based solutions due to their computational efficiency and simplicity. One of the most widely used techniques is linear blend skinning (LBS) [Magnenat-Thalmann et al. 1988], where a weighted sum of the skeleton's bone transformations is applied to each vertex. Advances in this technique include dual quaternion skinning (DQS) [Kavan et al. 2008], spherical blend skinning [Kavan and Žára 2005] and optimized centers of rotation skinning [Le and Hodgins 2016]. Although these methods are computationally efficient for computing linear deformation, they do not handle nonlinear behaviors such as muscle bulging and twisting effects. Improving on this, Merry et al. [2006] and Wang et al. [2002] introduce more degrees of freedom for each bone transformation through additional skin weights, which can be acquired by fitting example poses. Other approaches designed to address these issues include pose space deformation [Lewis et al. 2000; Sloan et al. 2001], cage deformation [Jacobson et al. 2011; Joshi et al. 2007; Ju et al. 2005; Lipman et al. 2008], joint-based deformers [Kavan and Sorkine 2012], delta mush [Le and Lewis 2019; Mancewicz et al. 2014] and virtual/helper joint methods [Kavan et al. 2009; Mukai 2015; Mukai and Kuriyama 2016]. Wang et al. [2007] introduce a rotational regression model to capture nonlinear skinning deformation, which optimizes the deformation of all vertices simultaneously using the Laplace equation. An iterative optimization [Sorkine and Alexa 2007] has been proposed to approximate nonlinear deformation by alternating surface smoothing and local deformation. All of these methods require additional computational cost for the nonlinear components and are primarily focused on body deformation, leaving facial deformation largely unaddressed.
In contrast to body rigs that are defined by bones and joints, facial rigs often include hundreds of animation controls represented by numerical values which control the nonlinear transformation of each vertex. These animation controls are globally defined and widely used in blendshapes [Lewis et al. 2014; Lewis and Anjyo 2010] to achieve realistic facial animation for production. Prior work focuses on editing data-driven facial animation [Deng et al. 2006; Joshi et al. 2006] or providing intuitive control [Lau et al. 2009; Lewis and Anjyo 2010].
There has been increasing interest in using learning-based solutions to replace traditional deformation algorithms. Previous work such as [Lewis et al. 2000] utilizes a support vector machine to learn mesh deformation given a set of poses. [Tan et al. 2018a,b] propose mesh-based autoencoders to learn deformation from a latent space. Based on their work, [Gao et al. 2018] put forward a solution to transfer shape deformation between characters with different topologies using a generative adversarial network. Luo et al. [2018] propose a deep neural network solution to approximate nonlinear elastic deformation, combining this with simulated linear elastic deformation to achieve better results. Liu et al. [2019] use graph convolutional networks to predict the skin weight distribution for each vertex, resulting in a trained network that can be applied to different characters given their mesh data and rigs. Relevant to our work is [Bailey et al. 2018], where multiple neural networks are used to approximate the rig's nonlinear deformation components under the assumption that each vertex is associated with a single bone. For each bone, they train a network to predict the offset of each associated vertex. Three unaddressed issues that motivate our work are: (1) the deformation of a vertex is often influenced by multiple bones, with no single bone as the prominent influence, (2) the deformation can be determined by numeric controls (as in face rigs) and (3) associating bones with disjoint sets of vertices can introduce discontinuities at set boundaries.
Subspace model reduction techniques are commonly used to solve nonlinear deformation in real-time applications. Instead of evaluating the complete mesh, subspace models compute the deformation of a low-dimensional embedding on the fly and project it back to the entire space. Subspace deformation was originally used in early simulation work [Pentland and Williams 1989], which uses a subspace spanned by the low-frequency linear vibration modes to represent the deformation. To augment the linear model and handle nonlinearities, Krysl et al. [2001] propose empirical eigenvector subspaces using principal component analysis (PCA) for finite element models. Sumner et al. [2007] use a graph structure to represent deformations as a collection of affine transformations for shape manipulation. An et al. [2008] introduce subspace forces and Jacobians associated with subspace deformations for simulation. Barbič et al. [2005] observe that the reduced internal forces with linear materials are cubic polynomials in reduced coordinates, which can be precomputed for efficient implicit Newmark subspace integration. For deformation-related model reduction, Barbič et al. [2012] propose a method for interactive editing and design of deformable object animations by minimizing the force residual objective. Wang et al. [2015] design linear deformation subspaces by minimizing a quadratic deformation energy to efficiently unify linear blend skinning and generalized barycentric coordinates. Building on these works, a recent hyper-reduced scheme [Brandt et al. 2018] uses two subspaces to achieve real-time simulation, one for constraint projections in the preprocessing stage and the other for vertex positions in real time. Close to our work is Meyer et al. [2007], who propose Key-Point Subspace Acceleration (KPSA) and caching to accelerate the posing of deformable facial models. The idea of using key points for reconstruction is analogous to the anchor points in our case. However, their method, like other subspace techniques, relies on high-quality animation prior examples to compute the embedding of the subspace.

Compared with previous work, the advantages of our method are: (1) it can reconstruct both face and body deformation with high accuracy, (2) it can take different types of animation controls as input, (3) it does not require a particular set of well-posed animation priors and (4) it provides a simple universal structure for cross-platform real-time evaluation.

For the rest of the paper, we first review the preliminaries of differential coordinates in Section 3.1. We then describe our training pipeline (Section 3.2), including the vectorization of input animation controls, the acquisition of nonlinear deformation from existing poses, network structures and reconstruction. We introduce the implementation details in Section 3.3, and we describe our experiments, evaluate the training results, and compare with existing solutions in Section 4. Finally, Section 5 discusses limitations and future work.
3 METHOD

Our model approximates the nonlinear deformation in a character rig. The linear deformation can be simply represented with linear blend skinning, so it is not our focus here. For a given mesh in rest pose, our model takes animation controls defined by a set of rig parameters as inputs, and outputs the nonlinear deformation of the mesh. Fig. 2 shows our training pipeline. To process the training data, we first vectorize the input rig parameters and extract the nonlinear deformation, represented by vertex displacement, from the corresponding deformed mesh. Then we convert the nonlinear deformation into differential coordinates (δ space), where we learn localized shape information and map the rig controls to it. However, we cannot directly reconstruct the mesh surface from differential coordinates, since the transformation is ill-conditioned. We therefore conduct a separate subspace learning on a group of anchor points selected from the original mesh, for which we learn deformation in local coordinates and use them as constraints for reconstruction.

3.1 Differential Coordinates

Let M = {V, E} be a mesh with n vertices, V ∈ R^{n×3}. Each vertex v_i ∈ V is represented using absolute Cartesian coordinates and E represents the set of edges. The Laplacian operator L is defined [Sorkine 2005] as:

L = I − D⁻¹A    (1)
Fig. 2. Our method takes rig parameters and the corresponding joint transforms as input and predicts the nonlinear deformation of the mesh vertices (in differential coordinates) and the set of anchor points (in Cartesian space). Green pathways are for network training, blue pathways for prediction.

where A is a (0, 1) adjacency matrix of size n × n that indicates the connectivity of vertex pairs in the mesh, with A_ij = 1 if (i, j) ∈ E, and D is a diagonal matrix of size n × n holding the degree d_i of each vertex. Applying the Laplacian operator L to the vertices transforms the mesh into delta space, where each vertex v_i is represented as δ_i. The differential coordinate of each vertex represents the difference between the vertex itself and the center of mass of its immediate neighbors (A_i denotes the neighborhood set of vertex v_i ∈ V):

LV = δ,  with  δ_i = v_i − (1/d_i) Σ_{j∈A_i} v_j    (2)

It is more convenient to use the symmetric version of L, denoted by L_s = DL = D − A, giving:

L_s V = Dδ    (3)

Compared to Cartesian coordinates, where only the spatial location of each vertex is provided, differential coordinates carry information about the local shape of the surface and the orientation of local details. They preserve local surface detail and capture the irregular shape of the surface. Transferring mesh deformation data into differential space leads to a sparse representation, which also contributes to the learning process. Intuitively, if a surface patch is deformed uniformly, the differential representation of the deformation will have zero values for all vertices except at the boundaries.

Given the Laplacian operator and differential coordinates, we now consider how to reconstruct the mesh surface. Note that the matrix L_s is singular and has a non-trivial zero eigenvector, because the sum of all its rows is 0. Therefore, we cannot directly invert the matrix for reconstruction, but we can add constraints to the matrix to make it full rank. We introduce the subspace P, which is constructed from a set of anchor points in V. The dimension of the subspace is much smaller than the original mesh. The index matrix of the anchor points I(P) is appended at the end of the Laplacian matrix L_s. Correspondingly, we append the Cartesian coordinates of the anchor points V(P) to the differential coordinates of the full mesh to make it solvable:

L̃V = [ L_s ; ωI(P) ] V = [ Dδ ; ωV(P) ] = δ̃    (4)

where the semicolon denotes vertical stacking of blocks. L̃ is the full-rank matrix with anchor points appended to the original Laplacian matrix. ω is the weight matrix for the anchor points, which can be used to stress the importance of each anchor point. Given the full-rank matrix L̃ and δ̃, we can solve the following equation:

(L̃ᵀL̃)V = L̃ᵀδ̃    (5)

Applying the Laplacian operator to a mesh is analogous to obtaining the second spatial derivatives. The eigenvectors of L are cosine basis functions of the Fourier transform, and the associated eigenvalues are squares of the frequencies [Zhang et al. 2010]. We demonstrate that for a small error ϵ introduced in differential coordinates, the high frequency component of ϵ is dampened when converted back to Cartesian space.
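To make the construction concrete, the sketch below builds the uniform Laplacian of Eq. (1), its symmetric form L_s, and the anchor-augmented matrix of Eq. (4) with SciPy sparse matrices. This is a minimal illustration under our own naming (uniform_laplacians, augment_with_anchors, an edge-list input format), not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp

def uniform_laplacians(n_verts, edges):
    """Build L = I - D^{-1} A (Eq. 1) and the symmetric form L_s = D - A.

    `edges` lists each undirected edge (i, j) exactly once.
    """
    e = np.asarray(list(edges))
    rows = np.concatenate([e[:, 0], e[:, 1]])          # symmetrize the adjacency
    cols = np.concatenate([e[:, 1], e[:, 0]])
    A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(n_verts, n_verts))
    d = np.asarray(A.sum(axis=1)).ravel()              # vertex degrees d_i
    L = sp.identity(n_verts) - sp.diags(1.0 / d) @ A   # Eq. (1)
    L_s = sp.diags(d) - A                              # L_s = D L = D - A
    return L, L_s

def augment_with_anchors(L_s, anchor_idx, omega=1.0):
    """Stack weighted anchor rows I(P) under L_s, the full-rank matrix of Eq. (4)."""
    k, n = len(anchor_idx), L_s.shape[0]
    I_P = sp.csr_matrix((np.full(k, omega), (np.arange(k), anchor_idx)),
                        shape=(k, n))
    return sp.vstack([L_s, I_P]).tocsr()
```

Per pose, the right-hand side of Eq. (4) is the (n, 3) block Dδ stacked on the (k, 3) block ωV(P), and Eq. (5) is then a sparse least-squares solve, which the prefactorization described in the reconstruction step below makes cheap to repeat.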
This leads to a smoother distribution of the error, which is much less noticeable in the reconstructed surface.

Since L_s is symmetric positive semi-definite, it has an orthogonal eigenbasis E = {e_1, e_2, ..., e_n} with corresponding eigenvalues 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_n. (For this analysis, we assume L_s is made non-singular by adding one anchor.)

L_s V′ = D(δ + ϵ)
V′ = L_s⁻¹ D(δ + ϵ) = V + L_s⁻¹ Dϵ    (6)

We denote Dϵ as ϵ′ and decompose it in the basis E:

ϵ′ = c_1 e_1 + c_2 e_2 + ... + c_n e_n    (7)

Notice that L_s⁻¹ shares the same eigenvectors and its corresponding eigenvalues are inverted. We have:

L_s⁻¹ ϵ′ = (c_1/λ_1) e_1 + (c_2/λ_2) e_2 + ... + (c_n/λ_n) e_n    (8)

Since λ_1 is small and λ_n is large, the inverse of the eigenvalues amplifies the low frequency eigenvector e_1 and dampens the high frequency one e_n. In this way, the high-frequency errors in the differential coordinates are reduced. This is desirable for mesh deformation, as localized high frequency errors are much more noticeable. To reduce the amplification of low-frequency error, we increase the number of anchor points, which improves the conditioning of the Laplacian matrix by increasing the smallest singular value. Therefore, we can decrease both the low and high frequency errors when the mesh surface is reconstructed.

3.2 Training Pipeline

The rig parameters cannot be directly used for training because they are in different representations and scales. Therefore, we first create feature vectors from the given rig parameters. Without loss of generality, we assume that facial rigs include joint controls J and numerical controls C. For the joint controls, we use the transformation matrix M_{J_i} = [X_{J_i}, t_{J_i}] of each joint J_i as input, where X_{J_i} ∈ R^{3×3} is the rotation/scale matrix and t_{J_i} ∈ R³ is the normalized translation value. We vectorize and concatenate all the joint controls so that we have J = {J_1, ..., J_i, ..., J_j}, J_i ∈ R¹². For the numerical controls, we define the input features as the concatenation of the normalized numerical value of each attribute, C = {C_1, ..., C_i, ..., C_c}, C_i ∈ R, where C_i represents one control attribute. Then we concatenate all the joint and numerical controls as our input feature F, whose dimension is 12j + c. We normalize all the translation values together, but every single numerical control attribute is normalized independently, since they are on different scales.

F = Concat(‖_{i=1..j} J_i, ‖_{i=1..c} C_i)    (9)

To generate the training data, we randomly and independently sample each rig control using a truncated Gaussian distribution within a set range. The range of each control is defined so that it reasonably covers the possible range of animation, similar to the method used by [Bailey et al. 2018]. We do not limit our training data to well-animated poses because (1) they require human labor and thus are expensive to generate, and (2) randomly generated poses cover a large range of motion and more dynamic deformations, which improves the generalization of our model.

Fig. 3. An example of rig controls and vectorization. Only joint controls are shown on the character.
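The vectorization of Eq. (9) and the truncated-Gaussian pose sampling are straightforward to express. The sketch below assumes hypothetical array layouts for the joint and numeric controls (vectorize_controls, sample_pose, and their parameters are our own names); it illustrates the scheme rather than reproducing the authors' code.

```python
import numpy as np
from scipy.stats import truncnorm

def vectorize_controls(joint_mats, joint_trans, numeric_vals,
                       numeric_ranges, trans_scale):
    """Build the feature vector F of Eq. (9): 12 values per joint, one per control.

    joint_mats     : (j, 3, 3) rotation/scale matrices X_{J_i}
    joint_trans    : (j, 3) translations t_{J_i}, normalized together by trans_scale
    numeric_vals   : (c,) raw numeric control values
    numeric_ranges : (c, 2) per-control (lo, hi), normalized independently
    """
    feats = [np.concatenate([X.ravel(), t / trans_scale])   # 12 values per joint
             for X, t in zip(joint_mats, joint_trans)]
    lo, hi = numeric_ranges[:, 0], numeric_ranges[:, 1]
    feats.append((numeric_vals - lo) / (hi - lo))           # independent normalization
    return np.concatenate(feats)                            # dimension 12j + c

def sample_pose(numeric_ranges, sigma=0.5):
    """One random training pose: truncated-Gaussian samples within each range."""
    lo, hi = numeric_ranges[:, 0], numeric_ranges[:, 1]
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    # truncnorm takes bounds in standard-deviation units; this maps the
    # truncated sample back into [lo, hi]
    z = truncnorm.rvs(-1.0 / sigma, 1.0 / sigma, size=len(lo))
    return mid + half * sigma * z
```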
We use the nonlinear deformation as our training labels, which can be computed from the deformed mesh. We assume a mesh in rest pose V and its deformation Ṽ defined by a set of rig parameters. We also assume Ṽ and V maintain the same topology. The vertices v_i ∈ V and ṽ_i ∈ Ṽ are defined in local Cartesian coordinates. We have the following equation:

ṽ_i = T_i (v_i + v_{i,nl})    (10)

where v_{i,nl} is the vertex displacement in local space caused by the nonlinear deformation, and T_i is the linear transformation for vertex v_i, which can be computed from the transformation matrices of the joint controls:

T_i = Σ_{k∈J(v_i)} ω_k M_{J_k} (M^o_{J_k})⁻¹    (11)

J(v_i) represents the joint controls that have influence on the vertex v_i. M_{J_k} denotes the transformation matrix for joint J_k and M^o_{J_k} is its transformation matrix at rest pose. ω_k is the weight for the joint. We treat the rig as a black box, so we do not have M^o_{J_k} and ω_k available. For general purposes, we use an implicit method to calculate T_i. Given equation 10, we perturb v_{i,nl} by moving one unit in every direction along the XYZ coordinates and observe the vertex displacement produced by the rig. Then we can use the vertex displacement to calculate T_i, with the following equations:

ṽ′_i = T_i v_i
ṽ_{i,x} = T_i (v_i + (1, 0, 0, 0)ᵀ)
ṽ_{i,y} = T_i (v_i + (0, 1, 0, 0)ᵀ)
ṽ_{i,z} = T_i (v_i + (0, 0, 1, 0)ᵀ)
ṽ_null = T_i (0, 0, 0, 1)ᵀ    (12)

By subtracting the first equation from the following ones, we have:

T_i = (ṽ_{i,x} − ṽ′_i, ṽ_{i,y} − ṽ′_i, ṽ_{i,z} − ṽ′_i, ṽ_null)    (13)

T_i can be substituted into equation 10 to calculate the nonlinear deformation for a given rig input:

v_{i,nl} = T_i⁻¹ ṽ_i − v_i    (14)

Our goal is to learn the nonlinear deformation from given rig parameters by minimizing the per-vertex distance between our results and the ground truth. The differential network takes the vectorized features as input and outputs the vertex displacement corresponding to the nonlinear deformation in differential coordinates. This network has 5 fully connected layers with 2048 units, each followed by a ReLU activation layer. Similar to [Bailey et al. 2018; Laine et al. 2017], we apply PCA at the end of the network by multiplying the projection matrix with the output. We precompute the projection matrix on the entire training set. The training data can be constructed as a matrix M ∈ R^{|V|×m}, where |V| is the vertex count and m is the dimension over all training poses. The purpose of PCA is to project the network output back to a lower dimension, which helps the network converge. We determine the number of principal components as a fixed percentage of the number of mesh vertices, which is simple to implement in practice (we evaluate the influence of different percentages on training in Section 4.1). Alternatively, the PC number can be selected by choosing the most significant basis vectors such that the reprojection error of the training set is below a defined threshold.
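The implicit recovery of T_i can be written compactly. The sketch below assumes a hypothetical black-box callable rig_probe that re-evaluates the rig for vertex i with its rest position replaced by a probe point; following Eq. (12), the rig is assumed to apply the same affine transform T_i to any probe.

```python
import numpy as np

def nonlinear_offset(rig_probe, v_rest_i, v_deformed_i):
    """Recover T_i implicitly (Eqs. 12-13) and return v_{i,nl} (Eq. 14).

    rig_probe(p) -> R^3 : rig output for vertex i with its rest position
    replaced by probe point p, all other rig inputs held fixed (assumption).
    """
    base = rig_probe(v_rest_i)                         # ~v'_i = T_i v_i
    X = np.stack([rig_probe(v_rest_i + e) - base       # unit perturbations give
                  for e in np.eye(3)], axis=1)         # the linear columns (Eq. 13)
    t = rig_probe(np.zeros(3))                         # ~v_null = T_i (0,0,0,1)^T
    # Eq. (14): v_{i,nl} = T_i^{-1} ~v_i - v_i, where the homogeneous inverse
    # is T_i^{-1} y = X^{-1} (y - t)
    return np.linalg.solve(X, v_deformed_i - t) - v_rest_i
```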
For the loss function, a simple choice would be a regression loss such as the Euclidean distance between the predicted vertex displacement and the ground truth. However, it is known that an L2 loss function tends to blur the prediction results [Isola et al. 2017; Liu et al. 2019]. The mesh deformation for character animation is smooth and continuous, which implies the differential representation has small values. Our training data is generated by randomly sampling the rig parameters, but this also means the training data contains outliers that would never appear in real animation and which appear in delta space as large values. L2 loss is more sensitive to outliers due to its consideration of squared differences. In our case, L2 loss tends to adjust the network to fit and minimize those outlier vertices, which leads to higher errors for other vertices. On the other hand, using L1 loss reduces the influence of outliers and produces better results. Therefore we use the L1 loss for the differential network.

The subspace network takes the vectorized features as input and outputs the nonlinear deformation of selected anchor points in local Cartesian coordinates for reconstruction. Previously, Chen et al. [2005] and Sorkine et al. [2005] use greedy heuristic methods to select anchor points. They treat all the vertices in the mesh equally and iteratively select the vertex based on the largest geodesic distance between the approximated shape and the original mesh. However, these algorithms do not fit our situation because of the different contributions of vertices to the facial animation. We pay more attention to the important facial features, such as the eyes and mouth, rather than the nose, ears or scalp. In general, face rigs define controls on those areas to constrain the deformation. Therefore, we use the rig as a reference to select anchor points and make sure that they are well distributed and proportional to the density of the rig controls. Based on our observation, the training performance and reconstruction results do not depend on the specific anchor point selection as long as the major deformable facial features are covered. We also note that the number of anchor points contributes to the accuracy of reconstruction; we evaluate that in Section 4.2.

The subspace network consists of a set of mini-networks, each of which corresponds to a single anchor point and outputs its deformation in R³. For the input of each mini-network, we perform a dimension reduction technique similar to that used in [Bailey et al. 2018], where each network takes as input a subset of the vectorized features corresponding to the rig controls that deform the anchor point. However, the difference between our method and Bailey et al. is that we perform the split training on the anchor points instead of the entire mesh, and so we avoid the discontinuity issue. We apply this technique because only a small subset of all rig controls influence a certain anchor point. We collect the related rig controls by perturbing all the controls individually and recording which anchors produce deformation. This process is repeated with 100 random example poses and with large perturbations to ensure that controls affecting the anchor are identified. Each mini-network includes 3 fully connected layers with 64 units, each followed by a ReLU activation layer. For the loss function, we use L2 loss for this network, as the subnetwork is trained on Cartesian coordinates, which do not encode mesh information in a way that accentuates outliers. We use multiple mini-networks instead of a single network because there is no direct spatial relationship between the anchor points and there is low correlation between their deformations. In practice, we found this structure has better training performance compared with a single network, due to the reduced dimension.
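One possible realization of a mini-network and of the perturbation-based control discovery is sketched below in PyTorch. AnchorMiniNet, relevant_controls, the black-box rig_eval and the pose sampler are our own names, and placing a final linear layer after the three 64-unit ReLU layers is our reading of the architecture, not a confirmed detail.

```python
import numpy as np
from torch import nn

class AnchorMiniNet(nn.Module):
    """One mini-network per anchor point: three 64-unit fully connected layers
    with ReLU, then a linear map to the anchor's R^3 offset (our reading)."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3))

    def forward(self, x):
        return self.net(x)

def relevant_controls(rig_eval, anchor_idx, sample_params,
                      n_probes=100, scale=5.0, tol=1e-6):
    """Map each anchor to the controls that move it, perturbing one control at
    a time over random example poses with large perturbations (Sec. 3.2).

    rig_eval(params) -> (n, 3) deformed vertices (black-box rig, assumed);
    sample_params()  -> random parameter vector (e.g. the sampler sketched above).
    """
    influences = [set() for _ in anchor_idx]
    for _ in range(n_probes):
        params = sample_params()
        base = rig_eval(params)[anchor_idx]
        for c in range(len(params)):
            p = params.copy()
            p[c] += scale                              # large perturbation
            moved = rig_eval(p)[anchor_idx]
            shift = np.linalg.norm(moved - base, axis=1)
            for a in np.nonzero(shift > tol)[0]:       # anchor a responds to control c
                influences[a].add(c)
    return influences
```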
We perform reconstruction using the full-rank Laplacian matrix L̃, which is constructed by appending the indices of the anchor points at the end of the original Laplacian matrix L. Notice that L̃ does not vary with the input rig parameters and only depends on the selected anchor points. Following equation 5, we can apply Cholesky factorization to L̃ᵀL̃ to get the upper-triangular sparse matrix R:

L̃ᵀL̃ = RᵀR    (15)

We only need to compute the factorization once, using only the mesh topology information, and the matrix R can be reused whenever the rig parameters change. Now we can easily solve equation 4 and reconstruct the mesh surface using back substitution. We concatenate the results from the differential and subspace networks to get δ̃ and use it in the following equation:

RᵀR V_nl = L̃ᵀδ̃    (16)

Since R is a triangular matrix, we can efficiently reconstruct the nonlinear deformation V_nl with back substitution, which makes it possible to run the reconstruction at an interactive speed with frequently updated results from the networks.

We use the uniform Laplacian instead of the cotangent Laplacian because the latter changes as the mesh deforms, requiring expensive recomputation for every pose. With the uniform Laplacian the factorization only needs to happen once, and the reconstruction is done with 2 back substitutions, which are very fast.
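The precompute-once, solve-per-pose pattern looks as follows. The paper factors L̃ᵀL̃ with sparse Cholesky; SciPy ships no built-in sparse Cholesky, so its sparse LU stands in here with the same reuse behavior, and the function names are our own.

```python
import numpy as np
from scipy.sparse.linalg import splu

def prefactorize(L_aug):
    """Factor L~^T L~ once per character (Eq. 15). Sparse LU is a stand-in
    for the Cholesky factorization used in the paper."""
    return splu((L_aug.T @ L_aug).tocsc())

def reconstruct(solver, L_aug, d_delta, anchor_pos, omega=1.0):
    """Per-pose solve of Eq. (16) for the nonlinear deformation V_nl.

    d_delta    : (n, 3) differential-network output, the D*delta block of Eq. (4)
    anchor_pos : (k, 3) subspace-network output for the anchor points
    """
    rhs = L_aug.T @ np.vstack([d_delta, omega * anchor_pos])   # L~^T delta~
    return solver.solve(rhs)                                   # triangular solves only
```

Because the factorization depends only on topology and anchor choice, only the two cheap triangular solves run per pose, which is what makes interactive-rate reconstruction possible.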
Table 1. Statistics for the three test models.
                    Agent   Bull    Matador
Vertices            4403    3669    3211
Face Height (cm)    25.12   84.28   26.03
Face Width (cm)     21.27   67.00   20.45
Numerical Controls  67      131     121
Joint Controls      20      20      20
Anchors             87      73      64
Differential PCs    220     183     160

3.3 Implementation Details

For both the differential and subspace networks, we set the batch size to 128 and choose an SGD solver for optimization, with an initial learning rate of 0.1 and a learning rate decay of 10− (SGD outperforms Adam in our case). We train 10000 epochs for both networks, which takes 3.5 hours for the differential network on an NVIDIA GeForce GTX 2080 GPU, and less than 1 hour for the subspace network.
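A sketch of the differential network and this training configuration follows, with the assumptions called out: the exact learning-rate decay constant is not recoverable from the text, so the value and the inverse-time decay form below are placeholders, and applying the fixed PCA basis after a final linear layer is our interpretation of "multiplying the projection matrix with the output".

```python
import torch
from torch import nn, optim

class DifferentialNet(nn.Module):
    """Differential network of Sec. 3.2: 5 fully connected 2048-unit layers
    with ReLU, then a fixed, precomputed PCA basis mapping principal-component
    weights back to per-vertex differential coordinates (our interpretation)."""
    def __init__(self, in_dim, pca_basis):            # pca_basis: (3n, n_pc) tensor
        super().__init__()
        layers, d = [], in_dim
        for _ in range(5):
            layers += [nn.Linear(d, 2048), nn.ReLU()]
            d = 2048
        layers.append(nn.Linear(d, pca_basis.shape[1]))
        self.body = nn.Sequential(*layers)
        self.register_buffer("pca", torch.as_tensor(pca_basis, dtype=torch.float32))

    def forward(self, x):
        return self.body(x) @ self.pca.T              # PC weights -> delta space

def train(net, loader, epochs=10000, lr=0.1, lr_decay=1e-5, l1=True):
    """Batch size 128 is set in the loader; L1 loss for the differential
    network, L2 for the subspace mini-networks (Sec. 3.3)."""
    opt = optim.SGD(net.parameters(), lr=lr)
    # inverse-time decay with an assumed constant; the paper's exact decay
    # value is illegible in this copy
    sched = optim.lr_scheduler.LambdaLR(opt, lambda ep: 1.0 / (1.0 + lr_decay * ep))
    loss_fn = nn.L1Loss() if l1 else nn.MSELoss()
    for _ in range(epochs):
        for feats, labels in loader:
            opt.zero_grad()
            loss_fn(net(feats), labels).backward()
            opt.step()
        sched.step()
    return net
```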
4 EXPERIMENTS AND EVALUATION

We use three production face rigs for experiments and evaluation (see Table 1). For each rig, we take a truncated normal sampling of the rig parameters to generate 10000 random poses: 9800 for training and 200 for testing (Fig. 4). The test poses are separated from the training data to avoid bias. The rig parameters of the test poses are fed into the trained network to produce the reconstructed deformation (Fig. 2). To evaluate the training performance we use two metrics: the MSE of the prediction error and the reconstruction error (cm). The MSE of the prediction error measures the difference between the ground truth and the network output, while the reconstruction error measures the per-vertex absolute distance between the surface reconstruction and the ground truth deformation. We evaluate the mean and maximum reconstruction errors calculated over the vertices of all test poses. The maximum error is a critical value to consider, as a large localized error will render the animation pose unacceptable, regardless of the MSE.

Because face rigs precisely control the eyelid, eyebrow, and mouth behavior, and because these are the primary cues for expression, having high accuracy here is paramount. A slight difference in eyelid position changes the relative position of the pupil, which can change the audience's perception of the pose from "scheming" to "sleepy", while a similar change in the lip position can go from "slight smile", with the teeth slightly exposed, to "sneer", making any method that cannot accurately differentiate between these poses unacceptable.

4.1 Differential Network Evaluation

We first evaluate how varying the number of principal components (PCs) influences the differential training. We specify the PC number as a varying percentage of the mesh vertex count. Fig. 5 shows the prediction error for three characters over 200 test poses.
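The two reconstruction-error statistics defined above reduce to a few lines; a minimal sketch, with array shapes assumed for illustration:

```python
import numpy as np

def reconstruction_errors(pred, gt):
    """Mean and max per-vertex absolute distance (cm) over all test poses.

    pred, gt : (n_poses, n_verts, 3) reconstructed and ground-truth vertices.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float(dist.mean()), float(dist.max())
```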
Table 2. Prediction error (differential) and reconstruction error (mean and maximum) of differential training with a varying number of hidden layers and fixed subspace training.
Layers        1       2       3       4       5
Differential
Mean error    0.0240  0.0197  0.0189  0.0187  0.0182
Max error     0.700   0.633   0.667   0.664   0.541

It is interesting to note that the MSE is minimized as the PC percentage approaches 5%, regardless of the different numbers of vertices in the meshes. This suggests the optimal PC number is roughly proportional to the mesh vertex count. Further increasing the PC percentage does not lead to significant performance improvement, but instead makes the network vulnerable to overfitting, shown by the slight increase of the loss. Based on these observations, we set the PC number to 5% of the mesh vertex count for differential training for the rest of our evaluation.

We use an ablation study to evaluate the influence of the number of hidden layers, varying the number of fully connected layers from 1 to 5 while fixing the subspace network and anchor points.
The prediction and reconstruction errors for the character Agent for each of these configurations are shown in Table 2. As shown, the prediction error decreases as the number of hidden layers increases, suggesting improved network capacity for fitting. Also observable is the decrease of the reconstruction error, but it is less significant compared with the reduction of prediction error, suggesting that the accuracy of differential training is not the bottleneck for reconstruction.
4.2 Subspace Network Evaluation

We use the character Agent to evaluate how the anchor points and subspace network influence the deformation approximation, considering different numbers of anchor points, selection methods and subspace network structures. For experimental purposes, we fix the differential training (4403 mesh vertices with 220 PCs) and only change the subspace network. We specify the number of anchor points as 1%, 2% and 5% of the mesh vertex count, similar to our evaluation of PCA for differential training. We report both the prediction and reconstruction error in Table 3. Notice we increase the percentage by adding new anchor points to the existing ones instead of selecting a new group. To compare the network structure, we conduct the subspace training using a single network instead of the subspace mini-networks ("2% Single"). The single network takes the entire vectorized feature as input and outputs the deformation of all anchor points together. To compare different anchor point selection methods, we use a new group of anchor points around the scalp with less significant deformation ("2% Scalp"). Notice the original group of anchor points is selected on the face to cover major facial features with large deformation, as discussed in Section 3.2.

As observed, increasing the number of anchor points leads to higher prediction error, since the network fits better when the dimension is low. However, the reconstruction error stays roughly the same as the number of anchor points grows, because additional anchor points improve the condition of the Laplacian matrix for reconstruction, which balances the increase in prediction error. We use 2% anchor points as a middle point for our implementation and the rest of the evaluation.
Fig. 4. Training poses for production characters Agent (top), Matador (middle) and Bull (bottom). The poses are generated from a broad sampling of the rig parameter space. Although many look implausible, they are necessary to capture the full space accurately without assumptions on the artist's control range.

Fig. 5. Prediction error of the differential network with varying PC percentage.
For the network comparison, both the prediction and reconstruction errors of the subspace mini-networks ("2%") are lower than those of the single network ("2% Single"). We believe the dimension reduction is the reason for the resulting performance improvement. The subspace mini-networks fit anchor points separately, because they are disconnected and do not have a direct spatial relationship, which enables better approximation. The single network, on the contrary, tries to learn the deformation of all anchor points at once, increasing the difficulty of fitting.

For the different anchor point selections, we find that using vertices with less deformation can cause larger reconstruction error even when the prediction error is smaller. The network performs better because no deformation needs to be learned for those vertices, but they are not ideal for the reconstruction. Fig. 6 shows an example. As we can see, the deformations of the mouth and eyelids are shifted when vertices on the scalp are selected as anchor points. Ideally, we want the anchor points to "nail" the deformed mesh in place and prevent large shifts or rotations of important face regions. Therefore, we select anchor points to cover major facial features with large deformation.
4.3 Reconstruction Accuracy

In this section, we evaluate the accuracy of deformation reconstruction using well-animated poses from production. We evaluate the mean and max reconstruction errors over a series of well-animated production sequences, where the deformations are much more exaggerated and dynamic.
Fig. 6. Comparison of our method using anchors (in yellow) selected from the major facial features (left) vs. the less deformed scalp (right).

Table 3. Prediction error (subspace) and reconstruction error (mean and max) of the subspace training with varying anchor percentage and fixed differential training.

             1%      2%      5%      2% Single  2% Scalp
Prediction
Mean error   0.0207
Max error

We present the quantitative results in Table 5. The deformations of the character Bull show larger errors because we test it on the most extreme animation sequence. Fig. 7 shows an example for the character; please refer to the supplemental video for a detailed comparison. In general, our method can accurately reconstruct the mesh surface with mean errors smaller than 0.6% and max errors smaller than 6% of the size of the character faces.
Fig. 7. Side-by-side comparison of ground truth (left), our approximation (center), and a heatmap indicating per-vertex distance error in cm (right).

Table 4. Prediction errors (differential and subspace) and reconstruction errors (mean and max) for the tests with different training data sizes.
              25%     50%     75%     100%
Differential
Subspace
Mean error    0.0301  0.0246  0.0195  0.0186
Max error     0.891   0.819   0.740   0.517
Table 5. Mean and max absolute reconstruction errors evaluated on the well-animated production sequences, and as a percentage of face height.
                 Agent   Bull    Matador
Mean error       0.032   0.512   0.087
Percentage       0.127%  0.607%  0.334%
Max error        0.630   4.682   0.782
Percentage       2.50%   5.55%   3.00%
Number of Poses  808     249     359
As a data-driven solution, the accuracy of our model relies largely on sufficient training data. To evaluate how the training size influences performance, we reduce the training set for the character Agent to 25%, 50% and 75% of the original dataset while keeping the test data unchanged (200 randomly generated poses). We present both the prediction errors for the differential and subspace training and the reconstruction errors in Table 4. Indeed, increasing the training data boosts the performance; however, the improvement is not very significant when increasing the size beyond 75%.
4.4 Comparisons

We first compare the accuracy of facial deformation approximation with previous methods. Then we apply our method to body rigs and compare the results with Bailey et al. [2018].
We compare our method with linear blend skinning (LBS), PCA with linear regression (PCA), local Cartesian coordinate training using our model (Local), and Meyer et al. [2007] (KPSA).
Table 6. Mean and max reconstruction errors using our method compared with linear blend skinning (LBS), PCA with linear regression, our model using local offsets for training (Local) and Meyer et al. [2007] (KPSA). The comparison is shown for a set of test poses from a well-animated production sequence.
        Agent           Bull            Matador
        Mean    Max     Mean    Max     Mean    Max
LBS     0.174   3.228   1.672   23.56   0.228   4.261
PCA     0.073   1.980   0.848   8.367   0.158   1.533
Local   0.072   0.689   0.521   5.779   0.155   1.106
KPSA    0.061   1.623   2.115   34.25   0.089   1.664
Ours    0.032   0.630   0.512   4.682   0.087   0.782

KPSA is an example-based deformation approximation method, which uses the deformation of key points as input to PCA to derive vertex positions for the entire mesh. The quality of the training data significantly influences the accuracy of the deformation, and their method relies on evaluating the original deformer stack to determine the key points on the fly. For the Local model, we apply the same differential network with PCA directly to the vertex local offsets, without converting them into differential coordinates. No subspace learning or reconstruction is required for this model; we use it to compare against the differential training and to evaluate the contribution of the mesh representation. We use the same set of randomly generated training poses as used by our model to train both KPSA and the Local model, and we perform the evaluation on the same well-animated sequences introduced in the last subsection.

We report the reconstruction error in Table 6 and provide a visual comparison in Fig. 8. As observed, our method outperforms the other four methods in both the quantitative and the visualized results. We use the result of LBS as a baseline, as it does not provide any nonlinear deformation. From the heat map, we can see that the Local model fails to capture the local deformation on the eyelids, and the mouth is shifted. This is because no neighbor-vertex information is embedded in the local offsets, which makes it difficult for the network to predict the local deformation. KPSA fails to reconstruct the deformation in the eyebrow region and the corners of the lips, even with a substantial increase in the number of key points (274) and basis vectors (200) relative to the original example. The relatively poor performance is caused by the linear reconstruction of the training data, which can only provide a limited range and a fixed dimension for the approximated deformation. Once the target pose is outside the dimension defined by the PCA, it is difficult for that method to achieve high reconstruction accuracy. Additionally, the key points still need to be driven by the original rig. In comparison, our method can accurately capture the local deformation because of the error characteristics of the differential coordinates. Due to the nonlinear fitting capability of deep neural networks, our method can use randomly generated data for training and approximate deformation over a much larger range.
We demonstrate our method applied to body deformation approximation and compare our results with Bailey et al. [2018]. We use the character Agent as the example for comparison. The character's height is 200.25 cm. The body contains 4908 vertices and the rig includes 107 joint controls, with hand joints excluded.
Fig. 8. Comparisons for facial deformation between ground truth, linear blend skinning (LBS), PCA with linear regression, local offset training (Local), KPSA, and our method, using a well-animated pose from production.

We use the same training method and network structures mentioned in Section 3.2 and generate random poses for the body rig as training data. Since the body rig does not include numerical controls, we remove them from the input and only vectorize the joint controls. We use 245 PCs for the differential training and select 118 anchor points that are well distributed around all the joints of the body. For Bailey et al. [2018], denoted as FDDA, we follow their method and train multiple small networks (2 hidden layers with 128 units), each of which corresponds to a joint control and predicts the nonlinear deformation of the neighboring vertices in local coordinates. We generate 9800 random poses using the method described in Section 3.3 as training data and perform the evaluation using 189 poses from a well-animated production sequence for all three models.

We report the mean and max reconstruction errors in the inline table below, and we show deformation results in Fig. 9.

       Mean    Max
Ours   0.217   4.17
FDDA   0.263   6.41

The results indicate that our method outperforms the FDDA method, especially for the maximum error. Using multiple networks for deformation approximation, FDDA suffers from discontinuity problems on the torso and left arm. We can observe high errors on the connecting parts of the body, since the vertices of the two parts are predicted by different networks. The discontinuity is caused by the slight change of joint scales in the evaluation sequence, which does not show up in the training data. Due to the local joint input and small-scale networks, FDDA suffers from overfitting to the training data and is sensitive to new values. Our method uses a deeper network with a much larger input size, which increases the capacity and makes the network less sensitive to the unseen scaling change of a couple of joints. Since our method also uses small networks for subspace training, there might be some anchors that are affected by the scaling, but due to the least-squares reconstruction, the local error is nicely distributed as low frequency error and is much less noticeable. Increasing the network size for FDDA may improve the overall performance; however, evaluating a large number of deeper networks (40 in our case) would cause a significant performance downgrade.

Fig. 9. Comparisons for body deformation between the ground truth, Bailey et al. [2018] (FDDA) and our method using well-animated poses.

Fig. 10 shows the error distribution of each model. As observed, the error distribution of our model is compressed to the lower range while the distribution of FDDA extends to large errors. Although the two methods have similar mean errors, this observation suggests that our method can provide smooth approximation results with smaller maximum errors, and avoids inappropriate deformation.

Fig. 10. Comparison of error distribution for body deformation using well-animated poses from production.
5 CONCLUSION AND FUTURE WORK

In this paper we have presented a learning-based solution to capture facial deformation for rigs with high accuracy. Our method uses differential coordinates and a learned subspace to reconstruct smooth nonlinear facial deformation. We have demonstrated the robustness of our method on a wide range of animated poses. Our method compares favorably with existing solutions for both facial and body deformation. We have also successfully integrated our solution into the production pipeline.

Our work has limitations that we wish to investigate in the future. First, our method needs manually selected anchor points for subspace training and reconstruction. It would be interesting to investigate methods for inferring the anchor points based on the characteristics of the facial mesh and training poses. Second, as a deep learning based approach, a model must be trained for every character with different rig behavior or mesh topology. We would like to explore the possibility of integrating a high-level super-rig into our method to provide a single model for different characters.
REFERENCES
Steven S An, Theodore Kim, and Doug L James. 2008. Optimizing cubature for efficient integration of subspace deformations. In ACM Transactions on Graphics (TOG), Vol. 27. ACM, 165.
Stephen W Bailey, Dave Otte, Paul Dilorenzo, and James F O'Brien. 2018. Fast and deep deformation approximations. ACM Transactions on Graphics (TOG) 37, 4 (2018), 119.
Jernej Barbič and Doug L James. 2005. Real-time subspace integration for St. Venant-Kirchhoff deformable models. ACM Transactions on Graphics (TOG) 24, 3 (2005), 982–990.
Jernej Barbič, Funshing Sin, and Eitan Grinspun. 2012. Interactive editing of deformable simulations. ACM Transactions on Graphics (TOG) 31, 4 (2012), 70.
Christopher Brandt, Elmar Eisemann, and Klaus Hildebrandt. 2018. Hyper-reduced projective dynamics. ACM Transactions on Graphics (TOG) 37, 4 (2018), 80.
Doron Chen, Daniel Cohen-Or, Olga Sorkine, and Sivan Toledo. 2005. Algebraic analysis of high-pass quantization. ACM Transactions on Graphics (TOG) 24, 4 (2005), 1259–1282.
Matthew Cong, Kiran S Bhat, and Ronald Fedkiw. 2016. Art-directed muscle simulation for high-end facial animation. In Symposium on Computer Animation. 119–127.
Zhigang Deng, Pei-Ying Chiang, Pamela Fox, and Ulrich Neumann. 2006. Animating blendshape faces by cross-mapping motion capture data. In Proceedings of the 2006 Symposium on Interactive 3D Graphics and Games. ACM, 43–48.
Lin Gao, Jie Yang, Yi-Ling Qiao, Yu-Kun Lai, Paul L Rosin, Weiwei Xu, and Shihong Xia. 2018. Automatic unpaired shape deformation transfer. In SIGGRAPH Asia 2018 Technical Papers. ACM, 237.
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1125–1134.
Alec Jacobson, Ilya Baran, Jovan Popović, and Olga Sorkine. 2011. Bounded biharmonic weights for real-time deformation. ACM Trans. Graph. 30, 4 (2011), 78.
Pushkar Joshi, Mark Meyer, Tony DeRose, Brian Green, and Tom Sanocki. 2007. Harmonic coordinates for character articulation. ACM Transactions on Graphics (TOG) 26, 3 (2007), 71.
Pushkar Joshi, Wen C Tien, Mathieu Desbrun, and Frédéric Pighin. 2006. Learning controls for blend shape based realistic facial animation. In ACM SIGGRAPH 2006 Courses. ACM, 17.
Tao Ju, Scott Schaefer, and Joe Warren. 2005. Mean value coordinates for closed triangular meshes. ACM Transactions on Graphics (TOG) 24, 3 (2005), 561–566.
Ladislav Kavan, Steven Collins, and Carol O'Sullivan. 2009. Automatic linearization of nonlinear skinning. In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games. ACM, 49–56.
Ladislav Kavan, Steven Collins, Jiří Žára, and Carol O'Sullivan. 2008. Geometric skinning with approximate dual quaternion blending. ACM Transactions on Graphics (TOG) 27, 4 (2008), 105.
Ladislav Kavan and Olga Sorkine. 2012. Elasticity-inspired deformers for character articulation. ACM Transactions on Graphics (TOG) 31, 6 (2012), 196.
Ladislav Kavan and Jiří Žára. 2005. Spherical blend skinning: a real-time deformation of articulated models. In Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games. ACM, 9–16.
Meekyoung Kim, Gerard Pons-Moll, Sergi Pujades, Seungbae Bang, Jinwook Kim, Michael J Black, and Sung-Hee Lee. 2017. Data-driven physics for human soft tissue animation. ACM Transactions on Graphics (TOG) 36, 4 (2017), 54.
Petr Krysl, Sanjay Lall, and Jerrold E Marsden. 2001. Dimensional model reduction in non-linear finite element dynamics of solids and structures. International Journal for Numerical Methods in Engineering 51, 4 (2001), 479–504.
Samuli Laine, Tero Karras, Timo Aila, Antti Herva, Shunsuke Saito, Ronald Yu, Hao Li, and Jaakko Lehtinen. 2017. Production-level facial performance capture using deep convolutional neural networks. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation. ACM, 10.
Manfred Lau, Jinxiang Chai, Ying-Qing Xu, and Heung-Yeung Shum. 2009. Face poser: Interactive modeling of 3D facial expressions using facial priors. ACM Transactions on Graphics (TOG) 29, 1 (2009), 3.
Binh Huy Le and Jessica K Hodgins. 2016. Real-time skeletal skinning with optimized centers of rotation. ACM Transactions on Graphics (TOG) 35, 4 (2016), 37.
Binh Huy Le and JP Lewis. 2019. Direct delta mush skinning and variants. ACM Transactions on Graphics (TOG) 38, 4 (2019), 113.
John P Lewis, Ken Anjyo, Taehyun Rhee, Mengjie Zhang, Frederic H Pighin, and Zhigang Deng. 2014. Practice and theory of blendshape facial models. Eurographics (State of the Art Reports) 1, 8 (2014), 2.
John P Lewis and Ken-ichi Anjyo. 2010. Direct manipulation blendshapes. IEEE Computer Graphics and Applications 30, 4 (2010), 42–50.
John P Lewis, Matt Cordner, and Nickson Fong. 2000. Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 165–172.
Hao Li, Thibaut Weise, and Mark Pauly. 2010. Example-based facial rigging. In ACM Transactions on Graphics (TOG), Vol. 29. ACM, 32.
Yaron Lipman, David Levin, and Daniel Cohen-Or. 2008. Green coordinates. ACM Transactions on Graphics (TOG) 27, 3 (2008), 78.
Lijuan Liu, Youyi Zheng, Di Tang, Yi Yuan, Changjie Fan, and Kun Zhou. 2019. NeuroSkinning: automatic skin binding for production characters with deep graph networks. ACM Transactions on Graphics (TOG) 38, 4 (2019), 114.
Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. 2015. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG) 34, 6 (2015), 248.
Ran Luo, Tianjia Shao, Huamin Wang, Weiwei Xu, Xiang Chen, Kun Zhou, and Yin Yang. 2018. NNWarp: Neural network-based nonlinear deformation. IEEE Transactions on Visualization and Computer Graphics (2018).
Nadia Magnenat-Thalmann, Richard Laperrière, and Daniel Thalmann. 1988. Joint-dependent local deformations for hand animation and object grasping. In Proceedings on Graphics Interface '88. Citeseer.
Joe Mancewicz, Matt L Derksen, Hans Rijpkema, and Cyrus A Wilson. 2014. Delta Mush: smoothing deformations while preserving detail. In Proceedings of the Fourth Symposium on Digital Production. ACM, 7–11.
Bruce Merry, Patrick Marais, and James Gain. 2006. Animation space: A truly linear framework for character animation. ACM Transactions on Graphics (TOG) 25, 4 (2006), 1400–1423.
Mark Meyer and John Anderson. 2007. Key point subspace acceleration and soft caching. ACM Transactions on Graphics (TOG) 26, 3 (2007), 74.
Tomohiko Mukai. 2015. Building helper bone rigs from examples. In Proceedings of the 19th Symposium on Interactive 3D Graphics and Games. ACM, 77–84.
Tomohiko Mukai and Shigeru Kuriyama. 2016. Efficient dynamic skinning with low-rank helper bone controllers. ACM Transactions on Graphics (TOG) 35, 4 (2016), 36.
Alexander Pentland and John Williams. 1989. Good vibrations: Modal dynamics for graphics and animation. (1989).
Weiguang Si, Sung-Hee Lee, Eftychios Sifakis, and Demetri Terzopoulos. 2014. Realistic biomechanical simulation and control of human swimming. ACM Transactions on Graphics (TOG) 34, 1 (2014), 10.
Peter-Pike J Sloan, Charles F Rose III, and Michael F Cohen. 2001. Shape by example. In Proceedings of the 2001 Symposium on Interactive 3D Graphics. ACM, 135–143.
Olga Sorkine. 2005. Laplacian mesh processing. In Eurographics (STARs). 53–70.
Olga Sorkine and Marc Alexa. 2007. As-rigid-as-possible surface modeling. In Symposium on Geometry Processing, Vol. 4. 109–116.
Olga Sorkine, Daniel Cohen-Or, Dror Irony, and Sivan Toledo. 2005. Geometry-aware bases for shape approximation. IEEE Transactions on Visualization and Computer Graphics 11, 2 (2005), 171–180.
Robert W Sumner, Johannes Schmid, and Mark Pauly. 2007. Embedded deformation for shape manipulation. In ACM SIGGRAPH 2007 Papers. 80–es.
Qingyang Tan, Lin Gao, Yu-Kun Lai, and Shihong Xia. 2018a. Variational autoencoders for deforming 3D mesh models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5841–5850.
Qingyang Tan, Lin Gao, Yu-Kun Lai, Jie Yang, and Shihong Xia. 2018b. Mesh-based autoencoders for localized deformation component analysis. In Thirty-Second AAAI Conference on Artificial Intelligence.
Robert Y Wang, Kari Pulli, and Jovan Popović. 2007. Real-time enveloping with rotational regression. In ACM Transactions on Graphics (TOG), Vol. 26. ACM, 73.
Xiaohuan Corina Wang and Cary Phillips. 2002. Multi-weight enveloping: least-squares approximation techniques for skin animation. In Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. ACM, 129–138.
Yu Wang, Alec Jacobson, Jernej Barbič, and Ladislav Kavan. 2015. Linear subspace design for real-time shape deformation. ACM Transactions on Graphics (TOG) 34, 4 (2015), 1–11.
Thibaut Weise, Sofien Bouaziz, Hao Li, and Mark Pauly. 2011. Realtime performance-based facial animation. In ACM Transactions on Graphics (TOG), Vol. 30. ACM, 77.
Hao Zhang, Oliver Van Kaick, and Ramsay Dyer. 2010. Spectral mesh processing. In Computer Graphics Forum, Vol. 29. 1865–1894.