Assembling a Pipeline for 3D Face Interpolation
Yusuke Niiro <[email protected]>, Marcelo Kallmann <[email protected]>, University of California, Merced
Abstract
This paper describes a pipeline built with open source tools for interpolating 3D facial expressions taken from images. The presented approach allows anyone to create 3D face animations from 2 input photos: one of the start face expression, and the other of the final face expression. Given the input photos, corresponding 3D face models are constructed and texture-mapped using the photos as textures aligned with facial features. Animations are then generated by morphing the models by interpolation of the geometries and textures of the models. This work was performed as an MS project at the University of California, Merced.
I. INTRODUCTION
It is common to observe many realistic 3D animated characters being used in video games and movies. Behind the scenes, large data sets of images are often used in order to achieve realistic appearances. Designers are also often responsible for editing the characters in order to improve the final results. While significant advances have been achieved in facial animation, it is still difficult to make human-like 3D faces that always look realistic. People are very good at recognizing the differences between real humans and digital humans.

3D character animation technologies are also starting to be used in a variety of new innovative products. For example, the newest iPhone (smartphone by Apple) has an application that can take pictures or videos of people and place 3D models covering their faces and moving in coordination with the real faces. Such an application illustrates the growing need for simple and efficient approaches for achieving realistic face animation. This paper describes a pipeline built with open source tools for achieving animations by interpolation of 3D facial expressions taken from pictures.

II. RELATED WORK
A significant amount of previous work has relied on datasets of human faces in order to build face models. For example, the approach of Blanz and Vetter [1] applied pattern classification on their dataset of human faces in order to reconstruct a 3D face model from a single 2D face image. Booth et al. [2] introduced the Large Scale Facial Model (LSFM), which is able to automatically construct 3D models of a variety of human faces from a data set of 9,663 distinct facial identities. Tran et al. [3] also proposed a framework to construct 3D models from a large set of face images, but without requiring the collection of 3D face scan data. Huber et al. [4] presented the Surrey Facial Model, a multi-resolution model that can build a facial model at different resolution levels.

Facial animation can be accomplished by various approaches. Chuang and Bregler [5] describe an approach for facial animation that is based on motion capture data and interpolation of blend shapes. Noh and Neumann [6] proposed Expression Cloning, an approach that creates facial movement by transferring vertex motion data from a source model to a target model. Lee et al. [7] introduced an approach to generate facial expressions based on muscle information from real face data.

There are also available products that allow the creation and manipulation of 3D facial models. For example, Face Poser is a system presented by Lau et al. [8], and Poser [9] is a software system that facilitates modeling and editing 3D faces with a comprehensive graphical user interface.

In this work a solution based on available open source tools is presented. 3D face models are created from photos based on the Surrey Facial Model [4]. In order to achieve a simple approach for face animation, this work focuses on animating the interpolation between a pair of input facial expressions.
A. Implementation Details
This project was implemented in C++ with Visual Studio 2017 under Windows 10. The following C++ libraries were used:

• dlib library [10]: This library is used to process the input face images and to place feature landmarks on the images. The landmarks are based on the ibug [16] facial points (Fig. 1), which represent 68 feature landmarks for human faces. These features are later used to generate the 3D face models and to morph 2D images (using the eos library [11]). Figures 2 and 3 illustrate example images including their respective facial landmarks.

Fig. 1: Facial landmarks.
Fig. 2: Facial landmarks on Albert Einstein's face.
Fig. 3: Facial landmarks on Michael Jackson's face.

• eos library [11]: This library is used to create 3D face models from input photos of human faces. The obtained models follow the Surrey Facial Model, which is based on principal component analysis applied to a large data set of face models. These models can be created in several resolutions. In this project we experimented with face models built with 3448, 16759, and 29587 vertices. This library depends on the OpenCV, Eigen, and Boost libraries. Figures 4 and 5 illustrate example models which were generated from their corresponding input photos.

Fig. 4: 3D Face Model of Albert Einstein ([1] front view, [2] side view).
Fig. 5: 3D Face Model of Michael Jackson ([1] front view, [2] side view).

• OpenCV library [12]: This library was used to support image operations. In particular, it was used to triangulate the facial landmarks in order to interpolate image attributes based on triangle coordinates, such that the interpolated image information preserves the facial features, as later explained in Section III-B. Figure 6 illustrates the obtained triangulations on example photos and Figure 7 illustrates the obtained interpolation result preserving the facial features.

Fig. 6: Triangulation on example photos ([1] Albert Einstein, [2] Michael Jackson).
Fig. 7: Morphed image of 2 example photos.
• Eigen library [14]: This library provides functions for linear algebra, matrix operations, geometrical transformations, numerical solvers and related algorithms.

• Boost library [13]: This library provides various functions to complement the standard C++ libraries.

• Standalone Interactive Graphics (SIG) library [15]: This library provides a C++ scene graph framework for the development of applications with interactive 3D graphics. The main animation visualizer was built with this library, which handled the graphical user interface and scene graph operations, including computing the performed interpolations of images and 3D objects.

III. OVERALL METHOD
Our overall method consists of three main steps. First, 3D face models are created from pairs of input photos. In this work we have used 4 pairs in order to obtain 4 animations between different facial expressions. These pairs are presented in Figure 8.

Second, for each pair, the corresponding texture images are morphed in order to obtain a texture image for a given interpolation factor t ∈ [0, 1], where t allows results to be obtained from the initial model (t = 0) to the goal model (t = 1).

Finally, the coordinates of the 3D models are interpolated according to parameter t and associated with the interpolated texture image computed in the previous step. These steps are detailed in the next subsections.

Fig. 8: Four pairs of photos used in this project (Pair 1: original photo and eyes closed photo; Pair 2: original photo and mouth opened photo; Pair 3: original photo and smiling (mouth up) photo; Pair 4: original photo and depressed (mouth down) photo).
A. Generation of 3D Face Models
The input 3D face models are generated from 2 photos of the same person but with different facial expressions. One of the photos is used as the initial face and the other photo is used as the goal face. At this point the coordinates of the facial landmarks are computed for each photo by using the dlib library. Figure 9 illustrates one pair of photos with the facial landmarks placed on them.

Fig. 9: Facial landmarks placed on the input photos ([1] original photo, [2] eyes closed photo).

The 3D facial models are then generated by using the original photos together with the facial landmark information configured in the previous step, by using the eos library. Figures 10 and 11 illustrate the 3D face models generated.

Fig. 10: 3D Face Model based on the original photo ([1] front view, [2] side view).
Fig. 11: 3D Face Model based on the eyes closed photo ([1] front view, [2] side view).
B. Morphed Texture Images
To morph the texture images, the coordinates of the facial landmarks are configured for each texture image by using the dlib library. The landmarks are then triangulated using a Delaunay triangulation (Fig. 12), such that the coordinates of the image texture are interpolated using barycentric coordinates according to the position of each image pixel in its containing triangle. The colors of the image texture are also interpolated in the same way. In this way the facial features are preserved during interpolation. Figure 13 illustrates the morphed image of the original and eyes closed texture images of the 3D face models.

Fig. 12: Triangulation on texture images ([1] texture image of the original 3D model, [2] texture image of the eyes closed 3D model).
Fig. 13: Morphed image of the original and eyes closed images.
C. Interpolation of Mesh Coordinates
Given that the meshes respective to the input images have the same connectivity and number of vertices, a target interpolated 3D model can be simply obtained by linear interpolation of the vertices of the two input meshes. Figure 14 illustrates one interpolated 3D face model with its corresponding morphed texture image computed in the previous step.

Fig. 14: Interpolated 3D Face Model with morphed image ([1] front view, [2] side view).

IV. RESULTS
Figures 15 to 18 at the end of this paper illustrate the interpolated 3D face models obtained for all 4 pairs that were considered. The presented in-between models were obtained with interpolation parameter t varying from 0 to 1 in 0.1 increments.

The morphed 3D models for all pairs look natural both with respect to the mesh and texture image deformations. Table I presents performance measurements showing how long it takes to compute one interpolated 3D model frame at different resolutions with a growing number of vertices. A 3D face model with 29587 vertices is still processed fast enough to produce smooth animations.

TABLE I: Time to compute one interpolated mesh (in milliseconds).

V. DISCUSSION
The used Surrey Facial Model [4] allowed us to construct 3D face models that were natural; however, it did not produce good results for all types of faces. Problems were encountered especially for people with rounded faces. The Surrey Facial Model was constructed by analyzing 169 scan data sets, and nearly 60% of the scanned data were from Caucasian people. This might be the reason why it was difficult for the Surrey Facial Model to achieve a precise face scaling for all types of faces.

While the obtained results were smooth and of good quality, in order to achieve a complete result additional facial objects also have to be considered. For example, models for eyeballs, teeth, tongue, eyebrows, eyelashes and hair are important for achieving complete animated faces.

Improved rendering techniques are also important to consider. The presented results were rendered only using a standard Phong illumination model with a single frontal light source. Special skin illumination characteristics are important to be included with dedicated shaders in order to replicate skin properties and achieve improved illumination results.

VI. CONCLUSION
The described approach presents a simple pipeline to achieve fast and natural 3D facial animation between given expressions. The approach can be replicated with the use of open source tools, and animations can be obtained just from input face photos of different expressions.

REFERENCES

[1] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces", Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99), pages 187-194, ACM Press/Addison-Wesley Publishing Co., 1999.
[2] J. Booth, A. Roussos, S. Zafeiriou, A. Ponniah, and D. Dunaway, "A 3D Morphable Model Learnt from 10,000 Faces", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[3] L. Tran and X. Liu, "On Learning 3D Face Morphable Model from In-the-wild Images", 2018.
[4] P. Huber, G. Hu, R. Tena, P. Mortazavian, W. P. Koppen, W. J. Christmas, M. Rätsch, and J. Kittler, "A Multiresolution 3D Morphable Face Model and Fitting Framework", Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2016.
[5] E. Chuang and C. Bregler, "Performance Driven Facial Animation using Blendshape Interpolation", Computer Science Technical Report, Stanford University, 2002.
[6] J. Noh and U. Neumann, "Expression Cloning", Proceedings of SIGGRAPH '01, 2001.
[7] H. Lee, E. Kim, G. Hur, and H. Choi, "Generation of 3D facial expressions using 2D facial image", Fourth Annual ACIS International Conference on Computer and Information Science (ICIS '05), pages 228-232, 2005.
[8] M. Lau, J. Chai, Y. Xu, and H. Shum, "Face Poser: Interactive modeling of 3D facial expressions using model priors", pages 161-170, 2007.
[9] "Poser". [Online]. Available: https://my.smithmicro.com/poser-3d-animation-software.html
[10] "Dlib C++ Library". [Online]. Available: http://dlib.net/
[11] P. Huber, "patrikhuber/eos", GitHub, 14-Dec-2018. [Online]. Available: https://github.com/patrikhuber/eos
[12] "OpenCV library". [Online]. Available: https://opencv.org/
[13] "Boost C++ Libraries". [Online]. Available: https://www.boost.org/
[14] "Eigen". [Online]. Available: http://eigen.tuxfamily.org/
[15] M. Kallmann, "mkallmann/sig", Bitbucket. [Online]. Available: https://bitbucket.org/mkallmann/sig
[16] "ibug facial point annotations". [Online]. Available: https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/