Multi-feature super-resolution network for cloth wrinkle synthesis
Computers & Graphics (2020)
Lan Chen a,b, Xiaopeng Zhang a, Juntao Ye a,∗
a Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
b School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
ARTICLE INFO
Article history: Received April 10, 2020
Keywords:
Cloth animation, Deep learning, Wrinkle synthesis, Multi-feature, Super-resolution
ABSTRACT

Existing physical cloth simulators suffer from expensive computation and from the difficulty of tuning mechanical parameters to obtain desired wrinkling behaviors. Data-driven methods provide an alternative solution: they typically synthesize cloth animation at a much lower computational cost and create wrinkling effects that closely resemble the more controllable training data. In this paper we propose a deep learning based method for synthesizing cloth animation with high-resolution meshes. To do this we first create a dataset for training: a pair of low- and high-resolution meshes is simulated with synchronized motion, so that the two meshes exhibit similar large-scale deformation but different small wrinkles. Each simulated mesh pair is then converted into a pair of low- and high-resolution "images" (2D arrays of samples), where each sample carries three features: the displacement, the normal and the velocity. With these image pairs, we design a multi-feature super-resolution (MFSR) network that jointly trains an upsampling synthesizer for the three features. The MFSR architecture consists of two key components: a sharing module that takes the multiple features as input to learn low-level representations from the corresponding super-resolution tasks simultaneously, and task-specific modules focusing on various high-level semantics. Frame-to-frame consistency is well maintained thanks to the proposed kinematics-based loss function. Our method achieves realistic results at high frame rates, 12-14 times faster than traditional physical simulation. We demonstrate the performance of our method on various experimental scenes, including a dressed character with sophisticated collisions.
1. Introduction
Cloth animation plays an important role in many applications such as movies, video games and virtual try-on. With the rapid development of physics-based simulation techniques [1, 2, 3, 4], garment animations with remarkably realistic and detailed folding patterns can be achieved. However, these techniques require high-resolution meshes to represent fine details, and therefore need considerable computational time to solve the velocity-update equations and resolve collisions. Moreover, it is labor-intensive to tune simulation parameters for a desired wrinkling behavior.

∗ Corresponding author. E-mail: [email protected] (Juntao Ye)
Recently, data-driven methods [5, 6, 7] have provided alternative solutions to these problems, as they offer fast production and create wrinkling effects that closely resemble the training data. Relying on precomputed data and data-driven techniques, a high-resolution (HR) mesh is either synthesized directly or super-resolved from a physically simulated low-resolution (LR) mesh. Nevertheless, existing data-driven methods either depend on human body poses [5, 7, 8, 9] and are thus unsuitable for loose garments, or lack dynamic modeling of wrinkle behavior [6, 10, 11, 12, 13] for the general case of free-flowing cloth. When used for wrinkle synthesis, data-driven methods may also suffer from cloth penetrations [6, 7, 10], even though the coarse meshes are guaranteed to be collision-free in the simulation. The penetrations in synthesized meshes are pre-existing collisions and should be resolved by an untangling scheme; however, even the state-of-the-art untangling scheme is notoriously fragile [14].

To tackle these challenges, we propose a framework that synthesizes cloth wrinkles with a deep learning based method. We create three datasets from physics-based simulation as training data. The simulation is assumed to be independent of human bodies and is not limited to tight garments. Each dataset is generated by a pair of LR and HR meshes with synchronized simulations. Given the simulated mesh pairs, we aim to map the LR meshes to the HR domain by a detail-enhancement method, which is essentially a super-resolution (SR) operation. Deep SR networks have proven to be powerful and fast machine learning tools for image detail enhancement [15, 16, 17]. Yet for surface meshes, which usually have irregular structure, traditional convolutional operations cannot be applied as directly as for images. Chen et al. [12] address this issue by converting manifold meshes into geometry images [18]. Inspired by their work, we design a multi-feature super-resolution network (MFSR) to improve the synthesized results and model dynamic wrinkle behaviors. The LR and HR image pairs, encoding three features (the displacement, the normal and the velocity), are fed into the network for training. Our MFSR jointly learns upsampling synthesizers with a multi-task architecture, consisting of a shared network and three task-specific networks, instead of combining all features in a single SR network. The proposed spatial and temporal losses also contribute to the generation of dynamic wrinkles and further maintain frame-to-frame consistency. At runtime, we convert the super-resolved geometry images generated by MFSR back into HR meshes. A refinement step is proposed to obtain realistic-looking and collision-free results. As our approach is based on deep neural networks, it reduces the computational cost significantly, even with the refinement step. In summary, the main contributions of our work are as follows:

• We propose a novel framework for cloth wrinkle synthesis, including a multi-feature super-resolution network (MFSR), a synchronized simulation scheme, and a collision handling method.

• We learn both shared and task-specific representations of garment shapes with multiple features.
• We generate dynamic wrinkles and consistent mesh sequences thanks to the spatial and temporal loss functions.

We qualitatively and quantitatively evaluate our method on various cloth types (tablecloths and long skirts) and motion sequences. Experimental results show that the quality of the synthesized garments is comparable with that of a physics-based simulation, while significantly reducing the computational cost. To the best of our knowledge, this is the first approach to employ a multi-feature learning model for 3D dynamic wrinkle synthesis.
2. Related work
A physics-based simulation for realistic fabrics includes velocity updating by physical energies [1, 2, 19], time integration [3, 20], collision detection and collision response [4, 21], etc. These modules are solved separately and are time consuming. To improve the efficiency of such systems, researchers have explored many algorithms, such as implicit or semi-implicit time integration [3, 22, 23], adaptive remeshing [24, 25] and iterative optimization [26, 27]. Nevertheless, these algorithms still require expensive computation to produce rich wrinkles, and tuning mechanical parameters for desired wrinkling behaviors remains labor-intensive. Recently, data-driven methods have drawn much attention as they offer faster cloth animation than physics-based methods. Based on precomputed data and data-driven techniques, an HR mesh is either synthesized directly or super-resolved from a physically simulated LR mesh. In the first stream of work, with precomputed data, researchers have investigated many techniques to accelerate the production of new animations, such as linear conditional models [9, 28] and secondary motion graphs [29, 30]. Additionally, deep learning-based methods [31, 32] have been used to generate static garments on human bodies. In another line of work, researchers have proposed to combine coarse mesh simulations with geometric details learned from paired mesh databases, in order to generalize to complicated testing scenes. This stream of methods includes wrinkle synthesis depending on bone clusters [8] or human poses [5] for fitted clothes, and linear upsampling operators [10] or low-dimensional subspaces with bases [6, 33] for the general case of free-flowing cloth. Inspired by these data-driven methods, we propose a deep learning based approach to synthesize wrinkles on coarsely simulated meshes, while our approach is independent of poses or skeletons and is not limited to tight garments. Due to the expense and low quality of data captured from the real world, most data-driven methods use training data generated by physics-based simulation. In our experiments, we use the ARCSim system [24] for its speed and stability.

Due to their irregular topology and high dimensionality, 3D meshes are more difficult for neural networks to process than images. Nevertheless, several approaches have been proposed in recent years. Wang et al. [34] represent 3D shapes as voxel grids for octree-based convolutional neural networks (CNNs) used in 3D shape analysis tasks such as object classification, shape retrieval and part segmentation. Su et al. [35] learn to recognize 3D shapes from a collection of rendered 2D views with standard CNNs. Li et al. [36] use a learning-based SR framework to retrieve HR texture maps from multiple viewpoints, with geometric information supplied via normal maps. Representations based on voxel grids or multi-view images are extrinsic to the shapes, which means they may fail to recognize a shape under isometric deformations. To encode intrinsic or extrinsic descriptors for CNN-based learning, the geometry image technique [18] is used in [12, 37, 38, 39] for mesh classification or generation. We adopt this representation to embed multiple features of meshes.

Feature-based methods aim for proper descriptions of irregular 3D meshes for synthesizing detailed and realistic objects. Conventional data-driven methods [6, 40, 41] simplify the calculation of wrinkle features by formulating the strain or stress in an LR mesh.
As for deep learning, several algorithms have investigated robust shape descriptors for wrinkle deformation. Sinha et al. [37] use geometry images encoding principal curvatures and heat kernel signatures as rigid and non-rigid shape descriptors, respectively.
Fig. 1: Pipeline of our MFSR for cloth wrinkle synthesis. We generate three datasets of LR and HR mesh sequences by synchronized simulation. In the training stage, meshes are converted into geometry images (GI) encoding the displacement, the normal and the velocity of the sampled points, and then fed into our MFSR for training. For the three datasets, we train three models separately. At runtime, LR geometry images (converted from the input LR mesh) are upsampled into HR geometry images, which are converted to a detailed mesh and refined.

Geometry images embedding the position feature are proposed by Chen et al. [12] for cloth wrinkle synthesis. Santesteban et al. [7] use two displacements as descriptors: one for the overall deformation in the form of stretch or relaxation, and the other for additional global deformation and small-scale wrinkles. Wang et al. [42] learn a shape feature descriptor from vertex positions using multilayer perceptrons. Beyond positions or displacements, Lähner et al. [11] propose a wrinkle generation method that learns high-frequency details from normal maps. In our approach, we cascade multiple geometric features as shape descriptors embedded in geometry images, including the spatial information of the displacement and the normal, and the temporal information of the velocity.
Image SR is a challenging, ill-posed problem because the missing high-frequency content is not uniquely determined. In recent years, learning-based methods have made great progress given large training datasets. Dong et al. [43] first used a simple 3-layer CNN to map bicubically interpolated LR images to HR ones. Deeper networks [44, 45, 46] were then proposed to improve SR quality. These methods must first interpolate the LR images up to the target resolution, because the image size cannot be changed inside the network; such upsampled input increases computational complexity and blurs details. To allow acceleration with more layers, Dong et al. [47] propose a transposed convolutional layer that takes LR images as input and upscales them to the HR space. A common layer in recent SR approaches [15, 16] is the efficient sub-pixel layer [48], which directly transforms LR features into HR images. However, all of these methods build their basic blocks in a chained fashion, ignoring information from early convolutional layers. Zhang et al. [17] propose residual dense networks (RDN) to extract and adaptively fuse features from all hierarchical layers efficiently. In our MFSR, RDN is used as the basic network.

In video SR [49, 50, 51], consecutive frames are concatenated as input to a CNN that generates HR images as output, so temporal coherence is addressed during training. Yet it is difficult to decide the optimal sequence length for every motion in a video. Our data consists of animated mesh sequences, so frame-to-frame consistency must be considered in the synthesis. Previous work on 3D cloth synthesis paid little attention to consistency between frames; our work addresses this issue by introducing a kinematics-based loss function.
3. Overview
In this paper, we propose a deep learning based method for synthesizing realistic and consistent HR cloth animations, taking physically simulated LR meshes as input. We construct three datasets, DRAPING, HITTING and SKIRT, and train a synthesizing model for each separately. The pipeline of our approach is illustrated in Figure 1. To generate training data, a pair of LR and HR meshes is simulated synchronously using virtual spring constraints and multi-resolution dynamic models (§4), so that the two meshes exhibit the same large-scale motion but differ in the wrinkles. Given the simulated mesh pairs, we convert them into dual-resolution geometry images (§4). Our MFSR (§5) consists of a sharing module and task-specific modules; the sharing module learns low-level representations from the corresponding SR tasks simultaneously.
Fig. 2: Our dual-resolution cloth model. The LR meshes on the left are initially defined by triangulated polygon meshes; the HR meshes on the right are obtained by subdividing the edges of the LR meshes several times.
The task-specific modules focus on the high-level semantics of each specific task. Based on these features, we design spatial and temporal loss functions (§5). At runtime, the super-resolved geometry images are converted back into detailed meshes and refined (§6).
4. Data preparation
Dual-resolution meshes.
Before executing the cloth simulation for training-data generation, we need to set the initial rest state of the LR and HR meshes. Examples are shown in Figure 2: a piece of cloth is initially defined by a polygon and then triangulated into an initial LR mesh. To obtain the corresponding HR mesh, we subdivide the LR one by progressively splitting edges until it reaches the desired resolution. In this work, the number of faces in the HR mesh is 16 times that of the LR mesh (two 4-to-1 subdivision passes). With the rest-state LR/HR meshes, we create two sets of dual-resolution frame data by physics-based simulation. The correspondence between them is maintained during the simulation, so that they exhibit identical or similar large-scale folding behavior and differ only in the fine-level wrinkles. More details about the synchronized simulation are given later in this section.

Dual-resolution geometry images.
We convert the paired meshes into dual-resolution geometry images embedding spatial and temporal features. The feature descriptors include the displacement d, the normal n and the velocity v of the sampled points. For any sample point inside a triangle face, these features are interpolated from the three triangle vertices using barycentric coordinates.
Fig. 3: The original long skirt model (left) is converted to geometry images encoding three descriptors: the displacement d, the normal n and the velocity v. For irregular garments, the feature values of sample points outside the mesh but inside the bounding box are first set to zero (the black pixels in the 2nd column); those zero vectors are then replaced by the nearest non-zero pixel values, similar to replicate padding, to form a padded geometry image (3rd column). The rightmost model is reconstructed from the displacement geometry image.

Different from the original geometry image paper [18], we encode displacements instead of positions, since we are interested only in the intrinsic shape of the mesh, not its absolute spatial location. The displacement is defined as the difference between a vertex's position in the current frame and its starting position. The vertex normal is computed as the area-weighted average of the normals of the faces adjacent to the vertex. Because the physics-based simulation uses a fixed time step, the velocity is naturally computed from the positions in two consecutive frames (the complete calculation of our feature descriptors is provided in the supplemental material). Since these features are not rotation invariant, we compute a rigid motion with rotation R and translation t, apply (R, t) to the displacement, and apply R alone to the normal and the velocity. This transformation is computed by the Kabsch algorithm [52], which finds an alignment between a newly deformed mesh and a unique reference mesh in the rest state (please refer to the supplementary material for details on the rigid-motion-invariant features). To reduce the computational cost, we compute the rigid motion of the LR meshes only and apply the same (R, t) to the corresponding features of the HR meshes. To reduce internal covariate shift [53], these features are normalized to the range [0, 1] for our network; a minimal sketch of the feature computation follows.
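The three descriptors can be assembled as in the NumPy sketch below, which follows the definitions above (rest-state offset, area-weighted normals, fixed-step finite difference for velocity). The function name and signature are illustrative, and the rigid alignment by (R, t) is assumed to be applied afterwards.

```python
import numpy as np

def vertex_features(verts, verts_rest, verts_prev, faces, dt):
    """Per-vertex descriptors: displacement d, normal n, velocity v."""
    # Displacement: offset from the rest-state position, so only the
    # intrinsic shape (not the absolute location) is encoded.
    d = verts - verts_rest

    # Area-weighted vertex normals: each face cross product has a
    # magnitude of twice the face area, so accumulating the raw cross
    # products onto the three corners weights each face by its area.
    fn = np.cross(verts[faces[:, 1]] - verts[faces[:, 0]],
                  verts[faces[:, 2]] - verts[faces[:, 0]])
    n = np.zeros_like(verts)
    for c in range(3):
        np.add.at(n, faces[:, c], fn)
    n /= np.linalg.norm(n, axis=1, keepdims=True) + 1e-12

    # Velocity: finite difference over the fixed simulation time step.
    v = (verts - verts_prev) / dt
    return d, n, v
```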
Mesh-to-image conversion.

We convert the deformed meshes into geometry images of 9 channels. For a mesh in its rest state, we find the smallest bounding box in the 2D material space. Inside the bounding box, we then sample an array of m × n points uniformly. For each sample point inside the mesh, we find the triangle containing it and compute its barycentric coordinates (BC) w.r.t. the three triangle vertices. The BCs remain unchanged even when the triangle deforms during simulation. When computing features for the sample points, the BCs are used as weights for interpolating the feature values (d, n, v) from the triangle vertices. For a mesh whose boundary coincides with the bounding box edge, we apply a padding operation along the boundaries. For sample points outside the mesh but inside the bounding box, the feature values are first set to zero; similar to replicate padding, our operation then replaces those zero vectors with the nearest non-zero pixel values. A long skirt example is given in Figure 3, and a sketch of this sampling step follows.
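Assuming the containing triangle and barycentric coordinates of every sample have been precomputed in material space, the rasterization amounts to a weighted gather plus nearest-pixel padding. The sketch below uses illustrative names and SciPy's Euclidean distance transform for the padding; it is one possible implementation, not the paper's exact code.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def mesh_to_image(tri_idx, bary, tri_verts, feats, mask):
    """tri_idx: (m, n) containing-triangle id per sample; bary: (m, n, 3)
    barycentric weights; tri_verts: (F, 3) vertex ids per triangle;
    feats: (V, 9) per-vertex features (d, n, v); mask: (m, n) inside flags."""
    img = np.zeros(tri_idx.shape + (feats.shape[1],))
    inside = np.where(mask)
    corners = tri_verts[tri_idx[inside]]            # (k, 3) vertex ids
    w = bary[inside][..., None]                     # (k, 3, 1) weights
    img[inside] = (w * feats[corners]).sum(axis=1)  # barycentric interpolation

    # Replicate-style padding: every outside pixel copies its nearest
    # inside pixel, so boundary vertices later see four valid neighbours.
    _, (ii, jj) = distance_transform_edt(~mask, return_indices=True)
    img[~mask] = img[ii[~mask], jj[~mask]]
    return img
```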
Image-to-mesh conversion.

After an HR image is synthesized, the values in the displacement channels are used to restore the vertex positions of the detailed mesh, while the original topology of that mesh is retained. Thanks to the padding operation, every vertex in 2D material space, whether on a boundary, on a seam line, or interior to the mesh, has four nearest surrounding non-zero sample points. We restore the displacements of the detailed mesh vertices by bilinear interpolation. For vertices on seam lines in particular, each 3D vertex has two or more corresponding vertices in 2D material space; we restore its displacement as a weighted average of the corresponding values, where the weight of a 2D vertex is the ratio of the number of faces incident to it to the number of faces incident to the 3D vertex. The computed displacements are added to the rest-state positions of the subdivided mesh vertices to obtain wrinkle-enhanced positions. Finally, we apply the inverse of the rigid transformation computed in the mesh-to-image phase to the new positions. As shown on the right of Figure 3, almost no visual difference can be seen; in our quantitative experiments, the geometric reconstruction error, measured by the vertex mean square error (VMSE), is smaller than 1e-4.

A high-quality training dataset is equally important for data-driven approaches. In our case, we need to generate corresponding LR/HR mesh pairs of animation sequences by physics-based simulation. In image super-resolution tasks [16, 43, 45], one way to generate a training dataset is to down-sample HR images into their LR counterparts. However, down-sampling an HR cloth mesh can cause collisions, even when the HR mesh itself is collision-free. It is therefore preferable to simulate the two meshes individually, with all collisions properly resolved. However, as mentioned in previous work [10, 6], if there are no constraints between the two simulations, they bifurcate into different behaviors because of numerical errors and the higher frequencies accumulated by the finer mesh. To solve this problem, we enforce virtual spring constraints and use multi-resolution dynamic models to construct a synchronized simulation for the HR meshes.

As shown in Figure 2, our dual-resolution meshes are well aligned in the initial state, because we only add vertices on the edges without changing the mesh shape. The vertices of the LR mesh, called feature vertices, also appear in the HR mesh and are used as constraints for the synchronized simulation. We first run the coarse cloth simulation and record the positions of all feature vertices over the total of N frames as $\mathbf{p}^l_k$, $k = 1, \cdots, N$, where the superscript $l$ stands for LR. While simulating the HR mesh at frame $k$, virtual springs are added to connect the pairs $(\mathbf{p}^l_k, \mathbf{p}^h_{k-1})$ of feature vertices between the LR mesh at frame $k$ and the HR mesh at frame $k-1$.
To pull $\mathbf{p}^h_{k-1}$ towards $\mathbf{p}^l_k$, we define an internal force following Hooke's law [54] as

$$\mathbf{f}_{\mathrm{spring}} = -c\,(\mathbf{p}^l_k - \mathbf{p}^h_{k-1}), \qquad (1)$$

where $c$ is a spring stiffness constant that can be adjusted depending on how tight a tracking the user desires. A large $c$ results in tight tracking of the feature vertices, but not of the other vertices; as a side effect, the simulated HR mesh exhibits many annoying "spikes".

To tackle this issue, we propose another tracking mechanism based on multi-resolution dynamic models. Given an HR mesh (the solid lines in Figure 4), we construct a coarse tracking mesh that connects the feature vertices by retaining the topology of the LR mesh (the dashed triangle in Figure 4).
Fig. 4: The multi-resolution dynamic model for tracking: forces of stretching (left) and bending (right).
In finite-element simulations, the constitutive model includes internal cloth forces supporting behaviors such as anisotropic stretch or compression [55] and surface bending [2, 19] with damping [24]. For a triangle of the coarse tracking mesh, the in-plane stretching forces $\mathbf{f}_s = (\mathbf{f}_{s_1}, \mathbf{f}_{s_2}, \mathbf{f}_{s_3})$ at its three vertices are measured by a corotational finite-element approach [55], while the bending forces for two adjacent triangles are added using a discrete hinge model based on dihedral angles [2, 19], denoted $\mathbf{f}_b = (\mathbf{f}_{b_1}, \mathbf{f}_{b_2}, \mathbf{f}_{b_3}, \mathbf{f}_{b_4})$. The triangles of the fine mesh have the same force patterns $\mathbf{f}_s$ and $\mathbf{f}_b$ imposed on all of their particles (including the feature vertices). All stretching and bending forces are accompanied by damping forces. Moreover, our two-level dynamic model is independent of the force implementation and also works with other triangular finite-element methods. As a result, the feature vertices in the multi-resolution dynamic model receive stretching and bending forces from both the coarse and the fine level, while the remaining vertices receive only the fine-level forces. With this two-hierarchy dynamics model, modest virtual spring coefficients are enough to keep the HR mesh in pace with the LR mesh during simulation; a sketch of the spring term follows.
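For concreteness, this is the zero-rest-length spring of Eq. (1) as it could be evaluated inside the HR time step. The function name is illustrative, and how the force is distributed, damped and integrated is left to the host simulator.

```python
import numpy as np

def spring_tracking_force(p_l_k, p_h_prev, c=10.0):
    """Virtual spring term of Eq. (1) between the recorded LR feature
    vertices at frame k and the HR feature vertices at frame k-1.
    c = 10 is the stiffness used in the paper's experiments (Sec. 7)."""
    # Per-vertex force, shape (K, 3); the host integrator applies it so
    # that the HR feature vertices track their LR counterparts.
    return -c * (p_l_k - p_h_prev)
```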
5. Multi-feature super-resolution network
In this section, we introduce our MFSR architecture based onthe RDN, as well as the loss functions taking spatial and temporalfeatures into account to improve wrinkle synthesis capability.
We now introduce our MFSR architecture for the image SR tasks of multiple features. Given LR/HR images of the form $(d, n, v)^l$ and $(d, n, v)^h$, our MFSR learns the mappings of the different features with image SR networks. One standard methodology is single-task learning, i.e., learning one task at a time; however, it ignores a potentially rich source of information available in the other tasks. Another option is multi-task learning, which achieves inductive transfer between tasks with the goal of leveraging additional sources to improve the performance of the target task [56]. Our MFSR is a multi-task architecture consisting of two components: a single shared network and three task-specific networks. The shared network is designed for the SR task, whilst each task-specific network consists of a set of convolutional modules that link into the shared network; a minimal sketch of this layout follows.
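The division of labor can be summarized in a few lines of PyTorch. The sketch below assumes the shared trunk (the shallow convolutions, RDBs, dense feature fusion and 4x bilinear upsampler described later in this section) is provided as a module; the class and parameter names are illustrative.

```python
import torch.nn as nn

class MFSR(nn.Module):
    """Shared trunk plus three task-specific heads (a sketch)."""

    def __init__(self, shared_trunk, feat_ch=64):
        super().__init__()
        self.shared = shared_trunk   # RDN-style: shallow conv, RDBs,
                                     # dense fusion, x4 bilinear upsampling
        self.head_d = nn.Conv2d(feat_ch, 3, 3, padding=1)  # displacement
        self.head_n = nn.Conv2d(feat_ch, 3, 3, padding=1)  # normal
        self.head_v = nn.Conv2d(feat_ch, 3, 3, padding=1)  # velocity

    def forward(self, x):            # x: (B, 9, H, W) LR geometry image
        h = self.shared(x)           # (B, feat_ch, 4H, 4W) shared features
        return self.head_d(h), self.head_n(h), self.head_v(h)
```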
Fig. 5: The architecture of MFSR. The input and the output are LR/HR images in which each pixel is a 9-dimensional feature vector embedding the displacement, the normal and the velocity, in that order. Conv and Concat refer to convolutional and concatenation layers, respectively. The MFSR upscales the LR features with shared and unshared layers to recover HR features with detailed information.
Fig. 6: Two network structures used in image super-resolution: the residual block of [16] (left) and the residual dense block of [17] used in our MFSR (right).

In this way, the features in the shared network and in the task-specific networks are learned jointly to maximise the generalisation of the shared representation across the multiple SR tasks, while simultaneously maximising task-specific performance. Figure 5 shows a detailed visualisation of our MFSR based on residual dense blocks (RDB) [17]. In the shared network, the image SR model consists of four parts: shallow feature extraction, basic blocks, dense feature fusion and, finally, upsampling. We use two convolutional layers to extract shallow features, followed by RDBs [17] as the basic blocks, then dense feature fusion to extract hierarchical features, and lastly one bilinear upsampling layer that upscales the height and width of the LR feature maps by a factor of 4. Unlike in general SR tasks, we find that pixel-shuffle and deconvolution methods cause apparent checkerboard artifacts, so we use the bilinear method. For the basic blocks of our SR network, we employ RDBs instead of the residual blocks used in [12]. As shown on the left of Figure 6, a residual block learns a mapping function with reference to its input and can therefore be used to build deep networks without vanishing gradients. However, in a residual block each convolutional layer is directly connected only to its predecessor, failing to make full use of all preceding layers. To exploit all hierarchical features, we instead choose RDBs (right of Figure 6), which consist of densely connected layers for global feature combination and local feature fusion with local residual learning; more details about RDBs are given in [17]. In each task-specific network, we use one convolutional layer to map the extracted local and global features to the corresponding upsampled descriptor $d^s$, $n^s$ or $v^s$.

In order to learn the spatial details and temporal consistency of the underlying HR meshes, our MFSR is trained by minimizing the following loss functions on the mesh features. A baseline mean square error (MSE) reconstruction loss is defined as

$$\mathcal{L}_d = \lVert \mathbf{d}^h - \mathbf{d}^s \rVert_2^2, \qquad (2)$$

where the superscripts $h$ and $s$ stand for the ground-truth HR and the synthesized SR, respectively. This displacement loss term yields a smooth HR result from the given low-frequency information.

To extend the loss into the wrinkle feature space, a novel $L_1$ loss on the normals is introduced:

$$\mathcal{L}_n = \lVert \mathbf{n}^h - \mathbf{n}^s \rVert_1. \qquad (3)$$

The normal feature is directly related to the bending behavior of cloth meshes. This loss term encourages our model to learn the fine-level wrinkle features so that the outputs stay as close to the ground truth as possible; in our experiments it helps the network create realistic details.

The above two loss terms reconstruct high-frequency details exclusively from spatial statistics. To improve consistency across animation sequences, we also take temporal coherence into account. The vertex velocities of every animation frame contribute a velocity loss of the form

$$\mathcal{L}_v = \lVert \mathbf{v}^h - \mathbf{v}^s \rVert. \qquad (4)$$
In addition, we minimize a kinematics-based loss in the training stage to constrain the relationship between the synthesized velocities and displacements (please refer to the supplementary material for the detailed derivation):

$$\mathcal{L}_{kine} = \sum_{k=1}^{n} \Big\lVert\, R^{-1}\Big( \mathbf{d}^s_k - \big( \mathbf{d}^s_0 + \big(\textstyle\sum_{j=1}^{k} \mathbf{v}^s_{k-j}\big)\, \Delta t \big) \Big) \Big\rVert, \qquad (5)$$

where $n$ is the number of frames associated with the input frame and $\Delta t$ is the time step between consecutive frames. This kinematics-inspired loss term improves the consistency of the generated cloth animations.

The overall loss of our MFSR is defined as

$$\mathcal{L}_{all} = w_d \mathcal{L}_d + w_n \mathcal{L}_n + w_v \mathcal{L}_v + w_{kine} \mathcal{L}_{kine}, \qquad (6)$$

a linear combination of the spatial smoothness, detail similarity, temporal consistency and kinematic loss terms with weight factors $w_d$, $w_n$, $w_v$ and $w_{kine}$. During back-propagation, each loss term propagates backwards through its task-specific layers independently; in the shared layers, parameters are updated according to the total loss $\mathcal{L}_{all}$. As a result, the gradients of the loss functions of the multiple SR tasks pass through the shared layers directly and learn a common representation for all related tasks. A compact sketch of the combined objective follows.
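The following PyTorch sketch assembles Eq. (6) under a few stated assumptions: tensors hold n consecutive frames with the 3-vector in the last dimension, e.g. (B, n, H, W, 3); the norms of Eqs. (3)-(5) are taken as L1/vector norms; and w_d and w_n are illustrative placeholders, while w_v = w_kine = 0.03 follow Sec. 7.

```python
import torch

def mfsr_loss(d_s, n_s, v_s, d_h, n_h, v_h, R, dt,
              w_d=1.0, w_n=1.0, w_v=0.03, w_kine=0.03):
    """Combined MFSR objective of Eq. (6) (a sketch)."""
    L_d = ((d_h - d_s) ** 2).mean()        # Eq. (2): MSE on displacements
    L_n = (n_h - n_s).abs().mean()         # Eq. (3): L1 on normals
    L_v = (v_h - v_s).abs().mean()         # Eq. (4): velocity term

    # Eq. (5): every displacement must equal the first frame's displacement
    # plus the time integral of the synthesized velocities, after undoing
    # the rigid alignment (R^{-1} = R^T for a rotation matrix).
    L_k = d_s.new_zeros(())
    for k in range(1, d_s.shape[1]):
        pred = d_s[:, 0] + v_s[:, :k].sum(dim=1) * dt
        err = torch.einsum('ji,...i->...j', R, d_s[:, k] - pred)
        L_k = L_k + err.norm(dim=-1).mean()

    return w_d * L_d + w_n * L_n + w_v * L_v + w_kine * L_k
```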
6. Refinement
Given a new LR mesh, we convert it to geometry images, and our MFSR predicts the corresponding upsampled images. After converting the HR images to a detailed cloth mesh, we find that the results may suffer from unexpected roughness and penetrations. To solve this, we design a refinement step that includes a fast feature-preserving filtering for global smoothness and a collision-solving step to untangle cloth penetrations.

Feature-preserved filtering.
We observe that our MFSR facilitates the generation of abundant wrinkles. However, these wrinkles are accompanied by some unexpected roughness, because the simulated HR meshes contain abundant wrinkles that can introduce "noise" into the network, as well as some numerical instabilities from the physics-based simulation. We tried to follow prior work on image super-resolution [57, 58] that uses a total variation loss as a regularizer to encourage spatial smoothness, but it did not work in our setting. Instead, we adapt a feature-preserving mesh denoising method [59], which removes noise effectively without smoothing away wrinkle features. The super-resolved normals are used as guidance to update the vertex positions of the corresponding mesh. For each mesh, updating the vertex positions for only one step is already enough to obtain a visually much smoother result (see Figure 7). In our experiments, we set the number of iterations to 5, which takes about 0.01 s per frame and is efficient for our application; one vertex-update step is sketched after the caption below.

Fig. 7: Comparison of results with/without refinement. (a) The super-resolved cloth mesh suffering from unexpected roughness and collisions; (b)(c) results of updating vertex positions for one step and 5 steps, respectively, with the feature-preserved filtering; (d) our result with refinement, i.e., after smoothing the noise and solving the penetrations.
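One vertex-update iteration in the spirit of [59] can be written as follows, assuming per-face guidance normals resampled from the super-resolved normal image. The exact update rule used in the paper may differ, so this is only an illustrative sketch.

```python
import numpy as np

def vertex_update_step(verts, faces, guide_normals):
    """Move each vertex along the guidance normals of its incident faces
    so the surface better matches the (super-resolved) target normals."""
    centroids = verts[faces].mean(axis=1)      # (F, 3) face centroids
    delta = np.zeros_like(verts)
    counts = np.zeros(len(verts))
    for c in range(3):
        vi = faces[:, c]
        # signed offset of the centroid from the vertex along the normal
        d = (guide_normals * (centroids - verts[vi])).sum(axis=1,
                                                          keepdims=True)
        np.add.at(delta, vi, d * guide_normals)
        np.add.at(counts, vi, 1)
    return verts + delta / np.maximum(counts, 1)[:, None]
```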
Collision handling.

For many data-driven methods, penetrations cannot be completely avoided in the synthesized mesh. In the training stage, penetrations are avoided for the simulations of both the LR and the HR mesh by the enforced continuous collision detection and response [60]; this scheme also guarantees that the runtime simulation of the LR mesh is collision-free. The synthesized mesh, however, may still penetrate itself or other obstacles. Considering each detailed mesh alone, these penetrations are pre-existing collisions and should be resolved by an untangling scheme, but even the state-of-the-art untangling scheme is notoriously fragile [14]. We therefore propose a collision response method that is guaranteed to resolve all collisions in our cloth synthesis.

The state of a cloth mesh embedded in 3D space can be denoted by its vertex positions $\mathbf{x} \in \mathbb{R}^{m \times 3}$. Given a simulated LR mesh in a collision-free state, we can split its edges twice to obtain a subdivided version $\mathbf{x}^{sub}$; the subdivided mesh has the same shape as the LR mesh and no penetrations. The synthesized mesh $\mathbf{x}^{syn}$, in contrast, carries plausible wrinkles but may collide. Our collision handling problem then turns into properly interpolating between the subdivided mesh $\mathbf{x}^{sub}$ and the synthesized mesh $\mathbf{x}^{syn}$, so that the new mesh $\mathbf{x}^s$ is collision-free and keeps as many wrinkles as possible. The interpolation can be expressed as

$$\mathbf{x}^s = (\mathbf{I} - \mathbf{W})\,\mathbf{x}^{sub} + \mathbf{W}\,\mathbf{x}^{syn}, \qquad (7)$$

where $\mathbf{I} \in \mathbb{R}^{m \times m}$ is the identity matrix and $\mathbf{W} \in \mathbb{R}^{m \times m}$ is the diagonal weight matrix to be solved. In our implementation, we use the bisection method [61] to search for a close-to-optimal interpolation weight. We iteratively bisect the range [0, 1]
for the elements of $\mathbf{W}$, where the start of the range is collision-free and the end collides. For each bisection, if the midpoint is collision-free it becomes the start of the new range; otherwise it becomes the end. We run the bisection three or four times and take the last collision-free state as the interpolation result (see Figure 7(d)).

In addition, the collision-resolving process can be further optimized. It is not necessary for every vertex of the whole mesh to take part in the position interpolation; only the vertices involved in intersections are of interest. These vertices can be identified by a discrete collision detection pass and grouped into impact zones as in [14]. Position interpolations are then performed per zone, with a different interpolation weight for each zone. In this way, the synthesized meshes are least affected by the collision handling; a sketch of the bisection search follows.
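With a collision predicate supplied by the host collision-detection module, the bisection reduces to a few lines. The sketch below uses a single scalar weight for clarity, whereas the paper solves a diagonal W per impact zone; names are illustrative.

```python
def resolve_by_bisection(x_sub, x_syn, collides, iters=4):
    """Search Eq. (7) for a close-to-optimal collision-free blend."""
    lo, hi = 0.0, 1.0          # lo: collision-free end, hi: colliding end
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x_mid = (1.0 - mid) * x_sub + mid * x_syn
        if collides(x_mid):
            hi = mid           # midpoint collides: retreat toward x_sub
        else:
            lo = mid           # collision-free: keep more wrinkles
    # lo is the largest weight observed to be collision-free
    return (1.0 - lo) * x_sub + lo * x_syn
```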
7. Implementation
We describe the details of the data generation and the networkarchitecture in this section.
Data generation.
To generate data for our MFSR training, we construct three datasets using a tablecloth model and a skirt model with character motions; the two models have regular and irregular garment shapes, respectively. The meshes in each dataset are simulated from a template model to ensure a fixed size.

For the tablecloths, the LR and HR meshes have 749 and 11,393 vertices, respectively. Using the tablecloth model we generate two datasets, called DRAPING and HITTING (see Figure 8). The DRAPING dataset is created by randomly holding one of the topmost vertices of the tablecloth and letting the fabric fall freely. It contains 13 simulation sequences of 400 frames each; ten sequences are randomly selected for training and the remaining three are used for testing. In addition to simulating a piece of tablecloth in a free environment, we construct a HITTING dataset in which a sphere interacts with the tablecloth. Specifically, we select spheres of different radii that hit the tablecloth back and forth at different locations, obtaining a total of 35 simulation sequences with 1,000 frames each.

We also generate a dataset called SKIRT for long skirt garments worn by animated characters (shown in Figure 8). The LR and HR skirt meshes have 1,303 and 19,798 vertices. A mannequin with rigid parts, as in [24], is driven by publicly available motion capture data from CMU [62]. Specifically, we select dancing motions comprising 7 sequences (30,000 frames in total), of which 5 sequences are randomly selected for training and the remaining 2 for testing. Since some dancing motions are too fast for physics-based simulation, we slow them down by interpolating 8 frames between every two adjacent poses of the original CMU data.

We use the ARCSim cloth simulation engine [24] to produce all simulations, but without its remeshing operation. ARCSim requires material parameters for the simulation; in our experiments, we choose the Gray Interlock material, for its anisotropic behavior, from a library of measured cloth materials [63]. Another requirement of ARCSim is a collision-free initial state between the garment and the obstacles. For the tablecloth simulation, we can easily lay out a rectangular sheet and place the obstacles appropriately without collision. For the long skirts, we first manually put the skirt on a template mannequin (T pose) to ensure a collision-free state. Then, we interpolate 80 motion frames between the T pose and the initial poses of all motion sequences. With these interpolated motions, we run the simulations of the long skirts worn by the mannequins, from the collision-free initial state to the various poses. Finally, for the synchronized simulation, we set the spring stiffness constant c = 10 in Equation (1).
Network architecture.
For the three simulated datasets, we train three MFSR models, respectively. Our proposed MFSR consists of shared and task-specific layers. The shared network has 16 identical RDBs [17], each containing six densely connected layers, and the growth rate is set to 32. The basic network settings, such as the convolutional kernels and activation functions, follow [17]. For the upscaling operation, i.e., from coarse-resolution features to fine ones, we considered several mechanisms, e.g., the pixel shuffle module [48], deconvolution, nearest-neighbor and bilinear interpolation, and finally chose the bilinear upscaling layer because it prevents checkerboard artifacts in the generated meshes. In our upsampling network, the upscale factor is set to 4, and the upscale factor (in one dimension) of the corresponding meshes is chosen to be as close to 4 as possible. For example, the LR and HR tablecloth meshes have 749 and 11,393 vertices, respectively, the latter being roughly 16 times as many as the former. When converting meshes to images, we set the size of the LR images for the tablecloth to 192 × 192, and during training we use LR/HR pairs with sizes of 72 × 72 and 288 × 288 as input. The Adam optimizer [64] is used to train our network, with β1 and β2 both set to 0.9. The base learning rate is initialized to 1e-4 and is divided by 10 every 20 epochs; to prevent the learning rate from becoming too small, we fix it after 60 epochs. The training procedure stops after 120 epochs (this schedule is sketched below) and takes about a day and a half per model on an NVIDIA GeForce GTX 1080 Ti. In all our experiments, we fix the length n of the input frames and the loss weights of Equation (6), with w_v = 0.03 and w_kine = 0.03.
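Under the stated settings, the optimizer and learning-rate schedule can be written as the following sketch (PyTorch; the helper name is illustrative).

```python
import torch

def make_optimizer(model):
    """Adam with beta1 = beta2 = 0.9, base lr 1e-4, divided by 10
    every 20 epochs and frozen from epoch 60 on (training stops at 120)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.9))
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lr_lambda=lambda epoch: 0.1 ** (min(epoch, 60) // 20))
    return opt, sched
```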
8. Results
In this section, we evaluate the results obtained with our method both quantitatively and qualitatively. The runtime performance and visual fidelity are demonstrated in various scenes: draping and hitting tablecloths, and long skirts worn by an animated character. We compare our results against simulation methods and demonstrate the benefits of our method for cloth wrinkle synthesis. The effectiveness of our network components is also analyzed, for various loss functions and network architectures.

We implement our method on a 2.50 GHz 4-core Intel CPU for coarse simulation and mesh-image conversion, and an NVIDIA GeForce GTX 1080 Ti GPU for image synthesis. Table 1 shows the average per-frame execution time of our method for the different garment resolutions. The execution time comprises four parts: coarse simulation, mesh/image conversion, image synthesis and refinement. For reference, we also report the simulation timings of a CPU-based implementation of the tracked high-resolution simulation using ARCSim [24]. Our algorithm is on average 13 times faster than the tracked simulation; this low computational cost makes it suitable for interactive applications.

Fig. 8: Visualization of the three datasets: DRAPING (left), HITTING (middle) and SKIRT (right). We generate the DRAPING dataset by randomly holding one of the topmost vertices of the tablecloth and letting the fabric fall freely. For the HITTING dataset, we use spheres of various sizes to hit the hanging tablecloth at different locations, generating a total of 35 simulation sequences. The SKIRT dataset contains simulation sequences of long skirts worn by animated characters with different motions. The top row shows the low-resolution cloth simulation and the bottom row the high-resolution one.

Table 1: Statistics and timing (sec/frm) of the tablecloth and skirt testing examples. The per-frame time of our method comprises coarse simulation, mesh/image conversion, image synthesizing (GPU) and refinement; the tracked HR simulation is listed for reference.

Benchmark | vertices (LR / HR) | tracked sim. | ours
DRAPING | 749 / 11,393 | 4.27 | 0.345
HITTING | … | … | …
SKIRT | … | … | …

Generalization to new hanging.

We use the training data of the DRAPING dataset to learn a synthesizer, and then evaluate its generalization to new hanging vertices.
We use the training data inthe DRAPING dataset to learn a synthesizer, then evaluate the reprint Submitted for review / Computers & Graphics (2020) 9Fig. 8:
Visualization of three datasets including DRAPING (left), HITTING (middle) and SKIRT (right). We generate the DRAPING dataset byrandomly handling one of the topmost vertices of the tablecloth and letting the fabric fall freely. For the HITTING dataset, we use spheres ofvarious sizes to hit the hanging tablecloth, at di ff erent locations, generating a total of 35 simulation sequences. The SKIRT dataset contains thesimulation sequences of the long skirt garments worn by animated characters with di ff erent motions (right). The top row shows the low-resolutioncloth simulation and the bottom row shows high-resolution ones. Table 1: Statistics and timing (sec / frm) of the tablecloth and skirt testing examples. Benchmark / image synthesizing refinementsim. conversion (GPU)DRAPING 749 11,393 4.27 0.345 GTOurs
Fig. 9:
Comparison between the ground-truth HR tracked simulation(top) and the super-resolved results of our method (bottom), on a testsequence of the DRAPING dataset. generalization to new hanging vertices . Figure 9 shows the de-formations of tablecloths of three test sequences in the DRAP-ING dataset. The row of “GT” in Figure 9 illustrates the HRmeshes of tracked physics-based simulation for reference, whilethe row of “Ours” is the result of our data-driven method. Wefind that our approach successfully produces the realistic andabundant wrinkles in di ff erent deformation sequences, in details,tablecloths appear many middle and small wrinkles when fallingfrom di ff erent directions. Generalization to new balls.
Shown in Figure 10, we vi-sually evaluate the quality of our algorithm in the HITTINGdataset, which illustrates the performance when generalizing tonew crashing balls of various sizes and initial positions. We showfour test examples comparing the ground-truth HR of the trackedsimulation with our method. For testing, the initial positions of
GTOurs
Fig. 10:
Comparison between ground-truth tracked simulation (top) andour super-resolved meshes (bottom), on testing animation sequences inHITTING dataset. Our method succeeds to predict the small and mid-scale wrinkles of the garment with 12 times faster running speed thanphysic-based ones. balls are set to four di ff erent places which are unseen in trainingdata. Additionally, in the third and fourth columns of Figure 10,the diameter of the ball is set to 0.5 which is also a new sizenot used for training. When various sizes of balls crash into thecloth in di ff erent positions, our method can successfully predictthe plausible wrinkles, with 12 times faster running speed thanphysics-based simulation. Generalization to new motions.
In Figure 11, we showthe deformed long skirt produced by our approach on the man-nequins while changing various poses over time. The humanposes are from two testing motion sequences 05 04 in the sub-ject of modern dance and 55 02 in the subject of lambada dance[62]. We visually compare the results of our algorithm with theground-truth simulation. The mid-scale wrinkles are successfullypredicted by our approach when generalizing to various dancingmotions not in the training set. For instance, in the first column ofFigure 11, the skirt slides forward and forms plausible wrinklesdue to an extended and straight leg caused by the character poseof sideways arabesque. As for dancing sequences, please see theaccompanying video for more animated results and further com- / Computers & Graphics (2020)
GTOurs
Fig. 11:
Comparison between ground-truth tracked simulation (top) andour super-resolved meshes (bottom), on testing animation sequences inSKIRT dataset. parisons.
Comparison with other methods.
Compared to conventional physics-based simulation, [12] and [13] use deep learning based methods to generate cloth animations with acceleration. Oh et al. [13] introduce a fast and reliable hierarchical cloth animation algorithm that simulates the coarsest level with a physics-based method and generates the more detailed levels by inference with deep neural networks. However, it relies on a fully connected network that models the detailed levels for each triangle separately, and is therefore not well suited to learning wrinkling behaviors. We compare our method with mesh SR [12], a CNN-based method for synthesizing cloth wrinkles. The performance is evaluated on the tablecloth data with both DRAPING and HITTING. The training settings of our network are described in §7, and mesh SR is trained with the same settings reported in their paper [12].
The peak signal-to-noise ratio (PSNR) is a widely used metric for quantitatively evaluating image restoration quality, while the vertex-wise mean square error (VMSE), computed as the per-vertex squared error averaged over all vertices and frames, is commonly used for evaluating the quality of reconstructed meshes (a minimal sketch of both metrics is given after Table 2). In this work, we choose these two metrics to quantitatively compare our method with [12]; the corresponding results are reported in Table 2. Compared with mesh SR, our MFSR improves the performance significantly and obtains a higher PSNR. This is favored by the RDBs, which remedy the drawback of residual blocks that neglect preceding features. With better super-resolved images and the refinement step, our MFSR further reaches a lower VMSE, indicating better performance and generalization on these datasets.

In Figure 12, we show visual results of our MFSR and of mesh SR. Given the same LR meshes in the testing stage, our MFSR successfully produces rich and consistent wrinkles thanks to the multiple features, while mesh SR, relying on the position alone, approximates inaccurate wrinkles. The velocity and kinematics-based loss functions also yield more stable results than mesh SR (please refer to the accompanying video). It has been reported that mesh SR can generate large-scale folds when the resolution of the training pairs is low (i.e., when the LR meshes are decreased to 200 vertices). However, in our experiments we find that mesh SR is unable to converge close to the ground truth for complicated wrinkle styles; its results suffer from plausible but unrealistic noise-like small wrinkles. In contrast, our method synthesizes the HR meshes in a physically realistic manner. The differences between the results and the ground truth are highlighted in Figure 12 using color coding: for mesh SR, the difference maps clearly highlight the bottom-left and bottom-right corners and the wrinkle lines, whereas our results stay closer to the ground truth.

Table 2: Comparison of pixel-wise and vertex-wise error values (PSNR / VMSE) of our method and [12].
Benchmark | Chen et al. [12] (PSNR / VMSE) | Ours (PSNR / VMSE)
DRAPING | 59.07 / … | … / …
HITTING | 59.15 / … | … / …

In addition, we note here some improvements of our method over state-of-the-art data-driven methods that do not use deep networks. As mentioned in [6], their method handles only quasistatic wrinkle formation, without dynamics information, so it cannot capture the richness of the wrinkles in a flag. They use only the edge ratio between the current and the rest state as the mesh descriptor; in contrast, our algorithm enhances the LR deformation using descriptors of displacement, normal and velocity, covering both spatial and dynamic information. As shown in Figure 9, our technique can realize dynamic effects such as travelling waves. Another limitation mentioned in their work is the possibility of incurring cloth interpenetration; although the penetrations in the LR cloth are solved, the generated HR meshes may suffer from collision problems. At a controllable cost (see Table 1), we solve this problem using discrete collision detection and an interpolation-based collision response algorithm.

Next, we study the effect of the different components of our proposed network, including the loss function and the network architecture.
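For reference, the two evaluation metrics can be computed as in the sketch below (NumPy; a peak value of 1.0 assumes the [0, 1]-normalized geometry images, and the function names are illustrative).

```python
import numpy as np

def psnr(img_pred, img_true, peak=1.0):
    """Peak signal-to-noise ratio over normalized geometry images."""
    mse = ((img_pred - img_true) ** 2).mean()
    return 10.0 * np.log10(peak ** 2 / mse)

def vmse(verts_pred, verts_true):
    """Vertex-wise mean square error: squared per-vertex error averaged
    over all vertices (and over all frames of a sequence)."""
    return ((verts_pred - verts_true) ** 2).sum(axis=-1).mean()
```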
Loss function.

To demonstrate the effectiveness of our proposed loss functions, we conduct experiments with different loss combinations on the three datasets, i.e., DRAPING, HITTING and SKIRT. The training and testing splits are selected as described in §7.
We use the displacement loss as the baseline and progressively add the remaining loss terms of our MFSR, to obtain comparative results for the different loss terms. Table 3 reports the quantitative evaluation of PSNR between the generated displacement images and the ground truth under the various loss settings; red indicates the best performance and blue the second best. The results show that our algorithm achieves either the best or the second-best performance when combining all proposed loss terms in a multi-task learning framework. Notice that without the constraints of the velocity and kinematics-based loss functions, the normal loss may decrease the final testing PSNR, even though it encourages wrinkle generation in the SR results.

Network architecture.
To further investigate the performance of different SR networks (SRResNet and RDN), we conduct an experiment on the DRAPING dataset. In particular, we validate our cloth animation results on 800 randomly selected pairs of LR/HR meshes from the DRAPING dataset, which are excluded from the training set and cover different complex hanging motions in pendulum movement. In Figure 13, we depict the convergence curves of the three different features on this validation dataset.
Fig. 12: Qualitative reconstruction results for unseen data in the DRAPING and HITTING datasets, compared with mesh SR [12]. The first column is the physics-based HR simulation, the second column the results of the method in [12], and the third column our results. The reconstruction accuracy is shown as a difference map rendered side by side; reconstruction errors are color-coded, with warmer colors indicating larger errors. Our method leads to significantly lower reconstruction errors.
Fig. 13: Convergence analysis of (a) the displacement, (b) the normal and (c) the velocity, for the RDN and SRResNet super-resolution networks.

Table 3: Comparison of PSNR of displacement images using different training loss terms.

Benchmark | $L_d$ | $L_{d+n}$ | $L_{d+v}$ | $L_{d+n+v}$ | $L_{all}$
DRAPING | 67.90 | 62.83 | … | … | …
HITTING | 67.11 | 68.23 | … | … | …
SKIRT | … | … | … | … | …
The convergence curves show that RDN achieves better performance than SRResNet and further stabilizes the training process for all three features. The improved performance and stability are benefits of the contiguous memory, local residual learning and global feature fusion in RDN. In SRResNet, local convolutional layers have no direct access to subsequent layers, so the information of each convolutional layer is not fully used. As a result, RDN achieves better performance than SRResNet.
9. Conclusions and future work
In this paper, we have presented a novel deep learning based framework to synthesize cloth animations with abundant wrinkles. Our evaluations show that the spatial and temporal features can be augmented with high-frequency details using a multi-feature super-resolution network. The proposed network consists of a sharing module that jointly learns low-level representations and task-specific modules that focus on high-level semantics. We add a kinematics-based loss to the network objective that maintains frame-to-frame consistency across time. Quantitative and qualitative results show that our method can synthesize realistic-looking wrinkles in various scenes, such as draping cloth and garments interacting with moving balls and human bodies. We also give details on how to create paired meshes using a synchronized simulation, as it helps in constructing large 3D datasets; these aligned coarse and fine meshes can also be used in other applications, such as 3D shape matching of incompatible shape structures. In addition, our collision handling algorithm is independent of the wrinkle synthesis implementation and can therefore be combined with other data-driven methods. To the best of our knowledge, our approach is the first to consider multiple features, including geometry and temporal consistency, for 3D dynamic wrinkle synthesis, and it can conveniently be generalized to cascade more tasks together.

Nevertheless, several limitations remain open for future work. Since our data are simulated sequences, we plan to investigate recurrent SR networks to capture the dynamics and potentially improve consistency. In our work, the training data are the paired LR/HR meshes generated by a synchronized simulation; while tracking the LR cloth, the HR cloth cannot show the full dynamic properties of an unconstrained simulation. We would like to address this limitation by applying unsupervised learning to unpaired data. In addition, the dataset could be further expanded to include more scenes, motion sequences and garment shapes.
References

[1] Terzopoulos, D, Platt, J, Barr, A, Fleischer, K. Elastically deformable models. In: SIGGRAPH. 1987, p. 205–214.
[2] Bridson, R, Marino, S, Fedkiw, R. Simulation of clothing with folds and wrinkles. In: Proc. Symp. Computer Animation. 2003, p. 28–36.
[3] Baraff, D, Witkin, A. Large steps in cloth simulation. In: SIGGRAPH. 1998, p. 43–54.
[4] Provot, X. Collision and self-collision handling in cloth model dedicated to design garments. In: EG Workshop on Computer Animation and Simulation. 1997, p. 177–189.
[5] Wang, H, Hecht, F, Ramamoorthi, R, O'Brien, J. Example-based wrinkle synthesis for clothing animation. ACM Trans Graph 2010;29(4):107:1–107:8.
[6] Zurdo, JS, Brito, JP, Otaduy, MA. Animating wrinkles by example on non-skinned cloth. IEEE Trans Visual Comput Graph 2013;19(1):149–158.
[7] Santesteban, I, Otaduy, MA, Casas, D. Learning-based animation of clothing for virtual try-on. In: Computer Graphics Forum; vol. 38. Wiley Online Library; 2019, p. 355–366.
[8] Feng, WW, Yu, Y, Kim, BU. A deformation transformer for real-time cloth animation. ACM Trans Graph 2010;29(4):108:1–108:10.
[9] de Aguiar, E, Sigal, L, Treuille, A, Hodgins, JK. Stable spaces for real-time clothing. ACM Trans Graph 2010;29(3):106:1–106:9.
[10] Kavan, L, Gerszewski, D, Bargteil, AW, Sloan, PP. Physics-inspired upsampling for cloth simulation in games. ACM Trans Graph 2011;30(4):93:1–93:10.
[11] Lähner, Z, Cremers, D, Tung, T. DeepWrinkles: Accurate and realistic clothing modeling. In: European Conference on Computer Vision. Springer; 2018, p. 698–715.
[12] Chen, L, Ye, J, Jiang, L, Ma, C, Cheng, Z, Zhang, X. Synthesizing cloth wrinkles by CNN-based geometry image superresolution. Computer Animation and Virtual Worlds 2018;29(3-4):e1810.
[13] Oh, YJ, Lee, TM, Lee, IK. Hierarchical cloth simulation using deep neural networks. In: Proceedings of Computer Graphics International 2018. 2018, p. 139–146.
[14] Ye, J, Ma, G, Jiang, L, Chen, L, Li, J, Xiong, G, et al. A unified cloth untangling framework through discrete collision detection. Computer Graphics Forum 2017;36(7):217–228.
[15] Lim, B, Son, S, Kim, H, Nah, S, Lee, KM. Enhanced deep residual networks for single image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2017, p. 1–4.
[16] Ledig, C, Theis, L, Huszár, F, Caballero, J, Cunningham, A, Acosta, A, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802; 2016.
[17] Zhang, Y, Tian, Y, Kong, Y, Zhong, B, Fu, Y. Residual dense network for image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018, p. 1–10.
[18] Gu, X, Gortler, SJ, Hoppe, H. Geometry images. ACM Trans on Graph 2002;21(3):355–361.
[19] Grinspun, E, Hirani, AN, Desbrun, M, Schröder, P. Discrete shells. In: Proc. Symp. Computer Animation. 2003, p. 62–67.
[20] Harmon, D, Vouga, E, Smith, B, Tamstorf, R, Grinspun, E. Asynchronous contact mechanics. ACM Trans Graph 2009;28(3).
[21] Volino, P, Magnenat-Thalmann, N. Collision and self-collision detection: efficient and robust solutions for highly deformable surfaces. In: Computer Animation and Simulation. 1995, p. 55–65.
[22] Choi, KJ, Ko, HS. Stable but responsive cloth. ACM Trans Graph 2002;21(3):604–611.
[23] Hauth, M, Etzmuß, O, Straßer, W. Analysis of numerical methods for the simulation of deformable models. The Visual Computer 2003;19(7–8):581–600.
[24] Narain, R, Samii, A, O'Brien, JF. Adaptive anisotropic remeshing for cloth simulation. ACM Trans on Graph 2012;31(6):147:1–10.
[25] Weidner, NJ, Piddington, K, Levin, DI, Sueda, S. Eulerian-on-Lagrangian cloth simulation. ACM Transactions on Graphics (TOG) 2018;37(4):50.
[26] Liu, T, Bargteil, AW, O'Brien, JF, Kavan, L. Fast simulation of mass-spring systems. ACM Trans Graph 2013;32(6):1–7.
[27] Wang, H, Yang, Y. Descent methods for elastic body simulation on the GPU. ACM Trans Graph 2016;35(6):212:1–212:10.
[28] Guan, P, Reiss, L, Hirshberg, DA, Weiss, A, Black, MJ. DRAPE: Dressing any person. ACM Trans Graph 2012;31(4):35:1–35:9.
[29] Kim, D, Koh, W, Narain, R, Fatahalian, K, Treuille, A, O'Brien, JF. Near-exhaustive precomputation of secondary cloth effects. ACM Trans Graph 2013;32(4):87:1–87:8.
[30] Kim, TY, Vendrovsky, E. DrivenShape: A data-driven approach for shape deformation. In: ACM SIGGRAPH/Eurographics Symposium on Computer Animation. 2008, p. 49–55.
[31] Gundogdu, E, Constantin, V, Seifoddini, A, Dang, M, Salzmann, M, Fua, P. GarNet: A two-stream network for fast and accurate 3D cloth draping. arXiv preprint arXiv:1811.10983; 2018.
[32] Wang, TY, Ceylan, D, Popovic, J, Mitra, NJ. Learning a shared shape space for multimodal garment design. ACM Trans Graph 2018;37(6):1:1–1:14.
[33] Hahn, F, Thomaszewski, B, Coros, S, Sumner, RW, Cole, F, Meyer, M, et al. Subspace clothing simulation using adaptive bases. ACM Trans Graph 2014;33(4):105:1–105:9.
[34] Wang, PS, Liu, Y, Guo, YX, Sun, CY, Tong, X. O-CNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics (TOG) 2017;36(4):1–11.
[35] Su, H, Maji, S, Kalogerakis, E, Learned-Miller, EG. Multi-view convolutional neural networks for 3D shape recognition. In: IEEE ICCV. 2015.
[36] Li, Y, Tsiminaki, V, Timofte, R, Pollefeys, M, Gool, LV. 3D appearance super-resolution with deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, p. 9671–9680.
[37] Sinha, A, Bai, J, Ramani, K. Deep learning 3D shape surfaces using geometry images. In: European Conference on Computer Vision (ECCV). 2016, p. 223–240.
[38] Sinha, A, Unmesh, A, Huang, Q, Ramani, K. SurfNet: Generating 3D shape surfaces using deep residual networks. In: CVPR. 2017, p. 6040–6049.
[39] Maron, H, Galun, M, Aigerman, N, Trope, M, Dym, N, Yumer, E, et al. Convolutional neural networks on surfaces via seamless toric covers. ACM Trans Graph 2017;36(4):71:1–71:10.
[40] Wu, Y, Kalra, P, Thalmann, NM. Simulation of static and dynamic wrinkles of skin. In: Computer Animation '96. Proceedings. 1996, p. 90–97.
[41] Rohmer, D, Popa, T, Cani, MP, Hahmann, S, Sheffer, A. Animation wrinkling: Augmenting coarse cloth simulations with realistic-looking wrinkles. ACM Trans Graph 2010;29(6):157:1–157:8.
[42] Wang, TY, Shao, T, Fu, K, Mitra, NJ. Learning an intrinsic garment space for interactive authoring of garment animation. ACM Transactions on Graphics (TOG) 2019;38(6):220.
[43] Dong, C, Loy, CC, He, K, Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans Pattern Analysis and Machine Intelligence 2016;38(2):295–307.
[44] Kim, J, Kwon Lee, J, Mu Lee, K. Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, p. 1646–1654.
[45] Kim, J, Kwon Lee, J, Mu Lee, K. Deeply-recursive convolutional network for image super-resolution. In: Computer Vision and Pattern Recognition (CVPR). 2016, p. 1637–1645.
[46] Zhang, K, Zuo, W, Gu, S, Zhang, L. Learning deep CNN denoiser prior for image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017, p. 0–10.
[47] Dong, C, Loy, CC, Tang, X. Accelerating the super-resolution convolutional neural network. In: European Conference on Computer Vision. 2016, p. 391–407.
[48] Shi, W, Caballero, J, Huszár, F, Totz, J, Aitken, AP, Bishop, R, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: IEEE CVPR. 2016, p. 1874–1883.
[49] Kappeler, A, Yoo, S, Dai, Q, Katsaggelos, AK. Video super-resolution with convolutional neural networks. IEEE Transactions on Computational Imaging 2016;2(2):109–122.
[50] Caballero, J, Ledig, C, Aitken, AP, Acosta, A, Totz, J, Wang, Z, et al. Real-time video super-resolution with spatio-temporal networks and motion compensation. In: IEEE Conference on Computer Vision and Pattern Recognition. 2017, p. 1–7.
[51] Liu, D, Wang, Z, Fan, Y, Liu, X, Wang, Z, Chang, S, et al. Robust video super-resolution with learned temporal dynamics. In: Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE; 2017, p. 2526–2534.
[52] Kabsch, W. A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography 1978;34(5):827–828.
[53] Glorot, X, Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 2010, p. 249–256.
[54] Halliday, D, Resnick, R, Walker, J. Fundamentals of Physics. John Wiley & Sons; 2013.
[55] Müller, M, Gross, M. Interactive virtual materials. In: Proceedings of Graphics Interface 2004. Canadian Human-Computer Communications Society; 2004, p. 239–246.
[56] Caruana, R. Multitask learning. Machine Learning 1997;28(1):41–75.
[57] Aly, HA, Dubois, E. Image up-sampling using total-variation regularization with a new observation model. IEEE Transactions on Image Processing 2005;14(10):1647–1659.
[58] Zhang, H, Yang, J, Zhang, Y, Huang, TS. Non-local kernel regression for image and video restoration. In: European Conference on Computer Vision. Springer; 2010, p. 566–579.
[59] Sun, X, Rosin, PL, Martin, R, Langbein, F. Fast and effective feature-preserving mesh denoising. IEEE Transactions on Visualization and Computer Graphics 2007;13(5):925–938.