Deep Geometric Texture Synthesis
AMIR HERTZ∗, Tel Aviv University
RANA HANOCKA∗, Tel Aviv University
RAJA GIRYES, Tel Aviv University
DANIEL COHEN-OR, Tel Aviv University
Recently, deep generative adversarial networks for image generation have advanced rapidly; yet, only a small amount of research has focused on generative models for irregular structures, particularly meshes. Nonetheless, mesh generation and synthesis remains a fundamental topic in computer graphics. In this work, we propose a novel framework for synthesizing geometric textures. It learns geometric texture statistics from local neighborhoods (i.e., local triangular patches) of a single reference 3D model. It learns deep features on the faces of the input triangulation, which are used to subdivide and generate offsets across multiple scales, without parameterization of the reference or target mesh. Our network displaces mesh vertices in any direction (i.e., in the normal and tangential directions), enabling synthesis of geometric textures which cannot be expressed by a simple 2D displacement map. Learning and synthesizing on local geometric patches enables a genus-oblivious framework, facilitating texture transfer between shapes of different genus.
CCS Concepts: • Computing methodologies → Neural networks; Shape analysis.

Additional Key Words and Phrases: Geometric Deep Learning, Shape Synthesis
ACM Reference Format:
Amir Hertz, Rana Hanocka, Raja Giryes, and Daniel Cohen-Or. 2020. Deep Geometric Texture Synthesis.
ACM Trans. Graph.
39, 4, Article 108 (July 2020), 11 pages. https://doi.org/10.1145/3386569.3392471
INTRODUCTION
In recent years, neural networks for geometry processing have emerged rapidly and changed the way we approach geometric problems. Yet, common 3D modeling representations are irregular and unordered, which challenges the straightforward adaptation of image-based techniques. Recent advances enable applying convolutional neural networks (CNNs) on irregular structures, like point clouds and meshes [Li et al. 2018a; Hanocka et al. 2019]. So far, these CNN-based methods have demonstrated promising success on discriminative tasks like classification and segmentation. On the other hand, only a small amount of research has focused on generative models for irregular structures, particularly meshes [Gao et al. 2019].

∗ Joint first authors.
Fig. 1. Learning a local geometric texture from a reference mesh (gold) and transferring it to a target mesh (giraffe).
In this work, we take a step forward in developing generative models for meshes. We present a deep neural network that learns the geometric texture of a single 3D reference mesh, and can transfer its texture to any arbitrary target mesh. Our generative framework uses a CNN to learn to model the unknown distribution of geometric textures directly from an input triangular mesh. Our network learns local neighborhoods (i.e., local triangular patches) from a reference model, which are used to subdivide and generate offsets over the target mesh to match the local statistics of the reference model. For example, see Figure 1, where the geometric spikes of the reference
3D model are learned, and then synthesized on the target surface of the giraffe.

In this work, we calculate deep features directly on the mesh triangles and exploit a unique property of triangular meshes: every triangle in a manifold triangular mesh is adjacent to exactly three faces (Figure 3), which defines a fixed-sized convolutional neighborhood, similar in spirit to MeshCNN [Hanocka et al. 2019]. Our network generates mesh vertex displacements to synthesize local geometries which are indistinguishable from the local statistics of the reference texture. To facilitate learning the statistics of geometric textures over multiple scales, we process the mesh using a hierarchy. We start with a low-resolution mesh (e.g., an icosahedron), and iteratively subdivide its faces and refine the geometry at each scale in the hierarchy.

Our method of transferring geometric texture from a reference model to a target model has notable properties: (i) it requires no parameterization of either the reference or target surface; (ii) the target surface can have an arbitrary genus, which is not necessarily compatible with the reference surface; and, last but not least, (iii) it is generative: reference patches are not copied or mapped; instead, they are learned, and then probabilistically synthesized. Note that geometric textures can be rather complex, as shown in Figure 10, and cannot simply be expressed by 2D displacement maps. Our network is given the freedom to displace mesh vertices in any direction, i.e., not only along the normal direction, but also tangentially.

We demonstrate results of transferring geometric textures from single meshes to a variety of target meshes. We show that the reference mesh can have a different genus than the target mesh. Moreover, we show that our generative probabilistic model synthesizes variations of the reference geometric texture based on different latent codes.
Fig. 2. Our method is agnostic to the genus of both the reference and target meshes. Learning the geometric texture on a cat with a genus of one, and transferring it to the fertility statue with a genus of four.
RELATED WORK
Generative models have garnered significant attention since the introduction of the Generative Adversarial Network (GAN) [Goodfellow et al. 2014]. GANs are commonly trained on a large data set (typically images), attempting to generate novel samples that come from the distribution of the training data. Recently, some works presented GANs trained on a single image [Zhou et al. 2018; Shocher et al. 2018; Gandelsman et al. 2019; Shaham et al. 2019; Sun et al. 2019]. The basic idea is to learn the distribution of local patches from the patches of the reference image, and then apply this knowledge in various applications. In the same spirit, in this work, we learn the distribution of local patches, but of 3D triangular meshes, which, unlike images, have an irregular structure.
Deep generative models in 3D.
In recent years, a large body of work has proposed generating or synthesizing 3D shapes using deep neural networks. 3D shapes are commonly represented by irregular structures, which challenge the use of traditional convolutional neural networks. Thus, early approaches proposed using a volumetric representation, which naturally extends 2D image CNN concepts to a 3D voxel grid [Wu et al. 2015, 2016]. However, applying CNNs on 3D voxel grids is highly inefficient, as it necessarily incurs huge amounts of memory, particularly when a high resolution is required.

On the other hand, a sparse and more direct portrayal of shapes uses the point cloud representation, which is simple and native to scanning devices. Achlioptas et al. [2018] pioneered the concept of deep generative models for point clouds, using the operators defined in PointNet [Qi et al. 2017a], which uses 1 × 1 convolutions.
Fig. 3. Method overview. Starting with the training input at the current scale in the hierarchy, we (1) add noise to the vertices and (2) extract local rotation- and translation-invariant input features per triangular face. We (3) learn face-based equivariant convolutional kernels which generate differential displacements per vertex with respect to the input. Subdividing the generated mesh progresses to the next level in the hierarchy.
3D objects from images. SDM-Net [Gao et al. 2019] is a VAE-based network for generating genus-0 mesh parts; yet, the collective sum of the parts can define non-genus-zero shapes.

The most related work is MeshCNN [Hanocka et al. 2019], a neural network with operators that collapse and un-collapse edges of a mesh for discriminative tasks like segmentation. However, unlike Hanocka et al. [2019], in this work, we propose a generative network for synthesizing new mesh geometries. Since we learn from local geometric patches, our framework is oblivious to genus, and can transfer textures between shapes of arbitrary genus (Figures 2 and 8).
Texture transfer on Meshes.
Texturing a target surface has been a fundamental problem in computer graphics. Basically, texture mapping requires parameterizing the target surface to define a low-distortion mapping between the source surface and the target surface [Sorkine et al. 2002; Lévy et al. 2002; Sheffer et al. 2007]. In the most common setting, the source surface is a plane with a trivial parameterization. Naively, mapping a topological disc with boundary onto a manifold without boundaries necessarily yields noticeable seams, where the boundaries are mapped and form discontinuities. Various works dealing with special textures with symmetries have developed continuous seamless mappings between closed surfaces (i.e., no boundaries) which have compatible genus [Aigerman et al. 2015; Aigerman and Lipman 2015; Knöppel et al. 2015; Campen et al. 2018]. Rather than mapping textures between surfaces, the textures can be synthesized over the target surface. Ying et al. [2001] and Turk [2001] presented texture synthesis techniques that synthesize textures from a 2D exemplar directly on the triangles of a target mesh. Their methods extend basic image-space texture synthesis techniques by forming local parameterizations over the mesh. Xu et al. [2009] present a more advanced method applied on meshes which is based on texture optimization [Wexler et al. 2004; Kwatra et al. 2005].

The above texture synthesis techniques are based on the premise that there is a simple local mapping between patches on the target and the source surfaces. Thus, they assume that the source surface is a flat image with a trivial parameterization [Chen et al. 2012]. The method we present does not map local patches, but learns the local geometries from the source mesh and synthesizes local geometries over the target mesh using a neural generative model. As noted earlier, local geometries are often too complex to be modeled by a simple 2D displacement map. Recently, Liu and Jacobson [2019] proposed an approach for cubic stylization, which can cubify a 3D mesh directly, without any parameterization. Applying as-rigid-as-possible [Sorkine and Alexa 2007] reconstruction with an $\ell_1$ regularization on the normals leads to a cubic stylization that is detail-preserving.
Fig. 4. Multiscale training data generation. Given a reference mesh with geometric texture, we create a series of multi-resolution training inputs using an optimization strategy. Starting with a low-resolution template mesh, we repeatedly subdivide and optimize the mesh geometry to obtain training inputs of increasing resolution.
OVERVIEW
We present a framework for learning to synthesize the local geometric texture from a single mesh. We learn the statistics of the patches in a hierarchical, coarse-to-fine manner, where the input to each level is a subdivided version of the output of the coarser level. Figure 3 illustrates an overview of a single level in the hierarchy. We train a generative adversarial network (GAN) on the patches (i.e., local triangulations) of a single mesh, where the generator aims to synthesize local mesh patches that are indistinguishable from the reference patches.

Given a reference mesh with geometric textures, we create a series of meshes which depict the reference mesh across multiple resolutions. This multi-resolution series is used as input to train the hierarchical network. We obtain these multi-scale training inputs via a preliminary optimization strategy. Starting with a low-resolution template mesh, the vertices are optimized such that its surface matches the reference mesh. This template is repeatedly subdivided and optimized to fit the reference, resulting in a multi-scale representation of the reference mesh (example in Figure 4). From this point forward, the reference mesh is discarded, and these multi-scale training inputs are used to train the discriminator and generator.

Starting with the coarsest-scale training mesh, we add Gaussian noise to its vertices, which are then used as input to the network. Then, we extract local geometric features per triangular face, which are invariant to rigid transformations. The initial geometric features pass through a series of face convolutions to learn deep features. The output of the final convolutional layer is a displacement vector per triangular face, which describes a displacement for each of the three incident vertices. To generate the final displacement vector per vertex, we average the displacement vectors of all its incident faces. The displaced mesh is then refined by a subdivision and fed as input to the next level in the hierarchy.

The synthesized mesh at each scale is passed to the discriminator at the same scale, which learns to discriminate whether local patches (i.e., faces) are real or fake. The discriminator trains face-based convolutional kernels to abstract the input geometric features into salient deep features, which indicate whether the local mesh is synthesized or real. Note that the series of generators and discriminators have decreasing receptive fields that control the scale space for synthesizing the geometric textures, in a similar hierarchical fashion as Shaham et al. [2019] demonstrated on images.

After training is complete, we discard the discriminators, and use the series of multi-scale generators to displace the vertices of any novel target mesh. The scale space of the synthesized geometric texture is determined by the scale of the generators employed. See, for example, Figure 6, where the training input (gold) is transferred to a target (gray) starting from the coarsest scale (left) to the finest scale (right). Note that the target mesh may have a different triangulation and genus from the training input data.

In the process, we exploit a unique property of triangle meshes: the one-ring neighborhood of a mesh triangle has a fixed size of three triangles (step (3) in Figure 3). Similar to the edge-based convolution in MeshCNN [Hanocka et al. 2019], we learn convolutional kernels which operate on each face and its three neighboring faces; the convolution is applied to the features of each face and its 1-ring neighbors.
To be invariant to the initial ordering of the mesh, we apply symmetric functions to the features of the neighboring 1-ring, resulting in an equivariant triangular face convolution.

A triangular mesh is a special type of graph defined by a set of vertices and triangles $(\mathcal{V}, \mathcal{F})$, where $\mathcal{V} = \{v_1, v_2, \dots, v_n\}$ is the unordered set of vertex positions in $\mathbb{R}^3$. The mesh connectivity, or adjacency information, is designated by an unordered set of triangular faces $\mathcal{F} = \{f_1, f_2, \dots, f_m\}$, each containing a triplet of vertices, which implicitly constructs the edges of the graph $\mathcal{E} = \{e_1, e_2, \dots\}$ (pairs of vertices).
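To make the fixed-size neighborhood concrete, the following is a minimal sketch (our own illustration, not the authors' released code) of how the three edge-adjacent neighbors of every face can be recovered from the face list alone; the function name and array layout are our assumptions.

```python
# Sketch: computing the fixed-size one-ring face neighborhood of a closed
# manifold triangular mesh. Each face shares each of its three edges with
# exactly one other face, which yields exactly three neighbors per face.
import numpy as np

def face_neighbors(faces: np.ndarray) -> np.ndarray:
    """faces: (m, 3) int array of vertex indices.
    Returns (m, 3): neighbors[i, k] is the face sharing the k-th edge of
    face i, where edge k connects vertices k and (k + 1) % 3."""
    edge_to_faces = {}
    for fi, (a, b, c) in enumerate(faces):
        for k, edge in enumerate([(a, b), (b, c), (c, a)]):
            key = tuple(sorted(edge))                 # undirected edge id
            edge_to_faces.setdefault(key, []).append((fi, k))
    neighbors = -np.ones((len(faces), 3), dtype=np.int64)
    for key, hits in edge_to_faces.items():
        assert len(hits) == 2, "mesh must be closed and manifold"
        (f0, k0), (f1, k1) = hits
        neighbors[f0, k0] = f1                        # faces are mutual neighbors
        neighbors[f1, k1] = f0
    return neighbors
```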
Input Features.
At each resolution level, the input features to our network are defined locally on each face and describe the relations between the face and its three neighboring faces. We define a local coordinate system for each edge in every face, where the origin is the edge midpoint. We use the face normal to define a consistent orientation for the local x, y, z axes. The local z-axis is defined by the face normal, the x-axis is the edge direction, and the y-axis is their cross product. Finally, we extract 4 features for each edge: the edge length and the Cartesian coordinates of the vertex opposite that edge, projected onto the local coordinate system (see step 2 of Figure 3). We denote the features of the three edges of the face $f$ by the matrix $S \in \mathbb{R}^{3 \times 4}$, where each row contains the features of a single edge. These features are invariant to translations and rotations of the mesh. Moreover, these features contain enough information to reproduce the mesh in any global position and orientation from any face.

Fig. 5. Unconditional mesh generation. Our method can unconditionally generate meshes (top rows), or conditionally generate meshes in different scale spaces. Higher levels in the scale space, conditioned on a higher-level input mesh, result in a synthesis that maintains the global structure of the reference mesh.

In our network, we first perform a 1 × 1 face feature embedding. Then, we perform a symmetric convolution that takes into account the three 1-ring neighbours of the face.
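The feature extraction above can be sketched as follows. This is a PyTorch illustration under our own conventions; the sign of the local y-axis and the edge ordering are assumptions, not specified by the text.

```python
# Sketch: per-face input features. For each of the three edges of a face we
# build a local frame (x = edge direction, z = face normal, y = z x x)
# centered at the edge midpoint, and record the edge length plus the local
# coordinates of the vertex opposite that edge, giving S in R^{3x4} per face.
import torch

def face_input_features(verts, faces):
    """verts: (n, 3) float tensor; faces: (m, 3) long tensor -> (m, 3, 4)."""
    tri = verts[faces]                                  # (m, 3, 3) triangle corners
    normal = torch.cross(tri[:, 1] - tri[:, 0],
                         tri[:, 2] - tri[:, 0], dim=1)
    normal = torch.nn.functional.normalize(normal, dim=1)
    feats = []
    for k in range(3):                                  # edge k: corner k -> k+1
        a, b, c = tri[:, k], tri[:, (k + 1) % 3], tri[:, (k + 2) % 3]
        edge = b - a
        length = edge.norm(dim=1, keepdim=True)         # feature 1: edge length
        x_axis = edge / length
        y_axis = torch.cross(normal, x_axis, dim=1)     # sign convention assumed
        rel = c - 0.5 * (a + b)                         # opposite vertex, local origin
        coords = torch.stack([(rel * ax).sum(dim=1)     # features 2-4: local coords
                              for ax in (x_axis, y_axis, normal)], dim=1)
        feats.append(torch.cat([length, coords], dim=1))
    return torch.stack(feats, dim=1)                    # (m, 3, 4)
```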
Face Feature Embedding.
The geometric features per face serve as input features to our face-based convolutional neural network, and are subsequently abstracted into deep features. We denote the dimension of the feature vector in convolution layer $i$ by $d_i$. For the first convolutional layer of the network, we extract neural features for each face side via a linear layer $\hat{S} = g(S \mid W, b) = SW + b$, where $S \in \mathbb{R}^{3 \times 4}$ are the extracted features of that face, and $W \in \mathbb{R}^{4 \times d_1}$ and $b \in \mathbb{R}^{d_1}$ are learned weights. As we want to generate a face embedding that is invariant to the order of neighbouring faces, we apply a max operation over the rows of $\hat{S}$. This leads to an initial face embedding $\hat{f} \in \mathbb{R}^{d_1}$ that is invariant to rigid transformations and mesh face order.
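In code, this embedding is a single linear layer shared across the three edge rows, followed by a row-wise max. The sketch below mirrors the formula above; the module and variable names are ours.

```python
# Sketch: order-invariant face embedding. A shared linear map is applied to
# each per-edge feature row of S, and a max over the rows (a symmetric
# function) yields one d1-dimensional vector per face.
import torch
import torch.nn as nn

class FaceEmbedding(nn.Module):
    def __init__(self, d1: int, in_features: int = 4):
        super().__init__()
        self.linear = nn.Linear(in_features, d1)   # S W + b, shared across rows

    def forward(self, S: torch.Tensor) -> torch.Tensor:
        # S: (m, 3, 4) per-face edge features -> (m, d1) face embedding
        return self.linear(S).max(dim=1).values    # max over the 3 edge rows
```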
Convolutions on Faces.
In the subsequent convolutions, the input face features are the deep embeddings from the previous layer. Unlike the first block, in subsequent layers the convolution operates on the face feature vector and the three neighboring face feature vectors. Abusing notation, denote by $S \in \mathbb{R}^{3 \times d_i}$ the matrix whose rows contain the intermediate face embeddings of the neighbouring faces of a face $f$, and by $\hat{f} \in \mathbb{R}^{d_i}$ its intermediate embedding. Then, we define the linear operation for the face by $g(S, \hat{f} \mid W_S, W_f, b) = S W_S + \hat{f}^T W_f + b$, where $W_S \in \mathbb{R}^{d_i \times d_{i+1}}$, $W_f \in \mathbb{R}^{d_i \times d_{i+1}}$, and $b \in \mathbb{R}^{d_{i+1}}$ are the learned weights. To ensure the convolution is invariant to the face ordering, we take a max operation across the rows of $g(S, \hat{f} \mid W_S, W_f, b)$.
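A minimal sketch of this convolution follows, assuming the neighbor indices computed earlier; folding the bias into the self term is our implementation choice, not taken from the paper.

```python
# Sketch: symmetric face convolution. Each face combines its own embedding
# with the embeddings of its three edge-adjacent neighbors through two
# learned linear maps; a max over the neighbor rows makes the result
# invariant to their ordering.
import torch
import torch.nn as nn

class FaceConv(nn.Module):
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.w_nbrs = nn.Linear(d_in, d_out, bias=False)   # W_S
        self.w_self = nn.Linear(d_in, d_out)               # W_f and bias b

    def forward(self, feats: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
        # feats: (m, d_in) face embeddings; neighbors: (m, 3) face indices
        nbr = self.w_nbrs(feats[neighbors])                # (m, 3, d_out): rows S W_S
        own = self.w_self(feats).unsqueeze(1)              # (m, 1, d_out): f W_f + b
        return (nbr + own).max(dim=1).values               # symmetric over the 3 rows
```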
Vertex displacement.
In this work, the face-based convolutions are used to build both the discriminator and generator networks. The discriminator uses the deep feature embedding to distinguish between real and fake faces, while the generator outputs 3D displacement vectors which modify the input mesh geometry. The generator outputs a single displacement vector per face, which is used to displace its three vertices symmetrically. Each face predicts a displacement vector that is shared across all three of its vertices, which is then projected onto the local coordinate axes of each edge, respectively. Since each vertex is shared by several faces, it receives multiple displacements; we average all of them to calculate its final displacement. Note that while each face predicts a symmetric displacement for its vertices, the vertices can move in all directions, since they receive displacements from all their incident faces.
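The averaging step can be sketched as a scatter-add over vertex indices. For brevity, this illustration treats the per-face prediction as already expressed in world coordinates, omitting the projection through the per-edge local frames described above.

```python
# Sketch: each face contributes its displacement to its three vertices, and
# every vertex averages the displacements of all its incident faces.
import torch

def displace_vertices(verts, faces, face_disp):
    """verts: (n, 3); faces: (m, 3) long; face_disp: (m, 3) -> new (n, 3)."""
    accum = torch.zeros_like(verts)
    count = torch.zeros(len(verts), 1, dtype=verts.dtype)
    idx = faces.reshape(-1)                              # (3m,) vertex indices
    accum.index_add_(0, idx, face_disp.repeat_interleave(3, dim=0))
    count.index_add_(0, idx, torch.ones(len(idx), 1, dtype=verts.dtype))
    return verts + accum / count.clamp(min=1)            # mean over incident faces
```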
Realizing our goal of learning to synthesize geometric textures from a reference mesh requires defining a method for upsampling, or subdividing, the input mesh to achieve a hierarchical scale space. After defining a subdivision operator, it is used to iteratively increase the resolution of an input mesh, such that with each subdivision, additional details from the reference mesh are added to the input mesh (see the golden mesh in Figure 6).

Fig. 6. Hierarchical texture scale space. A series of multi-scale generators are trained to synthesize geometric textures across multiple scales using the multi-scale training inputs (gold). During test time, the geometric textures are synthesized on a novel target shape (gray). The scale space of the synthesized geometric texture is defined by the scale of the generators employed. The target shape is input to the first-level generator, which synthesizes the first texture scale in the output (left). This output is passed to the second-level generator, which synthesizes the next scale, and so on.

Uniform Subdivision.
For images, upsampling is trivial, since downsampling and upsampling result in the same connectivity, i.e., the local scale space of the image grid is preserved. However, for the irregular mesh structure, we must define an operator which upsamples both the training and inference meshes in the same manner. For example, it is not sufficient to simply collapse edges in some pre-defined order and then restore the collapses, since there would be no way of transferring this operation to a novel mesh. To this end, we use a uniform subdivision operator, which has the same behavior on any given connectivity.

Uniform subdivision divides each face into four faces by placing a triangle inside each face (see step 5 of Figure 3). A vertex is placed at the midpoint of every edge of the triangle, which increases the mesh resolution by a factor of four. This operation is fixed, meaning that given a specific connectivity (regardless of vertex placement), the uniform subdivision will always generate the same mesh. This property is desirable for transferring to novel target meshes which have a different connectivity than the training mesh.
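A minimal sketch of this 1-to-4 midpoint subdivision is given below (our own illustration); note that the output depends only on the connectivity, as required for transfer to novel meshes.

```python
# Sketch: uniform subdivision. A new vertex is placed at every edge
# midpoint, and each triangle is replaced by four triangles.
import torch

def subdivide(verts, faces):
    """verts: (n, 3); faces: (m, 3) long -> (verts', faces') at 4x resolution."""
    edge_mid = {}                                    # undirected edge -> new vertex id
    new_verts = [verts]

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in edge_mid:
            edge_mid[key] = len(verts) + len(edge_mid)
            new_verts.append(0.5 * (verts[i] + verts[j]).unsqueeze(0))
        return edge_mid[key]

    new_faces = []
    for a, b, c in faces.tolist():
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # three corner triangles plus the central one
        new_faces += [[a, ab, ca], [ab, b, bc], [ca, bc, c], [ab, bc, ca]]
    return torch.cat(new_verts), torch.tensor(new_faces, dtype=torch.long)
```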
Multiscale Input Shapes.
Given a reference mesh with geometric texture, we employ a pre-processing phase to prepare a series of multi-scale input shapes which we use for training. The user defines a low-resolution template, which is iteratively subdivided and deformed to match the reference mesh. The template is chosen to be either an icosahedron, a torus, or a coarse mesh (a simplified version of the reference). Note that for a given reference mesh, the exact tessellation will not be recovered by uniform subdivision of some template. For this reason, we remesh the reference shape prior to training, and only use the multi-scale inputs during training (i.e., we discard the reference).

The proposed re-meshing procedure generates a series of multi-scale training inputs. We create increasing resolutions (or scales) of the reference mesh via an optimization procedure. Starting with a template mesh, we iteratively subdivide and minimize the distance to the reference mesh. As the number of mesh elements increases, the optimization obtains a better fit to the reference mesh.

We solve this optimization problem through back-propagation, where the minimizer is the vertex locations of the training meshes. The optimization objective is measured by a bi-directional Chamfer distance between uniformly sampled points on both the reference and the optimized mesh. This distance is the Euclidean distance between each point on the training mesh and its closest point on the reference (and vice versa), in addition to a negative cosine similarity between the normals of the meshes at those points.

We add two regularization terms to this optimization process to obtain a locally uniform triangulation and a smooth shape. The first (uniform) term encourages the minimization of the variance of the edge lengths, and the second (smoothness) term reduces the distance between each vertex $v_i$ on the mesh and the average coordinate of its one-ring:
$$\Bigg\lVert v_i - \frac{1}{d_i} \sum_{j : (i,j) \in \mathcal{E}} v_j \Bigg\rVert^2,$$
where $d_i$ is the degree of the vertex $v_i$.
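The fitting objective can be sketched as follows. This is our paraphrase: it uses dense pairwise distances between vertex sets for clarity, omits the point sampling and normal-cosine term described above, and the weights are placeholders.

```python
# Sketch: multiscale fitting loss. A bi-directional Chamfer term pulls the
# optimized mesh onto the reference points, while two regularizers keep the
# triangulation locally uniform and the surface smooth.
import torch

def chamfer(p, q):
    d = torch.cdist(p, q)                            # (|p|, |q|) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def fit_loss(verts, edges, one_rings, ref_points, w_edge=1.0, w_smooth=1.0):
    """edges: (E, 2) long; one_rings: list of per-vertex neighbor index tensors."""
    edge_len = (verts[edges[:, 0]] - verts[edges[:, 1]]).norm(dim=1)
    uniform = edge_len.var()                         # uniform-triangulation term
    smooth = sum(((verts[i] - verts[ring].mean(dim=0)) ** 2).sum()
                 for i, ring in enumerate(one_rings)) / len(verts)
    return chamfer(verts, ref_points) + w_edge * uniform + w_smooth * smooth
```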
Fig. 7. Geometric textures learned from a reference shape (gold) and transferred to different target shapes. Textures can be synthesized from natural shapes with geometric textures (e.g., the thorny lizard).
Fig. 8. Latent synthesized textures. By sampling different noise vectors, we can synthesize variations of the geometric texture.

We now describe how we use our face-based convolutional layers to design a GAN model (generator and discriminator) that learns the local geometric statistics from a single mesh using a hierarchy of generators. The generator network learns to predict vertex displacements that generate local geometries which are indistinguishable from the local statistics of the reference texture.
Hierarchical GAN training.
We synthesize geometric textures via a series of generators which create local geometries incrementally. The output of a generator at a given level is a local refinement of the input mesh, which is subdivided and used as input to the generator at the next level. In this manner, displacements in the coarse generator correspond to large refinements of the final mesh, and as the mesh progresses through the hierarchy, the generator displacements become fine-grained. This eases training, since each generator level only needs to capture the local refinements of its scale.

The generator in our model receives an input mesh and a tensor of noise z that is added to the input mesh vertices. The generator then outputs a displacement vector per face, which is applied to the input mesh (without noise). The discriminator receives both the modified input mesh (fake) and the corresponding real mesh (i.e., the training shape of the same resolution) as input. An illustration of the real meshes at each level is shown in Figure 4. The discriminator is patch-based, so it learns to classify whether faces are real or fake. In other words, given an input mesh, the discriminator estimates a per-face probability of being real.

As with standard GAN training, the goal of the generator is to fool the discriminator by generating shapes that are as similar as possible to the real mesh, and the goal of the discriminator is to distinguish between the generator output and the true mesh. We use the WGAN-GP [Gulrajani et al. 2017] framework to train both.

Fig. 9. Transferring geometric texture from the spikey ball (shown in Figure 1) to a torus at different resolutions. Transferring the spikes to a low-resolution torus results in a coarse texture scale space; increasing the resolution of the torus increases the transferred texture scale space.
In addition to the adversarial loss, we also add a reconstruction loss, as suggested in [Shaham et al. 2019]. We require that, for a given fixed noise vector z = c, the generator be able to reconstruct the real mesh. We use the MSE distance between the vertices of the generated and real meshes. To combine the two loss functions, we use a parameter γ to weight the reconstruction loss.

Training starts with the generator and discriminator at the coarsest level. The input to the coarsest generator is the template (plus noise z = c), while the desired output (i.e., using the reconstruction and adversarial losses) is the training input at the coarsest level. Both the generator and the discriminator are trained until convergence. When progressing to train the next level, the generator from the previous level is kept fixed. The output from the previous level is subdivided and scaled, and then input to the next level. After subdivision, we uniformly scale the mesh such that the mean edge length is preserved. We set c to be a fixed random noise vector at the coarsest resolution, and a vector of zeros at the higher resolutions.
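For concreteness, a highly simplified single-level training step might look like the sketch below. Here G and D stand for a generator and discriminator built from the face convolutions above, the WGAN-GP gradient penalty is omitted, and all names and the noise scale sigma are our assumptions rather than values from the paper.

```python
# Sketch: one training step at a single hierarchy level, combining the
# Wasserstein adversarial loss with the fixed-noise reconstruction loss.
import torch
import torch.nn.functional as F

def train_step(G, D, in_verts, real_verts, faces, z_fixed, gamma,
               opt_g, opt_d, sigma=0.02):
    """in_verts: this level's input mesh (subdivided previous-level output);
    real_verts: the training shape at this level's resolution."""
    z = sigma * torch.randn_like(in_verts)
    fake = G(in_verts + z, faces)                    # generator displaces the mesh

    # Discriminator: per-face Wasserstein loss (gradient penalty omitted).
    d_loss = D(fake.detach(), faces).mean() - D(real_verts, faces).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: adversarial term plus reconstruction with the fixed noise z = c.
    rec = G(in_verts + z_fixed, faces)
    g_loss = -D(fake, faces).mean() + gamma * F.mse_loss(rec, real_verts)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```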
Inference.
Our generator network is fully convolutional, and can therefore be applied to any mesh with any connectivity and resolution. Given a new shape (i.e., a target mesh), we use the generators to synthesize the learned local structures of the reference mesh. This is achieved by scaling the target shape to have an average edge length of one (i.e., input feature normalization). We use the target shape, plus random noise, as input to the generator at one of the lower resolutions in the hierarchy. This transfers the (local) structure of the reference mesh to the target mesh. Note that, unlike the reference mesh, the target mesh does not need to be re-meshed.
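Inference then reduces to normalizing the target and running it through the trained generators from a chosen level, subdividing between levels. A sketch under the same assumed helpers (including the subdivide function above):

```python
# Sketch: synthesizing the learned texture on a novel target mesh.
import torch

def synthesize(target_verts, target_faces, generators, start_level=0, sigma=0.02):
    """generators: trained per-level generator modules, coarse to fine."""
    # Normalize so the mean edge length is one (input feature normalization).
    edge_vec = target_verts[target_faces] - target_verts[target_faces.roll(-1, dims=1)]
    verts = target_verts / edge_vec.norm(dim=-1).mean()
    faces = target_faces
    for i, G in enumerate(generators[start_level:]):
        if i > 0:
            verts, faces = subdivide(verts, faces)   # progress to the next scale
        z = sigma * torch.randn_like(verts)          # per-vertex random noise
        verts = G(verts + z, faces)                  # synthesize this level's texture
    return verts, faces
```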
EXPERIMENTS
The reference meshes used for training our models are provided by the Thingi10K dataset [Zhou and Jacobson 2016] or built by hand. Our PyTorch [Paszke et al. 2017] implementation, as well as pre-trained models and multiscale training meshes, will be made publicly available upon publication.

Each model contains 5− levels; at each level, the generator and discriminator have 7 layers of face convolutions with instance normalization [Ulyanov et al. 2016] and leaky ReLU. The face feature embedding dimension ($d_1$ in Section 3.2), i.e., the output of the first convolution layer, increases as we move up the hierarchy, starting at 32 in the first level and reaching 128 at the third level. From the fourth level onward, we initialize both the generator and the discriminator with the weights of the previous level. Each level was trained for 2000 iterations using the Adam optimizer [Kingma and Ba 2014] with a learning rate of 5e− and learning rate decay of 0. γ was set to 5.

Fig. 10. Geometric textures with complex 3D displacements. The network learns to synthesize geometric textures from the reference geometry, whose tangential movement is highlighted in a cross-section illustration, respectively. The geometric textures are synthesized on different target shapes (right, gray).

Hierarchical Generation.
Our hierarchical training allows synthesizing meshes starting from different levels of the generator hierarchy. When starting from the lowest level with the template, the generator outputs different meshes with different global structures (see the top rows of Figure 5). When synthesizing from higher levels, using higher-resolution inputs, the generator preserves the global structure of the input and only deforms local regions of the mesh (see the bottom rows of Figure 5).

This hierarchical characteristic enables applying our model to a variety of meshes with different resolutions, during inference, from any given level. In general, we usually start from level 2-3 of the source shape when performing texture synthesis. However, this depends on the training meshes and the scale spaces defined within each level: some geometric textures require more levels, while others can be compactly defined in a few levels.

We evaluate the pretrained models by applying them to unseen target meshes. Figure 6 shows the hierarchical generation of textures. Notice how the geometric texture is transferred gradually from the source shape (in gold) to the target shape, where the process starts with the source shape at a lower resolution and progresses while increasing it.
Texture Synthesis.
Figure 7 presents additional texture syntheses for various unseen target meshes from different reference shapes. Observe how our method is able to synthesize different geometric structures directly on the target shape, without the use of any parameterization.

A remarkable property of our approach is that it can process pairs with different genus. See, for example, the torus and the pig in Figure 7. In Figure 2, the generator was trained on the cat model (genus one), and the texture was transferred to the fertility shape, which has genus four.

Notice that the resolution (number of mesh faces) of the target shape determines the scale of the texture synthesized on it. Figure 9 demonstrates this effect by synthesizing spikes on a torus mesh, where we start from different mesh resolutions and transfer the texture from levels 2−.

Latent Space Interpolation.
Since our framework is generative, it enables synthesizing different textures from the same reference shape. This is done by sampling different noise vectors, resulting in different synthesized textures on the target mesh. We show examples on two different shapes and textures in Figure 8. Note that, since the generator was trained on a single reference mesh, the differences in the synthesized texture on the same target shape are solely due to the noise vector. This enables smoothly interpolating between shapes by simply interpolating the latent variable used for generation. Performing smooth interpolations between shapes enables animation of the textures; we provide several such examples in the supplementary material.
Comparison.
We compare against OptCuts [Li et al. 2018c], a state-of-the-art parameterization technique, in Figure 11. We manually create a 2D displacement map which corresponds to the 3D reference shape (gold), and use OptCuts to automatically compute the parameterization and cutting of the 3D mesh, resulting in a mapping of the displacement map to the target mesh. We use the UV mapping to displace vertices in the normal direction on the target mesh. Our technique, on the other hand, learns to synthesize geometric textures directly from a 3D reference mesh. The edges of the OptCuts textures are sharp; however, automatically creating 2D displacement maps from 3D geometries is non-trivial. Moreover, a 2D displacement map that moves geometry in the normal direction is rather limited, since it does not encode tangential movement (e.g., the coronavirus). Lastly, OptCuts took 10 minutes to compute a parameterization, while our technique only requires a few seconds.

Fig. 11. Comparison to OptCuts [Li et al. 2018c] (left), which projects a 2D displacement map onto the target surface; the UV mapping is used to displace vertices in the normal direction. The 2D displacement map was estimated manually from the 3D reference shapes in gold: cylinder and coronavirus. Note that the tangential displacements of the coronavirus are not captured by a 2D displacement map. Our technique (right) learns to synthesize 3D geometric textures directly from the reference mesh (gold).
DISCUSSION AND CONCLUSIONS
We have presented a novel concept for geometric texture synthesis, which uses a generative framework to learn the local structures of a given triangular mesh and then synthesize them on different target models. Our technique learns to match the local statistics of a specified mesh model and transfers them to a target one. To the best of our knowledge, this is the first generative model that learns from a single mesh.
Fig. 12. Our method is limited to isometric textures. The vertical (cactus) and horizontal (brick) texture directions are not transferred to the duck.
A prominent advantage of our scheme is that it does not require any parameterization of the reference or target shape. Given a model with a natural organic geometric texture (i.e., the lizard in Figure 6), which is not given as a displacement map, it is not immediately obvious how to employ a classic parameterization technique to transfer the geometric texture to another (target) shape (i.e., the squirrel in Figure 6). There is no generic method for decomposing an arbitrary surface with geometric textures into a base and displacements. Furthermore, not every geometric texture is simple enough to be represented as displacements along the surface normal (e.g., the reference shapes with tangential movement in Figure 10). By contrast, our approach receives a reference 3D model which contains geometric texture (i.e., not a displacement map), and learns to synthesize geometric structures by displacing vertices in all directions (i.e., not only along the normal direction but also tangentially).

However, the presented method has its limitations. First, it learns to synthesize local textures, and cannot capture large structures. Moreover, it currently assumes that the geometric textures are stationary and isometric (e.g.,
Figure 12). Handling anisotropic textures would entail learning a directional field which can be transferred from the reference to the target mesh, a difficult task in and of itself. Moreover, even after the directional field is estimated, synthesizing the final geometric texture from the directional field is not a trivial task. Another limitation is that the hierarchical learning requires the mesh to have a locally uniform triangulation and a well-behaved subdivision structure. Currently, we achieve this via a preliminary remeshing process. This remeshing procedure may fail on complex shapes, e.g., thin and intertwined structures, and, in general, where the Euclidean and geodesic distances differ significantly. In the future, we would like to relax this requirement and build the hierarchy by learning vertex splits.

While the focus of this work is geometric texture synthesis from a single mesh, our approach opens the door for a variety of follow-up works. For example, it is possible to use the machinery developed in this work for transferring color texture or other attributes. Furthermore, by learning different positions of the same shape, the generative model can be used to interpolate between two positions and thus animate shapes in a controlled direction.

Another possible application of our method employs geometric texture transfer using a two-step mapping. First, we build a local geometric texture on a simple shape such as a sphere or a torus, where automatic or semi-automatic tools work well. Then, this mesh is used as an intermediate shape toward the ultimate goal of generating textures on the target mesh. This two-step method is reminiscent of the 30+ year old work of Bier and Sloan [1986] on texture mapping.

Learning to synthesize is a challenging task, especially when it comes to irregular geometric data. The proposed face convolution facilitates the development of a GAN framework for triangular meshes. Our learning-based technique leads to results that are difficult to achieve using state-of-the-art graphics tools, or that require a tailored solution for each target shape. We believe that this work is just a first step towards the development of more deep-learning techniques for 3D generative mesh models.
ACKNOWLEDGMENTS
We would like to thank the anonymous reviewers for their helpful comments. This work is supported by the NSF-BSF grant (No. 2017729), the European Research Council (ERC-StG 757497 PI Giryes), ISF grant 2366/16, and the Israel Science Foundation ISF-NSFC joint program, grant numbers 2217/15 and 2472/17.
REFERENCES
Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. 2018. Learning Representations and Generative Models for 3D Point Clouds. In International Conference on Machine Learning. 40–49.
Noam Aigerman and Yaron Lipman. 2015. Orbifold Tutte embeddings. ACM Trans. Graph. 34, 6 (2015), 190–1.
Noam Aigerman, Roi Poranne, and Yaron Lipman. 2015. Seamless surface mappings. ACM Transactions on Graphics (TOG) 34, 4 (2015), 72.
Heli Ben-Hamu, Haggai Maron, Itay Kezurer, Gal Avineri, and Yaron Lipman. 2018. Multi-chart generative surface modeling. In SIGGRAPH Asia 2018 Technical Papers. ACM, 215.
Eric A. Bier and Kenneth R. Sloan. 1986. Two-part texture mappings. IEEE Computer Graphics and Applications 6, 9 (1986), 40–53.
Marcel Campen, Hanxiao Shen, Jiaran Zhou, and Denis Zorin. 2018. Seamless Parametrization with Arbitrarily Prescribed Cones. arXiv preprint arXiv:1810.02460 (2018).
Wenzheng Chen, Huan Ling, Jun Gao, Edward Smith, Jaakko Lehtinen, Alec Jacobson, and Sanja Fidler. 2019a. Learning to predict 3D objects with an interpolation-based differentiable renderer. In Advances in Neural Information Processing Systems. 9605–9616.
Xiaobai Chen, Tom Funkhouser, Dan B. Goldman, and Eli Shechtman. 2012. Non-parametric texture transfer using MeshMatch. Technical Report 2012-2 (2012).
Zhiqin Chen, Andrea Tagliasacchi, and Hao Zhang. 2019b. BSP-Net: Generating Compact Meshes via Binary Space Partitioning. arXiv:cs.CV/1911.06971.
Zhiqin Chen and Hao Zhang. 2019. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5939–5948.
Yossi Gandelsman, Assaf Shocher, and Michal Irani. 2019. "Double-DIP": Unsupervised Image Decomposition via Coupled Deep-Image-Priors. (June 2019).
Lin Gao, Jie Yang, Tong Wu, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai, and Hao Zhang. 2019. SDM-NET: Deep generative network for structured deformable mesh. ACM Transactions on Graphics (TOG) 38, 6 (2019), 1–15.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.
Thibault Groueix, Matthew Fisher, Vladimir Kim, Bryan Russell, and Mathieu Aubry. 2018. AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation. In CVPR 2018.
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems. 5767–5777.
Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. 2019. MeshCNN: A Network with an Edge. ACM Trans. Graph. 38, 4, Article 90 (July 2019), 12 pages. https://doi.org/10.1145/3306346.3322959
Amir Hertz, Rana Hanocka, Raja Giryes, and Daniel Cohen-Or. 2020. PointGMM: A Neural GMM Network for Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12054–12063.
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Felix Knöppel, Keenan Crane, Ulrich Pinkall, and Peter Schröder. 2015. Stripe Patterns on Surfaces. ACM Trans. Graph. 34 (2015). Issue 4.
Ilya Kostrikov, Zhongshi Jiang, Daniele Panozzo, Denis Zorin, and Joan Bruna. 2018. Surface networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2540–2548.
Vivek Kwatra, Irfan Essa, Aaron Bobick, and Nipun Kwatra. 2005. Texture optimization for example-based synthesis. In ACM Transactions on Graphics (TOG), Vol. 24. ACM, 795–802.
Bruno Lévy, Sylvain Petitjean, Nicolas Ray, and Jérôme Maillot. 2002. Least squares conformal maps for automatic texture atlas generation. In ACM Transactions on Graphics (TOG), Vol. 21. ACM, 362–371.
Jiaxin Li, Ben M. Chen, and Gim Hee Lee. 2018b. SO-Net: Self-organizing network for point cloud analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9397–9406.
Minchen Li, Danny M. Kaufman, Vladimir G. Kim, Justin Solomon, and Alla Sheffer. 2018c. OptCuts: Joint Optimization of Surface Cuts and Parameterization. ACM Transactions on Graphics 37, 6 (2018). https://doi.org/10.1145/3272127.3275042
Yangyan Li, Rui Bu, Mingchao Sun, and Baoquan Chen. 2018a. PointCNN. arXiv preprint arXiv:1801.07791 (2018).
Hsueh-Ti Derek Liu and Alec Jacobson. 2019. Cubic stylization. ACM Transactions on Graphics (TOG) 38, 6 (Nov. 2019), 1–10. https://doi.org/10.1145/3355089.3356495
Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. DeepSDF: Learning continuous signed distance functions for shape representation. arXiv preprint arXiv:1901.05103 (2019).
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS-W.
Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017a. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652–660.
Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017b. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems. 5099–5108.
Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. 2019. SinGAN: Learning a Generative Model from a Single Natural Image. arXiv preprint arXiv:1905.01164 (2019).
Alla Sheffer, Emil Praun, Kenneth Rose, et al. 2007. Mesh parameterization methods and their applications. Foundations and Trends in Computer Graphics and Vision 2, 2 (2007), 105–171.
Assaf Shocher, Nadav Cohen, and Michal Irani. 2018. "Zero-Shot" Super-Resolution Using Deep Internal Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3118–3126.
Olga Sorkine and Marc Alexa. 2007. As-rigid-as-possible surface modeling. In Symposium on Geometry Processing, Vol. 4. 109–116.
Olga Sorkine, Daniel Cohen-Or, Rony Goldenthal, and Dani Lischinski. 2002. Bounded-distortion piecewise mesh parameterization. In Proceedings of the Conference on Visualization '02. IEEE Computer Society, 355–362.
Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, and Moritz Hardt. 2019. Test-Time Training for Out-of-Distribution Generalization. arXiv preprint arXiv:1909.13231 (2019).
Greg Turk. 2001. Texture synthesis on surfaces. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. ACM, 347–354.
Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. 2016. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016).
Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. 2018. Pixel2Mesh: Generating 3D mesh models from single RGB images. In Proceedings of the European Conference on Computer Vision (ECCV). 52–67.
Yonatan Wexler, Eli Shechtman, and Michal Irani. 2004. Space-time video completion. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), Vol. 1. IEEE, I–I.
Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. 2016. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems. 82–90.
Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1912–1920.
Kai Xu, Daniel Cohen-Or, Tao Ju, Ligang Liu, Hao Zhang, Shizhe Zhou, and Yueshan Xiong. 2009. Feature-aligned shape texturing. ACM Transactions on Graphics (TOG) 28, 5 (2009), 108.
Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, and Bharath Hariharan. 2019. PointFlow: 3D point cloud generation with continuous normalizing flows. In Proceedings of the IEEE International Conference on Computer Vision. 4541–4550.
Lexing Ying, Aaron Hertzmann, Henning Biermann, and Denis Zorin. 2001. Texture and shape synthesis on surfaces. In Rendering Techniques 2001. Springer, 301–312.
Qingnan Zhou and Alec Jacobson. 2016. Thingi10K: A Dataset of 10,000 3D-Printing Models. arXiv preprint arXiv:1605.04797 (2016).
Yang Zhou, Zhen Zhu, Xiang Bai, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. 2018. Non-stationary Texture Synthesis by Adversarial Expansion. ACM Transactions on Graphics (TOG) 37, 4 (2018).