On Demand Solid Texture Synthesis Using Deep 3D Networks
J. Gutierrez, J. Rabin, B. Galerne, T. Hurtut
Polytechnique Montréal, Canada; Normandie Univ., UniCaen, ENSICAEN, CNRS, GREYC, France; Institut Denis Poisson, Université d'Orléans, Université de Tours, CNRS, France

This document is a lightweight preprint version of the journal article published in Computer Graphics Forum, DOI: 10.1111/cgf.13889 (https://doi.org/10.1111/cgf.13889). Another preprint version with uncompressed images is available at https://hal.archives-ouvertes.fr/hal-01678122v3.
Abstract
This paper describes a novel approach for on demand volumetric texture synthesis based on a deep learning framework that allows for the generation of high quality 3D data at interactive rates. Based on a few example images of textures, a generative network is trained to synthesize coherent portions of solid textures of arbitrary sizes that reproduce the visual characteristics of the examples along some directions. To cope with the memory limitations and computational complexity that are inherent to both high resolution and 3D processing on the GPU, only 2D textures referred to as "slices" are generated during the training stage. These synthetic textures are compared to exemplar images via a perceptual loss function based on a pre-trained deep network. The proposed network is very light (less than 100k parameters), therefore it only requires a sustainable training stage (i.e. a few hours) and is capable of very fast generation (around a second for 256³ voxels) on a single GPU. Integrated with a spatially seeded PRNG, the proposed generator network directly returns an RGB value given a set of 3D coordinates. The synthesized volumes have good visual results that are at least equivalent to the state-of-the-art patch-based approaches. They are naturally seamlessly tileable and can be fully generated in parallel.

Keywords:
Solid texture; On demand texture synthesis; Generative networks; Deep learning;
1. Introduction
2D textures are ubiquitous in 3D graphics applications. Their visual complexity combined with a widespread availability allows for the enrichment of 3D digital objects' appearance at a low cost. In that regard, solid textures, which are the 3D equivalent of stationary raster images, offer several visual quality advantages over their 2D counterparts. Solid textures eliminate the need for a surface parametrization and its accompanying visual artifacts. They produce the feeling that the object was carved from the texture material. Additionally, the availability of consistent volumetric color information allows for interactive manipulation, including object fracturing or cut-away views that reveal internal texture details. However, unlike scanning a 2D image, the digitization of volumetric color information is impractical. As a result, most existing solid textures are synthetic.

One early way to generate solid textures is procedural generation [Pea85, Per85].
In procedural methods the color of a texture at a given point only depends on its coordinates. This allows for a localized evaluation, to generate only the required portions of texture at a given moment. We refer to this characteristic as on demand evaluation. Procedural methods are indeed fast and memory efficient. Unfortunately, finding the right parameters of a procedural model to synthesize a given texture requires a high amount of expertise and trial and error. Photo-realistic textures with visible elemental patterns are particularly hard to generate with these methods.

In order to avoid the process of empirically tuning the model for a given texture, several by-example solid texture synthesis methods have been proposed [HB95, KFCO∗07]. These methods aim at reproducing the visual characteristics of a target 2D texture example through all the cross-sections in a given set of slicing directions (inferring the 3D structure given the constrained directions). Although they do not always deliver perfect results, by-example methods can be used to approximate the characteristics of a broad set of 2D textures. One convenient approach to synthesize textures by example is called lazy synthesis [DLTD08, ZDL∗]: it allows for on demand synthesis, i.e. synthesizing only voxels at a specific location, in contrast to generating a whole volume of values. Current lazy synthesis methods tend to deliver lower visual quality.

Several solid texture synthesis methods arise as the extrapolation of a 2D model. While some approaches can be trivially expanded (e.g. procedural), others require a more complex strategy, e.g. pre-computation of 3D neighborhoods [DLTD08] or averaging of synthesized slices [HB95]. Currently, 2D texture synthesis methods based on Convolutional Neural Networks (CNN) [GEB15, UVL17] roughly define the state-of-the-art in the 2D literature.
Texture networks methods [ULVL16, UVL17, LFY∗17] train a feed-forward generative network once, after which new samples are synthesized with a simple forward pass. On demand evaluation, also referred to as local evaluation in the literature (e.g. [WL03, LH05, DLTD08]), is critical for solid texture synthesis because storing the values of a whole solid texture at a useful resolution is prohibitive for most applications. It provides the ability to generate on demand only the needed portions of the solid texture. This speeds up texturing surfaces and saves memory. It was elegantly addressed for patch-based methods for 2D and solid textures [WL03, LH05, DLTD08, ZDL∗].

Extending the CNN-based 2D synthesis framework to solid textures raises two challenges. First, existing 2D methods [GEB15, ULVL16, UVL17, ZZB∗18, YBS∗19] use the activations in the hidden layers of VGG-19 [SZ14] as a descriptor network to characterize the generated samples and evaluate their similarity with the example. There is, however, no volumetric equivalent of such an image classification network that we could use off-the-shelf. Second, we need a strategy to surmount the enormous amount of memory demanded by the task.

We propose a compact solid texture generator model based on CNN capable of on demand synthesis at interactive rates. During training, we assess the samples' appearance via a volumetric loss function that compares slices of the generated textures to the target image. We exploit the stationarity of the model to propose a fast and memory efficient single-slice training strategy. This allows us to use target examples at higher resolutions than those in the current 3D literature. The resulting trained network is simple, lightweight, and powerful at reproducing the visual characteristics of the example on the cross-sections of the generated volume along one and up to three directions.
2. Related works
To the best of our knowledge, the method proposed here is the first to employ a CNN to generate solid textures. Here we briefly outline the state-of-the-art on solid texture synthesis, then we describe some successful applications of CNNs to 2D texture synthesis. Finally, we mention relevant CNN methods that use 2D views to generate 3D objects.
Procedural methods [Per85, Pea85] are quite convenient for computer graphics applications thanks to their real time computation and on demand evaluation capability. Essentially, one can add texture to a 3D surface by directly evaluating a function given the coordinates of only the required (visible) points in 3D space. The principle is as follows for texture generation on a surface: a colormap function, such as a simple mathematical expression, is evaluated at each of the visible 3D points. In [Per85], the author uses pseudo-random numbers that depend on the coordinates of the local neighborhood of the evaluated point, which ensures both the random aspect of the texture and its spatial coherence. Creating realistic textures with a procedural noise is a trial and error process that necessitates technical and artistic skills. Some procedural methods alleviate this process by automatically estimating their parameters from an example image [GD95, GLLD12, GSV∗].

Among by-example methods, an early strategy [HB95] relies on slicing: it alternates between independent 2D synthesis of the slices and 3D aggregation. Another strategy starts with a solid noise and then iteratively modifies its voxels by assigning them a color value depending on a set of coherent pixels from the example. Wei [Wei03] uses the 2D neighborhood of a pixel, also called a patch, to assess coherence; the set of contributing pixels is then formed by the closest patch along each axis of the solid, and the assigned color is the average of the contributing pixels. Dong et al. [DLTD08] determine coherence using three overlapping 2D neighborhoods (i.e. forming a 3D neighborhood) around each voxel and find only the best match among a set of pre-computed candidates.

The example-based solid texture synthesis methods that achieve the best visual results on a wider category of textures [KFCO∗07, CW10] rely on a global optimization over the whole volume. They are computationally demanding (with running times that depend heavily on the input examples) and are incapable of on demand evaluation, which limits their usability. Regarding speed, the patch-based method proposed by Dong et al. [DLTD08] allows for fast on demand evaluation, thus allowing for visual versatility and practicality. Here the patch-matching strategy is accelerated via the pre-processing of compatible 3D neighborhoods according to the examples given along some axes. This preprocessing is a trade-off between visual quality and speed, as it reduces the richness of the synthesized textures. Thus, their overall visual quality is less satisfactory than that of the optimization methods previously mentioned.

Our method builds upon previous work on example-based 2D texture synthesis using convolutional neural networks. We distinguish two types of approaches: image optimization and feed-forward texture networks.
Image optimization

Image optimization methods were inspired by previous statistical matching approaches [PS00] and use a variational framework which aims at generating a new image that matches the features of an example image. The role of the CNN in this class of methods is to deliver a powerful characterization of the images. It typically comes in the form of feature maps at the internal layers of a pre-trained deep CNN [SZ14], namely VGG-19. Gatys et al. [GEB15] pioneered this approach for texture synthesis by considering the discrepancy between the feature maps of the synthesized image and those of the example. More precisely, for texture synthesis, where one has to take into account spatial stationarity, the corresponding perceptual loss, as coined later by [JAFF16], is the Frobenius distance between the Gram matrices of CNN features at different layers. Starting from a random initialization, the input image is then optimized via a stochastic gradient descent algorithm, where the gradient is computed using back-propagation through the CNN. Since then, several variants have built on this framework to improve the quality of the synthesis: for structured textures, by adding a Fourier spectrum discrepancy [LGX16] or using co-occurrence information computed between the feature maps and their translation [BM17]; for non-local patterns, by considering spatial correlation and smoothing [SCO17]; for stability, by considering a histogram matching loss and smoothing [WRB17]. These methods deliver good quality and high resolution results, as they can process images of resolutions up to 1024 pixels. Their main drawback comes from the optimization process itself, as it requires several minutes to generate one image. Implementing local evaluation in these methods is infeasible since they use a global optimization scheme, as for patch-based texture optimization methods [KEBK05, KFCO∗07].
Feed-forward texture networks

Feed-forward network approaches were introduced by Johnson et al. [JAFF16] for style transfer and by Ulyanov et al. [ULVL16] for texture synthesis. In the latter, an actual generative CNN is trained to synthesize texture samples that reproduce the visual characteristics of the example. These methods use the loss function of [GEB15] to compare the visual characteristics of the generated and example images. However, instead of optimizing the generated output image, the training aims at tuning the parameters of the generative network. Such an optimization can be more demanding, since there is no spatial regularity shared across iterations as there is for the optimization of a single image. However, this training phase only needs to be done once for a given input example. It is achieved in practice using back-propagation and a gradient-based optimization algorithm on batches of noise inputs. Once trained, the generator is able to quickly generate samples similar to the input example by forward evaluation. Originally these methods train one network per texture example. Bergmann et al. [BJV17] combine generative adversarial networks [GPAM∗14] with a purely convolutional architecture [RMC16] to allow for flexibility in the size of the samples to synthesize. This method shares the advantages of feed-forward networks regarding evaluation, but relies on a more complex training scheme where two cascading networks have to be optimized using an adversarial loss, which can affect the quality on different texture examples. Zhou et al. [ZZB∗18] use a combination of perceptual and adversarial losses and achieve impressive results for the synthesis of non-stationary textures; this is a more general problem, seldom addressed in the literature, and it requires extra assumptions about the behavior of the texture at hand. Similarly, Yu et al. [YBS∗19] train a CNN to perform texture mixing using a hybrid approach, combining the perceptual and adversarial losses to help the model produce plausible new textures. Finally, Li et al. [LFY∗17] train a single feed-forward network on multiple examples.

A related problem in computer vision is the generation of binary volumetric objects from 2D views [GCC∗17, JREM∗].
3. Overview
Figure 1 outlines the proposed method. We perform solid texture synthesis using the convolutional neural generator network G detailed in Section 4.
Figure 1:
Training framework for the proposed CNN generator network G(·|θ) with parameters θ. The generator processes a multi-scale noise input Z to produce a solid texture v. The loss L compares, for each direction d, the feature statistics induced by the example u_d in the layers of the pre-trained descriptor network D(·) to those induced by each slice of the set {v_{d,1}, . . . , v_{d,N_d}}. The training iteratively updates the parameters θ to reduce the loss. We show in Section 5 that we can perform training by only generating single-slice solids instead of full cubes.

The generator, with learnable parameters θ, takes a multi-scale volumetric noise input Z and processes it to produce a color solid texture v = G(Z | θ). The proposed model is able to perform on demand evaluation, which is a critical property for solid texture synthesis algorithms. On demand evaluation spares computation and memory usage, as it allows the generator to synthesize only the voxels that are visible.

The desired appearance of the samples v is specified in the form of a view dissimilarity term for each direction. The generated 3D samples v are compared to D ∈ {1, 2, 3} example images {u_1, . . . , u_D} that correspond to the desired views along D directions among the 3 canonical directions of the Cartesian grid. The generator learns to sample solid textures with the visual features extracted from the examples via the optimization of its parameters θ. To do so, we formulate a volumetric slice-based loss function L. It measures how appropriate the appearance of a solid sample is by comparing its 2D slices v_{d,n} (the n-th slice along the d-th direction) to each corresponding example u_d. Similarly to previous methods, the comparison is carried out in the space of features of the descriptor network D based on VGG-19.

The training scheme, detailed in Section 5, involves the generation of batches of solid samples, which would a priori require a prohibitive amount of memory when relying on a classical optimization approach for CNNs. We overcome this limitation thanks to the stationarity properties of the model. We show that training the proposed model only requires the generation of single-slice volumes along the specified directions. Section 6 presents experiments and comparative results. Finally, in Section 7 we discuss the current limitations of the model.
4. On demand evaluation enabled CNN generator
The architecture of the proposed CNN generator is summarized in Figure 2 and detailed in Subsection 4.1. The generator applies a series of convolutions to a multi-scale noise input to produce a single solid texture. It is inspired by the model of Ulyanov et al. [ULVL16], which stands out for on demand evaluation thanks to its small number of parameters and the local dependency between the output and the input. It is based on a multi-scale approach, itself inspired by the human visual system, that has been successfully used in many computer vision applications, and in particular for texture synthesis [HB95, De 97, WL00, PS00, KEBK05, RPDB12, GLR18].

This fully convolutional generator allows the on demand generation of box-shaped/rectangular volume textures of arbitrary size (down to a single voxel), controlled by the size of the input. Formally, given an infinite noise input, it represents an infinite texture model. A first step to achieve on demand evaluation is to control the size of the generated sample. To do so, we unfold the contribution of the values in the noise input to each value in the output of the generator. This dependency is described in Subsection 4.2. Then, on demand voxel-wise generation is achieved thanks to the multi-scale shift compensation detailed in Subsection 4.3. The resulting generator is able to synthesize coherent and expandable portions of a theoretical infinite texture.
The generator produces a solid texture v = G(Z | θ) from a set of multi-channel volumetric white noise inputs Z = {z_0, . . . , z_K}. The spatial dimensions of Z directly control the size of the generated sample. The process of transforming the noise Z into a solid texture v is depicted in Figure 2. It follows a multi-scale architecture built upon three main operations: convolution, concatenation, and upsampling. Starting at the coarsest scale, the 3D noise sample is processed with a set of convolutions followed by an upsampling to reach the next scale. It is then concatenated with the independent noise sample from the next scale, itself also processed with a set of convolutions. This process is repeated K times before a final single convolution layer that maps the number of channels to three to produce a color texture.
Figure 2:
Schematic of the network architecture. The noise input Z = {z_0, . . . , z_K} at K + 1 different scales is processed using convolution operations and non-linear activations. The information at different scales is combined using upsampling and channel concatenation. M_i indicates the number of input channels and M_s controls the number of channels at intermediate layers. For simplicity we consider a cube-shaped output with spatial size N_1 = N_2 = N_3 = N. For each intermediate cube the spatial size is indicated above and the number of channels below.

We now detail the three different blocks of operations used in the generator network.
Convolution block
A convolution block groups a sequence of three ordinary 3D convolution layers, each of them followed by a batch normalization and a leaky rectified linear unit activation. Considering M_in and M_out channels in the input and output respectively, the first convolution layer carries out the shift from M_in to M_out. The following two layers of the block have M_out channels in both the input and the output. The size of the kernels is 3 × 3 × 3 for the first two layers and 1 × 1 × 1 for the last one.
An upsampling performs a 3D nearest neighbor upsampling by a factor of 2 on each spatial dimension (i.e. each voxel is replicated 8 times).
Channel concatenation
This operation first applies a batch normalization and then concatenates the channels of two multi-channel volumes having the same spatial size. If the sizes differ, the biggest volume is cropped to the size of the smallest one.

The learnable parameters θ of the generator are: the convolution kernels and biases, and the batch normalization layers' weights, biases, means and variances. The training of these parameters is discussed in Section 5.
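To make the data flow concrete, the following PyTorch sketch mimics the architecture just described. It is a simplified illustration, not the released implementation: the channel widths (M_i = 4, M_s = 4), the leaky ReLU slope, and the center cropping used at concatenation time are our own assumptions (the actual model crops according to the shift compensation described below).

```python
# Minimal sketch of the multi-scale generator of Figure 2 (assumptions noted above).
import torch
import torch.nn as nn

def crop_to_min(a, b):
    """Crop two 5D feature volumes to their common spatial size."""
    s = [min(a.shape[i], b.shape[i]) for i in (2, 3, 4)]
    def crop(x):
        o = [(x.shape[i + 2] - s[i]) // 2 for i in range(3)]
        return x[:, :, o[0]:o[0]+s[0], o[1]:o[1]+s[1], o[2]:o[2]+s[2]]
    return crop(a), crop(b)

class ConvBlock(nn.Module):
    """Three 3D convolutions (3x3x3, 3x3x3, 1x1x1), each followed by
    batch normalization and a leaky ReLU."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(c_in, c_out, 3), nn.BatchNorm3d(c_out), nn.LeakyReLU(0.2),
            nn.Conv3d(c_out, c_out, 3), nn.BatchNorm3d(c_out), nn.LeakyReLU(0.2),
            nn.Conv3d(c_out, c_out, 1), nn.BatchNorm3d(c_out), nn.LeakyReLU(0.2))

    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    def __init__(self, K=5, M_i=4, M_s=4):
        super().__init__()
        self.K = K
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        # one convolution block per scale, each producing M_s channels
        self.blocks = nn.ModuleList([ConvBlock(M_i, M_s) for _ in range(K + 1)])
        # final single convolution layer mapping (K+1)*M_s channels to RGB
        self.to_rgb = nn.Conv3d((K + 1) * M_s, 3, 1)

    def forward(self, Z):
        # Z[K] is the coarsest noise, Z[0] the finest
        feat = self.blocks[self.K](Z[self.K])
        for k in range(self.K - 1, -1, -1):
            up = self.up(feat)                 # upsample the coarse branch
            cur = self.blocks[k](Z[k])         # process this scale's noise
            up, cur = crop_to_min(up, cur)     # channel concatenation rule
            feat = torch.cat([up, cur], dim=1)
        return self.to_rgb(feat)

# Example: noise sizes large enough for a small output cube
G = Generator()
Z = [torch.rand(1, 4, s, s, s) for s in (21, 13, 13, 13, 13, 13)]
v = G(Z)   # (1, 3, D, H, W) color solid
```

With K = 5, the concatenated tensor at the finest scale has (K + 1) M_s = 24 channels, which matches the narrow last layer reported in the experimental settings.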
Forward evaluation of the generator is deterministic and local, i.e. each value in the output only depends on a small number of neighboring values in the multi-scale input. By handling the noise inputs correctly, we can feed the network separately with two contiguous portions of noise to synthesize textures that can be tiled seamlessly. Current 2D CNN methods [ULVL16, LFY∗] do not control this dependency precisely. In our model, to generate a sample of size N × N × N, the size of the input at the k-th scale has to be (N_k + c_k) × (N_k + c_k) × (N_k + c_k), where c_k denotes the additional values along each spatial dimension required due to the dependency. The size N in any spatial dimension can be any positive integer (provided the memory is large enough for synthesizing the volume).

These additional values c_k depend on the network architecture. In our case, thanks to the symmetry of the generator, the coefficients c_k are the same along each spatial dimension. Each convolution block requires an additional support of two values on each side along each dimension, and each upsampling cuts down the dependency by two (taking the smallest following integer when the result is fractional). At scale k = 0, c_0 = 4. For subsequent scales, c_k = ⌈(c_{k−1} − 2)/2⌉ + 4, except at the last scale K, where there is only one convolution block and therefore c_K = ⌈(c_{K−1} − 2)/2⌉ + 2. For example, in order to generate a single voxel, the spatial dimensions of the noise inputs must be 9 for z_0, 11 for z_1, 13 for z_2 to z_4, and 9 for z_5, which totals 9380 random values.
On demand generation is a standard issue in procedural synthesis [Per85]. The purpose is to generate consistently any part of an infinite texture model. It enables the generation of small texture blocks separately, whose localization depends on the geometry of the object to be texturized, instead of directly generating a full volume containing the object. For procedural noise, this is achieved using a reproducible Pseudo-Random Number Generator (PRNG) seeded with spatial coordinates.

In our setting, we enforce spatial consistency by generating the multi-scale noise inputs using a xorshift PRNG algorithm [Mar03] seeded with values depending on the volumetric coordinates, the channel and the scale, similarly to [GLM17]. Thus, our model only requires a set of reference 3D coordinates and the desired size to generate spatially coherent samples. Given a reference coordinate n at the finest scale in any dimension, the corresponding coordinate at the k-th scale is computed as n_k = ⌊n/2^k⌋. These corresponding noise coordinates need to be aligned in order to ensure the coherence between samples generated separately.

Feeding the generator with the precise set of coordinates of the noise at each scale is only a first step to successfully synthesize compatible textures. Recall that the model is based on combinations of transformed noises at different scales (see Figure 2), therefore requiring special care regarding upsampling to preserve the coordinate alignment across scales, i.e. which coordinate n_k at scale k must be associated to a given coordinate n at the finest scale k = 0. Cropping the first value of an upsampled volume along a dimension shifts its content, pushing the rest of the values spatially. Depending on the coordinates of the reference voxel being synthesized, this shift of one position can disrupt the coordinate alignment with the subsequent scale. Therefore, the generator network has to compensate accordingly before each concatenation.
For K = 5, 2^K = 32 combinations of compensation shifts have to be properly applied along each dimension to synthesize a given voxel. In order to account for these compensation shifts, we make the generator network aware of the coordinates of the sample at hand. In our implementation the reference is the vertex of the sample closest to the origin. Given the final reference coordinate n of the voxel (in any spatial dimension), the generator deduces the set of shifts recursively from the relation n_{k−1} = 2 n_k + s_k, where n_k is the spatial reference coordinate used to generate the noise at scale k ∈ {1, . . . , K}, and s_k ∈ {0, 1} is the shift value used after the k-th upsampling operation. At evaluation time, the generator sequentially applies the set of shifts before every corresponding concatenation operation.
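The coordinate bookkeeping of this scheme fits in a few lines. The sketch below is illustrative rather than the authors' code: it derives the per-scale coordinates n_k and shift bits s_k from a finest-scale reference coordinate n, and seeds a xorshift-style PRNG in the spirit of [Mar03] from the voxel coordinates, channel and scale; the hashing constants are arbitrary choices of ours.

```python
def scale_coords_and_shifts(n, K):
    """n_k = n // 2**k for k = 0..K, and the shifts s_k in {0, 1}
    satisfying n_{k-1} = 2 * n_k + s_k."""
    coords = [n >> k for k in range(K + 1)]
    shifts = [coords[k - 1] - 2 * coords[k] for k in range(1, K + 1)]
    return coords, shifts

def xorshift32(x):
    """One round of Marsaglia's 32-bit xorshift."""
    x &= 0xFFFFFFFF
    x ^= (x << 13) & 0xFFFFFFFF
    x ^= x >> 17
    x ^= (x << 5) & 0xFFFFFFFF
    return x

def noise_value(x, y, z, channel, scale):
    """Reproducible uniform sample in [0, 1) for one noise voxel.
    The multiplicative constants are illustrative odd mixers."""
    seed = (x * 73856093) ^ (y * 19349663) ^ (z * 83492791) \
           ^ (channel * 2654435761) ^ (scale * 97531)
    return xorshift32(seed & 0xFFFFFFFF or 1) / 2**32

# e.g. for the voxel with finest-scale coordinate n = 45 and K = 5:
# coords = [45, 22, 11, 5, 2, 1] and shifts = [1, 0, 1, 1, 0]
```

Because the noise is a pure function of the absolute coordinates, two blocks generated independently from adjacent reference coordinates see consistent noise and therefore tile seamlessly.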
5. Training
Here we detail our approach to obtain the parameters θ_u that drive the generator network to synthesize solid textures specified by the example u. Like current texture networks methods, we leverage the power of existing training frameworks to optimize the parameters of the generator. Typically, an iterative gradient-based algorithm is used to minimize a loss function that measures how different the synthetic and target textures are.

However, a first challenge facing the training of the solid texture generator is to devise a discrepancy measure between the solid texture and the 2D examples. In Subsection 5.1 we propose a 3D slice-based loss function that collects the measurements produced by a set of 2D comparisons between 2D slices of the synthetic solid and the examples. We conduct the 2D comparisons similarly to the state-of-the-art methods, using the perceptual loss function [GEB15, JAFF16, UVL17].

The second challenge comes from the memory requirements during training. Typically, the optimization algorithm estimates a descent direction by applying back-propagation on the loss function evaluated on a batch of samples. In the case of solid textures, each volumetric sample occupies a large amount of memory, which makes batch processing impractical. Instead, we show in Subsection 5.2 that, thanks to the stationarity properties of our generative model, we can carry out the training using batches of single-slice solid samples.

For a color solid v ∈ R^{N_1 × N_2 × N_3 × 3}, we denote by v_{d,n} the n-th slice of v orthogonal to the d-th direction. Given a number D ≤ 3 of example images {u_1, . . . , u_D}, we propose the following slice-based loss

\[ \mathcal{L}\big(v \mid \{u_1,\dots,u_D\}\big) = \sum_{d=1}^{D} \frac{1}{N_d} \sum_{n=0}^{N_d-1} \mathcal{L}_2(v_{d,n}, u_d), \tag{1} \]

where L_2(·, u) is a 2D loss that computes the similarity between an image and the example u.

We use the 2D perceptual loss L_2 from [GEB15], which proved successful for training CNNs [JAFF16, ULVL16]. It compares the Gram matrices of the VGG-19 feature maps of the synthesized and example images. The feature maps result from the evaluation of the descriptor network D on an image, i.e. D : x ∈ R^{N × N × 3} ↦ {F^l(x) ∈ R^{N_l × M_l}}_{l ∈ L}, where L is the set of considered VGG-19 layers, each layer l having N_l spatial values and M_l channels. For each layer l, the Gram matrix G^l ∈ R^{M_l × M_l} is computed from the feature maps as

\[ G^{l}(x) = \frac{1}{N_l} F^{l}(x)^{T} F^{l}(x), \tag{2} \]

where ^T indicates the transpose of a matrix. The 2D loss between the input example u_d and a slice v_{d,n} is then defined as

\[ \mathcal{L}_2(v_{d,n}, u_d) = \sum_{l \in L} \frac{1}{M_l^2} \left\| G^{l}(v_{d,n}) - G^{l}(u_d) \right\|_F^2, \tag{3} \]

where ‖·‖_F is the Frobenius norm. Observe that the Gram matrices are computed along the spatial dimensions to take into account the stationarity of the texture. These Gram matrices encode both first and second order information of the feature distribution (covariance and mean).

Formally, training the generator G(Z | θ) with parameters θ corresponds to minimizing the expectation of the loss in Equation 1 given the set of examples {u_1, . . . , u_D},

\[ \theta_u \in \underset{\theta}{\operatorname{argmin}}\; \mathbb{E}_Z\big[\mathcal{L}\big(G(Z \mid \theta), \{u_1,\dots,u_D\}\big)\big], \tag{4} \]

where Z is a multi-scale noise, independent and identically distributed from a uniform distribution on [0, 1]. Note that each scale z_0, . . . , z_K of Z is stationary. The generator, on the other hand, induces a non-stationary behavior on the output due to the upsampling operations.
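As a concrete reference, the sketch below implements Equations (1)-(3) in PyTorch. The callable `descriptor` is assumed to return the list of feature maps F^l of an input image (one tensor of shape (1, M_l, H_l, W_l) per layer in L); the 1/M_l² layer weighting follows our reading of Equation (3).

```python
import torch

def gram(F):
    """Normalized Gram matrix G^l = F^T F / N_l of Equation (2)."""
    B, M, H, W = F.shape
    F = F.reshape(B, M, H * W)
    return torch.bmm(F, F.transpose(1, 2)) / (H * W)

def loss_2d(descriptor, img, grams_ref):
    """Perceptual 2D loss of Equation (3) against precomputed example Grams."""
    total = 0.0
    for F_l, G_ref in zip(descriptor(img), grams_ref):
        G = gram(F_l)
        total = total + ((G - G_ref) ** 2).sum() / G.shape[-1] ** 2
    return total

def slice_loss(descriptor, v, examples):
    """Slice-based loss of Equation (1); v is an (N1, N2, N3, 3) color solid,
    examples a list of (1, 3, H, W) images, one per constrained direction."""
    total = 0.0
    for d, u in enumerate(examples):
        grams_ref = [gram(F_l) for F_l in descriptor(u)]
        slices = v.movedim(d, 0)                           # (N_d, ., ., 3)
        acc = 0.0
        for n in range(slices.shape[0]):
            img = slices[n].permute(2, 0, 1).unsqueeze(0)  # (1, 3, H, W)
            acc = acc + loss_2d(descriptor, img, grams_ref)
        total = total + acc / slices.shape[0]
    return total
```

Precomputing the example Gram matrices once per direction, as above, avoids re-evaluating the descriptor on the (fixed) examples at every iteration.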
When upsampling a stationary signal by a factor 2 with nearest neighbor interpolation, the resulting process is only invariant to translations by multiples of two. Because our model contains K volumetric upsampling operations, the process is translation invariant by multiples of 2^K values on each axis. Considering the d-th axis, for any coordinate at the scale of the generated sample n = 2^K p + q with q ∈ {0, . . . , 2^K − 1}, the statistics of the slice G(Z | θ)_{d,n} only depend on the value of q, therefore

\[ \mathbb{E}_Z\big[\mathcal{L}_2(G(Z \mid \theta)_{d,n}, u_d)\big] = \mathbb{E}_Z\big[\mathcal{L}_2(G(Z \mid \theta)_{d,q}, u_d)\big]. \tag{5} \]

Assuming N_d is a multiple of 2^K, we have

\[ \mathbb{E}_Z\big[\mathcal{L}(G(Z \mid \theta), \{u_1,\dots,u_D\})\big] = \sum_{d=1}^{D} \frac{1}{N_d} \sum_{n=0}^{N_d-1} \mathbb{E}_Z\big[\mathcal{L}_2(G(Z \mid \theta)_{d,n}, u_d)\big] = \sum_{d=1}^{D} \frac{1}{2^K} \sum_{q=0}^{2^K-1} \mathbb{E}_Z\big[\mathcal{L}_2(G(Z \mid \theta)_{d,q}, u_d)\big]. \tag{6} \]

As a consequence, instead of using N_d slices per direction, the generator network could be trained using only a set of 2^K contiguous slices along each constrained direction.

The GPU memory is a limiting factor during the training process, and even cutting down the size of the samples to 2^K slices restricts the training slice resolution.
For example, training a network for a texture output of size 512 × 512 × 32 with K = 5 remains too demanding. We therefore also randomize the slice position: considering a coordinate n = 2^K p + Q_d along the d-th axis with a fixed p ∈ {0, . . . , N_d/2^K − 1} and with Q_d ∈ {0, . . . , 2^K − 1} randomly drawn from a discrete uniform distribution,

\[ \mathbb{E}_{Z,Q_d}\big[\mathcal{L}_2(G(Z \mid \theta)_{d,Q_d}, u_d)\big] = \frac{1}{2^K} \sum_{q=0}^{2^K-1} \mathbb{E}_Z\big[\mathcal{L}_2(G(Z \mid \theta)_{d,q}, u_d)\big]. \tag{7} \]

Then, using doubly stochastic sampling (noise input values and output coordinates), we have

\[ \mathbb{E}_Z\big[\mathcal{L}(G(Z \mid \theta), \{u_1,\dots,u_D\})\big] = \sum_{d=1}^{D} \mathbb{E}_{Z,Q_d}\big[\mathcal{L}_2(G(Z \mid \theta)_{d,Q_d}, u_d)\big], \tag{8} \]

which means that we can train the generator using only D single-slice volumes oriented according to the constrained directions. Note that the whole volume model is impacted, since the convolution weights are shared by all slices.

The proposed scheme saves computation time during training and, more importantly, it also reduces the required amount of memory. In this setting we can use resolutions of up to 1024 values per side during training (examples and solid samples), a resolution significantly larger than the ones reached in the by-example solid texture synthesis literature, which are usually limited to 256 × 256.
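In code, the resulting training estimator is compact. The sketch below assumes two helpers that are not given here: make_noise(d, Q), returning a multi-scale noise input whose generated output is a single slice orthogonal to direction d at offset Q, and loss_2d, the perceptual comparison of Equation (3).

```python
import torch

def single_slice_loss(generator, make_noise, loss_2d, examples, K=5):
    """One doubly stochastic estimate of Equation (8)."""
    total = 0.0
    for d, u in enumerate(examples):
        Q = int(torch.randint(0, 2 ** K, (1,)))   # random offset in {0,...,2^K - 1}
        v_slice = generator(make_noise(d, Q))     # single-slice solid texture
        total = total + loss_2d(v_slice, u)
    return total
```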
6. Results

6.1. Experimental settings
Unless otherwise specified, all the results in this section were generated using the following settings.
Generator network
We set the number of scales to 6, i.e. K = 5, and use M_s = 4 as the channel step across scales, which results in the last layer being quite narrow (6 M_s = 24 channels) and the whole network compact, with fewer than 10^5 parameters. We include a batch normalization operation after every convolution layer and before the concatenations. As mentioned in previous methods [ULVL16], we noticed that such a strategy helps stabilize the training process.
Descriptor network

Following Gatys et al. [GEB15], we use a truncated VGG-19 [SZ14] as our descriptor network, with padded convolutions and average pooling. The layers considered for the loss are: relu1_1, relu2_1, relu3_1, relu4_1 and relu5_1.
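One way to assemble such a descriptor with standard tools is sketched below, using torchvision's pre-trained VGG-19 as a stand-in for the Bethge-lab weights used in the paper, swapping max pooling for average pooling, and capturing the five listed layers with forward hooks.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def build_descriptor():
    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
    for p in vgg.parameters():
        p.requires_grad_(False)
    # replace max pooling by average pooling, as in [GEB15]
    for i, m in enumerate(vgg):
        if isinstance(m, nn.MaxPool2d):
            vgg[i] = nn.AvgPool2d(kernel_size=2, stride=2)
    # indices of relu1_1, relu2_1, relu3_1, relu4_1, relu5_1 in vgg19.features
    layer_ids = [1, 6, 11, 20, 29]
    feats = []
    def hook(_module, _inputs, output):
        feats.append(output)
    for i in layer_ids:
        vgg[i].register_forward_hook(hook)
    def descriptor(x):        # x: (1, 3, H, W)
        feats.clear()
        vgg(x)
        return list(feats)    # feature maps, ordered from relu1_1 to relu5_1
    return descriptor
```

Gradients still flow through the activations to the descriptor's input, which is what training the generator requires; only the VGG-19 weights are frozen.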
Training

We implemented our approach using PyTorch (code available at http://github.com/JorgeGtz/SolidTextureNets) and we use the pre-trained parameters for VGG-19 available from the BETHGE LAB [GEB15, GEB16]. We optimize the parameters of the generator network using the Adam algorithm [KB15] with a learning rate of 0.1 during 3000 iterations. Figure 3 shows the value of the empirical estimation of L during the training for three of the examples shown below in Figure 7 (histology, cheese and granite). We use batches of 10 samples per slicing direction. We compute the gradients individually for each sample in the batch, which slows down the training process but allows us to concentrate the available memory on the resolution of the samples. With these settings and using 3 slicing directions, the training takes around 1 hour for a 128 training resolution (i.e. the size of the example(s) and generated slices), around 3 hours for 256 and around 12 hours for 512, using one GPU Nvidia GeForce GTX 1080 Ti.
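A minimal version of this training loop, reusing the helpers assumed in the previous sketches, could look as follows; the per-sample backward calls trade speed for memory, as described above.

```python
import torch

def train(generator, make_noise, loss_2d, examples, iters=3000, batch=10, K=5):
    """Adam with lr 0.1 for 3000 iterations, 10 samples per slicing
    direction, gradients accumulated sample by sample to save memory."""
    opt = torch.optim.Adam(generator.parameters(), lr=0.1)
    for _ in range(iters):
        opt.zero_grad()
        for _ in range(batch):
            for d, u in enumerate(examples):
                Q = int(torch.randint(0, 2 ** K, (1,)))
                loss = loss_2d(generator(make_noise(d, Q)), u) / batch
                loss.backward()        # accumulate per-sample gradients
        opt.step()
    return generator
```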
Figure 3:
Value of the 3D empirical loss during the training of the generator for the textures histology, cheese and granite of Figure 7.
Synthesis
In order to synthesize large volumes of texture, it is more time efficient to choose the size of the building blocks in a way that best exploits parallel computation. Considering computational complexity alone, synthesizing large building blocks of texture at once is also more efficient given the spatial dependency (indicated by the coefficients c_k) shared by neighboring voxels. In order to highlight the seamless tiling of on demand generated blocks of texture, most of the samples shown in this work are built by assembling blocks of 32³ voxels. However, the generator is able to synthesize box-shaped/rectangular samples of any size, given enough memory is available.

Figures 4 and 5 depict how the small pieces of texture tile perfectly to form a bigger texture. It takes nearly 12 milliseconds to generate a block of texture of 32³ voxels on a Nvidia GeForce RTX 2080 GPU. However, we can use bigger elemental blocks: e.g. a cube of 64³ voxels takes ∼24 milliseconds and one of 128³ voxels takes ∼128 milliseconds. For reference, with the method of Dong et al. [DLTD08] it takes 220 milliseconds to synthesize a 64³ volume and 1.7 seconds to synthesize a 128³ volume.
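The tiling itself is straightforward once the generator is wrapped behind a block query; the sketch below assumes a hypothetical generate_block(coords, b) returning the (3, b, b, b) block anchored at the given reference coordinate.

```python
import torch

def assemble_volume(generate_block, size=128, b=32):
    """Tile a (3, size, size, size) volume from independently generated blocks."""
    out = torch.empty(3, size, size, size)
    for x in range(0, size, b):
        for y in range(0, size, b):
            for z in range(0, size, b):
                # blocks tile seamlessly because the noise is a
                # deterministic function of the absolute coordinates
                out[:, x:x+b, y:y+b, z:z+b] = generate_block((x, y, z), b)
    return out
```

Since each block only depends on its own seeded noise, the loop parallelizes trivially across blocks.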
Figure 4:
Left: an example texture, used along the three orthogonal axes. Right: contiguous cubes of voxels generated on demand to form a larger texture. The gaps are added to better depict the aggregation.
Figure 5:
Left: an example texture, used for training the generator along the three orthogonal axes. Right: assembled blocks of different sizes generated on demand. Note that it is possible to generate blocks of size down to a single voxel in any direction.

In this section we highlight the various properties of the proposed method and compare them with state-of-the-art approaches.
Texturing mesh surfaces
Figure 6 exhibits the application of textures generated with our model to texture 3D mesh models. In this case we generate a solid texture with a fixed size and load it in OpenGL as a regular 3D texture with bilinear filtering. Solid textures avoid the need for a surface parametrization and can be used on complex surfaces without creating artifacts.
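For reference, uploading a generated volume as an OpenGL 3D texture with linear filtering can be done along these lines with PyOpenGL (a sketch assuming a valid GL context; voxels is a (D, H, W, 3) uint8 array):

```python
from OpenGL.GL import (glGenTextures, glBindTexture, glTexImage3D,
                       glTexParameteri, GL_TEXTURE_3D, GL_RGB8, GL_RGB,
                       GL_UNSIGNED_BYTE, GL_TEXTURE_MIN_FILTER,
                       GL_TEXTURE_MAG_FILTER, GL_LINEAR)

def upload_volume(voxels):
    """Upload a (D, H, W, 3) uint8 volume as a GL_TEXTURE_3D with linear filtering."""
    tex = glGenTextures(1)
    glBindTexture(GL_TEXTURE_3D, tex)
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
    d, h, w, _ = voxels.shape
    glTexImage3D(GL_TEXTURE_3D, 0, GL_RGB8, w, h, d, 0, GL_RGB,
                 GL_UNSIGNED_BYTE, voxels)
    return tex
```

The fragment shader then samples the texture with the mesh's 3D coordinates directly, with no UV parametrization involved.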
Single example setting
Figures 7 and 8 show samples synthesized with our method from a set of examples depicting physical materials. Considering their isotropic structure, we train the generator network using a single example to designate the appearance along the three orthogonal directions, i.e. u_1 = u_2 = u_3. In the first column we show a generated sample of size 512³ voxels built by assembling blocks of 32³ voxels generated using on demand evaluation. The second column is the example image of size 512² pixels, columns 3 to 5 show the middle slices of the generated cube across the three considered directions, and the last column shows a slice extracted along an oblique direction with a 45° angle. These examples illustrate the capacity of the model to infer a plausible 3D structure from the 2D features present in isotropic example images. Observe that a slice across an oblique direction still displays a conceivable structure given the examples. They also demonstrate the spatial consistency of on demand evaluation. Regarding the visual quality, notice that the model successfully reproduces the patterns' structure while also capturing the richness of colors and variations. The quality of the slices is comparable to that of the state-of-the-art 2D methods [ULVL16, UVL17, LFY∗17].
Existence of a solution

The example texture used in Figure 9 is isotropic (an arrangement of red shapes with a green spot inside), but the volumetric material it depicts is not (the green stem being outside the pepper). Training the generator using three orthogonal directions assumes 3D isotropy, and thus the outcome is a solid texture where the patterns are isotropic through the whole volume. This creates some new patterns in the slices, which makes them somewhat different from the example (red shapes without a green spot inside). Actually, the generated volume just does not make sense physically, as one always obtains full peppers after slicing them. This example shows that, for a given texture example u and a number of directions D > 1, a solid texture complying with the example along all the considered directions might not exist. This existence issue has been discussed before [KFCO∗07, DLTD08], and here we aim to extend the discussion.
Texture mapping on 3D mesh models. The example texture used to train the generator is shown in the upper left corner of each object. When using solid textures the mapping does not require a parametrization, as they are defined in the whole 3D space. This prevents any mapping-induced artifacts.
Sources: the 'duck' model comes from Keenan's 3D Model Repository, the 'mountain' and 'hand' models from free3d.com, and the tower and the vase from turbosquid.com.

Let us consider for instance the isotropic example shown in Figure 10, where the input image contains only discs at a given scale (e.g. a few pixels in diameter). When slicing a volume containing spheres with the same diameter δ, the obtained image will necessarily contain objects with various diameters, ranging from 0 to δ. This seems to be a paradox natural to the 2D-3D extrapolation. It demonstrates that for some 2D textures an isotropic solid version might be impossible and, conversely, that the example texture and the imposed directions must be chosen carefully given the goal 3D structure.

This also might have dramatic consequences regarding convergence issues during optimization. In global optimization methods
[KFCO∗07, CW10], where patches from the example are sequentially copied view per view, progressing the synthesis in one direction can create new features in the other directions, thus potentially preventing convergence. In contrast, our method seeks a solid texture whose statistics are as close as possible to the examples, without requiring a perfect match. This always ensured convergence during training in all of our experiments. An example of this is illustrated in the first row of Figure 12, where the example views are incompatible given the proposed 3D configuration: the optimization procedure converges during training and the trained generator is able to synthesize a solid texture; however, the result is clearly a tradeoff which mixes the different contradictory orientations. This also contrasts with common statistical texture synthesis approaches that are rather based on constrained optimization to guarantee statistical matching, for instance by projecting patches [GRGH17], histogram matching [RPDB12], or moment matching [PS00].
Figure 7:
Synthesis of isotropic textures. We train the generator network using the example in the second column along D = 3 directions. The cubes in the first column are generated samples of size 512³ built by assembling blocks of 32³ voxels generated using on demand evaluation. Subsequent columns show the middle slices of the generated cube across the three considered directions and a slice extracted along an oblique direction with a 45° angle. The trained models successfully reproduce the visual features of the example in 3D. Rows: granite, marble, beef, cheese, histology.

Figure 8:
Synthesis of isotropic textures (same presentation as in Figure 7). While generally satisfactory, the examples in the first and third rows have a slightly degraded quality. In the first row the features have more rounded shapes than in the example, and in the third row we observe high frequency artifacts.
Constraining directions
While for isotropic textures constraining D = 3 identical views is appropriate, for anisotropic textures better results can be obtained by constraining only D < 3 directions (see Figure 11). Along an unconstrained direction, the model does not reproduce the features of the example texture (see column v_{3,N} of Figure 11), but rather color structures that fulfill the visual requirements for the other considered directions.
Figure 9:
2D to 3D isotropic patterns. The example texture (top-left) depicts a pattern that is approximately isotropic in 2D, but the material it depicts is not. Training the generator using the example along three orthogonal directions results in solid textures that are isotropic in the three dimensions (top-right). Here, the red and green patterns vary isotropically in the third dimension; this creates a bigger variation in their size and makes some slices contain red patterns that lack the green spot. This is a case where the slices of the solid texture (bottom) cannot match exactly the patterns of the 2D example, thus not complying with the example in the way a 2D algorithm would.
Figure 10:
Illustration of a solid texture whose cross-sections cannot comply with the example along three directions. Given a 2D example formed by discs of a fixed diameter (upper left), a direct isotropic 3D extrapolation would be a cube formed by spheres of the same diameter. Slicing that cube would result in images with discs of different diameters. The cube in the upper right is generated after training our network with the 2D example along the three orthogonal axes. The bottom row shows cross-sections along the axes; all of them present discs of varying diameters, thus failing to look like the example.

Additionally, pattern compatibility across the different directions is essential to obtain coherent results. In the examples of Figure 12 the generator was trained with the same image but in a different orientation configuration. In the top row example, no 3D arrangement of the patterns can comply with the orientations of the three examples. Conversely, the configuration in the bottom row can be reproduced in 3D, thus generating more meaningful results. All this has to be taken into account when choosing the set of training directions given the expected 3D texture structure. Observe that the value of the loss at the end of the training gives a hint on which configuration works better. This can be exploited to automatically find the best training configuration for a given set of examples.

These results bring some light to the scope of the slice-based formulation for solid texture synthesis using 2D examples. This formulation is best suited for textures depicting 3D isotropic materials, for which we obtain an agreement with the example's patterns comparable with 2D state-of-the-art methods. For most anisotropic textures we can usually obtain high quality results by considering only two directions. Finally, for textures with patterns that are only isotropic in 2D, using more than one direction inevitably creates new patterns.
Diversity

A required feature of texture synthesis models is the ability to generate diverse samples. Ideally the generated samples are different from each other and from the example itself while still sharing its visual characteristics. Additionally, depending on the texture, it might be desired that the patterns vary spatially inside each single sample. Yet, observe that many methods in the 2D texture synthesis literature generate images that are local copies of the input image [WL00, KEBK05], which strongly limits the diversity. As reported in Gutierrez et al. [GRGH17], the (unwanted) optimal solution of methods based on patch optimization is the input image itself. For most of these methods, though, the local copies are sufficiently randomized to deliver enough diversity. Variability issues have also been reported in the literature for texture generation based on CNNs [UVL17, LFY∗17]: the network can learn to synthesize a single sample that induces a low value of the perceptual loss while disregarding the random inputs.

When dealing with solid texture synthesis from 2D examples, such a trivial optimal solution only arises when considering one direction, where the example itself is copied along that direction. Yet, there is no theoretical guarantee that prevents the generator network from copying large pieces of the example, as it has been shown that deep generative networks can memorize an image [LVU18]. However, the compactness of the architecture and the stochastic nature of the proposed model make it very unlikely. In practice, we do not observe repetition among the samples generated with our trained models, even when pursuing the optimization long after the visual convergence (which generally occurs after 1000 iterations, see Figure 3). This is consistent with the results of Ulyanov et al. [ULVL16], the 2D architecture that inspired ours, where diversity is not an issue. One explanation for this difference with other methods may be that the architectures that exhibit a loss of diversity process an input noise that is small compared to the generated output (e.g. 0.13% in [UVL17]) and which is easier to ignore.
Figure 11:
Training the generator using two or three directions for anisotropic textures. The first column shows generated samples built by assembling blocks of voxels generated using on demand evaluation. The second column illustrates the training configuration, i.e. which axes are considered and the orientation used. Subsequent columns show the middle slices of the generated cube across the three considered directions. The top two rows show that for some examples considering only two directions allows the model to better match the features along the considered directions. The bottom rows show examples where the appearance along one direction might not be important. Rows: soil, soil, brick wall, cobble wall.
13% in [UVL17])and which is easier to ignore. On the contrary, our generative net-work receives an input that accounts for roughly 1.14 times the sizeof the output. Figure 13 demonstrates the capacity of our model togenerate diverse samples from a single trained generator. It showsthree generated solid textures along with their middle slices. To fa-cilitate the comparison, it includes an a posteriori correspondencemap which highlights spatial similarity by forming smooth regions. In all cases we obtain noisy maps which means that the slices donot repeat arrangements of patterns or colors.
Multiple examples setting
As already discussed in the work of Kopf et al. [KFCO∗07] and earlier, it appears that most solid textures can only be modeled from a single example along different directions. In the literature, to the best of our knowledge, only one success case of a 3D texture using two different examples has been proposed [KFCO∗07]. Figure 14 shows our results when training with two different examples; for each case we experiment with and without preprocessing the examples to match their color statistics (see the caption of Figure 14).
Figure 12:
Importance of the compatibility of the examples. In this experiment, two generators are trained with the same image along three directions, but with two different orientation configurations. The first column shows generated samples built by assembling blocks of voxels generated using on demand evaluation. The second column illustrates the training configuration, i.e. for each direction the orientation of the example shown in the third column. Subsequent columns show the middle slices of the generated cube across the three constrained directions. Finally, the rightmost column gives the empirical loss value at the last iteration. In the first row, no 3D arrangement of the patterns can comply with the orientations of the three examples. Conversely, the configuration in the second row can be reproduced in 3D, thus generating more meaningful results. The lower value of the training loss for this configuration reflects the better reproduction of the patterns of the example.

The results in Figures 7 and 8 prove the capability of the proposed model to synthesize photo-realistic textures. This is an important improvement with respect to classical on demand methods based on Gabor/LRP-noise. High resolution examples are important in order to obtain more detailed textures. In previous high quality methods
[KFCO∗07, CW10] the computation times increase substantially with the size of the example, so examples were limited to 256 pixels. Furthermore, the empirical histogram matching steps create a bottleneck for parallel computation. Our method takes a step forward by allowing higher resolution example textures, depending only on the memory of the GPU used.

We compare the visual quality of our results with the two existing methods that seem to produce the best results: Kopf et al. [KFCO∗07] and Chen et al. [CW10]. Figure 15 shows some samples obtained from the respective articles or websites side by side with results using our method. The most salient advantage of our method is the ability to better capture high frequency information, making the structures in the samples sharper and more photo-realistic. Considering voxel statistics, i.e. capturing the richness of the example, both our method and that of Chen et al. [CW10] seem to obtain a better result than the method of Kopf et al. [KFCO∗07], with examples of 256 pixels or smaller. We observe that the visual quality of the textures generated with our method deteriorates when using small examples. This may be due to the descriptor network, which is pre-trained on bigger images.

We do not consider the method of Dong et al. [DLTD08] for a visual quality comparison, as their pre-computation of candidates limits the richness of information, which yields lower quality results. However, thanks to its on-demand evaluation, their model greatly surpasses the computation speed of the other methods. Yet, as detailed before, our method is faster during synthesis while achieving better visual quality. Besides, our computation time does not depend on the resolution of the examples.

In Figure 16 we show some of our results used for texturing a complex surface and compare them to the results of Kopf et al. [KFCO∗07]. Regarding memory, our compact generator (fewer than 10^5 parameter values to store) stands close to the memory footprint of a patch-based approach working with a 170 × 170 color pixels input (i.e. 8.67 × 10^4 values if all the patches are used).
Figure 13:
Diversity among generated samples. We compare the middle slices along the three axes of three generated textures. The comparison consists in finding, for every pixel, the pixel with the most similar neighborhood in the other image and constructing a correspondence map from its coordinates. The smooth results on the diagonal occur when comparing a slice to itself. The stochasticity of the remaining results means that the compared slices do not share a similar arrangement of patterns and colors.
7. Limitations and future work

Long distance correlation
As can be observed in the brick wall texture in Figure 11 and in the diagonal texture in Figure 12, our model is less successful at preserving the alignment of long patterns in the texture. This limitation is also observed in the second row of Figure 11, where the size of the objects in the synthesized samples does not match that of the example, again due to the overlooked long distance correlations. One possible explanation comes from the fixed receptive field of VGG-19 as descriptor network: it is likely that it only sees some local patterns, which results in breaking long patterns into pieces. A possible solution could be to use more scales in the generator network, similar to using larger patches in patch-based methods. Another possible improvement to explore is to explicitly construct our 2D loss L_2 to incorporate those long distance correlations, as in [LGX16, SCO17].
Figure 14:
Anisotropic texture synthesis using two examples. The first columns show the two examples used and the training configuration, i.e. how the images are oriented for each view. The last column shows a sample synthesized using the trained generator. For each example, we experiment with (✓) and without (✗) preprocessing the example images to match color statistics, by performing a histogram matching (HM) on each color channel independently. We observe favorable results particularly when the colors of both examples are close.
Constrained directions

We observed that training the generator with two instead of three constrained directions results in an unsatisfying texture along the unconsidered direction, while improving the visual quality along the two constrained directions for anisotropic textures (see Figure 11). It would be interesting to explore a middle point between letting the algorithm infer the structure along one direction and constraining it.
Visual quality
Although our method delivers high quality results for a varied set of textures, it still presents some visual flaws that we think are independent of the existence issue. In textures like the pebbles and grass of Figure 8, the synthesized sample presents over-simplified versions of the example's features. Although not detailed in the articles, similar flaws can be observed with the available codes of [ULVL16, UVL17, LFY∗17].
Non-stationary textures

The perceptual loss in Equation (3) is designed for traditional texture synthesis, where the examples are stationary. An interesting problem is to consider non-stationary textures, as in the recent method of Zhou et al. [ZZB∗18].
Real time rendering

The trained generator can be integrated into a fragment shader to generate the visible values of a 3D model thanks to its on demand capability. Note however that on-the-fly filtering of the generated solid texture is a challenging problem that is not addressed in this work.
8. Conclusion
The main goal of this paper was to address the problem of example-based solid texture synthesis by means of a convolutional neural network. First, we presented a simple and compact generative network capable of synthesizing portions of infinitely extendable solid texture. The parameters of this 3D generator are stochastically optimized using a pre-trained 2D descriptor network and a slice-based 3D objective function. The complete framework is efficient both during training and at evaluation time. The training can be performed at high resolution, and textures of arbitrary size can be synthesized on demand. This method is capable of achieving high quality results on a wide set of textures. We showed the outcome on textures with varying levels of structure and with isotropic and anisotropic arrangements.
Figure 15:
Comparison with the existing methods that produce the best visual quality. The last row shows the results using the proposed method. Our method is better at reproducing the statistics of the example compared to [KFCO∗07] and better at capturing high frequencies compared to both methods.
We demonstrated that, although solid texture synthesis from a single example image is an intricate problem, our method delivers compelling results given the desired look imposed via the 3D loss function.

The second aim of this study was to achieve on demand synthesis, which, to the best of our knowledge, no other method based on neural networks is capable of. The on demand evaluation capability of the generator allows it to be integrated with a 3D graphics renderer to replace the use of 2D textures on surfaces, thus eliminating the possible accompanying artifacts. The techniques proposed for training and evaluation can be extended to any fully convolutional generative network. We observed some limitations of our method, mainly the lack of control over the directions not considered during training. Using multiple examples could complement the training by giving information about the desired aspect along different directions. We aim to further study the limits of solid texture synthesis from multiple sources, with the goal of obtaining an upgraded framework better capable of simulating real life objects.
Acknowledgments
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), RGPIN-2015-06025. This project has been carried out with support from the French State, managed by the French National Research Agency (ANR-16-CE33-0010-01). This study has also been carried out with financial support from the CNRS supporting grant PEPS 2018 I3A "3DTextureNets". In like manner, we acknowledge the support of CONACYT. Bruno Galerne acknowledges the support of NVIDIA Corporation with the donation of a Titan Xp GPU used for this research. The authors would like to thank Loïc Simon for fruitful discussions on 3D rendering, and Guillaume-Alexandre Bilodeau for giving us access to one of his GPUs.
Figure 16: Comparison of our approach with Kopf et al. [KFCO∗07].

References

[BJV17] Bergmann U., Jetchev N., Vollgraf R.: Learning texture manifolds with the periodic spatial GAN. In ICML (2017), pp. 469–477.
[BK88] Bourlard H., Kamp Y.: Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics 59, 4 (1988), 291–294. doi:10.1007/BF00332918.
[BM17] Berger G., Memisevic R.: Incorporating long-range consistency in CNN-based texture generation. ICLR (2017).
[CW10] Chen J., Wang B.: High quality solid texture synthesis using position and index histogram matching. Vis. Comput. 26, 4 (2010), 253–262. doi:10.1007/s00371-009-0408-3.
[De 97] De Bonet J. S.: Multiresolution sampling procedure for analysis and synthesis of texture images. In SIGGRAPH (1997), ACM, pp. 361–368. doi:10.1145/258734.258882.
[DGF98] Dischler J. M., Ghazanfarpour D., Freydier R.: Anisotropic solid texture synthesis using orthogonal 2D views. Computer Graphics Forum 17, 3 (1998), 87–95. doi:10.1111/1467-8659.00256.
[DLTD08] Dong Y., Lefebvre S., Tong X., Drettakis G.: Lazy solid texture synthesis. In EGSR (2008), pp. 1165–1174. doi:10.1111/j.1467-8659.2008.01254.x.
[GCC∗17] Gwak J., Choy C. B., Chandraker M., Garg A., Savarese S.: Weakly supervised 3D reconstruction with adversarial constraint. In International Conference on 3D Vision (2017), pp. 263–272. doi:10.1109/3DV.2017.00038.
[GD95] Ghazanfarpour D., Dischler J.: Spectral analysis for automatic 3-D texture generation. Computers & Graphics 19, 3 (1995), 413–422. doi:10.1016/0097-8493(95)00011-Z.
[GEB15] Gatys L., Ecker A. S., Bethge M.: Texture synthesis using convolutional neural networks. In NIPS (2015), pp. 262–270.
[GEB16] Gatys L. A., Ecker A. S., Bethge M.: Image style transfer using convolutional neural networks. In CVPR (2016), pp. 2414–2423.
[GLLD12] Galerne B., Lagae A., Lefebvre S., Drettakis G.: Gabor noise by example. ACM Trans. Graph. 31, 4 (2012), 73:1–73:9. doi:10.1145/2185520.2185569.
[GLM17] Galerne B., Leclaire A., Moisan L.: Texton noise. Computer Graphics Forum 36, 8 (2017), 205–218. doi:10.1111/cgf.13073.
[GLR18] Galerne B., Leclaire A., Rabin J.: A texture synthesis model based on semi-discrete optimal transport in patch space. SIAM Journal on Imaging Sciences 11, 4 (2018), 2456–2493. doi:10.1137/18M1175781.
[GPAM∗14] Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y.: Generative adversarial nets. In NIPS (2014), pp. 2672–2680.
[GRGH17] Gutierrez J., Rabin J., Galerne B., Hurtut T.: Optimal patch assignment for statistically constrained texture synthesis. In SSVM (2017), pp. 172–183. doi:10.1007/978-3-319-58771-4_14.
[GSV∗14] Gilet G., Sauvage B., Vanhoey K., Dischler J.-M., Ghazanfarpour D.: Local random-phase noise for procedural texturing. ACM Trans. Graph. 33, 6 (2014), 195:1–195:11. doi:10.1145/2661229.2661249.
[HB95] Heeger D. J., Bergen J. R.: Pyramid-based texture analysis/synthesis. In SIGGRAPH (1995), ACM, pp. 229–238. doi:10.1145/218380.218446.
[JAFF16] Johnson J., Alahi A., Fei-Fei L.: Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision (2016), pp. 694–711. doi:10.1007/978-3-319-46475-6_43.
[JREM∗16] Jimenez Rezende D., Eslami S. M. A., Mohamed S., Battaglia P., Jaderberg M., Heess N.: Unsupervised learning of 3D structure from images. In NIPS (2016), pp. 4996–5004.
[KB15] Kingma D. P., Ba J.: Adam: A method for stochastic optimization. In ICLR (2015).
[KEBK05] Kwatra V., Essa I., Bobick A., Kwatra N.: Texture optimization for example-based synthesis. In SIGGRAPH (2005), ACM, pp. 795–802. doi:10.1145/1186822.1073263.
[KFCO∗07] Kopf J., Fu C., Cohen-Or D., Deussen O., Lischinski D., Wong T.: Solid texture synthesis from 2D exemplars. In SIGGRAPH (2007), ACM. doi:10.1145/1275808.1276380.
[LeC87] LeCun Y.: Modèles connexionnistes de l'apprentissage. PhD thesis, Université Paris 6, 1987.
[LFY∗17a] Li Y., Fang C., Yang J., Wang Z., Lu X., Yang M.-H.: Diversified texture synthesis with feed-forward networks. In CVPR (2017), pp. 266–274. doi:10.1109/CVPR.2017.36.
[LFY∗17b] Li Y., Fang C., Yang J., Wang Z., Lu X., Yang M.-H.: Universal style transfer via feature transforms. In NIPS (2017), pp. 386–396.
[LGX16] Liu G., Gousseau Y., Xia G.: Texture synthesis through convolutional neural networks and spectrum constraints. In ICPR (2016), pp. 3234–3239. doi:10.1109/ICPR.2016.7900133.
[LH05] Lefebvre S., Hoppe H.: Parallel controllable texture synthesis. In SIGGRAPH (2005), ACM, pp. 777–786. doi:10.1145/1186822.1073261.
[LVU18] Lempitsky V., Vedaldi A., Ulyanov D.: Deep image prior. In CVPR (2018), pp. 9446–9454. doi:10.1109/CVPR.2018.00984.
[Mar03] Marsaglia G.: Xorshift RNGs. Journal of Statistical Software 8, 14 (2003), 1–6. doi:10.18637/jss.v008.i14.
[Pea85] Peachey D. R.: Solid texturing of complex surfaces. SIGGRAPH (1985), 279–286. doi:10.1145/325165.325246.
[Per85] Perlin K.: An image synthesizer. SIGGRAPH (1985), 287–296. doi:10.1145/325165.325247.
[PS00] Portilla J., Simoncelli E. P.: A parametric texture model based on joint statistics of complex wavelet coefficients. IJCV 40, 1 (2000), 49–71. doi:10.1023/A:1026553619983.
[QhY07] Qin X., Yang Y.-H.: Aura 3D textures. Transactions on Visualization and Computer Graphics 13, 2 (2007), 379–389. doi:10.1109/TVCG.2007.31.
[RMC16] Radford A., Metz L., Chintala S.: Unsupervised representation learning with deep convolutional generative adversarial networks. ICLR (2016).
[RPDB12] Rabin J., Peyré G., Delon J., Bernot M.: Wasserstein barycenter and its application to texture mixing. In SSVM (2012), pp. 435–446. doi:10.1007/978-3-642-24785-9_37.
[SCO17] Sendik O., Cohen-Or D.: Deep correlations for texture synthesis. ACM Trans. Graph. 36, 5 (2017), 161:1–161:15. doi:10.1145/3015461.
[SZ14] Simonyan K., Zisserman A.: Very deep convolutional networks for large-scale image recognition. CoRR (2014). arXiv:1409.1556.
[TBD18] Tesfaldet M., Brubaker M. A., Derpanis K. G.: Two-stream convolutional networks for dynamic texture synthesis. In CVPR (2018), pp. 6703–6712. doi:10.1109/CVPR.2018.00701.
[ULVL16] Ulyanov D., Lebedev V., Vedaldi A., Lempitsky V.: Texture networks: Feed-forward synthesis of textures and stylized images. In ICML (2016), pp. 1349–1357.
[UVL17] Ulyanov D., Vedaldi A., Lempitsky V.: Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. In CVPR (2017), pp. 4105–4113. doi:10.1109/CVPR.2017.437.
[Wei03] Wei L.-Y.: Texture synthesis from multiple sources. In SIGGRAPH (2003), ACM, pp. 1–1. doi:10.1145/965400.965507.
[WL00] Wei L.-Y., Levoy M.: Fast texture synthesis using tree-structured vector quantization. In SIGGRAPH (2000), ACM, pp. 479–488. doi:10.1145/344779.345009.
[WL03] Wei L.-Y., Levoy M.: Order-independent texture synthesis. Tech. Rep. TR-2002-01, Computer Science Department, Stanford University (2003).
[WRB17] Wilmot P., Risser E., Barnes C.: Stable and controllable neural texture synthesis and style transfer using histogram losses. CoRR (2017). arXiv:1701.08893.
[YBS∗19] Yu N., Barnes C., Shechtman E., Amirghodsi S., Lukac M.: Texture mixer: A network for controllable synthesis and interpolation of texture. In CVPR (June 2019).
[YYY∗16] Yan X., Yang J., Yumer E., Guo Y., Lee H.: Perspective transformer nets: Learning single-view 3D object reconstruction without 3D supervision. In NIPS (2016), pp. 1696–1704.
[ZDL∗11] Zhang G.-X., Du S.-P., Lai Y.-K., Ni T., Hu S.-M.: Sketch guided solid texturing. Graphical Models 73, 3 (2011), 59–73. doi:10.1016/j.gmod.2010.10.006.
[ZZB∗18] Zhou Y., Zhu Z., Bai X., Lischinski D., Cohen-Or D., Huang H.: Non-stationary texture synthesis by adversarial expansion. ACM Trans. Graph. 37, 4 (2018), 49:1–49:13. doi:10.1145/3197517.3201285.