Eurographics Symposium on Rendering 2020
C. Dachsbacher and M. Pharr (Guest Editors)
Volume 39 (2020), Number 4
Guided Fine-Tuning for Large-Scale Material Transfer
Valentin Deschaintre, George Drettakis and Adrien Bousseau
Université Côte d'Azur, Inria; Imperial College London; Optis for Ansys
Figure 1: Our method transfers the appearance of one or a few exemplar SVBRDFs to a target picture. This approach allows the capture of large planar surfaces taken under ambient lighting (far left), by extracting the SVBRDF exemplars from close-up flash pictures (lower left), as well as the creation of plausible SVBRDFs from internet pictures by using existing artist-designed materials as exemplars (right). Please see supplemental materials for high-resolution SVBRDF parameter maps and animated renderings of all our results, which give a much better impression of the material properties.
Abstract
We present a method to transfer the appearance of one or a few exemplar SVBRDFs to a target image representing similar materials. Our solution is extremely simple: we fine-tune a deep appearance-capture network on the provided exemplars, such that it learns to extract similar SVBRDF values from the target image. We introduce two novel material capture and design workflows that demonstrate the strength of this simple approach. Our first workflow allows the production of plausible SVBRDFs of large-scale objects from only a few pictures. Specifically, users only need to take a single picture of a large surface and a few close-up flash pictures of some of its details. We use existing methods to extract SVBRDF parameters from the close-ups, and our method to transfer these parameters to the entire surface, enabling the lightweight capture of surfaces several meters wide such as murals, floors and furniture. In our second workflow, we provide a powerful way for users to create large SVBRDFs from internet pictures by transferring the appearance of existing, pre-designed SVBRDFs. By selecting different exemplars, users can control the materials assigned to the target image, greatly enhancing the creative possibilities offered by deep appearance capture.
This is a low-resolution version of the paper; a better-compressed version is available on the project page: https://team.inria.fr/graphdeco/projects/large-scale-materials/
CCS Concepts: • Computing methodologies → Reflectance modeling; Image processing
Keywords: material transfer, material capture, appearance capture, SVBRDF, deep learning, fine-tuning
1. Introduction
Recent progress on lightweight appearance capture allows the recovery of plausible real-world spatially-varying reflectance distribution functions (SVBRDFs) from just a few photographs of a surface. In particular, multiple methods take as input one or several photographs captured with a hand-held camera, where the co-located flash provides informative spatially-varying illumination over the measured surface sample [AWL15, AAL16, RPG16, HSL∗17, DAD∗18, LSC18, DAD∗19, GLD∗19]. However, the flash restricts the scale at which materials can be captured – typically a dozen centimeters wide using a cell phone held at a similar distance. Relying on a flash also prevents these methods from processing existing images captured under unknown lighting, such as textures downloaded from the Internet. Finally, another common limitation of the above methods is that they rely on black-box optimization or deep learning to infer SVBRDF parameters from few measurements, offering little user control on their output.

We address all three limitations by proposing a by-example appearance capture method, which recovers SVBRDF parameter maps over large surfaces captured under environment lighting by transferring information from one or a few exemplar SVBRDF patches (Fig. 1), which can either be extracted from additional close-up flash photos, or come from a database of SVBRDFs.

Our technical solution to transfer material appearance from exemplars is surprisingly simple yet extremely effective. We build on a state-of-the-art SVBRDF capture deep network [DAD∗18], which we fine-tune on renderings of the provided exemplars such that it learns to extract similar SVBRDF values from the target image. Our first scenario – on-site acquisition – is the first application to allow capture of plausible material properties of large surfaces with just a few photos. In this case, we capture a single photograph of a large surface as well as one or a few close-up flash photographs of its details. We then use an off-the-shelf network to extract SVBRDF maps from the flash photographs, and use our fine-tuned network to transfer this information to the large image, effectively acquiring SVBRDFs several meters wide. In our second scenario – creative design – we provide a powerful method for users to create realistic SVBRDFs from stock photos, simply using artist-created SVBRDFs downloaded from the Internet as exemplars. This demonstrates how our method allows fine control on the design process for SVBRDFs.

In summary, this paper makes the following contributions:
• We present a simple yet very effective algorithm to transfer material appearance from a few exemplars to a target image.
• We introduce a lightweight method to capture SVBRDFs of large planar surfaces, based on this algorithm.
• We introduce a novel workflow that allows material designers to create new SVBRDFs from existing photos and SVBRDF patches (e.g., taken from online texture and SVBRDF repositories), using the same algorithm.

Our code, data and supplemental material are available here: https://team.inria.fr/graphdeco/projects/large-scale-materials/
2. Related Work
Appearance capture and design is a vast and active research field; we refer to the survey by Guarnera et al. [GGG∗16] for a general introduction, and to the one by Dong [Don19] for a focus on methods based on deep learning. Here we discuss lightweight SVBRDF capture methods most similar to our approach, as well as related work on by-example image synthesis and deep learning.

Reconstructing multiple SVBRDF maps from one or a few pictures is an ill-posed problem, as the radiance observed in the pictures can be explained by a number of different combinations of SVBRDF parameters. Existing work tackled this challenge by incorporating domain-specific priors on the solution, either designed by hand or learned from large quantities of SVBRDF data. Example hand-designed priors include the assumption that the material sample is stochastic or self-repetitive [WSM11, AWL15, AAL16], or that the lighting exhibits natural statistics [DCP∗14] and physical properties [RRFG17]. Data-driven methods seek to explain the observed data as a combination of known BRDFs [RWS∗11, HSL∗17], or rely on deep networks trained on large collections of artist-created SVBRDFs [DAD∗18, LSC18, LXR∗18, DAD∗19, GLD∗19]; other lightweight setups exploit unstructured flash photography [NLGK18] or polarization imaging [BJTK18].
Figure 2: Main steps of our method. We first pre-train an SVBRDF prediction network [DAD∗18] on a large set of synthetic SVBRDF maps rendered under varying lighting (a). While this generic network produces plausible results, it often mis-interprets the material features in the absence of flash cues. Our key idea is to fine-tune the pre-trained network on renderings of user-provided SVBRDF exemplars (b). After fine-tuning, the resulting network combines generic pre-training knowledge with information from the exemplars. Here, this allows our method to interpret the cyan tiles as more shiny than the grey concrete. We demonstrate this approach on two application scenarios, either to acquire large-scale real-world surfaces by propagating small-scale exemplars (c, top), or to design new SVBRDFs by propagating existing SVBRDF maps over internet textures (c, bottom). While we train our network on images of 512 × 512 pixels, we process HD images of much higher resolution by processing small 512 × 512 tiles individually, and by stitching their predicted SVBRDFs to generate the final output (d). This is made possible by the absence of strong local flash highlights in the input image.

Processing high-resolution images is challenging for deep-learning methods, as the network resolution is limited by the GPU memory – related methods were typically demonstrated on images of 256 × 256 pixel resolution. In supplemental material we provide an example showing how the method by Deschaintre et al. [DAD∗18] degrades when naively applied to such large images. Li et al. [LDPT17] introduced self-augmentation, which was further studied by Ye et al. [YLD∗18], to increase the diversity of SVBRDFs seen by the network. This strategy differs from ours, since our goal is rather to specialize the network to extract user-provided SVBRDF values, which we achieve by fine-tuning the network on specific exemplars.

Our use of exemplar images makes our problem akin to image analogies [HJO∗01] and color transfer [WAM02], which have been revisited with deep learning in supervised [IZZE17, WLZ∗18] or unsupervised settings [ZPIE17]. In particular, multiple methods combine dense correspondences with deep learning to achieve more robust colorization [HCL∗18, HLC∗19] and style transfer [FJL∗16, LYY∗17]. Recently, Texler et al. [TFK∗20] used a similar strategy to ours to specialize a style transfer network using a small number of style exemplars.

By complementing an input image with a few user-provided exemplars, our approach also relates to the interactive material design system AppGen [DTPG11]. The main difference between the two approaches resides in the level of expertise required and control offered. While AppGen offers fine control on the local interpretation of an image thanks to user scribbles, it requires users to manually segment the different materials in the image, and to specify each specular BRDF. In contrast, users of our approach need only select exemplar SVBRDFs from an existing library, or acquire them using an existing lightweight method, and let our method automatically transfer BRDF values from the exemplars to the target image. Our on-site acquisition scenario also follows the same two-scale capture strategy as Manifold Bootstrapping [DWT∗10], which combines large-scale observations of a surface with close-up measurements of its constituent materials.

Our fine-tuning strategy relates to internal learning, i.e., training a deep neural network on a specific image rather than on a large dataset. This intriguing idea first appeared in the seminal work of Ulyanov et al. [UVL18] on deep image priors, where a network trained to reconstruct a specific image was shown to denoise or inpaint that image. Subsequent work used image-specific training for various tasks, including unsupervised super-resolution [SCI18] and GAN-based image editing [BSP∗19, SDM19]. In contrast to these methods, we fine-tune a network on a few exemplars and then transfer the knowledge it acquired to a different target image. Our work also relates to the TileGAN method of Frühstück et al. [FAW19], who train a conditional GAN to perform small-scale texture synthesis, and apply this GAN in a sliding-window fashion to produce large-scale images. However, training a GAN to synthesize a specific texture takes several days, while we show that it takes only a few minutes to fine-tune a generic material acquisition network to achieve successful material transfer. Our strategy can also be seen as a form of few-shot learning, which aims at adapting a pre-trained model to a new category of data given only a few examples of such data [LHM∗19].
3. Method
Fig. 2 provides a visual overview of our method to extract SVBRDF parameter maps for large-scale surfaces. The main steps include pre-training a deep SVBRDF prediction network on a varied set of SVBRDFs (Fig. 2a), fine-tuning this network on our exemplars (Fig. 2b), and finally using this exemplar-specific network to extract SVBRDFs similar to the exemplars over images of large surfaces, either captured on site or downloaded from the Internet (Fig. 2c). We first describe typical inputs to our method, before explaining how we pre-train and fine-tune the deep network to achieve material transfer.
Our goal is to generate SVBRDF parameter maps for large-scale planar surfaces, such as walls, doors or furniture. To do so, our method takes two forms of input. First, a single picture of the surface of interest, captured under ambient indoor or outdoor lighting. Second, a series of SVBRDF patches that represent small parts of the surface, or of a similar material. To obtain these patches, we either capture close-up flash pictures of the surface and run an existing single-image SVBRDF method [DAD∗18] on them, or select pre-designed SVBRDFs of similar appearance from an existing library.

The input image is typically of high resolution, which we split into tiles of 512 × 512 pixels. Our method processes each tile independently, and generates the final output by stitching these individual predictions (Fig. 2d; see the stitching step at the end of this section). Neighboring tiles have an overlap of 256 pixels to facilitate subsequent stitching of their SVBRDF maps. Applying the network in this sliding-window fashion ensures that our method has a constant memory footprint, and as such scales to images of arbitrary resolution. In contrast, while running the network in a fully-convolutional manner would also allow the processing of images of varied resolution [DAD∗18, LSC18, DAD∗19, GLD∗19], the memory cost of doing so grows with the image size.

Our method processes each tile of the input image independently to output four Cook-Torrance SVBRDF maps [CT82], corresponding to the normal, diffuse albedo, specular albedo, and specular roughness of each input pixel. We perform this task with the convolutional neural network proposed by Deschaintre et al. [DAD∗18], which we pre-train on a large set of synthetic SVBRDFs rendered under varying lighting at a resolution of 512 × 512 pixels. In total, the network is pre-trained for 800,000 iterations, which took around 8 days on a 1080 Ti graphics card. Pre-training the network on a large set of SVBRDFs not only accelerates the subsequent fine-tuning step, it also equips the network with general priors on material appearance, which complements the exemplar-specific priors learned during fine-tuning (see Fig. 6).
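Both pre-training and fine-tuning require turning SVBRDF maps into synthetic photographs. As a concrete illustration, the sketch below shades the four Cook-Torrance maps under a single distant light. It is a simplified stand-in for the in-network renderer of [DAD∗18], not their exact implementation: the GGX distribution, Schlick Fresnel approximation, Smith-Schlick geometry term and the perceptual roughness-squared parameterization are our assumptions.

```python
import numpy as np

def render_svbrdf(normals, diffuse, specular, roughness,
                  light_dir=(0.0, 0.0, 1.0), view_dir=(0.0, 0.0, 1.0)):
    """Shade per-pixel SVBRDF maps under one distant white light with a
    simplified Cook-Torrance model. Maps are (H, W, C) arrays: normals
    in [-1, 1], albedos in [0, 1], roughness a single channel in (0, 1]."""
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    l = np.asarray(light_dir, np.float64)
    v = np.asarray(view_dir, np.float64)
    h = (l + v) / np.linalg.norm(l + v)                    # half vector

    ndl = np.clip(np.sum(n * l, axis=-1, keepdims=True), 1e-6, 1.0)
    ndv = np.clip(np.sum(n * v, axis=-1, keepdims=True), 1e-6, 1.0)
    ndh = np.clip(np.sum(n * h, axis=-1, keepdims=True), 1e-6, 1.0)
    vdh = np.clip(np.dot(v, h), 1e-6, 1.0)

    alpha2 = np.clip(roughness, 1e-3, 1.0) ** 4            # alpha = roughness^2
    D = alpha2 / (np.pi * (ndh ** 2 * (alpha2 - 1.0) + 1.0) ** 2)  # GGX NDF
    F = specular + (1.0 - specular) * (1.0 - vdh) ** 5     # Schlick Fresnel
    k = np.sqrt(alpha2) / 2.0                              # Smith-Schlick term
    G = (ndl / (ndl * (1.0 - k) + k)) * (ndv / (ndv * (1.0 - k) + k))

    radiance = (diffuse / np.pi + D * F * G / (4.0 * ndl * ndv)) * ndl
    return np.clip(radiance, 0.0, 1.0)
```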
Figure 3: Average RMSE of predicted maps for synthetic SVBRDFs, using crops of these SVBRDFs as exemplars. The error drops quickly, within a few hundred training iterations.

Figure 4: We augment the input set of SVBRDF exemplars by composing them using a low-frequency random mask.

A single image often does not provide enough information to recover SVBRDF parameters unambiguously, especially in the absence of flash highlights. The key idea of our work is to favor the SVBRDF parameter values present in the exemplars by fine-tuning the network on these images. In other words, we perform a number of training iterations where we ask the network to predict the exemplar SVBRDF maps given a rendering of that SVBRDF as input. The network thus becomes increasingly specialized in mapping the color and texture of the exemplar renderings to their normal and reflectance values. We used 1000 training iterations for all our results, which takes around 2 minutes on a 1080 Ti GPU and is largely sufficient to achieve successful transfer. Our numerical experiments suggest that most of the improvement occurs within a few hundred iterations (Fig. 3). Once fine-tuned, we run the network on each input tile to obtain its SVBRDF maps.
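The fine-tuning loop itself is short. The following sketch expresses it in TensorFlow 2 style; `network` is assumed to be the pre-trained model mapping a rendering to a stack of SVBRDF maps, `render_svbrdf` and `make_augmented_exemplar` are the helpers from the neighboring sketches, and the plain L1 loss is a simplification of the losses used in [DAD∗18].

```python
import numpy as np
import tensorflow as tf

def sample_random_light(rng=np.random):
    # Random direction in the upper hemisphere (hypothetical sampling scheme).
    d = rng.normal(size=3)
    d[2] = abs(d[2])
    return d / np.linalg.norm(d)

def fine_tune(network, exemplars, steps=1000, learning_rate=1e-5):
    """Specialize the pre-trained SVBRDF network on the provided exemplars."""
    optimizer = tf.keras.optimizers.Adam(learning_rate)
    for _ in range(steps):
        # Compose a fresh training SVBRDF from the exemplars (see the data
        # augmentation below) and render it under a random light direction.
        maps = make_augmented_exemplar(exemplars)           # (512, 512, 10)
        n, d, s, r = np.split(maps, [3, 6, 9], axis=-1)
        image = render_svbrdf(n, d, s, r,
                              light_dir=sample_random_light()).astype(np.float32)
        target = tf.convert_to_tensor(maps[np.newaxis], tf.float32)
        with tf.GradientTape() as tape:
            pred = network(image[np.newaxis], training=True)
            loss = tf.reduce_mean(tf.abs(pred - target))    # L1 on the maps
        grads = tape.gradient(loss, network.trainable_variables)
        optimizer.apply_gradients(zip(grads, network.trainable_variables))
    return network
```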
While extremely simple, the above procedure quickly overfits the network on the few exemplars, so much so that it does not generalize to input images having a different spatial layout of material regions. Our solution to this challenge is to apply massive data augmentation on the exemplars to obtain a training set that retains their local appearance, but varies their overall layout. We achieve this goal by generating, for every training iteration, a unique SVBRDF that is composed of pieces of two randomly-selected different exemplars. We first apply random scaling and cropping on these exemplars, and then combine them according to a binary mask that we generate by thresholding a low-frequency Perlin noise (Fig. 4). We perform all these processing steps at training time in TensorFlow [AAB∗15]; a sketch of the equivalent operation is shown below.

The last step of our method consists in merging the predictions of all tiles into a large-scale SVBRDF. Since all tiles are processed using the same exemplars, neighboring tiles mostly agree in their predictions, up to low-frequency variations. We achieve a seamless composite by blending the tiles over their overlap using a Gaussian weighting kernel that gives a weight of 1 at the center of the tile and reaches almost 0 at its border. This mechanism allows our method to be applied on high-resolution inputs of arbitrary aspect ratio, as shown in our results, which reach widths of 2048 pixels and more.
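For concreteness, here is one way to implement the augmentation described above; `make_augmented_exemplar` is the hypothetical helper used in the fine-tuning sketch. We substitute thresholded blurred noise (upsampled low-resolution white noise) for Perlin noise, and the scaling range is our assumption; the paper performs the equivalent operations in TensorFlow at training time.

```python
import numpy as np
from skimage.transform import resize

def make_augmented_exemplar(exemplars, size=512, rng=np.random):
    """Compose two randomly transformed exemplars with a thresholded
    low-frequency noise mask (Fig. 4). Each exemplar is a (H, W, 10) stack
    of SVBRDF maps (normals, diffuse, specular, roughness concatenated)."""
    def random_scale_crop(maps):
        scale = rng.uniform(0.8, 1.2)                    # assumed range
        side = max(size, int(round(min(maps.shape[:2]) * scale)))
        maps = resize(maps, (side, side, maps.shape[-1]))
        y = rng.randint(0, side - size + 1)
        x = rng.randint(0, side - size + 1)
        return maps[y:y + size, x:x + size]

    # Pick two exemplars at random (they may coincide if only one is given).
    i, j = rng.randint(0, len(exemplars), size=2)
    a = random_scale_crop(exemplars[i])
    b = random_scale_crop(exemplars[j])

    # Binary mask from thresholded low-frequency noise (stand-in for Perlin).
    noise = resize(rng.rand(8, 8), (size, size))
    mask = (noise > 0.5).astype(np.float64)[..., None]
    return mask * a + (1.0 - mask) * b
```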
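The tile merge can likewise be written in a few lines. This sketch assumes a `predict` function (the fine-tuned network) returning per-tile maps, input dimensions that are multiples of the stride, and a Gaussian whose standard deviation of one sixth of the tile width makes the border weight close to zero; the exact kernel width is our assumption. With `tile=512` and `overlap=256` it reproduces the 256-pixel overlap described above.

```python
import numpy as np

def stitch_tiles(image, predict, tile=512, overlap=256):
    """Run `predict` over overlapping tiles of `image` and blend the
    per-tile SVBRDF maps with a Gaussian weight that is 1 at the tile
    center and nearly 0 at its border."""
    h, w = image.shape[:2]
    stride = tile - overlap
    g = np.exp(-0.5 * ((np.arange(tile) - (tile - 1) / 2) / (tile / 6)) ** 2)
    weight = np.outer(g, g)[..., None]                   # separable kernel

    out = None
    acc = np.zeros((h, w, 1), dtype=np.float64)          # sum of weights
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            maps = predict(image[y:y + tile, x:x + tile])
            if out is None:
                out = np.zeros((h, w, maps.shape[-1]), dtype=np.float64)
            out[y:y + tile, x:x + tile] += weight * maps
            acc[y:y + tile, x:x + tile] += weight
    return out / np.maximum(acc, 1e-8)                   # normalized blend
```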
4. Evaluation
We first present results obtained by applying our method on our own photographs as well as on internet images. We then evaluate the impact of our fine-tuning and data augmentation strategies. Finally, we compare our method with alternative approaches on synthetic data for which we have ground-truth SVBRDF maps. Please see supplemental materials for high-resolution SVBRDF parameter maps and animated renderings of all our results. We will release our code and data upon acceptance to ease reproduction.
Our research was originally motivated by the need to quickly acquire the appearance of large-scale surfaces with minimal hardware. Following this first usage scenario, we used a smartphone to photograph a variety of planar objects. For each object, we first captured a single photograph of the entire surface under ambient lighting. We then captured 1-3 close-up flash photos of parts that exhibit characteristic material features. Finally, we ran the single-image network [DAD∗18] to obtain SVBRDF exemplars for each close-up. Fig. 1 and 5 show a mosaic, tiled floors, and a sculpted wall captured on-site with this approach. Thanks to the exemplars provided, our method faithfully reproduces the varying shininess of the different tiles, and distinguishes rough stone from metal.

Figure 5: Real-world surfaces captured on-site with our method. We used a single flash picture to capture the shininess of the tiles, which is propagated to all tiles of the large floor. We used two flash pictures for the second example, one for the diffuse stone and the other for the more shiny metal disk. Please zoom in on the PDF to appreciate the high-resolution details of the individual SVBRDF maps.

A second usage scenario of our method is to estimate the SVBRDF maps of existing pictures, using pre-designed SVBRDFs as exemplars of similar materials. Fig. 6 shows this on three internet images, processed with exemplars from libraries of artist-created procedural SVBRDFs [Ado19, Str19]. Our method transfers the diffuse and specular reflectances of the exemplars across the surface while conforming to the input image. In this workflow, the user selects exemplars that correspond to the materials they would like to see over the large surface. For instance, by choosing appropriate exemplars, the golden part of the mural is successfully interpreted as having low roughness and a yellow specular component, and rust is interpreted as a rough orange material. The last row of Fig. 6 illustrates the behavior of our method when part of the image is not covered by the provided exemplars. In this result, the exemplar guides the interpretation of the bricks, but not of the window. Nevertheless, our method also benefits from generic priors on material appearance learned during pre-training, here to interpret the dark window as more shiny than the brick.

While pre-designed SVBRDFs provide convincing material parameters, many come with normal maps that are either flat or weakly correlated to the target pictures. When this is the case, we ignore the normal map produced by the fine-tuned network and use the one produced by our pre-trained network instead. All results for which the exemplar normal map is not shown were obtained with this approach.

Fig. 7 further demonstrates the control that the input exemplars provide on the output SVBRDF. The input picture contains dark and yellow pixels with little in terms of visual cues of their respective shininess. We first selected a dark diffuse and a yellow metallic exemplar to achieve a golden appearance. We next show how changing the exemplar allows us to increase the roughness of the gold, or even to interpret the yellow pixels as diffuse paint. Finally, we also show how our method behaves in the presence of an outlier exemplar, which in this case gives a slight orange tint to the yellow pixels.

We show in Fig. 8 a visual comparison between real photographs of a surface and renderings of the SVBRDF created with our method. We used artist-designed SVBRDFs as exemplars for this comparison because the single-image method of Deschaintre et al. [DAD∗18] fails to recover convincing maps from flash pictures of this complex surface (see supplemental materials for their result). This experiment shows that users can reproduce the desired overall appearance by guiding our method with adequate exemplars.

Finally, Fig. 14 showcases a variety of SVBRDFs created with our method, either via on-site acquisition or from stock photographs. Note that most of these results represent large, non-square surfaces encoded as high-resolution parameter maps, which contrasts with the small material samples often shown in related work.
We use the single-image network of Deschaintre et al. [DAD∗18] as the basis of all the experiments that follow. To our knowledge, our method is the first to offer by-example guidance for deep SVBRDF inference. We compare to related work on style transfer, as well as to single-image alternatives. We use synthetic SVBRDFs for these comparisons, which allows visual comparison to the ground-truth maps, as well as numerical evaluation.
Qualitative comparisons.
Our approach is related to the method by Melendez et al. [MGSJW12], which transfers diffuse albedo and displacement maps using patch-based texture synthesis akin to image analogies [HJO∗01]. Fig. 10 compares our method to such an approach, where we use StyLit [FJL∗16] to synthesize SVBRDF maps over the target image by analogy with the exemplar.

Figure 6: Various SVBRDFs estimated from internet images. We selected artist-designed SVBRDF patches as exemplars for gold, paint, rust and bricks. Note how the shiny gold is well transferred to the yellow parts of the top panel, and how the diffuse rust is transferred to the brown parts of the middle plate. Note also that our method produces a plausible interpretation of the window (third row), even though the provided exemplar only contains bricks. Please zoom in on the document to appreciate the high-resolution details of the individual SVBRDF maps.
Table 1: Numerical comparison to alternative methods using the RMSE metric (smaller is better), performed on synthetic SVBRDFs. Our method outperforms existing single-image algorithms thanks to the guidance of the exemplar (only one exemplar used). We only report the rendering error for [LDPT17] because this method outputs a different BRDF model than ours.

              [DAD∗18]   [LDPT17]   Few-shot         Ours,               Ours,
              No Flash              style transfer   [DAD∗18] exemplar   GT exemplar
  Normals     0.045      –          –                0.043               0.040
  Diffuse     0.092      –          –                0.095               0.059
  Roughness   0.215      –          –                0.195               0.142
  Specular    0.016      –          –                0.015               0.021
  Renderings  0.122      0.256      –                0.124               0.086

Patch-based synthesis lacks variety in the maps due to the limited information contained in a single exemplar. While more advanced synthesis algorithms exist to interpolate between limited exemplars [DBP∗15], our fine-tuned network generalizes the exemplar appearance over the entire target image without such explicit machinery.

Quantitative comparisons.
Table 1 shows numerical comparisons to the single-image method of Deschaintre et al. [DAD∗18] applied on an input without flash, to the method of Li et al. [LDPT17], and to a few-shot style transfer baseline in the spirit of [LHM∗19]. We implement this baseline by encoding the exemplars into low-dimensional latent codes [GLD∗19] and by aggregating the resulting codes into a single code via max pooling. We next process this code with three fully-connected layers to produce parameters for several AdaIN layers that we use to transform the feature maps of the SVBRDF prediction network. The numerical evaluation reveals that the addition of AdaIN layers controlled by the exemplars slightly improves performance over the baseline network of Deschaintre et al. [DAD∗18] for some of the maps, but is largely inferior to our results obtained after fine-tuning this baseline on augmented exemplars.
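To make the baseline concrete, the sketch below shows its conditioning path in schematic TensorFlow. The encoder, code dimension, and the use of one (scale, bias) pair per conditioned layer are assumptions on our part, since the paper does not spell out these details; layers are created inline for brevity, whereas in practice they would be built once.

```python
import tensorflow as tf

def adain(features, scale, bias, eps=1e-5):
    # Adaptive instance normalization [HB17]: re-normalize each channel of
    # `features` and modulate it with exemplar-derived scale and bias.
    mean, var = tf.nn.moments(features, axes=[1, 2], keepdims=True)
    return scale * (features - mean) / tf.sqrt(var + eps) + bias

def exemplar_to_adain_params(encoder, exemplar_images, channels_per_layer):
    # Encode each exemplar to a latent code, then max-pool the codes into one.
    codes = tf.stack([encoder(img[tf.newaxis])[0] for img in exemplar_images])
    code = tf.reduce_max(codes, axis=0)[tf.newaxis]      # max pooling
    # Three fully-connected layers map the code to per-layer AdaIN parameters.
    code = tf.keras.layers.Dense(256, activation='relu')(code)
    code = tf.keras.layers.Dense(256, activation='relu')(code)
    params = tf.keras.layers.Dense(2 * sum(channels_per_layer))(code)
    per_layer = tf.split(params, [2 * c for c in channels_per_layer], axis=-1)
    return [tuple(tf.split(p, 2, axis=-1)) for p in per_layer]  # (scale, bias)
```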
As with previous deep-learning based methods for material capture [DAD∗18, LSC18], we cannot handle cast shadows, or any other phenomenon that requires more than a normal/bump map. Extending our approach to handle such cases, e.g., using a displacement map, would require a much more complex differentiable renderer to handle 3D effects during training. Similarly, our SVBRDF model and renderer are not designed to handle non-local effects like subsurface scattering.

Despite the strong ability of deep learning to extract discriminative features, our method sometimes has difficulty distinguishing different materials that share similar colors and textures. This is the case in Fig. 12, where the shininess of the small metal disk is transferred to some of the stones that have a similar appearance in the input picture. Our method also assumes that the large-scale input is captured under largely uniform lighting. When this is not the case, large illumination gradients pollute the SVBRDF maps, as shown in Fig. 13. Nevertheless, our method is robust to localized highlights, as some occur in the training set (see the synthetic materials in supplemental materials for typical examples).

Finally, while in theory the scale difference between the exemplars and the large-scale input that our method can handle is limited, we never encountered this problem in our tests.
5. Conclusion
Our method alleviates inherent limitations of flash-based material acquisition methods, namely limited scale, low resolution, and lack of user control. By complementing the input image with one or a few exemplars, our approach can recover SVBRDFs of much larger surfaces, at high resolution and arbitrary aspect ratio. Furthermore, our method greatly increases the creative freedom of material designers by letting them create plausible SVBRDFs from existing photographs with high-level control on their constituent materials. We achieved all these benefits thanks to a surprisingly simple fine-tuning strategy, which we believe to be directly applicable to other capture and design tasks based on deep learning.
Acknowledgments
We thank Simon Rodriguez for his help with video editing. This work was partially funded by an ANRT CIFRE scholarship between Inria and Optis for Ansys, by ERC Advanced Grant FUNGRAPH (No. 788065, http://fungraph.inria.fr), by an EPSRC Early Career Fellowship (EP/N006259/1), and by software donations from Adobe. The authors are grateful to the Inria Sophia Antipolis - Méditerranée "Nef" computation cluster for providing resources and support (https://wiki.inria.fr/ClustersSophia/Clusters_Home).
Figure 7: Given the same input picture, we achieve different outcomes by changing the exemplars. In the first row, we provide an exemplar of a black diffuse material and an exemplar of a shiny yellow metal, which are successfully transferred to the dark and golden parts of the input picture respectively. In the second row, we increased the roughness of the yellow metallic exemplar, which is again successfully propagated to the golden parts of the input. In the third row, we replaced the metallic exemplar by a yellow diffuse material, which results in an SVBRDF where only the diffuse map contains yellow information. Finally, in the fourth row, we included an outlier red diffuse exemplar, which our method tends to mix with the yellow metal to produce a slightly orange diffuse map and a weaker specular map.

Figure 8: Comparison to real-world photographs. We reproduced the appearance of a book cover using a single picture captured under environment lighting, and two exemplars of blue leather and golden material. The top row shows real-world pictures of the book under varying lighting, and the bottom row shows our renderings under similar lighting. A comparison with exemplars obtained with [DAD∗18] is provided in supplemental materials.

Figure 9: Ablation study. The baseline single-image network of Deschaintre et al. [DAD∗18] interprets this weathered golden door as made of rough plastic (first row). Fine-tuning this network on two exemplars without data augmentation yields a uniform golden appearance (second row). Thanks to data augmentation, our method successfully distinguishes the shiny golden parts from the more diffuse dark parts (third row). See supplemental materials for additional ablation results.

Figure 10: Comparison to neural style transfer [HB17] and patch-based texture synthesis [FJL∗16].

Figure 11: Comparison to the single-image methods of [LDPT17] and [DAD∗18].

Figure 12: Limitation. Our method can have difficulty distinguishing materials with similar colors and texture, such as this shiny metal disk that has a similar appearance to some of the dark rough stones.

Figure 13: Limitations. Our method is not designed to handle large illumination gradients over the surface.

Figure 14: A variety of surfaces captured or designed with our method. See supplemental materials for animated renderings.

References

[AAB∗15] Abadi M., Agarwal A., Barham P., Brevdo E., Chen Z., Citro C., Corrado G. S., Davis A., Dean J., Devin M., Ghemawat S., Goodfellow I., Harp A., Irving G., Isard M., Jia Y., Jozefowicz R., Kaiser L., Kudlur M., Levenberg J., Mané D., Monga R., Moore S., Murray D., Olah C., Schuster M., Shlens J., Steiner B., Sutskever I., Talwar K., Tucker P., Vanhoucke V., Vasudevan V., Viégas F., Vinyals O., Warden P., Wattenberg M., Wicke M., Yu Y., Zheng X.: TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[AAL16] Aittala M., Aila T., Lehtinen J.: Reflectance modeling by neural texture synthesis. ACM Transactions on Graphics (Proc. SIGGRAPH) 35, 4 (2016).

[Ado19] Adobe: Substance Share, 2019. https://share.substance3d.com/.

[AWL15] Aittala M., Weyrich T., Lehtinen J.: Two-shot SVBRDF capture for stationary materials. ACM Transactions on Graphics (Proc. SIGGRAPH) 34, 4 (July 2015), 110:1–110:13. doi:10.1145/2766967.

[BJTK18] Baek S.-H., Jeon D. S., Tong X., Kim M. H.: Simultaneous acquisition of polarimetric SVBRDF and normals. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 37, 6 (2018).

[BSP∗19] Bau D., Strobelt H., Peebles W., Wulff J., Zhou B., Zhu J., Torralba A.: Semantic photo manipulation with a generative image prior. ACM Transactions on Graphics (Proc. SIGGRAPH) 38, 4 (2019).

[CT82] Cook R. L., Torrance K. E.: A reflectance model for computer graphics. ACM Transactions on Graphics 1, 1 (1982), 7–24.

[DAD∗18] Deschaintre V., Aittala M., Durand F., Drettakis G., Bousseau A.: Single-image SVBRDF capture with a rendering-aware deep network. ACM Transactions on Graphics (SIGGRAPH Conference Proceedings) 37, 128 (Aug. 2018), 15.

[DAD∗19] Deschaintre V., Aittala M., Durand F., Drettakis G., Bousseau A.: Flexible SVBRDF capture with a multi-image deep network. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 38, 4 (July 2019).

[DBP∗15] Diamanti O., Barnes C., Paris S., Shechtman E., Sorkine-Hornung O.: Synthesis of complex image appearance from limited exemplars. ACM Transactions on Graphics (Proc. SIGGRAPH) 34, 2 (2015).

[DCP∗14] Dong Y., Chen G., Peers P., Zhang J., Tong X.: Appearance-from-motion: Recovering spatially varying surface reflectance under unknown lighting. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 33, 6 (2014).

[Don19] Dong Y.: Deep appearance modeling: A survey. Visual Informatics 3, 2 (2019), 59–68. doi:10.1016/j.visinf.2019.07.003.

[DTPG11] Dong Y., Tong X., Pellacini F., Guo B.: AppGen: Interactive material modeling from a single image. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 30 (2011).

[DWT∗10] Dong Y., Wang J., Tong X., Snyder J., Lan Y., Ben-Ezra M., Guo B.: Manifold bootstrapping for SVBRDF capture. ACM Transactions on Graphics (Proc. SIGGRAPH) 29, 4 (2010).

[FAW19] Frühstück A., Alhashim I., Wonka P.: TileGAN: Synthesis of large-scale non-homogeneous textures. ACM Transactions on Graphics (Proc. SIGGRAPH) 38, 4 (2019), 58:1–58:11.

[FJL∗16] Fišer J., Jamriška O., Lukáč M., Shechtman E., Asente P., Lu J., Sýkora D.: StyLit: Illumination-guided example-based stylization of 3D renderings. ACM Transactions on Graphics (Proc. SIGGRAPH) 35, 4 (2016).

[GGG∗16] Guarnera D., Guarnera G. C., Ghosh A., Denk C., Glencross M.: BRDF representation and acquisition. Computer Graphics Forum (2016).

[GLD∗19] Gao D., Li X., Dong Y., Peers P., Xu K., Tong X.: Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images. ACM Transactions on Graphics 38, 4 (July 2019), 134:1–134:15. doi:10.1145/3306346.3323042.

[HB17] Huang X., Belongie S.: Arbitrary style transfer in real-time with adaptive instance normalization. In IEEE International Conference on Computer Vision (ICCV) (2017).

[HCL∗18] He M., Chen D., Liao J., Sander P. V., Yuan L.: Deep exemplar-based colorization. ACM Transactions on Graphics (Proc. SIGGRAPH) 37, 4 (2018).

[HJO∗01] Hertzmann A., Jacobs C. E., Oliver N., Curless B., Salesin D. H.: Image analogies. ACM SIGGRAPH (2001).

[HLC∗19] He M., Liao J., Chen D., Yuan L., Sander P. V.: Progressive color transfer with dense semantic correspondences. ACM Transactions on Graphics 38, 2 (Apr. 2019). doi:10.1145/3292482.

[HSL∗17] Hui Z., Sunkavalli K., Lee J. Y., Hadap S., Wang J., Sankaranarayanan A. C.: Reflectance capture using univariate sampling of BRDFs. In IEEE International Conference on Computer Vision (ICCV) (2017).

[IZZE17] Isola P., Zhu J.-Y., Zhou T., Efros A. A.: Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).

[LDPT17] Li X., Dong Y., Peers P., Tong X.: Modeling surface appearance from a single photograph using self-augmented convolutional neural networks. ACM Transactions on Graphics (Proc. SIGGRAPH) 36, 4 (2017).

[LHM∗19] Liu M.-Y., Huang X., Mallya A., Karras T., Aila T., Lehtinen J., Kautz J.: Few-shot unsupervised image-to-image translation. In IEEE International Conference on Computer Vision (ICCV) (October 2019).

[LSC18] Li Z., Sunkavalli K., Chandraker M.: Materials for masses: SVBRDF acquisition with a single mobile phone image. In Proceedings of ECCV (2018).

[LXR∗18] Li Z., Xu Z., Ramamoorthi R., Sunkavalli K., Chandraker M.: Learning to reconstruct shape and spatially-varying reflectance from a single image. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) (2018).

[LYY∗17] Liao J., Yao Y., Yuan L., Hua G., Kang S. B.: Visual attribute transfer through deep image analogy. ACM Transactions on Graphics (Proc. SIGGRAPH) 36, 4 (2017).

[MGSJW12] Melendez F., Glencross M., Starck J., Ward G. J.: Transfer of albedo and local depth variation to photo-textures. In European Conference on Visual Media Production (CVMP) (2012), pp. 40–48. doi:10.1145/2414688.2414694.

[NLGK18] Nam G., Lee J. H., Gutierrez D., Kim M. H.: Practical SVBRDF acquisition of 3D objects with unstructured flash photography. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 37, 6 (2018).

[RPG16] Riviere J., Peers P., Ghosh A.: Mobile surface reflectometry. Computer Graphics Forum 35, 1 (2016).

[RRFG17] Riviere J., Reshetouski I., Filipi L., Ghosh A.: Polarization imaging reflectometry in the wild. ACM Transactions on Graphics (Proc. SIGGRAPH) (2017).

[RWS∗11] Ren P., Wang J., Snyder J., Tong X., Guo B.: Pocket reflectometry. ACM Transactions on Graphics (Proc. SIGGRAPH) 30, 4 (2011).

[SCI18] Shocher A., Cohen N., Irani M.: "Zero-shot" super-resolution using deep internal learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018).

[SDM19] Shaham T. R., Dekel T., Michaeli T.: SinGAN: Learning a generative model from a single natural image. In IEEE International Conference on Computer Vision (ICCV) (October 2019).

[Str19] Struffel Productions: CC0Textures, 2019. https://cc0textures.com/.

[TFK∗20] Texler O., Futschik D., Kučera M., Jamriška O., Sochorová Š., Chai M., Tulyakov S., Sýkora D.: Interactive video stylization using few-shot patch-based training. ACM Transactions on Graphics (Proc. SIGGRAPH) 39, 4 (2020).

[UVL18] Ulyanov D., Vedaldi A., Lempitsky V.: Deep image prior. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018).

[WAM02] Welsh T., Ashikhmin M., Mueller K.: Transferring color to greyscale images. ACM Transactions on Graphics (Proc. SIGGRAPH) 21, 3 (2002).

[WLZ∗18] Wang T., Liu M., Zhu J., Tao A., Kautz J., Catanzaro B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 8798–8807.

[WSM11] Wang C.-P., Snavely N., Marschner S.: Estimating dual-scale properties of glossy surfaces from step-edge lighting. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 30, 6 (2011).

[YLD∗18] Ye W., Li X., Dong Y., Peers P., Tong X.: Single image surface appearance modeling with self-augmented CNNs and inexact supervision. Computer Graphics Forum 37, 7 (2018), 201–211.

[ZPIE17] Zhu J.-Y., Park T., Isola P., Efros A. A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV) (2017).