EUROGRAPHICS 2020 / F. Banterle and A. Wilkie
Short Paper
Neural Smoke Stylization with Color Transfer
Fabienne Christen, Byungsoo Kim, Vinicius C. Azevedo and Barbara Solenthaler
ETH Zürich
Abstract
Artistically controlling fluid simulations requires a large amount of manual work by an artist. The recently presented transport-based neural style transfer approach simplifies workflows as it transfers the style of arbitrary input images onto 3D smoke simulations. However, the method only modifies the shape of the fluid but omits color information. In this work, we therefore extend the previous approach to obtain a complete pipeline for transferring shape and color information onto 2D and 3D smoke simulations with neural networks. Our results demonstrate that our method successfully transfers colored style features consistently in space and time to smoke data for different input textures.
CCS Concepts • Computing methodologies → Physical simulation; Image processing; Neural networks;
1. Introduction
Physically-based fluid simulations have become an integral part of special effects in movie production and graphics for computer games. However, artistic control of such simulations is not well supported and hence remains tedious, resource intensive and costly. Recent work on fluid control includes target-driven optimization to find artificial forces that match given keyframes [TMPS03, PM17] and velocity synthesis methods that allow augmentation with turbulent structures [TKG08, SSN08]. With a neural flow stylization approach [KAGS19], more complex styles and semantic structures have been transferred in a post-processing step. Features from natural images are transferred onto smoke simulations, enabling general content-aware manipulations ranging from simple patterns to intricate motifs. The method is physically inspired, as it computes the density transport from a source input smoke to a desired target configuration. Stylizations from different camera viewpoints are merged to compute a 3D reconstruction of the smoke. While structural information is successfully transferred onto smoke data, color information was omitted. However, transferring texture information represents a valuable control tool for artists to change the appearance of a fluid. Our work therefore extends the transport-based neural flow stylization of Kim et al. [KAGS19] with a subsequent color optimization step that allows artists to control both style and color based on example images. The application is related to [JFA∗15], which uses a flow-guided synthesis approach to transfer textures onto fluids.

Flow stylization approaches extend existing image style transfer methods with spatio-temporal constraints. In the image processing literature, [GEB16] automated the style transfer with neural networks and introduced several ways for the user to control the stylization effects [GEB∗17].
2. Preliminaries
Our approach for colorized smoke stylization is based on the original neural style transfer for images [GEB16] and the transport-based neural style transfer for fluid simulations [KAGS19], which are briefly introduced in the following.
Neural Style Transfer (NST) is the process of synthesizing an image $I$ from a style image $I_S$ and a content image $I_C$ through optimization using a convolutional neural network (CNN). The CNN is trained for natural image classification and its layers provide the feature space for the stylization. Using this CNN, the stylization can be formulated as an optimization problem as
$$I^* = \arg\min_I \; \alpha L_c(I, I_C) + \beta L_s(I, I_S), \quad (1)$$
where $L_c$ is the content loss, $L_s$ is the style loss, and $\alpha$ and $\beta$ are weighting factors. The content loss is spatially aware and aims at preserving the overall structure of $I_C$ in the synthesized image. The style loss, on the other hand, optimizes for style structures independently of their image position. Let $F^l_I$ be the feature representation of image $I$ on layer $l$. The content loss $L_c$ and the style loss $L_s$ can then be formulated as
$$L_c(I, I_C) = \sum_l \big(F^l_I - F^l_{I_C}\big)^2, \quad (2)$$
$$L_s(I, I_S) = \sum_l \big(G^l_I - G^l_{I_S}\big)^2, \quad (3)$$
where $G^l_X = (F^l_X)^T (F^l_X)$ is the Gram matrix of the feature representation on layer $l$ of an image $X$.

Transport-Based Neural Style Transfer (TNST) extends the original NST algorithm to transfer the appearance of a given image to flow-based smoke density. As opposed to NST, where the stylized image is optimized, the optimization formulation for TNST outputs a velocity field. Consequently, no image pixels are modified directly. Instead, the input density $d$ is transported by the optimized velocity field $v^*$ to obtain the final stylized density $d^*$. $v^*$ and $d^*$ are obtained through optimization analogously to Equation 1 with
$$v^* = \arg\min_v \; \alpha L_c(R_\theta(T(d, v)), I_C) + \beta L_s(R_\theta(T(d, v)), I_S), \quad (4)$$
$$d^* = T(d, v^*). \quad (5)$$
The transport function $T(d, v)$ advects the density by the given velocity. The renderer $R_\theta(d)$ renders a 2D greyscale image of the density $d$ at viewpoint angle $\theta$. Several viewpoints can be selected for the optimization to avoid distortions in the final stylized 3D density $d^*$. The loss functions $L_c$ and $L_s$ maintain their definitions from Equations 2 and 3. The content loss can be neglected in our case, as we only have a style image and there is no content that needs to be preserved.

To extend the single-frame stylization to multiple frames in a time-coherent way, TNST aligns the stylization velocities with the input velocities. This is done recursively for a pre-defined window size. Increasing the window size enhances smoothness between consecutive frames, but simultaneously leads to larger memory requirements due to the recursive nature of the velocity alignment.
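As a concrete illustration, a minimal TensorFlow sketch of the Gram-based losses in Equations 2 and 3 could look as follows (function and variable names are ours, not from the authors' code):

```python
import tensorflow as tf

def gram_matrix(feats):
    # feats: [H, W, C] feature map of one CNN layer.
    h, w, c = feats.shape
    f = tf.reshape(feats, (h * w, c))
    return tf.matmul(f, f, transpose_a=True)  # G = F^T F, shape [C, C]

def content_loss(feats_img, feats_content):
    # Sum of squared feature differences over the selected layers (Eq. 2).
    return tf.add_n([tf.reduce_sum(tf.square(fi - fc))
                     for fi, fc in zip(feats_img, feats_content)])

def style_loss(feats_img, feats_style):
    # Sum of squared Gram-matrix differences over the selected layers (Eq. 3).
    return tf.add_n([tf.reduce_sum(tf.square(gram_matrix(fi) - gram_matrix(fs)))
                     for fi, fs in zip(feats_img, feats_style)])
```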
3. Method
Our method uses both NST and TNST as illustrated in Figure 1. In a first step, TNST is applied to the input frames of the smoke simulation to transfer structural information. This step corresponds to the approach of Kim et al. [KAGS19], and optimizes density values at each point. In a second step, we apply a color style optimization that modifies the color at each point while keeping the density values constant.
Figure 1: Two-step pipeline for colorized image style transfer.
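To make the data flow of the two steps explicit, a hypothetical driver for this pipeline might look as follows; the two transfer functions are passed in as callables and stand in for the optimizations described above and below, not for functions from the authors' implementation:

```python
def stylize_sequence(densities, style_image, shape_transfer, color_transfer):
    # shape_transfer: step 1, TNST on the density (Eqs. 4-5).
    # color_transfer: step 2, color style optimization (Eq. 6).
    results = []
    for d in densities:
        d_star = shape_transfer(d, style_image)
        d_rgb = color_transfer(d_star, style_image)
        results.append((d_star, d_rgb))
    return results
```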
In the second step of the pipeline, color is added to the stylized mask $d^*$ from the previous step. This part creates and optimizes color channels for $d^*$, but does not further modify the density mask. The colorization process is performed using the original NST algorithm with a few alterations. Again, the desired style is given by the style image $I_S$ and there is no content to preserve or transfer. Hence, we formulate the color style optimization as a simplified version of Equation 1 without content loss:
$$d^*_{RGB} = \arg\min_d \; L_s(R_{RGB,\theta}(d), I_S). \quad (6)$$
Since color information is now relevant for the optimization, the renderer $R_{RGB,\theta}(d)$ produces a 2D color image from viewpoint $\theta$.

The proper initialization is crucial for the success of the color style optimization. As opposed to the original NST, there is no content loss, so any bias that is introduced in the initial condition can persist in the output. When starting the color style optimization from the stylized density $d^*$, the initial pixel values of the area that will be stylized are close to white. This leads to washout effects as shown in Figure 2(a). For the result in Figure 2(b), on the other hand, the stylized mask $d^*$ is initially multiplied pointwise with white noise as shown in Figure 2(c). This initial condition converges to a satisfying result.

Figure 2: Influence of initialization with (a) the stylized density only and (b) the density combined with the white noise mask shown in (c).
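A minimal sketch of this initialization, assuming uniform white noise in [0, 1] (the exact noise distribution is not specified in the text):

```python
import numpy as np

def init_color_fields(d_star, seed=0):
    # Multiply the stylized density pointwise with white noise and use the
    # result as the initial value of each RGB channel (cf. Figure 2(c)).
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, 1.0, size=d_star.shape + (3,))
    return d_star[..., None] * noise  # shape [..., 3]
```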
The color style optimization needs to be constrained to only optimize on the pixels that actually contain density. We obtain a guidance mask $T^l$ by downsampling $d^*$ to the size of each layer $l$ that was selected for the style feature extraction and apply it to the style feature representation on layer $l$ with [GEB∗17]
$$\hat{F}^l(I) = T^l \circ F^l(I), \quad (7)$$
where $\circ$ denotes element-wise multiplication. This way of guiding the stylization will lead to some overflow at the boundaries, because the receptive fields of neurons near the boundaries can overlap the masked-out regions. This overflow can be removed from the final stylized density by applying the guidance mask once in the end.
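A sketch of this guided feature masking (Equation 7), assuming d_star is a 2D float tensor and feats holds the per-layer feature maps:

```python
import tensorflow as tf

def masked_features(feats, d_star):
    # Downsample the stylized density to each layer's resolution (T^l) and
    # gate the feature maps pointwise, so that only pixels containing
    # density contribute to the style loss.
    masked = []
    for f in feats:  # f: [H_l, W_l, C_l]
        t = tf.image.resize(d_star[..., None], (f.shape[0], f.shape[1]))
        masked.append(t * f)  # broadcast the mask over the channel axis
    return masked
```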
Both renderers $R_\theta$ and $R_{RGB,\theta}$ are part of the optimization pipeline and therefore need to be differentiable and lightweight. $R_\theta$ renders the smoke by calculating the pixel intensity along a ray in normal direction to the camera as proposed by [KAGS19]. More specifically, the transmittance $\tau(x)$ and the intensity $I$ at each image pixel $ij$ are defined [FWKH17] as
$$\tau(x) = e^{-\gamma \int_x^{r_{max}} d(r_{ij})\, dr}, \quad (8)$$
$$I_{ij} = \int_0^{r_{max}} d(r_{ij})\, \tau(r_{ij})\, dr. \quad (9)$$
The transmittance factor $\gamma$ defines how much light is lost due to absorption and scattering, $d(x)$ evaluates the amount of density at point $x$, $r_{ij}$ is the ray through pixel $ij$ normal to the camera, and $r_{max}$ is the length of the ray. For the color style optimization, we extend this formulation to support color fields. The $C = \{R, G, B\}$ emission values at each pixel $ij$ are computed with
$$C_{ij} = \int_0^{r_{max}} C(r_{ij})\, d(r_{ij})\, \tau(r_{ij})\, dr, \quad (10)$$
where $C(r_{ij})$ evaluates the respective color field along the ray. The density is multiplied into the emitted colors and can be seen as the emission factor at each point. Note that the RGB emission values are normalized to $[0, 1]$. The impact of the transmittance value onto the colorized result is illustrated in Figure 3.
The emitted colors need this normalization because the accumulation in Equation 10 is not bound to the original color range. In case of overflow, all three channels are mapped back to the valid range with
$$\hat{R}_{ij} = \frac{R_{ij}}{RGB_{max}}, \quad \hat{G}_{ij} = \frac{G_{ij}}{RGB_{max}}, \quad \hat{B}_{ij} = \frac{B_{ij}}{RGB_{max}},$$
where $RGB_{max}$ is defined as the scalar maximum value over all three color channels. It is important to only perform this operation if the maximum color value effectively exceeds the original color range, to avoid rescaling the brightness of the input image.
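A minimal sketch of the differentiable color renderer under the assumption of a single axis-aligned viewpoint and unit step size (the paper's renderer additionally rotates the volume to the viewpoint angle θ, which is omitted here):

```python
import tensorflow as tf

def render(density, color, gamma):
    # density: [X, Y, Z] volume; color: [X, Y, Z, 3]; rays run along Z.
    # Transmittance (Eq. 8): exponentiated cumulative density along the ray.
    tau = tf.exp(-gamma * tf.cumsum(density, axis=-1, reverse=True))
    # Greyscale intensity (Eq. 9) and RGB emission (Eq. 10) as ray sums.
    intensity = tf.reduce_sum(density * tau, axis=-1)
    rgb = tf.reduce_sum(color * (density * tau)[..., None], axis=2)
    # Map back into [0, 1] only if the accumulated emission overflows,
    # so that the overall brightness is otherwise left untouched.
    rgb_max = tf.reduce_max(rgb)
    rgb = tf.where(rgb_max > 1.0, rgb / rgb_max, rgb)
    return intensity, rgb
```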
Figure 3: Output of the differentiable renderer without and with color channels for different transmittance factors γ.

We used the VGG-19 network [SZ15] for the feature extraction, which consists of 19 layers and has been trained for natural image classification. The stylization can be controlled by selecting layers in the CNN. The deeper a layer is positioned in the CNN, the higher is the complexity of the extracted features, as illustrated in Figure 4(a) for two different input images. The shallow layers optimize for low-level features, while deeper layers generate high-level features. The size of the stylized features depends on the size of the input image. Tiling can be used to progressively increase the input size to generate smaller-scale structures as shown in Figure 4(b).

Figure 4: Stylized structures can be controlled by (a) selecting corresponding layers in the CNN ('relu1_1' .. 'relu4_1') and (b) tiling the input image.
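A sketch of such a feature extractor in TensorFlow/Keras (Keras exposes the post-activation convolution outputs as 'blockX_convY', which correspond to the 'reluX_Y' naming used above; the two layers shown are the ones selected in Section 4):

```python
import tensorflow as tf

# VGG-19 pretrained on natural image classification; selected layer
# activations serve as the style feature space.
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
layers = ['block2_conv1', 'block3_conv1']  # ~ 'relu2_1', 'relu3_1'
extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=[vgg.get_layer(name).output for name in layers])
```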
4. Results
We implemented the stylization with TensorFlow and used the Adam optimizer with a learning rate of 0.5 and 1 for the 2D and 3D examples, respectively, for 300 iterations. For our results, we selected the layers 'relu2_1' and 'relu3_1' of the VGG-19 network for the feature extraction.

We applied the style and color transfer to the 2D smoke data set of [JFA∗15] using different input images as shown in Figure 5. Color information is transferred coherently in space and time (see accompanying video sequences†), and hence complements the mask stylization of [KAGS19]. The 3D results were computed with a data set of [KAGS19], and show the colorized outcome with the 3D pipeline that optimizes for multiple viewpoints as described in the original paper of [KAGS19]. The lightweight and hence efficient differentiable color renderer is sufficient to capture the most relevant structures. We illustrate this by comparing the 3D results with their 2D counterparts in Figure 6.

† https://youtu.be/TyNlaBoP6oI
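For concreteness, the color optimization loop with these settings might be set up as follows (style_loss_fn is a placeholder for the masked, rendered style loss of Equation 6):

```python
import tensorflow as tf

def optimize_color(color_init, style_loss_fn, lr=0.5, steps=300):
    # Gradient descent on the color fields only; the density mask is fixed.
    color = tf.Variable(color_init)
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = style_loss_fn(color)
        grads = tape.gradient(loss, [color])
        opt.apply_gradients(zip(grads, [color]))
    return color
```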
Figure 5: 2D single-frame color stylization applied to a data set of [JFA∗15] using different input images (blue strokes, flower, flame, fire and volcano).

Figure 6: 3D (left) and 2D (right) color stylization applied to a data set of [KAGS19] for two input images (blue strokes and fire).
5. Conclusion
In this work we extended an existing flow stylization approach by adding color transfer. The color stylization is coherent in space and time, and can be applied to 2D and 3D smoke densities. Our method directly optimizes for the stylized images during the training stage in an online fashion. Other research in the field of neural style transfer explores model-optimization-based offline techniques. This type of style transfer moves the time-intensive optimization into the phase of training the model, thereby gaining the advantage of stylizing images in a single forward pass. Using this optimization method would greatly reduce the time that the stylization takes. Further, for the best outcome, the differentiable renderer that is used in the optimization should match the final high-quality rendering of the smoke. Our differentiable renderer could be adapted accordingly, but at the cost of increased computation time.
6. Acknowledgments
The authors would like to thank Ondrej Jamriska for sharing his dataset. This work was supported by the Swiss National Science Foundation (Grant No. 200021_168997).
References

[FWKH17] Fong J., Wrenninge M., Kulla C., Habel R.: Production volume rendering: SIGGRAPH 2017 course. In ACM SIGGRAPH Courses (2017), pp. 2:1–2:79.

[GEB16] Gatys L. A., Ecker A. S., Bethge M.: Image style transfer using convolutional neural networks. In CVPR (2016), pp. 2414–2423.

[GEB∗17] Gatys L. A., Ecker A. S., Bethge M., Hertzmann A., Shechtman E.: Controlling perceptual factors in neural style transfer. In CVPR (2017), pp. 3730–3738.

[JFA∗15] Jamriška O., Fišer J., Asente P., Lu J., Shechtman E., Sýkora D.: LazyFluids: Appearance transfer for fluid animations. ACM Transactions on Graphics 34, 4 (2015).

[KAGS19] Kim B., Azevedo V. C., Gross M. H., Solenthaler B.: Transport-based neural style transfer for smoke simulations. ACM Trans. Graph. (SIGGRAPH Asia) 38, 6 (2019), 188:1–188:11.

[LADL18] Li T.-M., Aittala M., Durand F., Lehtinen J.: Differentiable Monte Carlo ray tracing through edge sampling. ACM Trans. Graph. 37, 6 (Dec. 2018), 222:1–222:11.

[LB14] Loper M. M., Black M. J.: OpenDR: An approximate differentiable renderer. In European Conference on Computer Vision (ECCV) (2014), vol. 8695, pp. 154–169.

[LPSB17] Luan F., Paris S., Shechtman E., Bala K.: Deep photo style transfer. In Computer Vision and Pattern Recognition (CVPR) (2017), pp. 6997–7005.

[LXNC17] Li S., Xu X., Nie L., Chua T.-S.: Laplacian-steered neural style transfer. In ACM International Conference on Multimedia (MM) (2017), pp. 1716–1724.

[MNDJ19] Nimier-David M., Vicini D., Zeltner T., Jakob W.: Mitsuba 2: A retargetable forward and inverse renderer. ACM Transactions on Graphics (2019).

[PM17] Pan Z., Manocha D.: Efficient solver for spacetime control of smoke. ACM Transactions on Graphics 36, 4 (2017).

[RDB18] Ruder M., Dosovitskiy A., Brox T.: Artistic style transfer for videos and spherical images. International Journal of Computer Vision 126, 11 (2018), 1199–1219.

[RWB17] Risser E., Wilmot P., Barnes C.: Stable and controllable neural texture synthesis and style transfer using histogram losses, 2017. arXiv:1701.08893.

[SSN08] Sato S., Dobashi Y., Kim T., Nishita T.: Example-based turbulence style transfer. ACM ToG 37, 4 (2018).

[SZ15] Simonyan K., Zisserman A.: Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR) (2015).

[TKG08] Kim T., Thürey N., James D., Gross M.: Wavelet turbulence for fluid simulation. ACM ToG 27, 3 (2008).

[TMPS03] Treuille A., McNamara A., Popović Z., Stam J.: Keyframe control of smoke simulations. ACM ToG 22, 3 (2003), 716–723.