Learning Multiple-Scattering Solutions for Sphere-Tracing of Volumetric Subsurface Effects
(Preprint)

L. Leonard (University of [email protected])
K. HΓΆhlein (Technical University of [email protected])
R. Westermann (Technical University of [email protected])
Figure 1: Subsurface scattering simulation for homogeneous media with varying density and high anisotropy (g = …) using standard path tracing (left) vs. our approach, a sphere-tracing algorithm based on data-driven learning of multiple-scattering statistics (right). 5000 light paths per pixel are simulated. Density doubles from left to right, significantly increasing the number of scattering events to be considered during path tracing. Our approach summarizes many scattering events into a single sphere-tracing step and thus minimizes performance loss due to exceedingly long scattering chains.

ABSTRACT
Accurate subsurface scattering solutions require the integration of optical material properties along many complicated light paths. We present a method that learns a simple geometric approximation of random paths in a homogeneous volume of translucent material. The generated representation allows determining the absorption along the path as well as a direct lighting contribution that is representative of all scattering events along the path. A sequence of conditional variational auto-encoders (CVAEs) is trained to model the statistical distribution of photon paths inside a spherical region in the presence of multiple scattering events. A first CVAE learns to sample the number of scattering events occurring on a ray path inside the sphere, which effectively determines the probability of the ray being absorbed. Conditioned on this, a second model predicts the exit position and direction of the light particle. Finally, a third model generates a representative sample of photon position and direction along the path, which is used to approximate the contribution of direct illumination due to in-scattering. To accelerate the tracing of a light path through the volumetric medium toward the solid boundary, we employ a sphere-tracing strategy that considers the light absorption and is able to perform statistically accurate next-event estimation. We demonstrate efficient learning using shallow networks of only three layers and no more than 16 nodes. In combination with a GPU shader that evaluates the CVAEs' predictions, performance gains can be demonstrated for a variety of different scenarios. A quality evaluation analyzes the approximation error that is introduced by the data-driven scattering simulation and sheds light on the major sources of error in the accelerated path-tracing process.
INTRODUCTION

The realistic rendering of translucent solid materials such as wax or skin requires simulating the scattering of light in the interior of the body. Subsurface scattering refers to the mechanism where light that penetrates the boundary of a translucent object is scattered internally until it exits the body at a different location on the boundary. Due to the internal scattering of light and the resulting complicated light paths that have to be considered in the simulation, an accurate subsurface scattering solution becomes computationally very expensive.

To reduce the computational requirements, approximation methods exploit special shape and material configurations, such as planar surface geometry, isotropic scattering, or separability of the bidirectional scattering-surface reflectance distribution function (BSSRDF) into location- and direction-dependent terms. While it has been shown that efficient analytic transport solutions can be developed for some special cases, rendering quality decreases when the resulting algorithms are applied to scene configurations beyond the scope of the restrictive assumptions. To improve the generalizability of rendering acceleration solutions, there has been growing interest in data-driven approaches that can infer approximate multiple-scattering solutions at render time.

One of the earliest methods precomputes long-distance light transport through a homogeneous medium using transition probabilities at varying spatial scales [33]. This so-called shell-tracing approach makes it possible to move rays through the volume in large steps without considering individual scattering events. It builds upon the concept of sphere-tracing for surface rendering [15], adapting the step size to how deep the ray is inside the solid body.
Recently, for rendering volumetric media without an explicit boundary representation, deep-scattering [21] proposed to feed geometric information into a network by progressively evaluating a hierarchical volumetric geometry descriptor. Inference is based on an expressive multi-layer neural network. To address specifically the issue of boundary effects, a method for learning a shape-aware BSSRDF model from ground-truth volumetric transport simulations using path tracing has been proposed by Vicini et al. [47]. The method avoids many limiting assumptions of prior analytic models regarding the planarity of surfaces and isotropy of volumetric transport, and it generalizes the built-in notion of spatio-directional separability. It assumes a local polynomial surface expansion around a shading point, and uses this expansion to re-project predicted off-surface samples back onto the surface.

Our work builds upon prior works in data-driven subsurface scattering simulation, by leveraging the core idea of "bypassing a potentially lengthy internal scattering process" [47] through a network that has learned to sample outgoing light locations on an object's surface from an incident light location. In contrast to existing work in subsurface scattering simulation, our method avoids an explicit encoding of the local surface geometry in the stochastic modelling of the scattering process, and it also infers the outgoing light direction from a reference distribution generated via volumetric path tracing, instead of using importance sampling. To achieve this, we combine shell tracing with network-based learning of long-distance light transport between a point and a spherical surface centered at this point. A sequence of neural-network-based conditional variational auto-encoders (CVAEs) is trained to model the statistical distribution of light paths and the absorption along them inside a spherical region in the presence of multiple scattering events.
A first CVAE learns to sample the number of scattering events, i.e., the absorption, occurring along a path inside the sphere. Conditioned on this, a second model predicts the exit position and continuation direction of this path. By using shell tracing, light paths can be traced efficiently through the volumetric medium toward the solid boundary without an explicit encoding of the boundary geometry.

Furthermore, instead of modelling purely the one-to-one light transport between two surface points, our method supports extensions of volumetric path tracing, including direct in-scattering to achieve improved convergence. For this, a third network model generates a representative sample of photon position and direction along the path, which is used to approximate the contribution of direct illumination due to in-scattering. In combination with existing approaches for simulating the direct interior illumination, the path-tracing solution converges quickly even when none of the paths arrives directly at a light source.

We jointly train all three networks on a spherical geometry with unit radius and varying interior density (different radii can be considered by adjusting the density appropriately). In the training and validation phase, ground-truth transport solutions using Monte Carlo path tracing are used. An interesting result is that all proposed networks are light-weight, comprised of only up to three layers with no more than 16 nodes each. Despite the resulting compact internal latent-space representations, very realistic reconstructions of the modeled distributions are obtained. Due to this compactness, all inference steps can be implemented efficiently using short shader programs and matrix multiplication instructions on the GPU. We analyze the performance and accuracy of our approach on different geometry and material properties, and we also investigate the inference skills of each of our network models individually.
Our method achieves quality on par with that of a reference Monte Carlo path-tracing solution, at significantly improved performance.

RELATED WORK

Compared to traditional ray-tracing algorithms, which can assume free travel of light through the vacuum between surfaces, ray tracing of participating media represents a substantially higher level of computational complexity. For some media, such as milk and clouds, the high value of the scattering albedo (i.e., the ratio of energy remaining after scattering) entails a large number of light-bounce simulations before a ray is absorbed, leaves the volume or hits a surface. Due to the plethora of possible material interactions, the rendering equation for realistic scene settings does not possess a closed-form solution in the presence of participating media. Thus, for synthesizing realistic images, approximate solutions are required.

Traditional approximation schemes include the exclusion of higher-order scattering effects, leading to the single-scattering approximation [46], which under certain circumstances can admit a closed-form solution [40], or approaches based on luminance diffusion [9, 20]. Especially approximations of the latter type, however, typically impose severe assumptions on the homogeneity of the considered material and surface geometry, which effectively prevents the application of these approaches in more complex settings. Even increasing model capacity, for example by considering dipole diffusion [19], does not resolve the shape restrictions. A more thorough review of traditional rendering techniques for participating media can be found in [3].
Monte Carlo methods
More flexible approaches arise from the application of Monte Carlo path-tracing algorithms [28] for numerically solving the rendering equation for participating media [14, 20, 44]. Though computationally expensive, Monte Carlo path tracing in a converged setting yields physically accurate images, which can serve as a "ground truth" for validating alternative approaches. The recent survey by Novak et al. [36] reviews the latest advances in Monte Carlo path-tracing methods for solving the light transport in participating media.

In path tracing, and as an alternative to ray marching with constant step size, delta tracking is often used to determine free path lengths according to the optical depth in the participating medium. As an importance sampling of the corresponding probability density function, Woodcock tracking [51] adjusts the sampling distances so that dense regions in a volume can be sampled appropriately, and it has been
adapted to achieve unbiased sampling of sparse inhomogeneous media [53]. Free-path sampling with probabilities not necessarily proportional to the volume transmittance has been realized by means of weighted delta-tracking approaches, e.g., [25, 38, 43]. All these techniques reduce noise in the estimated light paths for participating media, and they enable an automatic selection of the step size according to the underlying material distribution.

A different approach to move rays through the volume in adaptive steps, even without considering individual scattering events, is shell tracing [33]. It builds upon a statistical model that is based on a discretized array of transition probabilities, limiting the expressiveness of the simulated light transport paths. In the spirit of sphere-tracing for surface rendering [15], it performs larger steps in the interior of a body and successively decreases the step size towards the object boundary.

Diffusion methods
Despite the large generality of Monte Carlo rendering, a variety of methods have been developed which are based on diffusion approximations. Especially in the field of subsurface scattering, computation times can explode when highly dense materials require translucent renderings. Though the light does not enter the bodies deeply, Monte Carlo approaches may run into lengthy sequences of frequent scattering or reflection events right underneath the surface. Diffusion models, which approximate the light transport by virtue of bidirectional scattering-surface reflectance distribution functions (BSSRDFs), may in these cases lead to speed-ups. While early approaches, like [19], can hardly keep up with the quality of ray-traced images, more recent methods like [7, 10, 52] can generate visually appealing results at reduced rendering time, however at the cost of reduced physical realism. Notably, hybrid methods also exist, which weigh the advantages and disadvantages of Monte Carlo-based and diffusion-based rendering against each other [12].
Machine Learning for Rendering
As Monte Carlo-based rendering can intrinsically be described as a statistical problem, the use of machine learning methods for rendering has recently attracted great interest. In the context of path tracing, the goal is typically to use machine learning models to represent sampling distributions or to create guidance for efficient sample selection. Popular modeling approaches include Gaussian mixture models [17, 48] and neural network methods.
Neural network-based methods
In the context of path tracing, deep-learning methods can be classified into image-based adaptive sampling and denoising algorithms on the one hand, and multiple-scattering approximations on the other hand.

The recent approach by Kuznetsov et al. [26] uses neural networks to learn adaptivity in Monte Carlo path tracing and denoising of the final image. A first network learns to adapt the number of additional paths from an initial image at the target resolution, which is generated via one path per pixel. A second denoising network learns to model the relationship between an image with increased variance in the color samples and the ground-truth rendering [31, 42]. Recently, Weiss et al. [50] used neural networks to learn the positions of sample locations in image space from a low-resolution image, using a differentiable sampling stage as well as a differentiable image reconstruction stage that can work on a sparse set of samples. Related to denoising approaches for path-traced images is the screen-space shading approach by Nalbach et al. [35], where a network is trained to infer missing shading information from image information in a view-dependent G-buffer.

For surface graphics, Zheng and Zwicker have used neural networks to model the relationships between scene parameters and light paths [55]. Recently, deep-scattering [21] employed a radiance-predicting neural network for simulating scattering events in clouds. Geometry information is fed into the network by progressively evaluating a hierarchical volumetric geometry descriptor, and inference is based on an expressive multi-layer neural network, optimized with recent design features. Nevertheless, the method was developed for cloud rendering and does, as such, not explicitly account for boundary effects, which may play a major role in subsurface scattering.
Addressing specifically this issue, a method for learning a shape-aware BSSRDF model from ground-truth volumetric transport simulations using path tracing has been proposed [47]. Exploiting the expressiveness of neural network models in a similar way, this method avoids many limiting assumptions of prior analytic models regarding the planarity of surfaces and isotropy of volumetric transport, and it generalizes the built-in notion of spatio-directional separability. It assumes a locally expandable surface geometry, which is explicitly encoded using a trivariate polynomial around a shading point. In this way, an approximate signed-distance function to the surface geometry can be used to model the outgoing on-surface light distribution. The learned approximation of the full subsurface scattering process in [47] overcomes the limitation of planar approximations by fitting the geometry with a cubic polynomial. Nevertheless, the assumption of a uniformly distributed final outgoing direction and a single polynomial approximation of the whole geometry biases the final result.

Another idea, considering the light scattering in a medium as a random process (similar to the walk-on-spheres technique [34, 27]), is to short-cut all possible scattering events inside a sphere with homogeneous density. Such a model can be used to simulate all those events as a single step in a Monte Carlo technique. This strategy represents an acceleration rather than an approximation, and it removes any bias regarding the geometry, a limitation of other techniques. The shell-tracing method proposed in [33] renders discrete random media using precomputed scattering distributions in the form of a matrix factorization. With this approach, the number of variables considered is reduced to three, because the matrix complexity grows rapidly when higher dimensions are considered.
METHOD

Our key idea is to train generative statistical models to bypass multiple scattering events occurring inside a spherical volume of material with constant density and potentially anisotropic scattering characteristics, and to use the learned representations in the rendering process to generate paths through the volume using large steps (Fig. 2a). Whenever the ray arrives at a sampling point, it tests for the largest step it can make without leaving the volume, and then evaluates a sequence of statistical models to infer the end position and direction when performing this step. The ray proceeds using delta tracking until a next collision event is determined, and recurrently performs learning-based path tracing (Fig. 2b).

Figure 2: Method overview. a) Given the optical properties of the volumetric medium (σ_t, g, Φ), the path (gray dotted lines) starting at x = (0, 0, 0) from the incoming direction ω = ω_in is bypassed by letting a network infer directly (blue) the random outgoing position x and direction ω_out in a unit-radius sphere. b) After free-travel events (black arrows), the maximum sphere not intersecting the geometry is computed from a signed distance field, and its radius is used to control the adaptive density and the length of the next step along a light path. This process is repeatedly applied until the path leaves the volume.

To determine the length of the step, a 3D signed distance function is computed for the current geometry in a pre-process. For this, the object's bounding volume is discretized using a 3D voxel grid, and for each voxel a conservative shortest Chebyshev distance to the object's surface is computed [6]. This representation enables quickly finding the radius of the sphere that can be safely traversed without intersecting the geometry.
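As an illustration of this distance-field precomputation, the sketch below builds a conservative Chebyshev distance grid in Python. The helper names (`chebyshev_sdf`, `safe_radius`) are our own and not the paper's code; it assumes at least one occupied surface voxel and uses brute force for clarity, whereas a production version would use a two-pass chamfer-style transform.

```python
import numpy as np

def chebyshev_sdf(occupancy):
    """Per-voxel Chebyshev distance (in voxels) to the nearest occupied
    (surface) voxel. Brute force for clarity; a two-pass chamfer-style
    transform would be used for real grids."""
    pts = np.argwhere(occupancy)          # coordinates of surface voxels
    dist = np.empty(occupancy.shape)
    for idx in np.ndindex(*occupancy.shape):
        # Chebyshev metric: maximum absolute coordinate difference.
        dist[idx] = np.max(np.abs(pts - np.array(idx)), axis=1).min()
    return dist

def safe_radius(dist, voxel_size, idx):
    """Largest sphere radius guaranteed free of geometry around voxel idx.
    The Chebyshev distance lower-bounds the Euclidean distance, so the
    returned radius is conservative (safe) for sphere tracing."""
    return dist[idx] * voxel_size
```

Because the Chebyshev distance never exceeds the Euclidean distance, the returned radius is a safe lower bound for the next sphere-tracing step.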
As the ray approaches the edges, it is forced to take smaller and smaller steps, and eventually the tracking will pass across the boundary surface. To efficiently trace against a polygonal boundary surface, we employ a GPU ray tracer using NVIDIA's RTX ray-tracing interface [39] through the DirectX Raytracing API (DXR).

Learning is conducted in an offline process. We assume homogeneous extinction σ_t and scattering albedo Φ (with σ_t = σ_a + σ_s, considering the rate of photons getting absorbed, σ_a/σ_t, or scattered into other directions, Φ = σ_s/σ_t, due to light-particle interaction). Anisotropy in scattering is modelled by using the Henyey-Greenstein [16] phase function,

f_g(ω', ω) = (1 − g²) / (4π (1 + g² − 2g cos(ω, ω'))^{3/2}),   (1)

with cos(Β·, Β·) indicating the cosine between the respective directions and anisotropy parameter g, which equals the expected cosine between the ray direction before and after scattering. For instance, g > 0 results in a strong forward lobe of scattering into the incoming light direction. In this setting, the radiative transport [4] can be described in terms of the volume rendering equation [20],

L(x, ω) = ∫_{t=0}^{d_z} T(x, x_t) [σ_a L_e(x_t, ω) + σ_s L_s(x_t, ω)] dt + T(x, z) L_d(z, ω),

with x_t = x − ω t and z = x_{d_z}. L(x, ω) describes the incoming radiance at position x from direction ω; L_e(x, ω), L_s(x, ω) and L_d(x, ω) denote emitted, scattered and direct radiance, and T(x, y) is the transmittance along the ray (Beer-Lambert law [29]), with T(x, x_t) = exp(−σ_t t) in the case of a homogeneous medium. The impact of the phase function is considered in L_s(x_t, ω), which can be rewritten as

L_s(x_t, ω) = ∫_Ω f_g(ω, ω') L_i(x_t, ω') dω',

where L_i(x_t, ω') denotes the directional incoming radiance at x_t, and the integral runs over the full solid angle Ω.
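Equation (1) can be evaluated directly. The following minimal Python helper (our naming, not the authors' code) computes the phase-function value for a given deflection cosine:

```python
import math

def hg_phase(cos_theta, g):
    """Henyey-Greenstein phase function of Eq. (1), normalized so that its
    integral over the full solid angle equals one."""
    denom = 1.0 + g * g - 2.0 * g * cos_theta
    return (1.0 - g * g) / (4.0 * math.pi * denom ** 1.5)
```

Numerically integrating hg_phase(μ, g) Β· 2π over μ ∈ [−1, 1] recovers 1, confirming the 1/(4π) normalization; for g = 0 the function reduces to the isotropic value 1/(4π).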
For a detailed discussion of the derivation of the rendering equation, we refer the reader to [41].

For training our model, we assume that rays start in the center of a sphere of unit radius and terminate when they cross the boundary of the sphere for the first time, after experiencing a sequence of scattering events in between. By assumption, the incident direction of the rays in the sphere center is set to ω_in^CVAE = (0, 0, 1) for all rays, and the first scattering event occurs directly in the center of the sphere. For each ray, Monte Carlo path tracing is used as a ground-truth renderer to simulate radiative transport through scattering media [28, 36]. The distance between subsequent scattering events is estimated as d = −log(1 − ξ)/σ_t, wherein ξ is a uniformly distributed random number between 0 and 1; i.e., the mean free path is equal to the reciprocal of the extinction. At every scattering event, it is decided whether the path continues or terminates due to absorption using importance-based Russian roulette [8]. In case of continuation, the scattering direction is computed by importance sampling the Henyey-Greenstein phase function.

Within our model, we ignore the details of the full path trace. Instead, we assume that, in the simplest case of a standard path tracer, the only states which affect the outcome of the rendering are the number N of scattering events affecting a ray inside the sphere, determining the likelihood of absorption A (the ray terminates due to absorption, A = 1, or not, A = 0), as well as the exit position x on the shell of the spherical volume and the outgoing direction ω of the ray.
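The ground-truth sampling procedure described above can be sketched as follows. This is a simplified stand-in for the authors' renderer, with our own function names; Russian-roulette absorption is deliberately omitted here because, in the summarized model, absorption is decided from N afterwards.

```python
import math
import random

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(a):
    return math.sqrt(sum(v * v for v in a))

def sample_hg(g, xi):
    """Importance-sample the deflection cosine of the HG phase function."""
    if abs(g) < 1e-6:
        return 1.0 - 2.0 * xi                      # isotropic limit
    s = (1.0 - g * g) / (1.0 - g + 2.0 * g * xi)
    return (1.0 + g * g - s * s) / (2.0 * g)

def deflect(w, cos_t, phi):
    """Direction at angle acos(cos_t) from unit vector w, azimuth phi."""
    a = (1.0, 0.0, 0.0) if abs(w[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = cross(a, w)
    un = norm(u)
    u = [v / un for v in u]
    t = cross(w, u)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    return [sin_t * math.cos(phi) * u[i] + sin_t * math.sin(phi) * t[i]
            + cos_t * w[i] for i in range(3)]

def trace_in_unit_sphere(sigma_t, g, rng):
    """One ground-truth sample: start at the sphere center along (0, 0, 1)
    and scatter until the path crosses the unit-sphere boundary.
    Returns (N, exit position x, exit direction w)."""
    x, w, n = [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0
    while True:
        d = -math.log(1.0 - rng.random()) / sigma_t    # free-path length
        b = sum(xi * wi for xi, wi in zip(x, w))
        t_exit = -b + math.sqrt(b * b - (sum(v * v for v in x) - 1.0))
        if d >= t_exit:                                # step leaves the sphere
            return n, [xi + t_exit * wi for xi, wi in zip(x, w)], w
        x = [xi + d * wi for xi, wi in zip(x, w)]
        n += 1
        w = deflect(w, sample_hg(g, rng.random()),
                    2.0 * math.pi * rng.random())
```

Running this many times per (σ_t, g) pair yields the empirical distribution over (N, x, ω) that the CVAEs are trained to reproduce.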
To further support the use of the proposed method in the more general context of path tracing, we include the opportunity to sample additional ray statistics, which we summarize as Σ. Following these considerations, the likelihood of sampling a particular ray path with summary statistics (N, A, x, ω, Σ) is determined by a conditional probability density function (PDF) of the form p(N, A, x, ω, Σ | σ_t, g, Φ), wherein quantities to the right of the vertical bar affect the PDF only as conditions.

Empowered by fast progress in machine learning, in particular deep learning, a variety of approaches have been proposed to efficiently learn complex statistical distributions. These include variational auto-encoders [22], generative adversarial networks [11] and invertible neural networks [1], all of which can be extended to learn conditional distributions, see e.g. [2, 32, 45]. Within this study, we propose to train conditional variational auto-encoders (CVAEs) to learn a representation of p(N, A, x, ω, Σ | σ_t, g, Φ), as they offer a preferable trade-off between computational model complexity, model flexibility and training stability.

In general, a CVAE defines a directed graphical model, which learns to emulate complex conditional data distributions, p_X(x | c),
The distributions are related via π X ( π₯ | π ) = β« Z π Dec ( π₯ | π§, π ) π Z ( π§ | π ) d π§ , π Z ( π§ | π ) = β« X π Enc ( π§ | π₯, π ) π X ( π₯ | π ) d π₯ ,which can be combined to yield a self-supervised learning model,similar to the standard auto-encoder. Therein π Enc and π Dec denoteprobabilistic encoding and decoding mappings between data andlatent space. Exploiting these relations, the CVAE defines a pairof neural network models, the encoder
Enc(x, c) and the decoder Dec(z, c), which learn to parameterize the encoding and decoding PDFs p_Enc and p_Dec. While the data-space posterior p_Dec is typically considered to reflect the true mapping between Z and X, the latent-space posterior p_Enc can only be treated approximately, since the marginalization required to derive the exact latent-space posterior is computationally infeasible. Within our study, we assume a conditional Gaussian shape for both distributions, i.e.,

q_Enc(z | x, c) = N(μ_Enc(x, c), Σ_Enc(x, c)),
p_Dec(x | z, c) = N(μ_Dec(z, c), Σ_Dec(z, c)),

wherein q_Enc(z | x, c) refers to the parametric approximation of p_Enc(z | x, c). Furthermore, μ_Enc(x, c) and Σ_Enc(x, c), as well as μ_Dec(z, c) and Σ_Dec(z, c), are outputs of the encoder model Enc(x, c) and decoder model Dec(z, c), respectively, with μ_Enc(x, c) ∈ Z and μ_Dec(z, c) ∈ X. Σ_Enc and Σ_Dec are diagonal covariance matrices, consistent with the dimensions of Z and X, respectively. At training time, encoder and decoder models are trained jointly to maximize the so-called evidence lower bound

L = −KL(q_Enc(z | x, c) || N(0, I)) + ⟨log p_Dec(x | z, c)⟩_{q_Enc},

which constitutes a lower bound for the observed data log-likelihood [22]. The first term refers to the Kullback-Leibler divergence [24] between the learned latent-space posterior and the latent-space prior, and penalizes a mismatch between both distributions. The second term denotes the expected log-likelihood of the learned data posterior, where the expectation is taken with respect to the latent-space posterior, and penalizes the reconstruction error of the model. At inference time, the encoder model is discarded and a latent-space sample z is drawn from the latent-space prior distribution N(0, I_Z).
Conditioned on this sample, the decoder model is evaluated and a data-space sample is obtained as

x = √(Σ_Dec(z, c)) ε + μ_Dec(z, c),

where ε is another Gaussian random variable, drawn from N(0, I_X). This so-called reparameterization trick [22] ensures that x ∼ N(μ_Dec, Σ_Dec). A sketch of the training and inference processes in the current setting is shown in Figure 3a.

In our work, we realize CVAE models via pairs of neural networks, encoder and decoder, respectively. During rendering, only the decoder is evaluated and, thus, needs to be kept as simple as possible.
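For the diagonal-Gaussian case assumed here, both terms of the evidence lower bound and the reparameterization trick have simple closed forms. The NumPy sketch below (illustrative helper names, not the authors' PyTorch code) spells them out:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def gaussian_log_likelihood(x, mu, log_var):
    """log N(x; mu, diag(exp(log_var))), summed over the last axis."""
    return -0.5 * np.sum(np.log(2.0 * np.pi) + log_var
                         + (x - mu) ** 2 / np.exp(log_var), axis=-1)

def reparameterize(mu, log_var, eps):
    """Reparameterization trick: x = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * log_var) * eps

def negative_elbo(x, enc_mu, enc_log_var, dec_mu, dec_log_var):
    """-L: the KL term minus a one-sample estimate of the expected
    log-likelihood; minimized during joint encoder/decoder training."""
    return (kl_to_standard_normal(enc_mu, enc_log_var)
            - gaussian_log_likelihood(x, dec_mu, dec_log_var))
```

In training, `eps` is resampled per step so gradients flow through `mu` and `log_var`; at inference only the decoder half of the computation remains.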
Figure 3: (a) Working principle of the CVAE. Parameters ε are random variables used to evaluate the Gaussian posterior distributions, parametrized by mean μ and covariance Σ of the respective model. (b) Configuration of encoder (blue) and decoder (red) networks for CVAEs. Both models are determined through the number of input parameters, P_in (dimension of the conditioning variables, c), the number of output parameters, P_out (dimension of the variables to be predicted, x), the latent-space dimension L, the depth of the networks D, and the number of nodes per layer W. Encoder and decoder share the same settings of D, W and L, and output estimates for mean μ_Enc/Dec and diagonal covariance matrix Σ_Enc/Dec of the respectively modeled distributions.
All our models follow the scheme shown in Figure 3b and are determined through the number of input parameters, P_in, the number of output parameters, P_out, the latent-space dimension L, the number of layers, i.e., the depth of the networks D, and the number of nodes per layer W. Both encoder and decoder networks use Softplus activation functions [54],

Softplus(x) = log(1 + exp(x)),

between hidden layers. The outputs of the final layers, mean μ_Enc/Dec and diagonal covariance matrix Σ_Enc/Dec for the respective parameters, are not transformed by non-linear activation functions. Note, however, that for the standard deviations we found it useful to predict log(Σ_Enc/Dec) instead of Σ_Enc/Dec directly. Both encoder and decoder networks share the same settings for D and W. For a given set of input and target parameters, we examined different settings for D, W and L, and empirically selected those parameters resulting in the most successful models. Parameter selection was guided by considerations with respect to computational efficiency. Since matrix multiplications on the GPU are carried out in matrix sections of size 4 Γ 4, we adapted D, W and L so that especially the decoder model is able to efficiently utilize the parallelization capabilities of the GPU. In particular, we considered multiples of four for W-values, to speed up matrix multiplications between hidden layers, and selected L so that the input size of the decoder, P_in + L, is a multiple of four. Concerning D, we considered small values, as deeper models require longer computation time due to the sequential evaluation of subsequent layers. The models were trained using an AdamW optimizer [30], implemented in the PyTorch interface in Python.
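A toy version of such a decoder can be written in a few lines of NumPy (hypothetical class name; the weights are randomly initialized here, whereas the paper's decoder weights come from CVAE training and are evaluated in a GPU shader):

```python
import numpy as np

def softplus(x):
    """Softplus(x) = log(1 + exp(x)), the hidden-layer activation."""
    return np.log1p(np.exp(x))

class DecoderMLP:
    """Toy decoder with depth D, width W, Softplus activations, and a
    linear output layer emitting mean and log-variance for P_out values."""

    def __init__(self, p_in, latent, p_out, depth=3, width=16, seed=0):
        rng = np.random.default_rng(seed)
        dims = [p_in + latent] + [width] * depth + [2 * p_out]
        self.layers = [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
                       for m, n in zip(dims[:-1], dims[1:])]
        self.p_out = p_out

    def __call__(self, c, z):
        h = np.concatenate([c, z], axis=-1)   # decoder input: condition + latent
        for w, b in self.layers[:-1]:
            h = softplus(h @ w + b)
        w, b = self.layers[-1]                # no activation on the output layer
        out = h @ w + b
        return out[..., :self.p_out], out[..., self.p_out:]   # mu, log(Sigma)
```

With p_in = 3 (for σ_t, g, Φ) and latent = 1, the decoder input size P_in + L = 4 is a multiple of four, matching the efficiency consideration above.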
Once the training was completed, the models were re-implemented in a shader-viable form, using mainly matrix multiplications and the nonlinear activation functions.

Recall now that the likelihood of sampling a particular ray path with summary statistics (N, A, x, ω, Σ) is determined by a conditional PDF of the form p(N, A, x, ω, Σ | σ_t, g, Φ). To enable sequential model evaluation and minimize the need for modeling complex correlation structures between random variables, we decompose the PDF according to

p(N, A, x, ω, Σ | σ_t, g, Φ) = p_N(N | σ_t, g) Γ p_A(A | N, Φ) Γ p_{x,ω}(x, ω | N, A, σ_t, g) Γ p_Σ(Σ | N, A, x, ω, σ_t, g, Φ),   (2)

and decompose the full model into three separate CVAEs, which we will call LengthGen, PathGen and EventGen (Figure 4), one for each of the distributions p_N, p_{x,ω} and p_Σ. At inference time, the models are evaluated sequentially. The first model infers the number N of scattering events on the path. The PDF p_N thereby depends only on σ_t and g, because we suppose in this step that a full path is observed, which starts in the center of the sphere and terminates when crossing the sphere boundary without being absorbed in between. This approach simplifies the generation of training data, especially in scenarios with significant absorption. Conditioned on N and Φ, the probability of absorption A along the bypassed path follows a Bernoulli distribution

p_A(A | N, Φ) = α(N, Φ)^A (1 − α(N, Φ))^{1−A},

with α(N, Φ) = 1 − Φ^N. This admits simple sampling, so that training of a CVAE is not required. At this stage, a path can be rejected if sampling yields A = 1.
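The Bernoulli factor of Eq. (2) needs no learned model and can be sampled directly; a small sketch with a hypothetical helper name:

```python
import random

def sample_absorption(n_events, albedo, rng=random):
    """Sample A from the Bernoulli factor of Eq. (2): the bypassed path is
    absorbed (A = 1) with probability alpha(N, Phi) = 1 - Phi**N, i.e. the
    chance that at least one of the N scattering events absorbs the photon."""
    alpha = 1.0 - albedo ** n_events
    return 1 if rng.random() < alpha else 0
```

Note that N = 0 or Φ = 1 gives α = 0, so such paths are never rejected.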
In this case, no further models have to be evaluated, increasing computational efficiency. If the path is accepted, the second CVAE infers the position on the sphere surface where the ray exits the volume, as well as the direction into which it continues to be propagated (Figure 2a). Note here that p_{x,ω} is independent of Φ, which can be assumed since the absorption process is treated separately, before x and ω are sampled, and x and ω are conditioned on the outcome of the absorption request A. If applicable, the third model finally resolves the summary statistics Σ. Since in general no dependencies can be excluded, p_Σ may depend on all previously sampled variables, as well as on all material parameters. An application of these statistics will be discussed in Section 4.3, where Σ = (P, ν) refers to a representative sample of all scattering events along the bypassed path inside the sphere. In this setting, P denotes the position of the representative scattering event and ν the in-going direction of the photon path at P.

While training of the CVAEs can be performed in normalized conditions, i.e., in a unit sphere with all rays coming from the same direction, the parameters have to be adapted to the local frame of reference of the current ray at render time. Also, the radius of the sphere has to be adapted to the geometry of the scene setting. To account for this, the output parameters of the models, i.e., x, ω, P
Figure 4: Overview of the overall model architecture, consisting of three trainable CVAE models, LengthGen, PathGen and EventGen, and an absorption request. Variables z and ε refer to random numbers, which are used for sampling the generative models.

and w, have to be parameterized in a way that is invariant to rotations of the spatial coordinates and changes of the sphere radius. To account for this, we propose the parameterization scheme shown in Figure 5.

Figure 5: Path parameterization. Due to the rotation symmetry of the phase function, it is useful to parameterize the exit position x and outgoing direction ω of a ray invariant with respect to rotations around ω_in. All exit positions on a circle of constant deviation angle θ with respect to ω_in (dotted circle) are equally likely. The exit position x of a path is therefore parameterized through cos(θ). Coordinates of the outgoing direction ω are given in a local reference frame around x, with normal vector ê_n, binormal ê_b and tangent ê_t, yielding coordinates α (projection on ê_b) and β (projection on ê_t).

After sampling N and A, the outgoing position x on a sphere shell of radius r_sphere is parameterized in spherical coordinates through a tuple (r_sphere, θ, φ), where θ denotes the deviation angle between ω_in and x. Due to the invariance of the distribution p_{x,ω} with respect to rotations around ω_in, φ can be sampled as a random number from a uniform distribution between 0 and 2π. This invariance is inherited from the rotation invariance of the Henyey-Greenstein phase function f_p(ω, ω′) and can be used to restrict the learning space of the CVAE to only the θ-coordinate.
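For reference, the Henyey-Greenstein phase function [16] that underlies this invariance depends on the two directions only through the cosine of the angle between them; a minimal sketch of the standard formula:

```python
import numpy as np

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function f_p, evaluated on cos(theta).

    g in (-1, 1) controls anisotropy (g > 0: forward scattering).
    The dependence on directions only through cos(theta) is the rotation
    invariance exploited in the text."""
    denom = (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * np.pi * denom)
```

For g = 0 the function reduces to the isotropic value 1/(4π), and it integrates to one over the unit sphere for any admissible g.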
The rotation invariance furthermore simplifies the matching between real-space coordinates and the CVAE reference system, since the orientation of the basis vectors orthogonal to ω_in can be chosen arbitrarily, as long as the CVAE reference direction ω_in^CVAE = (0, 0, 1) is mapped appropriately to the true ω_in in real-space coordinates. Given a suitable rotation matrix R_CVAE, the exit position x can be sampled as

x = r_sphere · R_CVAE · R^φ_(0,0,1) · x̂,                                        (3)

where x̂ = (sin(θ), 0, cos(θ)), and R^φ_(0,0,1) denotes the rotation matrix corresponding to the random rotation by angle φ around the CVAE reference direction ω_in^CVAE = (0, 0, 1).

Given x, a local frame of reference can be obtained in x by considering the coordinate system

ê_n = x̂,  ê_b = ω_in × x̂,  ê_t = ê_b × ê_n,

with x̂ = x / r_sphere, and ê_n, ê_b and ê_t referring to the normal, binormal and tangent vectors of the frame of reference around x. The operator × denotes the cross product in R³. Within this system, the outgoing direction ω can be parameterized through a tuple (α, β), wherein α = cos(∠(ω, ê_b)) and β = cos(∠(ω, ê_t)). Again exploiting rotation invariance, ω can be sampled in the CVAE reference system and transformed to real space by applying a transformation similar to that in Equation 3.

The treatment of the summary statistics Σ has to be considered separately, depending on which quantities are required. For the case of Σ = (y, w), with y and w describing position and direction of a representative scattering event, a symmetry-guided restriction of the sample space is not admissible, since the statistics may depend on x and ω in a complex fashion. Therefore, y and w are sampled as real-valued 3D vectors in the CVAE system of reference, with w being normalized subsequently.
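The mapping of Equation 3 and the local frame construction can be sketched as follows. The function names are illustrative, and the binormal is normalized here for numerical safety, a detail the text leaves implicit.

```python
import numpy as np

def sample_exit_position(cos_theta, r_sphere, R_cvae, rng):
    """Map a sampled deviation angle theta to a real-space exit point (Eq. 3).

    R_cvae is a 3x3 rotation taking the CVAE reference direction (0, 0, 1)
    onto the real-space incoming direction omega_in."""
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    phi = rng.uniform(0.0, 2.0 * np.pi)  # uniform by rotational symmetry
    c, s = np.cos(phi), np.sin(phi)
    # Rotation by phi around the CVAE reference axis (0, 0, 1).
    R_phi = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    x_hat = np.array([np.sin(theta), 0.0, np.cos(theta)])
    return r_sphere * (R_cvae @ R_phi @ x_hat)

def local_frame(x, omega_in, r_sphere):
    """Normal/binormal/tangent frame at the exit point x."""
    e_n = x / r_sphere
    e_b = np.cross(omega_in, e_n)
    e_b /= np.linalg.norm(e_b)       # normalized (sketch assumption)
    e_t = np.cross(e_b, e_n)
    return e_n, e_b, e_t
```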
Both quantities are then transformed to real-space coordinates by applying a suitable rotation matrix and scaling. To account for varying radii r_sphere in the model setting, the extinction coefficient of the medium is rescaled according to σ_t → σ_t · r_sphere before applying the models. This is justified by the homogeneity of the medium and the independence of scattering events therein. All other variables are unaffected by the transformation between model- and real-space coordinates.

Building upon the above considerations, the statistical model described above can be decomposed into a small number of shader functions, which accept input parameters according to the probabilities defined in Equation (2) and sample output parameters as discussed in Section 4.1. Three of the shader functions involve CVAE-based sub-steps and can be summarized as follows:

• Sampling of the number of scattering events:
  LengthGen(σ_t, g, z^(L_L), ε^(m_out)) → N.
• Sampling of the outgoing position and direction:
  PathGen(σ_t, g, N, z^(L_P), ε^(m_out)) → (θ, β, α).
• Sampling of the representative scattering event:
  EventGen(σ_t, g, Λ, N, α, β, θ, z^(L_E), ε^(m_out)) → (y, w).

Therein, z^(L) ∼ N(0, I_L) denotes the Gaussian random variables used for sampling the respective CVAE priors, and ε^(m_out) ∼ N(0, I_{m_out}) denotes the Gaussian random variables used for sampling the decoder posterior, as discussed in Section 3.2. The settings of the hyper-parameters D, m and L used for setting up the CVAEs are summarized in Table 1.

Table 1: Examined and final parameter settings for the CVAE models
LengthGen, PathGen and EventGen. Final parameter settings are highlighted. (Columns: m_in, m_out, D, m, L; rows: LengthGen, PathGen, EventGen.)

In the following, the inclusion of next-event estimation for in-scattering from direct illumination [5, 36] is described as a concrete application of the third model,
EventGen. In standard path tracing, this information is either neglected during the internal scattering process and considered only at the position where the path exits the volume, or the in-scattering is considered at every internal sampling point, e.g., by using pre-computed photon maps [37] or direct illumination using next-event estimation approaches [5, 13, 23]. Since at render time the samples at which internal scattering occurs are no longer available, we set in this case Σ = (y, w), wherein y indicates the position of the scattering event in the interior of the sphere and w the direction of the ray arriving at y. At training time, (y, w) is sampled from the set of all scattering events on the path through the sphere, with weights according to the contribution of the in-scattering events to the luminosity of the ray.

For example, the direct light contribution from a directional light source at position x_L with power Φ along a path inside a translucent medium can be considered at every internal sampling point of the ray path. Characterizing the ray through the occurring scattering events (x_1, ω_1), ..., (x_n, ω_n), the total in-scattering L due to direct illumination along the path is obtained as

L = Φ · Σ_{i=1}^{n} W_i · f_p(ω_i, ω_{x_L→x_i}) · K(x_L → x_i).                  (4)

Here, ω_{x_L→x_i} indicates the direction of the incoming direct illumination at position x_i, and K(x_L → x_i) denotes the extinction of the light when traveling from the light source through the volume to the scattering position.

Since in path tracing typically the expectation value ⟨L⟩ over many paths is considered, our approach is to approximate L in Monte Carlo fashion by choosing one single representative sample (y, w) per path and considering the in-scattering from the direct light source only for this scattering event. A sample (y, w) is thereby chosen to be the j-th scattering event in a path {(x_i, ω_i)}_{i=1}^{n} with probability
W_j / W(n), where W(n) = Σ_{i=1}^{n} W_i. This approximation does not alter ⟨L⟩ if a sufficient number of rays is considered in the average. The crucial step in applying this approximation lies in efficiently computing the incoming direction of the direct light, ω_{x_L→y}, as well as the attenuation coefficient K(x_L → y) for all positions y. In the absence of refractive boundaries, the expressions simplify to

ω_{x_L→y} = (x_L − y) / ‖x_L − y‖   and   K(x_L → y) = exp(−d_t(x_L → y) · σ_t),

with ‖·‖ indicating the standard Euclidean vector norm and d_t(x_L → y) referring to the distance along the direct path between x_L and y during which the light travels through the medium. In the more general case of refractive medium boundaries being present, however, alternative computation methods have to be employed, which are able to account for refraction appropriately. A number of methods have been proposed to tackle this problem, including [18, 23, 49]. However, the choice of a specific method does not affect the applicability and performance gain of the path-tracing acceleration method proposed here.

To demonstrate the validity of our approach, we conducted numerical experiments regarding the fitting accuracy of the model predictions with respect to the real distributions, the visual accuracy of the rendering results including and excluding the in-scattered direct illumination, as well as the performance of the method in several scenarios. Note that for all scenarios the same model configuration was used, i.e., no models were retrained to specialize for particular tasks. Training of the models was conducted on ray samples with material parameters sampled from ranges of σ_t and g chosen to guarantee an appropriate coverage of the material parameters relevant for our study. For all our experiments, we assume the model height to be normalized.
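The single-representative-sample estimator described above can be sketched as follows for the refraction-free case. Function and parameter names are hypothetical; scoring the chosen event with the total weight W(n) is what keeps the expectation unchanged.

```python
import numpy as np

def direct_inscattering_estimate(events, weights, x_light, phi_light,
                                 sigma_t, phase, rng):
    """One-sample Monte Carlo estimate of the sum in Eq. (4).

    Picks a single representative scattering event j with probability
    W_j / W(n) and scores the full path weight W(n) at that event.
    `events` is a list of (x_i, omega_i); `phase` is a phase function
    f_p(omega, omega'). Refractive boundaries are ignored."""
    W = np.asarray(weights, dtype=float)
    W_total = W.sum()
    j = rng.choice(len(events), p=W / W_total)
    x_j, omega_j = events[j]
    to_light = x_light - x_j
    d = np.linalg.norm(to_light)
    omega_L = to_light / d                  # omega_{x_L -> x_j}
    K = np.exp(-d * sigma_t)                # extinction along the shadow ray
    # Expectation over j recovers Phi * sum_i W_i * f_p * K_i, i.e. Eq. (4).
    return phi_light * W_total * phase(omega_j, omega_L) * K
```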
Figure 6 shows a direct comparison between renderings computed with our proposed sphere-tracing approach and with a brute-force path tracer. In-scattered direct illumination is not yet included at this stage, so only
LengthGen and PathGen are employed for rendering. To illustrate the capabilities and limitations of our method, we compare the same geometric model in two density configurations (high density and low density) and under different lighting conditions (light coming from a single light source in front of and behind the object). Since both approaches perform ray tracing and thus compute the same quantities, a fair comparison is achieved by fixing the number of rays traced per pixel (5000 in the current setting) and comparing the results and rendering times.

Figure 6 reveals hardly any perceptual differences between our approach and brute-force path tracing. Quantitatively, our renderings achieve a small root mean square error with respect to the path-traced ground truth and thus come very close to it. Especially the low-density rendering under backside lighting demonstrates that our approach is well able to generate translucency effects in thin material parts, which closely resemble the effects observed in the brute-force rendering.

In the low-density setting in Figure 6, our approach achieves a speed-up of roughly 1.6x (73 s for sphere tracing vs. 114 s for path tracing) compared to brute-force path tracing. The speed-up increases with material density, so that in the high-density case our approach is already 3.1x faster than path tracing (123 s for sphere tracing vs. 381 s for path tracing). This trend continues when the optical density of the material is further increased. As seen in the teaser (Figure 1), even larger speed-ups can be achieved for sufficiently dense materials.

Given the increase in relative performance gain of our method with respect to standard path tracing, we further shed light on the different factors causing this behaviour.
Therefore, we rendered different geometric objects with varying parameter settings and examined the number of scattering events per path as well as the rendering performance, depending on object geometry and optical material density. Figure 7 visually encodes the number of scattering events occurring per path for different geometries. The corresponding rendering times of our approach and of brute-force path tracing are shown in Figure 8.

The Stanford bunny in the top row of Figure 7 is characterized by a compact shape with many convex parts. As a result, light can penetrate deeply into the body, yielding long light paths with a large number of scattering events. As is to be expected, the number of scattering events per path increases strongly with increasing optical density in brute-force path tracing (left). Our method, in contrast, can take advantage of the voluminous convex shape to construct large spheres, which efficiently propagate the rays through the medium. Even though the number of scattering events also increases with optical density in our approach, the rate is much lower. While in the case of the highest density the brute-force path tracer has to simulate a very large number of scattering events per path, our method propagates the rays in fewer than 50 steps. The need for sequential path evaluations, which cannot be parallelized on the GPU, is thus greatly reduced, which drastically reduces the amount of time spent waiting for long sequences of scattering events to converge. The rendering times in Figure 8 confirm these findings. Notably, compared to our approach, the rendering time increases at a much higher rate when using brute-force path tracing, yielding a substantial speed-up at the highest optical density.
The difference between the reduction in scattering events and the final speed-up can be attributed to the increasing cost of evaluating the neural network models, compared to the few operations required in standard path tracing.

When rendering the statue model in the bottom row of Figure 7, however, some limitations of our approach become apparent. Since the statue possesses many thin features, which do not form a voluminous material body, the sphere-tracing approach is unable to build large spheres and propagate rays efficiently. Especially in the case of low densities, our approach requires only slightly fewer model evaluations than the brute-force method. In combination with the comparatively costly model evaluations in our approach, the performance gain becomes very low (see Figure 8). Nevertheless, for larger material densities the number of scattering events per path increases much more strongly in the brute-force method

(Rendering time labels from Figure 6: PT 381 s and ST 123 s for the high-density images; PT 114 s and ST 73 s for the low-density images.)
Figure 6: Comparison between path tracing (PT) and our sphere-tracing approach (ST). From left to right: high-density material under front-lighting and back-lighting conditions (per-channel extinction coefficients σ_t in mm⁻¹ and anisotropy parameters g for the (red, green, blue) channels), and low-density material under front-lighting and back-lighting conditions. All images were rendered using 5000 paths per pixel. Rendering times are shown in the images.

Figure 7: Number of scattering events per path for different geometries and densities. Standard path tracing (left) vs. our approach (right). Densities increase from left to right.

than in our approach, so that again a significant performance gain can be achieved at the highest density. From these considerations, we conclude that our method can be employed at high efficiency if standard path tracing leads to exceedingly long scattering paths. Thus, another factor limiting the efficiency of our approach is absorption, which has not been considered strongly in the previous experiments. Figure 9 explores this effect in detail. For this experiment, we rendered a geometric object with constant material parameters and under fixed lighting conditions, while varying the absorption rate of the medium (1 − Λ). From the figure it is obvious that higher absorption leads to shorter ray paths and thus shortens the rendering time. At higher absorption, the advantage of our approach decreases.

To evaluate the accuracy of the plain model predictions, we conduct the following experiment. For a predefined set of material parameters σ_t, g ∈ {−0.7, 0.0, 0.4, 0.9} and fixed Λ, we use standard
Figure 8: Time complexity of our approach (sphere tracing, ST) vs. brute-force path tracing (PT) for different geometries (Stanford bunny and Lucy statue) and optical densities. Time is given in ms.

(a) Geometric model (Buddha), rendered with different absorption albedos, using brute-force path tracing (left half images) and our approach (right half images). The initial absorption albedo (left-most image) is given per channel and is doubled between subsequent images.

(b) Rendering times for the model from Figure 9a as a function of the absorption parameter.
Figure 9: Performance impact of the absorption parameter.

path tracing to generate a distribution of ground-truth rays, starting in the center of a unit sphere in direction ω_in = (0, 0, 1), and evaluate the ray statistics (N, θ, α, β, y, w) for each of the rays. The resulting distribution is then compared to the distribution obtained from sampling the respective CVAE models. For all experiments, we draw a large number of samples per parameter setting (σ_t, g), for both ground-truth and CVAE statistics.

Figure 10: Distribution of the number of scattering events N sampled from LengthGen (orange) vs. ground truth (blue) for different settings of σ_t and g.

The distribution of N samples drawn from the LengthGen model is compared to the respective ground truth in Figure 10. Ground-truth statistics are summarized in a log-scale histogram, shown in blue, and overlaid by an orange curve reflecting the corresponding statistics obtained from sampling LengthGen. It can be seen that for small values of σ_t (top row), a large fraction of rays leaves the sphere already after the first scattering event. This results in a bipartite structure of the histogram, consisting of a pronounced peak of the histogram counts around N = 1 and a smooth tail indicating rays that are scattered more than once. LengthGen learns to compensate for this and is able to reproduce the peak accurately, especially in the case of strongly anisotropic forward scattering, g = 0.9. Only in the case of backward scattering, g = −0.7, is the ground truth underestimated significantly. Nevertheless, the distribution of scatter counts in the case of N > 1 is reproduced to reasonable accuracy in all cases. For larger values, the significance of the peak at N = 1 decreases and the ground-truth statistics appear to converge to a unimodal shape, similar to a log-Gaussian.
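A comparison of sampled and ground-truth scatter-count distributions, as underlying Figure 10, can be sketched as follows. The helper is hypothetical; the paper reports the comparison visually, and the total-variation distance here is only an added scalar summary.

```python
import numpy as np

def compare_length_distributions(gt_samples, model_samples, n_bins=32):
    """Normalized histograms of ground-truth vs. model scatter counts on a
    shared binning, plus a total-variation distance as a scalar summary."""
    lo = min(gt_samples.min(), model_samples.min())
    hi = max(gt_samples.max(), model_samples.max())
    bins = np.linspace(lo, hi + 1, n_bins + 1)
    p, _ = np.histogram(gt_samples, bins=bins, density=True)
    q, _ = np.histogram(model_samples, bins=bins, density=True)
    widths = np.diff(bins)
    tv = 0.5 * np.sum(np.abs(p - q) * widths)  # 0 = identical, 1 = disjoint
    return p, q, tv
```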
For larger values of σ_t, the matching of the distributions is good, and the dependence on g is also captured appropriately. The highest inaccuracy is observed at intermediate scattering coefficients, where the multi-scattering contributions are dominant but single scattering still has a notable contribution. There, it seems difficult for the model to predict the balance between both effects, especially as a function of g. These observations can be used, however, to bias the training statistics of σ_t and g: by increasing the number of samples in these difficult regions, a more accurate fit can be achieved.

For the PathGen model, three parameters need to be validated against ground truth. The sampling statistics of cos(θ) are shown in Figure 11a. Again, the blue histogram refers to ground-truth statistics, whereas the orange line reflects the PathGen predictions. Notably, strongly anisotropic scattering is observed only in cases where σ_t is small or g is close to the extreme values −1 or 1. There is an overall good fit between learned and ground-truth distributions; only for the strongly anisotropic cases, g = 0.9 and g = −0.7, does the model have minor difficulties with correctly handling the importance of single-scattering rays.

To validate the distributions of α and β, we generate a contour map, which shows the distribution function of α and β as a function of cos(θ). Two-dimensional histograms of the distributions of (cos(θ), α) and (cos(θ), β) are obtained and normalized by the number of observations. The resulting bin probability values are finally

(a) Distribution of cos(θ). (b) Distribution of α (vertical axis) as a function of cos(θ) (horizontal axis).
(c) Distribution of β (vertical axis) as a function of cos(θ) (horizontal axis).

Figure 11: Distribution of
PathGen samples for cos(θ), α and β (orange) vs. ground truth (blue) for different settings of σ_t and g.

smoothed with a Gaussian filter to reduce visual clutter. The ground-truth distribution is shown as blue-filled contours; the learned distribution is overlaid as orange isocontours at probability iso-values matching those of the ground truth. The comparison is shown in Figures 11b and 11c. Similar to the distribution of cos(θ), the greatest anisotropy is observed for the highly anisotropic cases, i.e., g = 0.9 or g = −0.7, and small σ_t. Nevertheless, the learned distributions fit the ground truth to reasonable accuracy.

We conclude that the major challenges for all models lie in learning a reasonable PDF at small values of σ_t and in correctly resolving the dependence of the PDF on g. From a practical perspective, learning results can be improved, however, by specializing the model at training time towards those parameter regions which are particularly relevant during the upcoming rendering processes. By over-weighting ray samples with particular (σ_t, g)-settings at training time, these regions of parameter space are assigned a larger weight in the loss function and thus affect the outcome of the training process more strongly. If, for example, only media with largely forward-directed scattering are considered during rendering, there is no need to include negative values of g at training time. For our purposes, however, we did not make use of such optimizations.

To also illustrate the use of the third CVAE model,
EventGen, we compare the result of rendering a geometric object with the full sequence of three models, LengthGen, PathGen and EventGen. For comparison purposes, we render the same image also with a standard path tracer, as well as with a brute-force path tracer with next-event estimation using direct illumination. To obtain a fair comparison, the image was first rendered with the standard path tracer, using 1000 ray samples per pixel, and the rendering time was measured to be 129 s. The remaining renderers were then allowed to render the same geometry for the same amount of time. In the current implementation, both the extended brute-force path tracer and the sphere-tracing model draw a shadow ray to a single light source in the scene to account for direct illumination. As such, they do not handle refractive object boundaries accurately and yield biased results in comparison with the standard path tracer. Nevertheless, Figure 12 illustrates nicely that our method yields a lower variance than both other algorithms. This is a promising result and suggests that CVAEs can be employed successfully for learning summary statistics of the bypassed path sections. Also, the performance gain achieved is orthogonal to the problem of accurately treating refractive boundaries, since potential fixes enabling an accurate inclusion of boundary effects would affect our model and the reference path tracer in identical ways.

In this work we have introduced and analyzed a network pipeline comprised of CVAEs that learns to bypass multiple-scattering paths in anisotropic volumetric material. We have demonstrated that this pipeline can be embedded into a sphere-tracing approach to speed up the performance of Monte Carlo path tracing. For different parameter settings and geometries, we have demonstrated the numerical accuracy of the inferred results.
We are particularly intrigued by the visual quality of the rendering results compared to ground-truth Monte Carlo path tracing, as well as by the performance gain that can be achieved for objects with large volumes of material. On the other hand, we have seen that for thin objects, where sphere tracing cannot exploit its full potential, the performance gains are only marginal. To minimize the overhead in computation time in low-density or thin object parts, we will consider a case-sensitive implementation of the sphere-tracing algorithm in the future. For shell tracing [33], a fall-back strategy to standard path tracing was already used to avoid sphere tracing in the limit of very small radii. Efficiently parallelizing CVAE evaluation and standard path tracing concurrently on GPU architectures, however, appears challenging. Furthermore, the proposed combination of deep learning and sphere tracing opens up the possibility to efficiently render volumetric objects like clouds. In this context, we envision, in particular, the integration of direct in-scattering to result in significantly improved performance.
REFERENCES
[1] L. Ardizzone, J. Kruse, S. Wirkert, D. Rahner, E. W. Pellegrini, R. S. Klessen, L. Maier-Hein, C. Rother, and U. Köthe. Analyzing inverse problems with invertible neural networks. arXiv preprint arXiv:1808.04730, 2018.
[2] L. Ardizzone, C. Lüth, J. Kruse, C. Rother, and U. Köthe. Guided image generation with conditional invertible neural networks. arXiv preprint arXiv:1907.02392, 2019.
[3] E. Cerezo, F. Pérez, X. Pueyo, F. J. Seron, and F. X. Sillion. A survey on participating media rendering techniques. The Visual Computer, 21(5):303–328, 2005.
[4] S. Chandrasekhar. Radiative Transfer. Clarendon, 1950.
[5] R. R. Coveyou, V. Cain, and K. Yost. Adjoint and importance in Monte Carlo application. Nuclear Science and Engineering, 27(2):219–234, 1967.
[6] L. Deakin and M. Knackstedt. Accelerated volume rendering with Chebyshev distance maps. In SIGGRAPH Asia 2019 Technical Briefs, pages 25–28, 2019.
Figure 12: Convergence improvement using next-event estimation. All images are rendered in s. Left: Brute-force path tracing(1000 rays per pixel), excluding next event estimation. Middle: Brute-force path tracing (400 rays per pixel), including next-eventestimation using in-scattered direct illumination at every internal sample. Right: Our approach (1465 rays per pixel), including next-event estimation using one representative sample ( π , π ) per sphere tracing step. [7] E. dβEon and G. Irving. A quantized-diffusion model for rendering translucentmaterials. ACM transactions on graphics (TOG) , 30(4):1β14, 2011.[8] R. Eckhardt, S. Ulam, and J. Von Neumann. the monte carlo method.
Los AlamosScience , (15):131, 1987.[9] T. J. Farrell, M. S. Patterson, and B. Wilson. A diffusion theory model of spatiallyresolved, steady-state diffuse reflectance for the noninvasive determination oftissue optical properties in vivo.
Medical physics , 19(4):879β888, 1992.[10] J. R. Frisvad, T. Hachisuka, and T. K. Kjeldsen. Directional dipole model forsubsurface scattering.
ACM Transactions on Graphics (TOG) , 34(1):1β12, 2014.[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,A. Courville, and Y. Bengio. Generative adversarial nets. In
Advances in neuralinformation processing systems , pages 2672β2680, 2014.[12] R. Habel, P. H. Christensen, and W. Jarosz. Photon beam diffusion: a hybrid montecarlo method for subsurface scattering. In
Computer Graphics Forum , volume 32,pages 27β37. Wiley Online Library, 2013.[13] J. Hanika, M. Droske, and L. Fascione. Manifold next event estimation. In
Computer graphics forum , volume 34, pages 87β97. Wiley Online Library, 2015.[14] P. Hanrahan and W. Krueger. Reflection from layered surfaces due to subsurfacescattering. In
Proceedings of the 20th Annual Conference on Computer Graphicsand Interactive Techniques , SIGGRAPH β93, page 165β174, New York, NY, USA,1993. Association for Computing Machinery.[15] J. C. Hart. Sphere tracing: A geometric method for the antialiased ray tracing ofimplicit surfaces.
The Visual Computer , 12(10):527β545, 1996.[16] L. G. Henyey and J. L. Greenstein. Diffuse radiation in the galaxy.
The Astrophys-ical Journal , 93:70β83, 1941.[17] S. Herholz, O. Elek, J. Vorba, H. Lensch, and J. KΛrivΓ‘nek. Product importance sam-pling for light transport path guiding. In
Computer Graphics Forum , volume 35,pages 67β77. Wiley Online Library, 2016.[18] N. Holzschuch. Accurate computation of single scattering in participating mediawith refractive boundaries. In
Computer graphics forum , volume 34, pages 48β59.Wiley Online Library, 2015.[19] H. W. Jensen, S. R. Marschner, M. Levoy, and P. Hanrahan. A practical modelfor subsurface light transport. In
Proceedings of the 28th annual conference onComputer graphics and interactive techniques , pages 511β518, 2001.[20] J. T. Kajiya and B. P. Von Herzen. Ray tracing volume densities.
ACM SIGGRAPHcomputer graphics , 18(3):165β174, 1984.[21] S. Kallweit, T. MΓΌller, B. Mcwilliams, M. Gross, and J. NovΓ‘k. Deep scattering:Rendering atmospheric clouds with radiance-predicting neural networks.
ACMTransactions on Graphics (TOG) , 36(6):1β11, 2017.[22] D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprintarXiv:1312.6114 , 2013.[23] D. Koerner, J. NovΓ‘k, P. Kutz, R. Habel, and W. Jarosz. Subdivision next-eventestimation for path-traced subsurface scattering. In
EGSR (EI&I) , pages 91β96,2016.[24] S. Kullback and R. A. Leibler. On information and sufficiency.
The annals ofmathematical statistics , 22(1):79β86, 1951.[25] P. Kutz, R. Habel, Y. K. Li, and J. NovΓ‘k. Spectral and decomposition trackingfor rendering heterogeneous volumes.
ACM Trans. Graph., 36(4), July 2017.
[26] A. Kuznetsov, N. K. Kalantari, and R. Ramamoorthi. Deep adaptive sampling for low sample count rendering. In Computer Graphics Forum, volume 37, pages 35–44. Wiley Online Library, 2018.
[27] A. E. Kyprianou, A. Osojnik, and T. Shardlow. Unbiased "walk-on-spheres" Monte Carlo methods for the fractional Laplacian. IMA Journal of Numerical Analysis, 38(3):1550–1578, 2018.
[28] E. P. Lafortune and Y. D. Willems. Rendering participating media with bidirectional path tracing. In X. Pueyo and P. Schröder, editors, Rendering Techniques '96, pages 91–100, Vienna, 1996. Springer Vienna.
[29] J. H. Lambert. Photometria sive de mensura et gradibus luminis, colorum et umbrae. Klett, 1760.
[30] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[31] M. Mara, M. McGuire, B. Bitterli, and W. Jarosz. An efficient denoising algorithm for global illumination. In Proceedings of High Performance Graphics, New York, NY, USA, July 2017. ACM.
[32] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[33] J. T. Moon, B. Walter, and S. R. Marschner. Rendering discrete random media using precomputed scattering solutions. In Proceedings of the 18th Eurographics Conference on Rendering Techniques, pages 231–242, 2007.
[34] M. E. Muller et al. Some continuous Monte Carlo methods for the Dirichlet problem. The Annals of Mathematical Statistics, 27(3):569–589, 1956.
[35] O. Nalbach, E. Arabadzhiyska, D. Mehta, H.-P. Seidel, and T. Ritschel. Deep shading: Convolutional neural networks for screen space shading. In Computer Graphics Forum, volume 36, pages 65–78. Wiley Online Library, 2017.
[36] J. Novák, I. Georgiev, J. Hanika, and W. Jarosz. Monte Carlo methods for volumetric light transport simulation. In Computer Graphics Forum, volume 37, pages 551–576. Wiley Online Library, 2018.
[37] J. Novák, D. Nowrouzezahrai, C. Dachsbacher, and W. Jarosz. Virtual ray lights for rendering scenes with participating media. ACM Transactions on Graphics (TOG), 31(4):1–11, 2012.
[38] J. Novák, A. Selle, and W. Jarosz. Residual ratio tracking for estimating attenuation in participating media. ACM Trans. Graph., 33(6), Nov. 2014.
[39] NVIDIA. NVIDIA RTX ray tracing developer documentation. developer.nvidia.com/rtx/raytracing, 2018. [Online; accessed 21-March-2019].
[40] V. Pegoraro and S. G. Parker. An analytical solution to single scattering in homogeneous participating media. In Computer Graphics Forum, volume 28, pages 329–335. Wiley Online Library, 2009.
[41] M. Pharr, W. Jakob, and G. Humphreys. Physically Based Rendering: From Theory to Implementation. Morgan Kaufmann, 2016.
[42] C. R. Alla Chaitanya, A. S. Kaplanyan, C. Schied, M. Salvi, A. Lefohn, D. Nowrouzezahrai, and T. Aila. Interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder. ACM Transactions on Graphics, 36:1–12, July 2017.
[43] J. Rehak, L. Kerby, M. DeHart, and R. Slaybaugh. Weighted delta-tracking in scattering media.
Nuclear Engineering and Design, 342:231–239, 2019.
[44] H. E. Rushmeier. Realistic image synthesis for scenes with radiatively participating media.
PhD thesis, 1989.
[45] K. Sohn, H. Lee, and X. Yan. Learning structured output representation using deep conditional generative models. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3483–3491. Curran Associates, Inc., 2015.
[46] B. Sun, R. Ramamoorthi, S. G. Narasimhan, and S. K. Nayar. A practical analytic single scattering model for real time rendering. ACM Transactions on Graphics (TOG), 24(3):1040–1049, 2005.
[47] D. Vicini, V. Koltun, and W. Jakob. A learned shape-adaptive subsurface scattering model. ACM Transactions on Graphics (TOG), 38(4):1–15, 2019.
[48] J. Vorba, O. Karlík, M. Šik, T. Ritschel, and J. Křivánek. On-line learning of parametric mixture models for light transport simulation. ACM Transactions on Graphics (TOG), 33(4):1–11, 2014.
[49] B. Walter, S. Zhao, N. Holzschuch, and K. Bala. Single scattering in refractive media with triangle mesh boundaries. In ACM SIGGRAPH 2009 Papers, pages 1–8. 2009.
[50] S. Weiss, M. Işık, J. Thies, and R. Westermann. Learning adaptive sampling and reconstruction for volume visualization. arXiv preprint arXiv:2007.10093, 2020.
[51] E. Woodcock, T. Murphy, P. Hemmings, and S. Longworth. Techniques used in the GEM code for Monte Carlo neutronics calculations in reactors and other systems of complex geometry. In Proc. Conf. Applications of Computing Methods to Reactor Problems, volume 557, 1965.
[52] L.-Q. Yan, Y. Zhou, K. Xu, and R. Wang. Accurate translucent material rendering under spherical Gaussian lights. In Computer Graphics Forum, volume 31, pages 2267–2276. Wiley Online Library, 2012.
[53] Y. Yue, K. Iwasaki, B.-Y. Chen, Y. Dobashi, and T. Nishita. Unbiased, adaptive stochastic sampling for rendering inhomogeneous participating media. ACM Transactions on Graphics (TOG), 29(6):1–8, 2010.
[54] H. Zheng, Z. Yang, W. Liu, J. Liang, and Y. Li. Improving deep neural networks using softplus units. In , pages 1–4. IEEE, 2015.
[55] Q. Zheng and M. Zwicker. Learning to importance sample in primary sample space.