Photon-Driven Neural Path Guiding

SHILIN ZHU, University of California San Diego, USA
ZEXIANG XU, Adobe Research, USA
TIANCHENG SUN, University of California San Diego, USA
ALEXANDR KUZNETSOV, University of California San Diego, USA
MARK MEYER, Pixar Animation Studios, USA
HENRIK WANN JENSEN, University of California San Diego and Luxion, USA
HAO SU, University of California San Diego, USA
RAVI RAMAMOORTHI, University of California San Diego, USA
Fig. 1. (Scene: Elegant Hotel Room; insets compare Path tracer, Vorba et al., Müller et al., Bako et al., Rath et al., Ours, and the Reference.) We present a novel photon-driven neural path guiding approach that can effectively reduce the variance of path tracing results. This complex scene is lit by several decorative ceiling lights, which are extremely difficult to discover in path tracing. We compare the equal-time (∼20 minutes) rendering results with standard path tracing and state-of-the-art path guiding methods (including Müller et al. [2017], Bako et al. [2019], and Rath et al. [2020]), showing crops of the rendered results with corresponding relative MSEs (rMSEs). Recently, Bako et al. [2019] used an offline trained neural network for path guiding; however, it only supports guiding the first bounce, which is not effective since this scene is dominated by indirect lighting. On the other hand, while traditional methods allow for multi-bounce path guiding, they are purely online learning methods, and it is highly expensive for them to learn the complex sampling functions for this challenging scene. Our method utilizes an offline trained deep neural network and enables neural path guiding at any path bounce. Ours achieves the best rendering results qualitatively and quantitatively.
Although Monte Carlo path tracing is a simple and effective algorithm to synthesize photo-realistic images, it is often very slow to converge to noise-free results when complex global illumination is involved. One of the most successful variance-reduction techniques is path guiding, which can learn better distributions for importance sampling to reduce pixel noise. However, previous methods require a large number of path samples to achieve reliable path guiding. We present a novel neural path guiding approach that can reconstruct high-quality sampling distributions for path guiding from a sparse set of samples, using an offline trained neural network. We leverage photons traced from light sources as the input for sampling density reconstruction, which is highly effective for challenging scenes with strong global illumination. To fully make use of our deep neural network, we partition the scene space into an adaptive hierarchical grid, in which we apply our network to reconstruct high-quality sampling distributions for any local region in the scene. This allows for highly efficient path guiding for any path bounce at any location in path tracing. We demonstrate that our photon-driven neural path guiding method generalizes well on diverse challenging testing scenes that are not seen in training. Our approach achieves significantly better rendering results on testing scenes than previous state-of-the-art path guiding methods.

Authors' addresses: Shilin Zhu, University of California San Diego, USA, [email protected]; Zexiang Xu, Adobe Research, USA, [email protected]; Tiancheng Sun, University of California San Diego, USA, [email protected]; Alexandr Kuznetsov, University of California San Diego, USA, [email protected]; Mark Meyer, Pixar Animation Studios, USA, [email protected]; Henrik Wann Jensen, University of California San Diego and Luxion, USA, [email protected]; Hao Su, University of California San Diego, USA, [email protected]; Ravi Ramamoorthi, University of California San Diego, USA, [email protected].
1 INTRODUCTION

Monte Carlo path tracing has been widely used in photo-realistic image synthesis. However, while simple and flexible, path tracing can take a significant amount of time to generate noise-free images for complex scenes (e.g., Fig. 1). One critical challenge for Monte Carlo based methods is to effectively construct light transport paths connecting the light and the camera. Many path guiding methods [Müller et al. 2017; Jensen 1995] have been presented to construct advanced distributions (usually approximating incident light fields or some variants of them) for importance sampling at local shading points, guiding the local path sampling toward high-energy path construction. The most recent successful ones are unidirectional guiding methods [Müller et al. 2017; Rath et al. 2020]; they rely on early path samples to discover high-energy sampling directions. However, this unidirectional path discovery process can be slow for a challenging scene that is dominated by indirect illumination. While using light paths is known to be efficient in exploring the path space, previous photon-driven or bidirectional path guiding methods [Jensen 1995; Vorba et al. 2014] are not yet efficient, requiring sampling a large number of light paths.
We present a novel path guiding approach that can achieve highly efficient path sampling using only a sparse set of light paths as input, thus significantly advancing the overall rendering speed. Inspired by the original path guiding work [Jensen 1995], we leverage photons to compute local sampling distributions for importance sampling in path tracing. As is done by Jensen [1995], a sampling distribution at any local 3D region can be easily obtained by binning local photons according to their directions (i.e., a 2D histogram map). However, such distributions are only reliable with locally dense enough photons; with sparse photons, they are usually low-quality and appear highly noisy (see Figs. 2 and 3).

We propose to use a deep neural network to reconstruct high-quality sampling maps for path guiding from the low-quality, noisy sampling maps acquired by binning sparse photons (see Fig. 2). Our approach is the first deep learning based photon-driven path guiding approach. In essence, we break down the complex path guiding problem, focusing on reconstructing local sampling distributions represented as 2D maps (i.e., images), and thus pose it as an image-to-image reconstruction problem that can now be addressed by deep learning techniques. Our sampling map reconstruction network is effectively trained offline in a scene-independent way. The trained network is able to recover the accurate shapes of a diverse set of complex sampling distributions on challenging novel scenes, which enables highly efficient guided path tracing with complex global illumination effects.

Our network is designed to reconstruct high-quality sampling maps at local spatial regions. To make these sampling maps well distributed and locally representative in the scene space, we adaptively partition the entire scene space into a hierarchical grid, according to the complexity of local geometry and incident light. The sampling map of every leaf voxel in the grid is reconstructed by our network, enabling path guiding at any location in a scene. Note that our approach allows for efficient guided path sampling at any bounce point; this is the first offline-learning neural path guiding approach that can guide arbitrary bounces. We demonstrate that our novel deep path guiding achieves significantly better rendering quality on various challenging scenes than previous state-of-the-art path guiding methods given equal rendering time (see Fig. 1).

In summary, our main contributions are:
• We present the first deep learning based photon-driven path guiding approach;
• To our knowledge, this is the first offline-learning neural path guiding approach that can guide arbitrary bounces;
• Our proposed framework generalizes well to unseen new scenes and produces significantly better rendering results.
2 RELATED WORK

Monte Carlo rendering.
One central problem of computer graphics is to efficiently evaluate the rendering equation [Kajiya 1986], which describes how light is transported globally inside a scene. Monte Carlo methods are among the most effective methods to compute this light transport; they require effectively sampling high-energy paths that connect the camera and the light for efficient rendering. Since Monte Carlo path tracing was introduced in the seminal work of Kajiya [1986], numerous papers have developed more efficient methods to explore path space, including bidirectional path tracing [Lafortune and Willems 1993; Veach and Guibas 1995a] and Metropolis light transport [Veach and Guibas 1997; Pauly et al. 2000]. These methods typically leverage importance sampling to sample sub-path directions at any bounce of each traced path traversing the scene. Since the incident illumination is unknown, the importance sampling usually only considers the reflectance term (with a cosine term) in the rendering equation (please refer to Sec. 3 for more details); this, however, is not efficient for challenging scenes with complex indirect lighting. Path guiding [Jensen 1995; Vorba et al. 2019] can instead provide more efficient importance sampling; our novel photon-driven path guiding approach can reconstruct high-quality sampling distributions that well approximate the complex incident light fields, thus leading to highly efficient rendering.
Photon-based rendering.
Particle density estimation has also been applied in computer graphics to evaluate the rendering equation, introducing photon mapping and many other particle- or photon-based rendering methods [Shirley et al. 1995; Jensen 1996; Hachisuka et al. 2008; Knaus and Zwicker 2011]. These methods focus on photon density estimation at any given shading point, which avoids the high-frequency noise of MC rendering and is very effective for computing complex global illumination. Photon density estimation can only provide biased radiance or irradiance estimates, since it blurs the photon contributions within a certain kernel bandwidth (though this bias can be consistently reduced to zero by progressively shrinking the bandwidth and tracing infinitely many photons [Hachisuka et al. 2008; Hachisuka and Jensen 2009; Knaus and Zwicker 2011]). Our goal is not to compute the photon density at a single point but to approximate the incident light field of a local area (a voxel) as a sampling distribution. Therefore, we consider the integral of irradiance over an area (i.e., the incident flux), which can be effectively evaluated using photons in an unbiased way. Recently, Zhu et al. [2020] introduced a deep learning based method for photon density estimation in photon mapping. They leverage a PointNet [Qi et al. 2017] style neural network to process individual photons. However, the complexity of running such a network grows linearly with the number of photons. We instead leverage a U-Net [Ronneberger et al. 2015] style network and consider a raw photon histogram map, composed by binning photons [Jensen 1995], as input; therefore, the complexity of our network is independent of the photon count, and it runs in constant time. We show that our method consistently reconstructs better sampling distributions with more photons.
Path guiding.
In general, path guiding aims to estimate the incoming light fields and draw samples accordingly to accelerate the convergence of Monte Carlo rendering. The first path guiding technique is based on photons [Jensen 1995]; it traces light paths from the light sources, distributes photons in the scene, and constructs local photon histograms as sampling distributions for importance sampling in path tracing. Though very efficient to compute, such histogram-based sampling maps are only of high quality when dense enough photons are accumulated. We extend this simple classical histogram-based technique to a novel learning-based method in a new path guiding framework; our method regresses high-quality sampling maps from sparse photons, avoiding the expensive tracing of a large number of photons.

Vorba et al. [2014] present a bidirectional guiding method, where both camera paths and light paths are guided using online fitted Gaussian-mixture (GM) distributions at spatial cache points. This technique was further extended to product sampling [Herholz et al. 2016] and to account for parallax [Ruppert et al. 2020]. However, the online fitting process in these methods is usually slow, and the GM model also makes it difficult to express high-frequency sampling distributions. Our approach leverages histograms as input (which can be computed at very low cost online) and an offline trained compact neural network that can rapidly reconstruct high-quality sampling maps with high-frequency details from that input.

Recently, unidirectional guiding methods have become more effective and practical, thanks to the efficient adaptive guiding framework introduced by Müller et al. [2017]. Many works extend this framework to achieve sampling in primary space [Guo et al. 2018], product sampling [Diolatzis et al. 2020], and variance-aware sampling [Rath et al. 2020]. These methods iteratively trace camera paths to adaptively reconstruct the incident light fields; this relies on early-iteration paths to discover the light sources, in order to reconstruct reliable sampling distributions to guide the paths of the following iterations. However, the light discovery can be slow and unsuccessful for a scene with dominant indirect lighting, and errors in the early-iteration sampling distributions can bias the path sampling in later iterations and never get fixed. In contrast, we leverage photons, which are efficient in exploring indirect light transport; our learning based approach can also recover high-quality sampling distributions from sparse photons at an early stage, effectively avoiding a slow start in the guiding and rendering. Moreover, our photons are traced independently in each iteration, which avoids accumulating sampling errors through multiple iterations.
Neural path guiding.
Recently, deep learning techniques have been applied to path guiding. Müller et al. [2019] train an online neural network to perform importance sampling in global path space. This method can reproduce accurate ground-truth sampling functions, but the online training process is extremely slow. Some recent works leverage offline trained networks [Bako et al. 2019; Huo et al. 2020]; however, they only guide the path sampling at the first bounce. While we also leverage an offline trained neural network, our method instead leverages photons and supports guiding at any bounce, enabling significantly better rendering results than the first-bounce guiding approach [Bako et al. 2019] (see Fig. 1).
3 BACKGROUND

Physically-based rendering can be expressed by the Rendering Equation [Kajiya 1986], which describes the radiance leaving an intersection point $\mathbf{x}$ in direction $\omega_o$:

$$L(\mathbf{x}, \omega_o) = L_e(\mathbf{x}, \omega_o) + \int_{\Omega} L_i(\mathbf{x}, \omega_i)\, f_r(\mathbf{x}, \omega_i, \omega_o) \cos\theta_i \, d\omega_i, \qquad (1)$$

where $L_e(\mathbf{x}, \omega_o)$ denotes the emitted radiance, $L_i(\mathbf{x}, \omega_i)$ is the incident radiance from direction $\omega_i$, $f_r$ is the bidirectional scattering distribution function (BSDF), and $\Omega$ is the visible hemisphere. The key component in the equation is the integral that computes the reflected radiance $L_r(\mathbf{x}, \omega_o) = \int_{\Omega} L_i(\mathbf{x}, \omega_i)\, f_r(\mathbf{x}, \omega_i, \omega_o) \cos\theta_i \, d\omega_i$ over all directions in the hemisphere.

The integral can be numerically evaluated using Monte Carlo estimation [Veach 1997]:

$$L_r(\mathbf{x}, \omega_o) = \frac{1}{N} \sum_{i=1}^{N} \frac{L_i(\mathbf{x}, \omega_i)\, f_r(\mathbf{x}, \omega_i, \omega_o) \cos\theta_i}{p(\omega_i)}, \qquad (2)$$

where the $N$ Monte Carlo path samples in various directions $\omega_i$ are drawn from the probability density function (PDF) $p(\omega_i)$. Considering global illumination with multiple bounces, $L_i(\mathbf{x}, \omega_i)$ is in fact computed by recursively evaluating integrals using Eqn. 1. Therefore, in Monte Carlo path tracing, rays are sampled from each intersection point to compute the radiance that contributes to the pixel color over multiple bounces.

The variance of the Monte Carlo estimate $L_r(\mathbf{x}, \omega_o)$ can be reduced by sampling $\omega_i$ from a density function $p(\omega_i)$ that resembles the numerator $L_i(\mathbf{x}, \omega_i)\, f_r(\mathbf{x}, \omega_i, \omega_o) \cos\theta_i$. Ideally, if $p(\omega_i)$ and the numerator only differ by a constant scale, the variance is reduced to zero. However, this numerator is unknown and is as difficult to compute as the integral itself, due to complex visibility and indirect lighting in $L_i$; therefore, in practice, path tracing often proceeds with BSDF importance sampling.

Path guiding aims to reconstruct a density function that matches the shape of the numerator as closely as possible. In particular, since standard BSDF importance sampling satisfies $p_{\text{BSDF}}(\omega_i) \propto f_r(\mathbf{x}, \omega_i, \omega_o)$, recent path guiding methods often set the target probability density to be proportional to the incident light [Vorba et al. 2014; Müller et al. 2017]:

$$p_{\text{guide}}(\omega_i) \propto L_i(\mathbf{x}, \omega_i) \cos\theta_i. \qquad (3)$$

The final sampling strategy is achieved by combining the guiding and BSDF sampling using one-sample Multiple Importance Sampling (MIS) [Veach and Guibas 1995b]:

$$p(\omega_i) = \alpha\, p_{\text{BSDF}}(\omega_i) + (1 - \alpha)\, p_{\text{guide}}(\omega_i), \qquad (4)$$

where $\alpha$ is the mixture coefficient that determines the probability of choosing BSDF sampling or guided sampling.

Many recent works rely on early path samples in the path tracing to approximate the incident light field (Eqn. 3), which is not sufficient for challenging scenes with strong indirect illumination, as shown in Fig. 1. We instead leverage photons traced from the light sources to compute the sampling density functions, which effectively explores the challenging light transport. Our novel approach advances traditional path guiding with powerful deep learning techniques and an efficient spatial structure, thus enabling highly efficient path guiding from sparse photons.
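To make Eqns. 2 and 4 concrete, the following is a minimal sketch of one-sample MIS direction sampling; the callables `sample_bsdf`, `sample_guide`, `pdf_bsdf`, and `pdf_guide` are hypothetical hooks a renderer would supply, not part of the paper's implementation.

```python
import random

def sample_one_sample_mis(alpha, sample_bsdf, sample_guide, pdf_bsdf, pdf_guide):
    """One-sample MIS (Eqn. 4): pick BSDF sampling with probability alpha,
    guided sampling otherwise, then evaluate the full mixture PDF so the
    Monte Carlo estimator (Eqn. 2) remains unbiased."""
    if random.random() < alpha:
        w = sample_bsdf()       # direction from BSDF importance sampling
    else:
        w = sample_guide()      # direction from the guiding distribution
    # The returned PDF is the mixture, independent of which branch fired.
    p = alpha * pdf_bsdf(w) + (1.0 - alpha) * pdf_guide(w)
    return w, p
```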
4 OVERVIEW

Our path guiding approach uses a deep neural network to regress high-quality sampling maps that can be used to guide path sampling. Correspondingly, we introduce a novel practical path guiding framework that utilizes our neural network to reconstruct sampling maps in an adaptive spatial hierarchical grid, enabling effective path guiding at multiple bounces. In the following sections, we first introduce our sampling map parameterization, target sampling density, and how to use photons to compute the sampling maps in Sec. 5. We then introduce our deep neural network that can regress high-quality sampling maps given noisy low-quality sampling maps in Sec. 6. We present our full neural path guiding framework in Sec. 7, which describes our iterative guiding and rendering process, our adaptive spatial structure, and how paths, photons, and the neural network are incorporated into the system. Implementation details are discussed in Sec. 8. We present an extensive evaluation of our method in Sec. 9. Finally, we conclude the paper and discuss future work in Sec. 10.

Fig. 2. The neural network architecture for sampling map reconstruction. We use a compact autoencoder with light-weight masked convolutions [Liu et al. 2018; Yi et al. 2020] and ELU [Clevert et al. 2015] activation functions, which can extract high-level features from the input energy map and output a smooth and dense sampling map. The bottleneck layers use dilated convolutions [Iizuka et al. 2017] to further expand the size of the receptive fields.
5 SAMPLING MAPS FROM PHOTONS

Previous methods [Jensen 1995; Vorba et al. 2014] usually compute hemispherical distributions at sampled surface points to approximate incident light fields. However, such hemispherical functions only approximate light fields at very local, flat 2D surface regions, and are hard to interpolate on surfaces with complex normal variation. Inspired by the recent unidirectional path guiding methods [Müller et al. 2017; Rath et al. 2020; Bako et al. 2019], we utilize a full spherical sampling distribution (instead of a hemispherical one) that models the incident light distribution in a local 3D region. In particular, we build a hierarchical grid (see Sec. 7.1) in the scene space and compute a spherical sampling distribution for each local 3D voxel of the grid. In this section, we discuss the representation of our sampling function and its computation from photons.
Spherical function representation.
We use a regular directional grid that represents the sampling density function as a 2D sampling map (similar to [Bako et al. 2019]). We leverage the cylindrical mapping to parameterize the spherical domain for better area preservation (similar to [Müller et al. 2017]). In particular, a unit vector $r = (x, y, z)$ (corresponding to a point on a unit sphere) is mapped to a 2D location $(u, v) = (z, \phi)$ on the sampling map, where $\phi = \arctan(y/x)$. Our sampling map is like a standard environment map or radiance map in lighting representation, but ours is monochromatic and uses cylindrical mapping.
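A minimal sketch of this parameterization, assuming unit directions and a map laid out with $z$ along rows and $\phi$ along columns (the exact layout and normalization of the paper's implementation may differ):

```python
import numpy as np

def direction_to_bin(d, height=64, width=128):
    """Cylindrical (area-preserving) mapping: (u, v) = (z, phi) with
    phi = atan2(y, x). Returns the sampling-map bin containing the
    unit direction d."""
    x, y, z = d
    u = (z + 1.0) * 0.5                              # z in [-1, 1] -> [0, 1]
    v = (np.arctan2(y, x) + np.pi) / (2.0 * np.pi)   # phi -> [0, 1]
    row = min(int(u * height), height - 1)
    col = min(int(v * width), width - 1)
    return row, col
```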
Target sampling density. As discussed in Sec. 3 (Eqn. 3), the general goal of path guiding is to compute the sampling density at any position, making it proportional to the incident light $L_i(\mathbf{x}, \omega_i) \cos\theta_i$. For our discrete case, where we consider a 3D voxel region and a certain pixel range (representing a solid angle bin) of a sampling map, it is in fact the expected incident light that is of interest. In particular, given a voxel $j$ and the solid angle footprint of pixel $k$ in the sampling map, the expected $L_i(\mathbf{x}, \omega_i) \cos\theta_i$ coming from the solid angle over the 2D surface area (of the scene geometry located in the voxel) is expressed by:

$$\mathbb{E}\big(L_i(\mathbf{x}, \omega_i) \cos\theta_i\big) = \frac{\int_{\Delta A_j} \int_{\Delta\Omega_k} L_i(\mathbf{x}, \omega_i) \cos\theta_i \, d\omega_i \, d\mathbf{x}}{\Delta\Omega_k \, \Delta A_j} \qquad (5)$$

$$= \frac{\Phi_{j,k}}{\Delta\Omega_k \, \Delta A_j}, \qquad (6)$$

where $\Delta A_j$ represents the entire surface area of the scene geometry covered by voxel $j$, $\Delta\Omega_k$ represents the solid angle footprint covered by pixel $k$ in the sampling map, and $\Phi_{j,k}$ represents the total incident energy in the spatial and directional range. Therefore, it is the total energy (radiant flux)

$$\Phi_{j,k} = \int_{\Delta A_j} \int_{\Delta\Omega_k} L_i(\mathbf{x}, \omega_i) \cos\theta_i \, d\omega_i \, d\mathbf{x}, \qquad (7)$$

that governs our sampling map distribution. Essentially, $\Phi_{j,k}$ models the integrated incident radiance and is proportional to the sampling probability of a pixel $k$ in the sampling map of a voxel $j$. Note that the irradiance at a surface point $\mathbf{x}$, $E(\mathbf{x}, \Delta\Omega_k) = \int_{\Delta\Omega_k} L_i(\mathbf{x}, \omega_i) \cos\theta_i \, d\omega_i$, is a standard radiometry term widely discussed in previous works [Jensen 1995; Rath et al. 2020]; when divided by the total area, $\Phi_{j,k}$ also describes the expected irradiance ($\Phi_{j,k} / \Delta A_j$) in the voxel. Therefore, we seek sampling densities that are proportional to the expected incident light:

$$p_{\text{guide}}(\omega_i) \propto \Phi_{j,k_i} / \Delta\Omega_{k_i}, \qquad (8)$$

where $k_i$ is the pixel covering direction $\omega_i$ in the sampling map; we ignore $\Delta A_j$ in Eqn. 6, since it is a constant value for all solid angles in a voxel. This sampling density corresponds to a sampling map whose pixel values are proportional to $\Phi_{j,k_i}$. We thus reconstruct a sampling map by normalizing an energy map that records the energy $\Phi_{j,k_i}$ in each pixel.
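Because the cylindrical mapping is area-preserving, every bin spans the same solid angle, so the per-bin $\Delta\Omega_{k_i}$ cancels under normalization and Eqn. 8 reduces to normalizing the energy map. A sketch (the uniform fallback for empty maps is an assumption, not the paper's stated behavior):

```python
import numpy as np

def energy_map_to_pdf(energy_map):
    """Turn a per-bin energy map Phi_{j,k} into a discrete sampling
    distribution (Eqn. 8). With equal-area bins, the per-bin solid
    angle is a constant that drops out when normalizing."""
    total = energy_map.sum()
    if total <= 0.0:
        # No photon energy recorded: assume a uniform distribution.
        return np.full_like(energy_map, 1.0 / energy_map.size)
    return energy_map / total
```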
Computing incident energy. In this work, we leverage particle tracing to effectively evaluate the integral $\Phi_{j,k}$ (Eqn. 7). We trace light paths from the light sources to distribute photons in the scene, where each photon carries a portion of flux; $\Phi_{j,k}$ can then be evaluated by simply binning the photons, similar to [Jensen 1995]. In particular, $\Phi_{j,k}$ is estimated by:

$$\Phi_{j,k} = \sum_{\omega_p \in \Delta\Omega_k,\ \mathbf{x}_p \in \Delta A_j} \Delta\Phi_p, \qquad (9)$$

where $p$ denotes a photon that arrives at the surface point $\mathbf{x}_p$ from direction $\omega_p$, and $\Delta\Phi_p$ is the energy carried by the photon. Equation 9 essentially accumulates all the photon energies inside the voxel and directional bin.

Note that Müller et al. [2017] leverage path tracing to accumulate radiance samples inside a 3D region; this can also be seen as a (Monte Carlo) integral of the radiance over an area and a solid angle, similar to the energy integral of Eqn. 7. We leverage photon tracing to evaluate the integral, and our particle-based approach provides an unbiased estimate of the energy $\Phi_{j,k}$ as the photon count goes to infinity.

Since the evaluation is governed by a single summation, we can progressively trace as many photons as needed and accumulate them into an energy map without any memory bottleneck. Once a photon is accumulated in a voxel, the photon data is immediately deleted, except when the grid needs to be refined at the beginning (Sec. 7.1). Note that an accurate energy map requires tracing a large number of photons, but in practice we can only afford tracing a small number of photons at rendering time, which by themselves cannot directly lead to high-quality sampling. We propose to compute high-quality sampling maps offline, using a large number of photons, and take them as ground truth to train a deep neural network that can efficiently regress high-quality sampling maps online.

Fig. 3. Example input and output sampling maps of the pre-trained neural network over iterations (gamma transformed for better visualization). With more iterations of path and photon tracing, both the input raw sampling map and the reconstructed output sampling map improve over time. Numbers are rMSEs computed against the reference sampling maps. (Panels: input/output pairs on the Spaceship scene at iterations 2, 4, 6, and 8, plus the reference.)

6 NEURAL SAMPLING MAP RECONSTRUCTION

While using a large number of photons can result in an accurate estimate of $\Phi_{j,k}$, it requires a significant amount of tracing time. On the other hand, computing a sampling map from sparse photons is fast, but the map is usually low-quality and appears noisy with many empty bins. As a result, neither dense photons (too slow) nor sparse photons (too low-quality) are suitable for efficient path guiding. To overcome this, our central idea is to obtain accurate sampling maps offline as ground truth using dense enough photons, and then leverage supervised learning to regress such maps from the low-quality maps that can be computed efficiently from sparse photons during path guiding. Specifically, we propose to train a deep convolutional neural network that learns to reconstruct a high-quality sampling map from sparse photons.

Our sampling maps are reconstructed iteratively through multiple iterations of our path guiding framework. Specifically, we consider a raw sampling map $S_{e,t}$ (1 channel) as input, acquired by accumulating a sparse set of traced photons from iteration 1 to $t$ using Eqn. 9, where $t$ denotes the iteration number and $e$ denotes accumulated photon energy. To give the neural network a hint about how the raw sampling map evolves over previous iterations with more photons, we also keep the raw sampling map $S_{e,t-1}$ from the previous iteration as an input channel. In addition, we record the number of photons per solid angle bin in $S_{e,t}$ and $S_{e,t-1}$, resulting in two additional maps, $P_{e,t}$ and $P_{e,t-1}$, which we use as auxiliary buffers in the input, providing two additional input channels.
Inspired by image inpainting techniques [Liu et al. 2018; Yu et al. 2019; Yi et al. 2020], we also concatenate a binary mask $B_{e,t}$ (1 channel) indicating whether a solid angle bin contains photon data or not, and use light-weight masked convolutions to process the input maps. As a result, our full input is a 2D image map with 5 channels in total, and our network $\mathcal{F}$ can be expressed by:

$$S_d = \mathcal{F}(S_{e,t},\, S_{e,t-1},\, P_{e,t},\, P_{e,t-1},\, B_{e,t}). \qquad (10)$$

Our network learns to regress a one-channel sampling map $S_d$, supervised by the ground-truth map $\hat{S}_d$ computed from a large number of photons.
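As a sketch of how one voxel's network input could be assembled (Eqns. 9 and 10), reusing the hypothetical `direction_to_bin` helper from above; the photon list and map layouts are assumptions, not the paper's exact data layout:

```python
import numpy as np
import torch

def build_voxel_input(photons, prev_energy, prev_count):
    """Bin this iteration's photons into the voxel's accumulated energy
    and count maps (Eqn. 9), then stack the 5 input channels of Eqn. 10:
    [S_{e,t}, S_{e,t-1}, P_{e,t}, P_{e,t-1}, B_{e,t}]."""
    energy, count = prev_energy.copy(), prev_count.copy()
    h, w = energy.shape
    for direction, flux in photons:            # (unit vector, Delta Phi_p)
        r, c = direction_to_bin(direction, h, w)
        energy[r, c] += flux
        count[r, c] += 1.0
    mask = (count > 0).astype(np.float32)      # binary validity mask B_{e,t}
    stacked = np.stack([energy, prev_energy, count, prev_count, mask])
    return torch.from_numpy(stacked).float().unsqueeze(0)  # 1 x 5 x H x W
```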
Note that our network is essentially designed to solve an image-to-image reconstruction task. Many existing 2D neural networks for image-to-image denoising, translation, and inpainting [Chaitanya et al. 2017; Bako et al. 2017; Vogels et al. 2018; Liu et al. 2018] could thus potentially be applied to address our problem. However, our network is applied to a large number (thousands) of voxels, while our end goal is to speed up the total rendering process. Therefore, we balance inference speed and reconstruction quality in our network design.

We propose to use a compact U-Net [Ronneberger et al. 2015] style neural network with residual links and skip connections to achieve the sampling map reconstruction, as illustrated in Fig. 2. Our network uses multiple downsampling and upsampling convolutional layers to extract meaningful neural features from the input sampling map $S_e$ and regress a high-quality sampling map $S_d$. Our input raw sampling maps are computed from sparse photons and contain many holes or empty bins, as shown in Fig. 3. Therefore, we use light-weight masked convolutions as the convolutional layers in our network, inspired by recent image inpainting works [Liu et al. 2018; Yi et al. 2020]. This ensures that valid (non-empty) and invalid (empty) solid angle bins are treated differently in the network, and only valid bins contribute to a convolutional operation. Note that our network is relatively compact compared to previous U-Net-like networks used in other tasks [Chaitanya et al. 2017; Bako et al. 2017; Vogels et al. 2018]; the maximum number of feature channels in our network is only 64. This compactness allows for fast sampling map reconstruction during path guiding on high-end GPUs, keeping our network from becoming the bottleneck of the entire rendering process. In fact, a large network is not really necessary for our task, since a sampling map has only a single channel (no color variation) and we only need to reconstruct low-resolution maps (just 32×64 or 64×128, much lower than in other reconstruction tasks), which are already adequate for high-quality rendering. While compact, our network can regress high-quality maps that enable efficient path guiding in path tracing with quickly reduced variance. We believe our network size could be further reduced by advanced network compression techniques [Cheng et al. 2018; Deng et al. 2020], enabling even more efficient path guiding; we leave this as future work.

We also find the usage of the photon counts and the previous sampling map as buffers effective; these auxiliary features are simple to obtain but are useful indicators of the quality and evolution of the original per-solid-angle probabilities. These buffers are also compact and improve the reconstruction quality at marginal extra cost. On the other hand, we find that other geometric features, such as position, normal, and depth – which are used in previous screen-space guiding methods [Bako et al. 2019; Huo et al. 2020] – are not very helpful in most of our scenes, since our reconstruction operates on each local 3D voxel. To justify our neural network design, we compare its performance with a standard U-Net [Ronneberger et al. 2015] in the supplementary material.
We utilize an $L_1$ loss to supervise the output sampling map:

$$\mathcal{L}_S = |\hat{S}_d - S_d|, \qquad (11)$$

where $\hat{S}_d$ is the ground-truth sampling map computed by tracing a large number of photons. Inspired by deep supervision in machine learning [Xie and Tu 2015; Lee et al. 2015], we also provide the ground-truth signal to every decoding level in order to ease the loss backpropagation. To avoid potential over-blurring, we leverage an asymmetric function inspired by [Vogels et al. 2018]; this leads to our full loss

$$\mathcal{L}_{\text{rec}} = \mathcal{L}_S \cdot \big(1 + (\lambda - 1) \cdot \mathcal{H}\big), \qquad (12)$$

where $\mathcal{H}$ is a Heaviside-style indicator that fires on the asymmetrically penalized side of the error and $\lambda$ is a constant weight controlling the strength of the asymmetry.
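A minimal PyTorch sketch of this loss; the exact form of $\mathcal{H}$ (here: an indicator that the network underestimates the target) and the value of $\lambda$ are assumptions in the spirit of Vogels et al. [2018], not the paper's verbatim choices:

```python
import torch

def reconstruction_loss(pred, target, lam=2.0):
    """Asymmetric L1 loss (Eqns. 11-12): |pred - target| scaled by
    1 + (lam - 1) * H, where H fires on one side of the error."""
    l1 = torch.abs(pred - target)                   # Eqn. 11
    h = (pred < target).float()                     # assumed indicator H
    return (l1 * (1.0 + (lam - 1.0) * h)).mean()    # Eqn. 12, averaged
```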
Our network focuses on reconstructing high-quality sampling functions for local path sampling. This is a central sub-problem in many path guiding frameworks. Note that this problem of sampling map regression is independent of the other sub-modules in path guiding. We thus train our network independently, without relying on any specific guiding framework; we randomly construct 3D voxels of various sizes in training scenes and compute sampling maps with both sparse and dense photons to obtain many training pairs (please refer to Sec. 8 for details of data generation and training).

Note that our learning-based sampling map reconstruction module could potentially be applied in many existing path-guiding frameworks (like [Jensen 1995; Vorba et al. 2014; Müller et al. 2017; Rath et al. 2020]), improving on their traditional sampling distribution reconstruction modules. In this work, we present a new framework (Sec. 7) with adaptive spatial partitioning, which iteratively builds sampling maps using our neural network in a hierarchical grid for path guiding.

ALGORITHM 1: Our neural path guiding framework (Sec. 7). Through multiple iterations of path tracing and photon tracing, we construct a hierarchical grid (Sec. 7.1), reconstruct and update the sampling map in each valid grid voxel (Sec. 7.2), and guide the path tracing using the sampling maps (Sec. 7.3). We also apply a final path tracing pass guided by the reconstructed sampling maps (Sec. 7.4). We use different colors to mark the different subsections, with green for Sec. 7.1, blue for Sec. 7.2, red for Sec. 7.3, and purple for Sec. 7.4.

Input: Target scene, pre-trained neural network F
Output: A rendered image

 1: Initialize a regular spatial grid; set all Q_j = 0
 2: for each iteration t < T do
 3:     Initiate 2^t N_c SPP path samples
 4:     for each path do
 5:         for each bounce b do
 6:             Locate voxel j (x_b ∈ ΔA_j)
 7:             if not isValid(j) (no sampling map) then
 8:                 Sample (p_BSDF) → ω_b
 9:             else
10:                 Sample (p_MIS) → ω_b (Eqn. 14)
11:             end
12:             markValid(j)
13:         end
14:         Compute path throughput and L(x_b, ω_b)
15:         for each bounce at x_b ∈ ΔA_j do
16:             if isValid(j) then
17:                 ν_b = L(x_b, ω_b) cos θ_b f_r(x, ω_b, ω_o)
18:                 if ω_b ← p_guide then ν_{j,G} += ν_b else ν_{j,B} += ν_b
19:                 if ω_b ← p_guide then Q_{j,G} += 1 else Q_{j,B} += 1
20:                 if Q_{j,G} ≥ 50 and Q_{j,B} ≥ 50 then update α_j (Eqn. 13)
21:             end
22:         end
23:         Update the output image
24:     end
25:     Trace 2^t N_p light paths for photons
26:     for each photon p do
27:         Locate voxel j, solid angle k (x_p ∈ ΔA_j, ω_p ∈ ΔΩ_k)
28:         if isValid(j) then
29:             Update energy map: Φ_{j,k} += ΔΦ_p (for Eqn. 9)
30:             M_j += 1; update V^n
31:             if M_j > M_thr or V^n > V^n_thr then
32:                 Subdivide voxel j into two sub-voxels (Sec. 7.1)
33:             end
34:         end
35:     end
36:     for each valid voxel j do
37:         Reconstruct sampling maps (i.e., p_guide) with F
38:     end
39: end
40: Trace N_f paths for final output (Sec. 7.4)
7 NEURAL PATH GUIDING FRAMEWORK

In this section, we introduce our novel path guiding framework, which leverages the presented deep network to reconstruct high-quality sampling maps in a hierarchical grid. Our full framework is illustrated in Algorithm 1. As shown in Algorithm 1, we first initialize a grid (Line 1) and then utilize an iterative process (Line 2∼39) to build a hierarchical grid with per-voxel sampling maps for path guiding and rendering. In each iteration, we trace camera paths (Line 3∼24); the paths are sampled with BSDF sampling or guided MIS sampling (Line 7∼11) when tracing, and they are used to detect valid voxels (Line 12) and to compute the mixture weight of one-sample MIS (Line 17∼20). We then trace photons (Line 25∼35) in each iteration; in each valid voxel, we accumulate the photon energy (Line 29) required by our network and also collect other photon statistics for subdividing the hierarchical grid (Line 30∼33). The accumulated photons are used for computing the sampling maps in each voxel (see Sec. 7.2 and the blue blocks in Algorithm 1) to guide the tracing of the paths in the following iterations; the path samples are also used for rendering and for computing the weight α of one-sample MIS (see Sec. 7.3 and the red blocks in Algorithm 1). After a total number of T iterations, we do a final path tracing pass (see Sec. 7.4 and the purple block in Algorithm 1) with N_f spp to render the image. The final rendering result is computed from all path samples in the iterations (except for the first iteration, which is not guided) and the final pass. Note that we double the number of paths and photons after each iteration, so that both the quality of the input raw sampling maps and the final rendering are progressively improved; this leads to $2^t N_c$ spp paths and $2^t N_p$ photon rays for iteration $t$, where $N_c$ is the initial spp and $N_p$ is the initial number of photon rays in the first iteration.

Fig. 4. Our proposed hierarchical grid spatial caching structure. The path samples are used to detect valid voxels that store sampling maps. A voxel is subdivided into a binary tree based on the local photon statistics (red points with energy ΔΦ_p). In this example, there are 3 coarse-level voxels in the regular grid (from left to right). The left invalid voxel does not receive any path sample, thus no sampling maps or photons are stored. The middle valid voxel stores one sampling map from accumulated photon energies. The right valid voxel gets refined by subdivision and stores 5 sampling maps, one for each sub-voxel.

7.1 Hierarchical Grid

Since a pure uniform spatial structure (often achieved by spatial cache points in early works [Jensen 1995; Vorba et al. 2014]) is very expensive and impractical for large-scale scenes, recent works often utilize a KD-tree [Müller et al. 2017] to adaptively partition the space, starting from a single root node that covers the entire scene. This coarse-to-fine spatial structure is effective and, in fact, also necessary for many pure online-learning approaches [Guo et al. 2018; Rath et al. 2020], since they need to use a large number of (path) samples that can only be acquired in a large spatial region at an early stage. In contrast, our deep learning based approach can reconstruct a high-quality sampling map from a sparse set of photons; consequently, starting from a highly coarse spatial partitioning is unnecessary and even inefficient for our approach. Therefore, we propose to use a hierarchical grid for spatial partitioning, which combines uniform and adaptive spatial partitioning (Fig. 4).
An initial regular grid. We start from a regular grid that uniformly divides the entire scene at a relatively coarse level (see the three coarse voxels in Fig. 4) as the initial spatial structure (Line 1 in Algorithm 1); the initial grid is still coarse but much denser than the shallow KD-trees used at the early stages of previous work [Müller et al. 2017]. This regular grid enables reconstructing more locally representative sampling maps, leading to good path guiding quality even in the first iteration of our framework, which fully utilizes the benefits of our offline trained deep neural network. Starting from this regular grid, we iteratively sub-partition each grid voxel into a local KD-tree (Line 30∼33 in Algorithm 1, and the sketch below), leveraging the statistical information of the per-iteration paths and photons; this results in a hierarchical grid that adaptively covers the scene, and we reconstruct the sampling maps per voxel in each iteration accordingly.
Detecting valid voxels using paths.
While we could compute a sampling map for every voxel in the grid for path guiding, this is usually costly and in fact unnecessary, since many voxels may not be reached by any path from the viewpoint when rendering a large scene. Therefore, we leverage the per-iteration camera paths to detect which voxels are necessary for rendering the current viewpoint (Line 12 in Algorithm 1). Specifically, when tracing the $2^t N_c$ spp path samples in each iteration, we mark a voxel (that hasn't been marked before) as a new valid voxel if at least one bounce point of the paths is located in the voxel (see the two valid voxels in Fig. 4). In other words, we only consider a voxel for sampling map reconstruction and further spatial partitioning when it is known to be necessary (or at least likely necessary) for rendering in the following iterations. This avoids wasting effort on reconstructing many unnecessary sampling maps and local sub-KD-trees. Once a voxel is marked as valid, we start accumulating photons in the voxel for sampling map reconstruction and further subdivision of the voxel.
Voxel subdivision. It is not efficient to use a regular grid for spatial partitioning, since different local spatial regions may involve highly diverse geometry, appearance, and lighting distributions. Therefore, we iteratively subdivide the initial regular grid into a hierarchical grid, where a voxel is divided into a binary tree similar to a local KD-tree when necessary (Line 30∼33 in Algorithm 1). Our hierarchical grid is built to adapt to the complexity of the local geometry and incident light fields. In the very first iterations, we trace small numbers of light paths, and photon data is temporarily stored in each voxel. We leverage the statistics of the photons accumulated in the current iteration in each valid voxel to decide the voxel's possible subdivision. In particular, for each valid voxel $j$, we consider $M_j$ – the total number of photons hitting the voxel through the iterations – and $V^n_j$ – the variance of the surface normals at the photon hit points. A voxel is split into two sub-voxels at the middle of the photon positions along an axis (just like KD-tree construction) if $M_j > M_{\text{thr}}$ or $V^n_j > V^n_{\text{thr}}$, where $M_{\text{thr}}$ and $V^n_{\text{thr}}$ are two predefined thresholds; we recursively apply this subdivision criterion to the sub-voxels (see the right voxel in Fig. 4). Once a voxel is subdivided, its two sub-voxels are kept as valid, accumulating photons from the current iteration and waiting for photons in the following iterations to reconstruct sampling maps. These simple photon statistics are easy to compute, enabling efficient subdivision. This photon-based subdivision process splits voxels that either have complex light fields (dense photons) or complex geometry (large normal variation), allowing these complex voxels to use more local and accurate sampling maps in the following iterations, thus leading to more accurate renderings.

7.2 Sampling Map Reconstruction

Apart from determining the subdivision of the hierarchical grid, the main goal of tracing the per-iteration photons is to reconstruct the per-voxel sampling maps for path guiding. For any valid voxel (marked by camera paths), we accumulate photon energies to compute the energy map of the voxel (Line 29 in Algorithm 1), as expressed by Eqn. 9. The energy map records the sum of the energies $\Delta\Phi_p$ of all photons hitting the voxel through the current and all previous iterations. The per-pixel accumulated energy $\Phi_{j,k}$ in an energy map is then normalized, which yields the raw sampling map $S_{e,t}$ that is sent as input to the network to reconstruct the sampling map in iteration $t$. As discussed in Sec. 6, we also provide additional input buffers (photon counts, the previous raw sampling map, and a binary mask) to the network. Specifically, we record the number of accumulated photons and also keep the raw sampling map and the photon counts of the previous iteration to construct the network input. After tracing all photons in an iteration, we collect all valid voxels that received new photons and reconstruct their sampling maps $S_d$ using our deep neural network for path guiding (Line 37 in Algorithm 1). As mentioned, we exponentially increase the photon count per iteration with a base of 2, similar to the growth of path samples in Müller et al. [2017], so that the number of photons consumed by the input sampling map roughly doubles after each iteration. Once a sampling map is reconstructed at a voxel in one iteration, the map is used in the following iterations and in the final path tracing pass to guide the path sampling in that voxel.

7.3 Guided Path Sampling

In any iteration, if a path hits a voxel that doesn't yet have a sampling map, we just use standard BSDF sampling at the bounce point (Line 8 in Algorithm 1); such a voxel is usually still an invalid voxel, which will be marked as valid and start accumulating photons immediately in the same iteration, allowing for path guiding in the following iterations.
On the other hand, once a path ray hits a valid voxel that has a reconstructed sampling map, path guiding is achieved by importance sampling the sampling map (where a CDF is built via a fast cumulative sum over pixels on GPUs, just like sampling an environment map). Since our sampling map only considers the incident radiance (and a cosine term), we apply one-sample MIS, similar to previous works, to combine guided sampling and BSDF sampling (Line 10 in Algorithm 1), as discussed in Eqn. 4. The combined sampling strategy, however, requires a parameter $\alpha$ that determines how often either sampling strategy is selected. Usually, a constant $\alpha$ (e.g., 0.5) is used; an adaptive $\alpha$ learned via online optimization [Müller 2019] has also been presented for better performance, but it requires expensive online training.
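A sketch of importance sampling a reconstructed map via its flattened CDF, as described above; converting the chosen bin back to a direction inverts the cylindrical mapping of Sec. 5:

```python
import numpy as np

def sample_map_bin(sampling_map, u):
    """Draw a bin proportionally to the map values using a cumulative sum
    (the CDF), exactly as when sampling an environment map. u is a uniform
    random number in [0, 1); returns the bin and its discrete probability."""
    pdf = sampling_map.ravel() / sampling_map.sum()
    cdf = np.cumsum(pdf)
    k = min(int(np.searchsorted(cdf, u, side="right")), pdf.size - 1)
    row, col = divmod(k, sampling_map.shape[1])
    return (row, col), float(pdf[k])
```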
We present a heuristic $\alpha$ computation technique based on path statistics (Line 17∼20 in Algorithm 1); though simple, it results in effective per-voxel weights $\alpha_j$ in practice for high-quality path guiding. In particular, we initially use $\alpha_j = 0.5$ for each voxel $j$ and progressively update $\alpha_j$. Once a full path is constructed, we compute the actual sub-path contribution (often known as throughput, Line 17 in Algorithm 1) for every bounce point $b$ on the path as $\nu_b = L(\mathbf{x}_b, \omega_b) \cos\theta_b\, f_r(\mathbf{x}, \omega_b, \omega_o)$, where $\mathbf{x}_b$ is the position of the bounce point, $\omega_b$ is the sampled direction (which can come from either BSDF or guided sampling), and $L(\mathbf{x}_b, \omega_b)$ is computed by consecutively multiplying the light radiance, BSDFs, and inverse sampling PDFs through all following bounce points, as in a standard Monte Carlo path sample. Meanwhile, for each voxel $j$, we accumulate all bounce contributions $\nu_b$ (of the bounces that are in the voxel, i.e., $\mathbf{x}_b \in \Delta A_j$) in $\nu_{j,\mathrm{B}}$ and $\nu_{j,\mathrm{G}}$, according to which distribution $\omega_b$ is sampled from (Line 18 in Algorithm 1). Specifically, $\nu_{j,\mathrm{B}}$ records the sum of all path contributions $\nu_b$ whose direction $\omega_b$ is sampled by BSDF sampling, and $\nu_{j,\mathrm{G}}$ records the sum of the $\nu_b$ whose $\omega_b$ is sampled by guided sampling. We also record the numbers of bounces sampled by the two strategies as $Q_{j,\mathrm{B}}$ and $Q_{j,\mathrm{G}}$ (Line 19 in Algorithm 1) in each valid voxel. Once $Q_{j,\mathrm{B}} \geq 50$ and $Q_{j,\mathrm{G}} \geq 50$ sub-paths are sampled in a valid voxel $j$ (Line 20 in Algorithm 1), we use the ratio of the averaged $\nu_{j,\mathrm{B}}$ and $\nu_{j,\mathrm{G}}$ to determine the mixing weight $\alpha_j$ for the following path guiding:

$$\alpha_j = \frac{\bar{\nu}_{j,\mathrm{B}}}{\bar{\nu}_{j,\mathrm{B}} + \bar{\nu}_{j,\mathrm{G}}}, \qquad (13)$$

where $\bar{\nu}_{j,\mathrm{B}} = \nu_{j,\mathrm{B}} / Q_{j,\mathrm{B}}$ and $\bar{\nu}_{j,\mathrm{G}} = \nu_{j,\mathrm{G}} / Q_{j,\mathrm{G}}$. Correspondingly, our one-sample MIS is expressed by:

$$p_{\text{MIS}}(\omega_i) = \frac{\bar{\nu}_{j,\mathrm{B}}}{\bar{\nu}_{j,\mathrm{B}} + \bar{\nu}_{j,\mathrm{G}}}\, p_{\text{BSDF}}(\omega_i) + \frac{\bar{\nu}_{j,\mathrm{G}}}{\bar{\nu}_{j,\mathrm{B}} + \bar{\nu}_{j,\mathrm{G}}}\, p_{\text{guide}}(\omega_i). \qquad (14)$$

We keep $\alpha_j = 0.5$ before enough statistics are collected and clamp $\alpha_j$ between 0.2 and 0.8 otherwise. This heuristic mixing weight considers data that reflects the actual performance of BSDF sampling and guided sampling, leading to effective one-sample MIS in our path guiding.
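A sketch of the per-voxel update of Eqns. 13-14; the accumulator arguments mirror the counters of Algorithm 1, and the initial value of 0.5, the 50-sample threshold, and the [0.2, 0.8] clamp follow the heuristic above:

```python
def update_alpha(nu_b, nu_g, q_b, q_g, alpha_prev=0.5):
    """Mixture weight alpha_j (Eqn. 13): the BSDF share of the average
    sub-path contributions, used once both sampling strategies have at
    least 50 samples in the voxel, and clamped for robustness."""
    if q_b < 50 or q_g < 50:
        return alpha_prev                 # not enough statistics yet
    mean_b = nu_b / q_b                   # average BSDF-sampled contribution
    mean_g = nu_g / q_g                   # average guided contribution
    if mean_b + mean_g <= 0.0:
        return alpha_prev                 # no energy observed; keep previous
    alpha = mean_b / (mean_b + mean_g)
    return min(0.8, max(0.2, alpha))
```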
7.4 Final Path Tracing Pass

Our learning based approach is able to reconstruct high-quality sampling maps from very sparse photons, leading to efficient guided path tracing even in early iterations. The first-iteration paths are not guided at all, since no sampling maps have been reconstructed yet. However, thanks to our deep neural network, our path guiding is often of very good quality starting from the second iteration. We therefore leverage all path samples starting from the second iteration for rendering the final image.

While we could keep iteratively tracing more photons and refining our sampling maps, we find that the reconstructed sampling maps are often already of very high quality after a small number of iterations. We thus stop the iterative process after a total of $T$ iterations and then trace $N_f$ spp as needed. This is called the final path tracing pass in our framework (Line 40 in Algorithm 1). Our final rendered image is computed from all path samples traced in all $T$ iterations and the final path tracing pass.

Fig. 5. Example scenes used for training our proposed neural network.

8 IMPLEMENTATION DETAILS
Dataset generation and neural network training. We create a large-scale dataset to train our sampling map reconstruction network. Our dataset consists of both designed scenes and auto-generated scenes, as shown in Fig. 5. We first collect available online scenes designed by researchers or artists, gathering several released scenes from previous work and purchasing scenes from several websites [Bitterli 2016; Jakob 2010; Evermotion 2012; Trader 2020; Squid 2020; Blend Swap 2016]. This leads to 32 designed scenes in total, including multiple realistic indoor and outdoor scenes; we use 20 of them in our training set and the rest for testing our algorithm. To enhance the generalizability of our network, we further enlarge our training set by auto-generating many more scenes. In particular, we procedurally generate 500 scenes using randomized shape primitives, materials, and area lights, similar to [Zhu et al. 2020; Xu et al. 2018]. We also leverage a complex lighting dataset [Gardner et al. 2017] and randomly select an environment map for each generated scene as its additional illumination. This auto-generation process largely increases the diversity and complexity of our training scenes, leading to better generalization on novel testing scenes.

We reconstruct sampling maps at the same resolution of 128×64. As expected, if memory allows, a higher sampling map resolution often leads to better rendering quality. While our rendering quality degrades at lower resolutions, we find that, even using 64×32 sampling maps, our method can still outperform previous state-of-the-art methods (see Fig. 10). Our network aims to reconstruct the sampling map of a local 3D voxel. We partition the space of each training scene uniformly using a regular grid with a random resolution ranging from 50 to 200 along each axis. This makes our network generalize to various voxel sizes, naturally enabling high-quality sampling map reconstruction for any voxel at any depth of a hierarchical grid. To further augment the data, we also randomly rotate the world coordinate frame when partitioning.
We trace photons in each scene and compute sampling maps based on Eqn. 9 using both sparse and dense photons, which constructs the input and ground-truth training pairs. The total number of training pairs in our dataset is about 10.5 million. To make the network generalize well over the different iterations of our path guiding, for each training scene we randomly select an iteration number $t$ from 1 to 12 and compute the corresponding input sampling map using the photons generated by the $2^t N_p$ light paths. On the other hand, the ground-truth sampling map of each voxel is computed by accumulating the photons generated over 20 iterations for each scene.

During rendering, the number of photons in different voxels can be highly different (from a few to several thousand), leading to highly diverse input distributions; we therefore train multiple networks as a mixture of experts [Jacobs et al. 1991] and make each network focus on a certain range of input photon counts in a voxel. Specifically, we train five networks separately, with the corresponding photon-count ranges partitioning $[0, \infty)$ into five consecutive intervals. This enables better reconstruction quality than using a single network for all cases. And since our networks are very compact (several MBs), using five different networks does not lead to any memory issues. We implement our networks using PyTorch. During training, we use mini-batches of size 50 and train each network using ADAM [Kingma and Ba 2014] with a learning rate of 1.0 × 10⁻⁴. Our networks generally converge to a very good optimum after 500K epochs, taking about a week on 8 Nvidia RTX 2080Ti GPUs.

Component   | Path (%) | Photon (%) | Neural rec. (%) | Path (%) | Time (min)
Algorithm 1 | LN 3∼24  | LN 25∼35   | LN 36∼38        | LN 40    |
Device      | CPU      | CPU        | GPU             | CPU      |
Phase       | iterative process (when t < T)            | final    |
Caustics Egg  | 13.91 | 21.83 | 8.36 | 55.88 |  4.0
Veach Ajar    | 14.58 | 21.08 | 5.76 | 58.56 | 18.0
Bathroom      | 15.42 | 11.86 | 9.77 | 62.93 |  5.0
Hotel         | 15.05 | 18.58 | 5.89 | 60.46 | 20.0
Staircase     | 15.79 | 15.77 | 5.01 | 63.41 | 11.0
Living Room   | 16.42 | 11.73 | 5.89 | 65.94 | 11.0
Spaceship     | 16.49 |  9.20 | 8.06 | 66.23 |  3.0
Classroom     | 15.33 | 16.68 | 6.39 | 61.57 | 13.0
Wild Creek    | 17.05 |  8.41 | 6.02 | 68.50 | 10.0
Torus         | 15.72 | 12.78 | 8.32 | 63.16 |  4.0
Kitchen       | 14.35 | 19.06 | 8.94 | 57.63 |  4.0
Pool          | 16.78 |  8.22 | 7.57 | 67.40 |  4.0

Table 1. Running time. Percentages of the running time spent in the different components of the proposed system are shown for each testing scene. The total rendering time for each scene is given in the rightmost column. The time distribution varies depending on the scene complexity and light setup.
Path guiding details. We use N_c^t = 2^t spp camera paths for iteration t. We also correspondingly trace the same number of light paths per iteration for distributing photons (N_p thus equals the number of pixels). The initial regular grid is implemented as a hash grid that can be accessed in O(1) time. Each sub binary tree acts as a local KD-tree that can be accessed in O(log n) time. Our final hierarchical grid is thus a hybrid spatial structure that can be quickly accessed at rendering time, enabling highly efficient path guiding. Since our spatial structure is adaptively constructed, our method is not very sensitive to the resolution of the initial grid, and we use a resolution of 100 for all the testing scenes. For voxel subdivision, we use an iteration-dependent threshold on the photon count, given by M_thr = c · √t similar to [Müller et al. 2017], where t is the iteration number and c is a scalar parameter. We find that values of c from a few hundred up to ~800 perform similarly in practice, and we use c = 500 for all our testing experiments. The normal variance threshold is set to V_thr = 0.5. We also cap the maximum depth of a local KD-tree at 8, which already corresponds to a very fine grid and avoids unnecessarily detailed subdivision. In practice, we only perform voxel subdivision during the first two iterations, which results in a reasonable hierarchical grid (a sketch of this subdivision criterion is given at the end of this section).
We use a high-end machine with an Intel Core i9-7960X CPU and Nvidia Titan RTX GPUs for rendering our testing scenes. Our framework is implemented in the standard rendering engine Mitsuba [Jakob 2010], and we use the PyTorch C++ API [Paszke et al. 2019] at rendering time for sampling map reconstruction on GPUs. This Mitsuba-based implementation ensures a fair comparison with previous methods, most of which are also implemented in Mitsuba. In particular, we only use GPUs for network inference during sampling map reconstruction, while all other parts of the algorithm (including path tracing, photon tracing, ray sampling, radiance computation, spatial grid construction, etc.) run on the CPU as in the standard Mitsuba renderer. The CPU and GPU parts run in sequence in our implementation. We believe this is a fair setting when comparing with traditional, purely CPU-implemented path guiding methods that do not use neural networks. In fact, our GPU computation takes only about 10% of the total running time; please refer to Tab. 1 for detailed running times of each testing scene. In the future, a more efficient implementation could run the GPU part in parallel with the CPU part, or even use a pure GPU-based framework leveraging the hardware ray tracing in modern GPUs [Parker et al. 2010].
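As a reference for the parameters above, here is a minimal sketch of the voxel-subdivision test, under our own assumptions about how the pieces fit together: we assume a voxel is split when its photon count exceeds M_thr and its normal variance exceeds V_thr, and the Voxel fields (photon_count, normal_variance, depth) are hypothetical names, not the paper's data structure.

```python
import math
from dataclasses import dataclass

# Sketch of the iteration-dependent voxel-subdivision criterion described
# above. The Voxel fields are hypothetical; the thresholds follow the text.
C = 500                  # scalar parameter c (values up to ~800 behave similarly)
V_THR = 0.5              # normal variance threshold
MAX_LOCAL_TREE_DEPTH = 8 # cap on the local KD-tree depth

@dataclass
class Voxel:
    photon_count: int
    normal_variance: float
    depth: int           # current depth within the voxel's local KD-tree

def photon_threshold(t, c=C):
    # M_thr = c * sqrt(t), where t is the iteration number.
    return c * math.sqrt(t)

def should_subdivide(voxel, t):
    # Subdivision only happens during the first two iterations; a voxel
    # splits when it holds enough photons, its surface normals vary enough,
    # and the local tree has not yet reached its maximum depth.
    return (t <= 2
            and voxel.depth < MAX_LOCAL_TREE_DEPTH
            and voxel.photon_count > photon_threshold(t)
            and voxel.normal_variance > V_THR)
```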
We now present extensive experiments to evaluate our path guiding approach. We first evaluate the rendering quality of our method by comparing against various state-of-the-art path guiding methods quantitatively and qualitatively. We then investigate sub-components in our system to justify their effectiveness. Some additional evaluation results can be found in the supplementary material.
Configuration.
We evaluate our method comprehensively on 12 realistic testing scenes; the corresponding images of these scenes can be found in Fig. 6, 7, 8, 10 and 11. These testing scenes include challenging indoor and outdoor cases with complex global illumination, covering a wide range of scene complexity and diversity. Each scene contains both direct and indirect illumination. For indoor scenes with outside environment map illumination, we provide the window geometry for sampling light paths from the environment map, facilitating the photon tracing process in these scenes.
rMSE ↓
Scene         PT      [Bako et al. 2019]  [Vorba et al. 2014]  [Müller et al. 2017]  [Rath et al. 2020]  Ours
Caustics Egg  0.3187  0.1353              0.0462               0.0311                0.0121              0.0052
Veach Ajar    0.3684  0.2585              0.0154               0.0073                0.0047              0.0011
Bathroom      0.0610  0.0403              0.0204               0.0249                0.0142              0.0050
Hotel         0.4176  0.2607              0.2838               0.0812                0.0792              0.0276
Staircase     0.0176  0.0183              0.0110               0.0045                0.0038              0.0013
Living Room   0.1928  0.1553              0.0235               0.0468                0.0416              0.0060
Spaceship     0.2212  0.0914              0.0198               0.0716                0.0389              0.0137
Classroom     0.0733  0.0514              0.0124               0.0085                0.0038              0.0021
Wild Creek    0.1425  0.1100              0.0560               0.0618                0.0549              0.0382
Torus         0.0511  0.0425              0.0150               0.0015                0.0015              0.0005
Kitchen       0.0644  0.0578              0.0249               0.0063                0.0035              0.0030
Pool          0.1175  0.0528              0.0026               0.0025                0.0016              0.0011

SSIM ↑
Scene         PT      [Bako et al. 2019]  [Vorba et al. 2014]  [Müller et al. 2017]  [Rath et al. 2020]  Ours
Caustics Egg  0.1017  0.1824              0.3472               0.4581                0.7006              0.8242
Veach Ajar    0.0474  0.0898              0.4579               0.5455                0.6325              0.8572
Bathroom      0.4481  0.4725              0.5472               0.5260                0.5924              0.7427
Hotel         0.0695  0.1155              0.0914               0.2665                0.2801              0.4378
Staircase     0.4810  0.4957              0.6513               0.7337                0.8626              0.8951
Living Room   0.1360  0.1719              0.4734               0.2960                0.3327              0.6576
Spaceship     0.5610  0.7476              0.8611               0.7452                0.8124              0.8793
Classroom     0.2789  0.3037              0.5756               0.6352                0.7681              0.8234
Wild Creek    0.3023  0.3734              0.4890               0.4852                0.5386              0.6222
Torus         0.2610  0.6660              0.7864               0.9150                0.9300              0.9529
Kitchen       0.3898  0.4173              0.4655               0.6753                0.7873              0.8168
Pool          0.2264  0.4595              0.8551               0.8598                0.9364              0.9510
Table 2. Quantitative comparison. We compare our results with those of [Bako et al. 2019; Vorba et al. 2014; Müller et al. 2017; Rath et al. 2020] at equal rendering time. We show the corresponding rMSEs and SSIMs of the rendered full images of the 12 testing scenes. In the original layout, red, orange, and yellow denote the best, second-best, and third-best method in terms of rMSE (lower is better) and SSIM (higher is better). Our method achieves the best results on all testing scenes. The total rendering time for each scene is presented in Tab. 1.
For our method, the required time to achieve good rendering quality ranges from 3 to 20 minutes (depending on scene complexity) on these testing scenes. We demonstrate equal-time comparisons against four state-of-the-art path guiding methods [Bako et al. 2019; Müller et al. 2017; Vorba et al. 2014; Rath et al. 2020] on all testing scenes; we also show the corresponding equal-quality rendering time on a few scenes. In the comparisons, we directly use the released source code of [Müller et al. 2017], [Rath et al. 2020], and [Vorba et al. 2014], which are all implemented in Mitsuba [Jakob 2010] and run on CPU. Since there is no publicly available source code of [Bako et al. 2019], we use our own Mitsuba implementation of it for all experiments. As discussed in Sec. 8, we also implement our method in Mitsuba, mostly running on CPU for fair comparison, while only the network inference for sampling map reconstruction runs on GPU, which takes only about 10% of the total running time (see Tab. 1 for detailed timing). Our implementation of [Bako et al. 2019] follows a similar CPU/GPU separation, where we run their sampling map reconstruction network on GPUs and all other parts of the algorithm on CPU. All comparisons are run on the same machine with the same CPU and GPUs (if needed). To better illustrate the effectiveness of path guiding, we turn off Next Event Estimation (NEE) for our method and all comparison methods, as done in previous work [Vorba et al. 2014; Müller et al. 2017]. Comparison results with NEE turned on are shown in the supplementary material. The ground-truth images are rendered using path tracing with NEE for 2 to 6 days per scene.
Quantitative and qualitative evaluation.
We now demonstrate the quantitative and qualitative results of our method and compare against other methods with equal rendering time. For quantitative evaluation, we use the relative Mean Squared Error (rMSE, as used in [Rath et al. 2020]) and the perceptually-based Structural Similarity Index (SSIM, as used in [Bako et al. 2019]) as metrics; a minimal sketch of the rMSE computation follows this discussion. Tab. 2 shows the quantitative rMSE and SSIM results on the full images of all 12 testing scenes. The corresponding percentages of running time of the sub-components are shown in Tab. 1. In most cases, the path and photon tracing on CPU take more than 90% of the entire system running time, and we only spend a small amount of time (10%) on GPU resources for neural sampling map reconstruction. Our method achieves the best quantitative results, with the lowest rMSEs and highest SSIMs on all testing images. Note that ours lowers the rMSEs of the best comparison methods by more than 50% on many challenging scenes, such as Caustics Egg, Veach Ajar, Bathroom, Hotel, Staircase, Living Room, and Torus. These results demonstrate the high effectiveness and efficiency of our method, which is significantly better than all comparison methods. To illustrate the details of our results, we also show quantitative and qualitative comparisons on multiple crops of the rendered images in Fig. 1, 6, 7 and 8. Our results are of the highest visual quality in these figures, which is also reflected by the lowest rMSEs across all comparison image crops.
Note that the two unidirectional guiding methods [Müller et al. 2017; Rath et al. 2020] are usually the best two of the four comparison methods on these testing cases. They utilize an adaptive tree for their spatial partitioning, which is more efficient than the uniform cache points used in [Vorba et al. 2014], leading to much better rendering quality in most testing scenes despite the fact that [Vorba et al. 2014] is bidirectional. However, it can be highly challenging for unidirectional methods to discover high-energy paths when a scene involves complex specular-diffuse interactions (like those in Fig. 6, which contain many reflective and refractive objects) or other strong global illumination effects (like in Fig. 8). Therefore, [Vorba et al. 2014] sometimes achieves better results than the unidirectional methods, as in Spaceship and Living Room, since it leverages photons from light paths that ease the process of light discovery.
In contrast, our approach also leverages an adaptive spatial structure, and our novel hierarchical grid enables finer spatial partitioning than [Müller et al. 2017; Rath et al. 2020] in early iterations.
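For reference, a minimal sketch of the rMSE metric we assume here (matching the common definition used by [Rath et al. 2020]; the exact epsilon used in the experiments is our assumption):

```python
import numpy as np

# Relative MSE between a rendering and the reference; the epsilon in the
# denominator (a common choice, assumed here) avoids division by zero in
# dark regions.
def rmse(img, ref, eps=1e-2):
    return float(np.mean((img - ref) ** 2 / (ref ** 2 + eps)))
```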
[Fig. 6 panels: Caustics Egg, Wild Creek, and Spaceship scenes; columns: Path tracer, Vorba et al., Müller et al., Bako et al., Rath et al., Ours, Reference; equal-time (min) crops with rMSEs.]
Fig. 6. Qualitative and quantitative comparison with equal rendering time. These scenes contain many transparent surfaces and involve complex specular-diffuse interactions; photon-based methods have a natural advantage over path samples in this case. We show zoomed-in crops with rMSEs in the figure and compare with the results of [Vorba et al. 2014], [Müller et al. 2017], [Bako et al. 2019] and [Rath et al. 2020]. The corresponding equal rendering time for each scene is also listed. Our method achieves the best visual quality and the lowest rMSEs in these challenging cases.
Meanwhile, our deep learning based method can reconstruct high-quality sampling maps from sparse photons; this enables high-quality path guiding in our finer spatial partitioning from the first iteration through all subsequent ones, avoiding the slow start of those online learning methods and leading to highly efficient rendering. Our approach relies purely on photons to reconstruct sampling maps, which is effective in general and also highly efficient for challenging scenes that are dominated by indirect lighting. Thanks to our deep neural networks and our efficient spatial partitioning, our approach utilizes photons in a way that is much more efficient than previous work [Vorba et al. 2014]. Our photon-driven neural path guiding approach enables high-quality rendering results that are significantly better than all previous unidirectional and bidirectional guiding methods.
[Bako et al. 2019] is a recent deep learning approach that first leverages an offline trained network for unidirectional path guiding; yet their method can only guide the first bounce and leads to the worst results in most testing cases. As shown in their paper, this technique can be effective for lowering the initial severe MC noise with sparse path samples, especially on scenes with strong direct illumination. However, such a first-bounce technique is not very effective for scenes with strong indirect illumination; the benefits of its offline learning also become more limited over longer renderings, once other traditional multi-bounce techniques obtain enough path samples online. In contrast, our method is the first offline deep learning method that enables multi-bounce path guiding. Our approach takes full advantage of an offline trained network and successfully models the incident light field at any local region in a scene, enabling significantly better rendering quality than [Bako et al. 2019] and all other traditional multi-bounce guiding techniques.
Equal-quality comparison.
In addition to the equal-time comparison, we also compare the time spent to achieve results of similar quality on some highly challenging scenes, shown in Fig. 8; the corresponding rendering time of each method (relative to our time) is listed, for achieving the same full-image rMSE as our method (corresponding to the rMSEs shown in Tab. 2), allowing only a negligible difference in rMSE. We can see that our method significantly speeds up naive path tracing, with a rendering speed that is several tens of times faster. Moreover, the fastest comparison methods for these scenes still require at least two times the rendering time of our method. Our approach significantly reduces the time required to achieve realistic rendering.
[Fig. 7 panels: Bathroom, Kitchen, and Staircase scenes; columns: Path tracer, Vorba et al., Müller et al., Bako et al., Rath et al., Ours, Reference; equal-time (min) crops with rMSEs.]
Fig. 7. Qualitative and quantitative comparison with equal rendering time. These scenes contain complex indoor lighting, lit by several area light sources with different shapes. Our deep learning based approach enables accurate sampling map reconstruction for the complex direct and indirect lighting, leading to efficient rendering. We show zoomed-in crops with rMSEs in the figure. The corresponding equal rendering time for each scene is also listed. Our method achieves the best visual quality and the lowest rMSEs in these challenging cases.
Sampling map reconstruction.
The core of our path guiding approach is our deep learning based sampling map reconstruction. We show examples of our reconstructed sampling maps, the corresponding inputs, and the ground truth in Fig. 3; more examples are provided in the supplementary material. Note that our method consistently improves the reconstruction quality through iterations. Even at the second iteration, when the input is extremely noisy, our network can still denoise the input and recover a full sampling map that contains many details and is very close to the ground truth. We also show an additional comparison with a simple U-Net for sampling map reconstruction in the supplementary material. This high-quality sampling map reconstruction allows for highly efficient path sampling when rendering.
To further justify the effectiveness of our network, we compare with using only the raw input sampling map (without the network reconstruction) for path guiding in Fig. 10. We also compare with a version that reconstructs sampling maps at a lower resolution of 64 × 32 (we use 128 × 64 by default, as mentioned in Sec. 8). The results of [Müller et al. 2017] and [Rath et al. 2020] (which generally perform the best among all comparison methods, as stated above) are also shown in the figure to better position these versions of our method with reduced or degraded components. Note that our method without the network can already achieve rendering quality comparable to previous methods in some cases. And for Pool, our method without network reconstruction can even perform better than [Müller et al. 2017]; this is because using photons is highly effective for such a scene, which involves complex specular-diffuse interactions. This example clearly demonstrates the benefit of leveraging photons. The neural network in our framework significantly improves upon the rendering quality achieved without it. Our full model achieves the best visual quality and the lowest rMSE in these testing scenes. Also note that, while worse than our final model, our method with lower-resolution sampling maps can already outperform the comparison methods and the version without the network. This demonstrates the high reconstruction quality of our network. It also shows that our method generalizes well across different sampling map resolutions, though a higher resolution often leads to higher quality.
[Fig. 8 panels: Veach Ajar (~1× = 18 min), Living Room (~1× = 11 min), and Hotel (~1× = 20 min); columns: Path Tracer, Vorba et al., Müller et al., Bako et al., Rath et al., Ours, Reference; equal-time crops and equal-quality (full-image) time multipliers ranging from ~2.83× to ~253.67× of our time.]
Fig. 8. Equal-time and equal-quality comparison. Similar to Fig. 6 and 7, we perform qualitative and quantitative equal-time comparisons on crops of the final renderings of these challenging scenes. Our method achieves better qualitative and quantitative results given equal rendering time. In addition, we also show an equal-quality rendering time comparison. In particular, we list the corresponding rendering time (expressed as a multiple of our time) of each method for achieving the same full-image rMSE that our method achieves in the equal-time comparison (shown in Tab. 2). Note that our method takes significantly less time; the fastest comparison method still requires more than two times the rendering time of our method for each scene.
[Fig. 9 panels: full image, reference, and rMSE results for a regular grid vs. our adaptive grid at different initial resolutions.]
Fig. 9. The effect of initial grid resolution and our hierarchical spatial partitioning framework. Ideally, the voxel size should be small enough to reflect the locality of the incident radiance, and large enough to contain enough photons for sampling map reconstruction. In the extreme case, under-partitioning and over-partitioning both hurt performance.
Hierarchical grid.
We now investigate our presented spatial structure, the hierarchical grid. We show rMSEs of images rendered with different resolutions of the initial regular grid in Fig. 9. We also show the corresponding results using only a regular grid, without the adaptive partitioning inside voxels. Note that, without the adaptive partitioning, rendering quality varies drastically across resolutions, since a low-resolution grid lacks the expressiveness for complex light fields in the scene and a high-resolution grid does not gather enough photons in each voxel. On the contrary, our hierarchical grid is more stable across resolutions, since it adaptively subdivides the initial grid to the desired resolution locally. Our hierarchical grid also consistently enables better rendering quality than a regular grid at the same initial resolution.
Temporal stability.
We also evaluate the temporal stability of our method. In particular, we use the DSSIM (i.e., dissimilarity, as used in [Vogels et al. 2018]) between consecutive frames with a moving camera to express temporal stability; a sketch of this metric follows Fig. 11. Figure 11 shows the DSSIMs of our method and the comparison methods. Since our renderings are consistently better than those of other methods, our method also achieves the best temporal stability.
[Fig. 10 panels: Classroom (~1× = 13 min) and Pool (~1× = 4 min); columns: Path Tracer, Müller et al., Rath et al., Ours-noNeural, Ours-lowRes, Ours, Reference; equal-time crops with rMSEs and equal-quality (full-image) time multipliers.]
Fig. 10. We study the effectiveness of the neural reconstruction module. We compare our full model with a version without neural sampling map reconstruction and a version that uses a lower resolution (64 × 32) of sampling maps. We also compare with [Müller et al. 2017] and [Rath et al. 2020] on these results. We show crops with rMSEs of the rendered images for each method given equal rendering time. The corresponding equal-quality rendering time to achieve our full-image rMSE is also listed. Our final model achieves the best results among all comparison methods.
[Fig. 11: per-frame DSSIM curves on the Torus scene; average DSSIM over 30 frames: Path tracer 0.4321, Vorba et al. 0.4172, Bako et al. 0.3655, Müller et al. 0.2979, Rath et al. 0.2341, Ours 0.2102.]
Fig. 11. Average DSSIM (lower is better) computed from 30 consecutive frames when the camera is moving fast along a direction. The dissimilarity is affected by both the content change and the noise level.
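A minimal sketch of how the average DSSIM over a camera path could be computed, assuming the usual definition DSSIM = (1 − SSIM) / 2 and RGB frames; we use scikit-image's SSIM purely for illustration:

```python
import numpy as np
from skimage.metrics import structural_similarity

# DSSIM between two frames (H x W x 3 arrays), assuming DSSIM = (1 - SSIM) / 2.
def dssim(frame_a, frame_b):
    ssim = structural_similarity(frame_a, frame_b, channel_axis=-1,
                                 data_range=float(frame_a.max() - frame_a.min()))
    return (1.0 - ssim) / 2.0

# Temporal stability: average DSSIM over consecutive frames of a moving camera.
def temporal_dssim(frames):
    return float(np.mean([dssim(a, b) for a, b in zip(frames, frames[1:])]))
```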
Limitations.
We use a regular grid to represent a sampling distribution as a standard 2D map (image). This is easy for a deep neural network to process and reconstruct. However, it consumes more memory than the directional quad-tree used in [Müller et al. 2017]; memory also limits the resolution of the sampling maps we can reconstruct. Nonetheless, we show that our resolution of 128 × 64 can already achieve high-quality sampling, and even a lower resolution (like 64 × 32, as shown in Fig. 10) still provides better results than previous methods. We leave extensions with a sparse directional representation in a learning framework as future work. Our approach leverages photons to reconstruct sampling maps. However, tracing photons can sometimes be highly inefficient; for example, if the camera looks at only a small region of a large scene, a large number of photons may be traced but never reach any valid voxels, making photon tracing very expensive. Leveraging bidirectional guiding techniques like [Vorba et al. 2014] to also guide the photon tracing process can potentially resolve this. Please refer to our supplementary material for an initial extension of our method with photon guiding. Finally, we currently use the CPU for rendering and GPUs for neural reconstruction. Although we overlap data transfers with computation to reduce the synchronization latency, integrating our proposed framework into a GPU-based renderer like Nvidia OptiX [Parker et al. 2010] may be a better choice to accelerate the whole system.
10 CONCLUSION AND FUTURE WORK
In this paper, we present the first deep learning-based photon-driven path guiding approach. Our approach leverages photons to reconstruct sampling distributions, which is more effective than pure unidirectional (path-driven) methods for challenging scenes that are dominated by indirect lighting; we propose to use a deep neural network to regress high-quality sampling maps from low-quality photon histograms, enabling highly efficient path guiding using only sparse photons.
[Fig. 12 panels: Caustics Egg, Spaceship, and Hotel scenes; columns: Path tracer, Müller et al., Rath et al., Ours, Reference; rows: denoise off / denoise on, with full-image views for each setting.]
Fig. 12. Monte-Carlo denoising on the path guiding rendering results. We use the default deep learning based denoiser in Nvidia OptiX 6.5. In general, the denoiser fills the holes between pixels and filters out the high-frequency MC noise. The denoised images rendered with our method are more acceptable, without severe blur or distortion.
To fully utilize the benefits of our network, we introduce an adaptive hierarchical grid to cache our reconstructed sampling maps across the scene, allowing for efficient path guiding at any spatial location. We demonstrate that our method achieves significantly better quantitative and qualitative results than various previous state-of-the-art path guiding methods on diverse challenging scenes.
Our method is the first neural path guiding method that uses an offline trained network and supports guiding at any bounce, whereas previous related techniques either train an online network [Müller et al. 2019] or only guide the first bounce [Bako et al. 2019]. We believe our method takes an important step towards making neural path guiding more practical, thus opening up many appealing future directions. Our approach leverages local photon statistics for sampling map reconstruction; an interesting extension is to also consider global context and even achieve global guiding in primary space (like [Müller et al. 2019; Guo et al. 2018]). In addition, our target sampling density function can potentially be extended to more advanced distributions, such as variance-aware [Rath et al. 2020] or product sampling [Herholz et al. 2016] (avoiding the one-sample MIS), for better sampling efficiency. Combining our deep learning based local sampling reconstruction with reinforcement learning techniques [Huo et al. 2020] to achieve sampling with a proper reward function could provide further benefits. We leverage heuristic criteria for voxel subdivision in the hierarchical grid; this spatial partitioning process could also be learned via a deep neural network in the future. While we purely leverage photons in our method, combining photons and path samples in a holistic neural path guiding framework is an interesting future direction to explore.
ACKNOWLEDGEMENTS
This work was supported in part by NSF grants 1703957 and 1764078, the Ronald L. Graham Chair, two Google Fellowships, an Adobe Fellowship, and the UC San Diego Center for Visual Computing.
11 SUPPLEMENTARY MATERIAL
In this supplementary material, we provide additional experimental results and sampling map visualizations, as well as discussions about potential extensions of our proposed framework. Although not emphasized in the main paper, these additional studies and evaluations are also very important for the design of a full-fledged, practical path guiding system.
Monte-Carlo (MC) rendering algorithms like path tracing are known to suffer from slow convergence when producing noise-free images [Kajiya 1986; Lafortune 1996]. In recent years, MC denoising has become a very successful approach to reduce pixel variance, especially methods based on neural networks [Bako et al. 2017; Chaitanya et al. 2017; Vogels et al. 2018].
[Fig. 13 panels: Bathroom, Veach Ajar, and Living Room scenes; columns: Path tracer, Müller et al., Rath et al., Ours, Reference; rows: NEE off / NEE on, with full-image views for each setting.]
Fig. 13. The effect of next event estimation (NEE) on the final rendering results. The comparison is equal-time for each row. When NEE is turned on, the rendering time increases since we keep a similar total sample count. Results show that NEE greatly improves the results in some cases while it is not very useful in others, depending on the sampling map quality in path guiding as well as the level of light visibility at different scene locations.
In the default experimental setting, we turn off next event estimation (NEE) to clearly compare the effects of path guiding (similar to [Müller et al. 2017] and [Vorba et al. 2014]), though in practice NEE can be effective in some cases for all comparison methods. In particular, NEE can help reduce variance by easing the search for a light source and improving the sampling map quality. To study how NEE affects the results, we turn on NEE and keep the sample count unchanged on multiple test scenes. Results in Fig. 13 show that whether NEE is useful or not depends mostly on the light setup. For the Bathroom scene, the glass bulb fixture and staggered window blinds make direct light connections very hard to succeed; for the Veach Ajar and Living Room scenes, NEE fails and succeeds from time to time depending on the local light visibility. As a consequence, the rendering time greatly increases for all methods when NEE is turned on. In fact, our method achieves the best performance whether NEE is enabled or not, thanks to the high-quality reconstructed sampling maps that capture both direct and indirect illumination. We believe the decision to use NEE or not is highly related to the total time budget of the specific application.
[Fig. 14 panels: Classroom full image and crops for Gaussian, Std-UNet, Ours, and the reference, together with the training error curve.]
Fig. 14. Our proposed neural network performs better than a single standard U-Net and a traditional Gaussian filter in sampling map reconstruction, leading to lower-variance rendering results. The error curve is clipped for better visualization.
[Fig. 15 panels: reconstructed sampling maps for the Caustics Egg, Hotel, Living Room, and Veach Ajar scenes at iterations 3, 5, 7, and 9, next to the reference.]
Fig. 15. Additional reconstructed sampling map visualizations through learning iterations. The reconstructed sampling maps lead to better path space exploration at the beginning and more accurate representations of the incident radiance in the subsequent iterations.
As we mentioned in Sec. 6 of the main paper, we design a neural network that effectively reconstructs high-quality sampling maps. To demonstrate the effectiveness of our proposed network architecture, loss function, and multi-expert inference scheme (Sec. 8 of the main paper), we train a single standard U-Net [Ronneberger et al. 2015] with a simple pixel-wise loss function and without the auxiliary features, and use this simplest network to reconstruct all the sampling maps without training multiple versions. In addition, we try simple Gaussian filter denoising and choose the best result from a range of variances from 0.01 to 10. Figure 14 shows the error curve during neural network training, as well as a visual comparison on the Classroom scene. Although both deep learning based results are better than the one without neural reconstruction, our proposed network produces a smoother and lower-noise image. As shown in the loss curve, the average error of our reconstructed sampling maps is also smaller. The traditional Gaussian filter performs much worse since it only adds the same level of blur to the entire sampling map. We believe our proposed neural network can be further compressed by state-of-the-art network compression methods [Cheng et al. 2018; Deng et al. 2020] and improved with more advanced architectures in the future.
[Fig. 16 panels: Pool full image and crops for Ours-stdMIS, Ours, and the reference.]
Fig. 16. Our proposed heuristic one-sample MIS scheme performs better than the default mixture coefficient α = 0.5, especially when BSDF importance sampling and guiding have very disparate contributions to the final image.
[Fig. 17 panels: Kitchen full image and crops for Ours-noGPT, Ours, and the reference.]
Fig. 17. We study the effectiveness of the guided photon tracing extension (GPT). The reconstructed sampling maps have lower quality when photons are too sparse, since there is not enough information for rebuilding the incident radiance distribution. In contrast, it is sometimes beneficial to guide traced photons into visually important regions.
In Sec. 7.3 of the main paper, we present a new heuristic pipeline for estimating the mixture coefficient α in the one-sample MIS of BSDF sampling and guiding; a minimal sketch of how such a coefficient is applied follows at the end of this section. This is quite useful in some cases, as shown in Fig. 16. For example, in the Pool scene, BSDF-sampled directions from the floor often fail to find the light source and leave the scene permanently, contributing little to the final pixel color. In contrast, our heuristic encourages sending more guiding samples in those regions based on the statistics of previously traced path samples. For very glossy surfaces, such as the metal armrest in this scene, we send more BSDF samples, since many guided directions can have very small or zero BSDF value. Although the proposed heuristic may be sub-optimal, it is straightforward to implement and does not introduce extra online optimization overhead. In the future, we believe our heuristic can provide a good starting point for initializing other methods that optimize α in path guiding [Müller 2019; Rath et al. 2020].
We leverage photons in our neural path guiding, which is very effective for dominant indirect lighting. However, photons are only useful when they are visible to the camera, and in some cases many wasted photons can be traced. To handle special light transport cases where many photons are invisible, we add a guided photon tracing module to our system as a simple extension. Similar to [Vorba et al. 2014], we reconstruct importance sampling maps from the accumulated path samples in each voxel. These path samples are virtual particles (i.e., importons [Peter and Pietrek 1998]) carrying a value describing with what factor illumination at a certain location would contribute to the final image. Here, we use the same pre-trained network for this reconstruction. The reconstructed importance sampling maps are then used to guide photon tracing in every learning iteration. We use the Kitchen scene as an example, since many photons emitted from the outside sunlight cannot land inside the room without guided photon tracing, unless they are explicitly programmed to do so by manually providing the location of the windows, as in our default experimental setting. Figure 17 shows the reduced variance in regions that are indirectly illuminated by the white sunlight, and comparable results in regions that are lit by the indoor orange lights. Apart from these special cases, it is not always necessary to add this extra module for most common lighting setups created by lighting artists, where most photons are visible to the camera. Besides, our neural network is trained to properly handle input maps with multiple levels of sparsity, so our system works well as long as the photons are not too sparse.
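To make the role of α concrete, here is a minimal sketch of one-sample MIS between BSDF sampling and guided sampling. The sampler and pdf callables are placeholders for renderer hooks, and the per-region heuristic that picks α is the one summarized above (not reproduced here); this only shows how a chosen α is applied.

```python
import random

# One-sample MIS between BSDF sampling and guided (sampling map) sampling
# with mixture coefficient alpha. sample_*/pdf_* are placeholder hooks.
def sample_direction(alpha, sample_bsdf, sample_guide, pdf_bsdf, pdf_guide):
    """Draw one direction from the mixture alpha*p_guide + (1-alpha)*p_bsdf."""
    wi = sample_guide() if random.random() < alpha else sample_bsdf()
    # The returned pdf is the full mixture density, regardless of which
    # technique produced the sample (one-sample balance heuristic); the
    # Monte Carlo estimator then divides the path contribution by this pdf.
    pdf = alpha * pdf_guide(wi) + (1.0 - alpha) * pdf_bsdf(wi)
    return wi, pdf
```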
Some additional sampling maps are visualized in Fig. 15. After pre-training on an offline dataset, our neural network can progressively reconstruct higher-quality sampling maps on new scenes as photon energy accumulates over the iterations. Unlike previous Monte-Carlo denoising networks [Chaitanya et al. 2017; Bako et al. 2017; Vogels et al. 2018], which process the input image once and stop after inference, our reconstruction keeps getting better and closer to the ground-truth sampling maps over time. More specifically, the network reconstructs blurrier sampling maps in the early iterations due to low confidence, which encourages more exploration of the directional space in the following path tracing; in the later iterations, the reconstructed sampling maps get sharper, and more accurate details of the incident radiance distribution emerge due to a higher level of confidence.
REFERENCES
Steve Bako, Mark Meyer, Tony DeRose, and Pradeep Sen. 2019. Offline Deep Importance Sampling for Monte Carlo Path Tracing. In Computer Graphics Forum, Vol. 38. Wiley Online Library, 527–542.
Steve Bako, Thijs Vogels, Brian McWilliams, Mark Meyer, Jan Novák, Alex Harvill, Pradeep Sen, Tony DeRose, and Fabrice Rousselle. 2017. Kernel-predicting convolutional networks for denoising Monte Carlo renderings. ACM Trans. Graph. 36, 4 (2017), Article 97.
Benedikt Bitterli. 2016. Rendering resources. https://benedikt-bitterli.me/resources/.
Blend Swap LLC. 2016. Blend Swap.
Chakravarty R. Alla Chaitanya, Anton S. Kaplanyan, Christoph Schied, Marco Salvi, Aaron Lefohn, Derek Nowrouzezahrai, and Timo Aila. 2017. Interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1–12.
Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. 2018. Model compression and acceleration for deep neural networks: The principles, progress, and challenges. IEEE Signal Processing Magazine 35, 1 (2018), 126–136.
Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289 (2015).
Lei Deng, Guoqi Li, Song Han, Luping Shi, and Yuan Xie. 2020. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proc. IEEE (2020).
Evermotion. 2012. Evermotion 3D models.
Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, and Jean-François Lalonde. 2017. Learning to predict indoor illumination from a single image. arXiv preprint arXiv:1704.00090 (2017).
Jerry Guo, Pablo Bauszat, Jacco Bikker, and Elmar Eisemann. 2018. Primary sample space path guiding. In Eurographics Symposium on Rendering, Vol. 2018. The Eurographics Association, 73–82.
Toshiya Hachisuka and Henrik Wann Jensen. 2009. Stochastic progressive photon mapping. In ACM SIGGRAPH Asia 2009 Papers. 1–8.
Toshiya Hachisuka, Shinji Ogaki, and Henrik Wann Jensen. 2008. Progressive photon mapping. In ACM SIGGRAPH Asia 2008 Papers. 1–8.
Sebastian Herholz, Oskar Elek, Jiří Vorba, Hendrik Lensch, and Jaroslav Křivánek. 2016. Product importance sampling for light transport path guiding. In Computer Graphics Forum, Vol. 35. Wiley Online Library, 67–77.
Yuchi Huo, Rui Wang, Ruzhang Zheng, Hualin Xu, Hujun Bao, and Sung-Eui Yoon. 2020. Adaptive Incident Radiance Field Sampling and Reconstruction Using Deep Reinforcement Learning. ACM Transactions on Graphics (TOG) 39, 1 (2020), 1–17.
Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. 2017. Globally and locally consistent image completion. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1–14.
Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive mixtures of local experts. Neural Computation 3, 1 (1991), 79–87.
Wenzel Jakob. 2010. Mitsuba renderer. http://www.mitsuba-renderer.org.
Henrik Wann Jensen. 1995. Importance driven path tracing using the photon map. In Eurographics Workshop on Rendering Techniques. Springer, 326–335.
Henrik Wann Jensen. 1996. Global illumination using photon maps. In Rendering Techniques ’96. Springer, 21–30.
James T. Kajiya. 1986. The rendering equation. In Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques. 143–150.
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Claude Knaus and Matthias Zwicker. 2011. Progressive photon mapping: A probabilistic approach. ACM Transactions on Graphics (TOG) 30, 3 (2011), 25.
Eric Lafortune. 1996. Mathematical models and Monte Carlo algorithms for physically based rendering. Department of Computer Science, Faculty of Engineering, Katholieke Universiteit Leuven 20 (1996), 74–79.
Eric P. Lafortune and Yves D. Willems. 1993. Bi-directional path tracing. (1993).
Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. 2015. Deeply-supervised nets. In Artificial Intelligence and Statistics. 562–570.
Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, and Bryan Catanzaro. 2018. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV). 85–100.
Thomas Müller. 2019. “Practical Path Guiding” in Production. In ACM SIGGRAPH Courses: Path Guiding in Production, Chapter 10. ACM, New York, NY, USA, 18:35–18:48. https://doi.org/10.1145/3305366.3328091
Thomas Müller, Markus Gross, and Jan Novák. 2017. Practical path guiding for efficient light-transport simulation. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 91–100.
Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus Gross, and Jan Novák. 2019. Neural importance sampling. ACM Transactions on Graphics (TOG) 38, 5 (2019), 1–19.
Steven G. Parker, James Bigler, Andreas Dietrich, Heiko Friedrich, Jared Hoberock, David Luebke, David McAllister, Morgan McGuire, Keith Morley, Austin Robison, et al. 2010. OptiX: A general purpose ray tracing engine. ACM Transactions on Graphics (TOG) 29, 4 (2010), 1–13.
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 8024–8035.
Mark Pauly, Thomas Kollig, and Alexander Keller. 2000. Metropolis light transport for participating media. In Rendering Techniques 2000. Springer, 11–22.
Ingmar Peter and Georg Pietrek. 1998. Importance driven construction of photon maps. In Eurographics Workshop on Rendering Techniques. Springer, 269–280.
Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652–660.
Alexander Rath, Pascal Grittmann, Sebastian Herholz, Petr Vévoda, Philipp Slusallek, and Jaroslav Křivánek. 2020. Variance-Aware Path Guiding. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2020) 39, 4 (July 2020), 151:1–151:12. https://doi.org/10.1145/3386569.3392441
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
Lukas Ruppert, Sebastian Herholz, and Hendrik P. A. Lensch. 2020. Robust Fitting of Parallax-Aware Mixtures for Path Guiding. ACM Transactions on Graphics (TOG) (2020).
Peter Shirley, Bretton Wade, Philip M. Hubbard, David Zareski, Bruce Walter, and Donald P. Greenberg. 1995. Global illumination via density-estimation. In Rendering Techniques ’95. Springer, 219–230.
Turbo Squid. 2020. 3D Models, Plugins, Textures, and more at Turbo Squid.
CG Trader. 2020. CG Trader.
Eric Veach. 1997. Robust Monte Carlo Methods for Light Transport Simulation. Vol. 1610. Stanford University PhD thesis.
Eric Veach and Leonidas Guibas. 1995a. Bidirectional estimators for light transport. In Photorealistic Rendering Techniques. Springer, 145–167.
Eric Veach and Leonidas J. Guibas. 1995b. Optimally combining sampling techniques for Monte Carlo rendering. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques. 419–428.
Eric Veach and Leonidas J. Guibas. 1997. Metropolis light transport. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. 65–76.
Thijs Vogels, Fabrice Rousselle, Brian McWilliams, Gerhard Röthlin, Alex Harvill, David Adler, Mark Meyer, and Jan Novák. 2018. Denoising with kernel prediction and asymmetric loss functions. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1–15.
Jiří Vorba, Johannes Hanika, Sebastian Herholz, Thomas Müller, Jaroslav Křivánek, and Alexander Keller. 2019. Path Guiding in Production. In ACM SIGGRAPH Courses. ACM, New York, NY, USA, 18:1–18:77. https://doi.org/10.1145/3305366.3328091
Jiří Vorba, Ondřej Karlík, Martin Šik, Tobias Ritschel, and Jaroslav Křivánek. 2014. On-line learning of parametric mixture models for light transport simulation. ACM Transactions on Graphics (TOG) 33, 4 (2014), 1–11.
Saining Xie and Zhuowen Tu. 2015. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision. 1395–1403.
Zexiang Xu, Kalyan Sunkavalli, Sunil Hadap, and Ravi Ramamoorthi. 2018. Deep image-based relighting from optimal sparse samples. ACM Transactions on Graphics (TOG) 37, 4 (2018), 126.
Zili Yi, Qiang Tang, Shekoofeh Azizi, Daesik Jang, and Zhan Xu. 2020. Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7508–7517.
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. 2019. Free-form image inpainting with gated convolution. In Proceedings of the IEEE International Conference on Computer Vision. 4471–4480.
Shilin Zhu, Zexiang Xu, Henrik Wann Jensen, Hao Su, and Ravi Ramamoorthi. 2020. Deep Kernel Density Estimation for Photon Mapping. In Computer Graphics Forum, Vol. 39. Wiley Online Library.