Learning Patterns in Sample Distributions for Monte Carlo Variance Reduction
Oskar Elek    Manu M. Thomas    Angus Forbes
University of California, Santa Cruz, USA
Abstract
This paper investigates a novel a-posteriori variance reduction approach in Monte Carlo image synthesis. Unlike most established methods based on lateral filtering in the image space, our proposition is to produce the best possible estimate for each pixel separately, from all the samples drawn for it. To enable this, we systematically study the per-pixel sample distributions for diverse scene configurations. Noting that these are too complex to be characterized by standard statistical distributions (e.g. Gaussians), we identify patterns recurring in them and exploit those for training a variance-reduction model based on neural nets. As a result, we obtain numerically better estimates compared to simple averaging of samples. This method is compatible with existing image-space denoising methods, as the improved estimates of our model can be used for further processing. We conclude by discussing how the proposed model could in the future be extended for fully progressive rendering with a constant memory footprint and scene-sensitive output.
CCS Concepts • Computing methodologies → Ray tracing;
1. Introduction
Monte Carlo (MC) integration is the de-facto standard for physically based image synthesis, stemming mainly from its broad applicability [CJ16] and solid mathematical foundations [KW08]. On the downside, the stochastic nature of MC methods causes the resulting solution to contain a certain amount of error that fully vanishes only in the theoretical limit case of infinitely many evaluated samples. This error is the residual variance of the used MC estimator, often referred to as "Monte Carlo noise". This nomenclature is somewhat unfortunate, since "noise"—in the signal-processing context—typically refers to an undesired distortion of a signal caused by instruments or otherwise. In contrast, MC variance is an inherent phenomenon, inseparable from, and specific to, any MC estimator.

The above distinction goes beyond mere linguistic nitpicking. Numerous existing approaches [ZJL∗15] approach the task of MC variance reduction through the image-processing lens, that is, as an image denoising problem. While that can certainly produce satisfactory results due to the similarities of the two problems, the inherited assumptions are often not applicable in MC image synthesis (as partially pointed out by Boughida and Boubekeur [BB17], for instance). As a practical consequence of this mismatch, it is difficult to predict such methods' performance upfront, necessitating trial-and-error application and hand-tuning of parameters.

Consequently, the core mission of this paper is to critically examine the problem of variance reduction in Monte Carlo image synthesis. We build on the following paradigm (already adopted in this context by several methods [ZHD18, BB17, DWR10, DMB∗14]): every image sample produced by an MC estimator carries relevant information about the underlying distribution from which it is drawn. These distributions—specific to every point in the image—will be denoted as characteristic distributions. Our proposition is to identify the characteristic distributions in the context of MC image synthesis, and use that knowledge to produce better estimates for every image pixel individually, rather than simply averaging the samples generated for that pixel. In that sense, our method belongs to the a-posteriori category. This is in contrast to a-priori methods which attempt to derive a variance reduction strategy from theoretical principles; although we do ground our reasoning in theory as well, whenever possible.

Towards the above goal, we undertake three main steps.

• We first discuss the problem of variance reduction in MC image synthesis, pointing out the key sources of variance typical for this domain; then review the works tackling the problem (Section 2). Afterwards we critically examine the assumptions commonly used in MC denoising methods (Section 4) and propose how to alleviate them.
• Next, we present a qualitative study of characteristic sample distributions in several scenes and their configurations (Section 5). This is our first contribution, intended to identify patterns occurring in these distributions.

• Finally, we describe a prototype framework for learning these characteristic distributions (Section 6). The learned statistical description of the distributions then serves to provide better pixel estimates from MC samples collected during rendering. This framework, agnostic to the internal workings of the employed integrator, is our second contribution.

The benefit of the proposed method is that it is complementary to the existing image-space methods: our intent is to provide improved estimates for each image pixel, which can subsequently serve as an input to further processing. In contrast to many existing denoisers, we do not (explicitly or implicitly) assume any specific form or properties of the underlying sample distribution. The only assumption embedded in our method is that the characteristic distributions do contain identifiable patterns, which can be learned by a sufficiently expressive statistical model (based on neural nets, in our case).
2. Background
This section first defines the problem of Monte Carlo variance reduction and provides an overview of works that address it.
Realistic image synthesis.
The central problem of realistic image synthesis is governed by the rendering equation [Kaj86], which, for a given position x and direction ω, evaluates the (monochromatic) incident radiance L:

\[ L(\mathbf{x}, \omega) = L_e(\mathbf{x}') + \int_\Omega L(\mathbf{x}', \omega')\, \rho(\mathbf{x}', \omega, \omega')\, \mathrm{d}\omega' . \tag{1} \]

Here, x′ is the point on the nearest surface visible from x in the direction ω; L_e and ρ are the emitted radiance and the cosine-weighted bi-directional reflectance distribution function (BRDF) at that point, respectively. The key to solving Equation 1 is the hemispherical integral on the right, which again requires solving for the incident radiance at x′, leading to the well-known self-referential nature of the rendering equation which makes its solution so challenging.

Rendering an image for a given scene entails gathering the radiant energy passing through every pixel i. Since each pixel corresponds to a small solid angle Ω_i, this yields the irradiance E_i of that pixel (for a fixed camera position x_c):

\[ E_i = \int_{\Omega_i} L(\mathbf{x}_c, \omega)\, \mathrm{d}\omega . \tag{2} \]

The question this paper is concerned with is: assuming a particular Monte Carlo solver, what is the 'best' way to obtain the value of E_i for every pixel i in the rendered image? For this, we will employ standard MC path tracing with next event estimation [PJH16]; in general, we will limit our analysis to unbiased MC solvers to focus solely on variance analysis and reduction.
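The self-referential structure of Equation 1 can be made concrete with a deliberately reduced example. The sketch below is our own toy model, not the paper's renderer: it collapses the hemispherical integral into the scalar recurrence L = E + a·L with uniform emission E and albedo a, whose closed-form solution E/(1 − a) lets us verify the path-traced estimate.

```python
import numpy as np

rng = np.random.default_rng(6)
EMISSION, ALBEDO = 1.0, 0.6  # uniform toy 'scene': L = EMISSION + ALBEDO * L

def radiance_sample():
    # One path-traced sample of the toy recurrence. Russian roulette with
    # survival probability equal to ALBEDO keeps the path weight at 1, so
    # the sample is simply EMISSION times the number of visited vertices.
    L = 0.0
    while True:
        L += EMISSION
        if rng.uniform() >= ALBEDO:  # probabilistic path termination
            return L

estimate = np.mean([radiance_sample() for _ in range(200_000)])
print(estimate)  # close to EMISSION / (1 - ALBEDO) = 2.5
```

As in the full rendering equation, each sample is an unbiased but noisy evaluation of a recursive quantity; the per-sample distribution (a scaled geometric) is already skewed, foreshadowing the long-tailed per-pixel distributions studied later.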
Monte Carlo estimation.
Being a stochastic method, path tracing generates samples X_n, n ∈ 1…N, for each pixel (for simplicity we will assume the same N across the entire image, although our analysis does not depend on this assumption). These samples X_n are paths originating at the camera position x_c and uniformly sampled directions ω_i ∈ Ω_i, probabilistically traced through the scene. Together they form an estimator for Equation 2:

\[ \hat{E} = \frac{1}{N} \sum_{n=1}^{N} \hat{L}(X_n) = \frac{1}{N} \sum_{n=1}^{N} \frac{L(X_n)}{p(X_n)} , \tag{3} \]

where p(X_n) is the probability of that path; it is the joint probability of all stochastic decisions made by the path tracer. Therefore, the estimate for E is composed of N evaluations of the estimator L̂ for Equation 1. Since we assume L̂ to be an unbiased estimator, the estimator for the pixel value is unbiased as well, that is, E[Ê] = E.

The variance (the error) of the estimator for a particular E_i depends on two factors: the variance of the individual estimates L̂(X_n), and the variation of L itself across the region of the pixel i. If, for a moment, we assume the latter to be constant, then

\[ \mathrm{Var}[\hat{E}] = \mathrm{Var}\!\left[ \frac{1}{N} \sum_{n=1}^{N} \hat{L}(X_n) \right] = \frac{1}{N^2} \left( \sum_{n=1}^{N} \mathrm{Var}[\hat{L}(X_n)] + \sum_{n \neq n'} \mathrm{Cov}[\hat{L}(X_n), \hat{L}(X_{n'})] \right) \leq \frac{1}{N^2} \sum_{n=1}^{N} \mathrm{Var}[\hat{L}(X_n)] , \tag{4} \]

where the final inequality results from the fact that the sample covariance is zero for samples independently distributed across the pixel (as is the case in this analysis), or negative if correlated (low-discrepancy) sampling is used.
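Equations 3 and 4 can be verified numerically. In the following sketch, the lognormal stand-in for the per-sample estimates L̂(X_n) is an assumption made purely for illustration; the code averages N independent samples per pixel and checks the 1/N scaling of the estimator variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_radiance(n):
    # Long-tailed stand-in for the per-sample estimates L_hat(X_n).
    return rng.lognormal(mean=0.0, sigma=1.0, size=n)

def pixel_estimate(n):
    # Eq. 3: the pixel estimate is the average of n independent samples.
    return sample_radiance(n).mean()

# Eq. 4: for independent samples, Var[E_hat] scales as 1/N.
trials = 20_000
var_8 = np.var([pixel_estimate(8) for _ in range(trials)])
var_32 = np.var([pixel_estimate(32) for _ in range(trials)])
print(var_8 / var_32)  # close to 32/8 = 4
```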
Dropping the temporary assumption of constant L within the pixel i, the actual variance of Ê_i is bound to be slightly higher than Equation 4 indicates (and without caching the estimates to adaptively refine the pixel-space sampling, we cannot do much better than sample it uniformly, since no prior estimate of the intra-pixel distribution of L is available to us).

Reducing estimator variance.
Since the variance of Ê is the main efficiency bottleneck of MC rendering, diverse strategies exist to reduce it. For instance, according to zero-variance sampling theory [Hoo08, KW08], generating the paths X_n exactly according to the energy distribution in the scene (i.e., p(X) ∝ L(X), cf. Equation 3) would result in an optimal estimator L̂. On the other hand, any deviations from the optimal sampling probabilities for Equation 1 result in an accumulation of variance—including:

• using loose approximations for sampling the BRDF ρ,
• omitting the radiance term L when sampling the integrand,
• incorrect path termination strategies, and
• visibility fluctuation during next event estimation.

Numerous works in realistic image synthesis therefore strive to approach the theoretical zero-variance limit by developing advanced importance-sampling strategies, for instance using the path guiding approach [VKv∗14, MGN17, GBBE18]. Given our a-posteriori sample-based approach, we refer the reader to established literature [PJH16] for further information regarding this class of methods.
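The zero-variance limit is easy to demonstrate on a one-dimensional toy integral (our own example, unrelated to any scene in the paper): when the sampling density is exactly proportional to the integrand, every weighted sample equals the integral, and the estimator variance vanishes.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

f = lambda x: 3.0 * x**2  # toy integrand on [0, 1]; true integral = 1

# Uniform sampling, p(x) = 1: unbiased, but with residual variance.
x_uni = rng.uniform(size=N)
w_uni = f(x_uni)

# 'Zero-variance' sampling, p(x) = 3x^2, exactly proportional to f:
# inverse-CDF draw x = u^(1/3); every weight f(x)/p(x) is identically 1.
u = rng.uniform(low=1e-12, high=1.0, size=N)  # avoid u = 0
x_imp = u ** (1.0 / 3.0)
w_imp = f(x_imp) / (3.0 * x_imp**2)

print(w_uni.mean(), w_uni.var())  # unbiased, nonzero variance
print(w_imp.mean(), w_imp.var())  # unbiased, zero variance
```

Any mismatch between p and the integrand (as in the bullet list above) reintroduces variance in proportion to the deviation.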
Image filtering.
The tradition of image-space (lateral) filtering to denoise images has a long history in image processing. Perhaps the most influential signal-preserving method—bilateral filtering [TM98]—has been extended through non-local neighborhoods [BCM05], cross/joint filtering [XP05], up to comprehensive multi-faceted methods [RMZ13], to provide a few examples.

The sophistication of the recent methods nevertheless means that the simplicity of the original bilateral filter is lost, and stricter assumptions are needed (cf. Section 3); assumptions that have the potential to fail. The inherent noisiness of the feature descriptors, and their unknown correlation with the radiance distribution, are just some examples of the necessary guesswork. It is interesting that many of these methods can be cast as variants of statistical regression [BRM∗16].

Figure 1: Example distributions of 64 superimposed batches of 32 radiance samples L̂ each (top row) and the corresponding pixel irradiance estimate Ê histograms (bottom row) for 5 randomly chosen pixels (columns) of the Bathroom scene (left). The corresponding expected values are marked with dashed black lines. The apparent long-tailedness of the radiance samples (leading to numerous outliers above the expected value) is reflected in the slight skew of the aggregate irradiance values.

Sample-based methods.
Very insightful progress was made by Delbracio et al. [DMB∗14], who fuse samples across pixels based on the similarity of their per-pixel histograms.

Learning-based filtering.

Complex filtering operators can be learned from data—even if accurate references are not feasible [LMH∗18]—including learned descriptions of lateral filters [VRM∗18]. In addition to the offline operators above, a low-sample, temporally stable method has been demonstrated by Chaitanya et al. [CKS∗17]. We refer to the survey of Zwicker et al. [ZJL∗15] and the works of Bitterli et al. [BRM∗16, VRM∗18] for reviews of more recent papers.
3. Overview of Methodology
Our exposition has two parts. We first review the assumptions commonly made to tackle the MC variance reduction problem (Section 4). Finding them too optimistic, we conduct a study to empirically determine the properties of the characteristic sample distributions in different rendering configurations (Section 5). These constitute empirical priors for the next step.

The information contained in each sample is viewed through the perspective of the scene-specific priors to produce better pixel estimates than simply averaging the available samples. In the second part (Section 6) we present a learning-based approach to facilitate this, and show that it is indeed possible to significantly improve the pixel estimates on such grounds.
4. Properties of Sample Distributions
We aim to gain a better understanding of the characteristic per-pixel sample distributions in MC image synthesis, and set reasonable expectations about the patterns we might encounter in them. To this end, we start with a high-level discussion of the assumptions most commonly adopted by existing variance-reduction methods, and continue with the detailed study in Section 5.
Spatial uniformity.
One of the two most common assumptions is that the estimator error can be modeled by a spatially invariant additive 'noise' ν, i.e., Ê = E + ν_E. While suitable in image processing to model, e.g., thermal sensor noise, it is an unjustified assumption in MC image synthesis.

First, as a straightforward consequence of Equation 4, the estimator variance is a function of the incoming radiance and its estimator. As such, it will differ not only for each image pixel (for Ê), but even vary within the pixel itself (for L̂). This has also been pointed out by Boughida and Boubekeur [BB17] and Kalantari and Sen [KS13], among others.

Second, as a consequence of the previous property and the fact that light transport is a relative process (implying that the estimator variance scales with the value of the estimated quantity itself), modeling variance with an additive term is problematic in the case of ν_L, and by transition (although to a lesser extent), for ν_E as well. This is also supported by empirical data (cf. Section 5), mainly due to the characteristic skew and long-tailedness of characteristic sample distributions (see Figure 1).

Figure 2: The scenes used in our study: Bathroom (a), White Room (b), Night Room (c) and Measure One (d).

Gaussian distribution.
While the per-pixel characteristic distributions are known to be complex [DWR10, DMB∗14, BB17], their means—the estimator values Ê—are very frequently assumed to be normally distributed. This seems like a safe bet following from the central limit theorem (CLT), but is seldom verified.

The CLT holds for random variables which are independent and identically distributed with finite variance. However, the aggregated random variables L̂ are certainly not identically distributed (see the previous point) and might not be independent (when correlated sampling is used). Furthermore, in spite of the theoretically backed convergence rates (Equation 4), no bounds on the individual sample variances are available; in fact, under certain circumstances involving source singularities, the variance might be provably infinite [GKH∗].
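The fragility of the CLT argument at practical sample counts can be observed directly. In the following sketch (a lognormal stand-in for heavy-tailed radiance samples, chosen for illustration only), 32-sample pixel estimates remain strongly skewed, far from the Gaussian shape the assumption requires.

```python
import numpy as np

rng = np.random.default_rng(3)

def skewness(x):
    # Third standardized moment; approximately 0 for a Gaussian.
    x = np.asarray(x, dtype=np.float64)
    return float(((x - x.mean()) ** 3).mean() / x.std() ** 3)

# Heavy-tailed stand-in for per-pixel radiance samples.
samples = rng.lognormal(mean=0.0, sigma=1.5, size=(20_000, 32))
pixel_estimates = samples.mean(axis=1)  # 32-sample pixel estimates (Eq. 3)

print(skewness(pixel_estimates))  # clearly positive: not Gaussian at N = 32
```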
A common variance-reduction approach is the use of feature buffers—auxiliary information like surface positions, normals and albedos—to guide the filtering process (Section 2). Again adopted from image processing, this approach is purely heuristic in the context of MC image synthesis: it carries the hidden assumption that variance is correlated with these features, which is trivially not true for numerous physical phenomena (shadows, caustics, transparent objects, volumetric media). Some of these pathological cases have been demonstrated by Boughida and Boubekeur [BB17].

Again, this does not mean that a variance-reduction approach based on feature buffers cannot work; we simply argue that by relying on feature buffers, an unknown amount of variance is either left untreated, or actual image features get filtered out. In general, the complexity of the path space is virtually unbounded (due to its recursive nature), and thus it seems unlikely that any reasonable number of hand-picked features can be found to sufficiently describe that complexity.
Regularity of the radiance distribution.
Finally, even if no assumptions are made about the variance distribution, methods based either on lateral filtering or statistical regression assume some degree of smoothness in the radiance distribution. This assumption can hold true for large portions of the image, but is violated by visibility and illumination discontinuities—an issue that becomes more abundant as the complexity of the scene in question increases.

Concluding this discussion, we have argued that many of the assumptions frequently adopted for solving the MC variance reduction problem are fragile. We therefore propose to take a step back and—to complement the existing theoretical analysis [JMD15, BB17, ZHD18]—examine the characteristic distributions empirically.
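The first of these assumptions, spatially invariant additive noise, is also easy to probe with a toy experiment (ours, using an exponential stand-in distribution): the standard deviation of the pixel estimate grows in proportion to the pixel's brightness, i.e., the error behaves multiplicatively rather than additively.

```python
import numpy as np

rng = np.random.default_rng(2)

def pixel_estimate_std(brightness, n=32, trials=5000):
    # Standard deviation of the n-sample pixel estimate for a pixel whose
    # radiance samples follow an exponential distribution scaled by
    # `brightness` (a stand-in for the incoming radiance level).
    samples = brightness * rng.exponential(size=(trials, n))
    return float(samples.mean(axis=1).std())

s_dim, s_bright = pixel_estimate_std(1.0), pixel_estimate_std(10.0)
print(s_bright / s_dim)  # close to 10: the error scales with the signal
```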
5. Case Study
Next, we present a detailed study of characteristic per-pixel radiance distributions. We are primarily looking for recurring patterns in the distributions, as well as any other empirical properties that might guide the variance reduction model (Equation 5).
Instrumentation.
To facilitate the study, we developed a specialized web-based tool (which we plan to release as part of the publication) that allows for interaction with a specified rendered image, providing on-demand visualization and diagnostics of the characteristic distributions of the selected pixel, both in the linear and log-10 domains. These data were extracted during rendering using 2k samples per pixel (SPP) of the next-event-estimation path tracer in PBRT3 [PJH16], and have the form of per-pixel RGB histograms with 100 bins for both the linear and log variants.

To overcome the challenge of setting the pixel-specific histogram support (cf. Delbracio et al. [DMB∗14]), the log histograms use a fixed absolute support of [−5, 5] to complement the relative information in their linear counterparts. We count the null samples from occluded paths separately and do not include them in the visualized histograms, as they would cause discontinuous spikes without contributing much insight into the distributions' shapes.

Procedure.

We study the characteristic distributions (CHD) corresponding to different objects, configurations and phenomena characteristic to every scene. We examined each scene for interesting patterns, which we then present and discuss in the study, but only if they are sufficiently abundant (i.e., outlying distributions are not considered). In the following subsections we study four main scenes selected from the standard distribution of PBRT3, depicted in Figure 2.

Figure 3: Study for the Bathroom scene, analyzed in Section 5.1. The regions of interest are selected using different colors, which then mark the corresponding magnified insets as well as the characteristic sample distributions below; we use this scheme across the whole study. Panels: (a) linear histograms (adaptive support) and (b) log histograms (fixed support), for diffuse objects, glossy objects, noisy reflections, direct illumination, indirect illumination, and illumination gradients.
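The per-pixel diagnostics described above can be sketched in a few lines. The function below is a hypothetical reimplementation written for illustration: it builds the 100-bin linear (adaptive-support) and fixed-support log-10 histograms and counts null samples separately, as in our instrumentation.

```python
import numpy as np

def pixel_histograms(samples, bins=100, log_support=(-5.0, 5.0)):
    """Build the two per-pixel diagnostic histograms used in the study.

    Returns (linear_hist, linear_edges, log_hist, log_edges, null_count);
    null (zero-radiance) samples are counted separately, as in the text.
    """
    samples = np.asarray(samples, dtype=np.float64)
    nulls = samples == 0.0
    null_count = int(nulls.sum())
    nonzero = samples[~nulls]

    # Linear histogram: adaptive support spanning the observed sample range.
    lin_hist, lin_edges = np.histogram(nonzero, bins=bins)

    # Log-10 histogram: fixed absolute support, comparable across pixels.
    log_hist, log_edges = np.histogram(np.log10(nonzero), bins=bins,
                                       range=log_support)
    return lin_hist, lin_edges, log_hist, log_edges, null_count

# Example: a long-tailed sample set with ~40% occluded (null) paths.
rng = np.random.default_rng(4)
s = np.where(rng.uniform(size=2048) < 0.4, 0.0,
             rng.lognormal(0.0, 1.0, 2048))
lin, _, log10h, _, nulls = pixel_histograms(s)
```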
5.1. Bathroom Scene (Figure 2a)

We start with this simple interior scene illuminated by a single closed window; most of the illumination is therefore indirect. There is a good variation of diffuse and glossy materials. The transport in this scene is moderately difficult for a path tracer—after 2k SPP we get a reasonably converged image with only a few fireflies.

A selection of interesting characteristic distributions is depicted in Figure 3. We will now discuss these cases individually.

• Diffuse objects.
CHD tend to be quite symmetric in the log domain, close to Gaussian in fact. In the linear (radiance) domain, CHD exhibit long tails and strong skew, with the mode typically lower than the mean—resembling a Gamma-type distribution quite well.

Figure 4: Study for the White Room scene, analyzed in Section 5.2. Panels: linear histograms (adaptive support) and log histograms (fixed support), for direct illumination, diffuse objects, indirect illumination, colorful objects, and the fireplace.

• Glossy objects.
Shapes are surprisingly similar to those of diffuse objects, although in this case the different RGB channels' CHD are shifted w.r.t. each other (as a consequence of the underlying brown-wood texture). Interestingly, their spread differs across color channels, suggesting that their width is a function of the incoming radiance itself (as proposed in Section 4).

• Noisy reflections.
All of the pixels in the reflection, in spite of their high estimate variance, share virtually the same CHD shape. This is in full agreement with the assumptions of Delbracio et al. [DMB∗14] regarding viable pixel similarity measures.

• Direct illumination.

Very well resolved, low-noise CHD—in spite of their very significant spread and long tails.

• Indirect illumination.

Noisy CHD with very long tails (high kurtosis). A high number of null samples (around 60%) and a significant proportion of dark samples, showing inverted (negative) skew in the log domain (similar to the noisy reflection case).

• Illumination gradients.

A visibly bi-modal CHD marking the transition from lit to shadowed regions. The relative proportions of the modes correspond to the strength of the penumbra.
5.2. White Room Scene (Figure 2b)

A moderately complex scene illuminated by three windows, sourcing a lot of direct illumination but also having a high proportion of indirect illumination due to the highly reflective walls. The illumination has a red color cast, both due to the outside source as well as the dark-red carpet. It is predominantly populated with diffuse materials. This scene is easily handled by a path tracer, due to the large open windows and the low amount of geometric occlusion; this leads to low numbers of null samples as well.

Figure 4 shows the characteristic distributions.

• Direct illumination.
White surfaces illuminated both directly and indirectly. This yields complex multi-modal CHD, with an added color cast for the aforementioned reasons. Very long tails are present, which is a common trait across the whole scene.

Figure 5: Study for the Night Room scene, analyzed in Section 5.3. Panels: linear histograms (adaptive support) and log histograms (fixed support), for direct illumination, shadows, multiple bounces, and virtual sources.

• Diffuse objects.
Similar to the previous case, but with a higher proportion of indirect illumination. This causes less spiky CHD in the log domain and significantly longer tails in the primal domain. Significant noise is present as well, indicating a more complex path structure in this region.

• Indirect illumination.
In spite of the fact that these objects are spread across a large space, their CHD look strikingly alike, as they uniformly reflect the CHD across the room. We again observe very long tails in the CHD, leading to somewhat higher variance in these areas compared to their directly illuminated counterparts.

• Colorful objects.
These CHD follow the same patterns as already observed, but exhibit a strong color cast due to the underlying diffuse texture. Note also that while the color cast shifts the distributions in the log domain, it basically does not change their shape—this makes the case that the material albedo has no correlation with variance, and it therefore makes for a poor feature descriptor for denoising purposes.

• The Fireplace.
This fireplace appears boring, but after closer examination in the log domain it displays a wild CHD which, suspiciously, looks like fire. We are currently trying to interpret this behavior.
5.3. Night Room Scene (Figure 2c)

The same scene as in Section 5.2, but illuminated by two interior lights instead. Although mostly indirectly illuminated, the lamps' translucent shades act as secondary 'virtual' sources of direct light, which then shows in the CHD. This is a moderately difficult scene for a path tracer, due to the small and enclosed sources, leading to increased variance, especially in the darker regions—consequently, the proportion of null samples is high, often exceeding 50%.

Figure 5 shows the characteristic distributions.

• Direct illumination.
These areas have bi-modal CHD composed of strong direct peaks (with a color shift corresponding to the nearby illuminant) and faint indirect sample groups, again with long tails. Further from the sources, the proportion of indirect samples grows, as expected.

• Direct shadows.
In the linear domain, the CHD are very similar to the previous case, but missing the direct components. The log domain shows stronger skew towards the dark end, however, due to the higher complexity of the path structure. Consequently, we observe higher estimator variance in these regions.

Figure 6: Study for the Measure One scene, analyzed in Section 5.4. Note that in this figure, some of the histograms corresponding to the same regions (and color marks) are drawn from two different neighboring pixels (yellow) or the two respective images (blue, magenta). Panels: linear histograms (adaptive support) and log histograms (fixed support), for indirect illumination, a firefly and its neighbor, and two noisy regions (2-bounce and 5-bounce).

• Multi-bounce transport.
The CHD contain complex multi-modal patterns, but their shape is remarkably stable across channels, albeit with a color cast caused by the combination of reddish illuminants and the green diffuse color of the objects themselves.

• Virtual sources.
These are translucent lamp shades, which retain much of the direct component (compare to the first, directly illuminated case), but in addition contain a very long, faint tail caused by the multi-scattered diffuse transmission. In spite of this, the variance in these areas is low, as the path tracer eventually does find the source inside for a sufficient proportion of the paths.

5.4. Measure One Scene (Figure 2d)
A geometrically complex scene with neutrally colored mixture materials. It is illuminated mostly indirectly, with area sources ranging from large background white panels down to small yellow sources. Interestingly, even though this scene is geometrically very symmetric, the CHD are not in all cases, since the illumination differs somewhat for each half. Though being a moderately difficult scene for a path tracer (with low numbers of null samples, typically under 20%), the small yellow sources are difficult to find, creating patches of high variance around the image.

Due to the geometric complexity of this scene, we primarily use it for a special purpose: to study CHD with respect to statistical outliers ('fireflies') and with respect to different orders of light transport (i.e., how many bounces of indirect illumination are simulated by the path tracer).

Figure 6 shows the characteristic distributions.

• Indirect illumination.
The complexity of this scene is reflected in its CHD as well. In this example, the linear histogram has such a wide support that most of its mass vanishes in a single spike; the log version, however, reveals a complicated multi-modal structure (although only a few modes are dominant, leaving hope that such shapes can still be learned).

• Fireflies.
Here we compare two pixels: a bright firefly and its immediate neighbor on top. In spite of the stark difference in their values, their distributions are almost identical (which is better visible in the log histograms, as the linear variants differ in support). This is a behavior that we observe universally in all scenes, supporting the thesis of Delbracio et al. [DMB∗14] and justifying the reweighing strategies of Jung et al. [JMD15] and Zirr et al. [ZHD18]. One of the aims of our model is to learn such reweighing as well, but from data alone.

• Different transport path lengths.

Here we compare the scene under identical configurations, with only one change: using two and five bounces of indirect transport, respectively. As expected, that influences the amount of transported energy, making the order-2 image visibly darker than the order-5 one.

The additional insight of our study is that the different path lengths also influence the shape of the CHD: in accordance with the previous finding that higher orders of indirect illumination result in more complex path structures, the resulting CHD for the order-5 images are indeed more spread out, with wider support and longer, noisy tails. The natural result is that the order-5 image ends up having higher variance, despite taking roughly twice as long to render. This finding agrees with theory (such as Equation 4), and points to the importance of discussing the complexity of the simulated transport in any MC variance reduction work.
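For intuition, a fixed, hand-crafted reweighing baseline can be sketched as follows. The winsorized mean below is our own illustrative choice, precisely the kind of static heuristic that a learned, data-driven reweighing aims to replace.

```python
import numpy as np

rng = np.random.default_rng(7)

def winsorized_mean(samples, pct=90.0):
    # Clamp the brightest outliers ('fireflies') to the pct-th percentile
    # before averaging. A fixed heuristic: the percentile is hand-picked,
    # unlike a reweighing learned from the sample distributions themselves.
    samples = np.asarray(samples)
    cap = np.percentile(samples, pct)
    return float(np.minimum(samples, cap).mean())

# 31 well-behaved samples plus a single firefly of radiance 500.
s = np.concatenate([rng.exponential(size=31), [500.0]])
print(s.mean())            # dominated by the firefly
print(winsorized_mean(s))  # close to the mean of the ordinary samples
```

Note that such clamping trades variance for bias, which is one more reason to prefer an estimator that adapts its treatment of outliers to the observed distribution.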
We finish the study by drawing several general conclusions about the observed distributions.

Most importantly, recurring patterns do exist in characteristic sample distributions; despite being complex, CHD are certainly not random. Moreover, CHD are not even discontinuous between neighboring pixels—if a radiance discontinuity lies in a given pixel, its CHD will end up a blend between the two regions of the function separated by that discontinuity (as indirectly implied in the work of Delbracio et al. [DMB∗14]).

Second, the shapes of the characteristic distributions are scene-specific: they depend on the scene configuration holistically, rather than on a few key components or objects. This suggests a clear design path: a variance reduction model aiming to take advantage of the CHD properties needs to be trained for each scene specifically (or even better, be able to classify the scene in a top-down manner and be conditioned by that classification). In the following section we present a prototype of such a model, based on fully connected neural nets.
6. Variance Reduction Model
The approach we propose—complementary to the existing lateral denoising approaches (Section 2)—is to improve the estimates for E_i (Equation 2) by training an a-posteriori model M_Θ defined by a set of parameters Θ, according to the following criterion:

\[ \arg\min_{\Theta} \sum_i \lambda\!\left( M_{\Theta}\!\left[ \{\hat{L}(X)\}_i \right],\, E_i \right) , \tag{5} \]

where λ is a suitable loss function, in our case the squared error between the model's response and the ground-truth irradiance for each pixel. The assumption embedded in this strategy is that the per-pixel sample sets {L̂(X)}_i generated by the laws of light transport and the MC process that simulates them (path tracing, in our case) contain systematic patterns that the model M can learn and exploit—and indeed, our study in Section 5 has identified such patterns. A reasonable expectation from such a model is that, in the absence of any identifiable patterns (or when they are too complex), it should gracefully reduce to the baseline solution, i.e., the sample average as per Equation 3. At its core, this is a Bayesian approach: start with prior knowledge about the involved distributions and incorporate new data to create a posterior estimate.

Figure 7: Variance reduction results for the Bathroom and Measure One scenes. We compare the baseline estimate from 32 SPP and the result of our model for the same sample set, against the ground-truth images rendered with 2k SPP.

Model implementation.
Our prototype model is a multi-layer perceptron network with a width-32 input layer (for the radiance samples $\hat{L}$), three 32-neuron densely connected layers, and a single output unit representing the estimate. In accordance with the findings of our study (Section 5), we trained a separate network for each tested scene.

The training process works with the sample data obtained from rendering the ground-truth images using 2k samples per pixel. For each training run, we uniformly selected 6400 pixels and divided their radiance samples into batches of 32 samples each; we then separated out 10% of the batches as validation data. Using 30k training iterations with a learning rate of 0.005, the training took 20 minutes in TensorFlow on an NVIDIA RTX 2080 Ti GPU. The runtime of this model on an 800×800 RGB image with 3×32 SPP is under 1 minute in the unoptimized development mode.
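As a rough sketch of the setup described above: the layer widths, per-pixel batch size of 32 samples, and the learning rate of 0.005 follow the text, while the ReLU activation, plain SGD, the sorted inputs, and the synthetic stand-in data are our own assumptions (the actual model was trained in TensorFlow on real per-pixel radiance samples from the renderer).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data: each "pixel" has a ground-truth
# value E and 32 noisy, heavy-tailed (firefly-like) radiance samples around it.
n_pix = 4096
E = rng.uniform(0.1, 1.0, size=(n_pix, 1))
sigma = 0.5
L = E * rng.lognormal(-sigma**2 / 2, sigma, size=(n_pix, 32))  # unbiased samples

# MLP per the paper: width-32 input, three 32-neuron dense layers, one output.
sizes = [32, 32, 32, 32, 1]
W = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)) for m, n in zip(sizes, sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    acts = [x]
    for Wi, bi in zip(W[:-1], b[:-1]):
        acts.append(np.maximum(acts[-1] @ Wi + bi, 0.0))   # ReLU hidden layers
    acts.append(acts[-1] @ W[-1] + b[-1])                  # linear output estimate
    return acts

def sgd_step(x, y, lr=0.005):                              # lr from the paper
    acts = forward(x)
    g = 2.0 * (acts[-1] - y) / len(x)                      # d(squared error)/d(out)
    for i in reversed(range(len(W))):
        gW, gb = acts[i].T @ g, g.sum(axis=0)
        if i > 0:
            g = (g @ W[i].T) * (acts[i] > 0)               # back through ReLU
        W[i] -= lr * gW
        b[i] -= lr * gb

# Sorting each pixel's samples gives the net a permutation-invariant input
# (an assumption on our part, not stated in the paper).
X = np.sort(L, axis=1)
mse_before = float(np.mean((forward(X)[-1] - E) ** 2))
for _ in range(2000):
    idx = rng.integers(0, n_pix, size=64)
    sgd_step(X[idx], E[idx])
mse_after = float(np.mean((forward(X)[-1] - E) ** 2))
baseline_mse = float(np.mean((L.mean(axis=1, keepdims=True) - E) ** 2))
print(baseline_mse, mse_before, mse_after)
```

The squared-error loss realizes the criterion $\lambda$ of Equation 5; per-scene training corresponds to refitting `W` and `b` on samples from that scene only.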
Results.
The ability of our model to improve upon the baseline estimates is visually demonstrated in Figure 7 for the BATHROOM and MEASUREONE scenes. These results are promising, as our model yields smoother images even without exchanging any information between neighboring pixels. In particular, a majority of the fireflies in the baseline is suppressed, a behavior similar to the results obtained by Zirr et al. [ZHD18]. This has the benefit of regularizing the image to yield a more balanced variance distribution, which means that lateral denoising approaches would have an easier time using our model's output as the starting point.

Figure 8: The loss-function values obtained during the 30k training iterations for the BATHROOM scene (left) and the MEASUREONE scene (right). The loss for the training and validation data is plotted separately, against the baseline error.

To evaluate our model numerically, we first examined the quality of the results during training; this is shown in Figure 8 for the two result scenes. We see reasonably stable behavior for both scenes, although surprisingly the model does not improve markedly after the initial iteration batch. This indicates space for future optimization of the training time, so that the model could potentially learn on the fly, that is, during the rendering process itself, adapting to the current scene and rendering configuration.

The numerical counterpart to the evaluation in Figure 7 is shown in Figure 9. Here we compare the baseline results (sample averages) and our model to the ground-truth solution. Although the majority of the estimates is improved by our model, some exceptions remain. One possible way to address these would be to train the network to output a certainty for each individual estimate; this is feasible because we know the deviations of the network's estimates from ground truth during training. The certainty could then serve as a mixing weight for the model's output, in combination with the baseline as a fallback.

Figure 9: Numerical evaluation for the BATHROOM scene (top) and the MEASUREONE scene (bottom), shown for 25 randomly selected pixels from the respective images.
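The certainty-weighted fallback described above could take the following shape (a sketch under our own assumptions; `blended_estimate` and its inputs are hypothetical, not the authors' code, and `certainty` would be a second network output trained against the observed deviations):

```python
import numpy as np

def blended_estimate(model_out, certainty, samples):
    """Mix the model's estimate with the plain sample average (Equation 3),
    using a per-pixel certainty in [0, 1] as the mixing weight."""
    baseline = samples.mean(axis=-1)          # fallback: simple average
    return certainty * model_out + (1.0 - certainty) * baseline

# Hypothetical per-pixel values: with zero certainty the estimator
# gracefully reduces to the baseline average.
samples = np.array([[0.2, 0.4, 0.6, 0.8]])
print(blended_estimate(np.array([0.45]), np.array([0.0]), samples))  # ~[0.5]
```

With `certainty` near 1 the model's output dominates; with `certainty` near 0 the scheme degrades gracefully to the sample mean, matching the Bayesian-fallback expectation stated in Section 6.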
7. Discussion

Study extensions.
Our study identifies a number of properties regarding the characteristic sample distributions, and puts into question some of the commonly used assumptions about the MC variance-reduction problem. The aim of this study was to gain an informed understanding of the problem, as well as to incite an open discussion about these issues.

A future prospect in improving the study involves both its breadth and its depth. More phenomena should be investigated (such as many-light configurations, caustics, and participating media, to name a few), as well as more advanced integrators. In particular, we wish to take a closer look at path-guiding methods, as these are becoming state of the art in production and have the potential to subsume the majority of path-based methods. More advanced visualization strategies might also be explored to extract further insights about the rendering process.
Model improvements.
Based on the obtained results, the prototype implementation of our proposed model already shows promise. For the future we envision several directions in which we plan to improve upon it.

First, several quantitative improvements can be investigated. The training and estimation timings are currently obtained from an unoptimized development implementation; we believe speedups of at least an order of magnitude are feasible. Optimizing the network parameters would orthogonally improve the accuracy of the current estimates.

We also plan to investigate different network architectures, in particular recurrent NNs, which have the potential to incorporate additional samples into the estimate as they are evaluated by the integrator; that would allow the model to run in a progressive rendering setting, with interactive refinements of the rendered image.

While naively extending the network to include neighboring samples would not be difficult, our aim was to show the possibility of reducing variance without this. A compromise solution could operate in the spirit of Delbracio et al. [DMB∗14] and Boughida et al. [BB17]: the individual pixel estimators could share only statistical information about the characteristic distributions. Furthermore, a learning-based approach has the promise of incorporating high-level information about structures probabilistically detected in the image, such as edges, illumination gradients, and even textures.
A-priori versus a-posteriori methods.
The presented approach tackles the problem of MC variance reduction in the a-posteriori manner, trying to make sense of the collected estimator samples without making overly strong assumptions about their distribution. Given that we have full information about the rendering process, bottom-up a-priori approaches might seem favorable at first. However, since we cannot theoretically reason about some of the parameters of the rendering process (particularly the scene), these approaches share the general 'chicken-and-egg' problem of MC methods: the optimal solution to the problem involves knowing the solution upfront (another example is importance sampling).

In MC variance reduction, the problem of noisy pixel estimates is frequently deferred to the problem of noisy meta-estimates of higher-order statistical quantities, or to idealized assumptions about their behavior. We therefore argue that a certain amount of 'data-drivenness' is necessary to break this loop and inform the denoiser about the properties of the rendered scene. In the future, we hope for the bottom-up methods (e.g., [JMD15, ZHD18]) to be unified with learning-based approaches such as ours; a framework with the potential for this is arguably yet to be found, but we believe that it should have a Bayesian nature.
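The importance-sampling instance of this 'chicken-and-egg' problem can be made concrete: sampling proportionally to the integrand gives a zero-variance estimator, but normalizing that sampling density requires the very integral being estimated. A minimal sketch with a toy integrand of our choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: x * x          # toy integrand on [0, 1]; true integral I = 1/3
n = 10000

# Uniform sampling: unbiased, but with noticeable variance.
x = rng.random(n)
est_uniform = f(x).mean()

# "Optimal" importance sampling: p(x) = f(x) / I = 3x^2, so every sample
# contributes exactly f(x) / p(x) = I, i.e., zero variance. But writing down
# p(x) already required knowing I, the quantity being estimated.
u = 1.0 - rng.random(n)                       # uniform on (0, 1], avoids x = 0
x = np.cbrt(u)                                # inverse-CDF sampling from 3x^2
est_optimal = (f(x) / (3.0 * x * x)).mean()   # each term equals exactly 1/3

print(est_uniform, est_optimal)
```

This is the loop that data-driven priors are meant to break: an a-posteriori model can approximate the missing knowledge about the solution from the samples themselves.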
8. Conclusion
Our approach is simple enough to be explained in one sentence: we examine and learn the shapes of statistical sample distributions in Monte Carlo rendering, and apply that prior knowledge to produce improved estimates from finite sample sets.

The case study we conducted confirms the expectation that per-pixel sample distributions typically have complex shapes that could hardly be represented by standard statistical distributions. At the same time, we have identified recurring patterns that can be exploited by a learning-based model, along with the key finding that the characteristic per-pixel distributions holistically depend on the scene and the particular integrator used for the rendering.

The variance-reduction model we present shows promising results, in spite of the fact that it has no information about the surrounding pixels. Our plan is to continue our investigation and develop a truly robust model that is scene-sensitive, adapts to the particular rendering configuration, and runs progressively in lockstep with the rendering process.

References

[BB17] Boughida M., Boubekeur T.: Bayesian collaborative denoising for Monte Carlo rendering. Computer Graphics Forum (Proc. EGSR 2017) 36, 4 (2017), 137–153.

[BCM05] Buades A., Coll B., Morel J. M.: A review of image denoising algorithms, with a new one. SIAM Multiscale Model. Simul. 4, 2 (2005), 490–530.

[BRM∗16] Bitterli B., Rousselle F., Moon B., Iglesias-Guitián J. A., Adler D., Mitchell K., Jarosz W., Novák J.: Nonlinearly weighted first-order regression for denoising Monte Carlo renderings. Computer Graphics Forum (Proc. EGSR) 35, 4 (2016), 107–117.

[BVM∗17] Bako S., Vogels T., McWilliams B., Meyer M., Novák J., Harvill A., Sen P., DeRose T., Rousselle F.: Kernel-predicting convolutional networks for denoising Monte Carlo renderings. ACM Transactions on Graphics (Proc. of SIGGRAPH) 36, 4 (2017).

[CJ16] Christensen P. H., Jarosz W.: The path to path-traced movies. Found. Trends. Comput. Graph. Vis. 10, 2 (2016), 103–175.

[CKS∗17] Chaitanya C. R. A., Kaplanyan A. S., Schied C., Salvi M., Lefohn A., Nowrouzezahrai D., Aila T.: Interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder. ACM Trans. Graph. 36, 4 (2017), 98:1–98:12.

[DMB∗14] Delbracio M., Musé P., Buades A., Chauvier J., Phelps N., Morel J.-M.: Boosting Monte Carlo rendering by ray histogram fusion. ACM Trans. Graph. 33 (2014), 8.

[DWR10] DeCoro C., Weyrich T., Rusinkiewicz S.: Density-based outlier rejection in Monte Carlo rendering. Computer Graphics Forum (Proc. Pacific Graphics) 29 (2010), 7.

[GBBE18] Guo J., Bauszat P., Bikker J., Eisemann E.: Primary sample space path guiding. In Proc. of Eurographics Symposium on Rendering – EI&I (2018), pp. 73–82.

[GKH∗13] Georgiev I., Křivánek J., Hachisuka T., Nowrouzezahrai D., Jarosz W.: Joint importance sampling of low-order volumetric scattering. ACM Transactions on Graphics 32 (2013), 6.

[Hoo08] Hoogenboom J. E.: Zero-variance Monte Carlo schemes revisited. Nuclear Science and Engineering 160, 1 (2008), 1–22.

[JMD15] Jung J. W., Meyer G., DeLong R.: Robust statistical pixel estimation. Computer Graphics Forum 34, 2 (2015), 585–596.

[Kaj86] Kajiya J. T.: The rendering equation. SIGGRAPH Comput. Graph. 20 (1986), 143–150.

[KBS15] Kalantari N. K., Bako S., Sen P.: A machine learning approach for filtering Monte Carlo noise. ACM Trans. Graph. 34, 4 (2015).

[KS13] Kalantari N. K., Sen P.: Removing the noise in Monte Carlo rendering with general image denoising algorithms. Computer Graphics Forum (Proc. Eurographics) 32 (2013).

[KW08] Kalos M. H., Whitlock P. A.: Monte Carlo methods. Wiley, 2008.

[LMH∗18] Lehtinen J., Munkberg J., Hasselgren J., Laine S., Karras T., Aittala M., Aila T.: Noise2Noise: learning image restoration without clean data. In Proc. of ICML (2018), pp. 2965–2974.

[MGN17] Müller T., Gross M., Novák J.: Practical path guiding for efficient light-transport simulation. Computer Graphics Forum 36, 4 (2017), 91–100.

[PJH16] Pharr M., Jakob W., Humphreys G.: Physically based rendering: from theory to implementation, 3rd ed. Morgan Kaufmann, 2016.

[RMZ13] Rousselle F., Manzi M., Zwicker M.: Robust denoising using feature and color information. Computer Graphics Forum (2013).

[SKW∗17] Schied C., Kaplanyan A., Wyman C., Patney A., Chaitanya C. R. A., Burgess J., Liu S., Dachsbacher C., Lefohn A., Salvi M.: Spatiotemporal variance-guided filtering: real-time reconstruction for path-traced global illumination. In Proc. of High Performance Graphics (2017), pp. 2:1–2:12.

[TM98] Tomasi C., Manduchi R.: Bilateral filtering for gray and color images. In Proc. of ICCV (1998).

[VKŠ∗14] Vorba J., Karlík O., Šik M., Ritschel T., Křivánek J.: On-line learning of parametric mixture models for light transport simulation. ACM Trans. Graph. 33, 4 (2014), 101:1–101:11.

[VRM∗18] Vogels T., Rousselle F., McWilliams B., Röthlin G., Harvill A., Adler D., Meyer M., Novák J.: Denoising with kernel prediction and asymmetric loss functions. ACM Transactions on Graphics (Proc. SIGGRAPH 2018) 37, 4 (2018), 124:1–124:15.

[XP05] Xu R., Pattanaik S. N.: A novel Monte Carlo noise reduction operator. IEEE Computer Graphics and Applications 25, 2 (2005), 31–35.

[ZHD18] Zirr T., Hanika J., Dachsbacher C.: Re-weighting firefly samples for improved finite-sample Monte Carlo estimates. Computer Graphics Forum (Proc. EGSR) 37, 6 (2018), 410–421.

[ZJL∗15] Zwicker M., Jarosz W., Lehtinen J., Moon B., Ramamoorthi R., Rousselle F., Sen P., Soler C., Yoon S.-E.: Recent advances in adaptive sampling and reconstruction for Monte Carlo rendering. Computer Graphics Forum (Proc. of EG State of the Art Reports) 34, 2 (2015), 667–681.