Perceptual Error Optimization for Monte Carlo Rendering

VASSILLEN CHIZHOV, MIA Group, Saarland University and Max-Planck-Institut für Informatik, Germany
ILIYAN GEORGIEV, Autodesk, United Kingdom
KAROL MYSZKOWSKI, Max-Planck-Institut für Informatik, Germany
GURPRIT SINGH, Max-Planck-Institut für Informatik, Germany
Fig. 1. Using our perception-motivated model, we analyze different ways to distribute screen-space error in Monte Carlo rendering, showing one such approach here (columns: Average, Ours w.r.t. surrogate, Ours w.r.t. GT). We start with a noisy image and some auxiliary data (surface normals and albedo). The average image (leftmost) is rendered with 16 samples per pixel (spp). Our method uses a varying per-pixel sample count drawn from a small set of spp values. (a) Our approach exhibits a visually pleasing screen-space error distribution (see Section 4.1 for details) while using fewer samples per pixel. Our energy model uses a guiding image, or surrogate, which can be created easily from the noisy estimate. We also show our method using the ground-truth (GT) image as the guide. (b) To demonstrate the impact of error distribution on denoising, we apply Intel's Open Image Denoiser [int 2018] to the images in (a): the white-noise distribution results in conspicuous artifacts, which are avoided when the error has a blue-noise distribution (as in our results). (c) The corresponding tiled power spectra (32 × 32 pixels per tile) of the error images from (a), confirming that our approach distributes error with a locally blue-noise spectrum.

Realistic image synthesis involves computing high-dimensional light-transport integrals which in practice are numerically estimated using Monte Carlo integration. The error of this estimation manifests itself in the image as visually displeasing aliasing or noise. To ameliorate this, we develop a theoretical framework for optimizing the screen-space error distribution. Our model is flexible and works for arbitrary target error power spectra. We focus on perceptual error optimization by leveraging models of the human visual system's (HVS) point spread function (PSF) from the halftoning literature. This results in a specific optimization problem whose solution distributes the error as visually pleasing blue noise in image space. We develop a set of algorithms that provide a trade-off between quality and speed, showing substantial improvements over the prior state of the art. We perform evaluations using both quantitative and perceptual error metrics to support our analysis, and provide extensive supplemental material to help evaluate the perceptual improvements achieved by our methods.

Additional Key Words and Phrases: Monte Carlo, rendering, sampling, perceptual error, blue noise
Authors' addresses: Vassillen Chizhov, MIA Group, Saarland University and Max-Planck-Institut für Informatik, Saarbrücken, Germany; Iliyan Georgiev, Autodesk, United Kingdom; Karol Myszkowski, Max-Planck-Institut für Informatik, Saarbrücken, Germany; Gurprit Singh, Max-Planck-Institut für Informatik, Saarbrücken, Germany.
1 INTRODUCTION

Using (quasi) Monte Carlo (MC) sampling for rendering produces approximation error. This error can result in visually displeasing artifacts in the image, unless care is taken to control the correlation of the samples used to obtain the individual pixel estimates. A standard approach is to decorrelate these estimates by randomly assigning sample sets to pixels, turning potential structured artifacts into white noise.

In digital halftoning, the error induced by quantizing continuous-tone images has been studied extensively. These studies have shown that a blue-noise distribution of the quantization error is perceptually optimal [Ulichney 1987] and can achieve substantially higher fidelity than a white-noise distribution. Recent works have proposed ways to transfer these ideas empirically to image synthesis [Georgiev and Fajardo 2016; Heitz and Belcour 2019; Heitz et al. 2019]. This is achieved by carefully re-introducing negative pixel correlation, exploiting the local smoothness present in typical images.

We propose a theoretical formulation of perceptual error for rendering which unifies previous methods in a common theoretical framework and justifies the need for blue-noise error distribution. We start by extending the comparatively simpler problem of digital halftoning [Sullivan et al. 1991; Analoui and Allebach 1992; Pappas and Neuhoff 1999] to the substantially more complex one of MC image synthesis. Our formulation bridges the gap between halftoning and rendering by interpreting the error-distribution problem as a general extension of non-uniform multitone energy-minimization halftoning, where the MC estimates are taken to be the admissible quantization levels in the halftoning setting. Through this insight, virtually any halftoning method can be adapted to work with MC rendering. We demonstrate this by adapting representative methods from the three main classes of halftoning algorithms: dither-matrix halftoning, error-diffusion halftoning, and iterative energy-minimization halftoning.

Previous methods [Georgiev and Fajardo 2016; Heitz and Belcour 2019; Heitz et al. 2019] work by distributing MC error via target error masks which are produced by minimizing an energy involving a pre-chosen kernel. The kernel, typically a Gaussian, can be interpreted as an approximation to the human visual system's (HVS) point spread function (PSF) [Daly 1987; Pappas and Neuhoff 1999]. We revisit the kernel-based perceptual model from halftoning [Sullivan et al. 1991; Analoui and Allebach 1992; Pappas and Neuhoff 1999] and reformulate it for MC rendering. The resulting energy can be optimized for MC error distribution without any explicit mask. By providing a formal analysis, we make the hidden assumptions of previous methods explicit and quantify their limitations. Notably, all previous methods can be seen as variants of dither-matrix halftoning introducing varying degrees of bias. In summary:

● Our formulation unifies both a-priori and a-posteriori methods. This removes the necessity of using empirically motivated energies, makes all assumptions explicit, and prescribes general guidelines for devising new a-priori and a-posteriori methods in a principled manner.

● All a-posteriori methods in our framework can be seen as methods for decreasing the bias of a rendered biased image. This provides a continuous spectrum between fully unbiased estimates (white-noise error distribution) and highly biased ones (high-quality blue-noise distribution). This insight makes the limitations of a-posteriori methods quantifiable.

● We demonstrate our theoretical results through several experiments showing that we can substantially improve the error distribution in rendering compared to all previous state-of-the-art methods (Section 7) which aim to redistribute the error as blue noise. Notably, this is achieved while using the exact same rendering data as previous methods. These results include, but are not limited to: suppressing fireflies, removing visually objectionable hallucinated artifacts from a denoiser, producing images closer to the ground truth after convolution, and achieving a high-quality blue-noise error distribution in animations even in the presence of large temporal discontinuities.

Following an overview of relevant prior work (Section 2), we present necessary background and motivation in Section 3.1. We then devise our perceptual error formulation for image synthesis (Section 3.2) and present our algorithms for solving the proposed optimization problem (Section 4). We discuss extensions to our base model in Section 6, and demonstrate the benefits on several realistic scenes in Section 7. We conclude with an outline of promising directions for future research (Section 8).
2 RELATED WORK

Our work draws from diverse research in imaging, rendering, and perception. We begin our exposition with a structured survey of relevant prior work in these fields.
2.1 Halftoning

Digital halftoning [Stoffel and Moreland 1981] involves creating the illusion of continuous-tone images through the arrangement of binary elements; various algorithms target different display devices. Bayer [1973] developed the widely used dispersed-dot ordered dither patterns. Allebach and Liu [1976] introduced the use of randomness in clustered-dot ordered dithering. Ulichney [1987] introduced blue-noise patterns that yield better perceptual quality. Mitsa and Parker [1991] mimic those patterns to produce ordered dither arrays with given spatial frequency-domain characteristics. In 1993, Ulichney proposed the void-and-cluster algorithm, which uses a Gaussian kernel to create dither masks with an isotropic blue-noise distribution. Sullivan et al. [1991] capitalized on this kernel-based approach and developed an energy function in the Fourier domain to obtain visually optimal halftone patterns. Analoui and Allebach [1992] designed a spatial-domain interpretation of Sullivan et al.'s model, in order to develop a practical algorithm for blue-noise dithering. Their approach was later refined by Pappas and Neuhoff [1999]. Motivated by Ulichney's [1993] approach, various structure-aware halftoning algorithms were developed in graphics [Ostromoukhov 2001; Pang et al. 2008; Chang et al. 2009]. In this work, we leverage the kernel-based model [Analoui and Allebach 1992; Pappas and Neuhoff 1999] in the context of Monte Carlo rendering [Kajiya 1986].
2.2 Error assessment in rendering

In rendering, the light-transport integral estimation error (ℒ₂, MSE, or RMSE) is usually reported as a single value evaluated over the whole image or a set of images. Since these metrics often do not accurately reflect the visual quality, equal-time visual comparisons are also commonly reported. Various theoretical frameworks have been developed in the spatial [Niederreiter 1992; Kuipers and Niederreiter 1974] and Fourier [Singh et al. 2019] domains to understand the error reported through these metrics. Numerous variance-reduction algorithms like multiple importance sampling [Veach 1998] and control variates [Loh 1995; Glasserman 2004] improve the error convergence of light-transport renderings. Recently, Celarek et al. [2019] proposed the error spectrum ensemble (ESE), a tool for measuring the distribution of error over frequencies. ESE reveals correlation between pixels and can be used to detect outliers, which offset the amount of error substantially.

Many denoising methods [Zwicker et al. 2015] employ these metrics to optimize over noisy images to get noise-free renderings. Even if the most advanced denoising techniques driven by such metrics can efficiently steer adaptive sampling [Chaitanya et al. 2017; Kuznetsov et al. 2018; Kaplanyan et al. 2019], they emphasize the local sample density, and are incapable of indicating the optimal sample layout in screen space.

Contrary to the aforementioned metrics, we develop a perceptual model for rendered images that focuses on the perceptually optimal distribution of error, i.e., with respect to the human visual system's (HVS) modulation transfer function (MTF) [Daly 1987; Sullivan et al. 1991]. Our theoretical framework argues that the screen-space sample layout is crucial for perceptual fidelity, which the most commonly used error metrics do not capture.

2.3 Perceptual models of the HVS

Although substantial progress in the study of the HVS is continuously made, it is mostly the early stages of the visual pathways that are well understood, from the eye optics, through the retina, to the visual cortex. This limits the scope of existing computational models of the HVS that are used in imaging and computer graphics. Such models should additionally be computationally efficient and generalize beyond the simplistic stimuli that have been used in their derivation through psychophysical experiments.
Contrast sensitivity function.

The contrast sensitivity function (CSF) is one of the core HVS models that fulfills those conditions; it comprehensively characterizes the overall optical [Westheimer 1986; Deeley et al. 1991] and neural processes [Souza et al. 2011] in detecting contrast visibility as a function of spatial frequency. While originally modeled as a band-pass filter [Barten 1999; Daly 1993], its shape changes towards a low-pass filter with retinal eccentricity [Robson and Graham 1981; Peli et al. 1991] and with reduced luminance adaptation at scotopic and mesopic levels [Wuerger et al. 2020]. Low-pass characteristics are also inherent to chromatic CSFs [Mullen 1985; Wuerger et al. 2020; Bolin and Meyer 1998]. In many practical imaging applications, e.g., JPEG compression [Rashid et al. 2005], rendering [Ramasubramanian et al. 1999], or halftoning [Pappas and Neuhoff 1999], the CSF is modeled as a low-pass filter, which also allows for better control of image intensity. By normalizing such a CSF by the maximum contrast-sensitivity value, a unitless function akin to the MTF can be derived [Daly 1987; Mannos and Sakrison 1974; Mantiuk et al. 2005; Sullivan et al. 1991; Souza et al. 2011] that, after transforming from the frequency to the spatial domain, results in the point spread function (PSF) [Analoui and Allebach 1992; Pappas and Neuhoff 1999]. Following Pappas and Neuhoff [1999], we approximate such a PSF by a suitable Gaussian filter, where the error of this approximation becomes practically negligible for a sample density of 300 dpi and viewer distances to the screen beyond 60 cm.
Advanced quality metrics.

More costly, and often less robust, modeling of the HVS beyond the CSF is performed in advanced quality metrics [Lubin 1995; Daly 1993; Mantiuk et al. 2011] that have been adapted to rendering, where they guide the computation to the image regions where the visual error is most perceived [Bolin and Meyer 1995, 1998; Ramasubramanian et al. 1999; Ferwerda et al. 1996; Myszkowski 1998; Volevich et al. 2000]. An important application is visible-noise reduction in ray and path tracing by content-adaptive sample-density control [Bolin and Meyer 1995, 1998; Ramasubramanian et al. 1999]. Our framework enables further significant reduction of noise visibility for the same sample budget.

Table 1. List of commonly used notations throughout the paper.

    Symbol            Meaning
    ⟨·,·⟩, ⊙, ∗       inner product, element-wise product, convolution
    ‖·‖               (2-)norm
    g, ĝ, |ĝ|         convolution kernel, its Fourier and power spectra
    I, Q, ε = Q − I   reference image, estimated image, error image
    S_i, S            sample set for pixel i, sample sets for all pixels

Sampling in rendering.
Sample correlations [Singh et al. 2019] directly affect the error in rendering. Quasi-Monte Carlo samplers [Halton 1964; Sobol 1967] preserve correlations (i.e., stratification) well in higher dimensions, which makes them a perfect candidate for rendering problems [Keller 2013]. However, imposing perceptual control over these samplers is not well studied. On the other hand, stochastic samplers have been shown to bear a direct resemblance to the cone layout in the human eye [Yellott 1983]. This has inspired the development of various stochastic sample correlations in rendering [Cook 1986; Dippé and Wold 1985; Mitchell 1991], e.g., blue noise. In this work, we do not focus on the construction of blue-noise point sets, but develop a theoretical framework to obtain a perceptually pleasing distribution of error in screen space for rendering purposes.
Blue-noise error distribution.
Mitchell [1991] observed that blue noise is a desirable property for spectrally optimal ray tracing. Georgiev and Fajardo [2016] were the first to apply results from the halftoning literature to screen-space error redistribution for path tracing. The resulting perceptual-quality improvements are substantial for sufficiently smooth integrands. Motivated by the results of Georgiev and Fajardo [2016], Heitz and Belcour [2019] devised a technique aiming to directly optimize the error distribution instead of operating on sample distributions. Their pixel-permutation strategy fits the initially white-noise pixel intensities to a prescribed blue-noise mask. This approach scales well with sample count and dimensionality, though the error-distribution quality is limited by the fitting to a specific mask and degrades to white noise near geometry discontinuities, unlike the methods of Georgiev and Fajardo [2016] and Heitz et al. [2019]. We propose a perceptual error framework based on an expressive model that unifies these prior methods, providing insight into their (implicit) assumptions and guidelines to alleviate some of their drawbacks. Our general perceptual error formulation does not rely on a target (blue-noise) mask.
3 PERCEPTUAL ERROR MODEL

Our goal is to produce Monte Carlo renderings that, at a fixed sampling rate, are perceptually as close to the ground truth as possible. This requires formalizing the perceptual image error along with an optimization problem that minimizes it. In this section, we build a perceptual model upon the extensive studies done in the halftoning literature. We will discuss how to efficiently solve the resulting optimization problem in Section 4.

Fig. 2. Error images ε_w and ε_b with respective white-noise, |ε̂_w|, and blue-noise, |ε̂_b|, power spectra (columns: image, image spectrum, kernel spectrum, product spectrum). For a low-pass kernel g modeling the PSF of the HVS (here a Gaussian), the product of its spectrum |ĝ| with |ε̂_b| has lower magnitude than the product with |ε̂_w|. This corresponds to lower perceptual sensitivity to ε_b, even though ε_w has the same amplitude, as it is obtained by randomly permuting the pixels of ε_b.

3.1 Background

Given a ground-truth image I and its quantized or noisy approximation Q, we denote the signed error image by

    ε = Q − I.    (1)

It is useful to quantify the error as a single number. A common approach is to take the ℒ₁, ℒ₂, or ℒ∞ norm of ε. Such vector norms are permutation-invariant, i.e., they account for the magnitudes of individual pixel errors but not for their distribution over the image. This distribution is an important factor for the perceived fidelity, since contrast perception is an inherently spatial characteristic of the HVS (Section 2.3).

Several metrics have been proposed in the halftoning literature to capture the human perception of image error. Such metrics model the processing done by the HVS as a convolution of the error image ε with a kernel g (see notation in Table 1):

    E = ‖g ∗ ε‖ = ‖ĝ ⊙ ε̂‖ = ⟨|ĝ|, |ε̂|⟩.    (2)

The convolution is equivalent to the element-wise product of the corresponding Fourier spectra ĝ and ε̂, whose (squared) 2-norm is in turn equal, by Parseval's identity, to the inner product of the power-spectrum images |ĝ| and |ε̂|. Sullivan et al. [1991] minimized the error (2) w.r.t. a kernel g whose spectrum approximates the HVS's modulation transfer function (MTF) [Daly 1987]. Analoui and Allebach [1992] used a similar model in the spatial domain, where the kernel approximates the PSF of the human eye.

Convolving the error image with a kernel incorporates both the magnitude and the distribution of the error into the resulting metric E. In general, the kernel g can have arbitrary form and characteristics; we assume it represents the HVS PSF. As we discuss in Section 2.3, the HVS sensitivity to a spatial signal can be well approximated by a low-pass filter. Optimizing the error image ε to minimize the cost (2) w.r.t. a low-pass kernel then naturally yields a blue-noise error distribution (see Fig. 2).¹ Consequently, such a distribution can be seen as a byproduct of the optimization, which pushes the spectral components of the error to frequencies least visible to the human eye.

To better understand the spatial aspects of contrast sensitivity in the HVS, the MTF (the magnitude of the PSF's Fourier transform) is usually modeled over a range of viewing distances [Daly 1993]. This accounts for the fact that, with increasing viewer distance, spatial frequencies in the image are projected onto higher spatial frequencies on the retina. These eventually become invisible, filtered out by the PSF, whose corresponding kernel in image space expands. We recreate this experiment to understand the impact of distance on the nature of the error distribution (blue vs. white noise). In Fig. 3, we convolve white- and blue-noise distributions with a Gaussian kernel; increasing the kernel's standard deviation corresponds to increasing the distance between observer and screen. The blue-noise distribution reaches a homogeneous state (where the image tone variation is indiscernible) faster than white noise. In the context of rendering, this means that blue-noise error becomes indiscernible at smaller distances (or with smaller denoising low-pass filters) than white-noise error. Consequently, for an optimized error distribution, a smaller kernel (i.e., viewing distance) suffices to remove the noise while preserving more image details.

Fig. 3. The appearance of blue noise (left images) converges to a constant image faster than that of white noise (right images) with increasing observer distance, simulated here by the standard deviation σ of a Gaussian kernel.

¹The term "blue noise" is often used loosely to refer to any spectrum with minimal low-frequency content and no concentrated energy spikes.
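To make the metric concrete, the following NumPy/SciPy sketch evaluates Eq. (2) on synthetic error images. The Gaussian stand-in for the HVS PSF, the σ value, and all names are our own illustrative choices, not the paper's implementation; the blue-noise stand-in is built by high-pass filtering and matching the total energy (the paper's Fig. 2 instead permutes pixels).

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def perceptual_error(err, sigma=1.0):
        # E = ||g * err||_2 (Eq. 2), with a Gaussian approximating the HVS PSF
        return np.linalg.norm(gaussian_filter(err, sigma))

    rng = np.random.default_rng(0)
    eps_w = rng.uniform(-0.5, 0.5, (128, 128))          # white-noise error image
    hp = eps_w - gaussian_filter(eps_w, 1.0)            # high-pass ("blue") noise
    eps_b = hp * np.linalg.norm(eps_w) / np.linalg.norm(hp)  # equal total energy
    # the low-pass kernel suppresses the high-frequency error much more strongly
    print(perceptual_error(eps_w), perceptual_error(eps_b))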
3.2 Perceptual error in rendering

In rendering, the value of each pixel i is estimated via point sampling. Its signed error is thus a function of the sample set S_i used for its estimation: ε_i(S_i) = Q_i(S_i) − I_i, where Q_i is the pixel estimate and I_i is the reference (i.e., ground-truth) pixel value. The error of the entire image can be written in matrix form, similarly to Eq. (1):

    ε(S) = Q(S) − I,    (3)

where S = {S₁, ..., S_N} is an "image" containing the sample set for each of the N pixels. With these definitions, we can express the perceptual error in Eq. (2) for the case of Monte Carlo rendering as a function of the sample-set image S:

    E(S) = ‖g ∗ ε(S)‖,    (4)

where g is a given kernel, e.g., the PSF of the HVS. Our goal is to minimize the energy in Eq. (4). We formulate it as an optimization problem:

    arg min_{S ∈ Ω} ‖g ∗ ε(S)‖.    (5)

The solution S produces an image estimate Q(S) that is closest to the reference I w.r.t. the kernel g. The search space Ω is the set of all possible locations for every sample of every pixel.

Note that the classical MSE metric corresponds to using a zero-width (i.e., one-pixel) kernel g in Eq. (4). However, the MSE accounts only for the magnitude of the error ε, while using wider kernels (such as the PSF) accounts for both magnitude and distribution. Consequently, while the MSE can be minimized by optimizing pixels independently, minimizing the perceptual error requires spatial coordination. In the following section, we devise practical strategies for solving this optimization problem.
4 DISCRETE OPTIMIZATION

The search space in Eq. (5) for each sample set of every pixel is a high-dimensional unit hypercube. Each point in this so-called primary sample space maps to a light-transport path in the scene [Pharr et al. 2016]. Optimizing for the sample-set image S thus entails evaluating the contributions Q(S) of all corresponding paths. This evaluation is costly, and for any non-trivial scene, Q is an unpredictable function with many discontinuities. This precludes us from studying all (uncountably infinite) sample locations in practice.

To make the problem tractable, we restrict the search in each pixel to a finite number of (pre-defined) sample sets. (A more general formulation could operate on individual samples, without grouping them into per-pixel sets; we leave this study to future work.) We devise two variants of the resulting discrete optimization problem, which differ in their definition of the search space Ω. In the first variant, each pixel has a separate list of sample sets to choose from ("vertical" search space). The setting is similar to that of (multitone) halftoning [Lau and Arce 2007], which allows us to import classical optimization techniques from that field, such as mask-based dithering, error diffusion, and iterative minimization. In the second variant, each pixel has one associated sample set, and the search space comprises permutations of these assignments ("horizontal" search space). We develop a greedy iterative optimization method for this second variant.

In contrast to halftoning, in our setting the ground-truth image I, required to compute the error ε during optimization, is not readily available. We describe our algorithms assuming the ground truth is available, and then discuss how to substitute it with a surrogate to make the algorithms practical.
4.1 Vertical search space

Our first variant considers a vertical search space where the sample set for each of the N image pixels is one of M given sets:

    Ω = { S = {S₁, ..., S_N} : S_i ∈ {S_{i,1}, ..., S_{i,M}} }.    (6)

The objective is to find a sample set for every pixel such that all resulting pixel estimates together minimize the perceptual error (4). This is equivalent to directly optimizing over the M possible estimates Q_{i,1}, ..., Q_{i,M} for each pixel i, with Q_{i,j} = Q_i(S_{i,j}). These estimates can be obtained by, e.g., rendering a stack of M images Q_j = {Q_{1,j}, ..., Q_{N,j}}, with j = 1..M. The resulting minimization problem reads

    arg min_{O : O_i ∈ {Q_{i,1}, ..., Q_{i,M}}} ‖g ∗ (O − I)‖.    (7)

The problem in Eq. (7) is almost identical to that in multitone halftoning. The only difference is that in our setting the "quantization levels", i.e., the pixel estimates, are non-uniform and differ from pixel to pixel, as they are not pre-defined but are the result of point-sampling a spatially varying light-transport integral. This similarity allows us to directly apply existing optimization techniques from halftoning [Lau and Arce 2007]. We consider three such methods, which we outline in Alg. 1 and describe next.
Iterative minimization.
Some halftoning methods aim to solve the problem (7) directly via greedy iterative minimization [Analoui and Allebach 1992; Pappas and Neuhoff 1999]. After initializing each pixel to a random quantization level, our implementation traverses the image in serpentine order (as is standard practice in halftoning) and for each pixel picks the level that minimizes the energy. Several full-image iterations are performed; in our experiments, convergence to a local minimum is achieved within 10–20 iterations. As a further improvement, the optimization can be terminated when the perceptual-error reduction rate falls below a certain threshold, or when no estimates are updated within one full iteration. A random pixel-traversal order allows terminating at any point but converges slightly slower.
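A minimal sketch of this greedy search, assuming a stack of per-pixel estimates is available. The incremental update of the filtered error image is our own implementation choice (a pixel change only perturbs the convolution inside the kernel support), and all function and parameter names are ours:

    import numpy as np

    def gaussian_kernel(sigma=2**-0.5, radius=2):
        ax = np.arange(-radius, radius + 1)
        g = np.exp(-(ax[:, None]**2 + ax[None, :]**2) / (2 * sigma**2))
        return g / g.sum()

    def iterative_vertical(Q, I, g, iters=20, seed=0):
        # Greedy minimization of ||g * (O - I)||_2 over per-pixel estimates (Eq. 7).
        # Q: (M, H, W) stack of pixel estimates; I: (H, W) reference (or surrogate).
        M, H, W = Q.shape
        r = g.shape[0] // 2
        rng = np.random.default_rng(seed)
        O = Q[rng.integers(M, size=(H, W)),
              np.arange(H)[:, None], np.arange(W)[None, :]]  # random init
        # F holds g * (O - I) on a zero-padded domain
        F = np.zeros((H + 2*r, W + 2*r))
        for y in range(H):
            for x in range(W):
                F[y:y+2*r+1, x:x+2*r+1] += (O[y, x] - I[y, x]) * g
        for _ in range(iters):
            for y in range(H):            # serpentine order omitted for brevity
                for x in range(W):
                    win = F[y:y+2*r+1, x:x+2*r+1]
                    deltas = Q[:, y, x] - O[y, x]       # candidate value changes
                    # local energy of each candidate: sum((win + delta*g)^2)
                    E = ((win[None] + deltas[:, None, None] * g)**2).sum(axis=(1, 2))
                    j = int(np.argmin(E))
                    if deltas[j] != 0.0:
                        F[y:y+2*r+1, x:x+2*r+1] += deltas[j] * g
                        O[y, x] = Q[j, y, x]
        return O

In practice the reference I is replaced by a surrogate (see the discussion in Section 4.3), and a convergence test on the energy reduction rate would be added.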
Error diffusion.
A classical halftoning algorithm, error diffusion scans the image pixel by pixel, snapping each pixel to the closest quantization level and distributing the resulting error to yet-unprocessed nearby pixels according to a given diffusion kernel κ [Floyd and Steinberg 1976]. In our setting, this can be interpreted as using a kernel κ that is learned over a specific class of functions, such that the result of error diffusion approximately minimizes Eq. (7) [Hocevar and Niger 2008]. For color images, the Euclidean distance in a relevant color space can be used to find the closest quantization level.
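A sketch of this adaptation using the classical Floyd–Steinberg weights (an assumption for illustration; the paper only specifies a generic diffusion kernel κ, and all names below are ours):

    import numpy as np

    FS = [((0, 1), 7/16), ((1, -1), 3/16), ((1, 0), 5/16), ((1, 1), 1/16)]

    def error_diffusion(Q, I):
        # Q: (M, H, W) per-pixel estimates; I: (H, W) reference or surrogate.
        M, H, W = Q.shape
        O = I.astype(float).copy()   # running values: reference + diffused error
        out = np.empty((H, W))
        for y in range(H):           # raster order; serpentine also works
            for x in range(W):
                j = int(np.argmin(np.abs(Q[:, y, x] - O[y, x])))  # closest level
                out[y, x] = Q[j, y, x]
                e = O[y, x] - out[y, x]
                for (dy, dx), w in FS:        # push residual to neighbors
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        O[yy, xx] += w * e
        return out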
Dither masks.
An efficient halftoning approach is to quantize pixel values using thresholds stored in a pre-computed dither mask (or matrix) [Spaulding et al. 1997]. For each pixel, the closest lower and higher (in terms of brightness) quantization levels to the reference value are found, and one of the two is chosen based on the threshold associated with the pixel. This aims to transfer the spectral characteristics of the mask to the error distribution of the dithered image. In a blue-noise mask, neighboring pixels have very different thresholds, leading to a visually pleasant high-frequency output error distribution. Following Eq. (7), here the minimization involves two parts. First, a pre-processing step (performed offline) minimizes arg min_B ‖g ∗ B‖ to obtain a mask B w.r.t. the kernel g, which is equivalent to the problem in Eq. (7) with a linear integrand and a constant reference I. Then, ‖O − I − f(B)‖ is minimized, which can be done efficiently through sorting. Here f is a monotonic mapping, and different choices result in different algorithms (see Section 7 in the supplemental for more details).
Algorithm 1. Three algorithms to (approximately) solve the problem associated with a vertical search space (7), producing an output image O = {O₁, ..., O_N} given a reference image I and a stack of initial estimates Q₁, ..., Q_M. Our iterative minimization scheme updates pixels repeatedly, for each picking the estimate that minimizes the perceptual error (4). Error diffusion quantizes each pixel to the closest estimate, distributing the error to its neighbours based on a kernel κ. Dithering quantizes each pixel in I based on thresholds in a dither mask B (optimized w.r.t. the kernel g).

    function IterativeMinimization(g, I, Q₁, ..., Q_M, O, T)
        O ← {Q_{1,rand}, ..., Q_{N,rand}}     ▷ Init each pixel to a random estimate
        for T iterations do
            for pixel i = 1..N do             ▷ E.g., random or serpentine order
                for estimate Q_{i,j} ∈ {Q_{i,1}, ..., Q_{i,M}} do
                    if O_i = Q_{i,j} reduces ‖g ∗ (O − I)‖ then
                        O_i ← Q_{i,j}         ▷ Update estimate

    function ErrorDiffusion(κ, I, Q₁, ..., Q_M, O)
        O ← I                                 ▷ Initialize solution to reference
        for pixel i = 1..N do                 ▷ E.g., raster or serpentine order
            O_i^old ← O_i
            O_i ← arg min_{Q_{i,j}} ‖O_i^old − Q_{i,j}‖
            ε_i ← O_i^old − O_i
            ▷ Diffuse error ε_i to yet-unprocessed neighbors
            for unprocessed pixel k within support of κ around i do
                O_k ← O_k + ε_i · κ_{k−i}

    function Dithering(B, I, Q₁, ..., Q_M, O)
        for pixel i = 1..N do
            ▷ Find tightest interval [Q_i^low, Q_i^high] containing I_i
            Q_i^low ← arg max_{Q_{i,j} : |Q_{i,j}| ≤ |I_i|} |Q_{i,j}|
            Q_i^high ← arg min_{Q_{i,j} : |Q_{i,j}| > |I_i|} |Q_{i,j}|
            if |I_i| − |Q_i^low| < B_i · (|Q_i^high| − |Q_i^low|) then
                O_i ← Q_i^low     ▷ Quantize I_i to Q_i^low or Q_i^high via threshold B_i
            else
                O_i ← Q_i^high

4.2 Horizontal search space

We now describe the second, horizontal, discrete variant of our minimization formulation (5). It considers a single sample set S_i assigned to each of the N pixels, all represented together as a sample-set image S. The search space comprises all possible permutations Π(S) of these assignments:

    Ω = Π(S), with S = {S₁, ..., S_N}.    (8)

The goal is to find a permutation π(S) that minimizes the perceptual error (4). The optimization problem (5) thus takes the form

    arg min_{π(S) ∈ Π(S)} E(π(S)) = arg min_{π(S) ∈ Π(S)} ‖g ∗ (Q(π(S)) − I)‖.    (9)

Algorithm 2. Given a convolution kernel g, a reference image I, an initial pixel sample-set assignment S, and an image estimated with that assignment Q(S), our greedy sample-relocation algorithm iteratively swaps pixel assignments to minimize the perceptual error E_Δ (10), producing a permutation π of the sample-set assignment.

    function IterativeMinimization(g, I, S, Q(S), T, R, π)
        π ← identity permutation              ▷ Initialize solution permutation
        for T iterations do
            for pixel i = 1..N do             ▷ E.g., random or serpentine order
                π′ ← π        ▷ Find best pixel in neighborhood to swap with
                for pixel j in (2R+1)² neighborhood centered at i do
                    if E_Δ(π_{i⇆j}(S)) < E_Δ(π′(S)) then    ▷ Eq. (10)
                        π′ ← π_{i⇆j}          ▷ Accept swap as current best
                π ← π′

We can explore the permutation space Π(S) by swapping the sample-set assignments between pixels. The minimization requires updating the image estimate Q(π(S)) for each permutation π(S), i.e., after each swap. Such updates are costly as they involve multiple ray-tracing operations for each of potentially millions of swaps. We need to eliminate these extra rendering invocations during the optimization to make it practical. To that end, we observe that for pixels solving similar light-transport integrals, swapping their sample sets gives a similar result to swapping their estimates. We therefore restrict the search space to permutations that can be generated through swaps between such pixels. This enables efficient optimization by directly swapping the pixel estimates of an initial rendering Q(S).

Error decomposition.
Formally, we express the estimate produced by a sample-set permutation in terms of permuting the pixels of the initial rendering: Q(π(S)) = π(Q(S)) + Δ. The error Δ is zero when the swapped pixels solve the same integral(s). Substituting into Eq. (9), we can approximate the perceptual error by (see the derivation in Appendix A)

    E(π(S)) = ‖g ∗ (π(Q(S)) − I + Δ)‖    (10)
            ≈ ‖g ∗ (π(Q(S)) − I)‖ + ‖g‖ ∑_i d(i, π(i)) = E_Δ(π(S)).

In the approximation E_Δ, the term d(i, π(i)) measures the dissimilarity between pixel i and the pixel π(i) it is relocated to by the permutation. The purpose of this metric is to predict how different we expect the result of re-estimating the pixels after relocating their sample sets to be, compared to directly relocating their initial estimates. It can be constructed based on assumptions or knowledge about the image, e.g., coming from auxiliary buffers (depth, normals, etc.).

Local similarity assumption.
In our implementation we use a simple binary dissimilarity function returning zero when i and π(i) are within some distance r and infinity otherwise. We set r between 1 and 3; it should ideally be locally adapted to the image smoothness/regularity. This allows us to restrict the search space Π(S) to permutations that swap only nearby pixels, where it is more likely that Δ is small.
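The greedy swap search of Alg. 2, described next, can be sketched as follows. The full-energy recomputation per candidate is deliberately naive (a production version would evaluate only the two affected kernel windows), the binary dissimilarity above is enforced simply by the swap radius R ≤ r, and all names are ours:

    import numpy as np

    def horizontal_swaps(O, I, g, iters=10, R=1):
        # Greedy neighbor-swap minimization of ||g * (O - I)||_2 (Alg. 2 sketch).
        # O: (H, W) initial estimate image (modified in place); I: reference or
        # surrogate; g: (k, k) low-pass kernel.
        H, W = O.shape
        r = g.shape[0] // 2
        F = np.zeros((H + 2*r, W + 2*r))     # padded filtered error g * (O - I)
        for y in range(H):
            for x in range(W):
                F[y:y+2*r+1, x:x+2*r+1] += (O[y, x] - I[y, x]) * g

        def splat(y, x, d):                  # add d*g around pixel (y, x)
            F[y:y+2*r+1, x:x+2*r+1] += d * g

        for _ in range(iters):
            for y in range(H):
                for x in range(W):
                    best, best_E = None, np.sum(F**2)
                    for dy in range(-R, R + 1):
                        for dx in range(-R, R + 1):
                            yy, xx = y + dy, x + dx
                            if (dy or dx) and 0 <= yy < H and 0 <= xx < W:
                                d = O[yy, xx] - O[y, x]
                                splat(y, x, d); splat(yy, xx, -d)   # try swap
                                E = np.sum(F**2)
                                splat(y, x, -d); splat(yy, xx, d)   # undo
                                if E < best_E:
                                    best, best_E = (yy, xx), E
                    if best:
                        yy, xx = best
                        d = O[yy, xx] - O[y, x]
                        splat(y, x, d); splat(yy, xx, -d)
                        O[y, x], O[yy, xx] = O[yy, xx], O[y, x]
        return O

Note that, as the paper stresses, the swapped-estimate image is not the final result: the scene is re-rendered with the optimized sample-set assignment.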
Iterative minimization.
We devise a greedy iterative minimization scheme for this variant, similar in spirit to the iterative minimization in Alg. 1. Given an initial image estimate Q(S) produced by randomly assigning a sample set to every pixel, our algorithm iterates over all pixels and for each considers swaps within a (2R+1)² neighborhood; we use R = 1. The swap that brings the largest reduction in the perceptual error E_Δ is accepted. Several full-image iterations (T in Alg. 2) are performed; Algorithm 2 provides pseudocode. R controls the trade-off between the cost of one iteration and the amount of exploration it can do. Note that this parameter is different from the maximal relocation distance r in the dissimilarity metric, and that R ≤ r.

Due to the pixel (dis)similarity assumptions, the optimization can produce mispredictions, i.e., it may swap the estimates of pixels for which swapping the sample sets produces a significantly different result. Thus π(Q(S)) cannot be used as a final image estimate. We therefore re-render the image with the optimized permutation π to obtain the final estimate Q(π(S)).

4.3 Discussion

Practical algorithms can be classified based on the choice of search space (horizontal versus vertical) and optimization strategy (iterative minimization, error diffusion, dithering). These choices affect the overall quality and speed of the algorithms.
Search space.
Discretizing the search space Ω makes the optimization problem (5) tractable. To make it truly practical, it is necessary to avoid repeated image estimation (i.e., Q(S) evaluation) during the search for the solution S. Our vertical (7) and horizontal (9) optimization variants are formulated specifically with this goal in mind. All methods in Algs. 1 and 2 operate on pre-generated image estimates that constitute the solution search space.

Our vertical formulation takes a set of M input estimates {Q_{i,j} = Q_i(S_{i,j})}, j = 1..M, for every pixel i, one for each sample set S_{i,j}. Noting that the Q_{i,j} are MC estimates of the true pixel value, this set can be cheaply expanded, by averaging subsets of the estimates (see the sketch below), to a size as large as 2^M − M.

In contrast, our horizontal formulation builds a search space given just a single input estimate Q_i per pixel i. We consciously restrict the space to permutations between nearby pixels, so as to leverage local pixel similarity and avoid repeated pixel evaluation. The disadvantage of this approach is that it requires re-rendering the image after optimization, with unpredictable results (mispredictions) that can lead to local degradation of image quality. Additionally, while some halftoning methods (iterative energy minimization and versions of dither-matrix halftoning) can be adapted to this search space, it is non-trivial to find straightforward reformulations for other algorithms (e.g., error diffusion).

It is also possible to combine our two formulations into a hybrid one that takes one input estimate per pixel but considers a separate search space for each pixel, constructed by borrowing neighboring estimates.
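A sketch of this subset-averaging expansion (each subset average is itself an unbiased MC estimate; names are ours, and note the exponential growth in M):

    from itertools import combinations
    import numpy as np

    def expand_estimates(Q):
        # Q: (M, H, W) stack of per-pixel estimates. Returns the original
        # estimates plus the averages of all subsets of size >= 2.
        M = Q.shape[0]
        subsets = [Q[list(c)].mean(axis=0)
                   for size in range(2, M + 1)
                   for c in combinations(range(M), size)]
        return np.concatenate([Q, np.stack(subsets)], axis=0)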
Optimization strategy.

Another important design decision is the choice of optimization method. For the vertical formulation, iterative methods provide the most flexibility, control, and quality, but are the most computationally expensive. In contrast, error diffusion and dither-mask halftoning can be seen as only approximately solving the optimization problem (7), yielding results of lower quality. An important difference between classical halftoning and our extension is that the quantization levels differ between pixels. This makes the quality gap between image-adaptive (iterative, error diffusion) and non-adaptive (dither-mask halftoning) methods even larger. Additionally, in the classical MC rendering setting, iterative and error-diffusion methods handle color natively, while dither-mask halftoning requires greyscale input. While this can be mitigated by modifying the renderer or by extending the dithering through algorithms such as bipartite Euclidean matching, this comes at both a performance and a quality cost compared to iterative and error-diffusion algorithms. The main advantage of dither-mask halftoning over error diffusion in our setting is that the former involves the kernel g explicitly, while error diffusion relies on a diffusion kernel κ that cannot be related directly to g.

Surrogate for reference image.
All our formulations depend on the reference image I, which is unknown, unlike in halftoning. To that end, the reference can be substituted with an approximation, i.e., a surrogate image I′, which can be obtained, e.g., via (ideally feature-preserving) filtering of an estimate Q. We discuss surrogates in more detail in Sections 7.1 and 7.3, but it is important to point out that all existing methods rely on a surrogate, whether explicitly or implicitly.
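As a minimal illustration of the idea (the paper's experiments instead use a feature-guided denoiser; see Section 7):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def cheap_surrogate(Q_mean, sigma=2.0):
        # Placeholder surrogate: smooth the mean of the per-pixel estimates.
        # A feature-preserving denoiser (e.g., Intel Open Image Denoise, as in
        # the paper's experiments) yields a much better surrogate in practice.
        return gaussian_filter(Q_mean, sigma)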
5 ANALYSIS OF EXISTING METHODS

In our framework, the histogram sampling approach of Heitz and Belcour [2019] classifies as a vertical method, and their permutation approach as a horizontal method. Both methods employ variants of dither-mask halftoning. We next show that in both cases, their algorithms rely on implicitly constructed surrogates.
5.1 Implicit surrogates

All existing error-distribution methods rely on an implicit surrogate. For the histogram method of Heitz and Belcour [2019], since the mean value of the blue-noise mask maps to the median of the estimates, the implicit surrogate is the median of the sorted estimates in each pixel. Furthermore, the histogram method can pick any of the estimates within a pixel—unlike classical halftoning, which samples the closest lower and higher quantization levels [Spaulding et al. 1997]—which results in a higher error amplitude. Consequently, pixels for which the mask value deviates strongly from the mean could end up with outliers, leading to fireflies in the rendered image even if none were present in the averaged image (Fig. 7).

The permutation method of Heitz and Belcour [2019], on the other hand, has a piecewise-constant implicit surrogate. We show this formally in the supplemental document (Section 7). In words, the permutation method uses a sorting pass which minimizes the ℒ₂ distance between the pixel values and the dither mask within each tile. This transfers the blue-noise characteristics of the dither mask to the signal. However, the actual goal is to transfer these spectral characteristics to the error, not the signal. For the two to be equivalent, the signal within each tile needs to be assumed constant. And indeed, a constant offset does not change the minimizer of the energy which sorting minimizes. This implies that the implicit surrogate in this case is a piecewise-constant function within each tile. Our framework, in contrast, uses an explicit surrogate that allows full control over the quality of the error distribution.

Fig. 4. Using our formulation (5), the distribution of an input (white-noise) error image can be optimized w.r.t. arbitrary kernels: low-pass (blue noise), band-stop (green noise), high-pass (red noise), band-pass (violet noise), anisotropic low-pass, and spatially varying. Here we have modified our relocation algorithm (Section 4.2) to directly swap the pixels of the input image. Insets show the power spectra of the kernels (top right) and the images (bottom left).

5.2 A-priori methods

By making simplifying assumptions about I and Q, our sample-relocation algorithm can be used with no actual knowledge about the image. For example, assuming that every pixel solves the same integral turns I into a constant image and makes the error term Δ in Eq. (10) vanish. This allows us to swap any two pixels and simplifies the perceptual error to E(π(S)) = ‖g ∗ π(ε(S))‖. Further assuming a simple analytic shape for the pixel integrand, the per-pixel error image ε(S) can be quickly rendered once and the perceptual optimization performed by simply swapping its pixels. In this formulation, prior methods correspond to using a (family of) step function(s) [Heitz et al. 2019] or a class of functions mapping far-away samples to far-away values [Georgiev and Fajardo 2016] for the integrand, respectively. This approach, coined "a-priori" by Heitz and Belcour [2019], is practical, but its output fidelity is limited by the lack of adaptivity resulting from its strong assumptions.

Our framework unifies the a-priori and a-posteriori approaches, showing that the two lie on a continuum (supplemental Section 2). In our framework, the "a-priori" method of Heitz et al. [2019] tries to simultaneously minimize for a large number of simple integrands, with the search space restricted to the ranking and scrambling keys of a Sobol sequence; this roughly corresponds to a vertical and a horizontal search space, respectively. At the same time, we note that while the energy used there is empirically motivated, it can be made to agree with our perceptually motivated energy by simply absorbing the Gaussian convolution within the ℒ₂ norm.

Similarly, the method of Georgiev and Fajardo [2016], which consists of toroidally shifting a sequence by a mask, can also be cast in our framework, provided that the mask is optimized w.r.t. our energy with an appropriate set of integrands. Even if motivated purely empirically, the energy in that paper can be related to ours if a whole class of integrands is considered where far-away samples are mapped to far-away values. We also note that the toroidal shift degrades the quality of the optimized mask and consequently the quality of the blue noise (see supplemental Section 8).

Fig. 5. (a) Comparison of our Gaussian kernel g against those of Näsänen [1984] and González et al. [2006]. (b) Sharpening effect of setting the reference-image kernel h to the zero-width Kronecker δ kernel in Eq. (13).
6 EXTENSIONS

Our perceptual error formulation (4) considers some image distortions due to the CSF, but not all: the HVS applies additional, localized and non-linear processing to the input signal. We analyze different aspects of the model and devise three extensions that capture more effects.
Kernels and PSFs.
In Fig. 5a, we compare different PSF models (i.e., low-pass kernels) from the halftoning literature [Näsänen 1984; González et al. 2006]. In our experiments, we use a binomial kernel that approximates a Gaussian [Papoulis and Pillai 2002]; it is cheap to evaluate and performs on par with these state-of-the-art PSFs.

We further analyze kernels with band-stop, high-pass, band-pass, and anisotropic spectral characteristics in Fig. 4. Starting from a white-noise error distribution (i.e., with a uniform random value in [−0.5, 0.5] assigned to each pixel), our horizontal iterative minimization algorithm is able to optimize the shape of the noise to produce the inverse behaviour to the kernel in the spectral domain. The rightmost image in Fig. 4 illustrates the result of using a spatially varying kernel produced from the convex combination of a binomial and an anisotropic high-pass kernel, with the interpolation parameter varying horizontally across the image. Our algorithm adapts the noise shape to the kernel well.
Tone reproduction.

Considering that the rendered image will be viewed on a medium with limited dynamic range (e.g., screen or paper), we can incorporate a tone-mapping operator 𝒯 into the perceptual error:

    E(S) = ‖g ∗ (𝒯(Q(S)) − 𝒯(I))‖.    (11)

Doing this also bounds the per-pixel error ε(S) = 𝒯(Q(S)) − 𝒯(I), suppressing pixel outliers and making the optimization more robust in scenes with high dynamic range.
Chromatic noise.

While the HVS reacts more strongly to luminance than to color, ignoring chromaticity entirely can have a negative effect on the distribution of color noise in the image. To mitigate this, one can penalize each color channel c separately:

    E(S) = ∑_{c ∈ {r,g,b}} λ_c ‖g_c ∗ (Q_c(S) − I_c)‖,    (12)

where λ_c is a weight parameter. In our experiments (Section 7), we use the same weight λ_c and kernel g_c for every channel. It is, however, possible to consider a more appropriate color space than RGB (e.g., YCbCr), in which case the per-channel kernels may differ [Sullivan et al. 1991].
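A sketch combining Eqs. (11) and (12); the Reinhard-style tone map is a placeholder assumption, not the operator used in the paper, and all names are ours:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def perceptual_error_ext(Q, I, sigma=2**-0.5, weights=(1.0, 1.0, 1.0),
                             tonemap=lambda v: v / (1.0 + v)):
        # Per-channel, tone-mapped perceptual error (Eqs. 11-12).
        # Q, I: (H, W, 3) images; `tonemap` is a placeholder Reinhard-style
        # operator applied before the error is formed.
        E2 = 0.0
        for c, lam in enumerate(weights):
            eps = tonemap(Q[..., c]) - tonemap(I[..., c])
            E2 += lam * np.sum(gaussian_filter(eps, sigma)**2)
        return np.sqrt(E2)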
Observer-screen distance.

Being based on perceptual models of the HVS [Sullivan et al. 1991; Analoui and Allebach 1992], our formulation (4) assumes that the estimate Q and the reference I are viewed from the same (range of) distance(s). The viewing distances can be decoupled by applying different kernels to Q and I:

    E(S) = ‖g ∗ Q(S) − h ∗ I‖.    (13)

Minimizing this error makes Q appear, from some distance d_g, similar to I seen from a different distance d_h. The special case of using a Kronecker delta kernel h = δ (i.e., with the reference I seen from up close) yields E(S) = ‖g ∗ Q(S) − I‖, which has been shown to have an edge-enhancing effect in halftoning [Anastassiou 1989; Pappas and Neuhoff 1999]. We demonstrate this effect in Fig. 5b. We use h = δ for all our experiments in Section 7.

7 RESULTS

In this section, we detail our testing methodology and present empirical results employing our proof-of-concept algorithms. We render several scenes using our iterative energy minimization, error diffusion, and dither-matrix halftoning approaches, and compare against the histogram and permutation methods of Heitz and Belcour [2019]. Quantitative analysis over all methods is given in Table 2, and optimization runtimes in Table 3.
Energy formulation.
We use the perceptual model ‖g ∗ Q(S) − h ∗ I‖ with h = δ from Eq. (13). In our implementation, we tone map (Eq. (11)) the images to match the display and consider all (RGB) channels (Eq. (12)) for both the iterative and error-diffusion methods. The dithering approach, however, would require solving a bipartite Euclidean matching problem to take different channels into account; thus we consider it only in the greyscale setting, which allows using sorting for the optimization.

Kernel g.

In all our experiments (except where explicitly mentioned), we use a 3 × 3 binomial kernel g which approximates a Gaussian kernel [Lindeberg 1990] with standard deviation σ = 1/√2. More sophisticated kernels may be used if desired; however, we found that our Gaussian approximation gives satisfactory results (Fig. 5a) in all experiments. The optimal viewing distance corresponding to our kernel is approximately 90 cm for a screen with 218 DPI (136 pixels per visual degree).
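Such a kernel is trivial to construct: the 1D binomial weights (1, 2, 1)/4 over offsets {−1, 0, 1} have variance 1/2, i.e., σ = 1/√2 (a sketch; the variable names are ours):

    import numpy as np

    b = np.array([1.0, 2.0, 1.0]) / 4.0   # 1D binomial weights
    g = np.outer(b, b)                    # 3x3 binomial kernel, approximating a
                                          # Gaussian with sigma = 1/sqrt(2)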
Rendering setup.
All scenes are rendered in PBRTv3 [Pharr et al. 2016] using the path-tracing integrator and, occasionally, the bidirectional path-tracing integrator. Our methods do not depend on the dimensionality of the problem; however, to keep the rendering time reasonable, we set the maximum path depth to 5 for all scenes. The reference images are generated using a Sobol sampler with sample counts greater than 1024 samples per pixel. All other images are rendered using a random sampler. In all scenes (except in the animations), we shoot primary rays through the center of the pixel in order to separate the geometry integration from the light-transport integration. We also provide animation results in the supplementary data for the following scenes: Utah Teapot (360 frames), Modern Hall (120 frames), San Miguel (120 frames), and Bathroom (120 frames).
Surrogate construction.
To construct a surrogate for our methods, we denoise the averaged image with the Intel Open Image Denoiser [int 2018]. The denoiser's input is the averaged image, a normal buffer, and a buffer containing the base color of materials where applicable. Heitz and Belcour's methods use implicit surrogates. We analyze the effect of various surrogates in Fig. 10, and showcase the surrogates generated with the Intel denoiser in Fig. 11.
Perceptual error metrics.
We also evaluate the quality of our results using perceptual metrics like S-CIELAB [Zhang and Wandell 1997] (see supplemental) and HDR-VDP-2 [Mantiuk et al. 2011], with parameters matching our kernel. Among these, HDR-VDP-2 best quantifies the considerable visual improvement due to a-posteriori methods. We also include a metric dubbed perceptual MSE (pMSE), derived by normalizing our perceptual error (4) (with h = δ) by the pixel count multiplied by the number of channels. Additionally, to analyze the local blue-noise quality, we provide tiled Fourier power spectra (Fig. 1; see supplemental results) of the signed error Q(π(S)) − I, in order to showcase the quality achieved for the distribution of the error as blue noise. The image is split into tiles of size 32 × 32 and the power spectrum is computed for each tile. For visualization purposes, a logarithmic transform is applied within each tile and the resulting values are normalized so that the highest value is (1, 1, 1). Note that the power spectrum is computed for the error w.r.t. the reference image I and not the surrogate, which quantifies the blue-noise distribution objectively.
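A sketch of these two diagnostics for a greyscale error image (names are ours; the pMSE here uses a Gaussian as a stand-in for the binomial kernel):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def tiled_power_spectrum(err, tile=32):
        # Per-tile log power spectrum of a signed error image, normalized
        # per tile to [0, 1] for display.
        H, W = err.shape
        vis = np.zeros_like(err)
        for y in range(0, H - tile + 1, tile):
            for x in range(0, W - tile + 1, tile):
                p = np.abs(np.fft.fftshift(np.fft.fft2(err[y:y+tile, x:x+tile])))**2
                p = np.log1p(p)                        # logarithmic transform
                vis[y:y+tile, x:x+tile] = p / p.max()  # normalize the tile
        return vis

    def pmse(Q, I, sigma=2**-0.5):
        # Perceptual MSE: Eq. (4) with h = delta, normalized by pixel count
        # (apply per channel and average for color images).
        return np.mean(gaussian_filter(Q - I, sigma)**2)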
Vertical approaches.
We compare all three classes of vertical methods (Alg. 1): dithering-based, error diffusion using the Floyd–Steinberg kernel κ [Floyd and Steinberg 1976], and iterative minimization using our kernel g. We first compare our iterative minimization approach against the histogram sampling method [Heitz and Belcour 2019] in Fig. 6.
Fig. 6. Vertical methods. In the top row, we compare our iterative approach with the histogram sampling method [Heitz and Belcour 2019] over three different scenes: Modern Hall, San Miguel, Bathroom. The histogram method picks one estimate from a histogram of 16 (left) or 64 (right) different 1-spp estimates. Our approach (middle) uses only 4 primary estimates as its search space, following Eq. (7), and outperforms both histogram variants. The bottom row shows a cropped region of the baseline averaged image at 4 spp. Please zoom in (or see the HTML viewer in the supplement) to appreciate the quality difference.

The histogram sampling approach performs suboptimally compared to our iterative minimization approach even when it is provided with 16 times more samples. This illustrates that it is neither theoretically nor practically optimal, mainly because its dithering and implicit surrogate are suboptimal (see Section 5.1 and supplemental Section 7). In Fig. 7, we perform an equal-sample comparison of all vertical methods. All our vertical methods consistently outperform the histogram sampling method. This also agrees with the probability-of-detection maps from HDR-VDP-2 shown in the bottom row of the figure.

In Fig. 1, we illustrate that denoising artifacts can be removed through our optimization. The denoised version of the averaged (white-noise) image exhibits conspicuous, unpleasant artifacts. Using this denoised image as a surrogate, our optimization finds a subset of unbiased estimates that match the surrogate well in terms of pMSE. The resulting image, even though constructed from previously unbiased estimates, is not fully unbiased; however, the introduced bias allows for more regularity. This regularity pays off, since subsequent denoising does not produce the artifacts inherent to denoising the averaged image (where the error distribution was white noise).

Overall, the vertical approaches generally rank in the following order of increasing visual quality: histogram sampling, our dithering, our error diffusion, our iterative minimization with a stack of 4 estimates, and our iterative minimization with the powerset of the estimates (Table 2). For more details and full-size image comparisons, we refer the reader to our supplementary HTML material.
Horizontal approaches.
We consider two horizontal approaches: ouriterative energy minimization approach (Alg. 2) and Heitz and Bel-cour’s permutation approach. For our relocation method, we use asearch radius 𝑅 = 𝑟 to 1 pixel (Alg. 2), which approximately corre-sponds to tile size 3 for Heitz and Belcour’s permutation method.For the permutation approach we consider tiles sizes 2 and 8.Comparisons on the Modern Hall , White Room , and the
Bath-room scenes can be found in Fig. 8. The scenes are rendered over 16frames, and retargeting is applied for Heitz and Belcour’s approach.The scenes are intentionally kept static in order to achieve optimalresults for [Heitz and Belcour 2019], nevertheless, the permutationapproach’s results remain substantially closer to white noise in dis-tribution. For non-static animation results we refer the reader to thesupplementary videos where we have provided several animations.We have tested the permutation approach at tile sizes 2 and 8 and wealways include the better image. In all cases the pMSE is 20 to 40 per-cent lower for our approach compared to the permutation approach(see Table 2), which also agrees with our perceptual evaluation ofthe images. The error in our images is a lot closer in distribution tohigh quality blue noise, while in the permutation approach it hasstronger white noise characteristics.In Fig. 9, we perform an equal sample comparison of our verti-cal methods with Heitz and Belcour’s permutation approach onthe scenes:
Living Room, Classroom, and Grey Room. Our vertical methods consistently outperform the permutation approach both in terms of perceptual metrics (Table 2) and our subjective perceptual evaluation.
(a) White noise (b) Averaged (c) Dithering (Ours) (d) Error-diffusion (Ours) (e) Iterative (Ours) (f) Histogram method [2019]
Fig. 7. Vertical method comparisons, where for each pixel one primary estimate is picked from a set of primary estimates. We compare our three variants: dithering (c), error diffusion (d), and the iterative approach (e) with the histogram sampling method [Heitz and Belcour 2019] (f). Our iterative approach in (e) gives a very high-quality blue-noise error distribution, marked by the lower pMSE. Our error-diffusion approach in (d) is computationally fast and has quality comparable to the iterative approach in (e). Our dithering approach (c) shows a minor improvement but is still better than the histogram sampling method in (f). Note that most of our variants also reduce the overall MSE compared to the baseline averaged image, even though they do not directly optimize for it. The bottom row shows the detection probability [Mantiuk et al. 2011], which indicates how likely a human observer is to notice the difference w.r.t. the reference image (blue refers to lower error) when viewed from the appropriate distance.
Timings.
Optimization timings for all approaches can be found in Table 3. Here we discuss some peculiarities of the approaches which affect the runtime. Our error diffusion approach (Alg. 1) is comparable in terms of optimization runtime to dither-matrix halftoning approaches such as Heitz and Belcour's histogram sampling and permutation methods. In the absence of parallelism it can often outperform them, since no sorting is required; there are also parallel variants of error diffusion in the literature [Metaxas 2003]. At the same time, the quality of the approach is often comparable to that of our iterative energy minimization method.
The methods that provide the best quality and control are also the slowest: our iterative energy minimization approaches. For iterative vertical methods we have chosen a maximum of 100 iterations, though these usually converge in about 20. The iterative horizontal approach generally cannot fully converge in less than a hundred iterations; we thus limit the iterations to $T = 10$ (see the supplemental for an insight into this difference in convergence). This limit does not affect quality greatly, as mispredictions dominate the error for horizontal methods. Both methods can be made several orders of magnitude faster if additional optimisations are implemented [Analoui and Allebach 1992; Koge et al. 2014], and we have derived such optimizations in our supplemental (Section 5). We also note that the runtime scales linearly in the number of pixels of the image multiplied by the number of pixels of the kernel; on the other hand, it does not increase with the number of samples or the dimensionality of the integrand.
Except for the time required for optimizing the sample-set assignment, our methods rely on an explicit surrogate, constructed through the Intel Open Image Denoiser in most of the considered experiments, which for 512 × 512 images takes about half a second. The surrogate can be constructed through other means, as we discuss next.
Fig. 8. We compare our horizontal iterative energy minimization (Alg. 2) with Heitz and Belcour's permutation approach (with retargeting) over a sequence of frames. The top row shows the 1st frame and the bottom row the 16th frame over three different scenes. Our horizontal approach consistently outperforms Heitz and Belcour's permutation approach both qualitatively and quantitatively in terms of pMSE and MSE (see Table 2).
In order to achieve a blue-noise error distribution, a-posteriori methods introduce varying degrees of bias. For instance, Heitz and Belcour's approaches can be fully unbiased only if a white-noise mask is used; however, the resulting error distribution would then also be white (MC) noise. Making the mask correlated (non-random) allows for a better error distribution at the cost of introducing bias in the end result. Due to the implicit surrogate, these approaches do not allow for explicit control over the bias, other than modifying the mask to include varying degrees of randomness. Conversely, our approach relies on an explicit surrogate, which allows full control over the bias.
We propose different ways to control the amount of bias: modifying the energy (Appendix B), restricting the search space, and limiting the number of iterations for iterative methods. Our iterative energy minimization approaches are able to fit very closely to a surrogate; if that surrogate is suboptimal, such a close fit may be undesirable. In Fig. 10, we perform experiments with several surrogates and extend the energy of Alg. 1 to provide control over the bias (see Appendix B). We observe that perceptually optimal results are achieved for differing amounts of bias towards the surrogate, depending on how it was constructed. Notably, with the reference as surrogate the best result is achieved when maximum bias is allowed. On the other hand, with the piecewise tile-constant surrogate the amount of bias needs to be limited in order to remove artifacts due to the surrogate, as can be seen in the third row.
Dithering (Ours) | Error Diffusion (Ours) | Iterative (Ours) | Heitz and Belcour [2019] | Average 4 spp
Fig. 9. Quality comparison at equal sample count between our three variants of vertical methods and Heitz and Belcour's permutation approach with retargeting. Since their permutation approach improves visual quality over a sequence of frames, we show the 16th frame from the sequence. To get a better blue-noise error distribution we consider the larger tile size for their method. Our vertical methods, however, show improvements from the first frame, as shown here. Particularly, our error-diffusion method outperforms their approach in both quality and speed (see Table 2 and Table 3). The bottom row shows the baseline averaged (4 spp) image.
Surrogate image: Reference | Gaussian-convolved average | Piecewise-constant
Fig. 10. Bias analysis. We design different surrogates to analyze their impact on the error distribution. Each column uses a different surrogate. From top to bottom, we vary the bias control parameter $\mathcal{C}$: $\mathcal{C} = 0$ implies a perfect fit to the surrogate, whereas $\mathcal{C} = 1$ keeps the initial unbiased image. This showcases that, for poor surrogates, our methods can avoid fitting too closely. We place Heitz and Belcour's histogram method (in the middle) next to the image that approximately resembles its visual quality. All other images are rendered using one of the primary estimates per pixel with the iterative vertical method. Zooming in is recommended to see the quality difference.
Fig. 11. Here we show the surrogates used for all the scenes. For each scene, the surrogates are generated by denoising the underlying noisy estimate using the Intel Open Image denoiser [int 2018].
Our theoretical framework makes an important step towards a formal understanding of the optimality of sample distributions, not just w.r.t. numerical integration error but also human perception. We transfer a perceptual kernel-based model from halftoning and adapt it to the context of Monte Carlo rendering.
We formulate an optimization problem that can leverage prior information about the scene. This may be knowledge about the light transport, its dimensionality and smoothness, geometry and textures, or post-processing kernels (e.g., the HVS PSF, or spatially varying denoising kernels). It also provides valuable insights and a formal mathematical framework in which the problem can be studied further.
Our optimization is robust and can also be used to create artistic noise distributions with custom target Fourier power spectra; we have used the SIGGRAPH logo as an example. To obtain the corresponding spatial kernel, we make the (greyscale, downscaled) logo image conjugate symmetric, which ensures that its inverse Fourier transform contains only real values. That inverse, divided by its maximum value, is our optimization kernel $g$. A minimal sketch of this construction is given at the end of this section.
Our method is also applicable to progressive and adaptive perceptually optimal rendering, which has only been explored in an a-priori context [Heitz et al. 2019]. Particularly, vertical methods can trivially handle a varying number of samples per pixel. Additionally, if samples are sequentially added to the various sample sets from which vertical methods choose, then progressive rendering only requires running the optimization every time new samples are introduced.
One promising avenue for future research is incorporating more complex perceptual effects into our error formulation (4), such as visual masking [Bolin and Meyer 1998; Ramasubramanian et al. 1999], as well as more robust metrics than the squared $\mathcal{L}_2$ norm of a convolution (possibly including nonlinear relationships).
It is also interesting to study the various interactions between our formulation and denoising methods. Ideally, the shape of the (spatially varying) denoising kernels and the error distribution (w.r.t. these kernels) would be optimized simultaneously in a loop, to achieve better image reconstruction for a given sample budget (Fig. 1).
The discrete nature of our optimization methods prevents them from exploring the entire, continuous sample space. Exploring it seems possible for a-priori methods, which could potentially optimize the samples w.r.t. a suitably chosen smooth pixel integrand, e.g., via gradient descent. The same can be done for a-posteriori methods if assumptions on the integrands are made, or if approximations of the integrands are constructed from point samples and a class of reconstruction functions. In an a-posteriori setting it would also be interesting to investigate the use of differentiable rendering [Loubet et al. 2019; Zhang et al. 2020].
Finally, we have discussed energies, search spaces, and optimization strategies for a-priori methods as an extension of our formulation for a-posteriori methods, though we have not evaluated those experimentally. A promising direction for future research is learning a set of integrands typical for a specific class of scenes; those integrands can then be used directly in our proposed a-priori energy to optimize sequences with desirable integration and spectral properties.
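As a concrete illustration of the kernel construction described above, the following is a minimal sketch (not the authors' implementation) using NumPy. A ring-shaped placeholder spectrum stands in for the logo image, and all names are ours.

```python
import numpy as np

def kernel_from_spectrum(spectrum):
    """Greyscale target power-spectrum image -> real-valued spatial kernel.

    Assumes the spectrum image is drawn with the DC term at its center
    (as a logo would be), hence the ifftshift below.
    """
    s = np.fft.ifftshift(spectrum.astype(np.float64))
    # Conjugate-symmetrize: S_sym[k] = (S[k] + S[-k]) / 2 for real S,
    # which guarantees a real-valued inverse DFT.
    s_sym = 0.5 * (s + np.roll(np.roll(s[::-1, ::-1], 1, axis=0), 1, axis=1))
    g = np.real(np.fft.ifft2(s_sym))
    return g / g.max()  # divided by its maximum value, as in the text

if __name__ == "__main__":
    # Placeholder target: a ring spectrum instead of the SIGGRAPH logo.
    n = 64
    u, v = np.meshgrid(np.arange(n) - n // 2, np.arange(n) - n // 2)
    ring = np.exp(-0.5 * ((np.hypot(u, v) - 20.0) / 2.0) ** 2)
    g = kernel_from_spectrum(ring)
    print(g.shape, g.max())
```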
REFERENCES
2018. Intel Open Image Denoise. https://www.openimagedenoise.org/
J. Opt. Soc. Am. 66, 9 (Sep 1976), 909–917. https://doi.org/10.1364/JOSA.66.000909
Mostafa Analoui and Jan P. Allebach. 1992. Model-based halftoning using direct binary search. In Human Vision, Visual Processing, and Digital Display III, Bernice E. Rogowitz (Ed.), Vol. 1666. International Society for Optics and Photonics, SPIE, 96–108. https://doi.org/10.1117/12.135959
D. Anastassiou. 1989. Error diffusion coding for A/D conversion. IEEE Transactions on Circuits and Systems 36, 9 (1989), 1175–1186.
Peter G. J. Barten. 1999. Contrast Sensitivity of the Human Eye and Its Effects on Image Quality. SPIE – The International Society for Optical Engineering.
Bryce E. Bayer. 1973. An optimum method for two-level rendition of continuous-tone pictures. In Proceedings of IEEE International Conference on Communications, Conference Record, Vol. 26. 11–15.
Mark R. Bolin and Gary W. Meyer. 1995. A Frequency Based Ray Tracer. In SIGGRAPH '95. 409–418.
Mark R. Bolin and Gary W. Meyer. 1998. A Perceptually Based Adaptive Sampling Algorithm. In SIGGRAPH '98. 299–309.
A. Celarek, W. Jakob, M. Wimmer, and J. Lehtinen. 2019. Quantifying the Error of Light Transport Algorithms. Computer Graphics Forum 38, 4 (2019), 111–121. https://doi.org/10.1111/cgf.13775
Chakravarty R. Alla Chaitanya, Anton S. Kaplanyan, Christoph Schied, Marco Salvi, Aaron Lefohn, Derek Nowrouzezahrai, and Timo Aila. 2017. Interactive Reconstruction of Monte Carlo Image Sequences Using a Recurrent Denoising Autoencoder. ACM Trans. Graph. 36, 4, Article 98 (2017).
Jianghao Chang, Benoît Alain, and Victor Ostromoukhov. 2009. Structure-Aware Error Diffusion. ACM Trans. Graph. 28, 5 (Dec. 2009), 1–8. https://doi.org/10.1145/1618452.1618508
R. L. Cook. 1986. Stochastic Sampling in Computer Graphics. ACM Transactions on Graphics 5, 1 (1986), 51–72.
Scott Daly. 1987. Subroutine for the generation of a two dimensional human visual contrast sensitivity function.
S. Daly. 1993. The Visible Differences Predictor: An algorithm for the assessment of image fidelity. In Digital Image and Human Vision, A. B. Watson (Ed.). MIT Press, Cambridge, MA, 179–206.
R. J. Deeley, N. Drasdo, and W. N. Charman. 1991. A simple parametric model of the human ocular modulation transfer function. Ophthalmic and Physiological Optics 11 (1991), 91–93.
Mark A. Z. Dippé and Erling Henry Wold. 1985. Antialiasing through stochastic sampling. In SIGGRAPH '85.
James A. Ferwerda, Sumanta N. Pattanaik, Peter Shirley, and Donald P. Greenberg. 1996. A Model of Visual Adaptation for Realistic Image Synthesis. In Proceedings of SIGGRAPH '96. Association for Computing Machinery, New York, NY, USA, 249–258. https://doi.org/10.1145/237170.237262
Robert W. Floyd and Louis Steinberg. 1976. An Adaptive Algorithm for Spatial Greyscale. Proceedings of the Society for Information Display 17, 2 (1976), 75–77.
Iliyan Georgiev and Marcos Fajardo. 2016. Blue-noise Dithered Sampling. In ACM SIGGRAPH 2016 Talks (SIGGRAPH '16). ACM, New York, NY, USA, Article 35. https://doi.org/10.1145/2897839.2927430
P. Glasserman. 2004. Monte Carlo Methods in Financial Engineering. Springer.
Alvaro J. González, Jan Bacca Rodríguez, Gonzalo R. Arce, and Daniel Leo Lau. 2006. Alpha stable human visual system models for digital halftoning. In Electronic Imaging.
J. H. Halton. 1964. Algorithm 247: Radical-inverse Quasi-random Point Sequence. Commun. ACM 7, 12 (Dec. 1964), 701–702. https://doi.org/10.1145/355588.365104
Eric Heitz and Laurent Belcour. 2019. Distributing Monte Carlo Errors as a Blue Noise in Screen Space by Permuting Pixel Seeds Between Frames. Computer Graphics Forum (2019). https://doi.org/10.1111/cgf.13778
Eric Heitz, Laurent Belcour, Victor Ostromoukhov, David Coeurjolly, and Jean-Claude Iehl. 2019. A Low-Discrepancy Sampler that Distributes Monte Carlo Errors as a Blue Noise in Screen Space. In SIGGRAPH '19 Talks. ACM, Los Angeles, United States. https://hal.archives-ouvertes.fr/hal-02150657
Edwin Hewitt and Kenneth A. Ross. 1994. Abstract Harmonic Analysis: Volume I. Structure of Topological Groups, Integration Theory, Group Representations. Springer, New York.
Sam Hocevar and Gary Niger. 2008. Reinstating Floyd-Steinberg: Improved Metrics for Quality Assessment of Error Diffusion Algorithms. Vol. 5099. 38–45. https://doi.org/10.1007/978-3-540-69905-7_5
James T. Kajiya. 1986. The Rendering Equation. In SIGGRAPH '86, Vol. 20. 143–150.
Anton S. Kaplanyan, Anton Sochenov, Thomas Leimkühler, Mikhail Okunev, Todd Goodall, and Gizem Rufo. 2019. DeepFovea: Neural Reconstruction for Foveated Rendering and Video Compression Using Learned Statistics of Natural Videos. ACM Trans. Graph. 38, 6, Article 212 (2019).
Alexander Keller. 2013. Quasi-Monte Carlo Image Synthesis in a Nutshell. In Monte Carlo and Quasi-Monte Carlo Methods 2012, Josef Dick, Frances Y. Kuo, Gareth W. Peters, and Ian H. Sloan (Eds.). Springer Berlin Heidelberg, 213–249.
Hiroaki Koge, Yasuaki Ito, and Koji Nakano. 2014. A GPU Implementation of Clipping-Free Halftoning Using the Direct Binary Search. In Algorithms and Architectures for Parallel Processing, Xian-he Sun, Wenyu Qu, Ivan Stojmenovic, Wanlei Zhou, Zhiyang Li, Hua Guo, Geyong Min, Tingting Yang, Yulei Wu, and Lei Liu (Eds.). Springer International Publishing, Cham, 57–70.
Lauwerens Kuipers and Harald Niederreiter. 1974. Uniform Distribution of Sequences. Wiley, New York, USA.
Alexandr Kuznetsov, Nima Khademi Kalantari, and Ravi Ramamoorthi. 2018. Deep Adaptive Sampling for Low Sample Count Rendering. Computer Graphics Forum 37, 4 (2018).
Daniel L. Lau and Gonzalo R. Arce. 2008. Modern Digital Halftoning (2nd ed.). CRC Press, Inc., USA.
T. Lindeberg. 1990. Scale-space for discrete signals. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 3 (1990), 234–254.
W. W. Loh. 1995. On the Method of Control Variates. Ph.D. Dissertation. Stanford, CA, USA.
Guillaume Loubet, Nicolas Holzschuch, and Wenzel Jakob. 2019. Reparameterizing discontinuous integrands for differentiable rendering. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 38, 6 (Dec. 2019). https://doi.org/10.1145/3355089.3356510
Jeffrey Lubin. 1995. A Visual Discrimination Model for Imaging System Design and Evaluation. In Vision Models for Target Detection and Recognition, Eli Peli (Ed.). World Scientific Publishing Company, Inc., 245–283.
J. Mannos and D. Sakrison. 1974. The effects of a visual fidelity criterion of the encoding of images. IEEE Transactions on Information Theory 20, 4 (1974), 525–536. https://doi.org/10.1109/TIT.1974.1055250
Rafał Mantiuk, Scott J. Daly, Karol Myszkowski, and Hans-Peter Seidel. 2005. Predicting visible differences in high dynamic range images: model and its calibration. In Human Vision and Electronic Imaging X, Vol. 5666. SPIE, 204–214.
Rafał Mantiuk, Kil Joong Kim, Allan G. Rempel, and Wolfgang Heidrich. 2011. HDR-VDP-2: a calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Transactions on Graphics (Proc. SIGGRAPH) 30, 4 (2011).
Panagiotis Takis Metaxas. 2003. Parallel Digital Halftoning by Error-Diffusion. In Proceedings of the Paris C. Kanellakis Memorial Workshop on Principles of Computing & Knowledge (PCK50). Association for Computing Machinery, New York, NY, USA, 35–41. https://doi.org/10.1145/778348.778355
Don P. Mitchell. 1991. Spectrally Optimal Sampling for Distribution Ray Tracing. SIGGRAPH Comput. Graph. 25, 4 (July 1991), 157–164. https://doi.org/10.1145/127719.122736
T. Mitsa and K. J. Parker. 1991. Digital halftoning using a blue noise mask. In Proceedings of International Conference on Acoustics, Speech, and Signal Processing. 2809–2812, Vol. 4. https://doi.org/10.1109/ICASSP.1991.150986
K. T. Mullen. 1985. The contrast sensitivity of human colour vision to red-green and blue-yellow chromatic gratings. Journal of Physiology 359 (1985), 381–400.
Karol Myszkowski. 1998. The Visible Differences Predictor: applications to global illumination problems. In Rendering Techniques '98, George Drettakis and Nelson Max (Eds.). Springer Vienna, Vienna, 223–236.
R. Näsänen. 1984. Visibility of halftone dot textures. IEEE Transactions on Systems, Man, and Cybernetics SMC-14, 6 (1984), 920–924.
Harald Niederreiter. 1992. Random Number Generation and Quasi-Monte Carlo Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
Victor Ostromoukhov. 2001. A Simple and Efficient Error-Diffusion Algorithm. In Proceedings of SIGGRAPH '01. Association for Computing Machinery, New York, NY, USA, 567–572. https://doi.org/10.1145/383259.383326
Wai-Man Pang, Yingge Qu, Tien-Tsin Wong, Daniel Cohen-Or, and Pheng-Ann Heng. 2008. Structure-Aware Halftoning. In ACM SIGGRAPH 2008 Papers (SIGGRAPH '08). Association for Computing Machinery, New York, NY, USA, Article 89. https://doi.org/10.1145/1399504.1360688
Athanasios Papoulis and S. Unnikrishna Pillai. 2002. Probability, Random Variables, and Stochastic Processes (4th ed.). McGraw Hill.
Thrasyvoulos N. Pappas and David L. Neuhoff. 1999. Least-squares model-based halftoning. IEEE Transactions on Image Processing 8, 8 (Aug 1999), 1102–1116. https://doi.org/10.1109/83.777090
Eli Peli, Jian Yang, and Robert B. Goldstein. 1991. Image invariance with changes in size: the role of peripheral contrast thresholds. J. Opt. Soc. Am. A 8, 11 (1991), 1762–1774.
Matt Pharr, Wenzel Jakob, and Greg Humphreys. 2016. Physically Based Rendering: From Theory to Implementation (3rd ed.). Morgan Kaufmann Publishers Inc.
Mahesh Ramasubramanian, Sumanta N. Pattanaik, and Donald P. Greenberg. 1999. A Perceptually Based Physical Error Metric for Realistic Image Synthesis. In SIGGRAPH '99. 73–82.
Rashid Ansari, Christine Guillemot, and Nasir Memon. 2005. 5.5 – Lossy Image Compression: JPEG and JPEG2000 Standards. In Handbook of Image and Video Processing (Second Edition), Al Bovik (Ed.). Academic Press, 709.
J. G. Robson and Norma Graham. 1981. Probability summation and regional variation in contrast sensitivity across the visual field. Vision Research 21, 3 (1981), 409–418.
Gurprit Singh, Cengiz Oztireli, Abdalla G. M. Ahmed, David Coeurjolly, Kartic Subr, Oliver Deussen, Victor Ostromoukhov, Ravi Ramamoorthi, and Wojciech Jarosz. 2019. Analysis of Sample Correlations for Monte Carlo Rendering. Computer Graphics Forum (Proc. EGSR) 38, 2 (2019).
I. M. Sobol. 1967. The distribution of points in a cube and the approximate evaluation of integrals. U.S.S.R. Comput. Math. and Math. Phys. 7 (1967), 86–112.
Psychology & Neuroscience.
Kevin E. Spaulding, Rodney L. Miller, and Jay S. Schildkraut. 1997. Methods for generating blue-noise dither matrices for digital halftoning. Journal of Electronic Imaging 6, 2 (1997), 208–230. https://doi.org/10.1117/12.266861
J. Stoffel and J. Moreland. 1981. A Survey of Electronic Techniques for Pictorial Image Reproduction. IEEE Transactions on Communications 29, 12 (December 1981), 1898–1925. https://doi.org/10.1109/TCOM.1981.1094946
James R. Sullivan, Lawrence A. Ray, and Rodney Miller. 1991. Design of minimum visual modulation halftone patterns. IEEE Transactions on Systems, Man, and Cybernetics 21, 1 (Jan 1991), 33–38. https://doi.org/10.1109/21.101134
R. Ulichney. 1987. Digital Halftoning. MIT Press, Cambridge, Massachusetts.
Robert A. Ulichney. 1993. Void-and-cluster method for dither array generation. In Proc. SPIE, Vol. 1913. https://doi.org/10.1117/12.152707
Eric Veach. 1998. Robust Monte Carlo Methods for Light Transport Simulation. Ph.D. Dissertation. Stanford, CA, USA. Advisor(s): Leonidas J. Guibas. AAI9837162.
Vladimir Volevich, Karol Myszkowski, Andrei Khodulev, and Edward A. Kopylov. 2000. Using the Visual Differences Predictor to Improve Performance of Progressive Global Illumination Computation. ACM Trans. Graph. 19, 2 (April 2000), 122–161.
G. Westheimer. 1986. The eye as an optical instrument. In Handbook of Perception and Human Performance: 1. Sensory Processes and Perception, K. R. Boff, L. Kaufman, and J. P. Thomas (Eds.). Wiley, New York, 4.1–4.20.
Sophie Wuerger, Maliha Ashraf, Minjung Kim, Jasna Martinovic, María Pérez-Ortiz, and Rafał K. Mantiuk. 2020. Spatio-chromatic contrast sensitivity under mesopic and photopic light levels. Journal of Vision 20, 4 (2020), 23.
J. I. Yellott. 1983. Spectral consequences of photoreceptor sampling in the rhesus retina. Science 221 (1983), 382–385.
Cheng Zhang, Bailey Miller, Kai Yan, Ioannis Gkioulekas, and Shuang Zhao. 2020. Path-Space Differentiable Rendering. ACM Trans. Graph. 39, 6 (2020), 143:1–143:19.
X. Zhang and B. A. Wandell. 1997. A spatial extension of CIELAB for digital color-image reproduction. Journal of the Society for Information Display 5, 1 (1997), 61–63.
M. Zwicker, W. Jarosz, J. Lehtinen, B. Moon, R. Ramamoorthi, F. Rousselle, P. Sen, C. Soler, and S.-E. Yoon. 2015. Recent Advances in Adaptive Sampling and Reconstruction for Monte Carlo Rendering. Computer Graphics Forum 34, 2 (2015), 667–681.
A ERROR DECOMPOSITION
For pixels solving similar light transport integrals, swapping their samples gives a similar result to swapping their estimates:
$Q(\pi(S)) = \pi(Q(S)) + \Delta.$
The error $\Delta$ is zero when the swapped pixels solve the same integral. Substituting into Eq. (9), we can bound the perceptual error using the triangle inequality, the discrete Young convolution inequality [Hewitt and Ross 1994], and the Cauchy–Schwarz inequality:
$E(\pi(S)) = \| g * (\pi(Q(S)) - I + \Delta) \|^2$ (14a)
$= \| g * (\pi(Q(S)) - I) \|^2 + \| g * \Delta \|^2 + 2\,\langle g * (\pi(Q(S)) - I),\; g * \Delta \rangle$ (14b)
$\leq \| g * (\pi(Q(S)) - I) \|^2 + \|g\|_1^2 \|\Delta\|^2 + 2\,\|g\|_1^2 \,\|\pi(Q(S)) - I\| \,\|\Delta\|.$ (14c)
The first summand in Eq. (14b) involves pixel permutations in the readily available estimated image $Q(S)$. In the second and third summands we ideally want an approximation for the terms involving $\Delta$ that forgoes rendering invocations:
$E(\pi(S)) \approx \| g * (\pi(Q(S)) - I) \|^2 + \|g\|_1^2 \sum_i d(i, \pi(i)) = E_\Delta(\pi, S).$ (15)
We approximate the terms involving $\|\Delta\|$ with a dissimilarity metric $d(i, j)$ between any two pixels $i$ and $j$. The purpose of this metric is to predict how different we expect the result of swapping the sample sets of $i$ and $j$ to be, compared to only swapping their pixel values $Q_i$ and $Q_j$. It can be constructed based on assumptions or prior knowledge about the image coming from, e.g., auxiliary buffers (depth, normals, etc.).
In our implementation we use a simple binary dissimilarity function returning zero when $i$ and $j$ are within distance $r$ and infinity otherwise; we set $r$ between 1 and 3. (The radius should ideally depend on the image smoothness/regularity, and can be locally adapted.) This allows us to restrict the search space $\Pi(S)$ only to permutations that swap adjacent pixels, such that it is more likely that $\Delta$ is small. More elaborate heuristics can be devised in the future to better account for pixel (dis)similarity.
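The following is a minimal sketch of this binary dissimilarity and of enumerating the restricted swap candidates it induces; the function and variable names are ours, and the default radius is one assumed value in the stated 1–3 range.

```python
import numpy as np

def dissimilarity(i, j, r=2):
    """Binary dissimilarity d(i, j): 0 for pixels within distance r,
    infinity otherwise (i, j are (row, col) tuples); r in [1, 3]."""
    dy, dx = i[0] - j[0], i[1] - j[1]
    return 0.0 if dy * dy + dx * dx <= r * r else np.inf

def swap_candidates(shape, r=2):
    """All unordered pixel pairs with finite dissimilarity, i.e., the
    transpositions the restricted search space Pi(S) is built from."""
    h, w = shape
    for y in range(h):
        for x in range(w):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y2, x2 = y + dy, x + dx
                    # Lexicographic test visits each pair exactly once.
                    if ((dy, dx) > (0, 0) and 0 <= y2 < h and 0 <= x2 < w
                            and dy * dy + dx * dx <= r * r):
                        yield (y, x), (y2, x2)
```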
B HANDLING BIAS TOWARDS SURROGATE
The trade-off between white noise and bias towards a surrogate can be controlled by modifying the original energy to also include a data term that penalizes large deviations from the initial unbiased white-noise image $Q(S)$ (for a derivation see Section 3 in the supplemental):
$\sqrt{E_c(S')} = c\, \|g\|_1 \|Q(S') - Q(S)\|$ (16)
$\qquad + (1 - c)\, \|g * Q(S') - h * I'\|.$ (17)
Our original formulation is retrieved for $c = 0$, while the initial image is enforced for $c = 1$, and the parameter can be varied in $[0, 1]$ to achieve different levels of bias. The parameter $1 - c$ is a smoothness/bias parameter describing how much one trusts the surrogate (or, equivalently, the quality of the surrogate). For greater control, $c$ may be defined per pixel, in which case it is multiplied componentwise with the vector inside the norm $\|\cdot\|$.
Table 2. MSE and pMSE metrics for various methods on different scenes rendered at 4 samples per pixel. All methods aim to minimize the pMSE. Our methods consistently outperform the current state of the art [Heitz and Belcour 2019]. Notably, every method is superior to the histogram sampling approach, and all of our methods except for dithering are better than the permutation approach. The occasional better performance of horizontal methods can be explained by the fact that they always choose from 4 spp estimates, while vertical methods choose from 1 spp estimates (except for the 2^N variant, which uses estimates with 1–4 spp). The scene for horizontal approaches is kept static and the metrics for the 16th frame are presented. This is the ideal scenario for such methods, so their metrics may even have an unfair advantage over the vertical ones, since generally a scene cannot be assumed to be static in an animation sequence. Each cell lists MSE / pMSE.
Method | Bathroom | Classroom | Grey Room | Living Room | Modern Hall | San Miguel | Staircase | White Room
nonoptimized 1 spp | 3.58 / 8.19 | 7.13 / 16.03 | 11.45 / 6.81 | 5.76 / 12.91 | 11.44 / 4.09 | 6.85 / 18.93 | 18.82 / 7.79 | 5.75 / 18.23
nonoptimized 4 spp | 1.40 / 3.15 | 3.13 / 7.91 | 7.91 / 3.02 | 3.37 / 5.61 | 5.22 / 1.70 | 3.58 / 8.92 | 8.88 / 5.60 | 2.78 / 7.98
Dither (Ours) | 1.31 / 3.31 | 4.36 / 11.63 | 8.46 / 5.07 | 2.27 / 4.43 | 5.25 / 1.80 | 3.74 / 11.19 | 7.80 / 5.36 | 2.51 / 7.95
Error diffusion / Iterative vertical (Ours) | 2.32 / 2.02 | 6.00 / 6.10 | 9.07 / 2.97 | 4.32 / 1.86 | 7.15 / 1.29 | 5.51 / 7.05 | 10.50 / 4.45 | 3.98 / 5.00
Iterative vertical 2^N (Ours) | 1.26 / – | – | – | – | – | – | – | –
Histogram [2019] | 3.58 / 6.29 | 7.11 / 13.08 | 11.49 / 6.67 | 5.75 / 9.88 | 11.43 / 3.60 | 6.84 / 16.52 | 18.90 / 6.69 | 5.75 / 14.09
Relocation (Ours, frame 16) | 1.52 / 2.06 | 3.83 / 5.31 | 8.34 / – | – | – | – | – | –
Table 3. Timings (in seconds) for the minimization of the energy for various methods on different scenes rendered at 4 samples per pixel. Note that the timings do not include the construction of the surrogate. Our error diffusion approach is consistently the fastest method, since it does not require any sorting. Our dithering approach is comparable to Heitz and Belcour's approaches in terms of speed, since those use a similar minimization strategy relying on sorting. Finally, our iterative minimization methods are the slowest. We note that those were not optimized; based on [Analoui and Allebach 1992; Koge et al. 2014] a substantial speed-up can be expected (for larger kernels and images the speed-up can be even larger). Additional optimization strategies are discussed in the supplemental.
Method | Bathroom | Classroom | Grey Room | Living Room | Modern Hall | San Miguel | Staircase | White Room
Dither (Ours) | 0.07 | 0.08 | 0.07 | 0.07 | 0.02 | 0.12 | 0.09 | 0.07
Error diffusion (Ours) | 0.04 | 0.03 | 0.04 | 0.04 | 0.01 | 0.06 | 0.04 | 0.04
Iterative vertical (Ours) | 18.44 | 111.41 | 12.82 | 15.26 | 5.43 | 29.09 | 15.21 | 19.45
Iterative vertical 2^N (Ours) | 95.09 | 404.12 | 59.69 | 83.41 | 23.93 | 137.89 | 35.39 | 102.05
Histogram [2019] | 0.06 | 0.07 | 0.11 | 0.06 | 0.02 | 0.09 | 0.08 | 0.06
Relocation (Ours, frame 16) | 23.04 | 21.57 | 22.00 | 30.08 | 8.48 | 36.36 | 23.78 | 22.76
Permutation [2019] (frame 16) | 0.10 | 0.10 | 0.10 | 0.11 | 0.03 | 0.21 | 0.10 | 0.14
Table 4. MSE and pMSE metrics for various methods on different scenes rendered at 16 samples per pixel. All methods aim to minimize the pMSE. Our methods consistently outperform the current state of the art [Heitz and Belcour 2019]. Notably, every method is superior to the histogram sampling approach. Each cell lists MSE / pMSE.
Method | Bathroom | Classroom | Grey Room | Living Room | Modern Hall | San Miguel | Staircase | White Room
nonoptimized 4 spp | 14.03 / 3.13 | 3.15 / 7.92 | 7.88 / 3.02 | 3.38 / 5.62 | 5.24 / 17.06 | 3.53 / 8.73 | 7.18 / 4.53 | 2.78 / 7.96
nonoptimized 16 spp | 4.94 / 1.47 | 1.55 / 4.89 | – | – | – | – | – | –
Vertical (Ours) | 9.03 / – | – | – | – | – | – | – | –
Histogram [2019] | 13.98 / 2.37 | 3.12 / 6.20 | 7.88 / 2.72 | 3.36 / 3.57 | 5.23 / 14.78 | 3.52 / 6.82 | 7.13 / 4.09 | 2.77 / 5.77
Perceptual Error Optimization for Monte Carlo Rendering: Supplemental document
VASSILLEN CHIZHOV,
MIA Group Saarland University, Max-Planck-Institut für Informatik, Germany
ILIYAN GEORGIEV,
Autodesk, United Kingdom
KAROL MYSZKOWSKI,
Max-Planck-Institut für Informatik, Germany
GURPRIT SINGH,
Max-Planck-Institut für Informatik, Germany
Additional Key Words and Phrases: path tracing, perceptual error, blue noise, sampling
In the current supplemental we discuss various details related to our general formulation from the main paper. We start with a description of the extension of our framework to the a-priori setting (Section 2). Then we show how the reference-image substitution with a surrogate can be translated to a more general problem (Section 3), which also allows optimizing how closely the result fits the surrogate. In Section 4 we describe a way in which textures can be accounted for in our horizontal approach, so that mispredictions due to multiplicative (and additive) factors are eliminated. In Section 5 we describe ways in which the runtime of iterative energy minimization methods can be improved considerably. Notably, an expression is derived that allows the energy difference due to trial swaps to be evaluated in constant time (no scaling with image size or kernel size). In the remaining sections we analyze how current a-posteriori [Heitz and Belcour 2019] and a-priori [Georgiev and Fajardo 2016; Heitz et al. 2019] state-of-the-art approaches can be related to our framework. Interpretations are discussed, major sources of error are identified, and the assumptions of the algorithms are made explicit.
We extend our theory to the a-priori setting and discuss the main factors affecting the quality. The quality of a-priori approaches is determined mainly by three factors: the energy, the search space, and the optimization strategy. We discuss each of those briefly in the following paragraphs.
Our energy.
We extend the a-posteriori energy from the main paper in order to handle multiple estimators involving different integrands, $Q_1, \ldots, Q_T$, with associated weights $w_1, \ldots, w_T$:
$E(S) = \sum_{t=1}^{T} w_t \,\| g * Q_t(S) - I_t \|^2.$ (1)
In the above, $g$ would typically be a low-pass kernel (e.g., a Gaussian), and $I_t$ is the integral of the function used in the estimator $Q_t$. Through this energy a whole set of functions can be optimized for, in order for the sequence to be more robust to different scenes and estimators that do not fit any of the considered integrands exactly. We note that the derived optimization in Section 5 is also applicable to the minimization of the proposed energy. A sketch of evaluating this energy is given below.
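The following is a minimal sketch of evaluating Eq. (1) for a set of test integrands, assuming a normalized Gaussian for $g$ and a squared $\mathcal{L}_2$ norm; the array layout and all names are our assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def apriori_energy(samples, integrands, integrals, weights, sigma=1.0):
    """E(S) = sum_t w_t || g * Q_t(S) - I_t ||^2  (Eq. 1), with g a
    normalized Gaussian. `samples` is an (H, W, n, d) array of per-pixel
    sample sets; each integrand maps samples to (H, W, n) values, and
    Q_t averages them over every pixel's sample set."""
    energy = 0.0
    for f, I_t, w_t in zip(integrands, integrals, weights):
        Q_t = f(samples).mean(axis=2)             # (H, W) MC estimates
        blurred = gaussian_filter(Q_t - I_t, sigma)
        energy += w_t * np.sum(blurred ** 2)
    return energy
```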
Search space.
The search space plays an important role in the qualities that the optimized sequences exhibit. A more restricted search space provides more robustness and may help avoid overfitting to the considered set of integrands.
For instance, sample sets may be generated randomly within each pixel. Then their assignment to pixels may be optimized over the space of all possible permutations. This is the setting of horizontal methods. If additionally this assignment is done within each dimension separately, it allows for an even better fit to the integrands in the energy. The scrambling keys' search space in [Heitz et al. 2019] is a special case of the latter applied to the Sobol sequence.
Constraining the search space to points generated from low-discrepancy sequences provides further robustness and guarantees desirable integration properties of the considered sequences. Similarly to [Heitz et al. 2019], we can consider a search space of Sobol scrambling keys in order for the optimized sequence to have low discrepancy.
Ideally, such integration properties should arise directly from the energy. However, in practice the scene integrand cannot be expected to exactly match the set of considered integrands, so extra robustness is gained through the restriction. Additionally, optimizing for many dimensions at the same time is costly, as noted in [Heitz et al. 2019], so imposing low-discrepancy properties also helps in that regard.
Finally, strict search-space constraints impose a severe restriction on the distribution of the error. This can be alleviated by imposing the restrictions through soft penalty terms in the energy, which allows, for example, a trade-off between blue-noise distribution and integration quality.
Progressive rendering.
In order to make the sequence applicable in progressive rendering, subsets of samples should be considered in the optimization. Given a sample set $S_i$ for pixel $i$, we can decompose it into nested sample sets of $1, \ldots, N$ samples: $S_{i,1} \subset \ldots \subset S_{i,N} \equiv S_i$. We denote the respective images of sample sets by $S_1, \ldots, S_N$. Then an energy that also optimizes for the distribution of the error at each sample count is:
$E(S) = \sum_{t=1}^{T} \sum_{k=1}^{N} w_{t,k} \,\| g * Q_t(S_k) - I_t \|^2.$ (2)
If the $w_{t,k}$ are set to zero for $k < N$, then the original formulation is recovered. The more general formulation imposes additional constraints on the samples; thus the quality at the full sample count may be compromised if we also require good quality at lower sample counts. A sketch of this progressive energy is given below.
Choosing samples from $S_i$ for $S_{i,1}, \ldots, S_{i,N-1}$ (in each dimension) constitutes a vertical search space analogous to the one discussed in the main paper for a-posteriori methods. The ranking keys' optimisation in [Heitz et al. 2019] is a special case of this search space for the Sobol sequence.
Adaptive sampling can be handled by allowing a varying number of samples per pixel, and a corresponding energy derived from the one above. Note that this poses further restrictions on the achievable distribution.
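A sketch of the progressive energy of Eq. (2), under the same assumptions as before; prefixes of each pixel's sample array realize the nested sets $S_{i,1} \subset \ldots \subset S_{i,N}$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def progressive_energy(samples, integrands, integrals, weights, sigma=1.0):
    """Eq. (2): penalize the perceptual error at every prefix sample
    count. `samples` is (H, W, N, d); prefix k uses the first k samples.
    `weights[t][k]` weights integrand t at sample count k + 1."""
    H, W, N, _ = samples.shape
    energy = 0.0
    for t, (f, I_t) in enumerate(zip(integrands, integrals)):
        vals = f(samples)                              # (H, W, N) values
        prefix_means = np.cumsum(vals, axis=2) / np.arange(1, N + 1)
        for k in range(N):
            blurred = gaussian_filter(prefix_means[..., k] - I_t, sigma)
            energy += weights[t][k] * np.sum(blurred ** 2)
    return energy
```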
Optimization strategies.
Typically, the energies for a-priori methods have been optimized through simulated annealing [Georgiev and Fajardo 2016; Heitz et al. 2019]. Metaheuristics can lead to very good minima, especially if the runtime is not of great concern, which is the case here since the sequences are precomputed. Nevertheless, the computation still needs to be tractable, and the energies in previous works are generally not cheap to evaluate. On the other hand, our energies, especially with the optimizations in Section 5, can be evaluated very efficiently. This is beneficial for keeping the runtime of metaheuristics manageable, allowing for more complex search spaces to be considered.
Implementation details.
Implementation decisions for a renderer, such as how samples are consumed or how they are mapped to the hemisphere and to light sources, affect the estimator $Q$. This is important especially when choosing $Q$ for the described energies to optimize a sequence: very small implementation changes can make a previously ideal sequence useless for a specific renderer. It is important to keep this in mind both when optimising sequences using the proposed energies and when using those sequences in a renderer.
For practical algorithms we substitute the reference image $I$ with a surrogate $I'$. It is instructive to consider error bounds in relation to that, and subsequently extend the formulation to one that allows control over the amount of bias towards the surrogate. We denote $Q = Q(S)$ and $Q' = Q(S')$, where we want to optimize for the latter while $Q$ remains fixed:
$\|g * Q' - h * I\| = \|g * Q' - h * I' + h * (I' - I)\| \leq \|g * Q' - h * I'\| + \|h * (I' - I)\|.$ (3)
The above bound results in the original formulation when considering only the first term. We can also bound the error through a term involving $Q$:
$\|g * Q' - h * I\| = \|g * (Q' - Q) + (g * Q - h * I)\| \leq \|g\|_1 \|Q' - Q\| + \|g * Q - h * I\|.$ (4)
Combining the two with weights $c$ and $(1 - c)$, for $c \in [0, 1]$, yields:
$\|g * Q' - h * I\| \leq c\,\|g\|_1 \|Q' - Q\| + (1 - c)\,\|g * Q' - h * I'\| + c\,\|g * Q - h * I\| + (1 - c)\,\|h * (I' - I)\|.$ (5)
We assume $g$, $h$, $I$, $I'$, $Q$, and $c$ to be fixed, and our optimization variable to be $Q'$. Then the optimization problem simplifies to:
$\min_{Q'} \; c\,\|g\|_1 \|Q' - Q\| + (1 - c)\,\|g * Q' - h * I'\|.$ (6)
The standard energy that we use is retrieved for $c = 0$:
$\arg\min_{Q'} \|g * Q' - h * I'\|.$ (7)
Our iterative energy minimization algorithms (Alg. 1, Alg. 2) directly work with the original energy formulation, unlike error diffusion and dither-matrix halftoning, which only approximately minimize our energy. This allows textures to be handled more robustly compared to the permutation approach of Heitz and Belcour. A sketch of the bias-controlled objective in Eq. (6) is given below.
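A minimal sketch of the bias-controlled objective of Eq. (6), assuming a normalized Gaussian kernel (so $\|g\|_1 = 1$) and approximating $h * I'$ by blurring the surrogate with the same kernel; all names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def biased_objective(Q_new, Q_init, surrogate, c, sigma_g=1.0):
    """c ||g||_1 ||Q' - Q|| + (1 - c) ||g * Q' - h * I'||   (Eq. 6).
    c = 0 fits the surrogate; c = 1 keeps the initial unbiased image."""
    data_term = np.linalg.norm(Q_new - Q_init)          # ||Q' - Q||
    fit_term = np.linalg.norm(gaussian_filter(Q_new, sigma_g)
                              - gaussian_filter(surrogate, sigma_g))
    return c * data_term + (1.0 - c) * fit_term
```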
Reducing misprediction errors.
Our horizontal approach relies on a dissimilarity metric $d(\cdot, \cdot)$ which approximates the terms involving the difference $\Delta$ due to swapping sample sets instead of pixel values. This difference can be decreased, so that $d$ is a better approximation, if additional information is factored out in the energy: screen-space-varying multiplicative and additive terms. Specifically, if we have a spatially varying multiplicative image $\alpha$ and a spatially varying additive image $\beta$:
$Q = \alpha \odot Q' + \beta$ (8)
$\Delta'(\pi) = \alpha \odot Q'(\pi(S)) - \alpha \odot \pi(Q'(S))$ (9)
$\Delta(\pi) = Q(\pi(S)) - \pi(Q(S)) = \alpha \odot Q'(\pi(S)) + \beta - \pi(\alpha \odot Q'(S) + \beta),$ (10)
we can make use of this in our formulation:
$E(\pi) = \|g * Q(\pi(S)) - h * I\|^2$ (11)
$\sqrt{E(\pi)} \leq \|g * (\alpha \odot \pi(Q'(S)) + \beta) - h * I\| + \|g\|_1 \|\Delta'\|.$ (12)
Contrast this with the original formulation, where $\alpha$ and $\beta$ are not factored out:
$\sqrt{E(\pi)} \leq \|g * \pi(\alpha \odot Q'(S) + \beta) - h * I\| + \|g\|_1 \|\Delta\|.$ (13)
With the new formulation it is sufficient that $Q'(\pi(S)) = \pi(Q'(S))$ for $\Delta'$ to be zero, while originally both $\alpha$ and $\beta$ play a role in $\Delta$ becoming zero. Intuitively, this means that screen-space integrand differences due to additive and multiplicative factors do not result in mispredictions with the new formulation, if the integrand is assumed to be the same (locally) in screen space. A small sketch of the re-modulated energy term follows below.
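The following sketch evaluates the first term of the factored bound in Eq. (12): the permutation is applied to the demodulated image and the result is re-modulated with $\alpha$ and $\beta$ before convolving. The variable names, and the Gaussian standing in for $g$ (with a blurred target standing in for $h * I$), are our assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def factored_energy(perm, Q_demod, albedo, beta, target, sigma=1.0):
    """|| g * (alpha ⊙ pi(Q') + beta) - (h * I) ||, first term of Eq. (12).
    `perm` is an (H*W,) index array describing pi; `target` stands in
    for h * I. A sketch, not the paper's implementation."""
    H, W = Q_demod.shape
    permuted = Q_demod.reshape(-1)[perm].reshape(H, W)   # pi(Q'(S))
    remodulated = albedo * permuted + beta               # alpha ⊙ pi(Q') + beta
    return np.linalg.norm(gaussian_filter(remodulated, sigma) - target)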
Comparison to demodulation.
Input | Ours | Heitz and Belcour [2019]: no demodulation, demodulation w/ tile size 8, demodulation w/o tiling (rows: multiplicative, additive)
Fig. 1. We demonstrate the importance of the extension presented in Section 4. A high-frequency sinusoidal texture is corrupted by white noise (leftmost column) multiplicatively (top row) and additively (bottom row). Contrary to Heitz and Belcour's method, our optimization distributes error as a high-quality blue-noise distribution (see the power-spectrum insets). The reference images for the top/bottom rows are respectively a flat grey and a sinusoidal image.
In the method of Heitz and Belcour the permutation is applied to the albedo-demodulated image. This preserves the property that the global minimum of the implicit energy can be found through sorting. Translated to our framework,
this can be formulated as ($B$ is a blue-noise mask optimized for a kernel $g$):
$E_{HBP}(\pi) = \|\pi(Q'(S)) - I' - B\|^2 \approx \|g * \pi(Q'(S)) - g * I'\|^2.$ (14)
We have assumed that $\beta$ is zero, but we can also extend the method to handle an additive term $\beta$ as in our case. The more important distinction is that while the albedo-demodulated image $Q'$ is used in the permutation, it is never re-modulated (the $\alpha \odot \cdot$ factor is missing). Thus, this does not allow for proper handling of textures, even if it allows for modest improvements in practice. An example of a fail case is an image $\alpha$ that is close to white noise: the error distribution will then also be close to white noise due to the missing $\alpha \odot \cdot$ factor. More precisely, even if $\pi(Q'(S)) - I'$ is distributed as $B$, this does not imply that $\alpha \odot \pi(Q'(S)) - I'$ will be distributed similarly. Dropping $\alpha \odot \cdot$ is, however, a reasonable option if one is restricted to sorting as an optimisation strategy.
We propose a modification of the original approach (and energy) such that not only the demodulated estimator values are used, but the blue-noise mask $B$ is also demodulated. To better understand how it is derived (and how $\beta$ may be integrated), we study a bound based on the assumption that $\alpha_i \in [0, 1]$ and $\Delta' = 0$:
$E(\pi) = \|g * (\alpha \odot \pi(Q'(S)) + \beta) - g * I'\|^2 \approx$ (15)
$\|\alpha \odot \pi(Q'(S)) + \beta - I' - B\|^2 =$ (16)
$\sum_i \alpha_i^2 \big( (\pi(Q'(S)))_i + (\beta_i - I'_i - B_i)/\alpha_i \big)^2 \leq$ (17)
$\big\| \pi(Q'(S)) + (\beta - I' - B)/\alpha \big\|^2.$ (18)
The global minimum of the last energy (w.r.t. $\pi$) can also be found through sorting, since there is no spatially varying multiplicative factor $\alpha$ in front of the permutation.
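Since Eq. (18) has no spatially varying factor in front of the permutation, its global minimum can be found by matched sorting. Below is a sketch under our reconstruction of the demodulated target; the names are ours and $\alpha$ is assumed nonzero everywhere.

```python
import numpy as np

def sort_assignment(values, targets):
    """Minimize || pi(values) - targets ||^2 over permutations pi via
    matched sorting (rearrangement inequality): the k-th smallest value
    goes to the position of the k-th smallest target."""
    perm = np.empty(values.size, dtype=np.int64)
    perm[np.argsort(targets, kind="stable")] = np.argsort(values, kind="stable")
    return perm  # values[perm][i] is the value assigned to pixel i

def demodulated_targets(surrogate, mask, albedo, beta):
    """Per-pixel targets (I' + B - beta) / alpha implied by Eq. (18)."""
    return ((surrogate + mask - beta) / albedo).reshape(-1)
```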
Sinusoidal Textures.
To demonstrate texture handling (the multiplicative term $\alpha$), in the top row of Fig. 1 a white-noise texture $W$ is multiplied with a sine-wave input signal, $f(x, y) = c_1 (c_2 + \sin(x + y))\, W(x, y)$ for fixed scale constants $c_1, c_2$. The reference is a constant image at 0.5. Heitz and Belcour proposed to handle such textures by applying their method to the albedo-demodulated image. While this strategy may lead to a modest improvement, it ignores the fact that the final image is produced by re-modulating the albedo, which can negate that improvement. Instead, our horizontal iterative minimization algorithm can incorporate the albedo explicitly using the discussed energy.
The bottom row demonstrates the effect of a non-flat signal on the error distribution (the additive term $\beta$). Here $W$ is added to the sine-wave input signal, $f(x, y) = c_1 (c_2 + \sin(x + y)) + W(x, y)$, and the reference image is the corresponding noise-free sinusoidal image. Our optimization is closer to the reference, suggesting that our method can greatly outperform the current state of the art by properly accounting for auxiliary information, especially in regions with high-frequency textures.
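For reference, a sketch generating the two test inputs; the scale constants and the sine frequency are placeholders, since the exact values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def test_images(n=256, a=0.25, b=1.0):
    """Multiplicative / additive sinusoidal test inputs of Fig. 1 in this
    supplemental. The constants a, b and the frequency are assumed values."""
    x, y = np.meshgrid(np.arange(n), np.arange(n))
    base = a * (b + np.sin(0.25 * (x + y)))   # sine-wave input signal
    W = rng.random((n, n))                    # white-noise texture W
    multiplicative = base * W                 # f = a (b + sin(x+y)) W
    additive = base + W                       # f = a (b + sin(x+y)) + W
    return multiplicative, additive
```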
Dimensional decomposition.
The additive factor $\beta$ can be used to motivate splitting the optimization over several dimensions, since the Liouville–Neumann expansion of the rendering equation is additive [Kajiya 1986]. If some dimensions are smooth (e.g., lower dimensions), then a screen-space local integrand-similarity assumption can be encoded in $d(\cdot, \cdot)$, and it will approximate $\Delta$ better for smoother dimensions. If the optimization is applied over all dimensions at the same time, this may result in many mispredictions due to the assumption being violated for dimensions in which the integrand is less smooth in screen space (e.g., higher dimensions). We propose splitting the optimization problem: starting from lower dimensions and sequentially optimizing higher dimensions, while encoding a local smoothness (in screen space) assumption on the integrand in $d(\cdot, \cdot)$ (e.g., swaps limited to a small neighbourhood around the pixel). This requires solving several optimization problems, but potentially reduces the number of mispredictions. Note that it does not require more rendering operations than usual.
The main cost of iterative minimization methods is computing the energy for each trial swap, more specifically the required convolution and the subsequent norm computation. In the work of Analoui and Allebach an optimisation has been proposed to efficiently evaluate such trial swaps, without recomputing a convolution or norm at each step, yielding a speed-up of more than 10 times. The optimisation relies on the assumption that the kernel $g$ is the same across screen space (it is not applicable to spatially varying kernels). We extend the described optimisation to a more general case, including spatially varying kernels. We also note some details not mentioned in the original paper.
We will assume the most general case: instead of just swapping pixels, we consider swapping sample sets from which values are generated through $Q$. This subsumes both swapping pixel values and swapping pixel values in the presence of a multiplicative factor $\alpha$.
Single swap.
The main goal is to evaluate the change of the energy $\delta$ due to a swap between the sample sets of some pixels $a, b$. More precisely, if the original sample-set image is $S$, then the new sample-set image is $S'$ such that $S'_a = S_b$, $S'_b = S_a$, and $S'_i = S_i$ everywhere else. This corresponds to images $Q = Q(S)$ and $Q' = Q(S')$, which differ only in the pixels with indices $a$ and $b$. Let:
$\delta_a = Q'_a - Q_a = Q_a(S_b) - Q_a(S_a)$ (19)
$\delta_b = Q'_b - Q_b = Q_b(S_a) - Q_b(S_b).$ (20)
We will also denote the convolved images $\tilde{Q} = g * Q$ and $\tilde{Q}' = g * Q'$, and the error image $\epsilon = \tilde{Q} - I$. Specifically:
$\tilde{Q}_i = \sum_{j \in \mathbb{Z}^2} Q_j\, g_{i-j}, \qquad \tilde{Q}'_i = \tilde{Q}_i + \delta_a g_{i-a} + \delta_b g_{i-b}.$ (21)
We want to be able to efficiently evaluate $\delta = \|\tilde{Q}' - I\|^2 - \|\tilde{Q} - I\|^2$, since in the iterative minimization algorithms the candidate with the minimum $\delta$ is kept. Using the above expression for $\tilde{Q}'_i$, we rewrite $\delta$ as:
$\delta = \|\tilde{Q}' - I\|^2 - \|\tilde{Q} - I\|^2$ (22)
$= \sum_{i \in \mathbb{Z}^2} \| \tilde{Q}_i - I_i + \delta_a g_{i-a} + \delta_b g_{i-b} \|^2 - \|\tilde{Q} - I\|^2$ (23)
$= 2 \sum_{i \in \mathbb{Z}^2} \langle \tilde{Q}_i - I_i,\; \delta_a g_{i-a} + \delta_b g_{i-b} \rangle + \sum_{i \in \mathbb{Z}^2} \| \delta_a g_{i-a} + \delta_b g_{i-b} \|^2$ (24)
$= 2 \langle \delta_a, \textstyle\sum_i \epsilon_i g_{i-a} \rangle + 2 \langle \delta_b, \textstyle\sum_i \epsilon_i g_{i-b} \rangle + \langle \delta_a^2, \textstyle\sum_i g_{i-a}^2 \rangle + \langle \delta_b^2, \textstyle\sum_i g_{i-b}^2 \rangle + 2 \langle \delta_a \delta_b, \textstyle\sum_i g_{i-a} g_{i-b} \rangle$ (25)
$= 2 \langle \delta_a, C_{g,\epsilon}(a) \rangle + 2 \langle \delta_b, C_{g,\epsilon}(b) \rangle + \langle \delta_a^2 + \delta_b^2, C_{g,g}(0) \rangle + 2 \langle \delta_a \delta_b, C_{g,g}(b - a) \rangle,$ (26)
where $C_{f,h}(x) = \sum_{i \in \mathbb{Z}^2} f(i - x)\, h(i)$ is the cross-correlation of $f$ and $h$. We have thus reduced the computation of $\delta$ to a sum of only four terms. Assuming that $C_{g,g}$ is known (it can be precomputed once for a known kernel) and that $C_{g,\epsilon}$ is known (it can be recomputed after a sufficient number of swaps have been accepted), evaluating a trial swap takes constant time: it does not scale with the size of the image or the size of the kernel.
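A greyscale sketch of this constant-time trial-swap evaluation, including the local update of $C_{g,\epsilon}$ from Eq. (30) below; it assumes an odd-sized, symmetric kernel (so convolution and correlation coincide) and the "leaky" boundary handling discussed later. The class and variable names are ours.

```python
import numpy as np
from scipy.ndimage import correlate
from scipy.signal import correlate as xcorr

class SwapEnergy:
    """O(1) trial-swap evaluation for E = ||g * Q - I||^2 (Eqs. 19-32)."""

    def __init__(self, Q, I, g):
        self.Q = Q.astype(np.float64)
        eps = correlate(self.Q, g, mode="constant") - I    # eps = g * Q - I
        self.C_geps = correlate(eps, g, mode="constant")   # C_{g,eps}(a)
        self.C_gg = xcorr(g, g, mode="full")               # autocorrelation
        self.R = self.C_gg.shape[0] // 2                   # index of C_gg(0)

    def delta_swap(self, a, b, da, db):
        """Energy change of swapping pixels a, b with value changes
        da = Q_a(S_b) - Q_a(S_a), db = Q_b(S_a) - Q_b(S_b)  (Eq. 26)."""
        oy, ox = b[0] - a[0] + self.R, b[1] - a[1] + self.R
        if 0 <= oy < self.C_gg.shape[0] and 0 <= ox < self.C_gg.shape[1]:
            cgg_ab = self.C_gg[oy, ox]
        else:
            cgg_ab = 0.0                  # C_gg vanishes outside its support
        cgg0 = self.C_gg[self.R, self.R]
        return (2 * da * self.C_geps[a] + 2 * db * self.C_geps[b]
                + (da * da + db * db) * cgg0 + 2 * da * db * cgg_ab)

    def accept_swap(self, a, b, da, db):
        """Apply the swap; update C_{g,eps}(x) += da C_gg(x-a) + db C_gg(x-b)."""
        self.Q[a] += da
        self.Q[b] += db
        for p, d in ((a, da), (b, db)):
            self._bump(p, d)

    def _bump(self, p, d):
        H, W = self.C_geps.shape
        R = self.R
        y0, y1 = max(0, p[0] - R), min(H, p[0] + R + 1)
        x0, x1 = max(0, p[1] - R), min(W, p[1] + R + 1)
        self.C_geps[y0:y1, x0:x1] += d * self.C_gg[
            y0 - p[0] + R: y1 - p[0] + R, x0 - p[1] + R: x1 - p[1] + R]
```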
Multiple accepted swaps.
It may be desirable to avoid recomputing $C_{g,\epsilon}$ even upon accepting a trial swap. For that purpose we extend the strategy from [Analoui and Allebach 1992] for computing $C_{g,\epsilon^n}$, where $\epsilon^n$ is the error image after $n$ swaps have been accepted:
$\{ (\delta_{a_1}, \delta_{b_1}), \ldots, (\delta_{a_n}, \delta_{b_n}) \}.$ (27)
This implies $\tilde{Q}^n_i = \tilde{Q}_i + \sum_{k=1}^{n} (\delta_{a_k} g_{i-a_k} + \delta_{b_k} g_{i-b_k})$, and consequently:
$C_{g,\epsilon^n}(x) = \sum_{i \in \mathbb{Z}^2} \big( \tilde{Q}_i - I_i + \sum_{k=1}^{n} (\delta_{a_k} g_{i-a_k} + \delta_{b_k} g_{i-b_k}) \big)\, g_{i-x}$ (28, 29)
$= C_{g,\epsilon}(x) + \sum_{k=1}^{n} \big( \delta_{a_k} C_{g,g}(x - a_k) + \delta_{b_k} C_{g,g}(x - b_k) \big).$ (30)
This allows avoiding the recomputation of $C_{g,\epsilon}$ after every accepted swap; instead, the delta for the $(n{+}1)$-th trial swap $(\delta_a, \delta_b)$ is:
$\delta^{n+1} = \|\tilde{Q}^{n+1} - I\|^2 - \|\tilde{Q}^n - I\|^2$ (31)
$= 2 \langle \delta_a, C_{g,\epsilon^n}(a) \rangle + 2 \langle \delta_b, C_{g,\epsilon^n}(b) \rangle + \langle \delta_a^2 + \delta_b^2, C_{g,g}(0) \rangle + 2 \langle \delta_a \delta_b, C_{g,g}(b - a) \rangle,$ (32)
where $C_{g,\epsilon^n}$ is computed from $C_{g,\epsilon}$ and $C_{g,g}$ as derived in Eq. (30). This computation scales only with the number of accepted swaps since the last recomputation of $C_{g,\epsilon}$. We also note that $C_{g,g}(x - y)$ evaluates to zero if $x - y$ is outside of the support of $C_{g,g}$; additional optimisations have been devised based on this fact [Analoui and Allebach 1992].
In the vertical setting, swaps happen only within the pixel itself, that is, $\delta_a = Q_a(S'_a) - Q_a(S_a)$, and consequently $\tilde{Q}'_i = \tilde{Q}_i + \delta_a g_{i-a}$. Computing the difference in the energies for the $(n{+}1)$-th trial update gives:
$\delta^{n+1} = \|\tilde{Q}^{n+1} - I\|^2 - \|\tilde{Q}^n - I\|^2 = 2 \langle \delta_a, C_{g,\epsilon^n}(a) \rangle + \langle \delta_a^2, C_{g,g}(0) \rangle.$ (33–37)
The corresponding expression for $C_{g,\epsilon^n}$ is:
$C_{g,\epsilon^n}(x) = C_{g,\epsilon}(x) + \sum_{k=1}^{n} \delta_{a_k} C_{g,g}(x - a_k).$ (38)
If the search space is ignored and the formulation is analyzed in an abstract setting, it becomes obvious that the vertical approach corresponds to an update of a single pixel, while the horizontal approach corresponds to an update of two pixels at the same time. This can be generalized further. Let $N$ different pixels be updated per trial, and let $n$ trials have been accepted since $C_{g,\epsilon}$ was last updated. Let the pixels to be updated in the current trial be $a^{n+1}_1, \ldots, a^{n+1}_N$, and let the accepted update at step $k$ be at pixels $a^k_1, \ldots, a^k_N$. Let $Q^0 = Q$ be the original image. We define the sequence of images $Q^k$ with $Q^k_i = Q^{k-1}_i$ for $i \notin \{a^k_1, \ldots, a^k_N\}$ and the updated values $Q^k_{a^k_j}$ given otherwise, and let $\delta^k_j = Q^k_{a^k_j} - Q^{k-1}_{a^k_j}$. Using the above notation we arrive at an expression for $C_{g,\epsilon^n}$:
$C_{g,\epsilon^n}(x) = C_{g,\epsilon}(x) + \sum_{k=1}^{n} \sum_{j=1}^{N} \delta^k_j\, C_{g,g}(x - a^k_j).$ (39)
The change in the energy due to the $(n{+}1)$-th trial update is:
$\delta^{n+1} = \|\tilde{Q}^{n+1} - I\|^2 - \|\tilde{Q}^n - I\|^2$ (40)
$= 2 \sum_{j=1}^{N} \langle \delta^{n+1}_j,\, C_{g,\epsilon^n}(a^{n+1}_j) \rangle + \sum_{j=1}^{N} \sum_{k=1}^{N} \langle \delta^{n+1}_j \delta^{n+1}_k,\, C_{g,g}(a^{n+1}_j - a^{n+1}_k) \rangle.$ (41–44)
Leaky energy.
Similar to the original paper [Analoui and Allebach 1992], in our extension $\delta$ was computed for a "leaky" energy, which extends the support of the image by the convolution; this is reflected in the fact that the sums run over all of $\mathbb{Z}^2$. To rectify this, the sums need to be limited to the support of $I$. This requires clamped sums of the cross-correlation, which can also be precomputed but require extra memory. The same holds for the cross-correlation with $\epsilon$, where clamped terms are required near the image boundary.
Reflecting boundary conditions.
Another desirable property may be a convolution that acts on the image extended by reflection at the boundaries, which avoids artifacts near the borders. This can be achieved by including the relevant terms for pixels for which the kernel is partially outside of the support of $I$. Care must be taken when expressing $\tilde{Q}_i$, however, since it may include the same updated pixel several times (especially near the border). The same ideas apply to a toroidally extended convolution.
Further optimisations.
Various other strategies have been proposed in the literature for improving the runtime of iterative error minimization approaches for halftoning. In our algorithms we usually use a randomized initial state; however, it is possible to initialize the algorithms with the result of a dither-matrix halftoning or error diffusion algorithm, which results in faster convergence [Analoui and Allebach 1992]. Another strategy involves partitioning the image into blocks: instead of updating the pixels in raster or serpentine order, the blocks are updated simultaneously by keeping only the best update per block in each iteration. This has been reported to run 10+ times faster [Lieberman and Allebach 1997]. In the same paper, approximating the kernel with box functions has been proposed, yielding a speed-up of 6 times. Similarly, if the kernel is separable or can be approximated by a separable kernel, the convolution can also be made considerably faster. An additional speed-up of 30 times has been reported in [Koge et al. 2014] through the use of a GPU. Finally, several heuristics related to the order in which pixels are iterated over have been proposed in [Bhatt et al. 2006].
We propose an optimisation for spatially varying kernels as well. Let kernel $g_i$ be associated with pixel $i$, let pixel $a$ be updated to a new value $Q'_a$ while the images match everywhere else ($Q'_i = Q_i$), and let $\delta_a = Q'_a - Q_a$. We denote $\tilde{Q}_i = \langle g_i, Q \rangle$ and $\tilde{Q}'_i = \langle g_i, Q' \rangle = \tilde{Q}_i + g_{i,a} \delta_a$. Our goal is to evaluate the change in the energy due to the update:
$\delta = \|\tilde{Q}' - I\|^2 - \|\tilde{Q} - I\|^2$ (45)
$= \sum_{i \in \mathbb{Z}^2} \| \tilde{Q}_i - I_i + g_{i,a} \delta_a \|^2 - \|\tilde{Q} - I\|^2$ (46)
$= 2 \sum_{i \in \mathbb{Z}^2} \langle \epsilon_i, g_{i,a} \delta_a \rangle + \sum_{i \in \mathbb{Z}^2} \| g_{i,a} \delta_a \|^2$ (47)
$= 2 \langle \delta_a, \textstyle\sum_i \epsilon_i g_{i,a} \rangle + \langle \delta_a^2, \textstyle\sum_i g_{i,a}^2 \rangle.$ (48)
In the above, $C_{g,g}(a) = \sum_{i \in \mathbb{Z}^2} g_{i,a}^2$ may be precomputed for every $a$, which yields a function with support $\mathrm{supp}(C_{g,g}) = \bigcup_i \mathrm{supp}(g_i)$, and $C_{g,\epsilon}(a) = \sum_{i \in \mathbb{Z}^2} \epsilon_i g_{i,a}$ can again be recomputed after enough updates have been accepted.
Multiple accepted updates.
Multiple accepted updates. Let a set of accepted updates result in the differences $\{\delta_{a_1}, \ldots, \delta_{a_n}\}$, and let $\boldsymbol{\epsilon}^n$ be the error image after the updates. We derive an expression for the efficient evaluation of $C_{\mathbf{g},\boldsymbol{\epsilon}^n}$:

$$C_{\mathbf{g},\boldsymbol{\epsilon}^n}(x) = \sum_{i\in\mathbb{Z}^2} \epsilon^n_i\, g_{i,x} = C_{\mathbf{g},\boldsymbol{\epsilon}}(x) + \sum_{k=1}^{n} \delta_{a_k} \sum_{i\in\mathbb{Z}^2} g_{i,a_k}\, g_{i,x}. \tag{49}$$

An efficient computation of $C_{\mathbf{g},\boldsymbol{\epsilon}^n}$ can then be achieved if the function $C_{\mathbf{g},\mathbf{g}}(x,y) = \sum_{i\in\mathbb{Z}^2} g_{i,x}\, g_{i,y}$ is precomputed. Then, at step $n+1$:

$$\delta^{n+1} = \|\tilde{\mathbf{Q}}^{n+1} - \mathbf{I}\|^2 - \|\tilde{\mathbf{Q}}^{n} - \mathbf{I}\|^2 \tag{50}$$
$$= 2\big\langle \delta_{a_{n+1}},\ C_{\mathbf{g},\boldsymbol{\epsilon}^n}(a_{n+1})\big\rangle + \big\langle \delta_{a_{n+1}}^2,\ C_{\mathbf{g},\mathbf{g}}(a_{n+1}, a_{n+1})\big\rangle. \tag{51}$$
Multiple simultaneous updates. We derive an expression where an update consists of changing $N$ pixels simultaneously, and we assume that $n$ such updates have been accepted previously. We denote the differences of the pixels in update $k$ by $\{\delta^k_1, \ldots, \delta^k_N\}$. The change in the energy is then:

$$\delta^{n+1} = \|\tilde{\mathbf{Q}}^{n+1} - \mathbf{I}\|^2 - \|\tilde{\mathbf{Q}}^{n} - \mathbf{I}\|^2 \tag{52}$$
$$= \sum_{i\in\mathbb{Z}^2}\Big(\tilde{Q}^n_i - I_i + \sum_{j=1}^{N}\delta^{n+1}_j\, g_{i,a^{n+1}_j}\Big)^2 - \|\tilde{\mathbf{Q}}^{n} - \mathbf{I}\|^2 \tag{53}$$
$$= 2\sum_{j=1}^{N}\big\langle \delta^{n+1}_j,\ C_{\mathbf{g},\boldsymbol{\epsilon}^n}(a^{n+1}_j)\big\rangle + \sum_{i=1}^{N}\sum_{j=1}^{N}\big\langle \delta^{n+1}_i\delta^{n+1}_j,\ C_{\mathbf{g},\mathbf{g}}(a^{n+1}_i, a^{n+1}_j)\big\rangle, \tag{54}$$

where $C_{\mathbf{g},\mathbf{g}}(x,y) = \sum_{i\in\mathbb{Z}^2} g_{i,x}\, g_{i,y}$ is assumed to be precomputed, and $C_{\mathbf{g},\boldsymbol{\epsilon}^n}$ can be computed as:

$$C_{\mathbf{g},\boldsymbol{\epsilon}^n}(x) = C_{\mathbf{g},\boldsymbol{\epsilon}}(x) + \sum_{k=1}^{n}\sum_{j=1}^{N} \delta_{a^k_j}\, C_{\mathbf{g},\mathbf{g}}(a^k_j, x). \tag{55}$$

We show that the recent publications on blue-noise error distribution for path tracing [Georgiev and Fajardo 2016; Heitz et al. 2019; Heitz and Belcour 2019] can be seen as special cases in our framework. This allows for a novel analysis and interpretation of the results in the aforementioned works. We also state the assumptions and approximations necessary to get from our general formulation to the algorithms presented in those papers.
Classification.
The proposed techniques can be divided into a-priori [Georgiev and Fajardo 2016; Heitz et al. 2019] and a-posteriori [Heitz and Belcour 2019] ones. The main difference is that a-priori techniques make broad assumptions on the integrand without relying on information from renderings of the current scene. The cited a-priori approaches describe ways of constructing offline-optimized point sets/sequences. We denote the method of Georgiev and Fajardo [2016] as BNDS (blue-noise dithered sampling), the method of Heitz et al. [2019] as HBS (Heitz-Belcour Sobol), and the histogram and permutation methods of Heitz and Belcour [2019] as HBH and HBP, respectively (Heitz-Belcour histogram/permutation).
Energy.
HBH and HBP both rely on a blue-noise dither matrix optimised using a Gaussian kernel (through void-and-cluster [Ulichney 1993]). This kernel corresponds to the kernel $\mathbf{g}$ in our framework. Unlike in our iterative energy-minimization algorithms, the optimisation of this dither matrix happens offline, which imposes multiple restrictions while allowing for a lower runtime. On the other hand, the dither matrices in HBS and BNDS are optimized with respect to empirically motivated energies that cannot be related directly to the energy used in HBH and HBP. In the case of BNDS the energy does not even introduce an implicit integrand; instead, it is devised to represent a whole class of integrands. We propose to substitute those empirically motivated energies with a modified version of our energy. This allows for an intuitive interpretation and relates a-posteriori approaches to a-priori approaches.
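For reference, rating a candidate mask under the dither-matrix energy $E(\mathbf{B}) = \|\mathbf{g} * \mathbf{B}\|$ is a one-liner. A minimal sketch, assuming a Gaussian stand-in for the HVS kernel and a tileable (toroidal) mask; removing the mean first is our own choice, to ignore the DC term:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Minimal sketch of the dither-matrix energy ||g * B||^2 (monotone in ||g * B||)
# with a Gaussian kernel g. `mode='wrap'` makes the mask tileable; `sigma` is a
# free parameter standing in for the HVS kernel.

def dither_energy(B, sigma=1.0):
    B = B - B.mean()                                     # ignore the DC term
    low = gaussian_filter(B, sigma=sigma, mode='wrap')   # g * B
    return np.sum(low**2)

# A good blue-noise mask concentrates its residual at high frequencies, which
# the Gaussian suppresses, so it scores a lower energy than white noise.
rng = np.random.default_rng(0)
print(dither_energy(rng.random((64, 64))))
```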
Search space.
Another notable difference is constituted by the search spaces on which the different approaches operate. HBH selects a subset from a set of precomputed samples in each pixel, HBP permutes the assignment of sample sets to pixels, BNDS directly modifies the set of samples in each pixel, and HBS considers a search space made up of scrambling and ranking keys for a Sobol sequence. Working on the space of scrambling and ranking keys guarantees the preservation of the desirable integration qualities of the used Sobol sequence, and other methods can also be restricted to such a space. Clearly, a search-space restriction diminishes the achievable blue-noise quality. On the other hand, it makes sequences more robust to integrands for which they were not optimized.
In this section we analyze the permutation-based approach (HBP) and the histogram sampling approach (HBH) proposed by Heitz and Belcour [2019]. In our framework the two methods can be classified as dither-matrix halftoning methods that operate on a horizontal and a vertical search space, respectively. We make explicit the approximations and assumptions necessary to get from our general formulation to HBP/HBH. We also note that a-posteriori methods lead to solutions that adapt to the current render by exploiting known information (e.g. previously rendered data, auxiliary buffers). They can generally produce better results than a-priori methods.

Both HBP and HBH rely on a blue-noise dither matrix $\mathbf{B}$. Let $\mathbf{B}$ be the optimized blue-noise dither matrix resulting from the minimization of $E(\mathbf{B}) = \|\mathbf{g} * \mathbf{B}\|$ over a suitable search space. The kernel $\mathbf{g}$ is the one used to generate the blue-noise images for HBP/HBH, that is, the Gaussian kernel in the void-and-cluster method [Ulichney 1993]. Our analysis does not rely on the kernel being a Gaussian, or on the void-and-cluster optimization; this is simply the setting of the HBP/HBH methods. In the more general setting any kernel is admissible.

The permutation approach [Heitz and Belcour 2019] consists of two main parts: sorting (optimization) and retargeting (correcting for mispredictions). The sorting step in HBP can be interpreted as minimizing the energy:

$$E_{HBP}(\pi) = \|\pi(\mathbf{Q}) - f(\mathbf{B})\|, \quad \forall f: a < b \implies f(a) < f(b). \tag{56}$$

A global minimum of the above energy is achieved for a permutation $\pi$ that matches the order statistics of $\mathbf{Q}$ and $\mathbf{B}$. Thus our goal would be to get from the minimization of:

$$E(\pi) = \|\mathbf{g} * (\mathbf{Q}(\pi(\mathbf{S})) - \mathbf{I})\| = \|\mathbf{g} * \boldsymbol{\epsilon}(\pi(\mathbf{S}))\|, \tag{57}$$

to the minimization of Eq. (56) over a suitable search space (in practice it is limited to permutations within tiles).

We successively bound the error while introducing the assumptions implicit to the HBP method. The bounds are not tight; however, the different error terms that we consider illustrate the major sources of error due to the approximation of the more general energy (Eq. (57)) with a simpler one (Eq. (56)). The substitution of the kernel convolution $\mathbf{g} * \cdot$ by a difference with a blue-noise mask $\mathbf{B}$ replaces the many possible blue-noise error distributions towards which $\boldsymbol{\epsilon}(\pi(\mathbf{S}))$ can be driven with a single one: $\mathbf{B}$. A global minimizer of the new simplified energy can thus be found by just sorting. The closer the distributions of $\boldsymbol{\epsilon}(\pi(\mathbf{S}))$ and $\alpha\mathbf{B}$, $\alpha > 0$, the more closely $\boldsymbol{\epsilon}(\pi(\mathbf{S}))$ and $\alpha\mathbf{B}$ can be matched in practice. A different way to reduce the approximation error is to introduce a sufficient number of different blue-noise images and pick the one that minimizes the error. We start with the original energy (Eq. (57)) and bound it through terms that capture the main assumptions on which the model relies:

$$\|\mathbf{g} * \boldsymbol{\epsilon}(\pi(\mathbf{S}))\| = \min_{f} \|\mathbf{g} * (\boldsymbol{\epsilon}(\pi(\mathbf{S})) - f(\mathbf{B}) + f(\mathbf{B}))\|$$
$$\leq \min_{\alpha>0,\,f} \|\mathbf{g}\|\,\|\boldsymbol{\epsilon}(\pi(\mathbf{S})) - f(\mathbf{B})\| + \|\mathbf{g} * (f(\mathbf{B}) - \alpha\mathbf{B} + \alpha\mathbf{B})\|$$
$$\leq \min_{\alpha>0,\,f} \|\mathbf{g}\|\,\|\boldsymbol{\epsilon}(\pi(\mathbf{S})) - f(\mathbf{B})\| + \|\mathbf{g}\|\,\|f(\mathbf{B}) - \alpha\mathbf{B}\| + \alpha\,\|\mathbf{g} * \mathbf{B}\|. \tag{58}$$

In the above, $f$ is taken over the space of all strictly monotonically increasing functions, and $\alpha > 0$ is a scaling factor matching the magnitudes of $\boldsymbol{\epsilon}(\pi(\mathbf{S}))$ and $\mathbf{B}$ (this allows for the second term to go to zero as the pointwise error goes to zero).
We note that $\mathbf{B}$ is precomputed offline in order to approximately minimize $E(\mathbf{B}) = \|\mathbf{g} * \mathbf{B}\|$. Thus, the third term reflects the quality of the blue noise achieved with respect to $\mathbf{g}$ in the offline minimization. This error can be made small without a performance penalty since the optimization is performed offline. We factor out a multiplicative scaling factor $\alpha > 0$: we take $\mathbf{B}$ to be normalized to a fixed range, and the scaling is encoded in $\alpha$.

The second term reflects the error introduced by substituting a large search space (many local minima) with a small search space. It introduces the first implicit assumption of HBP by relating the first and third error terms (through $f$ and $\alpha$ respectively) via the second error term. The assumption is that there exists a permutation for which $\boldsymbol{\epsilon}(\pi(\mathbf{S}))$ can be made close to $\alpha\mathbf{B}$, which would make the second term small. This holds in practice if the pixel-wise error is zero on average (unbiased estimator within each pixel) and the resolution/tiles are sufficiently large, which results in a higher probability that pixels from $\boldsymbol{\epsilon}(\pi(\mathbf{S}))$ can match $\mathbf{B}$ well. Then the term $\|\mathbf{g}\|\,\|f(\mathbf{B}) - \alpha\mathbf{B}\|$ can be made small. We note that this is a generalization of the third optimality condition in [Heitz and Belcour 2019] (correlation-preserving integrand), since an integrand linear in the samples can also better match $\mathbf{B}$ provided enough pixels. For a linear integrand the optimal $f$ is also a linear function (ideal correlation between samples and integrand). The main difference between a linear integrand and a nonlinear/discontinuous one is the number of sample sets/pixels necessary to match $f(\mathbf{B})$ well, given an initial white-noise distribution of the samples. So in practice there are four factors directly affecting the magnitude of the second term: the number of considered blue-noise images, the size of the tiles, the correlation between samples and integrand (accounted for by $f$), and the bias/consistency of the estimators. We note that the number of considered pixels depends on the tile size in HBP; the practical significance of this has been demonstrated through a canonical experiment in the main paper.

Before we proceed we need to further bound the first error term by substituting $\mathbf{Q}(\pi(\mathbf{S}))$ by $\pi(\mathbf{Q}(\mathbf{S}))$. As discussed in the main paper, this is achieved by introducing a difference term $\boldsymbol{\Delta}(\pi) = \mathbf{Q}(\pi(\mathbf{S})) - \pi(\mathbf{Q}(\mathbf{S}))$, and then $\sqrt{E_{HBP}}$ is recovered. The error there can be made arbitrarily small through $f$ (it is accounted for in the second term). Thus we only need to study the remaining error due to $\boldsymbol{\Delta}$. In the case of HBP, $\boldsymbol{\Delta}$ is approximated by non-overlapping characteristic functions in each tile ($d(x,y) = \infty$ for $x,y$ in different tiles). This means that the approximation error is zero within each tile if the integrands are the same within the tile and permutations act only within the tile, since in that case $\boldsymbol{\Delta}(\pi) = \mathbf{0}$.
HBP partitions screen space into several tiles $\mathcal{R}_1, \ldots, \mathcal{R}_K$, and permutations act only on the pixel values within a tile. Having the partition induced by the tiling, we can bound the first term:

$$\|\boldsymbol{\epsilon}(\pi(\mathbf{S})) - f(\mathbf{B})\| \leq \sum_{k=1}^{K} \|\boldsymbol{\epsilon}_k(\pi_k(\mathbf{S}_k)) - f(\mathbf{B}_k)\|. \tag{59}$$

Since additionally the permutations are optimized for the pixel values instead of the sample sets (which saves re-rendering operations), there is an assumption that within each tile $\mathcal{R}_k$ the following holds (we denote $\mathbf{A}_k = \mathbf{A}|_{\mathcal{R}_k}$):

$$\mathbf{Q}_k(\pi_k(\mathbf{S}_k)) = \pi_k(\mathbf{Q}_k(\mathbf{S}_k)). \tag{60}$$

Consequently it follows that $I_i = I_j,\ \forall i,j \in \mathcal{R}_k$. This assumption can be identified with the fourth optimality condition proposed in [Heitz and Belcour 2019]: screen-space coherence. As discussed, the search-space restriction to the tiles corresponds to an approximation of the $\boldsymbol{\Delta}$ term in our framework by characteristic functions: $d_k(x,y) = \infty$ for $x \in \mathcal{R}_k, y \notin \mathcal{R}_k$, and $d_k(x,y) = 0$ for $x,y \in \mathcal{R}_k$. To account for the actual error when the assumption is violated, we introduce an additional error term per tile, $\boldsymbol{\Delta}_k = \mathbf{Q}_k(\pi_k(\mathbf{S}_k)) - \pi_k(\mathbf{Q}_k(\mathbf{S}_k))$, and then we have the bound:

$$\|\boldsymbol{\epsilon}_k(\pi_k(\mathbf{S}_k)) - f(\mathbf{B}_k)\| = \|\pi_k(\mathbf{Q}_k(\mathbf{S}_k)) - \mathbf{I}_k - f(\mathbf{B}_k) + \boldsymbol{\Delta}_k\| \leq \|\pi_k(\mathbf{Q}_k(\mathbf{S}_k)) - \mathbf{I}_k - f(\mathbf{B}_k)\| + \|\boldsymbol{\Delta}_k\|. \tag{61}$$

This means that even if all of the previous error terms are made small, including $\|\pi_k(\boldsymbol{\epsilon}_k(\mathbf{S}_k)) - f(\mathbf{B}_k)\|$, the error may still be large due to $\|\boldsymbol{\Delta}\|$. We refer to a large error due to the delta term as a misprediction: a mismatch between the predicted error distribution from the minimization of $\|\pi_k(\boldsymbol{\epsilon}_k(\mathbf{S}_k)) - f(\mathbf{B}_k)\|$ and the actual error distribution $\boldsymbol{\epsilon}_k(\pi_k(\mathbf{S}_k))$ resulting from applying the optimized permutation to the sample sets. The best way to identify mispredictions is to compare the predicted image $\pi_k(\mathbf{Q}_k(\mathbf{S}_k))$ and the image rendered with the same permutation applied to the sample sets, $\mathbf{Q}_k(\pi_k(\mathbf{S}_k))$. An occurring misprediction means that the assumption made to approximate $\boldsymbol{\Delta}$ was incorrect ($\boldsymbol{\Delta}_k \neq \mathbf{0}$ for some tile $\mathcal{R}_k$); equivalently, the optimality condition of screen-space coherence is not satisfied.
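A misprediction check follows directly from this definition. The sketch below is our own illustration: it compares a tile's predicted values against a re-render with permuted sample sets; `render` and the threshold `tau` are assumptions.

```python
import numpy as np

# Our own sketch of misprediction detection in one tile: compare the predicted
# image pi(Q(S)) against the re-rendered image Q(pi(S)). `render(seeds)` is an
# assumed callback producing per-pixel values; `tau` is an arbitrary threshold.

def detect_misprediction(values, seeds, perm, render, tau=1e-3):
    predicted = values[perm]          # pi(Q(S)): permute already-rendered values
    rerendered = render(seeds[perm])  # Q(pi(S)): re-render with permuted seeds
    delta = rerendered - predicted    # the per-tile Delta_k term
    return np.abs(delta).max() > tau  # misprediction if Delta_k != 0 (up to tau)
```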
Avoiding mispredictions. In practice mispredictions often occur for larger tile sizes, since it is hard to guarantee that the integrand remains similar over each tile. On the other hand, larger tiles allow for a better blue-noise distribution as long as $\boldsymbol{\Delta}_k = \mathbf{0}$ holds. One way to reduce mispredictions is to use tiles respecting edges; more involved methods may take into account normals, depth, textures, etc.

Fig. 2. Here we showcase the effect of tile size on the quality of blue noise. We also demonstrate the effect of a larger search neighbourhood $R$ in our optimization (Alg. 2). For our case, we consider disk neighbourhoods so that they are contained within Heitz and Belcour's tiles in terms of size, but they can also overlap due to our formulation. From left to right, the input white-noise texture is optimized using our relocation algorithm. The last two columns are from Heitz and Belcour's [2019] method. The corresponding power spectra of these optimized images are also shown.

The $E_{HBP}$ error term.
The final step involves the minimization of the energy in Eq. (61). Since different tiles do not affect each other, the minimization can be performed per tile (we adopt the assumption from HBP that $\boldsymbol{\Delta}_k = \mathbf{0}$):

$$\pi^*_k \in \arg\min_{\pi_k} \|\pi_k(\mathbf{Q}_k(\mathbf{S}_k)) - \mathbf{I}_k - f(\mathbf{B}_k)\| = \arg\min_{\pi_k} \|\pi_k(\mathbf{Q}_k(\mathbf{S}_k)) - f(\mathbf{B}_k)\|. \tag{62}$$

We have dropped the term $\mathbf{I}_k$ since it does not affect the set of minimizers ($\mathbf{I}_k$ is assumed constant in each tile). As discussed for Eq. (56), a global minimum is given by matching the order statistics of $\mathbf{Q}_k$ to the order statistics of $f(\mathbf{B}_k)$ (we note that the order statistics of $\mathbf{B}_k$ do not change under $f$ since it is a strictly increasing function). This is equivalent to performing the sorting pass described in [Heitz and Belcour 2019]. A minor optimization would be to pre-sort $\mathbf{B}$ and store the sorted indices instead.
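The sorting pass admits a compact implementation. Below is a NumPy sketch of the per-tile order-statistics matching of Eq. (62); the optional `surrogate` argument implements the custom-surrogate variant (Eq. (63), discussed further below). Function and argument names are ours.

```python
import numpy as np

# Sketch of the per-tile sorting pass (Eq. 62): assign pixel values to pixels so
# that their order statistics match those of the blue-noise tile B_k. Passing a
# surrogate image I_k instead targets B_k + I_k (Eq. 63).

def sort_tile(values, B_tile, surrogate=None):
    """values, B_tile, surrogate: flat arrays over one tile's pixels."""
    target = B_tile if surrogate is None else B_tile + surrogate
    perm = np.empty(len(values), dtype=np.int64)
    # The i-th smallest value is assigned to the position of the i-th smallest
    # target entry, matching order statistics exactly.
    perm[np.argsort(target)] = np.argsort(values)
    return values[perm]
```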
Tiling effect.
In Fig. 2 we compare the effect of the tile size. In our approach the "tiles" can be defined per pixel, can have arbitrary shapes, and can overlap, the last property being crucial for achieving a good blue-noise distribution. We consider white noise with mean 0.5 (which is an ideal scenario for Heitz and Belcour's method) and compare various tile sizes. For a fair comparison, our tile radius $r$ corresponds to a similar tile size in the permutation approach [Heitz and Belcour 2019]. The power-spectrum profiles confirm the better performance of our method. Retargeting [Heitz and Belcour 2019] cannot improve the quality of the permutation approach either, since no misprediction can occur ($\boldsymbol{\Delta} = \mathbf{0}$).

Custom surrogate.
The $\mathbf{I}_k$ term does not in fact need to be assumed constant. Assuming it constant is equivalent to picking a tile-constant surrogate; however, a custom surrogate may be provided instead. Then one would simply minimize the energy:

$$\|\pi_k(\mathbf{Q}_k(\mathbf{S}_k)) - (\mathbf{B}_k + \mathbf{I}_k)\|. \tag{63}$$

This energy has a different minimizer than the original HBP energy, but the global minimum can once again be found efficiently through sorting.

The retargeting pass in HBP achieves two things. It introduces new possible target solutions through new blue-noise images, and it corrects for mispredictions. The first is not so much a result of the retargeting as of varying the blue-noise image every frame. Ideally, several blue-noise images would be considered in a single frame, and the best image would be chosen per tile (in that case one must make sure that there are no discontinuities between the blue-noise images' tiles) in order to minimize the second term in Eq. (58). Instead, in HBP this is amortized over several frames. The more important effect of retargeting is correcting for mispredictions, by transferring the recomputed correspondence between sample set and pixel value (achieved through re-rendering) to the next frame. This allows reducing the error due to the approximation of $\boldsymbol{\Delta}$ (when the piecewise tile-constancy assumption on the integrand is violated). Note, however, that this is inappropriate if there is a large temporal discontinuity between the two frames.
Implementation details.
Retargeting requires a permutation that transforms the blue-noise image in the current frame into the blue-noise image of the next frame [Heitz and Belcour 2019]. This permutation is applied to the optimized seeds to transfer the learned correspondence between sample sets and pixel values to the next frame. Implicitly, this transforming permutation also relies on a screen-space integrand-similarity assumption, since there is no guarantee that the corresponding values from the swap will match, possibly incurring a misprediction once again (this can be modeled by an additional $\boldsymbol{\Delta}$ term). In HBP [Heitz and Belcour 2019] the maximum radius of travel of each pixel in the permutation is set to 6 pixels. This has a direct effect on the approximation of $\boldsymbol{\Delta}$, as the travel distance of a pixel is allowed to extend beyond the original tile bounds: in the worst case a pixel may be allowed to travel a distance of $\sqrt{t_x^2 + t_y^2} + 6$ pixels, where $t_x, t_y$ are the dimensions of the tiles. An additional error is introduced since the retargeting pass does not produce the exact blue-noise image used in the next frame, but some image that is close to it [Heitz and Belcour 2019]. This seems to be done purely for memory considerations, since it allows one blue-noise image to be reused by translating it toroidally each frame to produce the blue-noise image for the next frame.

Relationship to our horizontal approach.
Our horizontal approach does not require a retargeting pass. It can directly continue with the optimized sample sets and pixel values from the last frame. There is also no additional travel distance for a matching permutation as in retargeting, which further reduces the probability of misprediction. Thus, it inherently and automatically provides all of the advantages of retargeting while retaining none of its disadvantages.
The histogram sampling approach from Heitz and Belcour can beinterpreted as both a dithering and a sampling method. We studythe dithering aspect to better understand the quality of blue noiseachievable by the method.
Algorithm analysis.
The sampling of an estimate in each pixel using the pixel's corresponding mask value can be interpreted as remapping the mask's range and then quantizing to the closest estimate. In HBH each estimate is equally likely to be sampled (if a random mask is used), which implies a transformation that maps equal parts of the range to each estimate. Let $Q_{k,1}, \ldots, Q_{k,N}$ be the greyscale estimates in pixel $k$ sorted in ascending order, and let the range of the blue-noise mask be $[0,1]$. Then the range is split into $N$ equal subintervals $[0, \frac{1}{N}), \ldots, [\frac{N-1}{N}, 1]$, which respectively map to $[Q_1, \frac{Q_1+Q_2}{2}), \ldots, [\frac{Q_{i-1}+Q_i}{2}, \frac{Q_i+Q_{i+1}}{2}), \ldots, [\frac{Q_{N-1}+Q_N}{2}, Q_N]$. If the quantization rounds to the closest estimate, then the above mapping guarantees the desired behaviour. We note that since the estimates in each pixel can have different values, the mapping for each pixel may be different. We denote the above mapping by $\mathbf{f}$. Then the mapping-plus-quantization problem in a pixel $k$ may be formulated as:

$$\min_{i \in \{1,\ldots,N\}} |Q_{k,i} - f_k(B_k)|. \tag{64}$$

Note that the minimization in each pixel is independent, and it aims to minimize the distance between the estimates and the remapped value from the blue-noise mask. If the sets of estimates are assumed to be the same across pixels, and are also assumed to be spaced regularly, then $f$ is only a linear remapping, which effectively transfers the spectral properties of $\mathbf{B}$ onto the optimized image. Notably, the former is the screen-space coherence assumption from HBP, while the latter is the correlation-preserving integrand assumption. Thus we have seen that for optimal results the HBH method relies on exactly the same assumptions as the HBP method (while our vertical iterative minimization approach lifts both assumptions).
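Under this interpretation, histogram sampling with equal per-estimate probabilities reduces to a mask-driven index lookup. A minimal sketch of Eq. (64), with names of our own choosing:

```python
import numpy as np

# Minimal sketch of HBH as mapping plus quantization (Eq. 64): with equal
# per-estimate probabilities, the mask value directly selects one of the sorted
# estimates of each pixel. `estimates`: shape (H, W, N); `mask`: in [0, 1).

def histogram_sample(estimates, mask):
    sorted_est = np.sort(estimates, axis=-1)                 # ascending per pixel
    N = sorted_est.shape[-1]
    idx = np.minimum((mask * N).astype(np.int64), N - 1)     # which subinterval
    return np.take_along_axis(sorted_est, idx[..., None], axis=-1)[..., 0]
```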
Disadvantages.
One of the key points is that it is the error distribution, and not the signal itself, that ought to ideally be shaped as $\mathbf{B}$. This is actually the case even in the above energy. From the way $\mathbf{f}$ was chosen, it follows that the surrogate is equivalent to $\mathbf{f}(0.5)$, which can be identified as the image made of the medians of the sorted estimates within each pixel. This is the case since the target surrogate of $\mathbf{B}$ (during the offline optimization) was assumed to be $0.5$, and after the mapping it is $\mathbf{f}(0.5)$. Generally, this is a very bad surrogate in the context of rendering; it typically increases the error compared to the averaged image, making the method impractical. Another notable disadvantage is that all estimates are considered with equal weight. This means that outliers are as likely to be picked as estimates close to the surrogate, which results in fireflies appearing even when they were not present in the averaged image. Compared to classical halftoning, where only the closest lower and upper quantization levels are considered, HBH does not minimize the magnitude of the error to the surrogate. Finally, the two assumptions of screen-space coherence and correlation-preserving integrand generally do not hold in practice. Estimates cannot be assumed to match between pixels (especially if samples are taken at random), and they cannot be assumed to be uniformly distributed, which implies that $\mathbf{f}$ is not linear. This greatly impacts the quality of the result, especially when compared to adaptive approaches such as our vertical error-diffusion approach and our iterative minimization techniques (see the experiments in the main paper).

Generalization.
The method can be generalized to take a custom surrogate instead of the one constructed from the median of the estimates within each pixel. This is achieved by splitting the per-pixel set of estimates into two parts: (greyscale) estimates greater than the value of the (greyscale) surrogate in the current pixel, and estimates lower than it. Then the mapping $f_k$ for the current pixel $k$ maps values in $[0, 0.5)$ to the lower set and values in $[0.5, 1]$ to the higher set, such that $f_k(0.5) = I_k$. The original method is recovered if the surrogate is chosen to be the implicit one of the original histogram sampling method and the appropriate corresponding mapping $\mathbf{f}$ is kept.

The approach can be extended further by setting different probabilities for the different estimates. The original histogram sampling method corresponds to setting the same probability for sampling every estimate; equivalently, equal-sized subintervals of $[0,1]$ map to each estimate. Classical dither-matrix halftoning can be interpreted as setting an equal probability for the closest upper and lower estimates with respect to the surrogate, while every other estimate gets a zero probability; equivalently, two equal subintervals of $[0,1]$ map to the two aforementioned estimates while no part of the interval maps to the remaining estimates. Generally, a custom probability can be assigned to each estimate: $p_1, \ldots, p_N$, by having the intervals $[0, p_1), \ldots, [\sum_{k=1}^{N-1} p_k, 1]$ map to $Q_1, \ldots, Q_N$ (after quantization). We note that an unbiased image can be recovered only if there is a map to every estimate.
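This generalized scheme amounts to driving each pixel's estimate selection through a custom CDF evaluated at the mask value. A sketch under our own naming, with arbitrary per-estimate probabilities (uniform probabilities recover HBH):

```python
import numpy as np

# Sketch of the generalized histogram sampling: per-estimate probabilities p_i
# define a CDF over the sorted estimates, and the mask value selects an
# estimate through it. `probs` has shape (N,).

def generalized_histogram_sample(estimates, mask, probs):
    sorted_est = np.sort(estimates, axis=-1)            # (H, W, N), ascending
    cdf = np.cumsum(probs) / np.sum(probs)              # (N,), ends at 1.0
    idx = np.searchsorted(cdf, mask, side='right')      # subinterval per pixel
    idx = np.minimum(idx, len(probs) - 1)               # guard mask == 1.0
    return np.take_along_axis(sorted_est, idx[..., None], axis=-1)[..., 0]
```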
We discuss the current state-of-the-art a-priori approaches [Georgiev and Fajardo 2016; Heitz et al. 2019], their relation to our framework, and insights regarding them.

In Heitz et al.'s work, a scrambling energy and a ranking energy have been proposed (note that those energies are maximized, not minimized):

$$E_s = \sum_{i,j} \exp\Big(-\frac{\|i - j\|^2}{\sigma^2}\Big)\, \big\|E_i - E_j\big\| \tag{65}$$

$$E_r = \sum_{i,j} \exp\Big(-\frac{\|i - j\|^2}{\sigma^2}\Big)\, \Big(\big\|E^{(1)}_i - E^{(1)}_j\big\| + \big\|E^{(2)}_i - E^{(2)}_j\big\|\Big) \tag{66}$$

$$E_i = (e_{1,i}, \ldots, e_{T,i}) \tag{67}$$

$$e_{t,i}(S_i) = \frac{1}{|S_i|} \sum_{k=1}^{|S_i|} f_t(p_{i,k}) - \int_{[0,1]^D} f_t(x)\, dx \tag{68}$$

$$S_i = \{p_{i,1}, \ldots, p_{i,M_i}\}. \tag{69}$$

The upper indices in $E^{(1)}_i, E^{(2)}_i$ indicate that the two energies are evaluated with different subsets of the sample set $S_i$ in pixel $i$. The $f_t$ are taken from an arbitrary set of functions (in the original paper those are random Heaviside functions). The described form of the energies has been partially motivated by the energy in [Georgiev and Fajardo 2016]. This does not allow for a straightforward interpretation or a direct relation to the (implicit) energy used for the a-posteriori approaches in [Heitz and Belcour 2019].

Scrambling energy. We modify $E_s$ in order to relate it to the energy in our framework and to provide a meaningful interpretation:

$$E'_s = \sum_{t=1}^{T} w_t\, \big\|\mathbf{g} * \mathbf{Q}_t(\mathbf{S}) - \mathbf{I}_t\big\|, \tag{70}$$

$$Q_{t,i}(\mathbf{S}) = \frac{1}{|S_i|} \sum_{k=1}^{|S_i|} f_t(p_{i,k}), \qquad I_{t,i} = \int_{[0,1]^D} f_t(x)\, dx. \tag{71}$$

We have relaxed the Gaussian kernel to an arbitrary kernel $\mathbf{g}$ and absorbed it into the norm. More importantly, we have removed the heuristic dependence of error terms on their neighbours; instead, the coupling happens through the kernel itself. Finally, we have introduced weights $w_1, \ldots, w_T$ that allow assigning different importance to different integrands. Thus, this is a weighted average of our original energy applied to several different integrands, matching our a-priori approach (Eq. (1)). Through this formulation a direct relationship to the a-posteriori methods can be established, and it can be motivated in the context of both the human visual system and denoising. Particularly, the scrambling energy $E'_s$ is optimized over the space of scrambling keys, which allows permuting the assignment of sample sets. This is in fact the horizontal setting from our formulation in the main paper. The space can be extended further if the scrambling keys in each dimension are different (as in HBS). The same can be done in a-posteriori methods, if the optimization is performed in each dimension as discussed in Section 4.
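A sketch of evaluating the modified scrambling energy $E'_s$ of Eq. (70) for a candidate assignment of sample sets. Random 1-D Heaviside test functions stand in for the $f_t$ (as in the original paper); the Gaussian kernel, the weights, and all names are our own choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Sketch of E'_s (Eq. 70): a weighted sum of perceptual energies over T test
# integrands. f_t(x) = [x > tau_t] is a Heaviside with exact integral 1 - tau_t.
# `samples` has shape (H, W, M) of 1-D sample values per pixel.

def scrambling_energy(samples, thresholds, weights, sigma=1.0):
    E = 0.0
    for tau, w in zip(thresholds, weights):
        Q_t = (samples > tau).mean(axis=-1)   # pixel estimates of integrand f_t
        I_t = 1.0 - tau                       # exact integral of the Heaviside
        err = gaussian_filter(Q_t - I_t, sigma=sigma, mode='wrap')  # g * eps_t
        E += w * np.sqrt(np.sum(err**2))      # w_t * ||g * (Q_t - I_t)||
    return E
```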
Ranking energy. The ranking keys in HBS describe the order in which samples are consumed. This is useful for constructing progressive a-priori methods. Notably, the order in which samples are introduced can be optimized. Having a sequence of sample sets in each pixel, $S_{i,1} \subset \ldots \subset S_{i,M} \equiv S_i$, and respectively the images formed by those, $\mathbf{S}_1, \ldots, \mathbf{S}_M$, the progressive energy may be constructed as:

$$E'_r = \sum_{k=1}^{M} w_k\, \big\|\mathbf{g} * \mathbf{Q}(\mathbf{S}_k) - \mathbf{I}\big\|. \tag{72}$$

The quality at a specific sample count corresponding to $\mathbf{S}_k$ is controlled through the weight $w_k$. The original energy maximizing the quality of the full set is retrieved for $(w_1, \ldots, w_{M-1}, w_M) = (0, \ldots, 0, 1)$. Since the sample sets $S_{i,1}, \ldots, S_{i,M}$ are optimized by choosing samples from $S_i$, this can be seen as a vertical method. Finally, the ranking keys can also be defined per dimension, which can be related to a-posteriori methods through the suggested dimensional decomposition in Section 4.

In Georgiev and Fajardo's work, the following energy has been proposed in order to obtain an optimized (multi-channel) blue-noise mask:

$$E(p_1, \ldots, p_N) = \sum_{i \neq j} \exp\Big(-\frac{\|i - j\|^2}{\sigma_i^2}\Big) \exp\Big(-\frac{\|p_i - p_j\|^{d/2}}{\sigma_s^2}\Big), \tag{73}$$

which bears some similarity to the weights of a bilateral filter. In the above, $i, j$ are pixel coordinates, and $p_i, p_j$ are $d$-dimensional vectors associated with $i, j$; let the image formed by those vectors be $\mathbf{S}$. The energy aims to make samples $p_i, p_j$ distant ($\|p_i - p_j\|$ must be large) if they are associated with pixels that are close ($\|i - j\|$ is small).
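For concreteness, a direct (brute-force) evaluation of Eq. (73) might look as follows. The sigma values, the toroidal pixel distance, and all names are our own choices for illustration:

```python
import numpy as np

# Brute-force evaluation of the BNDS energy (Eq. 73) for a small mask.
# `P` has shape (H, W, d): one d-dimensional sample vector per pixel.

def bnds_energy(P, sigma_i=2.1, sigma_s=1.0):
    H, W, d = P.shape
    coords = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing='ij'), -1)
    flat_c = coords.reshape(-1, 2).astype(np.float64)
    flat_p = P.reshape(-1, d)
    E = 0.0
    for i in range(len(flat_p)):
        dc = np.abs(flat_c - flat_c[i])
        dc = np.minimum(dc, [H, W] - dc)                  # toroidal distance
        dist2 = (dc**2).sum(-1)
        dp = np.linalg.norm(flat_p - flat_p[i], axis=-1)  # sample-space distance
        term = np.exp(-dist2 / sigma_i**2) * np.exp(-dp**(d / 2) / sigma_s**2)
        term[i] = 0.0                                     # exclude i == j
        E += term.sum()
    return E
```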
Relation to our framework. Even though the energy is heuristically motivated, we can roughly relate it to our framework. The above energy implicitly assumes classes of integrands $\mathbf{Q}_1, \ldots, \mathbf{Q}_T$, such that close samples $p_i, p_j$ are mapped to close values $Q_{i,t}(p_i), Q_{j,t}(p_j)$, and distant samples are mapped to distant values. Notably, the form of the energy does not change over screen space, so the same can be implied about the integrands. One such class is the class of bi-Lipschitz functions. The bound can be used to relate a modified version of the original energy to an energy of the form:

$$E_{\mathbf{Q}_t} = \sum_{i \neq j} \exp\Big(-\frac{\|i - j\|^2}{\sigma_i^2}\Big) \exp\Big(-\frac{C\,\|Q_{i,t}(p_i) - Q_{j,t}(p_j)\|^{d/2}}{\sigma_s^2}\Big). \tag{74}$$

Thus, the original energy can indeed be interpreted as reasonable for a whole class of sufficiently smooth integrands, instead of an energy that works very well with one specific integrand. A similar thing can be achieved in our framework if the weighted energy is considered:

$$E'(\mathbf{S}) = \sum_{t=1}^{T} w_t\, \big\|\mathbf{g} * \mathbf{Q}_t(\mathbf{S}) - \mathbf{I}_t\big\|. \tag{75}$$

The kernel $\mathbf{g}$ can be a Gaussian with standard deviation $\sigma$, as in the original energy, or it can be relaxed to an arbitrary desired kernel. $\mathbf{Q}_1, \ldots, \mathbf{Q}_T$ are representative integrands that satisfy the discussed smoothness requirements, and the $w_t$ are associated weights assigning different importance to the integrands. Finally, the reference images are given by the integrals $\mathbf{I}_t = \int_{[0,1]^d} \mathbf{Q}_t(x)\, dx$.

Fig. 3. A comparison illustrating that even a sampling sequence formed by a stack of blue-noise images (Ours) yields a good distribution (note the tiled error spectra); panels: Random, Georgiev and Fajardo [2016], Heitz et al. [2019], Ours, Sobol, with MSE and pMSE reported per panel. The integration error is higher, however, degrading the quality. This is the case because the assumed integrand is far from linear in each dimension (see Extension in Section 8.3). The images use 4 samples per pixel, and the degradation of the spectral properties with the number of samples is clear for [Georgiev and Fajardo 2016] and even [Heitz et al. 2019], while it is not so much the case for Ours. This demonstrates that different methods offer a different trade-off between integration error and distribution for arbitrary integrands. Constraining the search space to toroidal shifts or scrambling and ranking keys restricts the achievable blue-noise distribution.
It should be clear that this is a weighted average constructed from the standard energy in our framework applied to a set of integrands. There are a number of benefits to such an explicit formulation. Most importantly, it allows for a-priori methods to be studied in the same framework as a-posteriori approaches. Additionally, explicit control is provided over the set of integrands and the kernel in a manner that allows for a straightforward interpretation.
The second contribution of Georgiev and Fajardo's work is a sampler which relies on an image optimized with Eq. (73) and uses it to achieve a blue-noise distribution of the rendering error. We summarize the algorithm and discuss some details related to it.
Algorithm.
Let $\mathbf{B}$ be an image (with $d$ channels) optimized by minimizing Eq. (73) over a suitable search space, and let $\mathcal{P} = \{p_1, \ldots, p_N\}$ be a sequence of $d$-dimensional points. Within each pixel $i$ the sample set $S_i$ is constructed such that:

$$p_j \in \mathcal{P} \implies p_{i,j} \in S_i: \quad p_{i,j} = (p_j + B_i) \bmod 1. \tag{76}$$

The sequence $\mathcal{P}$ can be constructed using various samplers (e.g., random, low-discrepancy, blue-noise, etc.). The construction of the new points for pixel $i$ can be interpreted either as toroidally shifting the sequence $\mathcal{P}$ by $B_i$ or, equivalently, as toroidally shifting the sequence $\{B_i, \ldots, B_i\}$ by $\mathcal{P}$. The sequences constructed within each pixel are used to estimate the integral in the usual manner. Since only a finite number of dimensions $d$ are optimized, the suggestion is to use the constructed sequences for smoother dimensions, while other dimensions may use a standard sampler.
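The sampler itself is a one-liner per pixel. A minimal sketch of Eq. (76), with hypothetical array shapes:

```python
import numpy as np

# Sketch of the BNDS sampler (Eq. 76): each pixel toroidally shifts a shared
# point sequence by its mask value. `P`: shape (N, d); `B`: shape (H, W, d).
# The result has shape (H, W, N, d): one shifted sequence per pixel.

def bnds_samples(P, B):
    return (P[None, None, :, :] + B[:, :, None, :]) % 1.0
```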
Effect of the toroidal shift. Let us consider a linear one-dimensional integrand $f(q) = \alpha q + \beta$ that does not vary in screen space, and a sequence $\mathcal{P}$ with a single point $p$. Furthermore, if we assume $p = 0$, the error image is:

$$\mathbf{Q}(\mathbf{B}) - \mathbf{I} = \alpha\mathbf{B} + \boldsymbol{\beta} - \mathbf{I}. \tag{77}$$

Since $\mathbf{Q}$ does not vary in screen space, neither does $\mathbf{I}$. Then the power spectrum of the error (excluding the DC) matches the power spectrum of $\mathbf{B}$ up to the multiplicative factor $\alpha$. Thus, under the assumption that the integrand is linear, does not vary in screen space, and there is no toroidal shift, the power-spectral properties of $\mathbf{B}$ are transferred ideally to the error. On the other hand, if $p$ is chosen to be non-zero, then the spectral characteristics of the image $((\mathbf{B} + p) \bmod 1)$ will be transferred instead. We have empirically verified that even with a very high-quality blue-noise image $\mathbf{B}$, the toroidal shift degrades its quality due to the introduced discontinuities. Thus, even in the ideal case of a screen-space constant linear 1-D integrand, toroidal shifts degrade the quality.
Effect of using multiple samples.
Let us consider the same integrand $f(q) = \alpha q + \beta$, which we have identified as being ideal for transferring the spectral characteristics of $\mathbf{B}$ to the error, and let us assume that we are given several samples $\mathcal{P} = \{p_1, \ldots, p_N\}$ and have constructed the sample-set image $\mathbf{S}$ through toroidal shifts with $\mathbf{B}$. Then the error is:

$$Q_i(S_i) - I_i = \frac{\alpha}{N} \sum_{k=1}^{N} p_{k,i} + \beta - I_i. \tag{78}$$

The power spectrum of the error thus matches the power spectrum of the image $A_i = \sum_{k=1}^{N} p_{k,i}$ (excluding the DC) up to a multiplicative factor. For a random point sequence $\mathcal{P}$, the more points are considered, the closer to white noise $\mathbf{A}$ becomes. This is further exacerbated by the discussed discontinuities introduced by the toroidal shifts.
Extension.
We have argued that both toroidal shifts and increasing the number of samples have a negative effect on transferring the spectral properties of $\mathbf{B}$, even in an ideal scenario. Naturally the question arises whether this can be improved. Our proposal is the direct optimisation of point sets without the application of a toroidal shift.
For the discussed example this entails constructing a sequence of $N$ images $\mathbf{B}_1, \ldots, \mathbf{B}_N$ such that $\mathbf{A}_k = \sum_{j=1}^{k} \mathbf{B}_j$ is a blue-noise image. Then the error has the (blue-noise) spectral characteristics of $\mathbf{A}_k$ at each sample count:

$$Q_i(B_{1,i}, \ldots, B_{k,i}) - I_i = \frac{\alpha}{k} \sum_{j=1}^{k} B_{j,i} + \beta - I_i. \tag{79}$$
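A small sanity check of this construction: given a stack of per-sample images, every prefix sum should remain blue noise. The diagnostic below (fraction of non-DC spectral energy near the DC) is our own choice, not part of the method:

```python
import numpy as np

# Sanity check for the stack construction: for each prefix k, the cumulative
# image A_k = B_1 + ... + B_k should have a blue-noise spectrum, i.e., little
# energy at low frequencies.

def low_frequency_ratio(A, radius=4):
    """Fraction of (non-DC) spectral energy within `radius` of the DC."""
    F = np.fft.fftshift(np.abs(np.fft.fft2(A - A.mean())) ** 2)
    h, w = A.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)
    return F[r <= radius].sum() / F.sum()

def check_stack(B_stack):
    """B_stack: (N, H, W). Report the diagnostic for every prefix sum A_k."""
    return [low_frequency_ratio(A_k) for A_k in np.cumsum(B_stack, axis=0)]
```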
References

Mostafa Analoui and Jan P. Allebach. 1992. Model-based halftoning using direct binary search. In Human Vision, Visual Processing, and Digital Display III, Bernice E. Rogowitz (Ed.), Vol. 1666. International Society for Optics and Photonics, SPIE, 96–108. https://doi.org/10.1117/12.135959

Sagar Bhatt, John Sabino, John Harlim, Joel Lepak, Robert Ronkese, and Chai Wah Wu. 2006. Comparative study of search strategies for the direct binary search image halftoning algorithm. In NIP22 (International Conference on Digital Printing Technologies). 244–247.

Iliyan Georgiev and Marcos Fajardo. 2016. Blue-noise Dithered Sampling. In ACM SIGGRAPH 2016 Talks (Anaheim, California) (SIGGRAPH '16). ACM, New York, NY, USA, Article 35, 1 page. https://doi.org/10.1145/2897839.2927430

Eric Heitz and Laurent Belcour. 2019. Distributing Monte Carlo Errors as a Blue Noise in Screen Space by Permuting Pixel Seeds Between Frames. Computer Graphics Forum (2019). https://doi.org/10.1111/cgf.13778

Eric Heitz, Laurent Belcour, Victor Ostromoukhov, David Coeurjolly, and Jean-Claude Iehl. 2019. A Low-Discrepancy Sampler that Distributes Monte Carlo Errors as a Blue Noise in Screen Space. In SIGGRAPH '19 Talks. ACM, Los Angeles, United States. https://hal.archives-ouvertes.fr/hal-02150657

James T. Kajiya. 1986. The Rendering Equation. In SIGGRAPH '86, Vol. 20. 143–150.

Hiroaki Koge, Yasuaki Ito, and Koji Nakano. 2014. A GPU Implementation of Clipping-Free Halftoning Using the Direct Binary Search. In Algorithms and Architectures for Parallel Processing, Xian-he Sun, Wenyu Qu, Ivan Stojmenovic, Wanlei Zhou, Zhiyang Li, Hua Guo, Geyong Min, Tingting Yang, Yulei Wu, and Lei Liu (Eds.). Springer International Publishing, Cham, 57–70.

D. J. Lieberman and J. P. Allebach. 1997. Efficient model based halftoning using direct binary search. In Proceedings of International Conference on Image Processing, Vol. 1. 775–778. https://doi.org/10.1109/ICIP.1997.648077

Robert A. Ulichney. 1993. Void-and-cluster method for dither array generation. Vol. 1913. https://doi.org/10.1117/12.152707
Fig. 4. We compare Heitz and Belcour's histogram sampling method to our vertical iterative minimization on additional scenes (panels: Histogram (1/16), Ours (1/4), Histogram (1/64)). We provide 4 and 16 times more samples to the histogram method, where each method picks one out of all of its allocated samples in each pixel. Despite the fact that our method may use 16 times fewer samples, it still outperforms the histogram sampling approach. This is mainly due to the implicit surrogate and the suboptimal dithering inherent to the histogram sampling approach.

Fig. 5. In the main paper we compare all vertical methods on The Wooden Staircase scene (panels: (a) White noise, (b) Averaged, (c) Dithering (Ours), (d) Error diffusion (Ours), (e) Optimization (Ours), (f) Histogram method [Heitz and Belcour 2019]; MSE and pMSE reported per panel). All of our methods achieve a better pMSE than the baseline (the averaged image), while the histogram sampling method increases the error both in terms of MSE and pMSE. The tiled error power spectra confirm the pMSE ranking and provide a visualization of the local pMSE distribution. We also show S-CIELAB error visualizations, which suggest that pointwise error is heavily weighted in S-CIELAB, making it not a very good predictor for the perceptual quality related to the noise distribution, unlike HDR-VDP-2.
Fig. 6. Additional scenes comparing our vertical methods against Heitz and Belcour's permutation approach (panels: Dithering (Ours), Error diffusion (Ours), Optimization (Ours), Heitz and Belcour [2019]). The images are rendered at 4 samples per pixel. The permutation approach uses 16 (static) frames with retargeting to correct mispredictions; however, this still does not eliminate all of them, which is especially evident in the Bathroom scene.
Fig. 7. All vertical methods from the main paper compared at 4 samples per pixel (panels: Dithering (Ours), Error diffusion (Ours), Optimization (Ours), Histogram method [Heitz and Belcour 2019]). The ranking based on our error metrics is (best last): histogram sampling, dithering, error diffusion, iterative minimization. This also matches our perceptual evaluation.
Fig. 8. Additional comparisons for horizontal approaches at 4 samples per pixel, shown at frames 1, 4, and 16. The left side of each image uses Heitz and Belcour's [2019] permutation approach, while the right side uses our horizontal iterative minimization approach. The achieved blue-noise distribution for our approach is consistently better.