Scaling Probe-Based Real-Time Dynamic Global Illumination for Production
Zander Majercik, Adam Marrs, Josef Spjut, Morgan McGuire
NVIDIA
Figure 1. Image rendered in a pre-release version of Unity with our global illumination technique. Most of the indirect lighting in this scene comes from emissives (the orange monitor screens), which are integrated automatically by our technique.
Abstract
We contribute several practical extensions to the probe-based irradiance-field-with-visibility representation [Majercik et al. 2019; McGuire et al. 2017] to improve image quality, constant and asymptotic performance, memory efficiency, and artist control. We developed these extensions in the process of incorporating the previous work into the global illumination solutions of the NVIDIA RTXGI SDK [NVIDIA 2020], the Unity and Unreal Engine 4 game engines, and proprietary engines for several commercial games. These extensions include: a single, intuitive tuning parameter (the "self-shadow" bias); heuristics to speed transitions in the global illumination; reuse of irradiance data as prefiltered radiance for recursive glossy reflection; a probe state machine to prune work that will not affect the final image; and multiresolution cascaded volumes for large worlds.
1. Introduction
This paper discusses an algorithm to accelerate the evaluation of global illumination. The acceleration happens in two parts. The main part creates and maintains a data structure that allows a query of the form irradiance(location, orientation), i.e., E(X, ω), which replaces a potentially expensive computation of diffuse global illumination with an O(1) lookup into a data structure for locations anywhere in space. The second part re-uses that data structure to sample a weighted average of incident radiance for glossy global illumination, ∫_Γ L(X, ω) · W(X, ω) dω, and combines the result with filtered screen-space and geometric glossy ray tracing.

This paper describes a refinement of a previous version of the diffuse portion of this method [Majercik et al. 2019]. This refinement is the union of what we learned when incorporating that algorithm into several products, including the Unity game engine, the Unreal Engine 4 game engine, the NVIDIA RTXGI SDK version 1.1 [NVIDIA 2020], and several unannounced commercial games. These learnings include changes to the underlying algorithm to improve quality and performance, advice on tuning the algorithm and content, expansion of the algorithm to a complete solution that also accelerates glossy reflections, and system integration best practices for these methods. This work was driven by constraints from various platforms, requests from game developers and game artists, and new research on the problem. Because they were developed across several different productization efforts with different vendors, we believe that these learnings are fairly universal and robust, but they should not be construed as describing the features or performance of any one product in particular.

A key element of our algorithm is a probe, which stores directional information at a point. Environment maps are a type of probe: they store distant radiance as seen from any point in the scene.
Our probes store irradiance, weighted averages of distance, and weighted averages of squared distance (see Table 1 for terms we use in relation to probes) for a 3D grid-like structure of points in the scene.

Our algorithm has several components related to the organization, computation, and querying of probes. The new information described in this paper is indicated in Table 2. In the table, we indicate what is new relative to descriptions of previous versions of this algorithm. In addition, we give a complete description of the full algorithm below so that readers will not need to consult descriptions of previous versions to understand the algorithm.

Probe: A probe stores data at a point, with values for directions on the sphere.
Probe query: Trilinear interpolation (bilinear filtering and direction) and a visibility- and angle-weighted interpolation between multiple probes. The net result is an irradiance value that estimates the irradiance field at a point relative to a normal.
Irradiance: Incident power per unit area; the cosine-weighted integral of radiance relative to the sample direction.
Weighted sum of distance: Weighted sum (a weighted average in our implementation) of the distance to the nearest surface seen from a 3D point in a particular direction. In our case we use a cosine raised to a power.
Direct lighting: Light that is emitted from a light source, reflects from one surface, and then reaches the viewer.
Indirect lighting: Light that reflects off two or more surfaces before reaching the viewer (all lighting that is not direct).
Global illumination: Light that includes both direct and indirect lighting.
Table 1 . Terms and definitions.
2. Overview of the algorithm
At the core of the algorithm are probes that store weighted sums of color, distance, and squared distance. A 2D version of a probe storing a weighted average of distance to the nearest object is shown in Figure 2. These probes are processed as follows.
Probe Initialization

Start by building a 3D grid. From that grid, optimize probe positions by moving them outside of static geometry (Section 5). Then classify all probes as "Off", "Sleeping", "Newly Awake", "Newly Vigilant", "Awake", or "Vigilant" (Section 6). At the end of this stage, all probes are in their final positions and initial states.
Probe Query

Take a 3D point (within the probe volume) and a normal direction. For every point within the volume, there are 8 probes (the corners of a 3D box) that surround it. Loop over those 8 probes. For each one, compute a weight based on a combination of:

• the trilinear weight from the probe position;
• a backface weight (is the probe behind the point relative to the normal?);
• visibility (can the probe see the point?). This includes a "self-shadow bias" term for robust occlusion queries (Section 4.1).

Sample the value from each probe in the direction of the normal, and sum those samples using the computed weights. That is the sampled irradiance value.

For multiple volumes, do this for each volume, and then weight between the volumes as described in Section 7.3. Volume blending with tracking windows is discussed in Section 7.2.
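As an illustration of the query step above, here is a minimal CPU-side sketch in Python (the shader implementation differs). The trilinear weight follows the standard 8-corner formulation; the backface "wrap" falloff and the stubbed-out `visibility` argument (which in the full technique would come from the statistical visibility test against the probe's stored depth moments) are simplifying assumptions of this sketch:

```python
import numpy as np

def probe_weight(point, normal, probe_pos, cell_size, visibility=1.0):
    """Combine the trilinear, backface, and visibility weights for one probe.

    `visibility` stands in for the depth-moment visibility test, which is
    not reproduced here."""
    # Trilinear weight: product of per-axis distances to the opposite corner.
    alpha = np.clip(np.abs(point - probe_pos) / cell_size, 0.0, 1.0)
    trilinear = np.prod(1.0 - alpha)

    # Backface ("wrap shading") weight: smoothly down-weight probes that
    # lie behind the shading point relative to its normal.
    to_probe = probe_pos - point
    d = np.linalg.norm(to_probe)
    dir_to_probe = to_probe / max(d, 1e-6)
    backface = (float(np.dot(dir_to_probe, normal)) * 0.5 + 0.5) ** 2 + 0.2

    return trilinear * backface * visibility

def sample_irradiance(point, normal, probes, cell_size):
    """Weighted sum over the 8 surrounding probes; `probes` is a list of
    (position, irradiance_in_normal_direction) pairs."""
    total_w = 0.0
    total = np.zeros(3)
    for pos, irradiance in probes:
        w = probe_weight(point, normal, pos, cell_size)
        total += w * irradiance
        total_w += w
    return total / max(total_w, 1e-6)
```

Because the weights are normalized at the end, the result is a convex combination of the per-probe samples, so the estimate stays within the range of the contributing probe values.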
Probe Update

For each probe that is "Awake" or "Vigilant" (Section 6), trace rays in a spherical Fibonacci pattern, rotating the pattern randomly every frame. Shade these ray hits using the normal deferred shading algorithm, including sampling the probe volume to include the irradiance from the probes. A section of an example ray cast, with a texel to which the rays contribute highlighted, is shown in Figure 2. The update then proceeds for both irradiance and mean distance values as follows.
Figure 2. A 2D probe for illustration. This probe shows one "cell" (texel) as the bold segment of the circle. The bold arrow is the direction associated with the cell. The cell stores the weighted average of the hit distances of each of the sample directions. Note that this weighted average includes directions "outside" the center cell. The weighting function is larger for directions near the cell center, and the resulting weighted average is thus influenced more by the longer directions in this particular example. The bold dotted line is the stored "distance" in the cell. Note that a direction can contribute to more than one cell, and we loop over directions, updating any cell that a direction contributes to.
Irradiance
Compute a cosine-weighted average of the radiance values of these shaded ray hits relative to the direction of each probe texel. Then, for each probe texel, blend these newly computed values into the probe texel at a rate of (1 − α); we refer to the α term as "hysteresis". We adjust this hysteresis per probe and per texel based on our convergence heuristics, described in Section 4.3.

Mean Distance and Mean Distance-Squared
Compute a power-cosine weighted average of the distance values for each ray relative to the direction of each probe texel. For each probe texel, blend these values as with irradiance above. We adjust the hysteresis for mean distance separately from irradiance; details and reasoning are provided in Section 4.3.

We update the probe texels by alpha blending in the new shading results at a rate of (1 − α), where α is a hysteresis parameter that controls the rate at which new irradiance and visibility values override results from previous frames (Eq. 1). We dynamically adapt this hysteresis value per probe and per texel (Section 4.3). The update equation is as follows:

E′[n̂] = αE[n̂] + (1 − α) Σ_ProbeRays max(0, n̂ · ω̂) · L(ω̂)    (1)

where E is the old irradiance/visibility texel value in direction n̂, E′ is the new texel value, ω̂ is the direction of the ray, and L(ω̂) is the radiance transported along the ray.
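The update equation can be exercised directly; this small Python sketch (with a hypothetical `update_texel` name) applies Eq. (1) to a single texel:

```python
import numpy as np

def update_texel(old_value, texel_dir, ray_dirs, ray_radiance, hysteresis):
    """One application of Eq. (1) for a single probe texel: alpha-blend the
    cosine-weighted sum of this frame's ray radiance into the stored value."""
    new_value = np.zeros(3)
    for omega, radiance in zip(ray_dirs, ray_radiance):
        # Rays behind the texel direction are clamped out by max(0, n . w).
        new_value += max(0.0, float(np.dot(texel_dir, omega))) * np.asarray(radiance)
    return hysteresis * np.asarray(old_value) + (1.0 - hysteresis) * new_value
```

With hysteresis = 1 the stored value is unchanged; with hysteresis = 0 the texel is replaced entirely by the new cosine-weighted sum.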
3. Related Work
Interactive global illumination has been an active area of research for years. We review the areas most relevant to our work.
Interactive Global Illumination with Light Probes
Image-based lighting solutions are ubiquitous in modern video games [Martin and Einarsson 2010; Ritschel et al. 2009; McAuley 2012; Hooker 2016]. A common workflow for such solutions involves placing light probes densely inside the volume of a scene, each of which encodes some form of a spherical (ir)radiance map. Prefiltered versions of these maps can also be stored to accelerate diffuse and glossy runtime shading queries.

Variants of traditional light probes allow artists to manually place box or sphere proxies in a scene. These proxies are used to warp probe queries at runtime in a manner that better approximates spatially-localized reflection variations [Lagarde and Zanuttini 2012]. Similarly, manually-placed convex proxy geometry sets are also used to bound blending weights when querying and interpolating between many light probes at runtime, in order to reduce the light leaking artifacts common to probe-based methods.

Practitioners agree that eliminating manual probe and proxy placement remains an important open problem in production [Hooker 2016]. Without manual adjustment of traditional probes, it is impossible to automatically avoid probe placements that lead to light and dark (i.e., shadow) leaks or displaced reflection artifacts. Majercik et al.'s [2019] light probes avoid light and dark leaking with raytraced visibility information, but placing these probes in a uniform grid still leads to suboptimal probe locations (e.g., probes stuck in walls). To avoid these issues for glossy GI, some engines rely instead on screen-space ray tracing [Valient 2013] for pixel-accurate reflections. These methods, however, fail when a reflected object is not visible from the camera's point of view, leading to inconsistent lighting and view-dependent (and so temporally unstable) reflection effects.

Spatial organization:
• Previous approaches [Martin and Einarsson 2010; Ritschel et al. 2009; McAuley 2012; Hooker 2016; Stefanov 2016]: 3D grid with manually placed probes and box proxies; algorithmically precomputed probe locations.
• Light-field probes [McGuire et al. 2017; Wang et al. 2019]: 3D grid of probes [2017]; non-uniform automatic placement over static geometry [2019].
• DDGI [Majercik et al. 2019]: 3D grid, varying resolutions.
• This work: 3D grid with offsets, multiple volumes, tracking windows.

Encoding:
• Previous approaches: cube maps.
• Light-field probes: octahedral encoding [Cigolle et al. 2014].
• DDGI: octahedral, varying resolutions.
• This work: octahedral, with separate irradiance and visibility resolutions.

Initialization:
• Previous approaches: precomputed.
• Light-field probes: static, precomputed.
• DDGI: uniform initialization to 0; values converge with update.
• This work: probes classified into states based on update rate; "live" probes converged.

Update:
• Previous approaches: static; dynamic lighting with static geometry [Stefanov 2016].
• Light-field probes: static, precomputed.
• DDGI: ray trace with alpha blending; pixel shader with stencil buffer.
• This work: ray trace with dynamic alpha blending (convergence and perception heuristics); optimized convolution compute shader.

Query:
• Previous approaches: shading weights based on manually placed proxy geometry.
• Light-field probes: light field ray tracing using probes.
• DDGI: raster for direct lighting; world-space positions query an 8-probe cage with variance bias, Chebyshev bias, and several other bias terms.
• This work: previous probe sampling with a single bias term (the self-shadow bias); multivolume blending; primary-hit glossy ray cast with second-order glossy reflection sampled from probes.

Table 2. Evolution of probe-based GI showing spatial organization, encoding, initialization, update, and query for the GI computation.

Light field probes [McGuire et al. 2017] automatically resolve many light/dark leaking issues (in scenes with static geometry and lighting) by encoding additional information about the scene geometry into spherical probes. A solution for dynamic lighting is presented in Silvennoinen et al. [2017], but this solution only supports coarse dynamic occluders and requires complex probe placement based on static geometry. As mentioned above, the irradiance probes of Majercik et al. [2019] avoid most light/dark leaks in scenes with dynamic lighting and geometry, but probe placement is still suboptimal.
Suboptimal placement can lead to lighting results that, while believable, are inferior to the correctly sampled result, and sometimes exhibit shadow leaking in cases of complex geometry with acute corners.

Interactive Ray Tracing and Shading.
Correct shading with probe-based lighting methods relies on point-to-point visibility queries. At a high level, one can interpret our ray tracing technique as tracing rays against a voxelized representation of the scene (as in voxel cone tracing), but with a spherical voxelization instead of an octree. Two important differences that contribute to many of the practical advantages of our representation are 1) we explicitly encode geometric scene information (i.e., radial depth and depth squared) instead of relying on the implicit octree structure to resolve local and global visibility details, and 2) neither our spatial parameterization nor our filtering relies on scene geometry. This prevents light (and dark) leaking artifacts and allows us to resolve centimeter-scale geometry at about the same cost (in space and time) as a voxel cone tracer that operates at meter scale. As we target true world-space ray tracing in a pixel shader, and not just screen-space ray tracing, our technique can be seen as a generalization of many previous real-time environment-map Monte Carlo integration methods [Stachowiak and Uludag 2015; Wyman 2005; Toth et al. 2015; Jendersie et al. 2016].
Probe Representation.
As in the work by Majercik et al. [2019], we apply Cigolle et al.'s [2014] octahedral mapping from the sphere to the unit square to store and query our spherical distributions. This parameterization has slightly less distortion than cube maps and provides easier methods for managing seams. In this work, we select resolutions for octahedral irradiance and mean distance/distance-squared for quality and performance.
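For reference, the octahedral mapping of Cigolle et al. [2014] round-trips as follows; this is a standard CPU-side transcription for illustration, not our shader code:

```python
import numpy as np

def _sign_not_zero(v):
    # GLSL-style sign() that never returns 0 (needed at the seams).
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(d):
    """Map a unit direction to octahedral coordinates in [-1, 1]^2."""
    d = np.asarray(d, dtype=float)
    p = d[:2] / np.sum(np.abs(d))
    if d[2] < 0.0:
        # Fold the lower hemisphere over the diagonals.
        p = np.array([(1.0 - abs(p[1])) * _sign_not_zero(p[0]),
                      (1.0 - abs(p[0])) * _sign_not_zero(p[1])])
    return p

def oct_decode(p):
    """Inverse of oct_encode: octahedral coordinates to a unit direction."""
    p = np.asarray(p, dtype=float)
    z = 1.0 - abs(p[0]) - abs(p[1])
    v = np.array([p[0], p[1], z])
    if z < 0.0:
        v[:2] = np.array([(1.0 - abs(p[1])) * _sign_not_zero(p[0]),
                          (1.0 - abs(p[0])) * _sign_not_zero(p[1])])
    return v / np.linalg.norm(v)
```

The seam handling (the `_sign_not_zero` helper) is what makes bilinear filtering across the square's edges manageable compared to cube-map face boundaries.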
GI in Production: A Motivating Example
In both offline and realtime rendering, significant previous work has been devoted to adapting existing global illumination algorithms for production. Path tracing in film, which radically changed both artist workflow and render farm computation load, is a good example. The core path tracing algorithm has remained largely unchanged, but practical considerations of the particular hardware and software systems required specialized updates to the technique [Keller et al. 2015].

Similarly, our extensions to the previously published DDGI algorithm are a guide for adapting it and other probe-based techniques to a production setting. We report real changes that we made to the base algorithm to fit production constraints.
4. Qualitative Image Improvements
4.1. Self-Shadow Bias

When querying the probe volume at a surface, variance in the visibility estimate will be highest around the mean of the distribution, in other words, at the surface (see Figure 3). To avoid the shadow leaking that results from this, an additional bias away from the mean of the distribution is added to the sample point during the probe query. The previous technique [2019] used a combination of scene-tuned biases on the mean of the distribution, the variance of the distribution, and the Chebyshev statistical test to move the visibility query to a point of lower variance in the distribution. Intuitively, "a point of lower variance in the distribution" can be thought of as a point slightly offset from the surface (in world space). Thus, we unify these statistical bias parameters into a single self-shadow bias term. The self-shadow bias is a world-space vector pointing away from the initial sample point on the surface and is computed as follows:

BiasVector = (n · 0.2 + ω_o · 0.8) · (0.75 · D) · B    (2)

where n is the normal vector at the sample point, ω_o is the direction from the sample point to the camera, 0.2 and 0.8 are empirically determined constants, D is the minimum axial distance between probes, and B is a user-tunable floating-point scalar. We add this bias vector to the initial sample point to yield a new point, which we use for the visibility test.

Our self-shadow bias is more robust than the previous biases because a default value of the B parameter (0.3) worked well for most scenes, whereas the previous biases each had to be specifically tuned per scene. In cases where scene-specific tuning is necessary, tuning is easier because we present a single tunable parameter instead of three.
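A minimal sketch of the bias computation; the constants 0.2, 0.8, and 0.75 follow our reading of Eq. (2) above and should be treated as assumptions of this sketch:

```python
import numpy as np

def self_shadow_bias(normal, view_dir, probe_spacing, b=0.3):
    """World-space offset added to the sample point before the visibility
    (Chebyshev) test.  `normal` is the surface normal, `view_dir` points
    from the sample point toward the camera, and `b` is the single
    user-tunable scalar (default 0.3 per the text)."""
    d = float(np.min(probe_spacing))  # minimum axial distance between probes
    return (normal * 0.2 + view_dir * 0.8) * (0.75 * d) * b
```

The resulting vector is simply added to the shading point; only `b` is exposed to artists, replacing the three separately tuned bias terms of the earlier technique.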
Generally, a higher self-shadow bias is necessary when there is increased variance in the depth estimate, as would be the case when lower ray counts are used to update the probes (as might be done to improve performance).

To further decrease light leaking, probe update rays that hit backfaces record a value of 0 for irradiance and shorten their depth values by 80%. Shortening depth values ensures that the probe will see backface surfaces as shadowed and not light them. We set irradiance to 0 to ensure that any lighting that does come from that probe does not cause light to leak where it should not. We do not set depth values to 0 for two reasons: 1) doing so would drive the computed Chebyshev weight towards 0, which might then be driven higher when the weights are normalized, and 2) probes that see some backfaces but are not stuck in walls (due to idiosyncrasies of geometry) could have overly skewed average depths if many of their depth values were set to 0.

Figure 3. A night scene from our prototype. The wall entering the alley in the left image shows light leaking due to an overly high self-shadow bias. The correct self-shadow bias in the right image computes proper occlusion.

To minimize the number of probes stuck in walls, we offset probe positions using an iterative adjustment algorithm, as described in Section 5.
If the irradiance probes are slow to converge, abrupt lighting changes in a scene can create noticeable lag in the diffuse indirect illumination. The lag is most salient in light-to-dark transitions. To combat this, we accelerate convergence by applying a perception-based exponential gamma encoding to probe irradiance values. This encoding interpolates perceptually linearly during lighting changes: faster light-to-dark convergence reads perceptually as a linear drop in brightness. We determined experimentally that an exponent of 5.0 leads to the best results (lower does not converge as fast; higher does not converge any faster). See our video supplement for results. A code listing is given in Figure 4.

This perception-based encoding has the additional effect of reducing low-frequency flicker due to fireflies: bright flashes in the diffuse GI caused by an update ray hitting a small, bright irradiance source.
We further accelerate convergence with a new heuristic based on per-texel thresholding of the irradiance data. Our lower threshold detects changes with magnitude above 25% of the maximum value and lowers the hysteresis by 0.15. Our higher threshold detects changes with magnitude above 80% and lowers the hysteresis to 0.0; we assume in this case that the distribution the probe is sampling has changed completely. These thresholds are active only for irradiance updates; we found them to be too unstable when updating visibility. See Figure 5.

We also implement scene-dependent, per-probe heuristics that adjust the hysteresis based on lighting or geometry changes. These are as follows:

float irradianceGamma = 5.0f;
float invIrradianceGamma = 1.0f / irradianceGamma;

// Perception encoding during probe update.
// Passed in or computed earlier in the shader:
in vec3 sumOfCosineWeightedRayContributions;
in vec3 oldValue;
in float hysteresis;

vec3 newIrradiance =
    pow(sumOfCosineWeightedRayContributions, vec3(invIrradianceGamma));
return lerp(newIrradiance, oldValue, hysteresis);

// ////////////////////////////////////
// Perception decoding during probe sampling
vec3 irradiance = vec3(0);

// For the 8 probes in the surrounding cage
for (int i = 0; i < 8; ++i) {
    vec3 probeIrradiance = texture(irradianceTexture, texCoord).rgb;

    // Decode the tone curve, but leave a gamma = 2 curve
    // to approximate sRGB blending for the trilinear interpolation
    probeIrradiance = pow(probeIrradiance, vec3(irradianceGamma * 0.5));

    irradiance += probeWeight * probeIrradiance;
}

// Go back to linear irradiance
irradiance = square(irradiance);
return irradiance;
Figure 4. Perceptual encoding and decoding of probe irradiance during update and sampling.

• Small lighting change (e.g., a player-held flashlight turns on): reduce irradiance hysteresis by 15% for 4 frames.
• Large lighting change (e.g., an abrupt time-of-day shift): reduce irradiance hysteresis by 50% for 10 frames.
• Large object change (e.g., a ceiling caves in): reduce irradiance hysteresis by 50% for 10 frames and visibility hysteresis by 50% for 7 frames.

In all our heuristics, we try to avoid low hysteresis for visibility updates as much as possible to achieve the most stable result. In each of the scene-dependent heuristics, hysteresis for all probes (not just the probes local to the change) is reduced.

Many effective heuristics exist for adjusting probe hysteresis per texel and per probe on a scene-dependent basis; we have not explored this space in depth. For example, it would probably be more effective to reduce hysteresis only for probes affected by a lighting or object change rather than for all probes in the scene. While exploring more specific and sensitive heuristics remains a fruitful subject for future work, the heuristics presented here worked well enough for us as we integrated the technique into multiple engines. We never came across content that forced us to adapt them, but our survey was not exhaustive.

// Irradiance probe update with per-texel hysteresis adjustment

// Sum of ray contributions
in vec3 sumOfCosineWeightedRayContributions;
in vec3 oldValue;
in float hysteresis;

const float significantChangeThreshold = 0.25;
const float newDistributionChangeThreshold = 0.8;

float changeMagnitude =
    maxComponent(sumOfCosineWeightedRayContributions - oldValue.xyz);

// Lower the hysteresis when a large change is detected
if (abs(changeMagnitude) > significantChangeThreshold) {
    hysteresis = max(0.0f, hysteresis - 0.15f);
}
if (abs(changeMagnitude) > newDistributionChangeThreshold) {
    hysteresis = 0.0f;
}

return lerp(sumOfCosineWeightedRayContributions, oldValue, hysteresis);
Figure 5 . Pseudocode for probe update with per-texel hysteresis adjustment.
Note that temporal anti-aliasing (TAA) applies its own hysteresis, so the base hysteresis for our technique can be lower if TAA is applied. In this case, the TAA hysteresis should be adjusted according to scene heuristics just like the probe hysteresis, or else it will always add a large cost to convergence, even on a dramatic lighting or object change.
We compute glossy reflections with a half-screen-resolution wavefront ray trace. These shaded ray hits are then blurred according to surface roughness and distance from the camera before being integrated into the indirect radiance computation during the deferred shading pass. These raytraced reflections are more realistic than screen-space reflections, but tracing rays for 2nd- through nth-order reflections is infeasible in most scenes. We improve reflections by reusing the filtered radiance data in the probes to shade 2nd- through nth-order glossy reflections, resulting in better image quality with minimal performance overhead. See Figure 6 for an on/off comparison.

It is common practice in production path tracing to reduce noise by roughening surfaces (or otherwise truncating the BSDF evaluation) on recursive bounces [Fascione et al. 2019]. Reusing the irradiance probes for second-order reflections is a similar approximation, which here avoids noise by taking advantage of a data structure already available to us. Note, however, that the probe data structure stores cosine-filtered irradiance, not the cosine-weighted integral of radiance over the hemisphere, which is the correct measure for reflectance. These two quantities are equivalent up to a factor of π, but the units are different: radiance (W·sr⁻¹·m⁻²) vs. irradiance (W·m⁻²).

Figure 6. A shiny robot against a mirror background. Both the mirror background and the robot have high glossy reflectance. The left image shows no second-order glossy reflections, while the right image shows second-order glossy reflections sampled from probes.
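As a concrete instance of the factor-of-π relation above: for a Lambertian-like lobe, outgoing radiance is ρE/π (with albedo ρ), so a probe-irradiance fallback for a recursive glossy bounce might look like this hypothetical helper:

```python
import math

def glossy_fallback_radiance(probe_irradiance, albedo=1.0):
    """Approximate outgoing radiance for a recursive glossy bounce by
    reusing prefiltered probe irradiance E: L_o = albedo * E / pi.
    This unit conversion (W*m^-2 -> W*sr^-1*m^-2) is the pi factor
    discussed in the text; the function name is illustrative."""
    return tuple(albedo * e / math.pi for e in probe_irradiance)
```

This is the same kind of lobe-truncation approximation as roughening recursive bounces in production path tracing: cheap, stable, and biased rather than noisy.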
5. Probe Position Adjustment
The probe visibility information prevents light and shadow leaks from occluded probes, but leaves some probes in total occlusion such that they never contribute to shading. We present a simple, fast optimizer that iteratively shifts probes around static geometry to maximize the number of useful probes and generate good viewpoints. During initialization, our optimizer adjusts each probe through the closest backface it can see, then further adjusts probes away from close frontfaces to maximize surface visibility (see Figure 7). Pseudocode is given in Figure 8. We do not move probes around dynamic geometry because this causes instability; a stable result is preferable to an unstable result with lower average error. To correctly light dynamic objects, we leverage the fact that a uniformly sampled probe is an approximation of the full irradiance field at its sample location. If a probe passes through a dynamic object, our backface heuristics (described at the end of Section 4.1) will minimize shadow leaking. When the probe emerges, our convergence heuristics (Section 4.3) will quickly converge its value.

Out of a desire to maintain a uniformly sampled irradiance field representation, we did not implement more complex probe sampling techniques, such as importance sampling, which might speed probe convergence at the cost of stability and on-the-fly generalization to moving geometry. Exploring these update techniques in detail is promising for future work.

Figure 7. A view of the ceiling in our Greek Villa scene. Spheres are a visualization of the probes. The black probes are correctly dark, but are not contributing to the final image. The acute corner leads to shadow leaking (labeled with a green ellipse) with a default probe grid (left). Our optimizer adjusts probes out of the wall and ceiling to remove the leak (right).

The purpose of the optimizer is to increase the number of probes that can contribute to the final image.
The following scenario, however, demonstrates that our optimizer can sometimes add additional computation without increasing image quality. Consider an 8-probe cage surrounding a flat wall (Figure 9). The optimizer can cause probes to "double cover" a surface if the 4 probes within the surface are adjusted outside of it. This causes the full probe cage to turn on and shade the surface, increasing the number of actively tracing probes without appreciably affecting the image quality (Figure 9). For our test scenes, this slight inefficiency was worth the added benefit of optimizing probe positions globally.

The probe position optimizer runs for 5 iterations during probe state classification, which is enough for almost all probes to converge their locations. We cap the number of iterations at 5 to prevent probes from moving back and forth (infinitely) through tangent backfaces.

More work is needed to determine the best position-optimizer algorithm, and many investigations in this vein exist (see, for example, Wang et al. [2019]).
Our optimizer worked well for multiple engines, but is almost certainly not optimal.

in int backfaceCount;              // number of rays that hit backfaces
in vec3 closestBackfaceVector;     // direction to closest backface
in vec3 farthestFrontfaceVector;   // direction to farthest frontface
in vec3 closestFrontfaceVector;    // direction to closest frontface
in float farthestFrontfaceDistance;
inout vec3 currentOffset;          // current offset from the grid for this probe

vec3 fullOffset = vec3(inf);
vec3 offsetLimit = ddgiVolume.probeOffsetLimit * ddgiVolume.probeSpacing;

// If there's a close backface AND you see more than 25% backfaces,
// assume you're inside something.
if ((float(backfaceCount) / RAYS_PER_PROBE) > 0.25f) {
    // Solve for the maximum scaling possible on each axis.
    vec3 positiveOffset = (-currentOffset.xyz + offsetLimit) / closestBackfaceVector;
    vec3 negativeOffset = (-currentOffset.xyz - offsetLimit) / closestBackfaceVector;
    vec3 combinedOffset = vec3(max(positiveOffset.x, negativeOffset.x),
                               max(positiveOffset.y, negativeOffset.y),
                               max(positiveOffset.z, negativeOffset.z));

    // Slightly bias this point to ensure we stay within bounds.
    const float epsilon = 1e-3;    // millimeter scale
    float scaleFactor = min(min(combinedOffset.x, combinedOffset.y),
                            combinedOffset.z) - epsilon;

    // If we can't move through the backface, don't move at all.
    fullOffset = currentOffset.xyz + closestBackfaceVector *
        ((scaleFactor <= 1.0f) ? 0.0f : scaleFactor);
} else if (!(dot(farthestFrontfaceVector, closestFrontfaceVector) > 0.5f)) {
    // The farthest frontface is also the closest if the probe can only
    // see one surface; in that case, don't move the probe. Otherwise,
    // move the minimum distance possible away from the close frontfaces.
    vec3 farthestDirection = min(0.2f, farthestFrontfaceDistance) *
        normalize(farthestFrontfaceVector);
    fullOffset = currentOffset.xyz + farthestDirection;
}

// Only apply the new offset if it stayed within the limit.
if (all(lessThan(abs(fullOffset), offsetLimit))) {
    currentOffset = fullOffset;
}
Figure 8. Pseudocode for an iteration of the probe position optimizer operating on a single probe.

Figure 9. A corner of the Greek Villa scene. Spheres are visualizations of the probes, encircled in green to denote the "Vigilant" state. Probes are marked "Vigilant" when the optimizer adjusts them out of surfaces, leading to double coverage of surfaces when all 8 probes of a cage can see the front face of the point they are shading.
6. Probe States
For all but the most basic scene geometry, even after adjustment many probes in a uniform 3D grid will not contribute to the final image. We introduce a robust set of probe states to avoid tracing or updating from such probes, increasing performance with the same visual result. Our probe states separate probes that should not update from probes that must, with an additional intermediate state to identify probes that have just appeared (either at scene initialization or with a moving volume; see Section 7.2) and adjust their hysteresis accordingly. The full set of states is shown in Figure 10 and discussed in the following sections.
As noted above, the constraints on probe movement imposed by the 3D grid indexing make it impossible to move all probes out of walls (some probes are too constrained by the grid structure). We identify probes that remain inside static geometry and turn them "Off" (never trace or update). As the optimizer only considers static geometry, probes that happen to spawn inside dynamic geometry are unaffected, and will correctly turn on when appropriate.

Figure 10. Probe states, with transitions between each state. α is the hysteresis for the current frame. α′ is the default hysteresis for the scene.

Even probes that are outside static geometry are not used for shading every frame: when no geometry is within probeSpacing of a probe, that probe's value is not used. We set these probes to "Asleep" and wake them up when a surface is about to use them for shading. Note that a probe needs to be "Awake" if and only if it is shading a surface or is about to shade one. Lighting changes and camera proximity do not matter if the probe is not shading a surface. The same is true for making probes "Asleep": even when the camera cannot see a probe, the probe still needs to be "Awake" if it is shading a surface, because it is propagating diffuse irradiance (with 2nd- through nth-order visibility). Thus, probes that shade static geometry should be "Vigilant" (they should always trace and update). Though probes near geometry must trace to propagate GI, the grid resolution need not be as fine in regions that are far from the camera. Pseudocode for the probe state optimizer is given in Figure 11.
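The states and transitions described here can be sketched as a small state machine. The enum mirrors the state names in the text; the inputs and exact conditions of `next_state` are simplifying assumptions of this sketch:

```python
from enum import Enum, auto

class ProbeState(Enum):
    OFF = auto()             # inside static geometry: never trace or update
    ASLEEP = auto()          # no geometry nearby: skip until needed
    NEWLY_AWAKE = auto()     # just woken: converge with low hysteresis
    AWAKE = auto()           # shading dynamic geometry: trace and update
    NEWLY_VIGILANT = auto()  # just classified near static geometry
    VIGILANT = auto()        # shading static geometry: always trace/update

def should_trace(state):
    """Only live probes cost ray-tracing work each frame."""
    return state in (ProbeState.NEWLY_AWAKE, ProbeState.AWAKE,
                     ProbeState.NEWLY_VIGILANT, ProbeState.VIGILANT)

def next_state(state, near_static, near_dynamic, inside_static):
    """One classification step over the transitions described above."""
    if inside_static:
        return ProbeState.OFF
    if near_static:
        return (ProbeState.VIGILANT
                if state in (ProbeState.VIGILANT, ProbeState.NEWLY_VIGILANT)
                else ProbeState.NEWLY_VIGILANT)
    if near_dynamic:
        return (ProbeState.AWAKE
                if state in (ProbeState.AWAKE, ProbeState.NEWLY_AWAKE)
                else ProbeState.NEWLY_AWAKE)
    return ProbeState.ASLEEP
```

The "Newly" states exist so the update pass can drop hysteresis (or spend extra rays) for exactly one convergence step before settling into the steady-state "Awake"/"Vigilant" behavior.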
Participating Media and Probe States
The probe data structure encodes a 3D irradiance field that is queryable at any point within its volume. Thus, it might be queried at positions in empty space to provide global illumination in participating media. In this case, even probes not shading a surface would need to be "Awake" if they are within the participating medium.
Probe positions and states are computed in a four-step pass:

• For all uninitialized probes, trace rays for five frames to determine optimal positioning and initial state. At the end of this pass, all previously uninitialized probes are "Newly Vigilant", "Off", or "Sleeping".
• Extend AABBs for all dynamic objects by a probe grid cell plus the self-shadow bias for a conservative estimate. Set all "Sleeping" probes inside the extended AABB of a dynamic object to "Newly Awake".
• Optionally trace a large number of rays for "Newly Vigilant" and "Newly Awake" probes to converge them in a frame, setting hysteresis to 0. Set their states to "Vigilant" and "Awake" respectively.
• Trace rays from "Vigilant" and "Awake" probes to update their values with the normal hysteresis value for the scene. This step can also be used to converge "Newly Vigilant" and "Newly Awake" probe values if the previous step was omitted.

The first step of the algorithm can be greatly accelerated with static geometry bounding boxes, as a probe can be directly adjusted against those bounding boxes rather than relying on distance and backface information from the spherical ray cast. Many probes could be immediately classified "Newly Vigilant" with this approach, though ray tracing would still be necessary to correctly determine which probes should be set to "Off".

Though these passes run every frame, for the majority of frames the first step will not run because no probes will be uninitialized. If the optional convergence pass is omitted, then only the final update step will run for most frames.
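The conservative containment test in the second step can be sketched as follows. This is an illustrative Python sketch, not SDK code; the helper names `expand_aabb` and `should_wake` are our own:

```python
def expand_aabb(aabb_min, aabb_max, probe_spacing, self_shadow_bias):
    """Grow a dynamic object's AABB by one probe grid cell plus the
    self-shadow bias so the Sleeping-probe containment test is conservative."""
    pad = [s + self_shadow_bias for s in probe_spacing]
    return ([lo - p for lo, p in zip(aabb_min, pad)],
            [hi + p for hi, p in zip(aabb_max, pad)])

def should_wake(probe_pos, aabb_min, aabb_max):
    """A Sleeping probe inside the expanded box becomes Newly Awake."""
    return all(lo <= x <= hi
               for x, lo, hi in zip(probe_pos, aabb_min, aabb_max))
```

For a unit box with probe spacing 1.0 and a self-shadow bias of 0.1, a probe slightly more than one grid cell outside the original bounds still wakes, guaranteeing it is updated before any surface of the dynamic object can sample it.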
Probe sleeping using our probe state scheme leads to a 30-50% average performance improvement (Figure 12). In addition to the performance improvement (shown in the middle column), we also show corresponding increases in rays cast per probe for the same performance. Casting more rays per probe makes new probe values more stable and allows for a lower global hysteresis, which makes the GI converge faster.
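The relationship between hysteresis and convergence speed can be checked with a small numerical sketch. The exponential blend below is the standard temporal probe update; the function names are ours, chosen for illustration:

```python
def blended_update(old, new_sample, hysteresis):
    # Temporal blend used when updating probe texels: high hysteresis
    # retains more history (stable but slow to react), low hysteresis
    # converges quickly but amplifies per-frame sample variance.
    return hysteresis * old + (1.0 - hysteresis) * new_sample

def frames_to_converge(hysteresis, target=1.0, tolerance=0.01):
    # Count update passes needed for a probe texel starting at 0.0
    # to reach the target value within the given tolerance.
    value, frames = 0.0, 0
    while abs(target - value) > tolerance:
        value = blended_update(value, target, hysteresis)
        frames += 1
    return frames
```

Since the residual error after n updates is hysteresis^n, dropping hysteresis from 0.97 to 0.9 cuts convergence from roughly 150 frames to roughly 45; this is why the extra rays per probe, which reduce the variance of each new sample, pay off.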
7. Quantitative Performance Improvements
The approach of Majercik et al. [2019] updated probe texels using a pixel shader with a stencil buffer (to avoid processing border texels in the update pass). Border texels were updated in a separate pixel shader pass for correct bilinear interpolation. This approach leverages the graphics hardware for alpha blending results. Despite this, however, faster update can be achieved by using a general-purpose GPU (GPGPU) compute operation optimized with GPU compute best practices. We give background and details of this approach below.

    for each uninitialized probe:
        Trace rays (distance only, no shading)
        Position optimizer iteration
        if (still in wall):
            OFF
        if (frontfaceDistance < probeSpacing):
            NEWLY VIGILANT
        else:
            SLEEPING

    for all dynamic geo:
        Extend bounding boxes by grid cell size + self-shadow bias
    for all SLEEPING probes:
        if (probe inside bounding box):
            NEWLY AWAKE

    // Optionally converge probes in this frame...
    for all NEWLY AWAKE and NEWLY VIGILANT probes:
        Trace rays to converge value
        NEWLY AWAKE -> AWAKE
        NEWLY VIGILANT -> VIGILANT

    // ...or let them converge in the update pass.
    for all VIGILANT or AWAKE probes:
        Trace rays and update value

Figure 11. Pseudocode for probe state computation.

Modern GPU architectures dispatch thread groups to cover user-specified compute grid dimensions. All threads in a group execute the same code in parallel, so ensuring that threads do not take different control paths in the code (coherent execution) is vital for performance. By ensuring coherent execution, we achieve a 3x performance improvement in the update pass over the pixel shader approach, with careful indexing over thread blocks consisting of an integer number of groups. All group execution is fully coherent. In addition, we store incoming shaded sample ray hits in shared-memory buffers so that all threads can read them in parallel when computing a new probe texel value.

Previous work showed the effect of probe resolution on image quality and performance.
We maintain image quality while selecting probe resolution (8x8 irradiance, 16x16 visibility) for a combination of bandwidth, memory footprint, fast convolution, efficient index computation, and, most important, mapping to SIMD instructions (thread lanes on a GPU) for peak occupancy on our target hardware. At powers of two, a probe can be updated by an integer number of 32- or 64-thread groups (common hardware-defined minimum sizes) for maximum possible occupancy and coherence. Arbitrary resolution values offer the highest flexibility at the cost of efficiency.

Figure 12. Performance data for probe sleeping. The "Baseline" column shows the time for probe trace and update without probe sleeping (all probes are marked "Vigilant"). The "Equal Quality" and "Time Saved" columns show savings of probe sleeping as a percentage of time and as absolute time, respectively. Finally, the "Better Quality" column shows the absolute ray increase achievable by tracing more rays from active probes to match the baseline time. Higher quality is achieved here by tracing more rays per probe per update pass; this reduces the variance in the estimation and speeds convergence.

(a) Octahedral representation and border copy texels. Colors denote faces on the collapsed octahedron. Letters in border cells denote copy destinations for cells inside the border labeled with the same letter. (b) Thread block alignment for probe update on an 8x8 irradiance probe (left) and a 16x16 visibility probe (right). (c) Thread block alignment for probe border copy. One block of 32 threads copies corners for four irradiance and four visibility probes (orange). Four blocks copy edges for four irradiance probes (green). Eight blocks copy edges for four visibility probes (blue).

Figure 13. Octahedral probe layout and probe update thread indexing.
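The occupancy arithmetic behind the power-of-two resolution choice is easy to check. This sketch (Python; the helper name is ours) shows why 8x8 and 16x16 probes map to an integer number of 32-thread groups while an arbitrary resolution leaves a partially filled, divergent group:

```python
def thread_groups_per_probe(probe_side, group_size=32):
    """Return (full_groups, leftover_texels) for one probe's update pass."""
    texels = probe_side * probe_side
    return divmod(texels, group_size)

# 8x8 irradiance probe   -> (2, 0): two full groups, fully coherent.
# 16x16 visibility probe -> (8, 0): eight full groups.
# 10x10 probe            -> (3, 4): the last group runs only 4 of 32 lanes.
```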
7.2. Camera Tracking Window

Conceptually, a probe grid covers all space in the scene. In practice, however, we do not have the compute or ray-tracing budget to update and trace a level-sized, high-resolution probe grid, as it may contain tens of thousands of probes. To maintain high probe resolution where it is most necessary, we implement a 3D tracking window of probes. We use this window to track the camera, though any object can be tracked with the same strategy. Our window begins centered on the camera. As the camera moves, if it moves further from the center than the distance between two probes in a cage (along any axis), a new plane of probes spawns in front of it (relative to its direction of motion) and the plane furthest behind it disappears. We implement this behavior using a 3D fixed-length circular buffer. When a new probe plane appears and is initialized, its new values are written to the memory of the plane in the last row behind the camera: the probes "leapfrog" over the camera in discrete steps (Figure 14). A discretely stepping probe window necessitates careful interpolation between multiple probe volumes; our strategy for this is discussed in Section 7.3.
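The circular-buffer addressing can be sketched as follows (Python, with an illustrative function name): a probe's integer world-grid coordinate maps to a fixed memory slot modulo the window size, so probes that remain inside the window keep their memory as it scrolls, and only the newly exposed plane must be re-initialized:

```python
def probe_memory_index(world_coord, window_size):
    """Map a probe's integer world-grid coordinate to its slot in the
    fixed-size 3D circular buffer. The mapping is independent of the
    window origin, so surviving probes keep their memory when the
    window scrolls by one plane."""
    return tuple(c % n for c, n in zip(world_coord, window_size))

# With a window 4 probes wide covering world x = 2..5, the probe plane
# at x = 5 lives in memory plane 5 % 4 = 1. After the window scrolls to
# x = 3..6, the new plane at x = 6 reuses memory plane 6 % 4 = 2, which
# previously held the departed plane at x = 2.
```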
(a) Default grid. (b) Offset grid.

Figure 14. Conceptual layout of the camera tracking window indexing with phase offset in 2D. The row of probes that moves is colored in green. When the camera passes the center bounding threshold moving in the +X direction, the leftmost row of probes leapfrogs to the +X face of the volume. The newly computed grid index is shown in green. The corresponding phase offset change is shown on the right.

7.3. Multiple Probe Volumes

Multiple probe volumes at differing resolutions can be used to efficiently implement progressively decreasing grid resolutions that cascade out from the camera, thus saving performance without affecting image quality. The data for these probe volumes is packed into a single texture as shown in Figure 16. The same approach is used in geoclipmaps [Losasso and Hoppe 2004], light propagation volumes [Kaplanyan and Dachsbacher 2010], and voxel cone tracing [Crassin et al. 2011]. Additional high-resolution volumes can also be used to efficiently cover hero assets with complex geometry that require higher-resolution diffuse irradiance.
(a) Multiple probe volumes. (b) Transition start (marked in green). (c) Dense volume hidden.

Figure 15. Spheres visualized show a dense volume (smaller spheres) and a sparse volume (larger spheres). The spheres are sized based on the probe spacing within each volume. On the far left, the pink region shows the area fully shaded by the dense volume, which gradually falls off to blue, the area shaded by the sparse volume. The center image marks the start of this transition. The rightmost image hides the dense probes to make visualizing the transition region easier.
Figure 16. Shaded ray hit data for multiple volumes packed into a single texture. This texture is irradiance data taken from our multivolume scene in the supplemental video. The texture includes shaded update rays for the camera-locked volume, the city-scale volume, and the level-scale volume; these are labeled in the figure and delineated within the texture by the red lines (which are not part of the irradiance data).
We blend between volumes by linearly falling off from 1.0 to 0.0 over the last grid cell (starting at the second-to-last plane of probes) along each axis of the 3D grid (see Figure 15). In the deferred shader, a weight is computed for each volume, starting from most to least dense. This is also the sampling order, because the most dense volume will have the best approximation of the local light field. Volume weights are accumulated at each volume sample. After the weight total reaches 1.0, further volumes are skipped.

The weighted volume blending described above yields smooth transitions for static volumes, but can cause popping in the GI when applied to camera-locked volumes. When a volume leapfrogs in front of the camera, some points can go from being fully shaded by a sparse cascade to being heavily shaded by the camera cascade (Figure 17). When computing blending weights for camera-locked volumes, we address this by tightening the transition region by one grid cell (along each axis) and then centering it on the camera. When a new plane of probes leapfrogs to the front of a volume, points that are newly within that volume will not immediately be shaded by it. Instead, those points will gradually transition between volumes as the camera moves towards them. Results are shown in our supplemental video.

The prototype multivolume code passes all probe volumes to the deferred shader, and then iterates through them per pixel to determine which volumes contain the point being shaded. Though not the optimal approach for performance, this provides the highest flexibility in tweaking the blending algorithm to evaluate image quality.
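The per-axis falloff and densest-first weight accumulation described above can be sketched as follows (Python for illustration; the real implementation runs in the deferred shader, and the function names are ours):

```python
def axis_weight(p, lo, hi, cell):
    # 1.0 in the interior, ramping linearly to 0.0 across the outermost
    # grid cell along this axis; 0.0 outside the volume entirely.
    if p < lo or p > hi:
        return 0.0
    return min(1.0, (p - lo) / cell, (hi - p) / cell)

def volume_weight(point, vol_min, vol_max, cell):
    # Product of per-axis falloffs gives the volume's blend weight.
    w = 1.0
    for p, lo, hi in zip(point, vol_min, vol_max):
        w *= axis_weight(p, lo, hi, cell)
    return w

def blend_volumes(samples):
    # samples: (weight, irradiance) pairs ordered densest volume first.
    total, result = 0.0, 0.0
    for w, e in samples:
        w = min(w, 1.0 - total)   # accumulated weights never exceed 1.0
        result += w * e
        total += w
        if total >= 1.0:
            break                 # remaining (coarser) volumes are skipped
    return result
```

For example, a point fully inside the dense volume (weight 1.0) ignores the sparse volume entirely, while a point halfway through the dense volume's transition region takes half its irradiance from each.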
For a production implementation, the usual solutions for the deferred-shading light loop issue (considering the volumes as lights) are available:

• Do the full brute-force light loop; for fewer than 10 volumes, the point-in-OBB test to determine which volumes contain the shaded point is fast to evaluate.
• Make one deferred pass per volume, rasterizing the volume's bounds to find the covered pixels.
• Make a spatial data structure (e.g., octree, BVH) over the volumes and then traverse it at runtime in the pixel shader to find which volumes the pixel is in. This method requires more bookkeeping and potentially costly data-dependent fetches.
• Use tiles [Olsson et al. 2012] set up on the CPU or with a GPU pass to conservatively approximate one of the previous methods.

For the pure cascaded method, these optimizations are not necessary because volumes are axis-aligned in world space and nested in a regular pattern.
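The point-in-OBB test in the first option projects the point's offset from the box center onto each of the box's three axes; a minimal sketch (Python, with hypothetical names):

```python
def point_in_obb(p, center, axes, half_extents):
    """axes: three orthonormal basis vectors of the oriented bounding box.
    The point is inside iff its offset from the center, projected onto
    each axis, is within that axis's half extent."""
    d = [p[i] - center[i] for i in range(3)]
    for axis, h in zip(axes, half_extents):
        if abs(sum(d[i] * axis[i] for i in range(3))) > h:
            return False
    return True
```

In the brute-force loop, each shaded point simply filters the volume list with this test, which is cheap for a handful of volumes.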
Previous probe schemes required an extra shader pass to gather the indirect contribution over the frame. We present a simpler framework that optimizes the global illumination gather step to directly sample the probe data structure during shading, yielding reduced bandwidth requirements. Our code is included in the supplemental material in GIRenderer_deferredShade.pix.

(a) Initial camera position. Labels show the blending region for the camera tracking window, the camera boundary that will cause the volume to move, and the volume weights for a point being shaded by the camera volume (brown circles) and a surrounding volume (not visualized).
(b) Camera moves. Without camera-aware blending, volume weights on the point change dramatically in one frame. (c) Camera moves. With camera-aware blending, the volume weights change slowly over the course of multiple frames, leading to smoother transitions.
Figure 17. 2D illustration of volume blending using the static volume method vs. camera-aware volume blending.

8. Conclusion and Discussion

We present multiple extensions to the dynamic diffuse global illumination algorithm [Majercik et al. 2019] to improve image quality, performance, and ease of deployment in a production setting. These extensions were developed in response to production constraints encountered when integrating the technique into the NVIDIA RTXGI SDK [NVIDIA 2020], the Unity game engine, Unreal Engine 4, and several commercial games.

The base algorithm of Majercik et al. [2019] is inherently practical due to its image quality and performance. This paper covers the gap between a practical algorithm and one that is ready for production deployment. Extensions like our "self-shadow bias" make the algorithm easier to tune, and our performance optimizations to the update pass make it feasible for the render budget of production games. For all of our extensions, we sought solutions that were robust, easy to understand, and easy to tune without fundamentally changing the algorithm.
Though our proposed convergence heuristics speed convergence relative to the previous approach, there is still some ghosting in the indirect illumination for small, bright light sources (like flashlights; see our video supplement at 7:05). This lag could be addressed by intensifying our specific hysteresis-reduction heuristics on small lights known to cause ghosting, though doing this globally may cause instability in other regions of the image. While more specialized methods like reflective shadow maps yield less ghosting [Xu 2016], an advantage of our method is that all light sources can be handled generically to produce global illumination; we trade some quality for generality.

In addition to our performance improvements, a per-frame ray budget could be implemented to allow more control over the render budget of the technique. For our applications, we found that controlling (a) the rays per probe and (b) the number of probes in a volume was enough to hit our performance targets. A more sophisticated treatment of ray budget would trace different ray counts on a per-probe basis, adding substantial complexity to the implementation. We chose simplicity over a more optimized ray budget, but a study of optimal ray apportioning between probes (taking into account lighting and geometry changes, the camera position, etc.) is interesting future work.

Our algorithm covers a large space of rendered effects and thus suggests many possible directions for future work. For instance, our technique forces second-order glossy reflections to maximum roughness in order to reuse the irradiance values as cosine-filtered radiance. Increasing roughness over scattering events has precedent as a noise reduction technique in film production [Kulla et al. 2018], although typically not at such an aggressive scale.
Second-order glossy reflections could be improved by using multiple higher-resolution filtered radiance textures with different cosine power weighting, like the weighting for visibility probes, but with multiple octahedral representations per sample point instead of one. These could be used to render second-order glossy reflections of varying roughness.
Acknowledgements
Foremost, we thank Peter Shirley for his invaluable feedback and editing. Thanks to Corey Taylor and Mike Mara for the initial probe implementation. Thanks to Derek Nowrouzezahrai and Jean-Philippe Guertin for their effort on the original DDGI paper. Thanks to Paul Hodgson, Peter Featherstone, Jesper Mortensen, Kuba Cupisz, and the rest of the Unity Copenhagen lighting team for their help with Unity. Thanks to Kelsey Blanton and Alan Wolfe for their work on the NVIDIA RTXGI SDK. Thanks to Pablo Palmier at Ninja Theory for his help with Unreal Engine 4.
References

CIGOLLE, Z. H., DONOW, S., EVANGELAKOS, D., MARA, M., MCGUIRE, M., AND MEYER, Q. 2014. A survey of efficient representations for independent unit vectors. Journal of Computer Graphics Techniques (JCGT) 3, 2 (April), 1–30. URL: http://jcgt.org/published/0003/02/01/

CRASSIN, C., NEYRET, F., SAINZ, M., GREEN, S., AND EISEMANN, E. 2011. Interactive indirect illumination using voxel cone tracing. Computer Graphics Forum 30, 7, 1921–1930. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8659.2011.02063.x, doi:10.1111/j.1467-8659.2011.02063.x

FASCIONE, L., HANIKA, J., HECKENBERG, D., KULLA, C., DROSKE, M., AND SCHWARZHAUPT, J. 2019. Path tracing in production: Part 1: Modern path tracing. In ACM SIGGRAPH 2019 Courses, Association for Computing Machinery, New York, NY, USA, SIGGRAPH '19. URL: https://doi.org/10.1145/3305366.3328079, doi:10.1145/3305366.3328079

HOOKER, J. 2016. Volumetric global illumination at Treyarch. In Advances in Real-Time Rendering 2016, SIGGRAPH 2016, Treyarch.

JENDERSIE, J., KURI, D., AND GROSCH, T. 2016. Real-time global illumination using precomputed illuminance composition with chrominance compression. Journal of Computer Graphics Techniques (JCGT) 5, 4 (December), 8–35. URL: http://jcgt.org/published/0005/04/02/

KAPLANYAN, A., AND DACHSBACHER, C. 2010. Cascaded light propagation volumes for real-time indirect illumination. In Proceedings of the 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, Association for Computing Machinery, New York, NY, USA, I3D '10, 99–107. URL: https://doi.org/10.1145/1730804.1730821, doi:10.1145/1730804.1730821

KELLER, A., FASCIONE, L., FAJARDO, M., GEORGIEV, I., CHRISTENSEN, P., HANIKA, J., EISENACHER, C., AND NICHOLS, G. 2015. The path tracing revolution in the movie industry. In ACM SIGGRAPH 2015 Courses, 1–7.

KULLA, C., CONTY, A., STEIN, C., AND GRITZ, L. 2018. Sony Pictures Imageworks Arnold. ACM Trans. Graph. 37, 3 (Aug.). URL: https://doi.org/10.1145/3180495, doi:10.1145/3180495

LAGARDE, S., AND ZANUTTINI, A. 2012. Local image-based lighting with parallax-corrected cubemap. SIGGRAPH 2012, DONTNOD Entertainment. URL: https://seblagarde.wordpress.com/2012/11/28/siggraph-2012-talk/

LOSASSO, F., AND HOPPE, H. 2004. Geometry clipmaps: Terrain rendering using nested regular grids. ACM Trans. Graph. 23, 3 (Aug.), 769–776. URL: https://doi.org/10.1145/1015706.1015799, doi:10.1145/1015706.1015799

MAJERCIK, Z., GUERTIN, J.-P., NOWROUZEZAHRAI, D., AND MCGUIRE, M. 2019. Dynamic diffuse global illumination with ray-traced irradiance fields. Journal of Computer Graphics Techniques (JCGT) 8, 2 (June), 1–30. URL: http://jcgt.org/published/0008/02/01/

MARTIN, S., AND EINARSSON, P. 2010. A real time radiosity architecture for video games. In Advances in Real-Time Rendering 2010, SIGGRAPH 2010, Geomerics and EA DICE. URL: http://advances.realtimerendering.com/s2010/Martin-Einarsson-RadiosityArchitecture(SIGGRAPH2010AdvancedRealTimeRenderingCourse).pdf

MCAULEY, S. 2012. Calibrating lighting and materials in Far Cry 3. In Practical Physically Based Shading in Film and Game Production, SIGGRAPH 2012, Ubisoft Montreal. URL: https://blog.selfshadow.com/publications/s2012-shading-course/

MCGUIRE, M., MARA, M., NOWROUZEZAHRAI, D., AND LUEBKE, D. 2017. Real-time global illumination using precomputed light field probes. In ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 11. URL: http://casual-effects.com/research/McGuire2017LightField/index.html

NVIDIA, 2020. RTX global illumination, Sep. URL: https://developer.nvidia.com/rtxgi

OLSSON, O., BILLETER, M., AND ASSARSSON, U. 2012. Tiled and clustered forward shading: Supporting transparency and MSAA. In ACM SIGGRAPH 2012 Talks, Association for Computing Machinery, New York, NY, USA, SIGGRAPH '12. URL: https://doi.org/10.1145/2343045.2343095, doi:10.1145/2343045.2343095

RITSCHEL, T., GROSCH, T., AND SEIDEL, H.-P. 2009. Approximating dynamic global illumination in image space. In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, ACM, New York, NY, USA, I3D '09, 75–82. URL: http://doi.acm.org/10.1145/1507149.1507161, doi:10.1145/1507149.1507161

SILVENNOINEN, A., AND LEHTINEN, J. 2017. Real-time global illumination by precomputed local reconstruction from sparse radiance probes. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia) 36, 6 (11), 230:1–230:13. URL: https://doi.org/10.1145/3130800.3130852, doi:10.1145/3130800.3130852

STACHOWIAK, T., AND ULUDAG, Y. 2015. Stochastic screen-space reflections. In Advances in Real-Time Rendering 2015, SIGGRAPH 2015, EA DICE.

STEFANOV, N. 2016. Global illumination in Tom Clancy's The Division. Presented at Game Developers Conference, 2016.

TOTH, R., HASSELGREN, J., AND AKENINE-MÖLLER, T. 2015. Perception of highlight disparity at a distance in consumer head-mounted displays. In Proceedings of the 7th Conference on High-Performance Graphics, ACM, New York, NY, USA, HPG '15, 61–66. URL: http://doi.acm.org/10.1145/2790060.2790062, doi:10.1145/2790060.2790062

VALIENT, M. 2013. Killzone Shadow Fall demo postmortem. Sony Devstation 2013, Guerilla Games.

WANG, Y., KHIAT, S., KRY, P. G., AND NOWROUZEZAHRAI, D. 2019. Fast non-uniform radiance probe placement and tracing. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, Association for Computing Machinery, New York, NY, USA, I3D '19. URL: https://doi.org/10.1145/3306131.3317024, doi:10.1145/3306131.3317024

WYMAN, C. 2005. An approximate image-space approach for interactive refraction. ACM Trans. Graph. 24, 3 (July), 1050–1053. URL: http://doi.acm.org/10.1145/1073204.1073310, doi:10.1145/1073204.1073310

XU, K., 2016. Temporal antialiasing in Uncharted 4. URL: http://advances.realtimerendering.com/s2016/index.html

Index of Supplemental Materials