Computing Light Transport Gradients using the Adjoint Method
Jos Stam, Graphics Researcher, NVIDIA. 03/10/2019
Abstract
This paper proposes a new equation from continuous adjoint theory to compute the gradient of quantities governed by the transport theory of light. Unlike discrete gradients à la autograd, which work at the code level, we first formulate the continuous theory and then discretize it. The key insight of this paper is that computing gradients in transport theory is akin to computing the importance, a quantity adjoint to radiance that satisfies an adjoint equation. Importance tells us where to look for the light that matters. In fact, this mathematical journey started from a whimsical thought that these adjoints might be related. Computing gradients is therefore no more complicated than computing the importance field. We hope this insight and the following paper shed some light on this complicated problem and ease the implementation of gradient computations in existing path tracers.
1. Introduction
In this paper we present a general framework for computing gradients in the context of light propagation. Gradients are of central importance in machine learning, computer vision and computer graphics. Often, we need to invert a simulation such as a rendering to recover hidden control parameters. For smooth problems the gradient is the key instrument in methods such as gradient descent or quasi-Newton iteration. This paper is concerned with computing the gradient of a solution to a transport equation. To achieve this goal, we derive a continuous adjoint equation for the gradient of the radiance. This equation is a generalization of the usual backpropagation algorithm popular in deep learning. We show that the adjoint equation for the gradient is almost identical to the adjoint equation for the importance in transport theory. The only difference is the source term, which is equal to the initial gradient of the cost function with respect to the radiance field. The method of computation is akin to a bi-directional Monte Carlo solution using radiance and importance. First the transport equation is solved for the radiance forward from the light sources to the receivers (camera/eye). Then the adjoint transport equation is solved backwards from the receiver for the adjoint of the gradient of the radiance, similarly to the importance. As the propagation progresses backwards, we update the gradients of the cost function with respect to the controls acting at that point in the path. The reader familiar with backpropagation in deep learning will appreciate the analogy with forward and backward propagation in neural networks; keep this in mind when reading this paper. In fact, the adjoint theory of optimization is the backbone of backpropagation. We contrast our approach with a purely autograd style of computing the gradient. (Autograd is just one of many packages that compute differentials at the code level; see https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html and [2] for an excellent introduction to automatic differentiation.) Indeed, one could automatically translate the code of a renderer into its adjoint (or reverse) version and thus obtain the gradient. This approach is known as Discretize Then Optimize (DTO).
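To make the distinction concrete, here is a toy illustration (not from the paper) of the DTO style: a made-up scalar stand-in for a renderer written with PyTorch, where reverse-mode automatic differentiation produces the gradient directly from the code. The controls, the "three bounces" expression and the target value are all invented for the example.

import torch

# Toy stand-in for a renderer: the measured radiance is a simple function of two
# controls (a diffuse albedo and an emitter strength). Autograd differentiates the code.
albedo   = torch.tensor(0.7, requires_grad=True)
emission = torch.tensor(1.2, requires_grad=True)

radiance = albedo**3 * emission          # pretend: three diffuse bounces of the emitter
loss = 0.5 * (radiance - 0.5)**2         # match a target radiance of 0.5
loss.backward()                          # reverse-mode AD at the code level
print(albedo.grad, emission.grad)        # gradients with respect to the controls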
Our approach, on the other hand, falls in the category of Optimize Then Discretize (OTD) methods. We derive the adjoint equation in the continuous setting and only then discretize, reusing a standard implementation of a path tracer. This is possible because the adjoint equation is almost identical to the computation of importance. Of course, the differentials appearing in the transport process that depend on the controls must still be differentiated, either analytically or using automatic differentiation. This paper does not address the problem of smoothing non-continuous terms in the transport equation; the problem of handling discontinuities is orthogonal to the approach taken here. We think that uncovering the mathematical structure in a smooth setting sheds another light on the problem and might lead to simplifications, insights and better implementations. We assume in our derivations that each function is differentiable. For non-differentiable terms some regularization or some weaker form of differentiability could be used (distribution theory, for example). The rest of the paper is organized as follows. Section 2 provides the necessary theoretical background of transport theory. Section 3 gives a brief overview of the continuous adjoint method in optimization. Section 4 presents the derivation of the adjoint equation in the transport theory setting for radiance and its adjoint, for the computation of the gradient of the cost function. Section 5 provides details of a simple implementation, while Section 6 presents our results. Finally, we conclude in Section 7 and mention directions for future research. But first, as an appetizer, we present some necessary results from functional analysis and fix notations.
Let $\mathcal{F} = \mathcal{F}(\Omega, \mathbb{R}^m)$ be the Hilbert space of all functions mapping a continuous domain $\Omega$ to $\mathbb{R}^m$, equipped with the following inner product:
$$\langle f, g \rangle = \int_\Omega f^*(x)\, g(x)\, dx,$$
where $f, g \in \mathcal{F}$. This induces a norm on the space: $\|f\| = \sqrt{\langle f, f \rangle}$. We will also denote $\langle f \rangle = \langle 1, f \rangle = \int_\Omega f(x)\, dx$, the integral of $f$ over the entire domain $\Omega$. An operator is simply a linear function $A : \mathcal{F} \to \mathcal{F}$. The adjoint of $A$ is an operator denoted by $A^*$ that satisfies $\langle A f, g \rangle = \langle f, A^* g \rangle$ for all $f, g \in \mathcal{F}$. An operator that satisfies $A = A^*$ is called self-adjoint. An important example is an operator defined by a kernel $K(x, y)$ as follows:
$$(A f)(x) = \langle K(x, \cdot), f \rangle = \int_\Omega K(x, y)\, f(y)\, dy.$$
We have the identities $(A + B)^* = A^* + B^*$ and $(A B)^* = B^* A^*$. Differentials of operators are to be understood as Fréchet derivatives. The operator $F$ is differentiable at $f$ if there exists an operator $A$ (the derivative of $F$ at $f$) such that
$$\lim_{\|h\| \to 0} \frac{\|F(f + h) - F(f) - A h\|}{\|h\|} = 0$$
for all sequences $h = \{h_k\}_{k=1}^{\infty}$ with $h_k \to 0$ as $k \to \infty$. Fun fact: the Fréchet derivative of the Dirac delta operator $\delta : C^\infty(\Omega) \to \mathbb{R} : f \mapsto f(0)$ is $\delta' : f \mapsto -f'(0)$. In general $\delta^{(n)} : f \mapsto (-1)^n f^{(n)}(0)$. It is just integration by parts via Riesz' theorem. So a Dirac delta is not a weird function but an operator, also known as a distribution.
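As a concrete illustration (a minimal sketch, not from the paper), a kernel operator discretized on a grid becomes a matrix, and its adjoint with respect to the discrete inner product is simply the transposed matrix. The Gaussian kernel and grid below are made-up examples:

import numpy as np

n = 64
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :])**2 / 0.02)   # hypothetical smooth kernel

def apply_A(f):            # (Af)(x) = integral of K(x,y) f(y) dy
    return K @ f * dx

def apply_A_adjoint(g):    # (A*g)(y) = integral of K(x,y) g(x) dx
    return K.T @ g * dx

f, g = np.random.rand(n), np.random.rand(n)
lhs = np.sum(apply_A(f) * g) * dx          # <Af, g>
rhs = np.sum(f * apply_A_adjoint(g)) * dx  # <f, A*g>
assert np.isclose(lhs, rhs)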
2. Light Transport and the Adjoint Formulation

We assume that our environment is composed of a set of surfaces denoted by $S$. The space between the surfaces is empty (no participating media) and light travels in straight lines between surface points. Properties of light such as radiance are constant along each ray, with changes occurring only at the surfaces. (This section is inspired by Eric Veach's excellent PhD thesis [5].)

Figure 1: Geometry at a surface point and the decomposition of the radiance into outgoing and incoming parts.

Consequently, the functions we will consider are defined over the space of rays spanned by the surfaces: $\{\hat{x} = x \to x' \ \text{with}\ x, x' \in S\}$. This space is four-dimensional since each ray is defined by the two coordinates of its endpoints on each surface. We distinguish one of these points as the origin of the ray; this is indicated by the arrow notation $\hat{x} = x \to x'$, such that the ray with the opposite direction is denoted by $(-\hat{x}) = x' \to x$. The fundamental quantity in light transport is the radiance field:
$$L(\hat{x}) = L(x \to x'),$$
which has physical units of radiant energy per unit area per unit solid angle: $W \cdot m^{-2} \cdot sr^{-1}$. In the following it will be convenient to distinguish between the incoming radiance $L^{(i)}$ and the outgoing radiance $L^{(o)}$ with respect to the normal $\vec{n}$ at a point on the surface. This is illustrated in Fig. 1. We have by convention that
$$L(\hat{x}) = \begin{cases} L^{(o)}(\hat{x}) & \text{if } \cos\theta > 0, \\ L^{(i)}(-\hat{x}) & \text{if } \cos\theta < 0, \end{cases}$$
where $\theta$ is the angle between the ray and the surface normal. These two fields are related by a propagation operator as follows:
$$(G L)(\hat{x}) = L(-\hat{x}).$$
It follows that $L^{(i)} = G L^{(o)}$ and $L^{(o)} = G L^{(i)}$. This operator is self-adjoint. Light sources are modeled by an emitter field $L_e(\hat{x})$, while the interaction at the surfaces is given by a scattering kernel
$$K(\hat{x}, \hat{y}) = f_s(x \to x', y \to y')\, \delta(y - x'),$$
where $f_s$ is the bi-directional scattering distribution function (BSDF) and $\delta$ is the Dirac delta operator. The transport equation relates the radiance along a ray to sources and scattered radiance:
$$L(\hat{x}) = L_e(\hat{x}) + \int K(\hat{x}, \hat{y})\, L(\hat{y})\, d\mu(\hat{y}), \qquad (1)$$
where the integration measure is defined by
$$d\mu(\hat{y}) = V(\hat{y})\, \frac{\cos\theta \cos\theta'}{|y - y'|^2}\, dy\, dy'.$$
The visibility function $V(\hat{y})$ is equal to one when $y$ is visible from $y'$ along the ray $\hat{y} = y \to y'$ and zero otherwise (possibly smoothed for the sake of differentiability). Furthermore, $\theta$ and $\theta'$ are the angles between the ray and the normals at the surface points $y$ and $y'$, respectively. Eq. 1 can be written more compactly using a transport operator $(T L)(\hat{x}) = \langle K(\hat{x}, \cdot), L \rangle$ for the scattering operation:
$$L = L_e + T L. \qquad (2)$$
This is the transport equation for the radiance. Formally, we can solve this equation using a Neumann series as follows:
$$L = S L_e = (1 - T)^{-1} L_e = (1 + T + T^2 + T^3 + \cdots)\, L_e. \qquad (3)$$
This series has a physical interpretation: the final radiance is the sum of successive contributions involving increasing orders of scattering events. Given the radiance function $L$, we can measure its value using a receiver function $R(\hat{y})$ as follows:
$$I = I(L) = \int R(\hat{y})\, L^{(i)}(\hat{y})\, d\mu(\hat{y}) = \langle R, L^{(i)} \rangle. \qquad (4)$$
This is the quantity that we are essentially interested in computing. Using the propagation and scattering operators we can rewrite the measured radiance:
$$I = \langle R, L^{(i)} \rangle = \langle R, G S L_e \rangle = \langle (G S)^* R, L_e \rangle = \langle S^* G R, L_e \rangle = \langle W, L_e \rangle.$$
The final radiance at a point can therefore be computed in two different ways: either by propagating the source emitter, or by propagating the receiving detector. For the latter we need the adjoint of the scattering operator, which yields the importance field:
$$W = S^* G R. \qquad (5)$$
From Eq. 3 it follows that $S^* = 1 + T^* + (T^*)^2 + (T^*)^3 + \cdots = (1 - T^*)^{-1}$, where $T^* = \langle K^*(\hat{x}, \cdot), \cdot \rangle$ and $K^*(\hat{x}, \hat{y}) = K(-\hat{y}, -\hat{x})$. So the importance field satisfies the adjoint transport equation
$$W = G R + T^* W. \qquad (6)$$
To summarize: we have two alternative ways to compute the measurement $I$ at the receiver. The first approach, which we call the forward method, solves for the radiance by successive scatterings of the emitter (Eq. 3) and then computes the measurement from $I = \langle R, G L^{(o)} \rangle = \langle R, L^{(i)} \rangle$. In the forward mode we propagate radiance rays from the emitter to the receiver. Alternatively, we can use a backward method, which solves for the importance by successive scatterings of the receiver (Eq. 5) and then computes the measurement using $I = \langle S^* G R, L_e \rangle = \langle W, L_e \rangle$. The backward mode traces importance rays from the receiver to the emitter. Hybrid schemes are also possible, where one starts two sets of propagating rays, one from the emitter and the other from the receiver, and connects them somewhere in the middle. This technique is known as bi-directional ray tracing in computer graphics. This concludes our brief overview of light transport theory and the role of the adjoint transport operator in connecting radiance and importance.

(Aside: the Neumann series is named not after the famous John von Neumann (Hungarian-American) but after Carl Neumann (German). In high school you probably learned that $1 + x + x^2 + \cdots = 1/(1 - x)$ for $|x| < 1$; the same holds for operators when $\|T\| < 1$. But in some sense $1 + 2 + 4 + 8 + \cdots = -1$ is fun nonsense.)
Figure 2:
Radiance (forward) and Importance (backward) propagation.
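The equivalence of the two modes is easy to check numerically. Below is a minimal sketch (not from the paper) in which the transport operator, emitter and receiver are replaced by a made-up matrix and vectors on a finite set of rays; the propagation operator $G$ is omitted for simplicity:

import numpy as np

n = 100
rng = np.random.default_rng(0)
T = 0.5 * rng.random((n, n)) / n      # scattering operator, spectral radius < 1
Le = rng.random(n)                    # emitter
R = rng.random(n)                     # receiver response

S = np.linalg.inv(np.eye(n) - T)      # Neumann series summed exactly
I_forward  = R @ (S @ Le)             # propagate radiance, then measure
W = S.T @ R                           # importance: solve the adjoint equation
I_backward = W @ Le                   # measure the emitter with the importance
assert np.isclose(I_forward, I_backward)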
3. The Continuous Adjoint Method in Optimization

The goal of optimization is to find the minimum (or maximum) of a cost function $\mathcal{J}(u, q)$ depending on a state $u$ and a control $q$. Both the state and the control are continuous functions of a variable $s \in \Omega \subset \mathbb{R}^d$. The state and the control are also constrained to satisfy an equation $E(u, q) = 0$. For example, in the case of ordinary differential equations, the continuous variable is time and the state must satisfy a differential equation: $E(u, q) = -\dot{u}(t) + f(u(t), q(t))$. The fundamental problem of continuous optimization (and machine learning) can be stated concisely as:
$$\textbf{find } q^* = \operatorname*{argmin}_q \mathcal{J}(u, q) \quad \textbf{such that } E(u, q) = 0. \qquad (7)$$
The continuous adjoint method was first introduced by Pontryagin and coworkers in [4]; the article by Giles and Pierce is a very good introduction [1]. The adjoint method was first applied in computer graphics to control fluid-like animations [3].

Here $u(s) \in \mathbb{R}^n$, $q(s) \in \mathbb{R}^k$ and $E(u, q) \in \mathbb{R}^m$. We assume that the cost function is defined over the entire domain: $\mathcal{J}(u, q) = \int_\Omega J(u(s), q(s))\, ds = \langle J(u, q) \rangle$. However, in many applications the cost function is only defined at a finite set of points $\bar{s}_i \in \Omega$:
$$\mathcal{J}(u, q) = \frac{1}{2} \sum_i |u(\bar{s}_i) - \bar{u}_i|^2,$$
where the $\bar{u}_i \in \mathbb{R}^n$ ($i = 1, \cdots, N$) are the desired states. This is a common type of cost function for least-squares optimization and supervised (deep) learning. In a smooth setting, where all functions are assumed to be differentiable, optimization and learning algorithms rely heavily on the gradient of the cost function. Consequently, a lot of research in these fields is devoted to computing this gradient; in fact, it is the fundamental challenge, and the research described in this paper is no exception. For example, both gradient descent and quasi-Newton iterative methods rely on the gradient of the cost function. More precisely, given a cost function we are interested in computing the gradient of the cost function with respect to the controls:
$$\delta \mathcal{J} = \frac{d\mathcal{J}}{dq}.$$
That is the holy grail we are after.
Notice that the '$\delta$' symbol is a short-hand for '$\frac{d}{dq}$', not the Dirac delta function: $\delta f$ means a variation of $f$ with respect to the control $q$. Constrained optimization problems like Eq. 7 can be transformed into unconstrained problems using the machinery of Lagrange multipliers. In the continuous setting one introduces a Lagrange multiplier function $\lambda(s)$ and augments the cost function with a penalty term involving the multiplier and the constraint:
$$\mathcal{L}(u, q, \lambda) = \mathcal{J}(u, q) + \int_\Omega \lambda(s) \cdot E(u(s), q(s))\, ds = \langle J(u, q) \rangle + \langle \lambda, E(u, q) \rangle.$$
This is the less familiar continuous version of the Lagrangian. The necessary conditions for optimality are (where the derivatives are Fréchet):
$$\frac{\partial \mathcal{L}}{\partial u} = 0, \quad \frac{\partial \mathcal{L}}{\partial \lambda} = 0 \quad \text{and} \quad \frac{\partial \mathcal{L}}{\partial q} = 0.$$
From the first condition we get an adjoint equation for the multiplier (see Appendix A):
$$\left( \frac{\partial E}{\partial u} \right)^* \lambda = -\frac{\partial J}{\partial u}. \qquad (8)$$
This equation is independent of the controls! This is the key reason why the adjoint method is so popular in optimization and machine learning: the consequence is that computing the gradient is no more costly than computing the cost function itself. The Lagrange multiplier is usually called the adjoint function in the optimization literature. Intuitively, the adjoint function models the sensitivity of the cost function with respect to the state, independently of the controls. Once the adjoint function is computed we obtain the gradient of the cost function with respect to the controls as follows (see Appendix A):
$$\frac{d\mathcal{J}}{dq} = \left\langle \lambda, \frac{\partial E}{\partial q} \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle. \qquad (9)$$
Computing the gradient of the cost function therefore involves two steps: the solution of the adjoint equation (Eq. 8) for $\lambda(s)$ and the evaluation of the gradient (Eq. 9). These equations are very general and can be applied to most optimization and machine learning problems. Next, we apply this methodology to the transport theory of light propagation.
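For readers who prefer code, here is a minimal finite-dimensional sketch (not from the paper) of this two-step recipe for a linear constraint $E(u, q) = A u - B q = 0$ and a least-squares cost. The matrices, sizes and target are all made up, and the adjoint gradient is checked against finite differences:

import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
A = rng.random((n, n)) + n * np.eye(n)   # well-conditioned constraint matrix
B = rng.random((n, k))                   # b(q) = B q, so dE/dq = -B
u_hat = rng.random(n)
q = rng.random(k)

u = np.linalg.solve(A, B @ q)                 # state: solve E(u, q) = 0
lam = np.linalg.solve(A.T, -(u - u_hat))      # adjoint: (dE/du)* lam = -dJ/du   (Eq. 8)
grad = lam @ (-B)                             # dJ/dq = <lam, dE/dq> + dJ/dq     (Eq. 9)

# check against central finite differences
eps = 1e-6
fd = np.zeros(k)
for i in range(k):
    dq = np.zeros(k); dq[i] = eps
    up = np.linalg.solve(A, B @ (q + dq)); um = np.linalg.solve(A, B @ (q - dq))
    fd[i] = (0.5*np.sum((up - u_hat)**2) - 0.5*np.sum((um - u_hat)**2)) / (2*eps)
assert np.allclose(grad, fd, atol=1e-5)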
4. Adjoint Method Applied to Transport Theory
We now combine the adjoint method with the transport equations. An example of a cost function in rendering is
$$J(L, q) = \frac{1}{2} \sum_{j=1}^{m} |I_j(L) - \bar{I}_j|^2 + \frac{1}{2} \kappa |q|^2, \qquad (10)$$
where the sum is over the receivers, the $\bar{I}_j$ are target values, and $\kappa \geq 0$ models the smoothness of the control. In this case the gradient is:
$$\frac{\partial J}{\partial q} = \sum_{j=1}^{m} \big( I_j(L) - \bar{I}_j \big) \frac{\partial I_j}{\partial q} + \kappa q.$$
Our method can of course handle more general cost functions, but it is helpful to keep this typical example in mind. Why? Because we are really after computing the gradient of the radiance with respect to the controls. Let that sink in. From Eq. 4 we have that
$$\frac{\partial I_j}{\partial q} = \left\langle \frac{\partial R_j}{\partial q}, L^{(i)} \right\rangle + \left\langle R_j, \frac{\partial L^{(i)}}{\partial q} \right\rangle.$$
In general, we assume that the transport operator and the emitters depend on the control function $q(s)$. Consequently, our transport equation (Eq. 2) becomes:
$$E(L, q) = -L + T_q L + L_{e,q} = 0,$$
where the subscript denotes the dependence of a function or operator on the control $q$, not differentiation. Its differential with respect to the radiance is:
$$\frac{\partial E}{\partial L} = -1 + T_q,$$
and an equation for the adjoint function $\Lambda(s)$ follows from Eq. 8:
$$-\Lambda + (T_q)^* \Lambda = -\frac{\partial J}{\partial L}. \qquad (11)$$
This equation can be solved using the Neumann series (Eq. 5):
$$\Lambda = (S_q)^* \sigma. \qquad (12)$$
Equation 12 is the main result of this paper. It is exactly the adjoint transport equation for the importance field, with a different source term:
$$\sigma = \frac{\partial J}{\partial L}. \qquad (13)$$
For the particular cost function given by Eq. 10 we have $\sigma = \sum_j \big( I_j(L) - \bar{I}_j \big) \frac{\partial I_j(L)}{\partial L}$. Eq. 12 does not depend on the number of controls and is therefore as efficient to solve as the adjoint transport equation for the importance. It also does not need the computation of derivatives with respect to the controls. This is a direct consequence of the fact that the transport operator is linear with respect to the radiance: the same operator is used in the equation for the adjoint function $\Lambda(s)$. While solving the adjoint equation through propagation, we compute the gradients of the cost function sequentially with respect to the controls acting at each scattering/emission event, from Eq. 9:
$$\frac{dJ}{dq} = \left\langle \Lambda, \frac{\partial T_q}{\partial q} L + \frac{\partial L_{e,q}}{\partial q} \right\rangle + \frac{\partial J}{\partial q}. \qquad (14)$$
Equation 14 is the second main result of this paper. This step requires the derivatives of the transport operator and the emitter with respect to the controls. These derivatives can be computed analytically or through automatic differentiation.
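The following minimal sketch (not from the paper) illustrates Eqs. 11-14 on a discretized transport problem: the transport matrix and emitter depend on a single made-up scalar control, the adjoint is obtained by solving the transposed system, and the resulting gradient is checked against finite differences:

import numpy as np

n = 50
rng = np.random.default_rng(2)
T0 = 0.4 * rng.random((n, n)) / n
dT = 0.1 * rng.random((n, n)) / n       # dT/dq (made up, linear dependence on q)
Le0, dLe = rng.random(n), rng.random(n) # emitter and dLe/dq (made up)
R = rng.random(n)                       # receiver response
I_target = 1.0

def solve(q):
    T, Le = T0 + q * dT, Le0 + q * dLe
    L = np.linalg.solve(np.eye(n) - T, Le)        # L = Le + T L               (Eq. 2)
    return T, Le, L

q = 0.3
T, Le, L = solve(q)
I_meas = R @ L
sigma = (I_meas - I_target) * R                   # source term dJ/dL          (Eq. 13)
Lam = np.linalg.solve(np.eye(n) - T.T, sigma)     # adjoint transport solve    (Eq. 12)
grad = Lam @ (dT @ L + dLe)                       # <Lam, dT/dq L + dLe/dq>    (Eq. 14)

eps = 1e-6
Jp = 0.5 * (R @ solve(q + eps)[2] - I_target)**2
Jm = 0.5 * (R @ solve(q - eps)[2] - I_target)**2
assert np.isclose(grad, (Jp - Jm) / (2 * eps), atol=1e-5)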
5. Implementation

First, we introduce the standard notation used in computer graphics: each ray is defined by a position $x$ and a direction $\omega$. Let $n$ be the surface normal at this point. The outgoing radiance $L(\omega_{out})$ due to an incoming radiance $L(\omega_{in})$ is given by a Bidirectional Reflectance Distribution Function (BRDF):
$$L(\omega_{out}) = f(n, -\omega_{in}, \omega_{out})\, (n \cdot \omega_{in})\, L(\omega_{in}),$$
where $f(n, \omega_{in}, \omega_{out})$ is the BRDF. The minus sign in front of the incoming direction $\omega_{in}$ is there because the BRDF is usually defined with both vectors pointing outwards from the surface, whereas in path tracing the directions shown in Figure 4 are more natural. Similarly, the outgoing adjoint $\Lambda(\tilde{\omega}_{out})$ in direction $\tilde{\omega}_{out}$ due to an incoming adjoint $\Lambda(\tilde{\omega}_{in})$ from a direction $\tilde{\omega}_{in}$ is related via the adjoint $f^*$ of the BRDF:
$$\Lambda(\tilde{\omega}_{out}) = f^*(n, -\tilde{\omega}_{in}, \tilde{\omega}_{out})\, (n \cdot \tilde{\omega}_{in})\, \Lambda(\tilde{\omega}_{in}).$$

Figure 3: Geometry of a path.

Figure 4: Definitions of the BRDF, the radiance and its adjoint.

The situation is shown in Figure 4. The angles for the radiance in a forward pass are related to the angles for the importance by the following relations: $\tilde{\omega}_{out} = -\omega_{in}$ and $\tilde{\omega}_{in} = -\omega_{out}$. We can generate random paths in the environment as follows. We start at the receiver and trace rays until we hit an emitter, as shown in Figure 3. For each incoming ray we generate an outgoing ray randomly from a probability density function (PDF) $p(\omega)$; Appendix B shows how to generate such rays. The random path so created is denoted by a sequence of points $x_0, x_1, \cdots, x_n$ with associated surface normals $n_0, n_1, \cdots, n_n$, as shown in Figure 3. The receiver is situated at the start point $x_0$ and the emitter is located at the end point $x_n$. With each pair of consecutive points on the path we associate a unit direction:
$$\omega_k = \frac{x_{k+1} - x_k}{\| x_{k+1} - x_k \|}, \quad k = 0, \cdots, n-1.$$
We have by construction that $\omega_k \cdot n_k > 0$ and $\omega_k \cdot n_{k+1} < 0$. Since radiances and adjoints are constant along each ray we have that $L(\omega_k) = L(-\omega_k)$ and $\Lambda(\omega_k) = \Lambda(-\omega_k)$. Our goal is to minimize a cost function which we assume, for simplicity, to depend only on the radiance at the receiver and on a set of control variables $q$: $J \equiv J(L(\omega_0), q)$. We compute the gradient $\frac{dJ}{dq}$ using the adjoint method described above. We first compute the radiance at the receiver in a forward pass, initializing the radiance at the emitter:
$$L(\omega_{n-1}) = L_{e,q}(-\omega_{n-1}),$$
where the $q$ subscript indicates that a function depends on the controls. The radiances along the path are then computed using the BRDF $f_{k,q}$ and the PDF $p_{k,q}$ of the surfaces:
$$L(\omega_{k-1}) = f_{k,q}(n_k, \omega_k, -\omega_{k-1})\, (n_k \cdot \omega_k)\, L(\omega_k)\, /\, p_{k,q}(\omega_k) \quad \text{for } k = n-1, \cdots, 1.$$
This gives us the incoming radiance $L(\omega_0)$ at the receiver. From this radiance we can compute the cost function $J(L(\omega_0), q)$ and its adjoint $\Lambda(\omega_0) = \frac{\partial J}{\partial L}(L(\omega_0), q)$. Then we proceed with a backward pass to compute the adjoints, starting from $\Lambda(\omega_0)$:
$$\Lambda(\omega_k) = f^*_{k,q}(n_k, -\omega_{k-1}, \omega_k)\, (-n_k \cdot \omega_{k-1})\, \Lambda(\omega_{k-1})\, /\, p_{k,q}(\omega_{k-1}) \quad \text{for } k = 1, \cdots, n-1.$$
The gradient of the cost function $\frac{dJ}{dq}$ is computed alongside the adjoint as follows. We initially set this gradient to zero, $\frac{dJ}{dq} = 0$, and then incrementally update it at each step of the backward pass:
$$\frac{dJ}{dq} \mathrel{+}= \big( \Lambda(\omega_{k-1}) \cdot L(\omega_k) \big)\, \frac{\partial f_{k,q}}{\partial q}.$$
Although this seems like an expensive step when there are many controls, in general only the subset of controls that the BRDF $f_{k,q}$ depends on must be updated. In practice this subset is much smaller than the total number of controls: the entire control vector is rarely updated. We finish the backward trace with an update of the gradient at the emitter:
$$\frac{dJ}{dq} \mathrel{+}= \Lambda(\omega_{n-1}) \cdot \frac{\partial L_{e,q}}{\partial q}.$$
The pair $(J, \frac{dJ}{dq})$ can then be fed to an optimizer, which returns a new set of controls $q$, and the whole process is iterated. As a simple example we consider the diffuse Lambertian BRDF $f_d(n, \omega_i, \omega_o) = d / \pi$, a simple emitter $L_{e,q}(\omega) = e\, E_0$, and a PDF that generates cosine-weighted random samples: $p(\omega) = \cos\theta / \pi$ with $\omega = (\theta, \phi)$. The control is therefore a 2-vector: $q = (d, e)$. (See also Appendix C for details on the Cook-Torrance BRDF.) We assume a very simple cost function: $J(L) = \frac{1}{2}(L - \bar{L})^2$, where $\bar{L}$ is the desired output radiance. The adjoint at the receiver is then given by
$$\frac{\partial J}{\partial L} = L - \bar{L}.$$
The derivatives of the throughput (the BRDF times the cosine, divided by the PDF) and of the emission function with respect to the controls are:
$$\frac{\partial \big( f_d\, (n \cdot \omega) / p \big)}{\partial q} = \binom{1}{0} \quad \text{and} \quad \frac{\partial L_{e,q}}{\partial q} = \binom{0}{1} E_0.$$
Notice that we have $(n_k \cdot \omega_k) / p(\omega_k) = \pi$ and $-(n_k \cdot \omega_{k-1}) / p(\omega_{k-1}) = \pi$ because our random directions are cosine weighted. This leads to the following simple path tracing algorithm.

Forward pass:
    L_{n-1} = e * E_0
    for k = n-1, ..., 1 do
        L_{k-1} = d * L_k

Backward pass:
    dJ/dq = (0, 0)
    Lambda_0 = L_0 - L_bar
    for k = 1, ..., n-1 do
        Lambda_k = d * Lambda_{k-1}
        dJ/dd += Lambda_{k-1} * L_k
    dJ/de += Lambda_{n-1} * E_0

It doesn't get simpler than this!
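As a sanity check, the algorithm above fits in a few lines of Python. This is a minimal runnable sketch (not the paper's implementation): a fixed path length, a constant albedo d and emitter strength e, unit E_0, a made-up target radiance, and a finite-difference comparison at the end:

def forward_backward(d, e, n, E0=1.0, L_target=0.5):
    # forward pass: radiance from the emitter (k = n-1) down to the receiver (k = 0)
    L = [0.0] * n
    L[n - 1] = e * E0
    for k in range(n - 1, 0, -1):
        L[k - 1] = d * L[k]
    J = 0.5 * (L[0] - L_target) ** 2

    # backward pass: adjoint from the receiver back to the emitter
    dJ_dd, dJ_de = 0.0, 0.0
    Lam = L[0] - L_target                 # adjoint source dJ/dL at the receiver
    for k in range(1, n):
        dJ_dd += Lam * L[k]               # gradient w.r.t. the albedo control
        Lam = d * Lam
    dJ_de += Lam * E0                     # gradient w.r.t. the emitter control
    return J, dJ_dd, dJ_de

# quick finite-difference check
d, e, n, eps = 0.7, 1.2, 4, 1e-6
J, gd, ge = forward_backward(d, e, n)
fd_d = (forward_backward(d + eps, e, n)[0] - forward_backward(d - eps, e, n)[0]) / (2 * eps)
fd_e = (forward_backward(d, e + eps, n)[0] - forward_backward(d, e - eps, n)[0]) / (2 * eps)
assert abs(gd - fd_d) < 1e-5 and abs(ge - fd_e) < 1e-5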
6. Results
To validate our model, we have implemented a simple "vanilla" text-book style path tracer. An outline of the implementation in pseudo-code is given in Appendix D. We decided to implement our own path tracer rather than modifying an existing one so that we could focus on the core algorithm, not on compatibility and build issues. It also offers more flexibility in debugging and visualizing our results. Our path tracer only handles ray/sphere and ray/quad intersections; consequently, our scenes are restricted to quads and spheres. Also, we consider only three types of materials. M1: a pure emitter defined by its emission strength. M2: a specular Phong-Blinn BRDF/PDF modelled by an ambient term, a diffuse term, a specular term and an exponent. M3: a diffuse BRDF/PDF defined by an ambient term and a diffuse term. The associated controls, seven in total, are collected in a vector $q$. Our goal is to compute the gradient of the cost function with respect to these controls. To validate our model, we consider a simple "Cornell Box"-type scene. The scene is comprised of a box with six quad walls having material M3, one quad light source at the top with material M1, and one sphere in the center with material M2. The receiver is a pinhole camera defined by a screen located inside the box. Figure 5 depicts the scene in our visualizer.
Figure 5.
Two snapshots of our simple path tracer. Left: visualization of a single path. Right: a view of a rendered image and some selected paths.
Our visualizer can depict any quantity we like for debugging purposes. On the left of Figure 5 we show a single path, while on the right we show a subset of paths and the resulting rendered image. The ability to trace a single path was very useful when comparing our model with estimates obtained from a finite-difference approximation of the gradients with respect to the controls. The approximation is obtained by choosing a small number $\epsilon$ and computing
$$\frac{dJ}{dq_k} \approx \frac{J(q_1, \cdots, q_k + \epsilon, \cdots, q_K) - J(q_1, \cdots, q_k - \epsilon, \cdots, q_K)}{2\epsilon}$$
for each control. We must therefore run the path tracer $2K$ times, where $K = 7$ in our example. The right choice for the value of the perturbation $\epsilon$ is tricky: when $\epsilon$ is too big the estimate is inaccurate, and when it is too small we lose numerical precision because we are subtracting two nearby numbers. This is another strong argument for computing gradients using adjoints, which do not depend on an arbitrarily small parameter; in fact, our adjoints are accurate up to machine precision. We found a sweet spot by varying the perturbation: $\epsilon = 0.1, 0.0001, \ldots$ In Figure 6 we show these estimates for different $\epsilon$ along with the results from our method at the top. We observe good agreement. Next, we show renderings of the final image (top left) and the gradient for each of the seven controls. Figure 7 computes paths without importance sampling, while in Figure 8 the paths are computed with Phong-Blinn importance sampling as described in Appendix B.
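The central-difference validation loop is straightforward; the sketch below (not from the paper) shows its structure. render_cost is a hypothetical stand-in for one full path-tracing run returning $J(q)$, and the made-up quadratic cost at the bottom is only there so that the example runs on its own:

import numpy as np

def finite_difference_gradient(render_cost, q, eps=1e-4):
    # 2K renderer runs for K controls, one +eps and one -eps perturbation each
    grad = np.zeros_like(q)
    for k in range(len(q)):
        dq = np.zeros_like(q); dq[k] = eps
        grad[k] = (render_cost(q + dq) - render_cost(q - dq)) / (2.0 * eps)
    return grad

# example with a made-up smooth cost in place of the renderer
q0 = np.array([0.2, 0.5, 0.3, 10.0, 0.4, 0.6, 1.0])
fd = finite_difference_gradient(lambda q: 0.5 * np.sum(q**2), q0)
assert np.allclose(fd, q0, atol=1e-3)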
Figure 7.
Renderings of the image (top/left) and the gradients with respect to the controls with cosine-weighted uniform sampling.
Figure 6.
Comparison of the gradients obtained with our method (top/left) and finite-difference estimates for different values of $\epsilon$.

J = 3.3162e-06
dJ/dcontrol =
---------- Numerical gradients ---------
EPS = 0.1 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.411622 - 0.691398 ) / 2*EPS = 3.601122 ( 6.075682 - 0.053732 ) / 2*EPS = 30.109749 ( 1.013496 - 0.986575 ) / 2*EPS = 0.134604 ( 2.250000 - 0.250000 ) / 2*EPS = 10.000000 ( 3.160494 - 0.197531 ) / 2*EPS = 14.814816
EPS = 0.0001 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000357 - 0.999644 ) / 2*EPS = 3.565848 ( 1.002218 - 0.997785 ) / 2*EPS = 22.164581 ( 1.000014 - 0.999987 ) / 2*EPS = 0.134408 ( 1.001000 - 0.999000 ) / 2*EPS = 10.000169 ( 1.001334 - 0.998668 ) / 2*EPS = 13.332068
EPS = 1e-07 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000001 - 1.000000 ) / 2*EPS = 4.768371 ( 1.000002 - 0.999998 ) / 2*EPS = 19.669531 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000001 - 0.999999 ) / 2*EPS = 8.642673 ( 1.000001 - 0.999999 ) / 2*EPS = 13.411044
EPS = 1e-10 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000
Figure 8.
Renderings of the image (top/left) and the gradients with respect to the controls with Phong-Blinn importance sampling.
7. Conclusions and Future Work
In this work we have presented a novel, general method to compute gradients of radiance in the context of transport theory. To achieve this, we derived an equation for the adjoint/Jacobian of the cost function from the general theory of adjoints in optimization. We have shown that it can easily be implemented in a simple home-brewed path tracer. The results show good agreement with a finite-difference computation of the gradients. In the future we want to extend the theory to more general settings, including volumetric scattering and perhaps diffraction effects; in these cases we have to deal with an integro-differential equation and a Kirchhoff integral, respectively. We also want to implement this method in various existing GPU-based path tracers like Fermat.
References

[1] M. B. Giles and N. A. Pierce. An Introduction to the Adjoint Approach to Design. Flow, Turbulence and Combustion, 65:393-415, 2000.

[2] Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Society for Industrial and Applied Mathematics, second edition, 2008.

[3] Antoine McNamara, Adrien Treuille, Zoran Popović and Jos Stam. Fluid Control Using the Adjoint Method. ACM Transactions on Graphics (SIGGRAPH 2004), Volume 23, Issue 3, August 2004.

[4] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko. The Mathematical Theory of Optimal Processes. Translated by K. N. Trirogoff, edited by L. W. Neustadt. Interscience Publishers, 1962.

[5] Eric Veach. Robust Monte Carlo Methods for Light Transport Simulation. Ph.D. dissertation, Stanford University, December 1997. Available at http://graphics.stanford.edu/papers/veach_thesis/.
Appendices
A. Adjoint Equation
In this appendix we derive the adjoint equation (Eq. 8) and an expression for the derivative of the cost function with respect to the controls (Eq. 9). Recall that the augmented Lagrangian is defined as
$$\mathcal{L}(u, q, \lambda) = \langle J(u, q) \rangle + \langle \lambda, E(u, q) \rangle.$$
Stationarity with respect to the state implies that for all variations $\delta u$
$$0 = \left\langle \frac{\partial J}{\partial u}, \delta u \right\rangle + \left\langle \lambda, \frac{\partial E}{\partial u} \delta u \right\rangle. \qquad (A.1)$$
From the same equation and using the definition of the adjoint we have that
$$\left\langle \lambda, \frac{\partial E}{\partial u} \delta u \right\rangle = \left\langle \left( \frac{\partial E}{\partial u} \right)^* \lambda, \delta u \right\rangle, \quad \text{so} \quad 0 = \left\langle \frac{\partial J}{\partial u} + \left( \frac{\partial E}{\partial u} \right)^* \lambda, \delta u \right\rangle.$$
Since this equation must hold for all functions $\delta u$ we get Eq. 8:
$$\left( \frac{\partial E}{\partial u} \right)^* \lambda = -\frac{\partial J}{\partial u}.$$
Now consider the second condition, $\frac{\partial \mathcal{L}}{\partial \lambda} = 0$. This is simply the statement that our state must satisfy the constraint, $E(u, q) = 0$, and it implies that the differential of the constraint vanishes:
$$\frac{\partial E}{\partial u} \delta u + \frac{\partial E}{\partial q} \delta q = 0. \qquad (A.2)$$
With these results we can compute the gradient of the cost function:
$$\delta \mathcal{J} = \left\langle \frac{\partial J}{\partial u}, \delta u \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle \overset{(A.1)}{=} -\left\langle \lambda, \frac{\partial E}{\partial u} \delta u \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle \overset{(A.2)}{=} \left\langle \lambda, \frac{\partial E}{\partial q} \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle.$$
This is Eq. 9.
B. Sampling from a PDF
For simplicity we assume that the PDF is isotropic and thus only depends on the elevation angle: $p(\omega) = p(\theta)$.
We then generate a random sample $\theta$ from this distribution using the cumulative distribution function (CDF):
$$P(\theta) = \operatorname{Prob}(t \leq \theta) = 2\pi \int_0^{\theta} p(t) \sin t \, dt.$$
This is a mapping from elevation angles in $[0, \frac{\pi}{2}]$ to the unit interval $[0, 1]$. Using the inverse of the CDF we can directly generate a $p$-distributed random angle $\theta$ from a uniformly distributed $u \sim U(0, 1)$: $\theta = P^{-1}(u)$. As an example, consider the cosine-weighted Phong-Blinn PDF depending on an exponent parameter $\alpha \geq 0$ (for $\alpha = 0$ one obtains a cosine-weighted Lambertian PDF). This PDF handles all cases considered in this paper:
$$p_\alpha(\theta) = \frac{\alpha + 2}{2\pi} (\cos\theta)^{\alpha + 1}.$$
This gives the following CDF:
$$P_\alpha(\theta) = (\alpha + 2) \int_0^{\theta} (\cos t)^{\alpha + 1} \sin t \, dt = 1 - (\cos\theta)^{\alpha + 2},$$
with an inverse equal to:
$$\theta = \Phi_\alpha(u) = P_\alpha^{-1}(u) = \arccos\left( (1 - u)^{\frac{1}{\alpha + 2}} \right).$$
These results give us a recipe to generate random vectors for our path tracer from the PDFs given above. The general algorithm works as follows. To generate a random vector $\omega$ in a world coordinate frame $(X, Y, Z)$ with the $Z$-vector being the normal direction:
Generate: $u_1, u_2 \sim U(0, 1)$.
Set: $\theta = \Phi_\alpha(u_1)$ and $\phi = 2\pi u_2$.
Random vector: $\tilde{\omega} = (\tilde{\omega}_x, \tilde{\omega}_y, \tilde{\omega}_z) = (\cos\phi \cos\theta, \sin\phi \cos\theta, \sin\theta)$.
Convert to world space: $\omega = \tilde{\omega}_x X + \tilde{\omega}_y Y + \tilde{\omega}_z Z$.

C. The Cook-Torrance and Phong-Blinn BRDF

In this appendix we give all the details for the Phong-Blinn model. We use a variant of the Cook-Torrance BRDF to simplify some of the resulting expressions. The goal here is not to faithfully reproduce the original Cook-Torrance BRDF but to show how to implement such a model in our framework. The Cook-Torrance BRDF is usually defined as follows:
$$f(n, \omega_i, \omega_o) = F(n, \omega_i, \omega_o)\, G(n, \omega_i, \omega_o)\, D(\omega_m),$$
where $F$ accounts for Fresnel effects, which we will set equal to one, and $G$ is a geometric factor that accounts for shadowing and the spread of the solid angles. We use the following expression:
$$G(n, \omega_i, \omega_o) = \frac{(\omega_m \cdot n)}{4\, (\omega_i \cdot \omega_m)(\omega_i \cdot n)},$$
where $\omega_m$ is the vector halfway between the incoming and the outgoing directions:
$$\omega_m = \frac{\omega_i + \omega_o}{\| \omega_i + \omega_o \|}.$$
This vector is the normal that perfectly reflects the incoming light into the outgoing direction and is distributed according to a normal distribution function (NDF) $D(\omega_m)$. In this appendix we will consider the Phong-Blinn NDF ($\omega_m = (\theta_m, \phi_m)$):
$$D_{PB,\alpha}(\theta_m, \phi_m) = \frac{\alpha + 2}{2\pi} (\cos\theta_m)^{\alpha} \sin\theta_m.$$
Notice that this distribution is not normalized, since the right normalizing term would be $(\alpha + 1)/2\pi$. Since we are not sampling from this distribution but from the one given below, this does not matter; this choice does, however, simplify the expressions below. The incoming direction is a function of the half vector (the micro-facet normal) and the outgoing direction through the reflection law:
$$\omega_i = \omega_i(\omega_m) = 2 (\omega_o \cdot \omega_m)\, \omega_m - \omega_o,$$
and the corresponding differentials satisfy:
$$d\omega_i = 4 (\omega_o \cdot \omega_m)\, d\omega_m = 4 (\omega_o \cdot \omega_m) \sin\theta_m \, d\theta_m \, d\phi_m.$$
The PDF for the cosine-weighted Phong-Blinn model is given by:
$$p_\alpha(\theta_m, \phi_m) = \frac{\alpha + 2}{2\pi} (\cos\theta_m)^{\alpha + 1} \sin\theta_m.$$
We point out that other choices for the NDF, such as Beckmann or GGX, are currently more popular; the same methodology described here applies to those models as well. This distribution is normalized, and we can therefore use it to sample random half vectors using the procedure described in Appendix B. In importance path tracing we evaluate the outgoing radiance as follows:
$$dL(\omega_o) = \frac{f_\alpha(n, \omega_i, \omega_o)}{p_\alpha(\omega_m)}\, (\omega_i \cdot n)\, L(\omega_i)\, d\omega_i.$$
We can rewrite this in terms of the angles $(\theta_m, \phi_m)$ using the expressions derived above (the geometric factor $G$, the Phong-Blinn NDF, the Jacobian $d\omega_i = 4(\omega_o \cdot \omega_m) \sin\theta_m\, d\theta_m\, d\phi_m$ and the PDF $p_\alpha$). After grand eliminations, we get the simple update rule:
$$dL(\omega_o) = L\big( \omega_i(\omega_m) \big)\, \sin\theta_m(\alpha)\, d\theta_m\, d\phi_m.$$
To update the gradient of the cost function with respect to the exponent parameter $\alpha$ we need to differentiate this expression. The sine term depends on $\alpha$ through the sampling procedure (see Appendix B):
$$s(\alpha) = \sin\theta_m(\alpha) = \sqrt{1 - (1 - u)^{\frac{2}{\alpha + 2}}},$$
and its derivative is:
$$\frac{ds}{d\alpha}(\alpha) = \frac{(1 - u)^{\frac{2}{\alpha + 2}} \log(1 - u)}{(\alpha + 2)^2 \sqrt{1 - (1 - u)^{\frac{2}{\alpha + 2}}}} = \frac{\cos^2\theta_m \, \log\big( \cos^{\alpha + 2}\theta_m \big)}{(\alpha + 2)^2 \sin\theta_m} = \frac{\cos^2\theta_m \, \log(\cos\theta_m)}{(\alpha + 2) \sin\theta_m}.$$
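As a quick sanity check (not from the paper), the derivative above can be verified numerically for a fixed uniform sample; the values of alpha and u below are arbitrary:

import math

def sin_theta_m(alpha, u):
    c = (1.0 - u) ** (1.0 / (alpha + 2.0))   # cos(theta_m) from the inverse CDF
    return math.sqrt(1.0 - c * c)

def dsin_dalpha(alpha, u):
    c = (1.0 - u) ** (1.0 / (alpha + 2.0))
    s = math.sqrt(1.0 - c * c)
    return c * c * math.log(c) / ((alpha + 2.0) * s)

alpha, u, eps = 10.0, 0.3, 1e-6
fd = (sin_theta_m(alpha + eps, u) - sin_theta_m(alpha - eps, u)) / (2 * eps)
assert abs(fd - dsin_dalpha(alpha, u)) < 1e-6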
D. A Simple Path Tracer

In this appendix we describe our simple path tracer in pseudo-code. Instead of using an existing path tracer and modifying it, we decided to write our own. Why? It is general, simple and self-contained, with no external dependencies and no complicated data structures or optimizations. At the highest level it works like this:

func do_path_tracing(do_gradient)
    if do_gradient then
        cost = 0
        clear_gradients()
    end if
    for each image pixel do
        pixel_radiance = 0
        for N samples do
            create initial ray R
            make_path(R)
            radiance = forward_pass() / N
            pixel_radiance += radiance
            if do_gradient then
                cost += cost(radiance)
                adjoint = dcost_drad(radiance)
                backward_pass(adjoint)
            end if
        end for
    end for
end

When do_gradient==false we just perform a standard path trace. This is how it works. First, we create a path by intersecting the ray with the scene and spawning reflection vectors by sampling the corresponding PDFs. Then we traverse the path from the emitter to the receiver in a forward pass, accumulating the radiance to generate a final radiance. To compute the gradients, we first compute the adjoint, which is the Jacobian of the cost with respect to the radiance. Then, we perform a backward pass, updating the adjoint and updating the gradient of each active control. The first step builds a path recursively.

func make_path(R)
    hit = int_scene(R)
    if hit != null and R->depth < MAX_DEPTH then
        u = unif(0,1)
        if u >= hit->absorb then
            dir = hit->sample(hit->normal, R->dir)
            R_new = make_ray(R->depth+1, hit->point, dir)
            make_path(R_new)
        end if
        path[R->depth]->{hit, dir_i, dir_o} = {hit, R->dir, dir}
    end if
end

Once the path is created, we perform a forward pass.

func forward_pass()
    N = path length
    hit = path[N-1]->hit
    radiance = hit->emission / hit->absorb
    for k=N-2 to k>=0 do
        {hit, dir_i, dir_o} = path[k]->{hit, dir_i, dir_o}
        path[k]->radiance = radiance
        radiance = hit->BSDF_D_PDF(hit->normal, dir_o, -dir_i) * radiance
    end for
    return radiance
end
After computing the cost and the adjoint we perform a backward pass:

func backward_pass(adjoint)
    N = path length
    for k=0 to N-2 do
        {hit, dir_i, dir_o, radiance} = path[k]->{hit, dir_i, dir_o, radiance}
        radiance /= 1 - hit->absorb
        hit->update_BSDF_D_PDF_gradients(hit->normal, -dir_i, dir_o, radiance, adjoint)
        adjoint = hit->BSDF_D_PDF(hit->normal, -dir_i, dir_o) * adjoint
    end for
    hit = path[N-1]->hit
    radiance = hit->emission / hit->absorb
    hit->emission_gradient += radiance * adjoint
    adjoint = 0
end
The routines sample, BSDF_D_PDF and update_BSDF_D_PDF_gradients must be provided for each particular BSDF/PDF. Here are the implementations for the Lambertian model:

func sample(normal, dir)
    u1, u2 = unif(0,1)
    c = sqrt(1-u1)
    s = sqrt(u1)
    (x,y,z) = (cos(2*PI*u2)*c, sin(2*PI*u2)*c, s)
    make_frame(normal, X, Y, Z)
    return x*X + y*Y + z*Z
end
func BSDF_D_PDF(normal, dir_i, dir_o)
    return diff_col
end

func update_BSDF_D_PDF_gradients(normal, dir_i, dir_o, radiance, adjoint)
    diff_gradient += radiance*adjoint
end
And for Phong-Blinn:

func sample(normal, dir)
    u1, u2 = unif(0,1)
    t = pow(u1, 2/(exponent+2))
    c = sqrt(1-t)
    s = sqrt(t)
    (x, y, z) = (cos(2*PI*u2)*c, sin(2*PI*u2)*c, s)
    make_frame(normal, X, Y, Z)
    N = x*X + y*Y + z*Z
    if dot(N,dir) < 0 then
        N = 2*dot(N,normal)*normal - N
    end if
    return N
end

func BSDF_D_PDF(normal, dir_i, dir_o)
    dir_m = normalize(dir_i + dir_o)
    cos_m = dot(dir_m, normal)
    cos2_m = cos_m*cos_m
    sin_m = sqrt(1-cos2_m)
    return spec_col * sin_m
end

func update_BSDF_D_PDF_gradients(normal, dir_i, dir_o, radiance, adjoint)
    dir_m = normalize(dir_i + dir_o)
    cos_m = dot(dir_m, normal)
    cos2_m = cos_m*cos_m
    sin_m = sqrt(1-cos2_m)
    dsin_m = cos2_m * log(cos_m) / sin_m / (alpha+2)
    spec_gradient += sin_m * (radiance*adjoint)
    alpha_gradient += spec_col * dsin_m * (radiance*adjoint)
end
The function make_frame() generates an orthonormal frame from the normal:

func make_frame(N, X, Y, Z)
    Z = N
    X = Z + (0.1, 0.2, 0.3)
    Y = normalize(cross(Z,X))
    X = normalize(cross(Y,Z))
end