Computing Light Transport Gradients using the Adjoint Method
Jos Stam, Graphics Researcher, NVIDIA. 03/10/2019
Abstract
This paper proposes a new equation from continuous adjoint theory to compute the gradient of quantities governed by the transport theory of light. Unlike discrete gradients à la autograd, which work at the code level, we first formulate the continuous theory and then discretize it. The key insight of this paper is that computing gradients in transport theory is akin to computing the importance, a quantity adjoint to radiance that satisfies an adjoint equation. Importance tells us where to look for the light that matters. In fact, this mathematical journey started from a whimsical thought that these adjoints might be related. Computing gradients is therefore no more complicated than computing the importance field. We hope this insight and the following paper shed some light on this complicated problem and ease the implementation of gradient computations in existing path tracers.
1. Introduction
In this paper we present a general framework for computing gradients in the context of light propagation. Gradients are of central importance in machine learning, computer vision and computer graphics. Often, we need to invert a simulation such as a rendering to recover hidden control parameters. For smooth problems the gradient is the key instrument in methods such as gradient descent or quasi-Newton iteration. This paper is concerned with computing the gradient of a solution to a transport equation. To achieve this goal, we derive a continuous adjoint equation for the gradient of the radiance. This equation is a generalization of the usual backpropagation algorithm popular in deep learning. We show that the adjoint equation for the gradient is almost identical to the adjoint equation for the importance in transport theory. The only difference is the source term, which is equal to the initial gradient of the cost function with respect to the radiance field. The method of computation is akin to a bi-directional Monte Carlo solution using radiance and importance. First the transport equation is solved for the radiance forward from the light sources to the receivers (camera/eye). Then the adjoint transport equation is solved backwards from the receiver for the adjoint of the gradient of the radiance, similarly to the importance. As the propagation progresses backwards, we update the gradients of the cost function with respect to the controls acting at that point in the path. The reader familiar with backpropagation in deep learning will appreciate the analogy with forward and backward propagation in neural networks; keep this in mind when reading this paper. In fact, the adjoint theory of optimization is the backbone of backpropagation. We contrast our approach with a purely autograd style of computing the gradient. (Autograd is just one of many packages that compute differentials at the code level; see https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html and [2] for an excellent introduction to automatic differentiation.) Indeed, one could automatically translate the code of a renderer into its adjoint (or reverse) version and thus obtain the gradient. This approach is known as Discretize Then Optimize (DTO).
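To make the distinction concrete, here is a toy illustration (not from the paper) of the DTO style: a made-up scalar stand-in for a renderer written with PyTorch, where reverse-mode automatic differentiation produces the gradient directly from the code. The controls, the "three bounces" expression and the target value are all invented for the example.

import torch

# Toy stand-in for a renderer: the measured radiance is a simple function of two
# controls (a diffuse albedo and an emitter strength). Autograd differentiates the code.
albedo   = torch.tensor(0.7, requires_grad=True)
emission = torch.tensor(1.2, requires_grad=True)

radiance = albedo**3 * emission          # pretend: three diffuse bounces of the emitter
loss = 0.5 * (radiance - 0.5)**2         # match a target radiance of 0.5
loss.backward()                          # reverse-mode AD at the code level
print(albedo.grad, emission.grad)        # gradients with respect to the controls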
Our approach, on the other hand, falls in the category of Optimize Then Discretize (OTD) methods. We derive the adjoint equation in the continuous setting and only then discretize, reusing a standard implementation of a path tracer. This is possible because the adjoint equation is almost identical to the computation of importance. Of course, the differentials appearing in the transport process that depend on the controls must still be differentiated, either analytically or using automatic differentiation. This paper does not address the problem of smoothing non-continuous terms in the transport equation; the problem of handling discontinuities is orthogonal to the approach taken here. We think that uncovering the mathematical structure in a smooth setting sheds another light on the problem and might lead to simplifications, insights and better implementations. We assume in our derivations that each function is differentiable. For non-differentiable terms some regularization or some weaker form of differentiability could be used (distribution theory, for example). The rest of the paper is organized as follows. Section 2 provides the necessary theoretical background of transport theory. Section 3 gives a brief overview of the continuous adjoint method in optimization. Section 4 presents the derivation of the adjoint equation in the transport theory setting for radiance and its adjoint, for the computation of the gradient of the cost function. Section 5 provides details of a simple implementation, while Section 6 presents our results. Finally, we conclude in Section 7 and mention directions for future research. But first, as an appetizer, we present some necessary results from functional analysis and fix notations.
Let $\mathcal{F} = \mathcal{F}(\Omega, \mathbb{R}^m)$ be the Hilbert space of all functions mapping a continuous domain $\Omega$ to $\mathbb{R}^m$, equipped with the following inner product:
$$\langle f, g \rangle = \int_\Omega f^*(x)\, g(x)\, dx,$$
where $f, g \in \mathcal{F}$. This induces a norm on the space: $\|f\| = \sqrt{\langle f, f \rangle}$. We will also denote $\langle f \rangle = \langle 1, f \rangle = \int_\Omega f(x)\, dx$, the integral of $f$ over the entire domain $\Omega$. An operator is simply a linear function $A : \mathcal{F} \to \mathcal{F}$. The adjoint of $A$ is an operator denoted by $A^*$ that satisfies $\langle A f, g \rangle = \langle f, A^* g \rangle$ for all $f, g \in \mathcal{F}$. An operator that satisfies $A = A^*$ is called self-adjoint. An important example is an operator defined by a kernel $K(x, y)$ as follows:
$$(A f)(x) = \langle K(x, \cdot), f \rangle = \int_\Omega K(x, y)\, f(y)\, dy.$$
We have the identities $(A + B)^* = A^* + B^*$ and $(A B)^* = B^* A^*$. Differentials of operators are to be understood as Fréchet derivatives. The operator $F$ is differentiable at $f$ if there exists an operator $A$ (the derivative of $F$ at $f$) such that
$$\lim_{\|h\| \to 0} \frac{\|F(f + h) - F(f) - A h\|}{\|h\|} = 0$$
for all sequences $h = \{h_k\}_{k=1}^{\infty}$ with $h_k \to 0$ as $k \to \infty$. Fun fact: the Fréchet derivative of the Dirac delta operator $\delta : C^\infty(\Omega) \to \mathbb{R} : f \mapsto f(0)$ is $\delta' : f \mapsto -f'(0)$. In general $\delta^{(n)} : f \mapsto (-1)^n f^{(n)}(0)$. It is just integration by parts via Riesz' theorem. So a Dirac delta is not a weird function but an operator, also known as a distribution.
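As a concrete illustration (a minimal sketch, not from the paper), a kernel operator discretized on a grid becomes a matrix, and its adjoint with respect to the discrete inner product is simply the transposed matrix. The Gaussian kernel and grid below are made-up examples:

import numpy as np

n = 64
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :])**2 / 0.02)   # hypothetical smooth kernel

def apply_A(f):            # (Af)(x) = integral of K(x,y) f(y) dy
    return K @ f * dx

def apply_A_adjoint(g):    # (A*g)(y) = integral of K(x,y) g(x) dx
    return K.T @ g * dx

f, g = np.random.rand(n), np.random.rand(n)
lhs = np.sum(apply_A(f) * g) * dx          # <Af, g>
rhs = np.sum(f * apply_A_adjoint(g)) * dx  # <f, A*g>
assert np.isclose(lhs, rhs)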
2. Light Transport and the Adjoint Formulation

We assume that our environment is composed of a set of surfaces denoted by $S$. The space between the surfaces is empty (no participating media) and light travels in straight lines between surface points. Properties of light such as radiance are constant along each ray, with changes occurring only at the surfaces. (This section is inspired by Eric Veach's excellent PhD thesis [5].)

Figure 1: Geometry at a surface point and the decomposition of the radiance into outgoing and incoming parts.

Consequently, the functions we will consider are defined over the space of rays spanned by the surfaces: $\{\hat{x} = x \to x' \ \text{with}\ x, x' \in S\}$. This space is four-dimensional since each ray is defined by the two coordinates of its endpoints on each surface. We distinguish one of these points as the origin of the ray; this is indicated by the arrow notation $\hat{x} = x \to x'$, such that the ray with the opposite direction is denoted by $(-\hat{x}) = x' \to x$. The fundamental quantity in light transport is the radiance field:
$$L(\hat{x}) = L(x \to x'),$$
which has physical units of radiant energy per unit area per unit solid angle: $W \cdot m^{-2} \cdot sr^{-1}$. In the following it will be convenient to distinguish between the incoming radiance $L^{(i)}$ and the outgoing radiance $L^{(o)}$ with respect to the normal $\vec{n}$ at a point on the surface. This is illustrated in Fig. 1. We have by convention that
$$L(\hat{x}) = \begin{cases} L^{(o)}(\hat{x}) & \text{if } \cos\theta > 0, \\ L^{(i)}(-\hat{x}) & \text{if } \cos\theta < 0, \end{cases}$$
where $\theta$ is the angle between the ray and the surface normal. These two fields are related by a propagation operator as follows:
$$(G L)(\hat{x}) = L(-\hat{x}).$$
It follows that $L^{(i)} = G L^{(o)}$ and $L^{(o)} = G L^{(i)}$. This operator is self-adjoint. Light sources are modeled by an emitter field $L_e(\hat{x})$, while the interaction at the surfaces is given by a scattering kernel
$$K(\hat{x}, \hat{y}) = f_s(x \to x', y \to y')\, \delta(y - x'),$$
where $f_s$ is the bi-directional scattering distribution function (BSDF) and $\delta$ is the Dirac delta operator. The transport equation relates the radiance along a ray to sources and scattered radiance:
$$L(\hat{x}) = L_e(\hat{x}) + \int K(\hat{x}, \hat{y})\, L(\hat{y})\, d\mu(\hat{y}), \qquad (1)$$
where the integration measure is defined by
$$d\mu(\hat{y}) = V(\hat{y})\, \frac{\cos\theta \cos\theta'}{|y - y'|^2}\, dy\, dy'.$$
The visibility function $V(\hat{y})$ is equal to one when $y$ is visible from $y'$ along the ray $\hat{y} = y \to y'$ and zero otherwise (possibly smoothed for the sake of differentiability). Furthermore, $\theta$ and $\theta'$ are the angles between the ray and the normals at the surface points $y$ and $y'$, respectively. Eq. 1 can be written more compactly using a transport operator $(T L)(\hat{x}) = \langle K(\hat{x}, \cdot), L \rangle$ for the scattering operation:
$$L = L_e + T L. \qquad (2)$$
This is the transport equation for the radiance. Formally, we can solve this equation using a Neumann series as follows:
$$L = S L_e = (1 - T)^{-1} L_e = (1 + T + T^2 + T^3 + \cdots)\, L_e. \qquad (3)$$
This series has a physical interpretation: the final radiance is the sum of successive contributions involving increasing orders of scattering events. Given the radiance function $L$, we can measure its value using a receiver function $R(\hat{y})$ as follows:
$$I = I(L) = \int R(\hat{y})\, L^{(i)}(\hat{y})\, d\mu(\hat{y}) = \langle R, L^{(i)} \rangle. \qquad (4)$$
This is the quantity that we are essentially interested in computing. Using the propagation and scattering operators we can rewrite the measured radiance:
$$I = \langle R, L^{(i)} \rangle = \langle R, G S L_e \rangle = \langle (G S)^* R, L_e \rangle = \langle S^* G R, L_e \rangle = \langle W, L_e \rangle.$$
The final radiance at a point can therefore be computed in two different ways: either by propagating the source emitter, or by propagating the receiving detector. For the latter we need the adjoint of the scattering operator, which yields the importance field:
$$W = S^* G R. \qquad (5)$$
From Eq. 3 it follows that $S^* = 1 + T^* + (T^*)^2 + (T^*)^3 + \cdots = (1 - T^*)^{-1}$, where $T^* = \langle K^*(\hat{x}, \cdot), \cdot \rangle$ and $K^*(\hat{x}, \hat{y}) = K(-\hat{y}, -\hat{x})$. So the importance field satisfies the adjoint transport equation
$$W = G R + T^* W. \qquad (6)$$
To summarize: we have two alternative ways to compute the measurement $I$ at the receiver. The first approach, which we call the forward method, solves for the radiance by successive scatterings of the emitter (Eq. 3) and then computes the measurement from $I = \langle R, G L^{(o)} \rangle = \langle R, L^{(i)} \rangle$. In the forward mode we propagate radiance rays from the emitter to the receiver. Alternatively, we can use a backward method, which solves for the importance by successive scatterings of the receiver (Eq. 5) and then computes the measurement using $I = \langle S^* G R, L_e \rangle = \langle W, L_e \rangle$. The backward mode traces importance rays from the receiver to the emitter. Hybrid schemes are also possible, where one starts two sets of propagating rays, one from the emitter and the other from the receiver, and connects them somewhere in the middle. This technique is known as bi-directional ray tracing in computer graphics. This concludes our brief overview of light transport theory and the role of the adjoint transport operator in connecting radiance and importance.

(Aside: the Neumann series is named not after the famous John von Neumann (Hungarian-American) but after Carl Neumann (German). In high school you probably learned that $1 + x + x^2 + \cdots = 1/(1 - x)$ for $|x| < 1$; the same holds for operators when $\|T\| < 1$. But in some sense $1 + 2 + 4 + 8 + \cdots = -1$ is fun nonsense.)
Figure 2:
Radiance (forward) and Importance (backward) propagation.
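The equivalence of the two modes is easy to check numerically. Below is a minimal sketch (not from the paper) in which the transport operator, emitter and receiver are replaced by a made-up matrix and vectors on a finite set of rays; the propagation operator $G$ is omitted for simplicity:

import numpy as np

n = 100
rng = np.random.default_rng(0)
T = 0.5 * rng.random((n, n)) / n      # scattering operator, spectral radius < 1
Le = rng.random(n)                    # emitter
R = rng.random(n)                     # receiver response

S = np.linalg.inv(np.eye(n) - T)      # Neumann series summed exactly
I_forward  = R @ (S @ Le)             # propagate radiance, then measure
W = S.T @ R                           # importance: solve the adjoint equation
I_backward = W @ Le                   # measure the emitter with the importance
assert np.isclose(I_forward, I_backward)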
3. The Continuous Adjoint Method in Optimization

The goal of optimization is to find the minimum (or maximum) of a cost function $\mathcal{J}(u, q)$ depending on a state $u$ and a control $q$. Both the state and the control are continuous functions of a variable $s \in \Omega \subset \mathbb{R}^d$. The state and the control are also constrained to satisfy an equation $E(u, q) = 0$. For example, in the case of ordinary differential equations, the continuous variable is time and the state must satisfy a differential equation: $E(u, q) = -\dot{u}(t) + f(u(t), q(t))$. The fundamental problem of continuous optimization (and machine learning) can be stated concisely as:
$$\textbf{find } q^* = \operatorname*{argmin}_q \mathcal{J}(u, q) \quad \textbf{such that } E(u, q) = 0. \qquad (7)$$
The continuous adjoint method was first introduced by Pontryagin and coworkers in [4]; the article by Giles and Pierce is a very good introduction [1]. The adjoint method was first applied in computer graphics to control fluid-like animations [3].

Here $u(s) \in \mathbb{R}^n$, $q(s) \in \mathbb{R}^k$ and $E(u, q) \in \mathbb{R}^m$. We assume that the cost function is defined over the entire domain: $\mathcal{J}(u, q) = \int_\Omega J(u(s), q(s))\, ds = \langle J(u, q) \rangle$. However, in many applications the cost function is only defined at a finite set of points $\bar{s}_i \in \Omega$:
$$\mathcal{J}(u, q) = \frac{1}{2} \sum_i |u(\bar{s}_i) - \bar{u}_i|^2,$$
where the $\bar{u}_i \in \mathbb{R}^n$ ($i = 1, \cdots, N$) are the desired states. This is a common type of cost function for least-squares optimization and supervised (deep) learning. In a smooth setting, where all functions are assumed to be differentiable, optimization and learning algorithms rely heavily on the gradient of the cost function. Consequently, a lot of research in these fields is devoted to computing this gradient; in fact, it is the fundamental challenge, and the research described in this paper is no exception. For example, both gradient descent and quasi-Newton iterative methods rely on the gradient of the cost function. More precisely, given a cost function we are interested in computing the gradient of the cost function with respect to the controls:
$$\delta \mathcal{J} = \frac{d\mathcal{J}}{dq}.$$
That is the holy grail we are after.
Notice that the '$\delta$' symbol is a short-hand for '$\frac{d}{dq}$', not the Dirac delta function: $\delta f$ means a variation of $f$ with respect to the control $q$. Constrained optimization problems like Eq. 7 can be transformed into unconstrained problems using the machinery of Lagrange multipliers. In the continuous setting one introduces a Lagrange multiplier function $\lambda(s)$ and augments the cost function with a penalty term involving the multiplier and the constraint:
$$\mathcal{L}(u, q, \lambda) = \mathcal{J}(u, q) + \int_\Omega \lambda(s) \cdot E(u(s), q(s))\, ds = \langle J(u, q) \rangle + \langle \lambda, E(u, q) \rangle.$$
This is the less familiar continuous version of the Lagrangian. The necessary conditions for optimality are (where the derivatives are Fréchet):
$$\frac{\partial \mathcal{L}}{\partial u} = 0, \quad \frac{\partial \mathcal{L}}{\partial \lambda} = 0 \quad \text{and} \quad \frac{\partial \mathcal{L}}{\partial q} = 0.$$
From the first condition we get an adjoint equation for the multiplier (see Appendix A):
$$\left( \frac{\partial E}{\partial u} \right)^* \lambda = -\frac{\partial J}{\partial u}. \qquad (8)$$
This equation is independent of the controls! This is the key reason why the adjoint method is so popular in optimization and machine learning: the consequence is that computing the gradient is no more costly than computing the cost function itself. The Lagrange multiplier is usually called the adjoint function in the optimization literature. Intuitively, the adjoint function models the sensitivity of the cost function with respect to the state, independently of the controls. Once the adjoint function is computed we obtain the gradient of the cost function with respect to the controls as follows (see Appendix A):
$$\frac{d\mathcal{J}}{dq} = \left\langle \lambda, \frac{\partial E}{\partial q} \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle. \qquad (9)$$
Computing the gradient of the cost function therefore involves two steps: the solution of the adjoint equation (Eq. 8) for $\lambda(s)$ and the evaluation of the gradient (Eq. 9). These equations are very general and can be applied to most optimization and machine learning problems. Next, we apply this methodology to the transport theory of light propagation.
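For readers who prefer code, here is a minimal finite-dimensional sketch (not from the paper) of this two-step recipe for a linear constraint $E(u, q) = A u - B q = 0$ and a least-squares cost. The matrices, sizes and target are all made up, and the adjoint gradient is checked against finite differences:

import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
A = rng.random((n, n)) + n * np.eye(n)   # well-conditioned constraint matrix
B = rng.random((n, k))                   # b(q) = B q, so dE/dq = -B
u_hat = rng.random(n)
q = rng.random(k)

u = np.linalg.solve(A, B @ q)                 # state: solve E(u, q) = 0
lam = np.linalg.solve(A.T, -(u - u_hat))      # adjoint: (dE/du)* lam = -dJ/du   (Eq. 8)
grad = lam @ (-B)                             # dJ/dq = <lam, dE/dq> + dJ/dq     (Eq. 9)

# check against central finite differences
eps = 1e-6
fd = np.zeros(k)
for i in range(k):
    dq = np.zeros(k); dq[i] = eps
    up = np.linalg.solve(A, B @ (q + dq)); um = np.linalg.solve(A, B @ (q - dq))
    fd[i] = (0.5*np.sum((up - u_hat)**2) - 0.5*np.sum((um - u_hat)**2)) / (2*eps)
assert np.allclose(grad, fd, atol=1e-5)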
4. Adjoint Method Applied to Transport Theory
We now combine the adjoint method with the transport equations. An example of a cost function in rendering is
$$J(L, q) = \frac{1}{2} \sum_{j=1}^{m} |I_j(L) - \bar{I}_j|^2 + \frac{1}{2} \kappa |q|^2, \qquad (10)$$
where the sum is over the receivers, the $\bar{I}_j$ are target values, and $\kappa \geq 0$ models the smoothness of the control. In this case the gradient is:
$$\frac{\partial J}{\partial q} = \sum_{j=1}^{m} \big( I_j(L) - \bar{I}_j \big) \frac{\partial I_j}{\partial q} + \kappa q.$$
Our method can of course handle more general cost functions, but it is helpful to keep this typical example in mind. Why? Because we are really after computing the gradient of the radiance with respect to the controls. Let that sink in. From Eq. 4 we have that
$$\frac{\partial I_j}{\partial q} = \left\langle \frac{\partial R_j}{\partial q}, L^{(i)} \right\rangle + \left\langle R_j, \frac{\partial L^{(i)}}{\partial q} \right\rangle.$$
In general, we assume that the transport operator and the emitters depend on the control function $q(s)$. Consequently, our transport equation (Eq. 2) becomes:
$$E(L, q) = -L + T_q L + L_{e,q} = 0,$$
where the subscript denotes the dependence of a function or operator on the control $q$, not differentiation. Its differential with respect to the radiance is:
$$\frac{\partial E}{\partial L} = -1 + T_q,$$
and an equation for the adjoint function $\Lambda(s)$ follows from Eq. 8:
$$-\Lambda + (T_q)^* \Lambda = -\frac{\partial J}{\partial L}. \qquad (11)$$
This equation can be solved using the Neumann series (Eq. 5):
$$\Lambda = (S_q)^* \sigma. \qquad (12)$$
Equation 12 is the main result of this paper. It is exactly the adjoint transport equation for the importance field, with a different source term:
$$\sigma = \frac{\partial J}{\partial L}. \qquad (13)$$
For the particular cost function given by Eq. 10 we have $\sigma = \sum_j \big( I_j(L) - \bar{I}_j \big) \frac{\partial I_j(L)}{\partial L}$. Eq. 12 does not depend on the number of controls and is therefore as efficient to solve as the adjoint transport equation for the importance. It also does not need the computation of derivatives with respect to the controls. This is a direct consequence of the fact that the transport operator is linear with respect to the radiance: the same operator is used in the equation for the adjoint function $\Lambda(s)$. While solving the adjoint equation through propagation, we compute the gradients of the cost function sequentially with respect to the controls acting at each scattering/emission event, from Eq. 9:
$$\frac{dJ}{dq} = \left\langle \Lambda, \frac{\partial T_q}{\partial q} L + \frac{\partial L_{e,q}}{\partial q} \right\rangle + \frac{\partial J}{\partial q}. \qquad (14)$$
Equation 14 is the second main result of this paper. This step requires the derivatives of the transport operator and the emitter with respect to the controls. These derivatives can be computed analytically or through automatic differentiation.
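The following minimal sketch (not from the paper) illustrates Eqs. 11-14 on a discretized transport problem: the transport matrix and emitter depend on a single made-up scalar control, the adjoint is obtained by solving the transposed system, and the resulting gradient is checked against finite differences:

import numpy as np

n = 50
rng = np.random.default_rng(2)
T0 = 0.4 * rng.random((n, n)) / n
dT = 0.1 * rng.random((n, n)) / n       # dT/dq (made up, linear dependence on q)
Le0, dLe = rng.random(n), rng.random(n) # emitter and dLe/dq (made up)
R = rng.random(n)                       # receiver response
I_target = 1.0

def solve(q):
    T, Le = T0 + q * dT, Le0 + q * dLe
    L = np.linalg.solve(np.eye(n) - T, Le)        # L = Le + T L               (Eq. 2)
    return T, Le, L

q = 0.3
T, Le, L = solve(q)
I_meas = R @ L
sigma = (I_meas - I_target) * R                   # source term dJ/dL          (Eq. 13)
Lam = np.linalg.solve(np.eye(n) - T.T, sigma)     # adjoint transport solve    (Eq. 12)
grad = Lam @ (dT @ L + dLe)                       # <Lam, dT/dq L + dLe/dq>    (Eq. 14)

eps = 1e-6
Jp = 0.5 * (R @ solve(q + eps)[2] - I_target)**2
Jm = 0.5 * (R @ solve(q - eps)[2] - I_target)**2
assert np.isclose(grad, (Jp - Jm) / (2 * eps), atol=1e-5)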
5. Implementation

First, we introduce the standard notation used in computer graphics: each ray is defined by a position $x$ and a direction $\omega$. Let $n$ be the surface normal at this point. The outgoing radiance $L(\omega_{out})$ due to an incoming radiance $L(\omega_{in})$ is given by a Bidirectional Reflectance Distribution Function (BRDF):
$$L(\omega_{out}) = f(n, -\omega_{in}, \omega_{out})\, (n \cdot \omega_{in})\, L(\omega_{in}),$$
where $f(n, \omega_{in}, \omega_{out})$ is the BRDF. The minus sign in front of the incoming direction $\omega_{in}$ is there because the BRDF is usually defined with both vectors pointing outwards from the surface, whereas in path tracing the directions shown in Figure 4 are more natural. Similarly, the outgoing adjoint $\Lambda(\tilde{\omega}_{out})$ in direction $\tilde{\omega}_{out}$ due to an incoming adjoint $\Lambda(\tilde{\omega}_{in})$ from a direction $\tilde{\omega}_{in}$ is related via the adjoint $f^*$ of the BRDF:
$$\Lambda(\tilde{\omega}_{out}) = f^*(n, -\tilde{\omega}_{in}, \tilde{\omega}_{out})\, (n \cdot \tilde{\omega}_{in})\, \Lambda(\tilde{\omega}_{in}).$$

Figure 3: Geometry of a path.

Figure 4: Definitions of the BRDF, the radiance and its adjoint.

The situation is shown in Figure 4. The angles for the radiance in a forward pass are related to the angles for the importance by the following relations: $\tilde{\omega}_{out} = -\omega_{in}$ and $\tilde{\omega}_{in} = -\omega_{out}$. We can generate random paths in the environment as follows. We start at the receiver and trace rays until we hit an emitter, as shown in Figure 3. For each incoming ray we generate an outgoing ray randomly from a probability density function (PDF) $p(\omega)$; Appendix B shows how to generate such rays. The random path so created is denoted by a sequence of points $x_0, x_1, \cdots, x_n$ with associated surface normals $n_0, n_1, \cdots, n_n$, as shown in Figure 3. The receiver is situated at the start point $x_0$ and the emitter is located at the end point $x_n$. With each pair of consecutive points on the path we associate a unit direction:
$$\omega_k = \frac{x_{k+1} - x_k}{\| x_{k+1} - x_k \|}, \quad k = 0, \cdots, n-1.$$
We have by construction that $\omega_k \cdot n_k > 0$ and $\omega_k \cdot n_{k+1} < 0$. Since radiances and adjoints are constant along each ray we have that $L(\omega_k) = L(-\omega_k)$ and $\Lambda(\omega_k) = \Lambda(-\omega_k)$. Our goal is to minimize a cost function which we assume, for simplicity, to depend only on the radiance at the receiver and on a set of control variables $q$: $J \equiv J(L(\omega_0), q)$. We compute the gradient $\frac{dJ}{dq}$ using the adjoint method described above. We first compute the radiance at the receiver in a forward pass, initializing the radiance at the emitter:
$$L(\omega_{n-1}) = L_{e,q}(-\omega_{n-1}),$$
where the $q$ subscript indicates that a function depends on the controls. The radiances along the path are then computed using the BRDF $f_{k,q}$ and the PDF $p_{k,q}$ of the surfaces:
$$L(\omega_{k-1}) = f_{k,q}(n_k, \omega_k, -\omega_{k-1})\, (n_k \cdot \omega_k)\, L(\omega_k)\, /\, p_{k,q}(\omega_k) \quad \text{for } k = n-1, \cdots, 1.$$
This gives us the incoming radiance $L(\omega_0)$ at the receiver. From this radiance we can compute the cost function $J(L(\omega_0), q)$ and its adjoint $\Lambda(\omega_0) = \frac{\partial J}{\partial L}(L(\omega_0), q)$. Then we proceed with a backward pass to compute the adjoints, starting from $\Lambda(\omega_0)$:
$$\Lambda(\omega_k) = f^*_{k,q}(n_k, -\omega_{k-1}, \omega_k)\, (-n_k \cdot \omega_{k-1})\, \Lambda(\omega_{k-1})\, /\, p_{k,q}(\omega_{k-1}) \quad \text{for } k = 1, \cdots, n-1.$$
The gradient of the cost function $\frac{dJ}{dq}$ is computed alongside the adjoint as follows. We initially set this gradient to zero, $\frac{dJ}{dq} = 0$, and then incrementally update it at each step of the backward pass:
$$\frac{dJ}{dq} \mathrel{+}= \big( \Lambda(\omega_{k-1}) \cdot L(\omega_k) \big)\, \frac{\partial f_{k,q}}{\partial q}.$$
Although this seems like an expensive step when there are many controls, in general only the subset of controls that the BRDF $f_{k,q}$ depends on must be updated. In practice this subset is much smaller than the total number of controls: the entire control vector is rarely updated. We finish the backward trace with an update of the gradient at the emitter:
$$\frac{dJ}{dq} \mathrel{+}= \Lambda(\omega_{n-1}) \cdot \frac{\partial L_{e,q}}{\partial q}.$$
The pair $(J, \frac{dJ}{dq})$ can then be fed to an optimizer, which returns a new set of controls $q$, and the whole process is iterated. As a simple example we consider the diffuse Lambertian BRDF $f_d(n, \omega_i, \omega_o) = d / \pi$, a simple emitter $L_{e,q}(\omega) = e\, E_0$, and a PDF that generates cosine-weighted random samples: $p(\omega) = \cos\theta / \pi$ with $\omega = (\theta, \phi)$. The control is therefore a 2-vector: $q = (d, e)$. (See also Appendix C for details on the Cook-Torrance BRDF.) We assume a very simple cost function: $J(L) = \frac{1}{2}(L - \bar{L})^2$, where $\bar{L}$ is the desired output radiance. The adjoint at the receiver is then given by
$$\frac{\partial J}{\partial L} = L - \bar{L}.$$
The derivatives of the throughput (the BRDF times the cosine, divided by the PDF) and of the emission function with respect to the controls are:
$$\frac{\partial \big( f_d\, (n \cdot \omega) / p \big)}{\partial q} = \binom{1}{0} \quad \text{and} \quad \frac{\partial L_{e,q}}{\partial q} = \binom{0}{1} E_0.$$
Notice that we have $(n_k \cdot \omega_k) / p(\omega_k) = \pi$ and $-(n_k \cdot \omega_{k-1}) / p(\omega_{k-1}) = \pi$ because our random directions are cosine weighted. This leads to the following simple path tracing algorithm.

Forward pass:
    L_{n-1} = e * E_0
    for k = n-1, ..., 1 do
        L_{k-1} = d * L_k

Backward pass:
    dJ/dq = (0, 0)
    Lambda_0 = L_0 - L_bar
    for k = 1, ..., n-1 do
        Lambda_k = d * Lambda_{k-1}
        dJ/dd += Lambda_{k-1} * L_k
    dJ/de += Lambda_{n-1} * E_0

It doesn't get simpler than this!
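As a sanity check, the algorithm above fits in a few lines of Python. This is a minimal runnable sketch (not the paper's implementation): a fixed path length, a constant albedo d and emitter strength e, unit E_0, a made-up target radiance, and a finite-difference comparison at the end:

def forward_backward(d, e, n, E0=1.0, L_target=0.5):
    # forward pass: radiance from the emitter (k = n-1) down to the receiver (k = 0)
    L = [0.0] * n
    L[n - 1] = e * E0
    for k in range(n - 1, 0, -1):
        L[k - 1] = d * L[k]
    J = 0.5 * (L[0] - L_target) ** 2

    # backward pass: adjoint from the receiver back to the emitter
    dJ_dd, dJ_de = 0.0, 0.0
    Lam = L[0] - L_target                 # adjoint source dJ/dL at the receiver
    for k in range(1, n):
        dJ_dd += Lam * L[k]               # gradient w.r.t. the albedo control
        Lam = d * Lam
    dJ_de += Lam * E0                     # gradient w.r.t. the emitter control
    return J, dJ_dd, dJ_de

# quick finite-difference check
d, e, n, eps = 0.7, 1.2, 4, 1e-6
J, gd, ge = forward_backward(d, e, n)
fd_d = (forward_backward(d + eps, e, n)[0] - forward_backward(d - eps, e, n)[0]) / (2 * eps)
fd_e = (forward_backward(d, e + eps, n)[0] - forward_backward(d, e - eps, n)[0]) / (2 * eps)
assert abs(gd - fd_d) < 1e-5 and abs(ge - fd_e) < 1e-5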
6. Results
To validate our model, we have implemented a simple "vanilla" text-book style path tracer. An outline of the implementation in pseudo-code is given in Appendix D. We decided to implement our own path tracer rather than modifying an existing one so that we could focus on the core algorithm, not on compatibility and build issues. It also offers more flexibility in debugging and visualizing our results. Our path tracer only handles ray/sphere and ray/quad intersections; consequently, our scenes are restricted to quads and spheres. Also, we consider only three types of materials. M1: a pure emitter defined by its emission strength. M2: a specular Phong-Blinn BRDF/PDF modelled by an ambient term, a diffuse term, a specular term and an exponent. M3: a diffuse BRDF/PDF defined by an ambient term and a diffuse term. The associated controls, seven in total, are collected in a vector $q$. Our goal is to compute the gradient of the cost function with respect to these controls. To validate our model, we consider a simple "Cornell Box"-type scene. The scene is comprised of a box with six quad walls having material M3, one quad light source at the top with material M1, and one sphere in the center with material M2. The receiver is a pinhole camera defined by a screen located inside the box. Figure 5 depicts the scene in our visualizer.
Figure 5.
Two snapshots of our simple path tracer. Left: visualization of a single path. Right: a view of a rendered image and some selected paths.
Our visualizer can depict any quantity we like for debugging purposes. On the left of Figure 5 we show a single path, while on the right we show a subset of paths and the resulting rendered image. The ability to trace a single path was very useful when comparing our model with estimates obtained from a finite-difference approximation of the gradients with respect to the controls. The approximation is obtained by choosing a small number $\epsilon$ and computing
$$\frac{dJ}{dq_k} \approx \frac{J(q_1, \cdots, q_k + \epsilon, \cdots, q_K) - J(q_1, \cdots, q_k - \epsilon, \cdots, q_K)}{2\epsilon}$$
for each control. We must therefore run the path tracer $2K$ times, where $K = 7$ in our example. The right choice for the value of the perturbation $\epsilon$ is tricky: when $\epsilon$ is too big the estimate is inaccurate, and when it is too small we lose numerical precision because we are subtracting two nearby numbers. This is another strong argument for computing gradients using adjoints, which do not depend on an arbitrarily small parameter; in fact, our adjoints are accurate up to machine precision. We found a sweet spot by varying the perturbation: $\epsilon = 0.1, 0.0001, \ldots$ In Figure 6 we show these estimates for different $\epsilon$ along with the results from our method at the top. We observe good agreement. Next, we show renderings of the final image (top left) and the gradient for each of the seven controls. Figure 7 computes paths without importance sampling, while in Figure 8 the paths are computed with Phong-Blinn importance sampling as described in Appendix B.
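The central-difference validation loop is straightforward; the sketch below (not from the paper) shows its structure. render_cost is a hypothetical stand-in for one full path-tracing run returning $J(q)$, and the made-up quadratic cost at the bottom is only there so that the example runs on its own:

import numpy as np

def finite_difference_gradient(render_cost, q, eps=1e-4):
    # 2K renderer runs for K controls, one +eps and one -eps perturbation each
    grad = np.zeros_like(q)
    for k in range(len(q)):
        dq = np.zeros_like(q); dq[k] = eps
        grad[k] = (render_cost(q + dq) - render_cost(q - dq)) / (2.0 * eps)
    return grad

# example with a made-up smooth cost in place of the renderer
q0 = np.array([0.2, 0.5, 0.3, 10.0, 0.4, 0.6, 1.0])
fd = finite_difference_gradient(lambda q: 0.5 * np.sum(q**2), q0)
assert np.allclose(fd, q0, atol=1e-3)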
Figure 7.
Renderings of the image (top/left) and the gradients with respect to the controls with cosine-weighted uniform sampling.
Figure 6.
Comparison of the gradients obtained with our method (top/left) and finite-difference estimates for different values of $\epsilon$.

J = 3.3162e-06
dJ/dcontrol =
---------- Numerical gradients ---------
EPS = 0.1 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.411622 - 0.691398 ) / 2*EPS = 3.601122 ( 6.075682 - 0.053732 ) / 2*EPS = 30.109749 ( 1.013496 - 0.986575 ) / 2*EPS = 0.134604 ( 2.250000 - 0.250000 ) / 2*EPS = 10.000000 ( 3.160494 - 0.197531 ) / 2*EPS = 14.814816
EPS = 0.0001 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000357 - 0.999644 ) / 2*EPS = 3.565848 ( 1.002218 - 0.997785 ) / 2*EPS = 22.164581 ( 1.000014 - 0.999987 ) / 2*EPS = 0.134408 ( 1.001000 - 0.999000 ) / 2*EPS = 10.000169 ( 1.001334 - 0.998668 ) / 2*EPS = 13.332068
EPS = 1e-07 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000001 - 1.000000 ) / 2*EPS = 4.768371 ( 1.000002 - 0.999998 ) / 2*EPS = 19.669531 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000001 - 0.999999 ) / 2*EPS = 8.642673 ( 1.000001 - 0.999999 ) / 2*EPS = 13.411044
EPS = 1e-10 numerical dJ/dcost: ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000 ( 1.000000 - 1.000000 ) / 2*EPS = 0.000000
Figure 8.
Renderings of the image (top/left) and the gradients with respect to the controls with Phong-Blinn importance sampling.
7. Conclusions and Future Work
In this work we have presented a novel, general method to compute gradients of radiance in the context of transport theory. To achieve this, we derived an equation for the adjoint/Jacobian of the cost function from the general theory of adjoints in optimization. We have shown that it can easily be implemented in a simple home-brewed path tracer. The results show good agreement with a finite-difference computation of the gradients. In the future we want to extend the theory to more general settings, including volumetric scattering and perhaps diffraction effects; in these cases we have to deal with an integro-differential equation and a Kirchhoff integral, respectively. We also want to implement this method in various existing GPU-based path tracers like Fermat.
References

[1] M. B. Giles and N. A. Pierce. An Introduction to the Adjoint Approach to Design. Flow, Turbulence and Combustion, 65:393-415, 2000.

[2] Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Society for Industrial and Applied Mathematics, second edition, 2008.

[3] Antoine McNamara, Adrien Treuille, Zoran Popović and Jos Stam. Fluid Control Using the Adjoint Method. ACM Transactions on Graphics (SIGGRAPH 2004), Volume 23, Issue 3, August 2004.

[4] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko. The Mathematical Theory of Optimal Processes. Translated by K. N. Trirogoff, edited by L. W. Neustadt. Interscience Publishers, 1962.

[5] Eric Veach. Robust Monte Carlo Methods for Light Transport Simulation. Ph.D. dissertation, Stanford University, December 1997. Available at http://graphics.stanford.edu/papers/veach_thesis/.
Appendices
A. Adjoint Equation
In this appendix we derive the adjoint equation (Eq. 8) and an expression for the derivative of the cost function with respect to the controls (Eq. 9). Recall that the augmented Lagrangian is defined as
$$\mathcal{L}(u, q, \lambda) = \langle J(u, q) \rangle + \langle \lambda, E(u, q) \rangle.$$
Stationarity with respect to the state implies that for all variations $\delta u$
$$0 = \left\langle \frac{\partial J}{\partial u}, \delta u \right\rangle + \left\langle \lambda, \frac{\partial E}{\partial u} \delta u \right\rangle. \qquad (A.1)$$
From the same equation and using the definition of the adjoint we have that
$$\left\langle \lambda, \frac{\partial E}{\partial u} \delta u \right\rangle = \left\langle \left( \frac{\partial E}{\partial u} \right)^* \lambda, \delta u \right\rangle, \quad \text{so} \quad 0 = \left\langle \frac{\partial J}{\partial u} + \left( \frac{\partial E}{\partial u} \right)^* \lambda, \delta u \right\rangle.$$
Since this equation must hold for all functions $\delta u$ we get Eq. 8:
$$\left( \frac{\partial E}{\partial u} \right)^* \lambda = -\frac{\partial J}{\partial u}.$$
Now consider the second condition, $\frac{\partial \mathcal{L}}{\partial \lambda} = 0$. This is simply the statement that our state must satisfy the constraint, $E(u, q) = 0$, and it implies that the differential of the constraint vanishes:
$$\frac{\partial E}{\partial u} \delta u + \frac{\partial E}{\partial q} \delta q = 0. \qquad (A.2)$$
With these results we can compute the gradient of the cost function:
$$\delta \mathcal{J} = \left\langle \frac{\partial J}{\partial u}, \delta u \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle \overset{(A.1)}{=} -\left\langle \lambda, \frac{\partial E}{\partial u} \delta u \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle \overset{(A.2)}{=} \left\langle \lambda, \frac{\partial E}{\partial q} \right\rangle + \left\langle \frac{\partial J}{\partial q} \right\rangle.$$
This is Eq. 9.
B. Sampling from a PDF
For simplicity we assume that the PDF is isotropic and thus only depends on the elevation angle: $p(\omega) = p(\theta)$.
We then generate a random sample $\theta$ from this distribution using the cumulative distribution function (CDF):
$$P(\theta) = \operatorname{Prob}(t \leq \theta) = 2\pi \int_0^{\theta} p(t) \sin t \, dt.$$
This is a mapping from elevation angles in $[0, \frac{\pi}{2}]$ to the unit interval $[0, 1]$. Using the inverse of the CDF we can directly generate a $p$-distributed random angle $\theta$ from a uniformly distributed $u \sim U(0, 1)$: $\theta = P^{-1}(u)$. As an example, consider the cosine-weighted Phong-Blinn PDF depending on an exponent parameter $\alpha \geq 0$ (for $\alpha = 0$ one obtains a cosine-weighted Lambertian PDF). This PDF handles all cases considered in this paper:
$$p_\alpha(\theta) = \frac{\alpha + 2}{2\pi} (\cos\theta)^{\alpha + 1}.$$
This gives the following CDF:
$$P_\alpha(\theta) = (\alpha + 2) \int_0^{\theta} (\cos t)^{\alpha + 1} \sin t \, dt = 1 - (\cos\theta)^{\alpha + 2},$$
with an inverse equal to:
$$\theta = \Phi_\alpha(u) = P_\alpha^{-1}(u) = \arccos\left( (1 - u)^{\frac{1}{\alpha + 2}} \right).$$
These results give us a recipe to generate random vectors for our path tracer from the PDFs given above. The general algorithm works as follows. To generate a random vector $\omega$ in a world coordinate frame $(X, Y, Z)$ with the $Z$-vector being the normal direction:
Generate: $u_1, u_2 \sim U(0, 1)$.
Set: $\theta = \Phi_\alpha(u_1)$ and $\phi = 2\pi u_2$.
Random vector: $\tilde{\omega} = (\tilde{\omega}_x, \tilde{\omega}_y, \tilde{\omega}_z) = (\cos\phi \cos\theta, \sin\phi \cos\theta, \sin\theta)$.
Convert to world space: $\omega = \tilde{\omega}_x X + \tilde{\omega}_y Y + \tilde{\omega}_z Z$.

C. The Cook-Torrance and Phong-Blinn BRDF

In this appendix we give all the details for the Phong-Blinn model. We use a variant of the Cook-Torrance BRDF to simplify some of the resulting expressions. The goal here is not to faithfully reproduce the original Cook-Torrance BRDF but to show how to implement such a model in our framework. The Cook-Torrance BRDF is usually defined as follows:
$$f(n, \omega_i, \omega_o) = F(n, \omega_i, \omega_o)\, G(n, \omega_i, \omega_o)\, D(\omega_m),$$
where $F$ accounts for Fresnel effects, which we will set equal to one, and $G$ is a geometric factor that accounts for shadowing and the spread of the solid angles. We use the following expression:
$$G(n, \omega_i, \omega_o) = \frac{(\omega_m \cdot n)}{4\, (\omega_i \cdot \omega_m)(\omega_i \cdot n)},$$
where $\omega_m$ is the vector halfway between the incoming and the outgoing directions:
$$\omega_m = \frac{\omega_i + \omega_o}{\| \omega_i + \omega_o \|}.$$
This vector is the normal that perfectly reflects the incoming light into the outgoing direction and is distributed according to a normal distribution function (NDF) $D(\omega_m)$. In this appendix we will consider the Phong-Blinn NDF ($\omega_m = (\theta_m, \phi_m)$):
$$D_{PB,\alpha}(\theta_m, \phi_m) = \frac{\alpha + 2}{2\pi} (\cos\theta_m)^{\alpha} \sin\theta_m.$$
Notice that this distribution is not normalized, since the right normalizing term would be $(\alpha + 1)/2\pi$. Since we are not sampling from this distribution but from the one given below, this does not matter; this choice does, however, simplify the expressions below. The incoming direction is a function of the half vector (the micro-facet normal) and the outgoing direction through the reflection law:
$$\omega_i = \omega_i(\omega_m) = 2 (\omega_o \cdot \omega_m)\, \omega_m - \omega_o,$$
and the corresponding differentials satisfy:
$$d\omega_i = 4 (\omega_o \cdot \omega_m)\, d\omega_m = 4 (\omega_o \cdot \omega_m) \sin\theta_m \, d\theta_m \, d\phi_m.$$
The PDF for the cosine-weighted Phong-Blinn model is given by:
$$p_\alpha(\theta_m, \phi_m) = \frac{\alpha + 2}{2\pi} (\cos\theta_m)^{\alpha + 1} \sin\theta_m.$$
We point out that other choices for the NDF, such as Beckmann or GGX, are currently more popular; the same methodology described here applies to those models as well. This distribution is normalized, and we can therefore use it to sample random half vectors using the procedure described in Appendix B. In importance path tracing we evaluate the outgoing radiance as follows:
$$dL(\omega_o) = \frac{f_\alpha(n, \omega_i, \omega_o)}{p_\alpha(\omega_m)}\, (\omega_i \cdot n)\, L(\omega_i)\, d\omega_i.$$
We can rewrite this in terms of the angles $(\theta_m, \phi_m)$ using the expressions derived above (the geometric factor $G$, the Phong-Blinn NDF, the Jacobian $d\omega_i = 4(\omega_o \cdot \omega_m) \sin\theta_m\, d\theta_m\, d\phi_m$ and the PDF $p_\alpha$). After grand eliminations, we get the simple update rule:
$$dL(\omega_o) = L\big( \omega_i(\omega_m) \big)\, \sin\theta_m(\alpha)\, d\theta_m\, d\phi_m.$$
To update the gradient of the cost function with respect to the exponent parameter $\alpha$ we need to differentiate this expression. The sine term depends on $\alpha$ through the sampling procedure (see Appendix B):
$$s(\alpha) = \sin\theta_m(\alpha) = \sqrt{1 - (1 - u)^{\frac{2}{\alpha + 2}}},$$
and its derivative is:
$$\frac{ds}{d\alpha}(\alpha) = \frac{(1 - u)^{\frac{2}{\alpha + 2}} \log(1 - u)}{(\alpha + 2)^2 \sqrt{1 - (1 - u)^{\frac{2}{\alpha + 2}}}} = \frac{\cos^2\theta_m \, \log\big( \cos^{\alpha + 2}\theta_m \big)}{(\alpha + 2)^2 \sin\theta_m} = \frac{\cos^2\theta_m \, \log(\cos\theta_m)}{(\alpha + 2) \sin\theta_m}.$$
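As a quick sanity check (not from the paper), the derivative above can be verified numerically for a fixed uniform sample; the values of alpha and u below are arbitrary:

import math

def sin_theta_m(alpha, u):
    c = (1.0 - u) ** (1.0 / (alpha + 2.0))   # cos(theta_m) from the inverse CDF
    return math.sqrt(1.0 - c * c)

def dsin_dalpha(alpha, u):
    c = (1.0 - u) ** (1.0 / (alpha + 2.0))
    s = math.sqrt(1.0 - c * c)
    return c * c * math.log(c) / ((alpha + 2.0) * s)

alpha, u, eps = 10.0, 0.3, 1e-6
fd = (sin_theta_m(alpha + eps, u) - sin_theta_m(alpha - eps, u)) / (2 * eps)
assert abs(fd - dsin_dalpha(alpha, u)) < 1e-6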
D. A Simple Path Tracer

In this appendix we describe our simple path tracer in pseudo-code. Instead of using an existing path tracer and modifying it, we decided to write our own. Why? It is general, simple and self-contained, with no external dependencies and no complicated data structures or optimizations. At the highest level it works like this:

func do_path_tracing(do_gradient)
    if do_gradient then
        cost = 0
        clear_gradients()
    end if
    for each image pixel do
        pixel_radiance = 0
        for N samples do
            create initial ray R
            make_path(R)
            radiance = forward_pass() / N
            pixel_radiance += radiance
            if do_gradient then
                cost += cost(radiance)
                adjoint = dcost_drad(radiance)
                backward_pass(adjoint)
            end if
        end for
    end for
end

When do_gradient==false we just perform a standard path trace. This is how it works. First, we create a path by intersecting the ray with the scene and spawning reflection vectors by sampling the corresponding PDFs. Then we traverse the path from the emitter to the receiver in a forward pass, accumulating the radiance to generate a final radiance. To compute the gradients, we first compute the adjoint, which is the Jacobian of the cost with respect to the radiance. Then, we perform a backward pass, updating the adjoint and updating the gradient of each active control. The first step builds a path recursively.

func make_path(R)
    hit = int_scene(R)
    if hit != null and R->depth < MAX_DEPTH then
        u = unif(0,1)
        if u >= hit->absorb then
            dir = hit->sample(hit->normal, R->dir)
            R_new = make_ray(R->depth+1, hit->point, dir)
            make_path(R_new)
        end if
        path[R->depth]->{hit, dir_i, dir_o} = {hit, R->dir, dir}
    end if
end

Once the path is created, we perform a forward pass.

func forward_pass()
    N = path length
    hit = path[N-1]->hit
    radiance = hit->emission / hit->absorb
    for k=N-2 to k>=0 do
        {hit, dir_i, dir_o} = path[k]->{hit, dir_i, dir_o}
        path[k]->radiance = radiance
        radiance = hit->BSDF_D_PDF(hit->normal, dir_o, -dir_i) * radiance
    end for
    return radiance
end
After computing the cost and the adjoint we perform a backward pass:

func backward_pass(adjoint)
    N = path length
    for k=0 to N-2 do
        {hit, dir_i, dir_o, radiance} = path[k]->{hit, dir_i, dir_o, radiance}
        radiance /= 1 - hit->absorb
        hit->update_BSDF_D_PDF_gradients(hit->normal, -dir_i, dir_o, radiance, adjoint)
        adjoint = hit->BSDF_D_PDF(hit->normal, -dir_i, dir_o) * adjoint
    end for
    hit = path[N-1]->hit
    radiance = hit->emission / hit->absorb
    hit->emission_gradient += radiance * adjoint
    adjoint = 0
end
The routines sample, BSDF_D_PDF and update_BSDF_D_PDF_gradients must be provided for each particular BSDF/PDF. Here are the implementations for the Lambertian model:

func sample(normal, dir)
    u1, u2 = unif(0,1)
    c = sqrt(1-u1)
    s = sqrt(u1)
    (x,y,z) = (cos(2*PI*u2)*c, sin(2*PI*u2)*c, s)
    make_frame(normal, X, Y, Z)
    return x*X + y*Y + z*Z
end
func BSDF_D_PDF(normal, dir_i, dir_o)
    return diff_col
end

func update_BSDF_D_PDF_gradients(normal, dir_i, dir_o, radiance, adjoint)
    diff_gradient += radiance*adjoint
end
And for Phong-Blinn:

func sample(normal, dir)
    u1, u2 = unif(0,1)
    t = pow(u1, 2/(exponent+2))
    c = sqrt(1-t)
    s = sqrt(t)
    (x, y, z) = (cos(2*PI*u2)*c, sin(2*PI*u2)*c, s)
    make_frame(normal, X, Y, Z)
    N = x*X + y*Y + z*Z
    if dot(N,dir) < 0 then
        N = 2*dot(N,normal)*normal - N
    end if
    return N
end

func BSDF_D_PDF(normal, dir_i, dir_o)
    dir_m = normalize(dir_i + dir_o)
    cos_m = dot(dir_m, normal)
    cos2_m = cos_m*cos_m
    sin_m = sqrt(1-cos2_m)
    return spec_col * sin_m
end

func update_BSDF_D_PDF_gradients(normal, dir_i, dir_o, radiance, adjoint)
    dir_m = normalize(dir_i + dir_o)
    cos_m = dot(dir_m, normal)
    cos2_m = cos_m*cos_m
    sin_m = sqrt(1-cos2_m)
    dsin_m = cos2_m * log(cos_m) / sin_m / (alpha+2)
    spec_gradient += sin_m * (radiance*adjoint)
    alpha_gradient += spec_col * dsin_m * (radiance*adjoint)
end
The function make_frame() generates an orthonormal frame from the normal:

func make_frame(N, X, Y, Z)
    Z = N
    X = Z + (0.1, 0.2, 0.3)
    Y = normalize(cross(Z,X))
    X = normalize(cross(Y,Z))
end