Light Source Estimation with Analytical Path-tracing
Mike Kasper, Nima Keivan, Gabe Sibley, Christoffer Heckman∗

Abstract — We present a novel algorithm for light source estimation in scenes reconstructed with an RGB-D camera, based on an analytically-derived formulation of path-tracing. Our algorithm traces the reconstructed scene with a custom path-tracer and computes the analytical derivatives of the light transport equation from principles in optics. These derivatives are then used to perform gradient descent, minimizing the photometric error between one or more captured reference images and renders of our current lighting estimation, using an environment map parameterization for light sources. We show that our approach of modeling all light sources as points at infinity approximates lights located near the scene with surprising accuracy. Due to the analytical formulation of derivatives, optimization to the solution is considerably accelerated. We verify our algorithm using both real and synthetic data.
I. INTRODUCTION
Computer vision is often referred to as the "inverse graphics" problem. This is because many of the equations and relations used in computer vision find their roots in the understanding of image formation and light interaction. However, the complete process of image formation is largely ignored in most applications of computer vision. In the case of localization and mapping, brightness constancy is assumed from multiple viewpoints, and robust estimation is used to reduce or ignore the influence of any non-cooperative observations. While this approach works well for surfaces exhibiting Lambertian reflectivity, specular and transparent surfaces are often treated as outliers. This assumption is also the cause of the complexity of localization from a completely dense 3D map. While brightness constancy may apply for small baselines, it generally does not hold across widely varying baselines.

These approximations inhibit the estimation of many quantities of interest, such as light position, sensor response curves, and surface properties of in-scene objects. The light position estimation problem itself has garnered significant interest in the autonomous robotics community. Dynamic shadow effects, for instance, stymie feature- and deep learning-based algorithms for place and object recognition. [4] proposed a filter-based approach to this problem which removes shadows from images, but struggles with shadows containing reflected light and artificial light sources, such as lamps or headlights. The assumption of constant illumination is critical to tracking applications [22], which could potentially be improved through the consideration of irregular illumination imbued by individual light sources as well.
This work was graciously supported by Toyota Motor Corporation. All listed authors are with the Autonomous Robotics and Perception Group at the University of Colorado, Boulder. ∗Corresponding author. E-mail: christoffer.heckman at colorado.edu
Fig. 1: Left: captured reference image, Right: render of estimated lighting conditions found using our algorithm.

A number of previous works have handled in-scene light source position estimation. For instance, ElasticFusion [22] introduces a method for this via ray casting, tessellating the space with potential light source voxels and applying a merging function. This method precludes the estimation of surface reflectances, however, as ray casting requires that they are provided at the outset. As an alternative to using potential light-emitting voxels, an "environment map" may instead be employed to represent light sources. This approach imposes an encapsulating 2-manifold (typically a hemisphere) around the 3-dimensional region of interest and parametrizes a coordinate system on the manifold whose points act as inwardly-directed point light sources [9, 13].

A number of approaches have estimated the effect of in-scene lighting with considerable success. For instance, [14] addresses light source estimation by laying out a light-field which can then be used for casting shadows. The light sources in this case do not follow a physically derived model, but recreate in-scene lighting with impressive accuracy. A different approach is applied by [12], where a learned set of radiance transfer functions is applied to generate plausible lighting incident on a human face. Meanwhile, there has been steady development in approaches to estimating light effects for out-of-scene sources, e.g. [2, 8, 21, 24]. These approaches may be used to e.g. cast artificial shadows for the purposes of augmented reality, but do not admit the arbitrary estimation of in-scene parameters.

To address these limitations, optical path tracing may be employed both to estimate scene parameters and to render photorealistically, including the effects of lens flares, specular highlights, shadows, and other visual phenomena. This approach applies a generative model for light interaction with a scene using methods from optics, and admits selective approximation as need warrants. Path tracing, however, has proven to be a prohibitively computationally expensive solution, as path tracing is a nonlinear operation with thousands of optimization parameters.

The method described in this paper eschews the use of finite differences in favor of analytical derivatives of the light transport equation. We use an environment map parametrization of light sources and employ robust nonlinear optimization in order to estimate the position of light sources in a scene. Our process is naturally extensible to the estimation of scene properties including surface reflectivity. We give a description of our method in Sections II-V. Our results are provided in Section VI and discussed in Section VII. We draw conclusions from our work in Section VIII.

II. OVERVIEW
Physically-based rendering refers to the process of image formation that remains as faithful to real-world optics as possible. This is achieved through accurately modeling the interaction of light with the surfaces and volumes in our scene. Such rendering engines are often referred to as path-tracers, as they simulate the path an emitted ray of light takes as it traverses the scene before eventually being captured by our synthetic image sensor. As the vast majority of light in the scene will never actually reach the image sensor, this problem is typically inverted for the sake of computational simplicity: we instead trace rays of light from the image sensor back to the light source. In order to synthesize high-fidelity images with little to no noise, hundreds or thousands of rays must be traced for every pixel. These general concepts are illustrated in Figure 2.

The most important concept to take away from this, in regards to the work presented in this paper, is the following: while the process requires tracing thousands of rays, intersecting them with the scene geometry at each bounce, and computing the material properties and angles of incidence at each point of intersection as the rays make their way towards a light source, all of the geometry, material properties, and math ultimately compute a single coefficient that denotes the amount of light from a given light source that is stored in a given pixel. This idea sits at the heart of our algorithm.

We leverage this fact in our formulation of the light estimation problem. A custom path-tracer generates these coefficients for each pixel-light pair. We use these values to compose a linear system of equations that can be solved using standard optimization techniques. The power of this approach is that not only can it be gracefully extended to handle any optical phenomena that could be present in our observed scene, it can also be rewritten to solve for different components of image formation.
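To make this linear structure concrete, the sketch below (with hypothetical placeholder data) poses image formation as $I = C\lambda$, where each entry of $C$ is the per-pixel, per-light coefficient produced by the tracer, and recovers non-negative light intensities with a standard solver. This is purely illustrative of the linearity; our actual method minimizes a robust objective with gradient descent, as detailed in Section V.

```python
import numpy as np
from scipy.optimize import nnls  # non-negative least squares solver

# Hypothetical coefficient matrix: C[p, j] is the single coefficient the
# path-tracer accumulates for pixel p and environment light direction j.
num_pixels, num_lights = 10000, 522
C = np.random.rand(num_pixels, num_lights)  # placeholder for traced values
I_ref = np.random.rand(num_pixels)          # placeholder reference intensities

# Pixel intensities are linear in the light intensities: I = C @ lam.
# Light intensities cannot be negative, so solve
#   min ||C @ lam - I_ref||^2  subject to  lam >= 0.
lam, residual_norm = nnls(C, I_ref)
```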
III. SCENE MODEL
We now describe the input required by our algorithm for performing path-tracing and estimating the lighting conditions in the observed scene.
A. 3D Geometry
Fig. 2: Path-tracing models how rays of light traverse a given scene. With each collision with the scene's geometry, information about surface materials, angle of incidence, and location of the light source is used to compute the amount of light reaching the synthetic camera sensor.

In order to properly simulate how rays emitted from light sources bounce off surfaces before reaching our camera, we first require a 3D geometric representation of the scene. Our implementation uses a triangle mesh with corresponding surface normals, as seen in Figure 3a. We capture such a mesh using an Asus Xtion Pro Live and the InfiniTam 3D reconstruction framework [10]. The more complete this reconstruction is, the better we can recreate shadows and simulate how light bounces around the surfaces of the scene. However, we have observed that even largely incomplete reconstructions still afford accurate light source estimation.
B. Surface Albedos
To render an RGB image of our current lighting estimation we must have an albedo associated with each vertex in our mesh. The albedo describes the underlying color of an object at a given point, devoid of shadows or any other shading information. Figure 3b illustrates this concept. The problem of separating albedos and shading information found in images, often referred to as intrinsic image decomposition, is the subject of a rich field of ongoing research [3, 5, 7]. Given a decomposed albedo image, we can map the albedos onto the surface of the mesh using existing color mapping techniques [6, 23]. However, to obviate this challenge we currently assume albedo associations are known, although this knowledge need not be perfectly accurate.
C. Reference Images
Fig. 3: Input to our algorithm consists of (a) 3D geometry, (b) surface albedos, and (c) one or more masked reference images with corresponding camera intrinsics and 6-DOF pose.

Our optimization works to minimize the photometric error between renders of our current lighting estimation and one or more reference images. To avoid penalizing differences resulting from an incomplete geometric reconstruction, we mask the reference images such that pixels corresponding to holes in our mesh are ignored, as seen in Figure 3c. In order to render synthetic images that can be compared directly with the captured reference images, we must know the 6-DOF pose and intrinsics of the camera used. We calibrate the Xtion Pro Live camera using the Calibu calibration framework [1]. This calibration provides the camera intrinsics matrix, distortion parameters, and the IR-to-RGB camera transform.

The pose of the IR camera is estimated as InfiniTam reconstructs the scene geometry [10]. We then compute the pose of the RGB camera using the previously estimated IR-to-RGB camera transform. As we are directly comparing our synthetic images with the captured reference images, it is also critical that we account for the camera's response curve and vignetting. We estimate the camera's response curve functions using [16]. These response curves are then used to convert captured image intensities to irradiance values. We then account for the camera's vignetting using [11]. While multiple reference images can be used in our optimization, we find a single well-placed image is often sufficient to accurately estimate the lighting conditions.
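As a minimal sketch of this photometric correction, the snippet below undoes a calibrated response curve and a vignetting map. The lookup-table form of the inverse response and the per-pixel vignette array are assumptions for illustration, not the exact output formats of [16] or [11].

```python
import numpy as np

def intensity_to_irradiance(image, inv_response, vignette):
    """Convert captured 8-bit intensities to relative irradiance.

    image:        (H, W, 3) uint8 captured reference image.
    inv_response: (256,) lookup table mapping pixel intensity to relative
                  irradiance, assumed calibrated offline (e.g. via [16]).
    vignette:     (H, W) attenuation map in (0, 1], assumed calibrated
                  offline (e.g. via [11]).
    """
    irradiance = inv_response[image]         # undo the camera response curve
    return irradiance / vignette[..., None]  # undo radial vignetting falloff
```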
D. Environment Light
In this work we model light using an environment map [9, 13, 19]. Instead of sampling points in 3D space, environment map lighting admits directional sampling. This representation works best when approximating lights located farther from the observed scene. While many works have considered in-scene lighting examples [12, 14], we instead focus on out-of-scene sources [2, 8, 21, 24]. To compute the direct incident radiance $L_d$ from our environment map arriving at a point $p$, we trace a ray with origin $p$ in some direction $\omega$. If the ray is unobstructed by the scene geometry, point $p$ will receive the full radiance traveling along $\omega$ as determined by the environment map.

To compute the radiance emitted by the environment map along a given direction, we first discretize a unit sphere into a finite number of uniformly spaced points, with each point representing a direction that can be sampled. We perform a similar discretization as described in [9], but over an entire sphere. The resolution of this discretization is indicated by the number of desired rings. The spacing of points around each ring is computed to be as close to the inter-ring spacing as possible, as seen in Figure 4. When tracing a ray along a given direction, we determine the nearest-neighbor direction from the discretized environment map and return its associated RGB value $\lambda$ as the emitted radiance.
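A sketch of this discretization is given below, under the assumption that in-ring spacing is simply matched to the inter-ring polar spacing; the exact point counts of our implementation may differ slightly.

```python
import numpy as np

def environment_directions(num_rings=21):
    """Discretize the unit sphere into rings of near-uniformly spaced
    directions (a sketch of the discretization described in [9])."""
    dirs = []
    dtheta = np.pi / (num_rings - 1)           # inter-ring (polar) spacing
    for r in range(num_rings):
        theta = r * dtheta                      # polar angle of this ring
        circumference = 2.0 * np.pi * np.sin(theta)
        # match the in-ring spacing to the inter-ring spacing
        n = max(1, int(round(circumference / dtheta)))
        for k in range(n):
            phi = 2.0 * np.pi * k / n           # azimuth around the ring
            dirs.append([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])
    return np.asarray(dirs)

def nearest_light(dirs, omega):
    """Nearest-neighbor lookup: the discrete direction with max dot product."""
    return int(np.argmax(dirs @ omega))
```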
IV. LIGHT TRANSPORT EQUATION

In this section we provide a brief overview of the light transport equation (LTE), so that the reader may better understand how we perform our light source estimation. Intuitively, the LTE describes how radiance emitted from a light source is distributed throughout a scene. Formally, we compute the exitant radiance $L_o$ leaving a point $p$ in direction $\omega_o$ as:

$$L_o(p, \omega_o) = \int_{H} f(p, \omega_o, \omega_i)\, L_i(p, \omega_i)\, |\cos\theta_i|\, d\omega_i \quad (1)$$
Fig. 4: Left: top-down view and Right: side view of our environment map discretization. The light depicted here consists of 21 rings and 522 total discrete directions.
This integral evaluates the amount of incident radiance $L_i$ arriving at $p$ over the unit hemisphere $H$ oriented with the surface normal found at $p$. The function $f$ denotes the bidirectional reflectance distribution function (BRDF) found at point $p$. The BRDF defines how much of the incident radiance arriving at $p$ along $\omega_i$ is reflected in direction $\omega_o$. Finally, $\theta_i$ is the angle between $\omega_i$ and the surface normal found at $p$. Figure 5 illustrates these relationships.

Fig. 5: Components of a single ray bounce in the LTE. $p$ is the point of intersection, $\omega_o$ is the direction of exitant radiance $L_o$, $\omega_i$ is the direction of incident radiance $L_i$, $n$ is the surface normal found at $p$, and $\theta_i$ is the angle between $n$ and $\omega_i$.

Using Monte Carlo integration we can rewrite Eq. (1) as the finite sum:

$$L_o(p, \omega_o) = \frac{1}{N} \sum_{i=1}^{N} \frac{f(p, \omega_o, \omega_i)\, L_i(p, \omega_i)\, |\cos\theta_i|}{p(\omega_i)} \quad (2)$$

where $N$ is the number of directions $\omega_i$ sampled from the distribution described by the probability density function (PDF) $p$. The incident radiance $L_i$ represents both the radiance coming directly from our light source, which we denote by $L_d$, and the radiance reflected off surrounding surfaces, which we denote by $L_r$. Given the recursive nature of path-tracing, we can effectively rewrite the incident radiance $L_r$ as exitant radiance $L_o$:

$$L_r(p, \omega) = L_o(p', \omega) \quad (3)$$

where $p'$ is the point where a ray leaving from $p$ in direction $\omega$ first intersects the scene. As we recursively bounce rays around our scene, the BRDF, PDF, and $\cos\theta$ terms of the LTE are compounded. We refer to this product as throughput. Formally, the throughput $T$ of the $i$th point $p_i$ in our current path is defined as:

$$T(p_i) = \prod_{j=1}^{i} \frac{f(p_j, \omega_{j-1}, \omega_j)\, |\cos\theta_j|}{p(\omega_j)} \quad (4)$$

With this definition of throughput, we can now define $L_d$ as:

$$L_d(p, \omega_i) = V(p, \omega_i)\, T(p)\, \lambda_i \quad (5)$$

where $\lambda_i$ is the estimated radiance for the direction in our environment map that is closest to $\omega_i$. The visibility function $V$ evaluates to 1 if a ray leaving from point $p$ in direction $\omega_i$ is not obstructed by the scene geometry, and 0 otherwise.

To compute the final pixel intensity $I$, we integrate the intensities of all $M$ rays arriving at the corresponding point on our synthetic sensor (each of the form of Eq. (2)). Using Monte Carlo integration we can evaluate this with the finite sum:

$$I = \frac{1}{M} \sum_{i=1}^{M} \frac{L_o(p_i, \omega_i)}{p(\omega_i)} \quad (6)$$

where $p_i$ refers to the point where a ray originating from our sensor and traveling along $\omega_i$ first intersects with the scene. For more information on path-tracing and the LTE, see [18].
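The following sketch illustrates how a single path's throughput is compounded per Eq. (4) as a ray bounces through the scene. The scene and environment interfaces (scene.intersect, scene.sample_brdf, scene.brdf, env.nearest) are hypothetical stand-ins, and for brevity the light is sampled only when a ray escapes the geometry, rather than via the explicit visibility tests of Eq. (5).

```python
import numpy as np

def trace_path(scene, origin, direction, env, max_bounces=4, rng=np.random):
    """Trace one camera ray; return (light_index, throughput): the single
    coefficient linking this sample to the light it eventually reaches."""
    throughput = 1.0
    for _ in range(max_bounces):
        hit = scene.intersect(origin, direction)
        if hit is None:
            # ray escaped: radiance comes from the nearest environment
            # map direction, scaled by the accumulated throughput
            return env.nearest(direction), throughput
        # sample a new outgoing direction from the BRDF's importance PDF
        omega, pdf = scene.sample_brdf(hit, -direction, rng)
        f = scene.brdf(hit, -direction, omega)   # BRDF value f(p, wo, wi)
        cos_theta = abs(np.dot(omega, hit.normal))
        throughput *= f * cos_theta / pdf        # compound terms, Eq. (4)
        origin, direction = hit.point, omega
    return None, 0.0                             # path terminated unlit
```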
V. LIGHT SOURCE ESTIMATION

In this section we present our algorithm for light source estimation. We construct the optimization problem by first initializing all lighting parameters to be near zero, resulting in a completely dark scene. As light parameters are uniformly initialized, each light starts with the same probability of being sampled.
A. Objective
Our objective function minimizes the photometric error between the set of original reference images $\{I_o\}$ and a set of corresponding reconstructed images $\{I_r\}$, which depict our current lighting estimation. The photometric error $E_p$ can be defined as:

$$E_p = \sum_i \sum_{p \in P_i} \left( I_{o,i}(p) - I_{r,i}(p) \right)^2 \quad (7)$$

Given the prevalence of large discrepancies caused by an imperfect scene reconstruction or under-sampling, we employ the Cauchy robust norm on the raw photometric error values. Due to the potential over-parameterization of lighting, an activation penalty function is used as an additional cost. Each light direction $\lambda_i$ has an associated activation penalty $\Phi_i$ defined as:

$$\Phi_i = \alpha \log\left(1 + \beta \sum_{c=1}^{n} \lambda_{i,c}\right) \quad (8)$$

where $\alpha$ and $\beta$ are constant weights and $n$ is the number of channels in our images. This logarithmic cost penalizes the initial activation of a light direction but then plateaus as the intensity of the light increases. Increasing the weights $\alpha$ and $\beta$ favors solutions with fewer activated lights. The use of an activation cost has the additional benefit of completely disabling lights that can never be sampled due to the scene geometry and viewing angle of the reference images. While all lighting parameters are initialized to be near zero, if the collective probability of sampling unreachable lights is high enough, it could have a negative impact on the variance of our path-traces. This concept is explained further in the following sections. With this additional term our final objective function $E$ becomes:

$$E = E_p + \Phi \quad (9)$$
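A compact sketch of the full objective in Eqs. (7)-(9) follows; the Cauchy scale parameter c is an assumed knob not specified above, and applying the robust norm per-residual is one reasonable reading of the text.

```python
import numpy as np

def objective(I_ref, I_render, lam, alpha, beta, c=1.0):
    """Eq. (9): robust photometric error plus activation penalty (sketch).

    I_ref, I_render: masked reference and rendered images (same shape).
    lam:             (num_lights, num_channels) light intensities.
    c:               assumed Cauchy scale parameter.
    """
    err = I_ref - I_render
    # Cauchy robust norm applied to the raw photometric residuals
    E_p = np.sum(0.5 * c**2 * np.log1p((err / c) ** 2))
    # Eq. (8): logarithmic activation penalty per light direction
    phi = alpha * np.log1p(beta * lam.sum(axis=1))
    return E_p + phi.sum()
```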
B. Light Transport Derivatives

In order to perform our light optimization we first need to compute the Jacobian of partial derivatives of pixel values with respect to the lighting parameters. We initialize the Jacobian with all values set to zero. We then populate the values of the Jacobian by tracing the given scene with our custom path-tracer. For each pixel sample we construct the path a ray of light takes as it bounces around the scene, while computing the throughput of the path as described in Section IV. When a ray eventually samples the light source, we add the ray's final throughput to the corresponding value in the Jacobian. Following Eq. (5), we can formally define the partial derivative of the pixel $I_i$ with respect to the sampled light $\lambda_j$ for each color channel $c$ as the finite sum:

$$\frac{\partial I_{i,c}}{\partial \lambda_{j,c}} = \frac{1}{MN} \sum_{k=1}^{K} \frac{V(p_k, \omega_j)\, T_c(p_k)}{p(\omega_i)} \quad (10)$$

where $K$ is the number of samples for pixel $I_i$ that sample light source $\lambda_j$, $p_k$ is the final point in path $k$ before sampling the light source, and $T_c$ is the throughput value for color channel $c$.

In addition to the partial derivatives corresponding to the image formation, we also need to compute the partial derivatives for our activation cost. Following Eq. (8), we compute the partial derivative of the activation cost $\Phi_i$ with respect to the light $\lambda_i$ for each color channel $c$ as:

$$\frac{\partial \Phi_i}{\partial \lambda_{i,c}} = \frac{\alpha \beta}{1 + \beta \sum_{j=1}^{n} \lambda_{i,j}} \quad (11)$$
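These two derivatives reduce to a scatter-add of path throughputs and a closed-form penalty gradient, sketched below; the array shapes are illustrative assumptions rather than our implementation's actual layout.

```python
import numpy as np

def accumulate_jacobian(J, pixel, light, throughput_rgb, pdf, M, N):
    """Eq. (10): add one sample's contribution to dI/d(lambda).

    J:              (num_pixels, num_lights, 3) Jacobian, initialized to zero.
    throughput_rgb: per-channel throughput T_c(p_k) of the finished path.
    pdf:            PDF value p(omega_i) of the sensor ray direction.
    """
    J[pixel, light] += throughput_rgb / (M * N * pdf)

def activation_gradient(lam, alpha, beta):
    """Eq. (11): d(Phi_i)/d(lambda_{i,c}) for every light, every channel."""
    return alpha * beta / (1.0 + beta * lam.sum(axis=1, keepdims=True))
```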
C. Gradient Descent

Having estimated our Jacobian, we can now perform the light location and intensity optimization. As our Jacobian will typically be extremely large, dense, and exhibit no anticipated structure we can leverage, we employ gradient descent with backtracking, as inverting or decomposing the Jacobian would be computationally prohibitive. To account for the inherent constraint that light parameters cannot be negative, we apply gradient projection after each update [20]. We continue this process until the optimization has converged, as indicated by the Wolfe conditions on gradient magnitude [17].
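A generic sketch of this projected gradient descent with backtracking follows; the Armijo-style sufficient-decrease test and the constants are standard textbook choices, not the paper's exact settings.

```python
import numpy as np

def projected_gradient_descent(E, grad_E, lam0, step0=1.0, shrink=0.5,
                               c1=1e-4, tol=1e-6, max_iters=100):
    """Minimize E(lam) subject to lam >= 0 via gradient descent with
    backtracking line search and gradient projection [20]."""
    lam = lam0.copy()
    for _ in range(max_iters):
        g = grad_E(lam)
        if np.linalg.norm(g) < tol:   # converged: gradient magnitude small
            break
        step, E0 = step0, E(lam)
        # backtrack until a sufficient decrease is achieved
        while E(np.maximum(lam - step * g, 0.0)) > E0 - c1 * step * (g * g).sum():
            step *= shrink
            if step < 1e-12:
                break
        # take the step, then project back onto the non-negative orthant
        lam = np.maximum(lam - step * g, 0.0)
    return lam
```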
D. Sequential Monte Carlo
As described in Section V-B, we leverage importance sampling when tracing the scene, so that our estimated derivatives will exhibit a lower amount of variance while using a smaller number of samples. However, when we first construct our light optimization problem, we initialize our lighting parameters uniformly. Consequently, all lights will initially have the same probability of being sampled and we will not receive any benefit from importance sampling. If we were to formulate the optimization problem with a higher environment map resolution, or use a smaller number of samples per pixel, the variance in the estimated derivatives would increase, resulting in a poorer estimate of the lighting parameters.

Yet once gradient descent converges and yields an updated set of lighting parameters, we can repeat the process of estimating the derivatives. We would no longer be sampling lights uniformly, and we would now benefit greatly from importance sampling. This technique is referred to as particle filtering or sequential Monte Carlo (sMC) [15]. Our algorithm continues to re-estimate the Jacobian, retracing the scene and sampling lights according to our latest estimate of lighting parameters, and performs gradient descent until we no longer observe any significant change in the lighting parameters.
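The overall sMC loop can be summarized as below, where initialize_near_zero, trace_jacobian, and gradient_descent are hypothetical stand-ins for the components described in Sections V-A through V-C.

```python
def estimate_lights(scene, reference, num_rounds=10, eps=1e-3):
    """Alternate tracing (importance-sampled by the current light estimate)
    with gradient descent until the estimate stabilizes (a sketch)."""
    lam = initialize_near_zero(scene)       # uniform, almost-dark start
    for _ in range(num_rounds):
        # re-trace the scene, sampling lights by the latest estimate
        J = trace_jacobian(scene, lam)
        lam_new = gradient_descent(J, reference, lam)
        if abs(lam_new - lam).max() < eps:  # no significant change: stop
            return lam_new
        lam = lam_new
    return lam
```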
VI. EXPERIMENTAL RESULTS
We verified our algorithm on several distinct real and synthetic datasets. All lighting optimizations were performed using an environment map resolution of between 9 and 21 light rings and a single reference image with a resolution of ×. We first reconstructed scene geometries for two real scenes. Albedos were manually associated with each vertex in the resulting mesh. These scene models were then used with both real and synthetic reference images.

The two experiments using real reference images were captured using an Xtion Pro Live while performing scene reconstruction. These scenes were illuminated with a single white LED 1,100 lumen light bulb placed within 3 meters of the observed scene. Results for these two experiments can be seen in rows 1 and 2 of Figure 6.

We performed three additional experiments with synthetically generated reference images. These images were rendered using the same mesh and albedos used during optimization. However, they were illuminated using one or more spherical area lights. This was an intentional decision so as not to use the same lighting model we are trying to simulate. Results for these three experiments can be seen in rows 3-5 of Figure 6. Note that the photometric error depicted in the rightmost column has been scaled up by a factor of 1.5 for the sake of visualization.

Derivative computation was implemented on a GPU, while the optimization takes place on a CPU. Each optimization took between 6 and 10 iterations of sequential Monte Carlo and gradient descent to converge, which lasted approximately 10 minutes. Once the optimization had converged, rendering the reference image took approximately 30 seconds.
VII. DISCUSSION

The results from our two experiments using real-world reference images clearly show that the largest photometric errors often occur at depth discontinuities, due to an imperfect reconstruction of the scene's geometry. This suggests that we may improve the robustness of our algorithm by ignoring pixels near depth discontinuities, as performed in [23].

We note that Figure 6 shows several departures from the generative model that are present in the image results, which can be explained through the natural behavior of our algorithm. The photometric error pane illustrates that the curtain behind the scene in the first row experiences greater photometric error as it recedes from the imaging plane. This is expected, as the environment map's imbuing of light in the scene does not depend on in-scene depth values (the light source is modeled at infinite distance from the scene). Also, the latter three rows in Figure 6 demonstrate that the parametrization of light using an environment map does not cause a significant enough departure from an area light to make the scene appear unrealistic. That said, there are complicated patterns in the photometric error that result from this parametrization which do not admit a simple resolution.

Finally and most importantly, our results demonstrate that our method may be leveraged both to estimate light position and to realistically render scenes with complicated optical phenomena, including diffusing and interacting shadows and light sources. This approach is therefore a feasible option for both augmented reality applications as well as in-scene parameter estimation.

VIII. CONCLUSION
We have presented a new algorithm for light source estimation in scenes reconstructed using an RGB-D camera. Although we model light using an environment map, we have shown that our algorithm can still accurately estimate lighting conditions created by light sources located near the observed scene. Our major contribution is developing a new technique that leverages the full expressive power of the light transport equation to perform lighting optimization. The presented optimization problem can potentially be reformulated to estimate any term of the light transport equation, namely surface albedos, bidirectional reflectance distribution functions, and scene geometry, providing a wealth of directions for future research.
REFERENCES

[1] Autonomous Robotics and Perception Group: Calibu camera calibration library (2016), http://github.com/arpg/calibu
[2] Boom, B., Orts-Escolano, S., Ning, X., McDonagh, S., Sandilands, P., Fisher, R.B.: Point light source estimation based on scenes recorded by a RGB-D camera. In: British Machine Vision Conference, BMVC, Bristol, UK (2013)
[3] Chen, Q., Koltun, V.: A simple model for intrinsic image decomposition with depth cues. In: International Conference on Computer Vision. pp. 241-248. IEEE (2013)
[4] Corke, P., Paul, R., Churchill, W., Newman, P.: Dealing with shadows: Capturing intrinsic scene appearance for image-based outdoor localisation. In: Proc. of the International Conference on Intelligent Robots and Systems (IROS) (November 2013)
[5] Duchêne, S., Riant, C., Chaurasia, G., Moreno, J.L., Laffont, P.Y., Popov, S., Bousseau, A., Drettakis, G.: Multiview intrinsic images of outdoors scenes with an application to relighting. ACM Transactions on Graphics pp. 1-16 (2015)
Fig. 6: Left: captured reference image, Middle: render of estimated lighting conditions,