Differentiable Refraction-Tracing for Mesh Reconstruction of Transparent Objects
JIAHUI LYU∗, Shenzhen University
BOJIAN WU∗, Alibaba Group
DANI LISCHINSKI, The Hebrew University of Jerusalem
DANIEL COHEN-OR, Shenzhen University and Tel Aviv University
HUI HUANG†, Shenzhen University
Fig. 1. Reconstructing a transparent Hand object. The five images, from left to right, show a sequence of ray-traced models, progressively optimized by our method. The ground-truth geometry, obtained by painting and scanning the object, and a real photograph of the original object are shown on the right.
Capturing the 3D geometry of transparent objects is a challenging task, ill-suited for general-purpose scanning and reconstruction techniques, since these cannot handle specular light transport phenomena. Existing state-of-the-art methods, designed specifically for this task, either involve a complex setup to reconstruct complete refractive ray paths, or leverage a data-driven approach based on synthetic training data. In either case, the reconstructed 3D models suffer from over-smoothing and loss of fine detail. This paper introduces a novel, high-precision 3D acquisition and reconstruction method for solid transparent objects. Using a static background with a coded pattern, we establish a mapping between the camera view rays and locations on the background. Differentiable tracing of refractive ray paths is then used to directly optimize a 3D mesh approximation of the object, while simultaneously ensuring silhouette consistency and smoothness. Extensive experiments and comparisons demonstrate the superior accuracy of our method.

CCS Concepts: • Computing methodologies → Computer graphics; Shape modeling; Mesh geometry models.

∗ Equal contribution
† Corresponding author: Hui Huang ([email protected])

Authors' addresses: Jiahui Lyu, Shenzhen University; Bojian Wu, Alibaba Group; Dani Lischinski, The Hebrew University of Jerusalem; Daniel Cohen-Or, Shenzhen University and Tel Aviv University; Hui Huang, College of Computer Science & Software Engineering, Shenzhen University.

© 2020 Association for Computing Machinery. https://doi.org/10.1145/3414685.3417815
Additional Key Words and Phrases: 3D reconstruction, transparent objects, differentiable rendering
ACM Reference Format:
Jiahui Lyu, Bojian Wu, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. 2020. Differentiable Refraction-Tracing for Mesh Reconstruction of Transparent Objects. ACM Trans. Graph. 39, 6, Article 195 (December 2020), 13 pages. https://doi.org/10.1145/3414685.3417815
1 INTRODUCTION

Acquiring the 3D geometry of real-world objects has been one of the longstanding problems in the fields of computer graphics and computer vision. Most existing 3D acquisition approaches, such as laser scanning and multi-view reconstruction, are based on the assumption that the object is opaque and its surface is approximately Lambertian. Thus, such approaches are not applicable to objects made from a transparent, refractive material, due to the complex manner in which they interact with light.

Several methods have recently been proposed for non-intrusive acquisition of such objects [Li et al. 2020; Wu et al. 2018]. The method of Wu et al. [2018] is based on capturing correspondences between camera rays and the rays incident on the object from behind. This necessitates a complex setup with a rotating background monitor, and yields a point cloud from which the final model is consolidated. Li et al. [2020] employ a data-driven approach, which leverages a large number of synthetic images as its training set, and requires capturing the environment map. Both of these methods are limited in their ability to recover fine geometric detail, resulting in overly smooth reconstruction results, as we demonstrate in Section 5.
In this work, we propose a novel, non-intrusive method for reconstructing a detailed 3D mesh of solid transparent objects. In contrast to Wu et al. [2018], our approach is based on optimizing correspondences between camera rays and locations on a static background monitor, thereby cutting the acquisition time by half and avoiding additional cumulative errors. More importantly, the proposed method optimizes the reconstructed mesh directly, and is able to capture the fine geometric details of the object's surface, as may be seen in Fig. 1. Furthermore, our approach leverages automatic differentiation, which can be readily integrated with popular deep learning frameworks and benefit from GPU-accelerated optimization.

Starting from a rough initial mesh, obtained from the visual hull of the object, our method progressively refines the mesh in a coarse-to-fine fashion, as shown in Fig. 1. Specifically, via differentiable tracing of refracted ray paths, our method optimizes an objective function that consists of three losses:

(1) Refraction loss, which minimizes the distance between the observed background refractions and the simulated ones;
(2) Silhouette loss, which ensures that the boundary of the optimized mesh matches the captured silhouettes;
(3) Smoothness loss, ensuring smoothness of the optimized mesh.

Our approach only makes use of refractive ray paths through the object that feature exactly two refractions: once upon entering, and once upon exiting the transparent object. Thus, our optimization ignores some of the additional ray paths that may be observed during acquisition, i.e., those involving more than two intersections with the object's surface and/or total internal reflection. The effect of these assumptions is discussed in Section 5.7.

In the remainder of this paper, we first briefly survey related previous work. Following an overview of our approach (Section 3), we describe the terms of our objective function in more detail (Section 4). Results and comparisons with previous work [Li et al. 2020; Wu et al. 2018] are presented in Section 5. Section 6 concludes the paper and suggests directions for future work.
2 RELATED WORK

2.1 Environment Matting

Matting is a process concerned with extracting from an image a scalar foreground opacity map, commonly referred to as the alpha channel [Levin et al. 2008; Porter and Duff 1984]. Environment matting is an extension of alpha matting that also captures how a transparent foreground object distorts its background, so it may be composited over a new one. The pioneering work of Zongker et al. [1999] extracts the environment matte from a series of projected horizontal and vertical stripe patterns, under the assumption that each pixel is only related to a rectangular background region. To improve environment matting accuracy and to better approximate real-world scenarios, Chuang et al. [2000] propose to locate multiple contributing sources from the surrounding environment. Other works present solutions in domains other than the image, such as the wavelet domain [Peers and Dutré 2003], or the frequency domain [Qian et al. 2015]. Our approach can be viewed as an extension of environment matting to the task of transparent object reconstruction, in the sense that it progressively optimizes the reconstructed shape of the object so as to better match a collection of environment mattes captured from multiple views.
2.2 Transparent Object Reconstruction

Reconstructing the surface geometry of transparent objects is a longstanding, challenging problem [Ihrke et al. 2010]. Some methods use destructive or intrusive techniques [Aberman et al. 2017; Hullin et al. 2008; Trifonov et al. 2006] to obtain a detailed surface geometry. Non-intrusive methods use the refractive properties of transparent objects to recover their shape by analyzing the distortions of reference background images [Ben-Ezra and Nayar 2003; Tanaka et al. 2016; Wetzstein et al. 2011]. Recovering the object shape from the optical distortion it induces is typically applicable only to a single refractive surface or a parametric model, since light transport resulting from multiple reflections and refractions is much more difficult to analyze. In addition to refractions, it is also possible to capture the reflective components of light transport, and estimate the shape geometry by observing exterior specular highlights [Morris and Kutulakos 2007; Yeung et al. 2011]. Since reflections occur on the outermost surface, this makes it possible to reconstruct objects with complex geometries and inhomogeneous internal materials. However, the acquisition process is quite involved, and considerable manual effort is needed to control the lighting conditions precisely enough to obtain reasonable results.

More recently, several researchers have tackled the task of transparent object reconstruction by incorporating deep learning techniques. Stets et al. [2019] and Sajjan et al. [2020] propose to use encoder-decoder architectures for estimating the segmentation mask, depth map, and surface normals from a single input image of a transparent object. Li et al. [2020] present a different approach, where a rendering layer is embedded in the network to account for complex light transport behaviors. They achieve state-of-the-art reconstruction results using multi-view images. Due to the difficulty of obtaining a sufficient amount of real training data, these data-driven methods rely on synthetic training images. Although these images are generated using high-fidelity photorealistic rendering, a domain gap between real-world and synthetic images still exists. Specifically, networks trained on synthetic images have difficulty generalizing to real input images, and thus they are prone to reconstruction errors, as will be demonstrated in Section 5. In contrast, we use a controlled acquisition setup to capture refractive light paths and perform direct per-object shape optimization, which does not require a training set consisting of similar shapes.
2.3 Light Path Triangulation

Light path triangulation is an extension of classical stereo triangulation, which uses the relationship between the direction of refraction and the surface normal to infer geometry from light transport. Kutulakos and Steger [2008] provide a theoretical analysis of reconstruction feasibility based on the number of specular reflections and refractions along the ray paths. The simplest case involves a single refraction [Shan et al. 2012], such as when reconstructing a water surface [Morris and Kutulakos 2011; Qian et al. 2017; Zhang et al. 2014]. Tsai et al. [2015] reveal a depth-normal ambiguity
that arises when the light rays are assumed to refract twice. To eliminate the ambiguity, Qian et al. [2017] propose a position-normal-consistency-based optimization framework to recover front and back surface depth maps. Wu et al. [2018] extend this approach and present the first non-intrusive method to reconstruct the full shape of a general transparent object; however, due to their separate optimization and multi-view fusion of recovered point clouds, the results are always over-smoothed. Following Wu et al. [2018], we also start our optimization from the object's visual hull; however, rather than fusing point clouds, we directly optimize the surface mesh by leveraging differentiable rendering techniques. This approach enables us to recover fine-grained geometric detail, as demonstrated by the comparisons in Section 5.
2.4 Differentiable Rendering

Multiple works utilize differentiable rendering for image-based 3D mesh reconstruction, such as the Neural 3D Mesh Renderer [Kato et al. 2018] and Soft Rasterizer [Liu et al. 2019]. These methods typically assume a simplified image formation model and are limited to Lambertian scenes. Li et al. [2018] introduce a general-purpose differentiable ray tracer that is able to compute derivatives of scalar functions over a rendered image with respect to arbitrary parameters. For transparent objects, Nimier-David et al. [2019] propose Mitsuba 2, a retargetable differentiable renderer, which can be applied, for example, to computational caustics design [Papas et al. 2011]. Differently from these methods, we do not employ an irradiance-based loss function that measures the discrepancy between the pixel values of the rendered image and the ground truth. Rather, our refraction loss is based directly on ray-pixel correspondences, which reflect the geometry of the underlying light transport. The geometry of light paths is directly determined by the shape geometry, which is what we seek to recover, whereas the final RGB pixel colors are influenced by additional factors, such as the BRDF.
3 OVERVIEW

Our approach to transparent object reconstruction may be viewed as reconstruction-by-synthesis. We acquire multiple views of the target transparent object, with a Gray-coded background pattern refracted through the object in each view, and our goal is to reconstruct a virtual 3D model of the object that would refract the background in a manner consistent with the captured observations. Our capturing setup is shown in Fig. 2, and is further described in Section 5.1.

The reconstruction is carried out by a coarse-to-fine optimization process, visualized in Fig. 3. The optimization starts from a rough initial mesh, which is progressively remeshed and deformed to better match the observed background distortion and the silhouette of the object in the captured views. The coarse stages of the optimization introduce large deformations of the optimized 3D mesh, while later stages introduce and refine finer geometric surface detail.

The rough initial shape is obtained from the visual hull defined by the multi-view silhouettes. The subsequent optimization is mainly driven by differentiable tracing of refractive ray paths. Specifically, our goal is to deform the virtual shape such that the ray paths
refracted by it twice (upon entering and upon exiting) reach the same points on the background pattern that are visible in the captured views, as shown in Fig. 4. In other words, we know the background point $Q$ that corresponds to each pixel $q$ in each captured image, and use differentiable ray tracing to optimize the shape such that the corresponding ray path indeed reaches $Q$. Formally, this goal is achieved by minimizing the refraction loss; see Section 4.1.

In addition to the refraction loss, our optimization also makes use of a silhouette loss, which ensures that the boundary of the optimized mesh matches the silhouettes, as seen from the different views. Finally, a smoothness loss term is used to ensure the smoothness of the resulting reconstructed mesh.

Fig. 2. Our transparent object capture setup. The object to be captured is placed on the turntable, which is rotated during acquisition to provide the static camera with multiple views of the object. A static LCD monitor is placed behind the object, displaying horizontal and vertical stripe patterns that form a Gray-coded background. The background is used for extracting the object's silhouette and estimating the environment matte for each camera view.

4 SHAPE OPTIMIZATION

As previously mentioned, our method starts by reconstructing an initial rough shape of the object from a collection of multi-view silhouettes using space carving [Kutulakos and Seitz 2000]. Subsequently, the rough model is progressively optimized, based on the correspondences between view rays and background locations, extracted by environment matting [Zongker et al. 1999], while maintaining the boundary constraints provided by the silhouettes. In order to recover geometric details at progressively finer scales, we alternate between remeshing the shape and optimizing the loss function in a coarse-to-fine fashion; see Fig. 3 for an example. At each stage, the reconstructed shape is gradually updated by minimizing a combination of three terms: the refraction loss, the silhouette loss, and the smoothness loss:

$\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{refract}} + \beta\,\mathcal{L}_{\mathrm{silhouette}} + \gamma\,\mathcal{L}_{\mathrm{smooth}}$   (1)

where $\alpha$, $\beta$, and $\gamma$ are balancing coefficients for the loss terms. By default, we set $\alpha = 10/HW$ and $\gamma = 10/\mathrm{edgelen}$, with $\beta$ inversely proportional to $\min(H, W)$, where $H$ and $W$ represent the height and width of the camera imaging plane, and edgelen denotes the average edge length in the currently optimized mesh. In the following subsections, we describe each of our loss terms in more detail.
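For concreteness, here is a minimal PyTorch-style sketch of how the weighted objective of Eq. (1) might be assembled. The function and argument names are ours, not the authors'; the exact constant in $\beta$ is stated above only up to its scale, so the value below is an assumption:

```python
import torch

# A minimal sketch of the weighted objective of Eq. (1); names are ours.
# l_refract, l_silhouette, l_smooth are scalar tensors produced by the loss
# terms of Sections 4.1-4.3 and must remain attached to the autograd graph.
def total_loss(l_refract: torch.Tensor,
               l_silhouette: torch.Tensor,
               l_smooth: torch.Tensor,
               H: int, W: int, avg_edge_len: float) -> torch.Tensor:
    alpha = 10.0 / (H * W)        # paper default
    beta = 1.0 / min(H, W)        # assumed scale; the paper's exact constant may differ
    gamma = 10.0 / avg_edge_len   # paper default
    return alpha * l_refract + beta * l_silhouette + gamma * l_smooth
```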
Fig. 3. Coarse-to-fine reconstruction of a real mouse statue. Top: starting with the visual hull obtained by space carving, our method gradually recovers details ranging from large geometric displacements, such as the neck and tummy, to fine-level details, such as the eyes. Middle: we alternately remesh and reconstruct geometric detail at progressively finer scales. Bottom: the error map is visualized using the shortest distance between each vertex of the reconstruction and the ground-truth mesh. The number below each error map is the average of the per-vertex distances in millimeters.

Fig. 4. Refraction loss. The simulated refractive ray path (in red) through image pixel $q$ should reach the observed background point $Q$, which corresponds to the intersection of the real ray path (in blue) with the background monitor. The pink mesh is the optimized virtual shape, initialized to the visual hull. The top-left and top-right insets show the associated triangles and vertices of a single simulated ray-pixel correspondence.

4.1 Refraction Loss

As shown in Fig. 4, given the current virtual shape mesh, we first trace rays from the camera as they intersect and refract through the shape, and then optimize the positions of the associated mesh vertices according to the captured correspondences between view rays and background locations (e.g., the ray $cq$ and point $Q$ in Fig. 4). The optimization only makes use of rays that refract exactly twice before reaching the background: first when the camera ray enters the virtual shape (at $p_1'$), and once more when it exits the shape (at $p_2'$). The full simulated light path, shown in red in Fig. 4, intersects the background at $Q'$. Since $Q'$ is generally different from the destination of the actual optical path through the real object, which is shown in blue and terminates at $Q$, our goal is to minimize $\Delta = \lVert Q - Q' \rVert$.

Note that while our approach only utilizes ray paths with two refractions, this limitation is actually less restrictive than it may seem. In particular, it does not limit our approach to convex objects, as demonstrated by all of our examples in this paper. The reason is that each reconstructed object is captured from multiple view directions, and as the number of views increases, most mesh triangles end up having two-refraction ray paths passing through them. The effect of considering only two-refraction paths is further discussed and demonstrated in Section 5.7.

In comparison, Wu et al. [2018] also optimize the recovered shape based on captured environment mattes; however, their approach requires moving the background for each view to reconstruct the 3D rays exiting the object towards the background, and yields point clouds. In contrast, we rely on a single environment matte per view, extracted using a static background, and leverage differentiable ray tracing to directly optimize the reconstructed 3D mesh.

As mentioned above, our goal is to minimize the gap between the simulated background position ($Q'$) and its corresponding captured ground truth ($Q$), in order to make the simulated and real light paths coincide. Specifically, since each simulated ray path is associated with the two triangles that it intersects, the vertices of these triangles are optimized to achieve this goal.
The whole process is accomplished by iterative ray tracing and optimization, using the loss function defined below.
Fig. 5. Visualization of the refraction loss, as defined in Eq. (2), for valid rays involving two refractions. The visualized loss values are normalized by the resolution of the background monitor. As the refraction loss decreases during optimization, the final reconstructed mesh closely approximates the scanned ground truth, and correctly reproduces light transport, as shown in Fig. 1.
Fig. 6. Silhouette loss and its negative gradient direction. For each captured camera view, the ground-truth mask of the object encodes pixels inside, outside, and on the boundary of the object (white, black, and grey, respectively). The projected silhouette edges of the reconstructed mesh are shown in yellow. The gradient is set to zero for edges whose center point falls on the mask boundary, and is non-zero for edges inside or outside the mask.

The refraction loss is defined as follows:

$\mathcal{L}_{\mathrm{refract}} = \sum_{u=1}^{U} \sum_{i \in I} \lVert Q_i - Q_i' \rVert$   (2)

where $U$ is the number of captured views and the set $I$ only contains ray paths that pass through the object and involve exactly two refractions. Let $v_k^1$ and $v_k^2$ ($k = 0, 1, 2$) represent the vertices of the triangles that contain $p_1'$ and $p_2'$, respectively, as shown in Fig. 4. Since the normal of each triangle is uniquely determined by its vertices, and the refractions are governed by Snell's law [Born and Wolf 2013], the final exiting ray $p_2' Q'$ is fully parameterized by $v_k^1$ and $v_k^2$. Thus, $Q'$, obtained by intersecting this ray with the background monitor, is also a function of the associated vertices. Since all of the operations that yield $Q'$ are differentiable, the gradient of Eq. (2) is easily calculated using the chain rule. The effect of the refraction loss is visualized in Fig. 5.
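Since the entire chain from the vertices to $Q'$ must stay differentiable, each refraction event is evaluated with a differentiable form of Snell's law. A minimal PyTorch sketch under our own naming (not the authors' code):

```python
import torch

def refract(d: torch.Tensor, n: torch.Tensor, eta: float) -> torch.Tensor:
    """Snell's law at an interface, differentiable w.r.t. the incident
    direction d and the surface normal n (unit vectors of shape (..., 3),
    with n opposing d). eta is the refractive-index ratio n1/n2 across the
    interface. Rays undergoing total internal reflection produce NaNs, which
    the caller is expected to mask out (such paths are discarded anyway)."""
    cos_i = -(d * n).sum(dim=-1, keepdim=True)    # cosine of the incidence angle
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)    # squared sine of transmitted angle
    cos_t = torch.sqrt(1.0 - sin2_t)              # NaN where sin2_t > 1 (TIR)
    return eta * d + (eta * cos_i - cos_t) * n    # transmitted unit direction
```

Because the normal $n$ is computed from the triangle vertices, gradients of Eq. (2) flow from $Q'$ through the two refraction events back to $v_k^1$ and $v_k^2$.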
4.2 Silhouette Loss

The silhouettes of the object in the captured views offer an important set of constraints for the boundary of the reconstructed virtual shape. Although the initial visual hull, obtained by space carving, perfectly matches the captured silhouettes, optimizing using the refraction loss alone might cause the silhouettes of the virtual shape to deviate from the ground-truth ones, as demonstrated in Section 5.6. The reason is that there may exist multiple refractive surfaces that satisfy the observed refraction of the background pattern. Thus, we introduce $\mathcal{L}_{\mathrm{silhouette}}$, a novel loss term, to ensure that the silhouettes of the reconstructed shape are consistent with the captured silhouettes.

Specifically, the silhouette loss is defined using the ground-truth projection mask of the object, as captured for each view, and the projection of the silhouette edges of the virtual shape, as seen from the same view (see Fig. 6). Let $b$ denote the projection of a virtual silhouette edge onto the image plane, and $s_b$ denote the midpoint of that edge, which serves as a representative of the edge for the computation of the silhouette loss. The loss is defined as the number of edge midpoints strictly outside or strictly inside the ground-truth mask:

$\mathcal{L}_{\mathrm{silhouette}} = \sum_{u=1}^{U} \sum_{b \in B} \lvert \chi(s_b) \rvert$   (3)

Here, $\chi(s)$ is an indicator function whose value is $1$ or $-1$ if $s$ is inside or outside of the ground-truth mask, respectively, and $\chi(s) = 0$ if $s$ lies on the mask boundary. $B$ is the collection of the virtual silhouette edges, obtained by projecting their 3D counterparts onto the image plane of each view. Since $\chi(s)$ is a discrete function, we define its gradient manually. Specifically, the shape of the virtual mesh is used to determine the direction of gradient descent (the negative gradient). Let $N_b$ denote the normal of edge $b$, which points to the outside of the virtual boundary. As shown in Fig. 6, if $s_b$ is inside the ground-truth mask, $b$ should move along $N_b$ in order to align with the ground truth. Conversely, if $s_b$ is outside the mask, it should move along $-N_b$. Finally, if $s_b$ is on the mask boundary, its gradient is set to zero. In summary, the negative gradient is defined as follows:

$-\nabla \lvert \chi(s_b) \rvert := \chi(s_b)\,\lVert b \rVert\, N_b$   (4)

where $\lVert b \rVert$ denotes the length of $b$. Let $b$ be the image projection of the 3D mesh edge between the vertices $v_b^1$ and $v_b^2$. Then the projected edge midpoint $s_b$ is given by:

$s_b = P_u\,\dfrac{v_b^1 + v_b^2}{2}$   (5)

where $P_u$ is the projection matrix of view $u$. Thus, the gradient of Eq. (3) with respect to the 3D mesh vertices can be calculated by combining Eq. (4) and the gradient of Eq. (5) via the chain rule.
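Because $\chi$ is discrete, the hand-defined gradient of Eq. (4) can be attached via a custom autograd function. A minimal sketch (all names are ours):

```python
import torch

class SilhouetteLoss(torch.autograd.Function):
    """Eq. (3) forward with the manually defined gradient of Eq. (4)."""

    @staticmethod
    def forward(ctx, s, chi_vals, edge_len, edge_normal):
        # s:           (B, 2) projected edge midpoints (requires grad)
        # chi_vals:    (B,) float tensor in {-1, 0, +1}
        # edge_len:    (B,) lengths ||b|| of the projected edges
        # edge_normal: (B, 2) outward edge normals N_b in the image plane
        ctx.save_for_backward(chi_vals, edge_len, edge_normal)
        return chi_vals.abs().sum()          # count of misplaced midpoints

    @staticmethod
    def backward(ctx, grad_out):
        chi_vals, edge_len, edge_normal = ctx.saved_tensors
        # Eq. (4): the *negative* gradient is chi * ||b|| * N_b, so gradient
        # descent pushes inside midpoints outward and outside midpoints inward.
        grad_s = -grad_out * (chi_vals * edge_len).unsqueeze(-1) * edge_normal
        return grad_s, None, None, None
```

Applying this function to the projected midpoints of Eq. (5), e.g. `SilhouetteLoss.apply(s_b, chi, lengths, normals)`, lets autograd carry the hand-defined gradient back through $P_u$ to the 3D vertices.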
Fig. 7. Coarse-to-fine reconstruction of the synthetic Bunny (left) and Kitten (right) models, rendered with a refractive index of 1.5. The top row shows the progression from an initial shape obtained by space carving to a more detailed reconstruction. The bottom row visualizes the per-vertex reconstruction error, measured by the minimal distance between each vertex in the top-row models and the ground truth. The average distance, shown below each visualization, is decreasing, as expected. Each of the two models is scaled such that the longest dimension of its bounding box is 1.0.

4.3 Smoothness Loss

Finally, we incorporate an additional loss term to encourage the reconstructed mesh to be smooth during the optimization. This term measures the discrepancy between the normal vectors of neighboring triangles:

$\mathcal{L}_{\mathrm{smooth}} = \sum_{e \in E} \left( -\log\left(1 + \langle N_e^1, N_e^2 \rangle\right) \right)$   (6)

where $E$ is the set of all edges, and $\langle N_e^1, N_e^2 \rangle$ is the dot product of the normals of the two adjacent triangles that share the common edge $e$.

4.4 Coarse-to-Fine Optimization

Once the virtual shape obtained from space carving has been optimized using Eq. (1), a new surface mesh is generated, which serves as the initial shape for the next optimization stage. Before each stage, we remesh the surface [Pietroni et al. 2010] in a coarse-to-fine fashion in order to recover progressively finer details. For remeshing, the target edge length $t_l$ is sampled from the inverse distance space, as defined below:

$t_l = \dfrac{L \cdot t_{\min}}{l}, \quad l = 1, 2, \ldots, L$   (7)

where $t_{\min}$ is set to a small fraction of diaglen, the length of the object's bounding-box diagonal, and $L$ is the total number of optimization stages, which we set to 10 in our experiments. The maximum allowed distance between the surfaces before and after remeshing is likewise set to a small fraction of diaglen. During each optimization stage, for a single ray-pixel correspondence, we trace the camera ray and first locate the intersected triangles, whose associated vertices will be refined in this round of optimization. As the virtual shape becomes more accurate, it can more precisely filter out ray paths that involve more than two refractions, or total internal reflection, and provide a better initialization for subsequent optimization. The coarse-to-fine reconstruction of two models is visualized in Fig. 7.
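A short sketch of the smoothness term of Eq. (6), together with the remeshing schedule, follows. Mesh-connectivity precomputation is omitted, and the schedule assumes the inverse-distance form $t_l = L\,t_{\min}/l$ given above; all names are ours:

```python
import torch

def smoothness_loss(vertices: torch.Tensor, faces: torch.Tensor,
                    edge_faces: torch.Tensor) -> torch.Tensor:
    """Eq. (6): sum over edges of -log(1 + <N_e1, N_e2>).
    vertices: (V, 3); faces: (F, 3) vertex indices; edge_faces: (E, 2)
    indices of the two triangles adjacent to each interior edge."""
    v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
    n = torch.linalg.cross(v1 - v0, v2 - v0)
    n = n / n.norm(dim=-1, keepdim=True)                       # unit face normals
    dots = (n[edge_faces[:, 0]] * n[edge_faces[:, 1]]).sum(-1)
    return (-torch.log(1.0 + dots.clamp(min=-1.0 + 1e-6))).sum()

def edge_length_schedule(t_min: float, n_stages: int = 10) -> list:
    # Eq. (7): target edge lengths sampled in inverse-distance space,
    # from coarse (l = 1) down to fine (l = n_stages).
    return [n_stages * t_min / l for l in range(1, n_stages + 1)]
```

Note that the $-\log(1 + \cdot)$ form penalizes folds (normals approaching opposite directions) much more strongly than mild bending, which is what keeps the remeshed surface well behaved between stages.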
5 RESULTS

5.1 Acquisition Setup

We begin by briefly describing our acquisition setup. As shown in Fig. 2, the transparent object to be captured is positioned on a turntable, between an LCD monitor that displays the coded background pattern and a static camera placed in front of the object. The intrinsic and extrinsic parameters of the camera, and the relative positions of the monitor and turntable with respect to the camera, are calibrated [Zhang 2000] before the acquisition commences. To capture an object, the turntable is rotated to a set of 72 evenly sampled viewing angles. At each viewing angle, a Gray-coded background pattern is displayed on the monitor for simultaneously extracting the silhouette and estimating the ray-pixel correspondences using environment matting. The Gray-coded background is produced by displaying a sequence of 11 images with vertical stripes and 11 images with horizontal stripes (see Fig. 11). Note that, in order to avoid the influence of ambient light, the entire acquisition process is conducted in a dark room, and the background monitor is used as the only light source.
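Each stripe image contributes one bit of the Gray code, so the 11-image sequence per direction resolves the monitor column (or row) behind every camera ray. Decoding can be sketched as follows, assuming the captures have already been binarized (all names are ours):

```python
import numpy as np

def decode_gray(imgs: np.ndarray) -> np.ndarray:
    """imgs: (11, H, W) boolean stack, most significant bit first.
    Returns the integer stripe index (monitor row or column) per pixel."""
    idx = np.zeros(imgs.shape[1:], dtype=np.int64)
    prev = np.zeros(imgs.shape[1:], dtype=bool)
    for bit in imgs:                         # Gray -> binary: running XOR prefix
        prev = np.logical_xor(prev, bit)
        idx = (idx << 1) | prev.astype(np.int64)
    return idx
```

Running this once on the vertical-stripe stack and once on the horizontal-stripe stack yields the 2D background location $Q$ for every pixel $q$, i.e., the ray-pixel correspondences consumed by the refraction loss.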
5.2 Implementation Details

We use PyTorch [Paszke et al. 2017] to implement our approach. During each of the 10 optimization stages, the loss function is evaluated and its gradients are back-propagated 500 times. Each time, we randomly select one camera view for computing the refraction loss, and nine other views (spaced 40° apart from each other, with a random starting view) for the silhouette loss.
Fig. 8. A qualitative comparison using rendering. The three images in each row are rendered from the same camera view, with the result of Wu et al. [2018] on the left, our reconstructed model in the middle, and the original model on the right. Our method captures more of the fine-level geometric detail than that of Wu et al., and the ray-traced images are nearly identical to those rendered from the ground-truth models.
Fig. 9. Reconstruction accuracy (average reconstruction error vs. iterations) for Bunny and Kitten under different refractive indices. Errors are slightly higher for higher refractive indices, but the effect of the optimization is similar.
Fig. 10. Real transparent objects used in our experiments. All of these objects exhibit geometric detail at a variety of scales.

Fig. 11. Four of the captured images of a real Horse object. Each image is taken from a different view, while the background monitor displays one of the horizontal or vertical stripe patterns. Acquiring each view with a full Gray-coded background pattern enables extracting the environment matte and the object silhouette.
To compute the refraction loss, we first use the OptiX ray tracing engine [Parker et al. 2010] to find the triangles intersected by each ray path from the camera to the background. The OptiX engine is used here for efficiency, which becomes important as the number of mesh triangles grows. However, OptiX is not a differentiable ray tracer, and thus, once the intersected triangles are known, the final intersection points ($p_1'$ and $p_2'$) are computed in PyTorch in a fully differentiable manner. Having computed these intersection points, we trace the ray as discussed in Section 4.1, to obtain a differentiable ray path with respect to the mesh vertices. We then perform gradient descent using the Nesterov momentum optimizer with default arguments, with a learning rate that decays over the course of the optimization, scaled relative to diaglen. It takes about 30 minutes to perform the data acquisition for a single object, and about 1 minute to reconstruct the visual hull from the silhouettes using space carving. Starting from the visual hull, the progressive optimization takes about 10 minutes.
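The differentiable half of this two-stage scheme can be sketched as follows: once OptiX reports which triangle a ray hits, the intersection point is recomputed in PyTorch with the Möller-Trumbore formulas, so that gradients reach the vertices (names are ours):

```python
import torch

def intersect(orig: torch.Tensor, dirn: torch.Tensor,
              v0: torch.Tensor, v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
    """Batched differentiable ray-triangle intersection (Moller-Trumbore).
    All inputs have shape (..., 3); rays are assumed to actually hit their
    (OptiX-reported) triangles, so no hit test is performed here."""
    e1, e2 = v1 - v0, v2 - v0
    p = torch.linalg.cross(dirn, e2)
    det = (e1 * p).sum(-1)
    s = orig - v0
    u = (s * p).sum(-1) / det                     # barycentric coordinate u
    q = torch.linalg.cross(s, e1)
    v = (dirn * q).sum(-1) / det                  # barycentric coordinate v
    t = (e2 * q).sum(-1) / det                    # ray parameter
    return orig + t.unsqueeze(-1) * dirn          # equals v0 + u*e1 + v*e2
```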
Fig. 12. Reconstruction of real transparent objects: Horse, Rabbit, and Tiger. For each object, our reconstructed model is rendered with diffuse shading and ray-traced in a virtual environment (two left columns). For qualitative comparison, the corresponding ground-truth models are rendered in the two right columns. Note that our reconstructed models succeed in capturing some of the skin folds and wrinkles on the body of Horse and Tiger, as well as the whiskers and eyes of Rabbit.

5.3 Evaluation on Synthetic Data

We first evaluate our method on two synthetic mesh models: Bunny and Kitten, rendered as solid transparent objects with a refractive index of 1.5. To emulate the acquisition process, we render the models using a path tracer implemented using OptiX [Parker et al. 2010]. For photorealistic appearance, we also simulate total internal reflection, but limit the ray tracing depth to 30. The virtual data capturing setup follows the one described in Section 5.1, with the camera set to a pinhole model, and the resolution of the virtual background monitor set to 1920 × 1080.
Fig. 13. Comparison with Wu et al. [2018] and Li et al. [2020] using real transparent objects: Mouse, Monkey, and Dog (columns, left to right: our ground truth, ours, [Wu et al. 2018], [Li et al. 2020], and Li et al.'s ground truth). For each object, compared with its corresponding ground truth, our reconstructed results better capture geometric details at various scales, while the results of Wu et al. and Li et al. are over-smoothed and many of these details are lost. Note that, due to the lack of silhouette constraints, the arm of the Monkey and the tail of the Mouse in the reconstructions of Li et al. are falsely connected with the body, while our reconstruction of these thin structures is more precise.

Our method yields more accurate reconstructions: when simulating refractive light transport through each of these models, our reconstructions are accurate enough to closely approximate the images obtained with the original models. In contrast, the refractions through the models reconstructed by Wu et al. [2018] are visibly too smooth.

Finally, in Fig. 9 we examine the accuracy of our reconstruction of the same two synthetic examples for different refractive indices. A higher refractive index induces stronger changes in the direction of the refracted light rays, which may cause more of the final exiting rays to miss the background monitor. This leads to fewer valid ray-pixel correspondences and causes a slight decrease in the reconstruction accuracy. The local peaks in the error plots of Fig. 9, which occur at 500-iteration intervals, are caused by remeshing, which temporarily increases the reconstruction error.
5.4 Results on Real Objects

We obtained eight real transparent objects made from borosilicate 3.3 glass (refractive index 1.4723) for our experiments; see Fig. 10. For data acquisition, a DELL LCD monitor with a resolution of 1920 × 1080 is used to display the background patterns, and the turntable is driven by a high-precision stepper motor, rotating 5° at a time. A single PointGrey Flea3 color camera is used to capture the images. We used an aperture of f/6.0, while the shutter time is set to about 50 ms for adequate exposure; see Fig. 11 for an example of the acquired data. In order to quantitatively evaluate the accuracy of the reconstruction, we paint each object with DPT-5 developer and then scan it with a high-end, industrial-level scanner to obtain a ground-truth 3D mesh.

Fig. 14. Comparison with Wu et al. [2018] using a real Hand object (columns: ground truth, ours, [Wu et al. 2018]). Our result succeeds in capturing the fingernails and the creases between fingers, while these details are smoothed over by the method of Wu et al.
Fig. 15. Comparison with Li et al. [2020] using a real Pig object (columns: our ground truth, ours, [Li et al. 2020], Li et al.'s ground truth). Fine details on the body, such as the eye, are smoothed over, and the shapes of the ears and tail are distorted in their result, in contrast to our method.
Fig. 16. Reconstruction of Rabbit with each loss term omitted in turn (from left to right: without the refraction loss, without the silhouette loss, and without the smoothness loss). The full reconstruction result is shown in Fig. 12, where the average reconstruction error is 0.6261 mm. The errors for the above three reconstructions (from left to right) are 0.8420 mm, 2.2337 mm, and 0.7300 mm, respectively.
Fig. 17. Validation of our coarse-to-fine optimization strategy (left: coarse-level optimization; right: fine-level optimization). The left reconstruction is achieved by directly optimizing the initial coarse mesh without further remeshing, while the right reconstruction directly optimizes the final fine-level mesh. The average reconstruction errors are 0.7376 mm and 0.9187 mm, respectively, compared to an average error of 0.6261 mm for the full coarse-to-fine reconstruction.
We compare the reconstructed results against the ground truth after aligning them using ICP, as shown in Figs. 3 and 5. Quantitatively, our reconstructed models improve the approximation provided by the visual hull roughly by a factor of two, and the average distance of our final reconstructed models from the ground truth is on the order of 0.1-0.3 percent of the model's bounding-box diagonal. The average reconstruction errors for all eight real objects are reported in Table 1, where we also compare our accuracy with the state of the art.

Qualitatively, our final reconstructions noticeably improve upon the initial visual hull model, while maintaining the accuracy of the silhouettes, even for thin structures such as the tail in Fig. 3. Additional examples can be found in Fig. 12.
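The reported error metric, i.e., the average distance from each vertex of the ICP-aligned reconstruction to the ground-truth scan, can be reproduced with a simple nearest-neighbor query. A sketch using SciPy (our choice of library; surface-sampling details omitted):

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_vertex_distance(recon_vertices: np.ndarray,
                         gt_vertices: np.ndarray) -> float:
    """Average per-vertex distance from a reconstruction to the ground truth,
    approximated by nearest ground-truth vertices. Both inputs are (N, 3)
    arrays, assumed already aligned by ICP."""
    dists, _ = cKDTree(gt_vertices).query(recon_vertices)
    return float(dists.mean())
```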
Table 1. Reconstruction error, measured by average per-vertex distance, of our results, [Wu et al. 2018], and [Li et al. 2020], each with respect to its corresponding ground truth. As a baseline, we also report the average distance between the initial visual hull and the ground truth. All distances are normalized by the bounding-box diagonals of the corresponding models. Columns: initial, [Wu et al. 2018], [Li et al. 2020], ours; rows: Mouse, Dog, Monkey, Hand, Pig, Rabbit, Horse, Tiger.
The reconstructed models of the Horse, Rabbit, and Tiger objects demonstrate the ability of our method to cope with various complicated geometries, capturing fine-scale geometric details such as skin folds, wrinkles, and whiskers.
5.5 Comparisons with the State of the Art

The relevant state-of-the-art methods are those of Wu et al. [2018] and Li et al. [2020], since both of them reconstruct the full shape of a transparent object in a non-intrusive way, as described in Section 2. To perform the comparison with Wu et al. [2018], we obtained from them their captured data, their reconstructed models, and the corresponding ground-truth scans. Similarly, we obtained the reconstructed and the ground-truth models from Li et al. [2020]. It should be noted that although Li et al. experimented with transparent objects obtained from the same source, due to different manufacturing batches there are slight differences between the shapes, and therefore their results are compared to a different set of ground-truth models.

As demonstrated in Fig. 13, the reconstructions produced by either of these two state-of-the-art methods tend to smooth out the fine geometric details. In contrast, our method is more successful in capturing geometric detail at various scales, such as the larger-scale displacements of the neck and tummy of the Mouse and Monkey objects, as well as the smaller-scale details of the eyes of Monkey and Dog. Furthermore, the silhouettes of thin structures, such as the tail of Mouse or the arm of Monkey, are better preserved. Additional comparisons may be found in Fig. 14 and Fig. 15, where in each case the reconstructions generated by our method are visibly closer to the ground truth.

Fig. 18. Optimization using different numbers of views. In each case, the view directions are evenly sampled over the full 360° range of the turntable. As the number of views increases, the reconstruction results gradually capture more geometric details. The average reconstruction errors are plotted in Fig. 19 (left, solid green line).

Fig. 19. Average reconstruction error (in millimeters) vs. the number of views used by the optimization, for Monkey, Tiger, and Horse. The solid lines represent experiments with real objects, as described in Section 5.4. For comparison, the dotted lines represent experiments with synthetic objects (obtained by scanning the real ones), as described in Section 5.3. In each case, compared with the ground truth, the simulated experiments with more accurate ray-pixel correspondences clearly demonstrate higher quality than our real results. All reconstructions were obtained by optimizing over the full set of 72 views.
5.6 Ablation Study

We first validate the effectiveness of each component of our loss function in Eq. (1), using the Rabbit object. The reconstruction using the full loss function is shown in the rightmost column of Fig. 12, while reconstructions corresponding to the omission of each of the three loss terms are shown in Fig. 16. It may be seen that when the refraction loss $\mathcal{L}_{\mathrm{refract}}$ is omitted, the reconstruction fails to capture the finer-scale geometric details, which are captured by the refraction loss since they affect the refractive light transport through the object. We conclude that the refraction loss plays a key role in the recovery of fine details. On the other hand, omitting the silhouette loss $\mathcal{L}_{\mathrm{silhouette}}$ fails to correctly reproduce the overall shape, which is particularly noticeable in the ears, legs, and tail of the rabbit. Finally, omitting the smoothness loss $\mathcal{L}_{\mathrm{smooth}}$ results in macroscopic roughness artifacts (on the legs and back).

Next, Fig. 17 examines the effect of our coarse-to-fine optimization strategy. Specifically, we show the results when optimizing only the initial coarse-level mesh (left) and only the final fine-level mesh (right), without progressing from one to the other by remeshing every 500 iterations. It is not surprising that optimizing the coarse-level mesh alone cannot recover the finer surface details, due to its sparse sampling of the shape. However, optimizing the fine-level mesh directly results in some displacement artifacts, since the optimization is prone to getting stuck in local minima.

We examine the effect of the number of views on reconstruction accuracy in Figs. 18 and 19. As more views are incorporated into the optimization, the number of valid ray-pixel correspondences that may be used by the optimization increases as well. This increase, in turn, improves the reconstruction accuracy, as reported for three real objects in Fig. 19 (left, plotted in solid lines). Nonetheless, note that the accuracy gains diminish as more and more views are added, and acceptable results may be obtained with 18 or 36 views.

5.7 Discussion

To further examine the effect of the two-refraction assumption, for each of our real examples we measure the percentage of the final mesh triangles that are pierced by at least one two-refraction ray path, when considering all of the captured views. Fig. 20 (left) shows that as the number of views increases, the average percentage of such triangles goes up (from 70.5% to 92.4%). Thus, when using 36 uniformly spaced horizontal rotation angles, more than 90% of the triangles (on average across all eight examples) are being optimized using two-refraction paths.
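This coverage statistic is straightforward to compute from the traced paths; a minimal sketch (names are ours):

```python
import numpy as np

def two_refraction_coverage(hit_faces: np.ndarray, n_faces: int) -> float:
    """Fraction of mesh triangles pierced by at least one valid
    two-refraction ray path. hit_faces: 1-D array of the triangle indices
    intersected by all valid paths, aggregated over all views."""
    return np.unique(hit_faces).size / n_faces
```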
Fig. 20. Left: the percentage of the final mesh triangles that are pierced by at least one two-refraction ray path vs. the number of views, for all eight objects. Top right: visualization of the number of two-refraction paths piercing each triangle under 72 views; the maximum is truncated to 50 for better visualization. Bottom right: the corresponding reconstruction error map. Below each model we report the average reconstruction error and the real size of the bounding box, in millimeters.

Fig. 21. Failure case. Left: a Shell model with a cavity (highlighted by the dotted rectangle) and its rendering. Right: the result of the reconstruction and its rendering, for a qualitative comparison. Because of multiple refractions in the highlighted region, the cavity fails to be reconstructed correctly.
The corresponding increase in the reconstruction accuracy is reported in Fig. 19 (left, solid lines). We also visualize the number of two-refraction paths piercing each triangle under 72 views, and the corresponding reconstruction error map of each object, on the right side of Fig. 20. In general, the number of two-refraction paths tends to be smaller for those triangles whose normals are nearly perpendicular to the camera view rays (for all views). On the other hand, those parts of the model are typically well captured by the silhouettes. Thus, it could be argued that the refraction loss complements the visual hull and the silhouette loss, and the deficiency of two-refraction paths does not result in larger errors. In fact, Fig. 20 shows that the largest errors in the Rabbit and Tiger models occur on the sides, which receive plenty of two-refraction paths. The reason is that the initial visual hull in these areas happens to deviate significantly from the true shape, and the silhouette loss is not able to fix it completely.

We have demonstrated the ability of our method to reconstruct accurate and detailed models of transparent objects; however, several sources of error remain. In order to better understand them, in Fig. 19 we also plot the reconstruction accuracy for the synthetic counterparts of the eight real objects, where the reconstruction is carried out as described in Section 5.3. The results, plotted using dotted lines in Fig. 19, show that the accuracy is significantly improved. Reconstructing the synthetic models effectively eliminates all the real-world factors that could introduce errors, since the rendered images are noise-free and the ray-pixel correspondences are calculated precisely. As shown in Fig. 19, the eye of Monkey and the side of Tiger are reconstructed better than for the real objects. This suggests that one source of error is the uncertainty in the data acquisition, along with other possible errors in the environment matte extraction process. Another limitation on the reconstruction accuracy is imposed by the limited size of the background monitor. Due to this limited size, some of the ray paths never reach the coded background, and fewer valid ray paths are thus available to optimize the shape with. This results in a reduction of the reconstruction accuracy, as was already shown in Fig. 9.

Furthermore, as shown in Fig. 19, the accuracy curves tend to flatten out after 18 views in almost all cases. We believe the reason is that at that point we have sufficiently many twice-refracted ray paths and silhouettes, so the errors begin to be dominated by other factors, such as considering only pure refractions and ignoring reflections, which is not enough for a faithful simulation of light transport through transparent objects. Moreover, we assume a one-to-one correspondence between camera rays and points on the background, while in reality multiple background locations may contribute to a single image pixel.

Fig. 21 demonstrates a failure case of our method. Here, the reconstructed Shell object has a hollow part near the opening (highlighted in the figure). Because all of the views are horizontal, this cavity cannot be captured by the object's silhouettes, and all of the ray paths passing through that part (from all of the views) require more than two refractions in order to reach the background. Thus, our approach fails to correctly reconstruct the cavity, as shown in Fig. 21.
6 CONCLUSIONS

Although several dedicated approaches for recovering the 3D shape of a transparent object have recently been proposed, these methods are still unable to approach the accuracy of 3D scanning for opaque, diffuse objects. In this work, we have proposed a new transparent object reconstruction method, which delivers a significant improvement in reconstruction accuracy and in the ability to capture fine geometric surface details. Our controlled capture setup is considerably simpler than that of Wu et al. [2018], and the reconstruction process is more streamlined. Both of these significant improvements may be attributed to the power of differentiable ray tracing, which we use to directly optimize the recovered 3D mesh so as to minimize a combination of several complementary loss terms.

The main limitation of our approach is that only rays that undergo two refractions through the object are taken into account by our optimization process. Future work should attempt to alleviate this restriction, as well as extend our approach to more general optical properties, such as objects that are not homogeneous in their color, refractive index, or transmission properties.
ACKNOWLEDGMENTS
We sincerely thank Zhengqin Li, Yu-Ying Yeh, and Manmohan Chandraker for providing us with their reconstructed results and their scanned ground truth of Mouse, Monkey, Dog, and Pig, as shown in [Li et al. 2020]. We also thank the anonymous reviewers for their valuable comments. This work was supported in part by NSFC (61761146002, 61861130365), GD Science & Technology Program (2020A0505100064, 2018KZDXM058, 2018A030310441, 2015A030312015), GD Talent Plan (2019JC05X328), LHTD (20170003), Israel Science Foundation (2366/16, 2472/17), National Engineering Laboratory for Big Data System Computing Technology, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).
REFERENCES
Kfir Aberman, Oren Katzir, Qiang Zhou, Zegang Luo, Andrei Sharf, Chen Greif, Baoquan Chen, and Daniel Cohen-Or. 2017. Dip Transform for 3D Shape Reconstruction. ACM Trans. on Graphics (Proc. of SIGGRAPH) 36, 4 (2017), 79:1–79:11.
Moshe Ben-Ezra and Shree K. Nayar. 2003. What Does Motion Reveal About Transparency? Proc. Int. Conf. on Computer Vision (2003), 1025–1032.
Max Born and Emil Wolf. 2013. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light. Elsevier.
Yung-Yu Chuang, Douglas E. Zongker, Joel Hindorff, Brian Curless, David H. Salesin, and Richard Szeliski. 2000. Environment Matting Extensions: Towards Higher Accuracy and Real-time Capture. ACM Trans. on Graphics (Proc. of SIGGRAPH) (2000), 121–130.
Matthias B. Hullin, Martin Fuchs, Ivo Ihrke, Hans-Peter Seidel, and Hendrik P. A. Lensch. 2008. Fluorescent Immersion Range Scanning. ACM Trans. on Graphics (Proc. of SIGGRAPH) 27, 3 (2008), 87:1–87:10.
Ivo Ihrke, Kiriakos N. Kutulakos, Hendrik Lensch, Marcus Magnor, and Wolfgang Heidrich. 2010. Transparent and specular object reconstruction. Computer Graphics Forum 29, 8 (2010), 2400–2426.
Hiroharu Kato, Yoshitaka Ushiku, and Tatsuya Harada. 2018. Neural 3D mesh renderer. Proc. IEEE Conf. on Computer Vision & Pattern Recognition (2018), 3907–3916.
Kiriakos N. Kutulakos and Steven M. Seitz. 2000. A Theory of Shape by Space Carving. Int. J. Computer Vision 38, 3 (2000), 199–218.
Kiriakos N. Kutulakos and Eron Steger. 2008. A theory of refractive and specular 3D shape by light-path triangulation. Int. J. Computer Vision 76, 1 (2008), 13–29.
Anat Levin, Dani Lischinski, and Yair Weiss. 2008. A Closed-Form Solution to Natural Image Matting. IEEE Trans. Pattern Anal. Mach. Intell. 30, 2 (2008), 228–242.
Tzu-Mao Li, Miika Aittala, Frédo Durand, and Jaakko Lehtinen. 2018. Differentiable Monte Carlo Ray Tracing through Edge Sampling. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 37, 6 (2018), 222:1–222:11.
Zhengqin Li, Yu-Ying Yeh, and Manmohan Chandraker. 2020. Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes. Proc. IEEE Conf. on Computer Vision & Pattern Recognition (2020).
Shichen Liu, Tianye Li, Weikai Chen, and Hao Li. 2019. Soft Rasterizer: A differentiable renderer for image-based 3D reasoning. Proc. Int. Conf. on Computer Vision (2019), 7708–7717.
Nigel J. W. Morris and Kiriakos N. Kutulakos. 2007. Reconstructing the surface of inhomogeneous transparent scenes by scatter-trace photography. Proc. Int. Conf. on Computer Vision (2007), 1–8.
Nigel J. W. Morris and Kiriakos N. Kutulakos. 2011. Dynamic refraction stereo. IEEE Trans. Pattern Analysis & Machine Intelligence 33, 8 (2011), 1518–1531.
Merlin Nimier-David, Delio Vicini, Tizian Zeltner, and Wenzel Jakob. 2019. Mitsuba 2: a retargetable forward and inverse renderer. ACM Trans. on Graphics (Proc. of SIGGRAPH Asia) 38, 6 (2019), 203:1–203:17.
Marios Papas, Wojciech Jarosz, Wenzel Jakob, Szymon Rusinkiewicz, Wojciech Matusik, and Tim Weyrich. 2011. Goal-based caustics. Computer Graphics Forum (Proc. of Eurographics) 30, 2 (2011), 503–511.
Steven G. Parker, James Bigler, Andreas Dietrich, Heiko Friedrich, Jared Hoberock, David Luebke, David McAllister, Morgan McGuire, Keith Morley, Austin Robison, et al. 2010. OptiX: a general purpose ray tracing engine. ACM Trans. on Graphics (Proc. of SIGGRAPH) 29, 4 (2010), 1–13.
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. NIPS-W (2017).
Pieter Peers and Philip Dutré. 2003. Wavelet environment matting. Proc. Eurographics Workshop on Rendering (2003), 157–166.
Nico Pietroni, Marco Tarini, and Paolo Cignoni. 2010. Almost isometric mesh parameterization through abstract domains. IEEE Trans. Visualization & Computer Graphics 16, 4 (2010).
Thomas Porter and Tom Duff. 1984. Compositing Digital Images. Proc. of SIGGRAPH 18, 3 (1984), 253–259.
Yiming Qian, Minglun Gong, and Yee-Hong Yang. 2015. Frequency-based environment matting by compressive sensing. Proc. Int. Conf. on Computer Vision (2015), 3532–3540.
Yiming Qian, Minglun Gong, and Yee-Hong Yang. 2017. Stereo-Based 3D Reconstruction of Dynamic Fluid Surfaces by Global Optimization. Proc. IEEE Conf. on Computer Vision & Pattern Recognition (2017), 6650–6659.
Shreeyak Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Chung Lee, Andy Zeng, and Shuran Song. 2020. ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation. Proc. IEEE Int. Conf. on Robotics & Automation (2020).
Qi Shan, Sameer Agarwal, and Brian Curless. 2012. Refractive height fields from single and multiple images. Proc. IEEE Conf. on Computer Vision & Pattern Recognition (2012), 286–293.
Jonathan Stets, Zhengqin Li, Jeppe Revall Frisvad, and Manmohan Chandraker. 2019. Single-shot analysis of refractive shape using convolutional neural networks. Proc. IEEE Winter Conf. on Applications of Computer Vision (2019), 995–1003.
Kenichiro Tanaka, Yasuhiro Mukaigawa, Hiroyuki Kubo, Yasuyuki Matsushita, and Yasushi Yagi. 2016. Recovering Transparent Shape from Time-of-Flight Distortion. Proc. IEEE Conf. on Computer Vision & Pattern Recognition (2016), 4387–4395.
Borislav Trifonov, Derek Bradley, and Wolfgang Heidrich. 2006. Tomographic reconstruction of transparent objects. Proc. Eurographics Conf. on Rendering Techniques (2006), 51–60.
Chia-Yin Tsai, Ashok Veeraraghavan, and Aswin C. Sankaranarayanan. 2015. What does a single light-ray reveal about a transparent object? Proc. IEEE Int. Conf. on Image Processing (2015), 606–610.
Gordon Wetzstein, David Roodnick, Wolfgang Heidrich, and Ramesh Raskar. 2011. Refractive shape from light field distortion. Proc. Int. Conf. on Computer Vision (2011), 1180–1186.
Bojian Wu, Yang Zhou, Yiming Qian, Minglun Gong, and Hui Huang. 2018. Full 3D Reconstruction of Transparent Objects. ACM Trans. on Graphics (Proc. of SIGGRAPH) 37, 4 (2018), 103:1–103:11.
Sai-Kit Yeung, Tai-Pang Wu, Chi-Keung Tang, Tony F. Chan, and Stanley Osher. 2011. Adequate reconstruction of transparent objects on a shoestring budget. Proc. IEEE Conf. on Computer Vision & Pattern Recognition (2011), 2513–2520.
Mingjie Zhang, Xing Lin, Mohit Gupta, Jinli Suo, and Qionghai Dai. 2014. Recovering Scene Geometry under Wavy Fluid via Distortion and Defocus Analysis. Proc. Euro. Conf. on Computer Vision (2014), 234–250.
Zhengyou Zhang. 2000. A Flexible New Technique for Camera Calibration. IEEE Trans. Pattern Analysis & Machine Intelligence 22, 11 (2000), 1330–1334.
Douglas E. Zongker, Dawn M. Werner, Brian Curless, and David H. Salesin. 1999. Environment Matting and Compositing. Proc. of SIGGRAPH (1999), 205–214.