Spatial Phase-Sweep: Increasing temporal resolution of transient imaging using a light source array
Ryuichi Tadano, Adithya Kumar Pediredla, Kaushik Mitra, Ashok Veeraraghavan
Sony Corporation, 7-1 Konan 1-chome, Minato-ku, Tokyo 108-0075, Japan; Rice University, 6100 Main Street, Houston, TX 77005, USA; Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India. ∗Corresponding author: [email protected]
Abstract:
Transient imaging or light-in-flight techniques capture the propagation of an ultra-short pulse of light through a scene, which in effect captures the optical impulse response of the scene. Recently, it has been shown that transient images can be captured using commercially available Time-of-Flight (ToF) systems such as Photonic Mixer Devices (PMD). In this paper, we propose 'spatial phase-sweep', a technique that exploits the speed of light to increase the temporal resolution beyond the 100-picosecond limit imposed by current electronics. Spatial phase-sweep uses a linear array of light sources with a spatial separation of about 3 mm between them, thereby producing a time shift of about 10 picoseconds, which translates into 100 Gfps of transient imaging in theory. We demonstrate a prototype and transient imaging results using spatial phase-sweep.
References and links
1. A. Velten, M. Bawendi, and R. Raskar, "Picosecond camera for time-of-flight imaging," in Imaging and Applied Optics (2011), p. IMB4.
2. A. Velten, D. Wu, A. Jarabo, B. Masia, C. Barsi, C. Joshi, E. Lawson, M. Bawendi, D. Gutierrez, and R. Raskar, "Femto-photography: Capturing and visualizing the propagation of light," ACM Trans. Graph. 32(4), 44 (2013).
3. B. Heshmat, G. Satat, C. Barsi, and R. Raskar, "Single-shot ultrafast imaging using parallax-free alignment with a tilted lenslet array," in CLEO: 2014, vol. 1 (OSA, Washington, D.C., 2014), pp. 3–4.
4. A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M. G. Bawendi, and R. Raskar, "Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging," Nat. Commun. 3, 745 (2012).
5. F. Heide, M. B. Hullin, J. Gregson, and W. Heidrich, "Low-budget transient imaging using photonic mixer devices," ACM Trans. Graph. 32(4) (2013).
6. A. Kadambi, R. Whyte, A. Bhandari, L. Streeter, C. Barsi, A. Dorrington, and R. Raskar, "Coded time of flight cameras: Sparse deconvolution to address multipath interference and recover time profiles," ACM Trans. Graph. 32(6) (2013).
9. N. Abramson, "Light-in-flight recording by holography," Opt. Lett. 3, 121–123 (1978).
10. N. Abramson, "Light-in-flight recording: high-speed holographic motion pictures of ultrafast phenomena," Appl. Opt. 22, 215–232 (1983).
11. B. Nilsson and T. E. Carlsson, "Direct three-dimensional shape measurement by digital light-in-flight holography," Appl. Opt. 37, 7954–7959 (1998).
12. I. Gkioulekas, A. Levin, and T. Zickler, "Micron-scale light transport decomposition using interferometry," ACM Trans. Graph. (2015).
13. L. Gao, J. Liang, C. Li, and L. V. Wang, "Single-shot compressed ultrafast photography at one hundred billion frames per second," Nature 516, 74–77 (2014).
14. F. Heide, L. Xiao, A. Kolb, M. B. Hullin, and W. Heidrich, "Imaging in scattering media using correlation image sensors and sparse convolutional coding," Opt. Express 22, 26338 (2014).
15. R. Tadano, A. K. Pediredla, and A. Veeraraghavan, "Depth selective camera: A direct, on-chip, programmable technique for depth selectivity in photography," The IEEE International Conference on Computer Vision (ICCV) (submitted).
16. M. Tobias, K. Holger, F. Jochen, A. Martin, and L. Robert, "Robust 3D measurement with PMD sensors," Range Imaging Day, Zürich, 8 (2005).
17. R. Lange, P. Seitz, A. Biber, and S. Lauxtermann, "Demodulation pixels in CCD and CMOS technologies for time-of-flight ranging," Proc. SPIE, 177–188 (2000).
18. R. Lange and P. Seitz, "Solid-state time-of-flight range camera," IEEE J. Quantum Electron. 37, 390–397 (2001).
19. R. Schwarte, "New electro-optical mixing and correlating sensor: facilities and applications of the photonic mixer device (PMD)," Proc. SPIE, 245–253 (1997).
20. R. M. Conroy, A. A. Dorrington, R. Künnemeyer, and M. J. Cree, "Range imager performance comparison in homodyne and heterodyne operating modes," Proc. SPIE 7239, 723905 (2009).
21. B. Büttgen and P. Seitz, "Robust optical time-of-flight range imaging based on smart pixel structures," IEEE Trans. Circuits Syst. I 55, 1512–1525 (2008).
22. R. Z. Whyte, A. D. Payne, A. A. Dorrington, and M. J. Cree, "Multiple range imaging camera operation with minimal performance impact," Proc. SPIE 7538, 75380I (2010).
1. Introduction
Transient imaging or light-in-flight refers to capturing the temporal response of a scene to an ultra-short pulse of light. Current techniques to capture transient images are based either on streak cameras or on photonic mixer devices. Streak cameras, when used with femtosecond-pulsed laser illumination, can provide extremely fine temporal resolution, of the order of 1 picosecond, but such systems [1–4] are prohibitively expensive, costing upwards of several hundred thousand dollars. More recently, Heide et al. [5], Kadambi et al. [6], and O'Toole et al. [7] have shown that commercially available photonic mixer devices that cost a few hundred dollars can be used to acquire transient images. Unfortunately, the temporal resolution of these techniques is limited by the accuracy of the phase-locked loop (PLL) circuit in the on-board electronics of these devices. In commercially available systems such as the CamBoard nano [8], the on-board electronics and the PLL limit the minimum achievable phase shift to the order of about 100 picoseconds (∼128 picoseconds on the CamBoard nano). As a consequence, transient images obtained using photonic mixer devices have a much lower temporal resolution (100 picoseconds) than systems based on streak cameras and femtosecond laser pulses (1 picosecond).

Our goal in this paper is to improve the temporal resolution of transient images obtained using photonic mixer devices (PMD) beyond the limit imposed by the sensor electronics. We exploit the incredible speed of light (3 × 10⁸ m/s) to our advantage and propose a technique called 'spatial phase-sweep' (SPS) to improve the temporal resolution of transient images obtained using PMDs. The idea behind spatial phase-sweep is very simple. We use an array of light sources in which the individual sources are slightly offset along the optical axis. This creates small but precisely controllable differences in the time of travel of the light pulses emitted by the different sources (Fig. 1). Since the light source positions in an array can be precisely controlled, the corresponding path-length differences result in a slight temporal offset Δt = Δd/c, where Δd is the spatial shift between adjacent light sources in the array and c is the speed of light. In our prototype, Δd is 3 mm, resulting in a temporal resolution Δt of about 10 picoseconds, an order of magnitude better than the limit imposed by the on-board electronics of the PMD device in the prototype.

Fig. 1: (view in color) Schematic chart of spatial phase-sweep: To capture the impulse response of the scene, we employ a pseudo-random code whose autocorrelation is triangle-shaped. (a) In conventional systems [6], the sampling rate of the autocorrelation is limited by the minimum phase shift of the PLL circuit. (b) In our system (proposed), using the same electronics, a light source array introduces additional phase samples by means of a spatial-domain sweep.

The main alternative technique for improving the temporal resolution of transient images obtained using photonic mixer devices is to increase the base frequency of the voltage-controlled oscillator (VCO) used in the phase-locked loop circuit of the on-board electronics. Boosting the base frequency of the VCO may theoretically provide up to a 10x improvement in temporal resolution, but such a technique would significantly increase the cost of the resulting sensor. Our proposed technique adds only minor incremental cost over existing solutions, since we only need a linear array of laser diodes, which are inexpensive and easy to obtain. In addition, the key innovation of spatial phase-sweep is independent of the temporal resolution limit imposed by the on-board electronics: even if sensor electronics improve significantly, spatial phase-sweep can be used to further improve temporal resolution beyond that limit. The fundamental limit on the temporal resolution achieved using spatial phase-sweep depends mainly on the accuracy with which the positions of the laser diodes in the array can be controlled. Since the diodes can be positioned with sub-millimeter precision, spatial phase-sweep will continue to provide an improvement in temporal resolution even if the on-board electronics improve by an order of magnitude through an increased VCO base frequency.

The main technical contributions of our paper are as follows:

• We propose spatial phase-sweep, a technique to improve the temporal resolution of transient images captured using photonic mixer devices.
• We develop algorithms for self-calibration and transient image recovery from the data captured using a spatial phase-sweep ToF camera.
• We build a proof-of-concept prototype and demonstrate a 10x improvement in temporal resolution.

Some limitations of the proposed technique are:

• The data acquisition time increases linearly with the increase in temporal resolution.
• The physical size of the light sources limits the size of the light source array; hence, increasing the resolution of spatial phase-sweep beyond a certain limit will be difficult.
• Our system requires repetitive measurements, which means that it cannot capture one-time phenomena such as plasma dynamics.

2. Prior work
Transient imaging finds applications in visualizing the interaction of light with an optically complex scene that can involve multiple reflections, scattering media, or subsurface scattering. In this section, we review various approaches to transient imaging and then explain our approach.
Holography based:
Abramson captured the first light-in-flight images by illuminating a flat surface and a hologram with a short laser pulse [9, 10]. The beam from the flat surface is used as a reference beam, and the light coming from the hologram interferes with the reference beam to produce an image that corresponds to a short distance traveled by the light wave. By moving the reference surface and stacking the images, they created light-in-flight images. Nilsson [11] repeated the same experiment with a CCD array to create digital light-in-flight video.
OCT based:
Gkioulekas et al. [12] proposed micron-scale transient imaging using optical coherence tomography (OCT). The idea of incorporating an OCT technique is close to ours; however, the scale of the subject they support is quite small, about 2 cm.

Streak camera based:
Velten et al. [1, 2, 4] proposed the use of a streak camera and a femtosecond laser to capture transient images. The laser illuminates one horizontal scan line at a time and scans the entire scene. For every scan, photons illuminate the scene, scatter, and some of the scattered photons eventually reach the streak camera. The streak camera converts these photons into electrons using a photocathode. These electrons are then deflected vertically by a voltage that varies with time; hence, the pixel intensities along the vertical axis of the image correspond to photons arriving from various depths. Scanning the entire scene, Velten et al. produced high-resolution transient images. Gao et al. [13] employed a digital micro-mirror device (DMD) and compressed sensing techniques along with a streak camera; their system achieved a temporal resolution of 10 picoseconds. Heshmat et al. [3] used a tilted lenslet array to realize single-shot transient imaging at a temporal resolution of 2 picoseconds.

PMD based:
Though streak-camera-based methods provide very high temporal resolution, they are prohibitively expensive: a system based on a femtosecond laser and a streak camera costs upwards of several hundred thousand dollars. To realize inexpensive transient imaging, photonic mixer device (PMD) based methods have been proposed by Heide et al. [5]. PMDs are the basic building blocks of most commercial time-of-flight cameras, and several applications using this device have been proposed in the past few years [6, 14, 15]. In such systems, a laser diode or a light-emitting diode (LED) is temporally modulated to create a coded illumination signal. The light scattered off the subject is then correlated with a programmable sensor modulation pattern on a PMD sensor to obtain an array of correlational measurements. Heide et al. [5] performed a series of measurements with varying phase delays between the illumination and the sensor modulation patterns (while keeping both sinusoidal) and demonstrated a deconvolution technique capable of recovering transient images from the captured correlational measurements. Kadambi et al. [6] demonstrated a similar technique for recovering transient images, but using an m-sequence instead of sinusoidal modulation.

O'Toole et al. [7] used an encoded projector to modulate the light both spatially and temporally. The 3-D illumination signal is transformed by interacting with the scene and is captured by the PMD sensor. The spatial and the temporal components of the received signal carry complementary information about the scene and are used to capture sharp light-in-flight images more robustly. The temporal resolution of their transient images is 100 picoseconds, which translates to 10 Gfps.

All these PMD-based techniques for capturing transient images are limited in their temporal resolution, primarily by the phase-locked loop (PLL) in the FPGA or electrical circuits. PLLs in these commercially available circuits are limited to time delays of about 100 picoseconds, which results in a 100-picosecond temporal resolution in the captured transient images. In this paper, we overcome this limit through spatial phase-sweep, while keeping the cost of the device low.

Fig. 2: (a) ToF camera system: The system controller sends two binary signals: f(t) to the PMD sensor and g(t + φ) to the illumination source. For each pixel, the PMD sensor measures the correlation between the reference signal and the incident light on the pixel (a reflected version of g(t + φ)), as shown in Eq. 1. (b) Autocorrelation of m-sequence (31 bits): We use an m-sequence for both f(t) and g(t). The interval between peaks in the autocorrelation of an m-sequence is controlled by the code length. Using a sufficiently long code, we can focus on a single peak and treat it as a delta function.
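The m-sequence autocorrelation property that Fig. 2(b) relies on is easy to verify numerically. The sketch below is not the authors' code: it generates a 31-chip m-sequence with a 5-stage Fibonacci LFSR, where the tap choice x⁵ + x³ + 1 is one standard primitive polynomial assumed for illustration, and checks that the ±1-mapped circular autocorrelation is 31 at zero shift and −1 everywhere else (the discrete counterpart of the single sharp peak).

```python
def lfsr_msequence(taps):
    """Maximal-length sequence from a Fibonacci LFSR.
    taps: 1-indexed feedback tap positions, e.g. [5, 3] for x^5 + x^3 + 1."""
    n = max(taps)
    state = [1] * n                      # any nonzero seed gives the full period
    seq = []
    for _ in range(2 ** n - 1):          # one full period: 2^n - 1 chips
        seq.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return seq

def circular_autocorr(seq):
    """Circular autocorrelation of the {0,1} -> {-1,+1} mapped sequence."""
    s = [2 * b - 1 for b in seq]
    n = len(s)
    return [sum(s[i] * s[(i + k) % n] for i in range(n)) for k in range(n)]

m = lfsr_msequence([5, 3])               # 31-chip m-sequence
ac = circular_autocorr(m)                # 31 at shift 0, -1 at every other shift
```

In continuous time, the finite chip duration turns this two-valued discrete autocorrelation into the triangular peak used in the paper.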
3. Background
In this section, we explain the principles of a PMD-sensor-based ToF camera and then explain how a ToF camera can be used to obtain transient images.
A ToF camera [16–20] consists of a PMD sensor and laser diodes that emit a coded illumination signal g(t). This illumination signal interacts with the scene and reaches a sensor pixel. With available technology, the sensor cannot directly measure the received signal; it can only measure the correlation between the received signal and a binary coded signal f(t) inside the sensor circuit. The entire process for each pixel can be mathematically represented as

b(φ) = ∫₀ᵀ a(τ) · [ ∫₋∞^∞ g(t + φ − τ) f(t) dt ] dτ, with a(τ) = ∫_p a_p δ(|p| = τ),   (1)

where τ is the temporal delay of the illumination due to the finite speed of light traveling from the light source to the sensor pixel via the scene, a(τ) is the scene response (the integral of all contributions from different light paths p that correspond to the same delay τ), T is the exposure time, and φ is the delay of the illumination signal controlled by the system.

In an ordinary ToF camera designed to capture depth information, the scene is assumed to have a single path. Hence, Eq. 1 becomes

b(φ) = a · ∫₋∞^∞ g(t + φ − τ) f(t) dt,   (2)

where τ is the time delay. Further, sinusoidal waves of the same frequency are used for both f(t) and g(t).
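The single-path measurement of Eq. 2 can be sanity-checked in a few lines. The sketch below is illustrative only (variable names and the LFSR taps are our assumptions, not the hardware's code): it builds a 31-chip ±1 m-sequence, uses it for both f(t) and g(t), and sweeps the programmable delay φ; the correlation peaks exactly when φ matches the path delay τ, which is how a single-path ToF camera reads out depth.

```python
# 5-stage LFSR (x^5 + x^3 + 1, one standard primitive polynomial) -> 31-chip code
state = [1, 0, 0, 1, 0]
code = []
for _ in range(31):
    code.append(2 * state[-1] - 1)                 # map {0,1} -> {-1,+1}
    state = [state[4] ^ state[2]] + state[:-1]

def b(phi, tau, a=1.0):
    """Eq. 2 for a single path: circular correlation of the sensor code f
    with the illumination code delayed by tau and shifted by phi (chip units)."""
    n = len(code)
    return a * sum(code[t] * code[(t + phi - tau) % n] for t in range(n))

tau_true = 7                                       # unknown path delay, in chips
sweep = [b(phi, tau_true) for phi in range(31)]    # sweep the programmable delay
est = max(range(31), key=lambda p: sweep[p])       # peak location recovers tau
```

Because the m-sequence autocorrelation is 31 at zero lag and −1 elsewhere, the sweep is flat except at φ = τ.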
Three or four measurements with different amounts of phase shift are required to generate depth information. When multiple cameras are in operation, custom codes such as pseudo-random sequences are used to overcome interference problems [21, 22].

Fig. 3: Sampling step and peak detection error: (a) Measurement b(φ). Due to subsurface scattering and indirect reflections, the actual cross-correlation is not triangular. (b) OMP-based kernel fitting results. Estimated peak positions are marked with X. A smaller sampling step fits the estimated curve to the ground truth better. (c) Relationship between sampling step and peak estimation error. The error increases as the sampling step becomes larger.

a(τ) in Eq. 1 is the impulse response of the world, or transient response, that we are interested in solving for, not just in the single-path case but in the generic case. The most common approach to solve for a(τ) is to deconvolve b(φ) with the cross-correlation function between f(t) and g(t) [5, 6]. In [5], various combinations of frequency/phase-shifted sinusoidal functions are used for f(t) and g(t) to build a correlation matrix. To solve the inverse problem described by the matrix, they incorporate regularization functions that restrict the transient response to be smooth in the temporal and spatial domains. In [6], f(t) and g(t) are designed to be m-sequences so that inverting the cross-correlation function becomes easy.

The common problem with both approaches is that the measurements b(φ) cannot be sampled at an arbitrary sampling rate. With the existing electronics, b(φ) can only be sampled at 10 Gfps. Light events such as subsurface scattering or inter-reflections happen at a much faster rate and are missed by these approaches; hence, b(φ) is grossly undersampled. We propose to increase the sampling rate of the measurement vector b(φ) by a factor of 10, thereby capturing these fast transient events more accurately, at 100 Gfps. Note that b(φ) cannot be described parametrically, as it includes an unknown scene response.
Hence, the only way to improve the temporal resolution of the transient image is finer sampling.

To show this, we performed a simple simulated experiment based on an actual measurement b(φ). We followed the transient imaging method of [6], which uses m-sequences for f(t) and g(t). Using a measurement b(φ) as ground truth, we performed OMP-based kernel fitting [6], in which sub-sampled ground-truth data is used as the kernel basis. For simplicity, we assume the scene response is 1-sparse in terms of the sub-sampled kernel basis. We vary the interval of the sampling points to investigate how the sampling step affects the fitting results. As shown in Fig. 3(a), the actual measurement is not triangular, as would be expected from Fig. 2(b). This unknown shape is difficult to describe parametrically; OMP-based fitting provides a better estimate because it is based on the actually measured kernel. Even for a simple task such as peak estimation, increasing the sampling rate gives finer estimation results (Fig. 3(b), (c)). Hence, finer sampling increases the information we can obtain, especially for complicated tasks.

Fig. 4: Spatial phase-sweep: By placing multiple light sources in an array with slightly different distances from the subject, we can sample the cross-correlation function more finely than with conventional PLL-based sampling. (a) Graph showing measurements from only one light source.
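The 1-sparse OMP fitting used in the simulation above reduces to picking, among shifted copies of the measured kernel, the one with the largest normalized inner product against the measurement. The sketch below uses a synthetic triangular kernel and hypothetical names (not the paper's data) to show how a coarser candidate spacing degrades the peak estimate:

```python
def shifted(kernel, s):
    """Circularly shift the kernel by s samples."""
    n = len(kernel)
    return [kernel[(i - s) % n] for i in range(n)]

def omp_1sparse_shift(measurement, kernel, step):
    """1-sparse OMP over a dictionary of shifted kernels: return the shift
    (restricted to multiples of `step`) with the largest normalized correlation."""
    n = len(kernel)
    best, best_score = None, float("-inf")
    for s in range(0, n, step):
        basis = shifted(kernel, s)
        norm = sum(v * v for v in basis) ** 0.5
        score = sum(m * v for m, v in zip(measurement, basis)) / norm
        if score > best_score:
            best, best_score = s, score
    return best

n = 64
kernel = [max(0.0, 8.0 - abs(i - 8)) for i in range(n)]   # synthetic triangular kernel
true_shift = 21
meas = shifted(kernel, true_shift)                        # noiseless 1-sparse "scene"
coarse = omp_1sparse_shift(meas, kernel, step=10)         # coarse sampling step
fine = omp_1sparse_shift(meas, kernel, step=1)            # fine sampling step
```

With a fine step the peak is recovered exactly; with a coarse step the estimate snaps to the nearest candidate, mirroring the error growth in Fig. 3(c).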
4. Increasing temporal resolution of transient imaging
As described in Sec. 2, the temporal resolution of conventional transient imaging techniques using a PMD sensor is theoretically limited to around 100 picoseconds [6]. This is determined by the precision of the phase shift control (φ) of the PLL circuit. In this section, we first formulate how the precision of controlling φ affects the information we can acquire. We then introduce a simple technique to boost the temporal resolution without increasing the phase shift precision of the PLL. The concept is illustrated in Fig. 4.

Suppose h(x) is the cross-correlation function between f(t) and g(t). Eq. 1 can be written as

b(φ) = ∫₀ᵀ a(τ) h(φ − τ) dτ, with h(x) = ∫ f(t) g(t + x) dt.   (3)

Hence, solving for the transient response is a deconvolution problem, which is more intuitive in the frequency domain. Computing the discrete-time Fourier transform of both sides of Eq. 3 and rearranging the terms, we have

A(πf Δφ) = B(πf Δφ) / H(πf Δφ),   (4)

where Δφ is the sampling interval (or phase shift amount), and A, B, and H are the discrete-time Fourier transforms of a, b, and h. Note that the sampling performed here is not in the temporal domain but in the phase domain. Eq. 4 is periodic in f, with a period proportional to 1/Δφ. Hence, a smaller Δφ is better, as we can capture more frequency information without aliasing. For commercially available PLL circuits, the phase shift control (Δφ) is around 100 ps. However, light events such as subsurface scattering happen at a much higher frequency, depending on the properties of the material. Hence, it is crucial to have a smaller sampling interval Δφ to acquire more information about the transient image.

The currently available oscillator's phase step of 100 ps corresponds to a distance of 3 cm traveled by light. With the current state-of-the-art design, this 3 cm precision determines the theoretical limit on the frame rate of the transient image. To break this limit, we insert extra phase delays that are independent of innovations in the PLL's design, by arranging an array of light sources uniformly and perpendicular to the image plane. We call this idea 'spatial phase-sweep', as the spatial arrangement of light sources sweeps the phase of the illumination signal. After incorporating the extra freedom of the light source position into Eq. 1, the measurements are reformulated as

b(φ + μₙ) = ∫₀ᵀ a(τ) · [ ∫₋∞^∞ g(t + φ + μₙ − τ) f(t) dt ] dτ,   (5)

where μₙ is the phase delay inserted by the n-th light source. Though the light sources can be placed arbitrarily, we place them uniformly; hence, μₙ = n · Δd/c, where Δd is the distance between two consecutive light sources. In summary, we can now sample the measurements at a rate of Δd/c = 10 ps, allowing us to acquire information that was previously inaccessible.

As we change the active light source, the amplitude of the incident light at each pixel and the distance between the object and the light source change. To overcome this inconsistency, we introduce an equalization process between the data taken with different illumination sources. Let us denote the set of measurements for the multiple light source positions as {bₙ(φ)}, where n is the index of the active light source.
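The interleaving arithmetic of Eq. 5 is worth making concrete. The sketch below is our own illustration (variable names are ours): it combines the prototype's ~96 ps PLL step from Sec. 5 with ten source positions at the ideal spacing Δd = cΔφ/10 = 2.88 mm (the prototype's stage step of 2.8 mm is close to, but not exactly, this ideal), and confirms that the merged phase grid is uniform at 9.6 ps:

```python
C = 3.0e8                  # speed of light [m/s]
DPHI = 96e-12              # PLL phase step [s] (~96 ps, Sec. 5)
N = 10                     # source positions: 1 original + 9 extra

dd = C * DPHI / N          # ideal source spacing: c * dphi / N = 2.88 mm
mu = [n * dd / C for n in range(N)]          # per-source phase delays mu_n (Eq. 5)

# interleaved sample grid: PLL step k plus spatial delay mu_n
samples = sorted(k * DPHI + m for k in range(4) for m in mu)
steps = [b - a for a, b in zip(samples, samples[1:])]   # all ~9.6 ps
```

Each coarse 96 ps PLL interval is filled by the nine spatial delays, giving a tenfold denser, uniform sampling of b(φ).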
For each measurement n, we calculate an equalization coefficient wₙ by minimizing the following cost function via the least-squares method:

wₙ = arg min_w Σ_φ ( b̂ₙ(φ) − w · bₙ(φ) )² = Σ_φ ( b̂ₙ(φ) · bₙ(φ) ) / Σ_φ ( bₙ(φ)² ),   (6)

where bₙ(φ) is the measurement corresponding to the n-th light source, and b̂ₙ(φ) is an estimate of the equalized bₙ(φ) obtained by linearly interpolating the data set {b(φ)}. The cost function decreases the squared error between b̂ₙ(φ) and the equalized observation w · bₙ(φ). Fig. 4 illustrates the basic idea of spatial phase-sweep.
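The closed form of Eq. 6 is an ordinary least-squares scale fit. A minimal sketch, with hypothetical data and function names of our own choosing:

```python
def equalization_coefficient(b_hat, b_n):
    """Closed-form least-squares scale of Eq. 6:
    w = sum(b_hat * b) / sum(b * b)."""
    num = sum(h * b for h, b in zip(b_hat, b_n))
    den = sum(b * b for b in b_n)
    return num / den

# hypothetical data: source n returns the reference curve scaled by 0.8
reference = [0.0, 1.0, 3.0, 7.0, 3.0, 1.0, 0.0]
b_n = [0.8 * v for v in reference]

w = equalization_coefficient(reference, b_n)   # recovers the 1/0.8 = 1.25 scale
equalized = [w * v for v in b_n]               # matches the reference curve
```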
5. Experimental Setup
Our system consists of a PMD sensor, a laser diode, an Altera FPGA development kit (DE2-115), and a translation stage. To simulate a light source array, we use the translation stage to move the light source linearly toward the subject. Fig. 5(a) shows our setup. The FPGA controls various functions of the PMD sensor, including the reference code f(t). The captured image is read out via the FPGA and saved to external storage. The FPGA also controls the laser diode driver board by sending the illumination code g(t). This ensures that the frequency and phase of the light source and sensor are synchronized. Most of the hardware and software design of our system is based on the work by Kadambi et al. [6].

Illumination:
An infrared laser diode is used for illumination, driven by an iC-HG driver from iC-Haus. We can choose an arbitrary binary sequence as the illumination code g(t); in our system, we used a 31-bit m-sequence at a modulation frequency of 50 MHz. As mentioned above, we changed the position of the single light source between measurements using a translation stage to simulate a light source array. With such a mechanical component, we can control Δd of Eq. 5 with a precision on the order of 0.1 mm.

Code control:
The PLL circuit included in the FPGA allows us to shift the phase of the output signal depending on the VCO frequency. In our configuration, we can control φ in steps of about 96 ps, which is finer than the code modulation period. This phase shift corresponds to a light travel distance of about 2.8 cm, which implies that the frame rate of the transient image we can obtain is around 10 Gfps.

Translation stage:
In our experiment, we used a linear translation stage to control the position of the light source with a precision on the order of 0.1 mm. However, for the approximations given in Appendix A to be valid, we used a step size of 2.8 mm. Hence, the translation stage allowed us to insert 9 extra measurements, increasing the temporal resolution tenfold, to 100 Gfps.

Fig. 5: (color in electronic version) System implementation and scene setup: (a) Our implementation comprises an Altera FPGA development kit DE2-115, an infrared laser diode, a PMD 19k-S3 sensor, and a translation stage. We show the effect of our spatial phase-sweep technique on three scenes: an object placed between coupled mirrors (b), grapes, which include small spherical surfaces (c), and a quantification scene consisting of 10 stacked sheets of 3 mm thickness (d). In each setup picture, the directions faced by the camera and the light source are indicated with orange and blue arrows, and green frames indicate the camera's actual field of view.

Fig. 6: (color in electronic version) Performance evaluation: (a) 10 sheets of 3 mm thickness are stacked to create the subject. The camera and light source are placed normal to the plane of the sheets; see also Fig. 5(d). (b) The yellow lines indicate the boundaries of the sheets. The red pixel band occupies the entire surface of the subject, which means that the effective temporal resolution of [6] is at most 5 Gfps. (c) In our result, the red pixel band occupies 2–3 sheets in a single frame, which corresponds to 16.7–25 Gfps.
6. Results
In this section, we show experimental results in both quantitative and qualitative manners. In the visualization process, we perform a simple peak detection based on the Orthogonal Matching Pursuit (OMP) technique to show the wavefront propagation, similar to [6]. While solving OMP, we set the sparsity to one and the bases to a set of phase-shifted versions of the observed kernel function. To obtain sufficient data to perform OMP, around two thousand measurements with different φ are acquired [6]. Though we only demonstrate a single-path method as a proof of concept, note that our technique for increasing temporal resolution generalizes to multiple-path methods like [5–7].

Fig. 7: Quantitative results: The array of images shows successive frames of the transient image. Images above the dotted line correspond to the result without spatial phase-sweep (1x, the original frame rate determined by the PLL phase shift capability), and the ones below the line correspond to the results of our method (10x). Yellow lines show the edges of the 3 mm thick sheets. Red pixels indicate that the light reaches those positions at the timing of the corresponding frame. The elapsed time and frame index are shown at the top left of each frame. Our result resolves the transient phenomenon into 40 frames, which makes the band of red pixels narrower than in the conventional transient image [6], which resolves the same phenomenon in only 4 steps. See also Visualization 1 and 2 for video versions.
6.1. Quantitative evaluation

In this section, we experimentally evaluate the increase in temporal resolution achieved by our method. We placed a terraced slope built from 3 mm thick sheets in front of the camera as shown in Fig. 6 (a). We quantify the temporal resolution as the number of sheets occupied by the wavefront, as shown in Fig. 6 (c). The reconstructed wave fronts of light propagation for the state-of-the-art (1x) and our technique (10x) are shown in Fig. 7. We can clearly observe the significantly improved temporal resolution of our spatial phase-sweep.

The effective temporal resolution can be measured from the width of the red pixel band. We pick a frame and count the number of sheets the red pixel band occupies. Suppose the band occupies n sheets; the temporal resolution of the transient image in frames per second (FPS) will be (3 × 10⁸)/(3 × 10⁻³ × n × 2). The factor 2 is included because the frame rate appears doubled due to the distance traveled by light from the light source to the camera via the subject. From Fig. 6 (c), the band lies in 2–3 sheets in a single frame, which translates into 16.7–25 Gfps. The size of the subject is too small to measure the effective temporal resolution of the conventional transient image in the same manner. However, its temporal resolution is at most 5 Gfps, since the whole 10 sheets are occupied by the red pixel band in a single frame.

Fig. 8: Depth reconstruction comparison: From the 1x and 10x results, we generated (a) depth maps and (b) a plot of a part of them for comparison. The depth reconstruction graph is plotted along the red lines indicated in the depth maps (depth in mm versus horizontal pixel count, with curves for the 1x result, the 10x result, and the ground truth). The mean errors from the ground truth for the 1x and 10x results are 4.96 mm and 1.44 mm, respectively. To decrease the effect of noise, we applied a 5-tap median filter to the data.

Our spatial phase-sweep technique also improves the accuracy of the depth estimate. To illustrate that, we generated depth maps using the data in Fig. 7 and plotted them as shown in Fig. 8. We can observe that the 10x result resolves the object's depth values into 20 uniform levels whereas the 1x result resolves only 2 uniform levels for the same depth range.
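The effective-frame-rate arithmetic above is a one-line conversion; the following sketch reproduces the quoted numbers from the band width n (a sanity check we added, not the paper's code):

```python
# Effective frame rate when the red pixel band spans n sheets of 3 mm
# thickness: FPS = c / (sheet_thickness * n * 2). The factor 2 accounts
# for the round trip from the light source to the camera via the subject.
C = 3e8          # speed of light, m/s
SHEET = 3e-3     # sheet thickness, m

def effective_gfps(n_sheets):
    fps = C / (SHEET * n_sheets * 2)
    return fps / 1e9  # report in Gfps

print(effective_gfps(2))   # band over 2 sheets -> 25.0 Gfps
print(effective_gfps(3))   # band over 3 sheets -> ~16.67 Gfps
print(effective_gfps(10))  # band over all 10 sheets -> 5.0 Gfps (1x case)
```

This matches the 16.7–25 Gfps range quoted for the 10x result and the 5 Gfps upper bound for the conventional result.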
6.2. Qualitative results

We have evaluated the effect of our technique on several scenes that include tiny objects, small enough to highlight the improvement brought by the proposed method. The setups are shown in Fig. 5.
Coupled mirror:
Consider the setup in Fig. 5 (b). The transient images are shown in Fig. 9. The effects of 1x and 10x are similar to those in the quantification experiment. Consider the top row of the 1x result and the top three rows of the 10x result. We can notice that the propagating wave front of the light on the stuffed toy's surface is resolved more precisely in the 10x result. The light hits its nose and arms first, then gradually propagates onto its stomach and forehead, taking 10–20 frames in the 10x result. On the other hand, the same phenomenon occurs within only 1–2 frames in the 1x result. The width of the band of red pixels is narrower in 10x than in 1x. Note that the wave front moves from the outside inward in the last half of the sequence because of the imaginary light sources created by the mirror (recall Fig. 5 (b)).
Grapes scene:
Fig. 10 shows the transient imaging result for the setup in Fig. 5 (c). The light source is placed on the right side of the scene. Although we can infer that light travels from the right side to the left side in both the 1x and 10x results, the 10x result describes the phenomenon more precisely than the 1x result. In the 10x result, we can observe the light propagation even on a single grape.
Hue colorization:
In Fig. 11, we show the hue colorized visualization of the transient imagesusing the same data.
7. Discussion and conclusion

A simple but effective modification:
We have demonstrated that we can increase the temporal resolution of transient imaging dramatically, by a factor of 10, using just a light source array. The light source array does not increase the cost of the setup significantly.

Fig. 9: Coupled mirror scene results: Images above the dotted line correspond to the result without spatial phase sweep (1x, conventional [6]) and the ones below the line correspond to the results of our method (10x, with spatial phase sweep). The wave front propagation is captured more precisely in the 10x result compared to 1x. The same phenomenon, which occurs within 1–2 frames in the 1x result, spans 10–20 frames in the 10x result.
Frame rate of transient image:
Although we have empirically shown that our method improves the temporal resolution of a PMD-based transient imaging system by a factor of 10, the practical temporal resolution of our system is around 16.7–25 Gfps (see Sec. 6.1). On the other hand, the actual amount of phase delay between successive measurements, i.e., the temporal resolution, is 9.6 ps. If we calculate the frame rate using the definitions used in other papers [6, 7], this translates to 104 Gfps. One possible reason for this gap between the effective and theoretical temporal resolution is the SNR of the measured correlation. In our OMP-based peak detection algorithm, noise in the correlation signal can introduce variance into the detected peak positions. Another possible reason is the subsurface scattering effect. Although we obtained the OMP kernel from actual data so as to include the subsurface scattering effect in it, the kernel loses high frequencies due to subsurface scattering, which negatively affects the OMP-based peak detection.
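The theoretical 104 Gfps figure quoted above follows directly from the 9.6 ps phase step, as this small check (ours, not the paper's code) shows:

```python
# Theoretical frame rate = 1 / (phase step between successive measurements).
step_ps = 9.6                    # phase delay per measurement, picoseconds
fps = 1.0 / (step_ps * 1e-12)    # frames per second
print(fps / 1e9)                 # ~104.2 Gfps, matching the 104 Gfps in the text
```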
Limitations:
The physical size of the light sources can limit the size of the light source array and hence the resolution of spatial phase-sweep. The size of the camera system may increase due to the additional light sources. However, this is not a limiting factor for many practical applications. Our solution is feasible today because the phase control interval of the PLL corresponds to 3 cm in the spatial domain. If that value were much larger, for example several tens of meters, it would be impractical to build such a large light source array. On the other hand, if advances in electronics push the phase control of the PLL to around 1 ps, it would translate to building a light source array with a pitch of 300 μm, which may not be feasible. In terms of the number of measurements, the data acquisition time of our method increases linearly with the increase in temporal resolution. This can be another limiting factor in increasing the number of light sources.

Fig. 10: Grapes scene results: Images above the dotted line correspond to the result without spatial phase sweep (1x, conventional [6]) and the ones below the line correspond to the result of our method (10x, with spatial phase sweep). The 10x result resolves the way light propagates even on a single grape, while the 1x result takes only one frame to cover each grape. See also Visualization 5 and 6 for the video version.

Future directions:

• Designing a light source array: In this paper, we used a single light source and a translation stage. By changing the position of the translation stage we simulated the effect of a light source array. This requires us to adjust the translation stage at every measurement and is time consuming. Designing a light source array will make the system more compact and will reduce the manual work.
• Decreasing the number of measurements: Currently, the number of measurements required by our method increases linearly with the sampling rate in the phase domain. As previous works have shown, the impulse response of the scene is sparse even if it includes multiple paths or scattering. Employing compressive sensing theory might help us reduce the number of measurements dramatically.
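To make the compressive sensing idea above concrete, the following is an illustrative sketch (entirely our assumption, not the paper's method): a k-sparse impulse response is recovered from m ≪ n random linear measurements via ISTA, a standard L1 solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 256, 64, 3                       # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.uniform(1.0, 2.0, k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x_true                                  # compressed measurements

def ista(A, y, lam=0.01, iters=2000):
    """Iterative shrinkage-thresholding for min 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L    # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x

x_hat = ista(A, y)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(rel_err)                             # small relative recovery error
```

Here 64 measurements suffice for a length-256, 3-sparse response; the same principle is what could cut the roughly two thousand phase measurements used in this paper.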
• Advanced signal model: The effective temporal resolution of light propagation is limited by the peak detection method we used. Although OMP gives us good results, more advanced signal models such as exponentially modified Gaussians [14] might narrow down the variance of the wave front of light.

Fig. 11: Hue colorization: Hue-colorized visualizations are shown using the same data as the results above. (a) Quantification, (b) coupled mirror, and (c) grapes. Left images correspond to the 1x result and right images correspond to the 10x result. While the temporal resolution of the 1x result is too low to represent the scene response using all the colors indicated in the color bar, the 10x result illustrates the transient image with smooth color transitions.
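For reference, the exponentially modified Gaussian (EMG) mentioned in the last bullet is a Gaussian pulse convolved with an exponential decay, a common model for pulses broadened by scattering. A minimal sketch of its density (standard closed form, with parameters chosen only for illustration):

```python
import numpy as np
from math import erfc, sqrt, exp

def emg_pdf(x, mu, sigma, lam):
    """EMG density: N(mu, sigma^2) convolved with an Exponential(lam) decay."""
    arg = lam / 2.0 * (2.0 * mu + lam * sigma**2 - 2.0 * x)
    return lam / 2.0 * exp(arg) * erfc((mu + lam * sigma**2 - x) / (sqrt(2.0) * sigma))

# Sanity checks: the density integrates to one, and its mean is mu + 1/lam.
xs = np.linspace(-10.0, 60.0, 70001)
pdf = np.array([emg_pdf(x, mu=5.0, sigma=1.5, lam=0.5) for x in xs])
dx = xs[1] - xs[0]
total = np.sum(pdf) * dx        # ~1.0
mean = np.sum(xs * pdf) * dx    # ~mu + 1/lam = 7.0
print(total, mean)
```

Fitting such a kernel (instead of locating a raw correlation peak) could absorb the exponential tail that subsurface scattering adds to the measured kernel.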
Conclusion:
In this paper, we proposed a technique that increases the temporal resolution of transient imaging by translating temporal-domain sampling into spatial-domain sweeping. Theoretically, we derived the conditions required to align the light sources uniformly and to keep calibration simple. Though we make only a simple modification to the existing PMD-based transient imaging system, we have demonstrated that our method improves temporal resolution dramatically in several scenes.
8. Acknowledgments
Most of the hardware and software design of our system was provided by Achuta Kadambi [6]. We are extremely grateful for the detailed documentation and the in-depth instructions provided by Kadambi et al. that allowed us to build our prototype. This work was partially supported by Sony Corporation and by NSF Grants IIS:1116718 and CCF:1117939.
Appendix A. Systematic error analysis of phase insertion
In the case of a single-path scene, the amount of phase insertion is a function of the angle between the axis of the light source array and the light ray to the subject. This angle dependence of spatial phase-sweep is illustrated in Fig. 12 (a). Each pixel has a different amount of phase insertion when the light source is changed. We define the systematic error as the maximum difference in the phase shifts introduced to all the pixels by the change in the light source. It is possible to account for these differences in the calibration process by estimating the angle for each pixel. However, this demands additional calibration steps to obtain such information. Hence, to keep the system simple, we evaluate the systematic error theoretically to find the limit on the number of light sources below which we can neglect the phase difference between pixels in the same frame.

Fig. 12: Systematic error: (a) d is the distance between subject and light source, Δd (≪ d) is the distance between two consecutive light sources, and θ is ∠AOB. The systematic error is computed in Eq. 7. (b) Black solid lines indicate the phases controlled by the PLL, which are the same for all pixels. The red and orange lines are the phases inserted for points A and B, respectively, around φ_j by the light source array. The systematic error is the difference between the red and orange lines. The maximum error happens at the inserted phase furthest from the black solid lines and is given by ⌊N/2⌋ Δd (1 − cos θ), where N is the size of the light source array.

Consider a simple situation where we have a planar subject and the light sources are perpendicular to the subject, as shown in Fig. 12. We first calculate the amount of phase shift introduced for a point A, S = |O′A| − |OA|, as a function of θ, and then find the systematic error by maximizing the difference between the two farthest points A and B. For simplicity, let us assume that B is on the line of the light source array; hence, |O′B| − |OB| = Δd. Using simple trigonometry, S and its first-order Maclaurin expansion can be written in terms of Δd as follows:

S = \sqrt{\frac{d^2}{\cos^2\theta} + 2d\,\Delta d + \Delta d^2} - \frac{d}{\cos\theta} \simeq \Delta d \cos\theta \qquad (7)

with a remainder (error) term:

|R_1| = \left| \frac{a^2 - b^2}{2\left(a^2 + 2bc + c^2\right)^{3/2}} \, \Delta d^2 \right| \le \left| \frac{a^2 - b^2}{2a^3} \right| \Delta d^2 \qquad (8)

where a = d/cos θ and b = d are substitution variables, c ∈ (0, Δd), and R_1 is the remainder term of the first-order Maclaurin expansion. According to Eq. 7, the difference in the inserted phases for points A and B is proportional to the amount of phase shift introduced (Δd). Suppose we want to increase the temporal resolution by N times. We make sure that the worst-case inserted phase does not deviate from the ideal phase by more than the amount of phase shift introduced. Hence, we have

\left\lfloor \frac{N}{2} \right\rfloor \Delta d \,(1 - \cos\theta) \le \Delta d \;\Rightarrow\; \left\lfloor \frac{N}{2} \right\rfloor \le \frac{1}{1 - \cos\theta} \qquad (9)

where Δd indicates the minimum distance between the light sources below which the pixels in the same measurement can be considered to have the same phase. This is illustrated in Fig. 12 (b). Suppose that the illuminated range is less than 50° (θ ≤ 25°), as for a normal lens; then the maximum magnification in temporal resolution will be N ≤ 21.3. From Eq. 8, we can evaluate the accuracy of the approximation in Eq. 7. As mentioned in Sec. 5, the scale of our setup is as follows: d ≥ 0.1 m, θ ≤ 25°, and Δd ≤ 0.03 m. Then the approximation error is less than 7.3 × 10⁻⁴ m.
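The bounds above can be checked numerically. The sketch below (ours, using the equations as reconstructed here) compares the exact shift S with the first-order approximation Δd·cos θ, evaluates the Eq. 8 remainder bound, and computes the Eq. 9 magnification limit:

```python
import numpy as np

def exact_shift(d, theta, dd):
    """Exact inserted shift S = |O'A| - |OA| for a point at angle theta."""
    a = d / np.cos(theta)                            # |OA|
    return np.sqrt(a**2 + 2.0 * d * dd + dd**2) - a  # |O'A| - |OA|

d, theta, dd = 0.1, np.deg2rad(25.0), 0.03   # setup scale quoted in the appendix
a, b = d / np.cos(theta), d

approx = dd * np.cos(theta)                  # Eq. 7 first-order approximation
err = abs(exact_shift(d, theta, dd) - approx)
bound = (a**2 - b**2) / (2.0 * a**3) * dd**2 # Eq. 8 upper bound on |R_1|
n_max = 2.0 / (1.0 - np.cos(theta))          # Eq. 9: N <= 2 / (1 - cos theta)

print(err)     # actual remainder, below the bound
print(bound)   # ~7.3e-4 m, the figure quoted in the text
print(n_max)   # ~21.3, the maximum magnification N
```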