A Sketching Framework for Reduced Data Transfer in Photon Counting Lidar
Michael P. Sheehan, Member, IEEE, Julián Tachella, Member, IEEE, and Mike E. Davies, Fellow, IEEE
Abstract—Single-photon lidar has become a prominent tool for depth imaging in recent years. At the core of the technique, the depth of a target is measured by constructing a histogram of time delays between emitted light pulses and detected photon arrivals. A major data processing bottleneck arises on the device when either the number of photons per pixel is large or the resolution of the time stamp is fine, as both the space requirement and the complexity of the image reconstruction algorithms scale with these parameters. We solve this limiting bottleneck of existing lidar techniques by sampling the characteristic function of the time of flight (ToF) model to build a compressive statistic, a so-called sketch of the time delay distribution, which is sufficient to infer the spatial distance and intensity of the object. The size of the sketch scales with the degrees of freedom of the ToF model (number of objects) and not, fundamentally, with the number of photons or the time stamp resolution. Moreover, the sketch is highly amenable to on-chip online processing. We show theoretically that the loss of information from compression is controlled and the mean squared error of the inference quickly converges towards the optimal Cramér-Rao bound (i.e. no loss of information) for modest sketch sizes. The proposed compressed single-photon lidar framework is tested and evaluated on real-life datasets of complex scenes where it is shown that a compression rate of up to 1/150 is achievable in practice without sacrificing the overall resolution of the reconstructed image.
Index Terms—Single-Photon Lidar, Empirical Characteristic Function, Compressive Learning, Summary Statistics
I. INTRODUCTION

SINGLE-photon light detection and ranging (lidar) has emerged as an important depth imaging technique prevalent in the automobile [1], [2], defence [3] and forestry industries [4]. This modality has the unique advantage of offering very high depth resolution [5] even at long-range scenes using low-power (eye-safe) lasers [6]. At its core, the technique emits light pulses and detects each single photon as it arrives, thereby obtaining a depth estimate by measuring the round-trip time of individual photons. By using a time correlated single-photon counting (TCSPC) system, a histogram can be formed indicating the time delay between emitted light pulses and detected photons for each pixel, with a proportion of the photons originating from background or ambient light (e.g. the sun). The number of counts per time histogram bin provides information on the depth and reflectivity of the object or scene. The presence of a peak in the histogram indicates an object is present within the range of the lidar system. The location of this object corresponds to the location of the impulse response. If the material is semi-transparent
(e.g. glass, water, camouflage) or the laser footprint is large, then multiple peaks with different intensities may exist within a single pixel [5]. A standard example of a TCSPC histogram for a given pixel within a scene is shown in Figure 1.

M. Sheehan, J. Tachella and M. Davies are with the School of Engineering, University of Edinburgh. This work is supported by ERC Advanced grant 694888, C-SENSE, and a Royal Society Wolfson Research Merit Award. Correspondence to M. Sheehan (Email: [email protected]).
Fig. 1: An example of a TCSPC histogram of a pixel in a complex scene including a semi-transparent material (camouflage) in front of a person, depicted by the two spikes of the histogram respectively.

The image restoration task reduces to inferring the positions and intensities of the peaks in the histogram for each pixel in the image. Typically, either the time stamps of all photons or a histogram of bin width ∆t has to be recorded, stored in memory and transferred from the chip for each pixel in the scene. The development of high rate, high resolution, low power ToF image sensors is challenging due to the large data volumes required. This causes a major data processing bottleneck on the device when either the number of photons per pixel n is large, the time resolution ∆t is fine, or the spatial resolution is high, as the space requirement, power consumption and computational burden of the depth reconstruction algorithms scale with these parameters [7].

Various existing methods have attempted to tackle the trade-off between depth resolution and computational/space complexities. A number of papers [8]–[12] propose methods to address the trade-off between depth resolution and the complexities associated with the TCSPC histogram. Henderson et al. [8] propose a method that employs a gated procedure to coarsely bin the detected photons, whilst Ren et al. [9] develop a sliding window approach to achieve high resolution depth. Walker et al. [10] calculate the depth directly from the photon time stamps. However, in all of these approaches, the approximations formed on-chip compromise the depth resolution of the image. Della Rocca et al. [11], [12] propose to only collect the histograms of photon detections when there is a significant change of activity. This method reduces the data transfer, as it is only required during specific moments in time. Similarly, Hutchings et al. [13] propose a method of discarding photon detections based on activity.
However, these methods can potentially remain idle when there is a small change in activity, and can also suffer from a loss of temporal resolution due to coarse histogram binning. Zhang et al. [14] propose a method of reducing the transfer of photon detections by performing a coarse-to-fine approximation of the histogram of time-of-flight. At each scale, a coarse histogram is constructed with a limited number of bins. Multiple histograms of increasing resolution have to be formed, hence the method has an increased total acquisition time and can also suffer from a loss of temporal resolution.

Compressive sensing strategies have been successfully applied to lidar [15]–[17], focusing on compressing the information across pixels. Kadambi et al. [16] propose to exploit the sparsity of natural scenes in some representation domain (e.g. wavelet transform) to greatly reduce signal acquisition. The depth accuracy is limited by the level of amplitude noise and decay of the impulse response, and the method is therefore limited to the case of one surface per pixel. In a similar vein, Halimi et al. [18] propose an adaptive sampling strategy that is scene dependent. By building up regions of interest and data-driven depth maps in an iterative manner, they efficiently choose suitable scan positions to reduce acquisition time by up to 8 times in comparison to uniform sampling in certain scenarios. However, these methods perform compression within the spatial domain and not, as in the case of our method, throughout the depth or time domain, and are therefore fundamentally different in practice to the method proposed in this paper.

Another approach to reduce the data transfer of the information needed to reconstruct the lidar image is to compress the data on-chip.
As highlighted in [19], standard low-level data compression methods can be used to compress the data on-chip; however, these methods can only offer up to a modest data reduction, and in some cases involve significant on-chip computation or limitations with respect to on-chip storage.

In this paper, we propose a novel solution to this bottleneck of existing lidar techniques by calculating on-the-fly summary statistics of the photon time stamps, a so-called sketch, based on samples of the characteristic function of the ToF model. Distinct from compressive sensing, the goal here is not to recover the photon counting data but rather the underlying probability distribution. In this sense, we are estimating the probability model directly from some summary statistics, and therefore our proposed framework utilises much of the theory found in the generalised method of moments [20], [21], empirical characteristic function [22], [23] and compressive learning [24]–[26] literature. The size of the sketch scales with the degrees of freedom of the ToF model (i.e., the number of objects in depth) and not with the number of photons or the fineness of the time resolution, without sacrificing precision in depth. The sketch can be computed for each incoming photon in an online fashion, only requiring a minimal amount of additional computation which can be performed efficiently on-chip. The sketch can be shown to capture all the salient information of the histogram, including the ability to explicitly remove background light or dark count effects, in a compact and data-efficient form, suitable for both on-chip processing and off-chip post-processing. Furthermore, we develop a compressive lidar image reconstruction algorithm whose computational complexity depends only on the size of the sketch. Our proposed method paves the way for high accuracy 3D imaging at fast frame rates with low power consumption.
In summary, the main contributions of the paper are as follows:
• We propose a principled approach for compressing time-of-flight information in an online fashion without the requirement to form a histogram and without compromising depth resolution.
• A compressive single-photon lidar algorithm is proposed which does not scale with either the number of photons or the time stamp resolution in terms of space and time complexity.
• The statistical efficiency, given a compression rate (or sketch size), is quantified for different single-photon lidar scenarios, showing that only limited measurements of the characteristic function are needed to achieve negligible information loss.

The remainder of the work is organized as follows. Section II details the ToF observation model used in single-photon lidar and also presents the idea of summary statistics used for parameter estimation, including the empirical characteristic function and compressive learning statistics. In Section III we detail our proposed compressive single-photon lidar scheme, including a reconstruction algorithm whose computational complexity scales with the sketch size m, as well as quantifying the statistical efficiency of the estimated parameters θ. Results of the compressive lidar framework are analysed on both synthetic and real datasets in Section IV. Section V finally summarizes our conclusions and discusses future work.

II. BACKGROUND
A. Lidar Observation Model
The photon count at time stamp t ∈ [0, T − 1], for an arbitrary pixel, can be modelled as a Poisson distribution [27], [28]:

y_t | (r, b, t_k) ∼ P(r h(t − t_k) + b),   (1)

where r ≥ 0 denotes the reflectivity of the detected surface, h(·) is the impulse response of the system and b defines the level of background photons. The number of discretized time stamp bins over the range of interest is denoted by T. The time stamp t is discretized over the range [0, T − 1] depending on the time-stamp resolution ∆t. For simplicity, here we assume that the integral of the impulse response H = Σ_{t=1}^{T} h(t) is constant, although the proposed approach can accommodate more complex scenarios. If the lidar system is in free running mode, where multiple acquisitions of a surface/object are obtained, then the interval [0, T − 1] can be thought of as circular, in the sense that time-stamp T is equivalent to time-stamp 0.

Alternatively, one can instead model the time of arrival of the p-th photon detected. We assume there are K distinct reflecting surfaces, where α_k and α_0 denote the probability that the detected photon originated from the k-th surface and background sources, respectively. Let x_p ∈ [0, T − 1] denote the time stamp of the p-th photon, where 1 ≤ p ≤ n; then x_p can be described by a mixture distribution [29]

π(x_p | α_0, ..., α_K, t_1, ..., t_K) = Σ_{k=1}^{K} α_k π_s(x_p | t_k) + α_0 π_b(x_p),   (2)

where Σ_{k=0}^{K} α_k = 1. The distributions of the photons originating from the signal and the background are defined by π_s(x_p | t) = h(x_p − t)/H and the uniform distribution π_b(x_p) = 1/T over [0, T − 1], respectively. Often in practice, the signal distribution π_s is modelled either using a discretized Gaussian distribution over the interval [0, T − 1] or through the data-driven impulse function which is calculated through experiments. In Section IV, we consider both.

B. Summary Statistics
Our acquisition goal is to obtain parameter estimates of the signal model in (2), given the time stamps of the photons detected. Parameter estimation usually involves the inference of a set of parameters θ ∈ Θ ⊂ R^{2K+1} associated with a probability model π(· | θ) defined on some space x ∈ R^d. In the case of single-photon lidar, the dimension is d = 1. Typically, we observe a finite dataset X = {x_i}_{i=1}^{n} of n samples which we assume is sampled from the distribution given in (2). Maximum likelihood estimation (MLE) is a traditional parameter estimation method whereby a likelihood function associated with the finite data is maximized with respect to the model parameters, e.g.

θ̂ = arg max_θ (1/n) Σ_{i=1}^{n} log π(x_i | θ).   (3)
1) Generalised Method of Moments:
In some cases, the likelihood function might not have a closed form solution nor a computationally tractable approximation [20]. The generalised method of moments [20], [21] (GeMM) is an alternative parameter estimation method where one estimates θ by matching a collection of generalised moments with an empirical counterpart computed over a set of finite data sampled from the distribution π(x | θ). Given a nonlinear function g : R^d → C^m, we define the expectation constraint

E g(x; θ) = 0,   (4)

where E denotes the expectation with respect to the probability distribution π(x | θ). Typically, the GeMM estimator is obtained by minimising a quadratic cost of the empirical discrepancy with respect to θ, to try to impose the moment constraints of (4). Let us define

g_n(X; θ) := (1/n) Σ_{i=1}^{n} g(x_i; θ),   (5)

calculated over X = {x_i}_{i=1}^{n}; then a GeMM estimator classically takes the form [20], [21]

θ̂ := arg min_θ g_n(X; θ)^T W g_n(X; θ),   (6)

where W is a symmetric positive definite weighting matrix that may depend on θ.
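As a toy illustration of (5)-(6), outside the lidar setting, the following minimal sketch estimates the rate of an exponential distribution by matching its first two moments with W taken to be the identity. The distribution, sample size and moment choices are illustrative assumptions, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
lam_true = 2.0
x = rng.exponential(1.0 / lam_true, size=5000)  # simulated data X

def gemm_loss(lam):
    # g_n(X; lam) as in (5): empirical moments minus model moments
    # (for Exp(lam): E[x] = 1/lam, E[x^2] = 2/lam^2)
    g_n = np.array([x.mean() - 1.0 / lam,
                    np.mean(x ** 2) - 2.0 / lam ** 2])
    return g_n @ g_n  # quadratic cost (6) with W = identity

lam_hat = minimize_scalar(gemm_loss, bounds=(0.1, 10.0), method="bounded").x
```

With identity weighting this is the simplest GeMM instance; a θ-dependent W, as discussed later in the paper, would weight the two moment residuals by their (co)variances.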
2) Compressive Learning:
Building on the concept of GeMM, compressive learning [24], [25] utilises generalised moments of the data, but with the distinct goal of reducing signal acquisition, space and time complexities. The link to GeMM is established by separating the function g into the following particular form:

g(x; θ) = Φ(x) − E_θ Φ(x),   (7)

where Φ : R^d → C^m is often referred to as the feature function. The separable form decouples the measured moments, Φ(x), from the parameters θ that are to be estimated. This is not a usual assumption in GeMM, although it may arise in particular cases. Denoting the empirical mean, or the so-called sketch, as

z_n := (1/n) Σ_{i=1}^{n} Φ(x_i),   (8)

we can estimate θ solely from the sketch z_n by minimising

θ̂ = arg min_θ ‖z_n − E_θ Φ(x)‖²_W,   (9)

which is the particular compressive GeMM loss of (6). In Section III, we explicitly define the weighting matrix W for compressive single-photon lidar.

The separable form of g in (7) allows a sketch statistic z_n to be formed in a single pass of the data without the need to store X, and it can easily be updated on the fly with minimal computational cost. The sketch statistic has size m, or size 2m if decoupled into its real and imaginary components, which, fundamentally, scales independently of the dimensions of the dataset X, which in the case of single-photon lidar is the photon count n or the binning resolution T.
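The single-pass, online nature of the sketch (8) can be sketched as follows: each incoming photon time stamp updates z with O(m) work, and the raw data never needs to be stored. The bin count T, the sketch size m and the exponential feature frequencies are illustrative choices here; the frequency design for lidar is discussed in Section III.

```python
import numpy as np

T, m = 1000, 4                               # assumed bin count and sketch size
omega = 2 * np.pi * np.arange(1, m + 1) / T  # illustrative frequencies

def phi(x):
    """Complex feature function Phi(x) = [exp(i*omega_j*x)]_j."""
    return np.exp(1j * omega * x)

rng = np.random.default_rng(1)
photons = rng.integers(0, T, size=600)       # simulated photon time stamps

# Online running-mean update: z_p = z_{p-1} + (Phi(x_p) - z_{p-1}) / p
z = np.zeros(m, dtype=complex)
for p, x_p in enumerate(photons, start=1):
    z += (phi(x_p) - z) / p

# Identical (up to rounding) to the batch empirical mean of (8)
z_batch = np.exp(1j * np.outer(photons, omega)).mean(axis=0)
```

The running-mean form is what makes the sketch attractive for on-chip computation: only the m complex accumulators and the photon counter are retained between detections.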
3) Empirical Characteristic Function:
A specific type of GeMM is empirical characteristic function (ECF) estimation [22], [23], [30], which occurs when the generalized moment is chosen to be Φ(x) = [e^{i ω_j^T x}]_{j=1}^{m}, where i = √−1 and {ω_j} is a discrete set of frequencies. It is of particular interest as the expectation of Φ, namely Ψ_π(ω) = E_θ e^{i ω^T x}, is precisely the characteristic function (CF) of the probability distribution π(x | θ) at frequency ω. The CF exists for all distributions, and often has a closed form expression. Moreover, it captures all the information of the probability distribution [31], giving a one-to-one correspondence between the CF and the probability distribution π(x | θ). The CF also has the favourable property that it decays in frequency, i.e. Ψ_π(ω) → 0 as ω → ∞, under mild conditions on the probability distribution π(x | θ) [31], [32].

For the single depth observation model in (2) (i.e. K = 1) and a discrete impulse response function h, we define the characteristic function of the observation model by

Ψ_π(ω) = α_1 Ψ_{π_s}(ω) + α_0 Ψ_{π_b}(ω) = α_1 ĥ(ω) e^{iωt_1} + α_0 sinc(ωT/2),   (10)

where sinc(x) = sin(x)/x and ĥ denotes the (discrete) Fourier transform of the impulse response function h. It should be noted that we could consider different distributions π_b, and hence CFs, to model detected photons originating from more complex background sources, although this is beyond the scope of this paper.

The feature function Φ is a complex valued function of size m. With regards to hardware implementation, it is often preferable and convenient to work directly with real valued functions. The complex term e^{iωx} can alternatively be written as cos(ωx) + i sin(ωx), where e^{iωx} has been decoupled into its real and imaginary components. As a result, the feature function Φ can be equivalently written as a real valued feature function Φ : R^d → R^{2m}, consisting of 2m real valued terms obtained by stacking the real and imaginary components, e.g.
Φ(x) = [cos(ω_1 x), ..., cos(ω_m x), sin(ω_1 x), ..., sin(ω_m x)]^T.

For the sake of fair comparison to existing hardware-implemented methods in the literature, the results and figures presented represent a sketch of size 2m, consisting of 2m real valued measurements. The nature of the feature function, in terms of it being represented by a complex or real valued function, will be made clear from its context throughout the paper.

III. SKETCHED LIDAR
We start with a warm-up example to highlight the potential of using a sketch for single-photon lidar and to motivate the design of the sketch sampling procedure, which will be discussed in Section III-B1.
A. Compressing Single Depth Data
In the absence of photons originating from background sources, and in the presence of a single surface or object, the sample mean of all the photon time stamps (Φ(x) = x) is the simplest summary statistic for estimating the single location parameter t_1. This only holds in the noiseless case, as the sample mean estimate is heavily biased toward the centre of the histogram when background photons are detected.

Suppose we instead observe the cosine and sine of each photon count x with angular frequency ω = 2π/T, namely

Φ(x) = [cos(2πx/T), sin(2πx/T)]^T,   (11)

and denote by z_n the real valued sketch of size 2 (m = 1) computed over the dataset X as in (8). It is possible to recover an estimate of the single depth location parameter t_1 directly from the sketch, without recourse to the data X, via the trigonometric sample mean

t̂_1 = (T/2π) arg( Σ_{j=1}^{n} cos(2πx_j/T) + i Σ_{j=1}^{n} sin(2πx_j/T) ),   (12)

where arg is the complex argument. As the background photons are distributed uniformly over the interval [0, T − 1] (π_b(x) = 1/T), the expected moment of the photons originating from background sources is zero, E_{x∼π_b} Φ(x) = 0. The resulting estimate of the single depth parameter t̂_1 is therefore an unbiased estimator of the location parameter t_1. The estimator in (12) coincides with the circular mean estimator detailed in [33].

We summarise the above using a simulated example, where a pixel of T = 1000 histogram bins with a signal-to-background ratio (SBR) of 1 and a total of n = 600 photons is simulated, where the time stamp of each photon is denoted by X = {x_i}_{i=1}^{n}. The data was simulated using a Gaussian impulse response function with σ = 15 and a true position at time stamp t_1 = 320. Computing the sketch z_n from (8) and using (12), we obtain the sketch estimate t̂_cm ≈ 323 and the sample mean estimate t̂ ≈ 434.
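A minimal sketch of this experiment, under the same assumed settings (T = 1000, Gaussian IRF with σ = 15, t_1 = 320, n = 600, SBR = 1, so roughly half the photons are uniform background), is as follows; exact numbers will differ from the paper's run since the random draws differ.

```python
import numpy as np

rng = np.random.default_rng(0)
T, t1, sigma, n = 1000, 320, 15.0, 600

# SBR = 1: half the photons from the Gaussian peak, half uniform background
signal = rng.normal(t1, sigma, n // 2) % T
background = rng.uniform(0, T, n - n // 2)
x = np.concatenate([signal, background])

# Real valued sketch of size 2 (m = 1): cosine and sine moments at w = 2*pi/T
w = 2 * np.pi / T
z = np.array([np.mean(np.cos(w * x)), np.mean(np.sin(w * x))])

# Circular mean estimate (12) versus the background-biased sample mean
t_cm = (T / (2 * np.pi)) * np.arctan2(z[1], z[0]) % T
t_mean = x.mean()
```

The circular mean lands near the true bin 320, while the plain sample mean is dragged toward the centre of the interval by the uniform background, exactly the bias described above.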
The TCSPC histogram, along with both the circular and standard mean estimates as well as the location parameter t_1, is shown in Figure 2, where it is evident that the circular mean estimate does not suffer from the noise bias inherent in the sample mean.

Fig. 2: The TCSPC histogram with t_1 = 320, with the circular mean estimate (yellow) and the standard mean estimate (red) superimposed.

Importantly, the sketch formed using the moment in (11) is equivalent to the complex valued ECF sketch z_n = (1/n) Σ_{j=1}^{n} e^{iωx_j} sampled at ω = 2π/T, decoupled into its real and imaginary components. In fact, the estimate t̂_1 in (12) is the optimal estimator for the compressive ECF sketch detailed in (9) (see Appendix). Principally, we only need to store and transfer 2 values to accurately estimate the depth location of the object or surface, without needing recourse to the original photon time-stamped data. For the remainder of this section, we generalize the approach of forming a sketch in (8) of arbitrary size by sampling the ECF at multiple frequencies [ω_i]_{i=1}^{m}. This will enable us to obtain statistically efficient estimates for the single surface case and to solve more complex lidar scenes, including several surfaces with varying intensities, where more salient information of the observation model is required.

B. Sampling the ECF
Recall that the observation model π in (2) is discretized over the interval [0, T − 1], which we can consider to be a sufficient sampling if the distribution in (2) is approximately bandlimited. As a result, the characteristic function Ψ_π(ω) has a finite basis characterized by the set of frequencies

{ 2πj/T | j ∈ [0, T − 1] }.   (13)

We can generalise the approach from Section III-A by sampling multiple frequencies from the finite basis in order to construct the ECF sketch. As is the case for the circular mean, the frequencies ω = 2πj/T for j ∈ [1, T − 1] correspond to the zeros of the sinc function associated with the background pdf π_b seen in (10). We can therefore construct a sketch of arbitrary dimension m that is also blind to photons originating from background sources by avoiding the zero frequency ω = 0 of the finite basis. As a result, we define the set of orthogonal frequencies by

Ω := { ω_j = 2πj/T | j ∈ [1, T − 1] }.   (14)

We coin this set the orthogonal frequencies, as it defines regions over the interval of the observation model's characteristic function where the signal's contribution is orthogonal to the background's contribution.
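The background-blindness of the orthogonal frequencies can be checked numerically: for a discrete uniform background over [0, T − 1], the geometric sum defining its CF vanishes exactly at ω_j = 2πj/T, while an off-grid frequency does not. T and the tested frequencies are illustrative choices.

```python
import numpy as np

T = 1000
bins = np.arange(T)

def background_cf(w):
    # CF of the discrete uniform background pi_b over [0, T-1]:
    # (1/T) * sum_t exp(i*w*t), a full geometric sum
    return np.mean(np.exp(1j * w * bins))

w_orth = 2 * np.pi * 3 / T    # j = 3: a member of the orthogonal set Omega
w_off = 2 * np.pi * 3.5 / T   # halfway between grid frequencies, not in Omega

# |background_cf(w_orth)| is numerically zero; |background_cf(w_off)| is not
```

This is the computational counterpart of the sinc zeros in (10): any sketch built exclusively from Ω accumulates, in expectation, no contribution from uniform background photons.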
1) Sampling Schemes:
In order to construct a sketch, we are ultimately interested in retaining sufficient salient information of the characteristic function Ψ_π such that we can identify and estimate the unique location and intensity parameters θ of the observation model π(x | θ) defined in (2). It was discussed in Section II that the CF of a probability distribution decays in frequency, i.e. Ψ_π(ω) → 0 as ω → ∞. Furthermore, as the observation model is discretized over the interval, we assume that the characteristic function of the observation model is approximately band-limited. A natural sampling scheme would therefore be to sample the first m frequencies of the orthogonal frequencies Ω to capture the maximum energy of the CF. In other words, we could truncate the CF of the observation model whilst avoiding the zero frequency.

Alternatively, in [24], [25], provable guarantees for estimating mixtures of Gaussian models have been provided, under certain conditions based on random sampling (cf. compressive sensing [34]) of the CF. It is understood that the higher frequencies of the CF may provide further information to help discriminate distributions that are close in probability space. Moreover, if the CF decays slowly in frequency, then the energy of the CF will be spread more throughout the set of orthogonal frequencies. We therefore provide an alternative sampling scheme whereby we randomly sample the set of orthogonal frequencies with respect to some sampling law Λ. In a similar design to the frequency sampling pattern proposed in [35], we sample the orthogonal frequencies by

(ω_1, ω_2, . . . , ω_m) ∼ Λ_ĥ,   (15)

where Λ_ĥ ∝ |ĥ|. To formalize, we consider the following sampling schemes in order to construct our ECF sketches:
1) Truncated Orthogonal Sampling: sample the first m frequencies, i.e. j = 1, 2, . . . , m, from Ω.
2) Random Orthogonal Sampling: sample the set of frequencies randomly, governed by the distribution law Λ_ĥ.
Depending on the circumstances of the lidar device, we might expect one or the other sampling scheme to perform better.

C. Statistical Estimation
Once the ECF sketch is constructed using either sampling scheme, we must estimate the parameters θ of the observation model π(x | θ) solely from the sketch z_n. In general, there is no closed form expression to estimate θ from a sketch of arbitrary size, as was the case for the circular mean estimate in (12).

It is well documented in the ECF and GeMM literature, e.g. [21], [22], [36], that a complex valued ECF sketch z_n of size m, computed over a finite dataset X = {x_1, . . . , x_n}, satisfies the central limit theorem. Formally, a sketch z_n ∈ C^m converges asymptotically to a Gaussian random variable

z_n →_dist N(Ψ_π, n^{-1} Σ_θ),   (16)

where Σ_θ ∈ C^{m×m} has entries (Σ_θ)_{ij} = Ψ_π(ω_i − ω_j) − Ψ_π(ω_i) Ψ_π(−ω_j) for i, j = 1, 2, . . . , m.

The asymptotic normality result in (16) naturally leads to a sketch maximum likelihood estimation (SMLE) algorithm that consists of minimising the following:

arg min_θ  log det(Σ_θ) + n (z_n − z_θ)^H Σ_θ^{-1} (z_n − z_θ),   (17)

where for convenience we denote z_θ = [Ψ_π(ω_j)]_{j=1}^{m}. For an observation model consisting of K surfaces and a general impulse response function h, recall that

z_θ = [ Σ_{k=1}^{K} α_k ĥ(ω_j) e^{iω_j t_k} ]_{j=1}^{m}   (18)

and θ = (α_0, α_1, . . . , α_K, t_1, . . . , t_K). Note that we have dropped the sinc function on the assumption that we are using one of the proposed sampling schemes. Minimising (17) is equivalent to minimising the compressive GeMM objective function defined in (9) with the weighting function chosen to be W = Σ_θ^{-1}. The weighting matrix W = Σ_θ^{-1} is asymptotically optimal in the sense that it minimises the variance of the estimator θ̂ obtained from the sketch z_n [21].

In practice, Σ_θ depends on θ, as it is a function of the underlying parameters θ that are to be estimated. There are various well established methods in the GeMM and ECF literature [20], [21] that tackle the difficulty of approximating Σ_θ and estimating θ simultaneously.
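As a concrete illustration of the SMLE loss, the following minimal sketch fits a single-surface model by minimising (17) with Σ_θ fixed to the identity (the least-squares simplification discussed in this section, not the full CUE). The scene parameters, Gaussian IRF and truncated orthogonal frequencies are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T, sigma, t1, alpha1, n = 1000, 15.0, 320.0, 0.5, 2000

# Simulate photons: a fraction alpha1 from the surface, the rest background
n_sig = rng.binomial(n, alpha1)
x = np.concatenate([rng.normal(t1, sigma, n_sig) % T,
                    rng.uniform(0, T, n - n_sig)])

m = 5
omega = 2 * np.pi * np.arange(1, m + 1) / T         # truncated orthogonal sampling
z_n = np.exp(1j * np.outer(x, omega)).mean(axis=0)  # ECF sketch (8)

h_hat = np.exp(-0.5 * (omega * sigma) ** 2)         # Gaussian IRF in frequency

def z_model(theta):
    alpha, t = theta
    return alpha * h_hat * np.exp(1j * omega * t)   # z_theta as in (18), K = 1

def loss(theta):
    r = z_n - z_model(theta)
    return np.real(r @ r.conj())                    # (17) with Sigma_theta = I

# Initialise the depth with the analytic circular mean (12)
t0 = (T / (2 * np.pi)) * np.angle(z_n[0]) % T
res = minimize(loss, x0=np.array([0.5, t0]), bounds=[(0.0, 1.0), (0.0, T)])
alpha_hat, t_hat = res.x
```

The circular-mean initialisation places the optimiser close to the global minimum, mirroring the initialisation strategy described below for the single-surface case.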
In [31], the K-L method is used, which iteratively estimates Σ_θ and θ in a two-stage procedure by fixing and updating one at a time. Some methods [37] fix Σ_θ after only a few iterations of the K-L approach to reduce the computational complexity of the algorithm, although this typically comes at the cost of introducing sample bias [38]. Occasionally, the covariance matrix is set throughout to be the identity, Σ_θ = I, reducing (17) to a standard least squares optimization, but this generally results in a less statistically efficient estimator θ̂ [20].

In this paper, we use the method of estimating Σ_θ and θ simultaneously at each iteration. This approach is commonly referred to as the Continuous Updating Estimator (CUE) [37]; it obtains estimates that do not suffer from the sample bias of the two-step K-L approach [38] and can often lead to more statistically efficient estimators [20]. However, the SMLE method is not restricted to the CUE, and in certain situations practitioners may choose to sacrifice unbiased and efficiently optimal estimators for a reduced computational complexity by considering the other methods discussed.

The optimisation problem in (17) is also typically non-convex and can suffer from spurious local minima. For the case where there is only a single surface, we initialise the SMLE algorithm using the analytic circular mean solution in (12) with minimal added computational overhead. From our experience with synthetic and real data, the circular mean estimate generally initialises the SMLE algorithm within the basin of the global minimum, hence the issues associated with non-convex optimization are circumvented. For the case of multiple surfaces, we form a coarse uniform grid across [0, T − 1]^K and initialise at the smallest SMLE loss.

D. Central Limit Theorem
One of the main advantages of the SMLE lidar approach, following from (16), is that even at low photon levels (i.e. small n), the SMLE estimates quickly follow the central limit theorem (CLT) and the sketch provides a good approximation of its expectation. In contrast, the TCSPC histogram used for many of the estimation methods discussed in Section I is a poor approximation to its expectation, as each time stamp bin t has only a small number of photons. Thus efficient processing of the full histogram data requires careful consideration of the underlying Poisson statistics [39].

This is illustrated in Figure 3, which shows four separate histograms of the error (t̂_1 − t_1) for increasing photon count n, along with the asymptotic Gaussian distribution from (16). The estimate t̂_1 was obtained from a real valued sketch of size 2 (m = 1) using the circular mean estimate in (12). The simulated data was the same as the motivating example in Section III-A, where a Gaussian IRF with σ = 15 was used. The SBR was set at 1 and the total number of time stamps was T = 1000. The total photon count varied from n = 10 to n = 10000, increasing by a factor of 10 each time. For each photon count, we estimated the location parameter t_1 a total of 1000 times, where the data X = {x_i}_{i=1}^{n} was simulated independently for each trial.

Even at extremely low photon counts of n = 10, the error (t̂_1 − t_1) can be reasonably approximated by a Gaussian random variable centred around 0. This suggests that the estimate t̂_1 quickly satisfies the central limit theorem with respect to the photon count n. In the large photon regime (n = 10000), the estimation error is concentrated tightly around zero and mostly contained within 5 time stamps. These results suggest that the sketched lidar CLT result of (16) holds even for low photon levels, hence the SMLE loss in (17) is a well-justified loss to minimise.

E. Statistical Efficiency
In this section, we calculate the theoretical statistical efficiency of the sketched lidar estimates, $\theta$, that parametrize the observation model $\pi(x \,|\, \theta)$ in (2), and compare them with the estimates obtained using the full data (i.e. no compression) using the relative error percentage. The relative error percentage, which will be defined below, is a key metric allowing us to quantify the relative loss of information given a sketch of size $m$ from a statistical point of view.

Fig. 3: Histograms of the error $(\hat{t} - t)$ for increasing photon count $n$, where the sketched lidar estimate (circular mean) is denoted by $\hat{t}$.

Statistical efficiency is a measure of the variability or quality of an unbiased estimator $\hat{\theta}$ [40]. The Cramér-Rao bound gives a lower bound on the mean squared error of $\hat{\theta}$ [41] and therefore provides a best case scenario on the variability of the parameter estimates. Given the observation model $\pi(x \,|\, \theta)$ and the corresponding Fisher information matrix (FIM), defined as
$$ I_{\mathrm{data}}(\theta) := n\, \mathbb{E}\!\left[ \frac{\partial \log \pi(x\,|\,\theta)}{\partial \theta} \frac{\partial \log \pi(x\,|\,\theta)}{\partial \theta}^{\!\top} \right], \qquad (19) $$
then the optimal Cramér-Rao mean squared error, in terms of the full data, is defined as
$$ \mathrm{RMSE}_n := \sqrt{ \sum_{k=1}^{K} \left[ I_{\mathrm{data}}(\theta)^{-1} \right]_{kk} }. \qquad (20) $$
Equivalently, we can compute the FIM for the sketched case using the normality result stated in (16), where the FIM of a multivariate Gaussian distribution [41] is defined as
$$ \left( I_{\mathrm{sketch}}(\theta) \right)_{ij} := n\, \frac{\partial z_\theta}{\partial \theta_i}^{\!\top} \Sigma_\theta^{-1} \frac{\partial z_\theta}{\partial \theta_j}, \qquad (21) $$
where $z_\theta$ is the sketch defined in (18). Similarly, we define the optimal sketched Cramér-Rao mean squared error as
$$ \mathrm{RMSE}_m := \sqrt{ \sum_{k=1}^{K} \left[ I_{\mathrm{sketch}}(\theta)^{-1} \right]_{kk} }. \qquad (22) $$
To quantify the statistical efficiency of an estimate obtained from a real valued sketch of size $m$, we use the relative error percentage (REP) metric, which compares the optimal sketch root mean squared error $\mathrm{RMSE}_m$ with the corresponding full data root mean squared error $\mathrm{RMSE}_n$, defined by
$$ \mathrm{REP} := 100 \left( \frac{\mathrm{RMSE}_m - \mathrm{RMSE}_n}{\mathrm{RMSE}_n} \right). \qquad (23) $$
Notably, the FIM of the sketched statistic in (21) scales with $n$, hence the REP metric is independent of the photon count.

We compare the statistical efficiency of the sketched lidar estimates to the alternative compression technique of coarse binning [8] discussed in Section I. The coarse binning approach can be seen as equivalent to constructing the summary statistic
$$ \tilde{z}_n = \sum_{i=1}^{n} \left\{ \mathbb{1}_{\left( (j-1)\Delta_{\tilde{m}},\, j\Delta_{\tilde{m}} \right]}(x_i) \right\}_{j=1}^{\tilde{m}}, \qquad (24) $$
where $\Delta_{\tilde{m}} = \lceil T / \tilde{m} \rceil$ denotes the down-sampling factor, $\tilde{m}$ denotes the number of measurements equivalent to the real valued sketch size (i.e. $\tilde{m} = 2m$) and $\mathbb{1}_{\left( (j-1)\Delta_{\tilde{m}},\, j\Delta_{\tilde{m}} \right]}(x)$ is the indicator function defined as
$$ \mathbb{1}_{\left( (j-1)\Delta_{\tilde{m}},\, j\Delta_{\tilde{m}} \right]}(x) := \begin{cases} 1 & \text{if } x \in \left( (j-1)\Delta_{\tilde{m}},\, j\Delta_{\tilde{m}} \right], \\ 0 & \text{otherwise.} \end{cases} \qquad (25) $$
Once the coarse binning sketch has been constructed, traditional estimation methods, e.g. log-matched filtering [42] or expectation maximization [43], can be employed on the sketch to estimate the parameters of the observation model.

Lidar scenes typically have only 0, 1 or 2 reflectors, therefore in the following experiments we only consider the cases $K = 1$ and $K = 2$. Moreover, we choose the settings of the lidar scene (e.g. binning resolution, peak location, intensity) to best replicate a realistic setting, as seen in Section IV-C. In each experiment, we consider two different impulse response functions (IRF), exhibiting a short and a heavy tail, respectively. Figure 4 depicts the contrasting IRFs and the magnitudes of their corresponding characteristic functions, $\Psi_{\pi_s}(\omega) = \hat{h}(\omega) e^{i\omega t}$.
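For comparison purposes, the coarse binning statistic in (24) is simply a down-sampled histogram of the photon time stamps. A minimal illustration follows (our own code, not the authors'; note that NumPy's half-open bin convention differs from the left-open intervals in (25) only at bin edges):

```python
import math
import numpy as np

def coarse_bin_sketch(x, T, m_tilde):
    """Coarse binning summary statistic: counts of time stamps per
    down-sampled bin of width ceil(T / m_tilde), as in (24)-(25)."""
    delta = math.ceil(T / m_tilde)
    # Bin j collects stamps in ((j-1)*delta, j*delta]; np.histogram uses
    # half-open [lo, hi) bins, which only differs exactly at bin edges.
    edges = np.arange(0, delta * m_tilde + 1, delta)
    counts, _ = np.histogram(x, bins=edges)
    return counts

x = np.array([3, 4, 250, 251, 251, 999])
print(coarse_bin_sketch(x, T=1000, m_tilde=4))  # 4 coarse bins of width 250
```

Traditional estimators (log-matched filtering, EM) then operate on these $\tilde{m}$ counts in place of the full $T$-bin histogram.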
We evaluate the statistical efficiency of the sketched and coarse binning estimates using the REP as a function of the number of real measurements $m$, and examine both the random and truncated orthogonal sampling schemes discussed in Section III-B1.
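Computationally, (20), (22) and (23) amount to the trace of an inverse FIM followed by a percentage. A small helper is sketched below (our own illustration; the Gaussian location-scale FIM used as a sanity check is a textbook case, not the lidar observation model):

```python
import numpy as np

def crb_rmse(fim):
    """Optimal RMSE sqrt(sum_k [I(theta)^-1]_kk), as in (20) and (22)."""
    return np.sqrt(np.trace(np.linalg.inv(fim)))

def rep(rmse_sketch, rmse_full):
    """Relative error percentage (23): relative loss of the sketch
    with respect to the full data bound."""
    return 100.0 * (rmse_sketch - rmse_full) / rmse_full

# Sanity check: n i.i.d. samples from N(mu, sigma^2) have per-sample FIM
# diag(1/sigma^2, 2/sigma^2), so the bound shrinks like 1/sqrt(n).
n, sigma = 100, 2.0
fim = n * np.diag([1 / sigma**2, 2 / sigma**2])
print(crb_rmse(fim))                              # sqrt(0.04 + 0.02) ~ 0.245
print(rep(1.05 * crb_rmse(fim), crb_rmse(fim)))   # a 5% REP
```

Because both FIMs in (19) and (21) scale linearly with $n$, the factor $n$ cancels in (23), which is why the REP curves below do not depend on the photon count.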
1) One Surface:
We first evaluate the REP for a single peak positioned at $t = 430$ with a binning resolution of $T = 1000$. We consider both low and high background photon count levels, where the signal to background ratio (SBR) was set at 10 and 1, respectively. Figure 5 shows the REP metric as a function of the number of real measurements $m$ for the truncated orthogonal (blue), random orthogonal (red) and coarse binning (orange) compression techniques, where the high (SBR=10) and low (SBR=1) background photon levels are denoted by solid and dashed lines, respectively. The top and bottom plots depict the short and heavy-tailed IRF, accordingly.

We first observe that the REP of both sketched lidar sampling schemes falls rapidly as the number of real measurements increases, and only a modest number of measurements is needed to obtain a low REP. In contrast, the coarse binning approach exhibits slow REP convergence and remains high throughout the
Fig. 4: A figure showing the CF (black) of a short (solid) and heavy (dashed) tailed impulse response function (red).
Fig. 5: A figure showing the REP as a function of the number of real measurements ($m$) for a single peak lidar scene. Top: short-tailed impulse response function. Bottom: heavy-tailed impulse response function.

measurement range. Importantly, we see that the different sketch sampling schemes outperform each other depending on the tail of the IRF, and hence on the rate of decay of the CF. For instance, the truncated scheme produces a lower REP for the short-tailed IRF, while the random sampling scheme achieves quicker convergence and a significantly lower REP throughout the measurement range for the heavy-tailed IRF. This can be explained by Figure 4: the CF of the short-tailed IRF has the majority of its energy contained within the first few ($m = 10$) frequencies, while the CF of the heavy-tailed IRF has its energy spread more evenly across its frequency range.
2) Two Surfaces:
We now evaluate the REP for a two peak case with locations $(t_1, t_2)$, where $t_1 = 320$, and a time stamp resolution of $T = 1000$. The intensities of the two peaks were chosen to simulate an object positioned behind a semi-transparent surface. We simulate both low and high background levels, where the SBR was
Fig. 6: A figure showing the REP as a function of the number of real measurements ($m$) for a lidar scene with 2 surfaces. Top: short-tailed impulse response function. Bottom: heavy-tailed impulse response function.

again set at 10 and 1, respectively. Figure 6 shows the REP metric as a function of the number of real measurements $m$ for the truncated orthogonal (blue), random orthogonal (red) and coarse binning (orange) compression techniques, where the high (SBR=10) and low (SBR=1) background photon levels are denoted by solid and dashed lines, respectively. The top and bottom plots depict the short and heavy-tailed IRF, accordingly. We see the same pattern as in the single surface case: the REP remains high for the coarse binning compression technique while, in contrast, the sketched lidar converges towards a relatively low REP within a modest number of measurements. We again observe that the truncated scheme performs best on a fast decaying CF, while the random sampling scheme outperforms its truncated counterpart when the CF decays slowly. Doubling the dimension of the parameter $\theta$ by estimating two peaks and intensities does not have a significant impact on the number of measurements needed to achieve a relatively low REP. For instance, in the high SBR (solid) scenario, the truncated orthogonal sampling scheme requires 20 real measurements ($m = 10$) to reach a low REP in the unimodal case, compared with 24 real measurements ($m = 12$) to achieve the same level of REP in the bimodal case.

These theoretical results on the statistical efficiency of the lidar sketch show that only a moderate sketch size is needed to achieve a negligible loss of information. The results are based on the asymptotic normality property discussed in (16), and we will see in Section IV-B that in practice this normality result holds even for small photon counts of $n = 10$.

IV. EXPERIMENTS
A. Experimental set up
In this section, we evaluate our compressive lidar framework on synthetic and real data with increasingly complex scenes. Our method is compared with classical algorithms working on the full data (i.e. no compression), namely log-matched filtering [42] and expectation maximization (EM) [43]. Moreover, we also compare our results to the alternative compression technique of coarse binning [8] discussed in Section I and defined in (24). Both the log-matched filtering and EM algorithms estimate the location parameters using the full data, and therefore the results obtained from these methods set a benchmark for the estimation accuracy when no compression takes place. For the sake of fair comparison, we use the real valued sketch in all the subsequent results, such that the number of real measurements is equivalent to $2m$.
1) Processing:
Restoration of depth images in single-photon lidar consists of estimating a 3D point cloud from a lidar data cube containing the number of photons $n_{i,j,t}$ in pixel $(i, j)$ at time stamp $t$, where $i \in \{1, \dots, N_r\}$, $j \in \{1, \dots, N_c\}$ and $t \in [0, T-1]$. We denote the average photon count per pixel by $\bar{n}$. We process each pixel $(i, j)$ of the data cube and estimate the true location and intensity parameters, denoted $t$ and $\alpha$, respectively. A data driven impulse response is given for each dataset, and we obtain the characteristic function of the IRF using (10).
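Since the sketch is a plain sum over photon arrivals, it can be accumulated online for each pixel without ever storing a $T$-bin histogram. The following minimal streaming accumulator is our own illustration (the frequency grid `omegas` is an assumed stand-in for whichever sampling scheme from Section III-B1 is used):

```python
import numpy as np

class OnlineSketch:
    """Accumulates z = (1/n) * sum_j exp(i * omega_k * x_j) over a fixed
    set of frequencies, one photon time stamp at a time."""
    def __init__(self, omegas):
        self.omegas = np.asarray(omegas, dtype=float)
        self.z = np.zeros(len(self.omegas), dtype=complex)
        self.n = 0

    def update(self, x):
        """Fold a single photon time stamp into the running sums."""
        self.z += np.exp(1j * self.omegas * x)
        self.n += 1

    def sketch(self):
        """Return the empirical characteristic function samples."""
        return self.z / self.n

T, m = 1000, 3
omegas = 2 * np.pi * np.arange(1, m + 1) / T   # assumed frequency grid
acc = OnlineSketch(omegas)
for x in [430, 433, 428, 431]:                 # photon arrivals as they come in
    acc.update(x)
z = acc.sketch()                               # state is m complex = 2m reals
```

The per-pixel state is just $m$ complex numbers plus a counter, regardless of $T$ or of how many photons arrive, which is what makes the statistic amenable to on-chip processing.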
2) Evaluation Metrics:
Two different error metrics are used to evaluate the performance of our proposed sketched lidar framework. We consider the root mean squared error (RMSE) between the reconstructed image and the ground truth. Given that $t_{i,j,k}$ is the location of the $k$-th peak in pixel $(i, j)$ and $\hat{t}_{i,j,k}$ its estimated counterpart, the root mean squared error of the reconstructed image is
$$ \mathrm{RMSE} := \sqrt{ \frac{1}{K N_r N_c} \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=1}^{K} \left( t_{i,j,k} - \hat{t}_{i,j,k} \right)^2 }. \qquad (26) $$
Secondly, we compare the percentage of true detections $F_{\mathrm{true}}(\tau)$ at a resolution $\tau$; that is, a true detection occurs if $|t_{i,j,k} - \hat{t}_{i,j,k}| \leq \tau$ [5]. When plotted as a function of $\tau$, this metric quantifies the variability of the estimate.

The compression of both the sketched lidar and coarse binning approaches is measured in terms of the dimension reduction achieved by the statistic with respect to the raw TCSPC data, quantified by the metric $\max\{m/T,\, m/n\}$, which depends on the dimensions $T$ and $n$ of the lidar scene, and where the number of real measurements ($m$) is used for the sake of fair comparison.

B. Synthetic Data
We evaluate the sketched lidar framework on a synthetic dataset simulating a pixel in a scene consisting of a single peak response. We chose parameters that best replicate a realistic lidar scene and that are akin to the real datasets discussed in Section IV-C. We set the binning resolution at $T = 500$, and the impulse response was generated with a Gaussian function with $\sigma = 10$, resulting in a 2% standard deviation with respect to the time stamp resolution. The SBR was set at a level reflecting a moderately noisy and challenging scene, fixing the corresponding intensities $\alpha$ accordingly. We ran a Monte-Carlo simulation to evaluate and compare the performance of our sketched lidar framework for different photon counts $n$ and varying sketch sizes $m$.

For each trial, we uniformly chose $t \sim \mathcal{U}(0, T)$, and estimated $\hat{t}_{\mathrm{sketch}}$ for the sketched lidar approach and $\hat{t}_{\mathrm{coarse}}$ for the alternative compression technique of coarse binning. As a point of reference, we computed the log-matched filter estimate, $\hat{t}_{\mathrm{LMF}}$, which represents the estimate over the full data (i.e. no compression). We varied the total number of real measurements between 2 ($m = 1$) and 20 ($m = 10$) for both the sketched lidar and coarse binning methods, and increased the photon count from $n = 10$ to $n = 10000$ by a factor of 10 each time. The solid lines in Figure 4 show the IRF $h$ and the magnitude of its corresponding CF, respectively. Here we only show results for the truncated orthogonal sampling scheme, but we observed in practice that the alternative random orthogonal sampling scheme produces similar results.

Figure 7 shows the RMSE as a function of the number of real measurements $m$ for the four increasing photon counts $n$. For both compression techniques, the RMSE decreases on average as $m$ increases. The sketched lidar method (blue) exhibits a significantly smaller RMSE than the alternative coarse binning technique for all photon counts.
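A minimal version of this per-pixel Monte-Carlo experiment can be sketched as follows (our own illustration, not the authors' code; the signal-plus-uniform-background mixture and the single-frequency circular mean at $\omega = 2\pi/T$ follow the model described earlier, and `simulate_pixel` is a hypothetical helper):

```python
import numpy as np

def simulate_pixel(n, t0, sigma, T, alpha, rng):
    """Draw n photon time stamps: a fraction alpha from the Gaussian IRF
    centred at t0, the rest from a uniform background on [0, T)."""
    n_sig = rng.binomial(n, alpha)
    sig = rng.normal(t0, sigma, n_sig) % T
    bg = rng.uniform(0, T, n - n_sig)
    return np.concatenate([sig, bg])

def circular_mean_estimate(x, T):
    """Closed-form location estimate from a size-2 real sketch (m = 1):
    the circular mean of the time stamps at frequency w = 2*pi/T."""
    w = 2 * np.pi / T
    z = np.exp(1j * w * x).mean()      # ECF sample at the chosen frequency
    return (np.angle(z) / w) % T

rng = np.random.default_rng(0)
T, sigma, alpha = 500, 10.0, 0.5       # SBR = 1: half the photons are signal
errors = []
for _ in range(500):                   # Monte-Carlo trials
    t0 = rng.uniform(0, T)
    x = simulate_pixel(1000, t0, sigma, T, alpha, rng)
    err = (circular_mean_estimate(x, T) - t0 + T / 2) % T - T / 2  # wrapped error
    errors.append(err)
print(np.mean(errors), np.std(errors))  # centred near 0 with a small spread
```

Even this two-number-per-pixel sketch yields errors tightly concentrated around zero, consistent with the normality discussion of Section III.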
Notably, the sketched lidar method converges to a low RMSE even in the low photon regime of $n = 10$, suggesting that the normality result in (16) holds for low photon counts. Our sketched approach even achieves a smaller RMSE than the LMF for $n = 10$, which suggests that the loss of statistical efficiency is negligible even for small sketch sizes, as shown in Section III-E.

Figure 8 shows the percentage of peaks detected as a function of the resolution $\tau$ for different real valued sketch sizes. The sketched lidar method performs comparably with the LMF even at low photon counts, achieving a high proportion of peak detections within only $\tau = 10$ resolution for a real valued sketch size of 10 ($m = 5$) and a photon count of $n = 10$. Impressively, in the large photon regime of $n = 10000$, the sketched lidar approach estimates almost all peaks within $\tau = 1$ resolution for all the shown real valued sketch sizes. In comparison, the alternative coarse binning method estimates a far smaller proportion of peaks within the same resolution $\tau = 1$, even for its largest sketch size of $\tilde{m} = 20$.

C. Real Data
In this section we evaluate our sketched lidar framework on two real datasets of increasing complexity: a polystyrene head imaged at Heriot-Watt University [5], [28], which consists mostly of a single peak per pixel, and a scene where two humans are standing behind a camouflage net, depicted in [7], [44], which contains 2 objects per pixel with varying intensity. (Note that log-matched filtering works under the assumption of negligible background noise, while here the SBR is 1.)

Fig. 7: A figure showing the RMSE as a function of the number of real measurements ($m$) for both the sketched lidar (blue) and coarse binning (orange) methods. The log matched filter (dashed black) is shown for comparison.
Fig. 8: A figure showing the percentage of peaks detected as a function of the time stamp resolution $\tau$ for both the sketched lidar and coarse binning methods for varying real measurement sizes ($m$).
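The evaluation metrics defined in Section IV-A2 are straightforward to implement; a small illustration follows (our own code, assuming the true and estimated peak locations are arrays of shape $(N_r, N_c, K)$):

```python
import numpy as np

def image_rmse(t_true, t_hat):
    """Root mean squared error over all pixels and peaks, as in (26)."""
    return np.sqrt(np.mean((t_true - t_hat) ** 2))

def true_detection_rate(t_true, t_hat, tau):
    """Percentage of peaks estimated within tau time stamps, F_true(tau)."""
    return 100.0 * np.mean(np.abs(t_true - t_hat) <= tau)

def compression(m, T, n):
    """Dimension reduction of the statistic relative to the raw TCSPC data."""
    return max(m / T, m / n)

# Toy 2x2 image with one peak per pixel; one pixel is badly estimated.
t_true = np.full((2, 2, 1), 430.0)
t_hat = t_true + np.array([0.0, 1.0, -2.0, 12.0]).reshape(2, 2, 1)
print(image_rmse(t_true, t_hat))             # dominated by the outlier pixel
print(true_detection_rate(t_true, t_hat, tau=5.0))
```

The toy example illustrates why both metrics are reported: the RMSE is dominated by the single outlier pixel, while $F_{\mathrm{true}}(\tau)$ shows that three of the four peaks are still detected within the chosen resolution.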
1) Polystyrene Head:
The first scene consists of a polystyrene head placed 40 meters away from the lidar device. The data cube has a width and height of 141 pixels, $N_r = N_c = 141$, and a total of $T = 4613$ time stamps. A total acquisition time of 100 milliseconds was used for each pixel, resulting in an average photon count of $\bar{n} = 337$ with an SBR of approximately 6.82. The vast majority of pixels consist of a single peak, although a minority of pixels around the borders of the head contain two peaks. The parameter set to be estimated for each pixel is $\theta = (t, \alpha)$, of dimension 2. We compare our results with the ground truth obtained from the experiment, as well as with the full data algorithm of log-matched filtering and the coarse binning compression technique. As the log-matched filter is the maximum likelihood estimator for a single peak, we assume each pixel has one surface for the sake of comparison. The coarse binning approach is computed using log-matched filtering once the data cube has been down-sampled.
Fig. 9: A figure showing the CF (black) of the data driven impulse response function (red) of the polystyrene head dataset.

The data driven impulse response function and its corresponding CF, obtained from (10), are shown in Figure 9. We only present the results for the truncated orthogonal sampling scheme from Section III-B1, but we observed in practice that the alternative random orthogonal sampling scheme produces similar results. We initialise the sketched lidar algorithm using the analytic circular mean solution in (12).

Figure 10 shows the reconstructed images of the sketched lidar, coarse binning and log matched filter approaches, as well as the ground truth image. We first notice that our sketched lidar method sufficiently reconstructs the polystyrene head scene for all sketch sizes, even for the circular mean estimate ($m = 1$) in (a). In contrast, the coarse binning approach fails for all corresponding measurement sizes $\tilde{m}$, with significant staircase artifacts arising. Figure 11 shows the RMSE, in comparison to the ground truth, as a function of the number of real measurements ($m$). We observe that our sketched lidar method produces a smaller RMSE as the measurement size increases, and achieves a smaller RMSE than the LMF approach for larger measurement sizes. In comparison, the coarse binning method obtains estimates with a consistently large RMSE throughout. This suggests that our sketched lidar approach does not sacrifice resolution in favour of compression, a trade-off that is very apparent in the coarse binning method.
2) Humans Behind Camouflage:
The second scene consists of two humans standing behind a camouflage net approximately 320 metres away from the lidar device. Further details of the scene can be found in [44], [45]. The data cube has a width and height of 32 pixels, $N_r = N_c = 32$, and a total of $T = 153$ time stamps. A total acquisition time of 5.6 milliseconds was used for each pixel, resulting in an average photon count of $\bar{n} = 871$ with an approximate SBR of 2.35. The vast majority of pixels have 2 surfaces (the camouflage net and a human), where the net (first peak) accounts for the larger intensity. The parameter set to be estimated for each pixel is $\theta = (t_1, t_2, \alpha_1, \alpha_2)$, of dimension 4. We compare our
[Fig. 10 panel labels: sketched lidar RMSE of approximately 10, 8 and 7 mm for real measurement sizes 2, 8 and 20 ($m$ = 1, 4, 10), against coarse binning RMSE of approximately 1430, 201 and 100 mm at matched measurement sizes; the log matched filter (no compression) attains an RMSE of approximately 6 mm.]
Fig. 10: A figure showing the face dataset lidar reconstructions of the sketched lidar and coarse binning methods for real valued measurement sizes 2, 8 and 20. Both the log matched filter reconstruction and the ground truth image are given for comparison.

results with the full data EM algorithm as well as the coarse binning compression technique. For this experiment, the coarse binning algorithm uses the EM estimate once the data cube has
Fig. 11: A figure showing the RMSE as a function of the number of real measurements ($m$) for the polystyrene head dataset. The sketched lidar (red) and coarse binning (orange) methods are compared with full data estimation via log matched filtering (dashed black).

been down-sampled, as the log-matched filtering algorithm is only applicable to single peak cases. Due to the lack of a ground truth, we compare the reconstructions of the camouflage scene to the full data EM algorithm reconstruction, and match the compression rates of the sketched lidar framework and the coarse binning technique. The data driven impulse response function $h$ and its corresponding CF, obtained from (10), are shown in Figure 12. Again, we only present results for the truncated orthogonal sampling scheme from Section III-B1, but we observed in practice that the alternative random orthogonal sampling scheme produces similar results. We uniformly sampled 10 starting points for each peak $t_1$ and $t_2$ and initialised with the starting point attaining the smallest sketched cost function from (9).
Fig. 12: A figure showing the CF (black) of the data driven impulse response function (red) of the camouflage dataset.

Figure 13 shows the reconstructed images for the sketched lidar, coarse binning and EM algorithm methods. Evidently, the reconstruction of our sketched lidar approach improves as the number of real measurements ($m$) increases; for instance, the torso of the human positioned near 600 cm has greater clarity at sketch size 20 compared to sketch size 4, where more spurious peaks are detected. However, the sketched lidar reconstruction for $m = 2$ is still sufficient in comparison to the EM reconstruction in (g), while in contrast the coarse binning method fails to reconstruct either human for the corresponding number of measurements. The coarse binning method once again suffers from the staircase effect, as seen in the lack of width of the first human standing at position 200 cm in (f). Furthermore, the compression due to coarse binning results in poor depth accuracy, as seen in the position of the camouflage net in reconstruction (b), which has a disparity of approximately 120 cm in comparison to the EM reconstruction. Once again, this suggests that our sketched lidar approach does not sacrifice resolution in favour of compression, a trade-off that is very apparent in the coarse binning method.

V. CONCLUSION
In this paper, we proposed a novel sketching solution to the major data processing bottleneck of single-photon lidar caused by the fine resolution of modern high rate, high resolution ToF image sensors. Our approach samples the characteristic function of the observation model to form online statistics whose dimensionality is proportional to the number of parameters of the model. Furthermore, we developed an efficient sketching algorithm, inspired by ECF estimation techniques, whose space and time complexity fundamentally scales with the size of the sketch $m$ and is independent of both the photon count and the depth resolution. Two sampling schemes were proposed that sample in regions of the characteristic function that are blind to photons originating from background sources. As a result, our method obtains unbiased estimates of the location and intensity parameters.

Our novel sketch based acquisition removes the trade-off between depth resolution and data transfer complexity that is apparent in existing methods. Here we have only considered a simple pixel-wise depth estimation method in the form of the sketched MLE. It should be straightforward to incorporate the sketched statistics into more sophisticated state-of-the-art reconstruction algorithms, such as the real-time 3D algorithm in [7]; however, we leave this for future work. Another line of future work would be to use the sketch statistics for other algorithmic purposes, such as target detection and multi-reflection detection.

ACKNOWLEDGEMENTS
This work was supported by the ERC Advanced grant, project C-SENSE (ERC-ADG-2015-694888). Mike E. Davies is also supported by a Royal Society Wolfson Research Merit Award. The authors would like to thank the single-photon group at HWU (https://single-photon.com) for the use of the datasets used in Section IV-C. The polystyrene head dataset in IV-C1 and the camouflage dataset in IV-C2 were obtained from [5] and [7], respectively.

REFERENCES

[1] J. Hecht, "Lidar for self-driving cars," Opt. Photon. News, vol. 29, no. 1, pp. 26–33, Jan 2018.
[2] J. Rapp, J. Tachella, Y. Altmann, S. McLaughlin, and V. K. Goyal, "Advances in single-photon lidar for autonomous vehicles: Working principles, challenges, and recent advances," IEEE Signal Processing Magazine, vol. 37, no. 4, pp. 62–71, 2020.
Fig. 13: A figure showing the camouflage dataset lidar reconstructions of the sketched lidar and coarse binning methods for real valued measurement sizes 4, 14 and 20. The expectation maximization reconstruction is given for comparison.

[3] J. Gao, J. Sun, J. Wei, and Q. Wang, "Research of underwater target detection using a slit streak tube imaging lidar," 2011, pp. 240–243.
[4] M. Pierzchała, P. Giguère, and R. Astrup, "Mapping forests using an unmanned ground vehicle with 3D lidar and graph-slam," Computers and Electronics in Agriculture, vol. 145, pp. 217–225, 2018.
[5] J. Tachella, Y. Altmann, X. Ren, A. McCarthy, G. S. Buller, S. McLaughlin, and J. Tourneret, "Bayesian 3D reconstruction of complex scenes from single-photon lidar data," SIAM Journal on Imaging Sciences, vol. 12, pp. 521–550, 2019.
[6] A. M. Pawlikowska, A. Halimi, R. A. Lamb, and G. S. Buller, "Single-photon three-dimensional imaging at up to 10 kilometers range," Optics Express, vol. 25, no. 10, pp. 11919–11931, 2017.
[7] J. Tachella, Y. Altmann, N. Mellado, A. McCarthy, R. Tobin, G. S. Buller, J. Tourneret, and S. McLaughlin, "Real-time 3D reconstruction from single-photon lidar data using plug-and-play point cloud denoisers," Nature Communications, vol. 10, no. 1, pp. 1–6, 2019.
[8] R. K. Henderson, N. Johnston, H. Chen, D. D. Li, G. Hungerford, R. Hirsch, D. McLoskey, P. Yip, and D. J. S. Birch, "A 192×128 time correlated single photon counting imager in 40nm CMOS technology," in ESSCIRC 2018 - IEEE 44th European Solid State Circuits Conference (ESSCIRC), 2018, pp. 54–57.
[9] X. Ren, P. W. R. Connolly, A. Halimi, Y. Altmann, S. McLaughlin, I. Gyongy, R. K. Henderson, and G. S. Buller, "High-resolution depth profiling using a range-gated CMOS SPAD quanta image sensor," Opt. Express, vol. 26, no. 5, pp. 5541–5557, Mar 2018.
[10] R. J. Walker, J. A. Richardson, and R. K. Henderson, "A 128×96 pixel event-driven phase-domain ΔΣ-based fully digital 3D camera in 0.13 µm CMOS imaging technology," 2011, pp. 410–412.
[11] F. M. Della Rocca, H. Mai, S. W. Hutchings, T. Al Abbas, A. Tsiamis, P. Lomax, I. Gyongy, N. A. W. Dutton, and R. K. Henderson, "A 128×128 SPAD dynamic vision-triggered time of flight imager," in ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC), 2019, pp. 93–96.
[12] F. Mattioli Della Rocca, H. Mai, S. W. Hutchings, T. A. Abbas, K. Buckbee, A. Tsiamis, P. Lomax, I. Gyongy, N. A. W. Dutton, and R. K. Henderson, "A 128×128 SPAD motion-triggered time-of-flight image sensor with in-pixel histogram and column-parallel vision processor," IEEE Journal of Solid-State Circuits, vol. 55, no. 7, pp. 1762–1775, 2020.
[13] S. W. Hutchings, N. Johnston, I. Gyongy, T. Al Abbas, N. A. W. Dutton, M. Tyler, S. Chan, J. Leach, and R. K. Henderson, "A reconfigurable 3-D-stacked SPAD imager with in-pixel histogramming for flash lidar or high-speed time-of-flight imaging," IEEE Journal of Solid-State Circuits, vol. 54, no. 11, pp. 2947–2956, 2019.
[14] C. Zhang, S. Lindner, I. M. Antolović, J. Mata Pavia, M. Wolf, and E. Charbon, "A 30-frames/s, × SPAD flash lidar with 1728 dual-clock 48.8-ps TDCs, and pixel-wise integrated histogramming," IEEE Journal of Solid-State Circuits, vol. 54, no. 4, pp. 1137–1151, 2019.
[15] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83–91, 2008.
[16] A. Kadambi and P. T. Boufounos, "Coded aperture compressive 3-D lidar," 2015, pp. 1166–1170.
[17] R. Tobin, Y. Altmann, X. Ren, A. McCarthy, R. Lamb, S. McLaughlin, and G. S. Buller, "Comparative study of sampling strategies for sparse photon multispectral lidar imaging: Towards mosaic filter arrays," Journal of Optics, vol. 19, p. 094006, 2017.
[18] A. Halimi, P. Ciuciu, A. McCarthy, S. McLaughlin, and G. S. Buller, "Fast adaptive scene sampling for single-photon 3D lidar images," in IEEE CAMSAP 2019 - International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, Le Gosier (Guadeloupe), France, Dec. 2019.
[19] I. Maksymova, C. Steger, and N. Druml, "Review of lidar sensor data acquisition and compression for automotive applications," in Multidisciplinary Digital Publishing Institute Proceedings, vol. 2, no. 13, 2018, p. 852.
[20] L. P. Hansen, "Large sample properties of generalized method of moments estimators," Econometrica, vol. 50, no. 4, pp. 1029–1054, 1982.
[21] A. Hall, Generalized Method of Moments, 2007, pp. 230–255.
[22] A. Feuerverger and A. Mureika, "The empirical characteristic function and its applications," The Annals of Statistics, vol. 5, no. 1, pp. 88–97, 1977.
[23] A. Feuerverger and P. McDunnough, "On the efficiency of empirical characteristic function procedures," Journal of the Royal Statistical Society, Series B (Methodological), vol. 43, no. 1, pp. 20–27, 1981.
[24] R. Gribonval, G. Blanchard, N. Keriven, and Y. Traonmilin, "Statistical learning guarantees for compressive clustering and compressive mixture modeling," arXiv preprint arXiv:2004.08085, 2020.
[25] N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez, "Sketching for large-scale learning of mixture models," Information and Inference: A Journal of the IMA, vol. 7, no. 3, pp. 447–508, 2018.
[26] M. P. Sheehan, M. S. Kotzagiannidis, and M. E. Davies, "Compressive independent component analysis," 2019, pp. 1–5.
[27] S. Hernandez-Marin, A. M. Wallace, and G. J. Gibson, "Bayesian analysis of lidar signals with multiple returns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2170–2180, 2007.
[28] Y. Altmann, X. Ren, A. McCarthy, G. S. Buller, and S. McLaughlin, "Lidar waveform-based analysis of depth images constructed using sparse single-photon data," IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 1935–1946, 2016.
[29] Y. Altmann and S. McLaughlin, "Range estimation from single-photon lidar data using a stochastic EM approach," 2018, pp. 1112–1116.
[30] M. Carrasco and J. P. Florens, "Generalization of GMM to a continuum of moment conditions," Econometric Theory, pp. 797–834, 2000.
[31] A. Feuerverger and P. McDunnough, "On some Fourier methods for inference," Journal of the American Statistical Association, vol. 76, no. 374, pp. 379–387, 1981.
[32] E. Lukacs, O. Szász et al., "On analytic characteristic functions," Pacific J. Math, vol. 2, no. 4, pp. 615–625, 1952.
[33] S. R. Jammalamadaka and A. SenGupta, Topics in Circular Statistics. World Scientific, 2001, vol. 5.
[34] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[35] N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez, "Sketching for large-scale learning of mixture models," 2016, pp. 6190–6194.
[36] L. P. Hansen, "Large sample properties of generalized method of moments estimators," Econometrica, vol. 50, no. 4, pp. 1029–1054, 1982.
[37] L. P. Hansen, J. Heaton, and A. Yaron, "Finite-sample properties of some alternative GMM estimators," Journal of Business & Economic Statistics, vol. 14, no. 3, pp. 262–280, 1996.
[38] J. Hausman, R. Lewis, K. Menzel, and W. Newey, "Properties of the CUE estimator and a modification with moments," Journal of Econometrics, vol. 165, no. 1, pp. 45–57, 2011.
[39] D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, "Photon-efficient computational 3-D and reflectivity imaging with single-photon detectors," IEEE Transactions on Computational Imaging, vol. 1, no. 2, pp. 112–125, 2015.
[40] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Philosophical Transactions of the Royal Society of London, Series A, vol. 222, pp. 309–368, 1922.
[41] S.-i. Amari and H. Nagaoka, Methods of Information Geometry. American Mathematical Soc., 2007, vol. 191.
[42] G. Turin, "An introduction to matched filters," IRE Transactions on Information Theory, vol. 6, no. 3, pp. 311–329, 1960.
[43] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[44] A. Halimi, R. Tobin, A. McCarthy, S. McLaughlin, and G. S. Buller, "Restoration of multilayered single-photon 3D lidar images," 2017, pp. 708–712.
[45] R. Tobin, A. Halimi, A. McCarthy, X. Ren, K. J. McEwan, S. McLaughlin, and G. S. Buller, "Long-range depth profiling of camouflaged targets using single-photon detection," Optical Engineering, vol. 57, no. 3, p. 031303, 2017.

APPENDIX
Deriving the Circular Mean Estimate From the ECF Estimation:
Given a single frequency $\omega \in \mathbb{R}$, we can define the sketch as $z_n = \frac{1}{n}\sum_{j=1}^{n} e^{i\omega x_j}$ and the goal is to solve
$$ \hat{\theta} = \arg\min_{\theta} \left( z_n - \Psi_\pi(\omega) \right)^2. \qquad (27) $$
Clearly, (27) is minimised when $\Psi_\pi(\omega) = z_n$, and equating the real and imaginary components we get
$$ \alpha\, \hat{h}(\omega) \cos(\omega t_0) + (1 - \alpha)\, \mathrm{sinc}\!\left(\tfrac{\omega T}{2}\right) \cos\!\left(\tfrac{\omega T}{2}\right) = \frac{1}{n} \sum_{j=1}^{n} \cos(\omega x_j), \qquad (28) $$
$$ \alpha\, \hat{h}(\omega) \sin(\omega t_0) + (1 - \alpha)\, \mathrm{sinc}\!\left(\tfrac{\omega T}{2}\right) \sin\!\left(\tfrac{\omega T}{2}\right) = \frac{1}{n} \sum_{j=1}^{n} \sin(\omega x_j), \qquad (29) $$
where, for a Gaussian IRF, $\hat{h}(\omega) = e^{-\frac{(\omega\sigma)^2}{2}}$. Notably, we can optimally choose the frequency to be $\omega = \frac{2\pi}{T}$, resulting in $\mathrm{sinc}\!\left(\tfrac{\omega T}{2}\right) = \mathrm{sinc}(\pi) = 0$, thereby ensuring the characteristic function is sampled in a region where the background noise is not present. Consequently, dividing (29) by (28), the amplitude terms cancel and we get
$$ \tan\!\left( \frac{2\pi t_0}{T} \right) = \frac{\sum_{j=1}^{n} \sin(\omega x_j)}{\sum_{j=1}^{n} \cos(\omega x_j)}, \qquad (30) $$
resulting in the optimal estimate
$$ \hat{t} = \frac{T}{2\pi}\, \mathrm{arctan2}\!\left( \sum_{j=1}^{n} \sin(\omega x_j),\; \sum_{j=1}^{n} \cos(\omega x_j) \right). $$
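A quick numerical check of this derivation (our own illustration, not from the paper): at $\omega = 2\pi/T$ the uniform background contributes nothing to the characteristic function in expectation, and the closed-form circular mean recovers $t_0$ from the signal photons:

```python
import numpy as np

T = 1000.0
w = 2 * np.pi / T                      # the frequency chosen so sinc(wT/2) = 0

# Background term: the CF of U(0, T) at w is sinc(wT/2) * exp(i*wT/2) = 0,
# so the empirical average over background photons should be near zero.
rng = np.random.default_rng(0)
bg = rng.uniform(0, T, 200000)
z_bg = np.exp(1j * w * bg).mean()
print(abs(z_bg))                       # close to 0 for large samples

# Circular mean: recovers t0 from Gaussian signal photons.
t0, sigma = 430.0, 15.0
sig = rng.normal(t0, sigma, 200000) % T
z = np.exp(1j * w * sig).mean()
t_hat = (T / (2 * np.pi)) * np.arctan2(z.imag, z.real) % T
print(t_hat)                           # close to 430
```

Note that `np.arctan2(y, x)` implements the two-argument arctangent used in the final estimate, so the phase is resolved on the full circle rather than on a half period.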