Depth Reconstruction from Sparse Samples: Representation, Algorithm, and Sampling
Lee-Kang Liu,
Student Member, IEEE , Stanley H. Chan,
Member, IEEE and Truong Q. Nguyen,
Fellow, IEEE
Abstract—The rapid development of 3D technology and computer vision applications have motivated a thrust of methodologies for depth acquisition and estimation. However, most existing hardware and software methods have limited performance due to poor depth precision, low resolution, and high computational cost. In this paper, we present a computationally efficient method to recover dense depth maps from sparse measurements. We make three contributions. First, we provide empirical evidence that depth maps can be encoded much more sparsely than natural images by using common dictionaries such as wavelets and contourlets. We also show that a combined wavelet-contourlet dictionary achieves better performance than using either dictionary alone. Second, we propose an alternating direction method of multipliers (ADMM) for depth map reconstruction. A multi-scale warm start procedure is proposed to speed up the convergence. Third, we propose a two-stage randomized sampling scheme to optimally choose the sampling locations, thus maximizing the reconstruction performance for any given sampling budget. Experimental results show that the proposed method produces high quality dense depth estimates, and is robust to noisy measurements. Applications to real data in stereo matching are demonstrated.
Index Terms—Sparse reconstruction, random sampling, wavelet, contourlet, disparity estimation, alternating direction method of multipliers, compressed sensing
I. INTRODUCTION
The rapid development of 3D technology has created a new wave of visualization and sensing impacts on the digital signal processing community. From remote sensing [1] to preserving historical heritage [2], and from rescue [3] to 3D laparoscopic surgery [4], [5], the footprints of 3D have been influencing a broad spectrum of technological frontiers.

The successful development of 3D signal processing is fundamentally linked to a system's ability to acquire depth. To date, there are two major classes of depth acquisition techniques: hardware solutions and computational procedures. Hardware devices are usually equipped with active sensors such as time-of-flight cameras [6] and LiDAR [7]. While being able to produce high quality depth maps, these hardware systems have high instrumentation cost. Moreover, the data acquisition time of the devices is long (10 fps as opposed to 60 fps on standard cameras [8]). Although speeding up is possible, spatial resolution has to be traded off in return.

An alternative solution is to estimate depth using a set of computational procedures. This class of computational methods, broadly referred to as disparity estimation algorithms [9–12], estimates the depth by computing the disparities between a pair of stereo images through their corresponding matching features [13], [14]. Disparity estimation algorithms usually work well under well-conditioned environments, but they could be sensitive to illumination, noise, stereo camera alignment, and other camera factors. Thus, the effective number of reliable features that one can use for disparity estimation is actually much fewer than the number of pixels of the image [15], [16].

L. Liu and T. Nguyen are with the Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA 92093, USA. Emails: [email protected] and [email protected]. S. Chan is with the School of Electrical and Computer Engineering and the Department of Statistics, Purdue University, West Lafayette, IN 47907, USA. Email: [email protected]. This work was supported in part by the National Science Foundation under grant CCF-1065305. Preliminary material in this paper was presented at the 39th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, May 2014.
A. Scope and Contributions
The objective of this paper is to present a sampling and reconstruction framework to improve and speed up the depth acquisition process. The key idea is to carefully select a sparse subset of spatial samples and use an optimization algorithm to reconstruct the final dense depth map. The three major contributions of this paper are as follows.
1) Representation (Section III). In order to reconstruct the depth map, we must first define an appropriate representation. We show that, as opposed to natural images, depth maps can be well approximated using a sparse subset of wavelet atoms. Moreover, we show that a combined dictionary of wavelets and contourlets can further improve the reconstruction quality.

2) Algorithm (Section IV). We propose a fast numerical algorithm based on the alternating direction method of multipliers (ADMM). We derive novel splitting strategies that allow one to solve a sequence of parallelizable subproblems. We also present a multiscale implementation that utilizes the depth structures for efficient warm starts.

3) Sampling (Section V). We propose an efficient spatial sampling strategy that maximizes the reconstruction performance. In particular, we show that for a fixed sampling budget, a high quality sampling pattern can be obtained by allocating random samples with probabilities in proportion to the magnitudes of the depth gradients.
B. Related Work
The focus of this paper lies at the intersection of two closely related subjects: depth enhancement and compressed sensing. Both subjects have a rich collection of prior works, but they also have limitations, which we now discuss.
The goal of depth enhancement is to improve the resolution of a depth map. Some classical examples include Markov Random Field (MRF) methods [17], the bilateral filter [18], and other approaches [19], [20]. One limitation of these methods is that the low-resolution depth maps are sampled uniformly. Also, it is usually assumed that a color image of the scene is available. In contrast, our proposed method is applicable to any non-uniformly sampled low-resolution depth map and does not require color images. Thus, the new method allows for greater flexibility in the enhancement.

Compressed sensing (CS) is a popular mathematical framework for sampling and recovery [21]. In many cases, CS methods assume that natural images exhibit sparse structures in certain domains, e.g., wavelet. However, as will be discussed in Section III of this paper, natural images are indeed not sparse. If we compare natural images to depth maps, the latter show a much sparser structure than the former. Furthermore, the theory of combined bases [22], [23] shows that a pair of incoherent bases is typically more effective for signal recovery. Yet, the application of these theories to depth maps is not fully explored.

The most relevant paper to our work is perhaps [24]. However, our work has two advantages. First, we propose a new ADMM algorithm for the reconstruction task (Section IV). We show that the ADMM algorithm is significantly more efficient than the subgradient method proposed in [24]. Second, we present a sampling scheme to choose optimal sampling patterns to improve the depth reconstruction (Section V), which was not discussed in [24].

We should also mention a saliency-guided CS method proposed in [25], [26]. In these two papers, the spatial sampling is achieved by a mixing-plus-sampling process, meaning that the unknown pixels are filtered and then sub-sampled. The filtering coefficients are constructed through a pre-defined saliency map and certain density functions (e.g., Gaussian-Bernoulli).
In our work, the mixing process is not required, so depth values are sampled without filtering. This makes our proposed method applicable to disparity estimation, where mixing cannot be used (otherwise it would defeat the purpose of reconstructing dense depth maps from a few estimated values).

Finally, advanced computational photography techniques have recently been proposed for fast depth acquisition, e.g., [27], [28]. However, the problem settings of these works involve hardware designs and are thus different from this paper.

The rest of the paper is organized as follows. After elaborating on the problem and clarifying notations in Section II, we discuss the representation of depth maps in Section III. A fast reconstruction algorithm is presented in Section IV. In Section V we discuss the design of optimal sampling patterns. Experimental results are shown in Section VI, and a concluding remark is given in Section VII.

II. NOTATIONS AND PROBLEM FORMULATION
In this section we introduce notations and elaborate on the problem formulation.
A. Depth and Disparity
The type of data that we are interested in studying is the depth map. Depth can be directly measured using active sensors, or inferred from the disparity of a pair of stereo images. Since the correspondence between depth and disparity is unique by simple geometry [29], in the rest of the paper we shall use depth and disparity interchangeably.
B. Sampling Model
Let $x \in \mathbb{R}^N$ be an $N \times 1$ vector representing a disparity map. For simplicity we assume that $x$ is normalized so that $0 \le x_j \le 1$ for $j = 1, \ldots, N$.

To acquire a set of spatial samples, we define a diagonal matrix $S \in \mathbb{R}^{N \times N}$ with the $(j,j)$th entry being

$$S_{jj} \overset{\text{def}}{=} \begin{cases} 1, & \text{with probability } p_j, \\ 0, & \text{with probability } 1 - p_j, \end{cases} \tag{1}$$

where $\{p_j\}_{j=1}^N$ is a sequence of pre-defined probabilities. Specific examples of $\{p_j\}_{j=1}^N$ will be discussed below. For now, we only require $\{p_j\}_{j=1}^N$ to satisfy two criteria: (1) for each $j = 1, \ldots, N$, $p_j$ must be bounded so that $0 \le p_j \le 1$; (2) the average of the probabilities must achieve a target sampling ratio $\xi$:

$$\frac{1}{N} \sum_{j=1}^N p_j = \xi, \tag{2}$$

where $0 < \xi < 1$.

Example 1: If $p_j = \xi$ for all $j$, then the sampling pattern $S$ is a diagonal matrix with uniformly random entries. This sampling pattern corresponds to uniform sampling without filtering in classical compressed sensing, e.g., [21].

Example 2: If $p_j = 1$ for $j \in \Omega$ and $p_j = 0$ for $j \in \Omega^c$, where $\Omega$ and $\Omega^c$ are two pre-defined sets such that $|\Omega| = \xi N$ and $|\Omega^c| = (1-\xi)N$, then $S$ is a deterministic sampling pattern. In particular, if $\Omega$ and $\Omega^c$ are designed so that the indices are uniformly gridded, then $S$ becomes the usual down-sampling operator.

With $S$, we define the sampled disparity map as

$$b = Sx. \tag{3}$$

Note that according to our definition of $S$, the sampled disparity $b \in \mathbb{R}^{N \times 1}$ will contain zeros, i.e., $b_j = 0$ if $S_{jj} = 0$. Physically, this corresponds to the situation where the unsampled pixels are marked with a value of zero.
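The sampling model in (1)–(3) can be sketched in a few lines of NumPy; the image size, the seed, and the helper name below are our own illustrative choices:

```python
import numpy as np

# Sketch of the sampling model of Eqs. (1)-(3), assuming a flattened
# disparity map x with values in [0, 1].
def sample_disparity(x, p, rng):
    """Return b = Sx and diag(S), where S_jj ~ Bernoulli(p_j), Eq. (1)."""
    s = (rng.random(x.size) < p).astype(float)   # diagonal of S
    return s * x, s                              # unsampled pixels become zero

rng = np.random.default_rng(0)
N, xi = 256 * 256, 0.1
x = rng.random(N)
b, s = sample_disparity(x, np.full(N, xi), rng)  # Example 1: p_j = xi

# The realized sampling ratio concentrates around xi for large N.
print(abs(s.mean() - xi) < 0.01)  # True
```

The concentration check in the last line is the empirical counterpart of the Bernstein bound discussed in Remark 1.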
Remark 1: Since $S$ is a random diagonal matrix, readers at this point may have concerns about the overall number of samples, which is also random. However, we argue that such randomness has negligible effects for the following reason. For large $N$, a standard concentration inequality guarantees that the average number of ones in $S$ stays close to $\xi N$. In particular, by Bernstein's inequality [30] we can show that for $\varepsilon > 0$,

$$\Pr\left( \left| \frac{1}{N} \sum_{j=1}^N S_{jj} - \xi \right| > \varepsilon \right) \le 2 \exp\left\{ -\frac{N\varepsilon^2/2}{\xi + \varepsilon/3} \right\}. \tag{4}$$

Therefore, although the sampling pattern in our framework is randomized, the average number of samples is concentrated around $\xi N$ for large $N$.

C. Representation Model
To properly formulate the reconstruction problem, we assume that the disparity map can be efficiently represented as a linear combination of basis vectors $\{\varphi_i\}_{i=1}^M$:

$$x = \sum_{i=1}^M \langle x, \varphi_i \rangle \varphi_i, \tag{5}$$

where $\langle \cdot, \cdot \rangle$ denotes the standard inner product. Defining $\alpha_i \overset{\text{def}}{=} \langle x, \varphi_i \rangle$ as the $i$th basis coefficient, $\alpha \overset{\text{def}}{=} [\alpha_1, \ldots, \alpha_M]^T$, and $\Phi \overset{\text{def}}{=} [\varphi_1, \ldots, \varphi_M]$, the relationship in (5) can be equivalently written as $x = \Phi\alpha$.

The reconstruction problem can be posed as an optimization problem in which the goal is to seek a sparse vector $\alpha \in \mathbb{R}^M$ such that the observed samples $b$ are best approximated. Mathematically, we consider the problem

$$\underset{\alpha}{\text{minimize}} \;\; \frac{1}{2}\|S\Phi\alpha - b\|_2^2 + \lambda\|\alpha\|_1, \tag{6}$$

where $\lambda > 0$ is a regularization parameter, and $\|\cdot\|_1$ is the $\ell_1$-norm of a vector.

In this paper, we are mainly interested in two types of $\Phi$: the wavelet frame and the contourlet frame [31]. Frames are generalizations of the standard bases in which $M$, the number of basis vectors, can be more than $N$, the dimension of $x$. Moreover, for any tight frame $\Phi$, it holds that $\Phi\Phi^T = I$. Therefore, $x = \Phi\alpha$ if and only if $\alpha = \Phi^T x$. Using this result, we can equivalently express (6) as

$$\underset{x}{\text{minimize}} \;\; \frac{1}{2}\|Sx - b\|_2^2 + \lambda\|\Phi^T x\|_1. \tag{7}$$

Remark 2:
In the compressed sensing literature, (6) is known as the synthesis problem and (7) is known as the analysis problem [32]. Furthermore, the overall measurement matrix $S\Phi$ in (6) suggests that if $p_j = \xi$ for all $j$, then $S\Phi$ corresponds to the partial orthogonal system discussed in [33]. In this case, the restricted isometry property (RIP) holds [34] and exact recovery can be guaranteed under appropriate assumptions on sparsity and the number of measurements. For general $\{p_j\}_{j=1}^N$, establishing RIP is more challenging, but empirically we observe that the optimization produces reasonable solutions.

D. Penalty Functions
As discussed in [24], (7) is not an effective formulation because the $\ell_1$ norm penalizes both the approximation (lowpass) and the detail (highpass) coefficients. In reality, since disparity maps are mostly piecewise linear functions, the lowpass coefficients should be maintained whereas the highpass coefficients are desired to be sparse. To this end, we introduce a binary diagonal matrix $W \in \mathbb{R}^{M \times M}$ whose $(j,j)$th entry is 0 if $j$ is an index in the lowest passband, and 1 otherwise. Consequently, we modify the optimization problem as

$$\underset{x}{\text{minimize}} \;\; \frac{1}{2}\|Sx - b\|_2^2 + \lambda\|W\Phi^T x\|_1. \tag{8}$$

Finally, it is desirable to further enforce smoothness of the reconstructed disparity map. Therefore, we introduce a total variation penalty so that the problem becomes

$$\underset{x}{\text{minimize}} \;\; \frac{1}{2}\|Sx - b\|_2^2 + \lambda\|W\Phi^T x\|_1 + \beta\|x\|_{TV}. \tag{9}$$

Here, the total variation norm is defined as

$$\|x\|_{TV} \overset{\text{def}}{=} \|D_x x\|_1 + \|D_y x\|_1, \tag{10}$$

where $D = [D_x; D_y]$ is the first-order finite difference operator in the horizontal and vertical directions. The above definition of total variation is known as the anisotropic total variation. The same formulation holds for the isotropic total variation, in which $\|x\|_{TV} = \sum_{j=1}^N \sqrt{[D_x x]_j^2 + [D_y x]_j^2}$.

The problem in (9) is generalizable to take into account a combination of $L$ dictionaries. In this case, one can consider a sum of $L$ penalty terms:

$$\underset{x}{\text{minimize}} \;\; \frac{1}{2}\|Sx - b\|_2^2 + \sum_{\ell=1}^L \lambda_\ell \|W_\ell \Phi_\ell^T x\|_1 + \beta\|x\|_{TV}. \tag{11}$$

For example, in the case of combined wavelet and contourlet dictionaries, we let $L = 2$.

III. SPARSE REPRESENTATION OF DISPARITY MAP

The choice of the dictionary $\Phi$ in (11) is an important factor for the reconstruction performance. In this section we discuss the general representation problem of disparity maps. We show that disparity maps can be represented more sparsely than natural images. We also show that a combined wavelet-contourlet dictionary is more effective in representing disparity maps than the wavelet dictionary alone.

A. Natural Images vs Depth Data
Seeking effective representations for natural images is a well-studied subject in image processing [31], [35–41]. However, representations of disparity maps seem to be less studied. For example, it is unclear how sparsely a pre-defined dictionary (e.g., wavelets) can encode disparity maps as compared to natural images.

To address this question, we consider a cropped patch from a gray-scaled image and the corresponding patch in the disparity map. For each of the image and the disparity, we apply the wavelet transform with a Daubechies filter and 5 decomposition levels. Then, we truncate the wavelet coefficients to the leading coefficients with the largest magnitudes. The reconstructed patches are compared, and the results are shown in Figure 1.

The result indicates that for the same number of wavelet coefficients, the disparity map can be synthesized with significantly lower approximation error than the image. While such a result is not surprising, the big difference in the PSNRs provides evidence that reconstruction of disparity maps from sparse samples should achieve better results than that of natural images.

Fig. 1: PSNR values of approximating a disparity patch and an image patch using the leading wavelet coefficients: (a) original disparity; (b) approximated disparity (50.25 dB); (c) original view; (d) approximated view (29.29 dB).
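The truncation experiment above can be mimicked with a single-level orthonormal Haar transform as a stand-in for the multi-level Daubechies decomposition used in the paper; the patch contents and the number of retained coefficients below are illustrative:

```python
import numpy as np

def haar2_level(a):
    """One level of the orthonormal 2D Haar transform (rows, then columns)."""
    def step(v):  # pairwise sums/differences along the last axis, scaled
        s = (v[..., ::2] + v[..., 1::2]) / np.sqrt(2)
        d = (v[..., ::2] - v[..., 1::2]) / np.sqrt(2)
        return np.concatenate([s, d], axis=-1)
    return step(step(a).T).T

def ihaar2_level(a):
    """Inverse of haar2_level."""
    def istep(v):
        n = v.shape[-1] // 2
        s, d = v[..., :n], v[..., n:]
        out = np.empty_like(v)
        out[..., ::2] = (s + d) / np.sqrt(2)
        out[..., 1::2] = (s - d) / np.sqrt(2)
        return out
    return istep(istep(a.T).T)

def k_term_rmse(img, k):
    """RMSE after keeping only the k largest-magnitude Haar coefficients."""
    c = haar2_level(img)
    thresh = np.sort(np.abs(c).ravel())[-k]
    approx = ihaar2_level(np.where(np.abs(c) >= thresh, c, 0))
    return np.sqrt(np.mean((img - approx) ** 2))

# A piecewise-constant "disparity" patch is far more compressible than noise.
rng = np.random.default_rng(0)
disparity = np.zeros((64, 64)); disparity[16:48, 16:48] = 0.5
noise_img = rng.random((64, 64))
print(k_term_rmse(disparity, 200) < k_term_rmse(noise_img, 200))  # True
```

The comparison in the last line reflects the observation of Figure 1: for the same coefficient budget, the piecewise-smooth patch is approximated far more accurately.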
The above results indicate that wavelets are an efficient representation for disparity maps. Our next question is whether some dictionaries do better than others. In this section, we discuss how a combined wavelet-contourlet dictionary can improve upon the wavelet dictionary.
1) Evaluation Metric:
To compare the performance of two dictionaries, it is necessary to first specify which metric to use. For the purpose of reconstruction, we compare the mean squared error (MSE) of the reconstructed disparity maps obtained by feeding different dictionaries into (11). For any fixed sampling pattern $S$, we say that a dictionary $\Phi_1$ is better than another dictionary $\Phi_2$ if the reconstruction result using $\Phi_1$ has a lower MSE than that using $\Phi_2$, for the best choice of parameters $\lambda_1$, $\lambda_2$ and $\beta$. Note that in this evaluation we do not compare the sparsity of the signal under different dictionaries. In fact, sparsity is not an appropriate metric here, because contourlets typically require 33% more coefficients than wavelets [42], yet contourlets have a better representation of curves than wavelets.
2) Comparison Results:
We synthetically create a gray-scaled image consisting of a triangle overlapping with an ellipse to simulate a disparity map. We choose the uniformly random sampling pattern $S$ so that there is no bias caused by a particular sampling pattern.

As far as parameters are concerned, we set $\lambda_1$ and $\beta$ for the single wavelet dictionary model ($L = 1$), and $\lambda_1$, $\lambda_2$ and $\beta$ for the combined dictionary model ($L = 2$). The choices of these parameters are discussed in Section IV-C.

Using the proposed ADMM algorithm (see Section IV), we plot the performance of the reconstruction as a function of the sampling ratio. For each value of the sampling ratio, we perform a Monte-Carlo simulation over 20 independent trials to reduce the fluctuation caused by the randomness in the sampling pattern. The result in Figure 2 indicates that the combined dictionary is consistently better than the wavelet dictionary alone. A snapshot of the result at one sampling ratio is shown in Figure 3. As observed, the reconstruction along the edges of the ellipse is better with the combined dictionary than with the wavelet dictionary alone.

Fig. 2: ADMM reconstruction result as a function of the sampling ratio $\xi$. Each point on the curves is averaged over 20 independent Monte-Carlo trials. The PSNR evaluates the performance of solving (11) using different combinations of dictionaries.

Fig. 3: Snapshot of the comparison between the wavelet dictionary (34.77 dB) and a combined wavelet-contourlet dictionary (35.86 dB).

IV. RECONSTRUCTION ALGORITHM

In this section we present an alternating direction method of multipliers (ADMM) algorithm to solve (11). The ADMM algorithm has a tight connection with the proximal operator presented by Moreau in the 1960s [43], and later by Eckstein and Bertsekas [44] in the 1990s. The application of ADMM to image deconvolution was first mentioned in [45]. For brevity we skip an introduction to the ADMM algorithm, because comprehensive tutorials are easily accessible [46], [47]. Instead, we highlight the unique contributions of this paper, which include a particular operator splitting strategy and a multiscale implementation.

For notational simplicity we consider a single dictionary, so that $L = 1$; the generalization to $L > 1$ is straightforward. Also, in our derivation we focus on the anisotropic total variation, so that $\|x\|_{TV} = \|D_x x\|_1 + \|D_y x\|_1$. The extension to isotropic total variation follows the same idea as presented in [5].

A. ADMM and Operator Splitting
A central question about ADMM algorithms is which of the variables should be split so that the subsequent subproblems can be efficiently solved. Inspecting (11), we observe that there are many possible choices. For example, we could split the quadratic term in (11) by defining an auxiliary variable $u = Sx$, or we could keep the quadratic term without a split. In what follows, we present an overview of our proposed splitting method and discuss the steps in subsequent subsections.

We start the ADMM algorithm by introducing three auxiliary variables $r = x$, $u_\ell = \Phi_\ell^T x$, and $v = Dx$. Consequently, we rewrite the optimization problem as

$$\begin{aligned} \underset{x, r, u_\ell, v}{\text{minimize}} \;\; & \frac{1}{2}\|b - Sr\|_2^2 + \lambda_\ell\|W_\ell u_\ell\|_1 + \beta\|v\|_1 \\ \text{subject to} \;\; & r = x, \quad u_\ell = \Phi_\ell^T x, \quad v = Dx. \end{aligned} \tag{12}$$

The ADMM algorithm is a computational procedure to find a stationary point of (12). The idea is to consider the augmented Lagrangian function defined as

$$\begin{aligned} \mathcal{L}(x, u_\ell, r, v, w, y_\ell, z) = \; & \frac{1}{2}\|b - Sr\|_2^2 + \lambda_\ell\|W_\ell u_\ell\|_1 + \beta\|v\|_1 \\ & - w^T(r - x) - y_\ell^T\big(u_\ell - \Phi_\ell^T x\big) - z^T(v - Dx) \\ & + \frac{\mu}{2}\|r - x\|_2^2 + \frac{\rho_\ell}{2}\|u_\ell - \Phi_\ell^T x\|_2^2 + \frac{\gamma}{2}\|v - Dx\|_2^2. \end{aligned} \tag{13}$$

In (13), the vectors $w$, $y_\ell$ and $z$ are the Lagrange multipliers; $\lambda_\ell$ and $\beta$ are the regularization parameters, and $\mu$, $\rho_\ell$ and $\gamma$ are the internal half quadratic penalty parameters.
The stationary point of the augmented Lagrangian function can be determined by solving the following sequence of subproblems:

$$\begin{aligned} x^{(k+1)} &= \underset{x}{\operatorname{argmin}} \; \mathcal{L}\big(x, u_\ell^{(k)}, r^{(k)}, v^{(k)}, w^{(k)}, y_\ell^{(k)}, z^{(k)}\big), \\ u_\ell^{(k+1)} &= \underset{u_\ell}{\operatorname{argmin}} \; \mathcal{L}\big(x^{(k+1)}, u_\ell, r^{(k)}, v^{(k)}, w^{(k)}, y_\ell^{(k)}, z^{(k)}\big), \\ r^{(k+1)} &= \underset{r}{\operatorname{argmin}} \; \mathcal{L}\big(x^{(k+1)}, u_\ell^{(k+1)}, r, v^{(k)}, w^{(k)}, y_\ell^{(k)}, z^{(k)}\big), \\ v^{(k+1)} &= \underset{v}{\operatorname{argmin}} \; \mathcal{L}\big(x^{(k+1)}, u_\ell^{(k+1)}, r^{(k+1)}, v, w^{(k)}, y_\ell^{(k)}, z^{(k)}\big), \end{aligned}$$

and the Lagrange multipliers are updated as

$$y_\ell^{(k+1)} = y_\ell^{(k)} - \rho_\ell\big(u_\ell^{(k+1)} - \Phi_\ell^T x^{(k+1)}\big), \tag{14a}$$
$$w^{(k+1)} = w^{(k)} - \mu\big(r^{(k+1)} - x^{(k+1)}\big), \tag{14b}$$
$$z^{(k+1)} = z^{(k)} - \gamma\big(v^{(k+1)} - Dx^{(k+1)}\big). \tag{14c}$$

We now discuss how each subproblem is solved.

B. Subproblems

1) $x$-subproblem: The $x$-subproblem is obtained by dropping terms that do not involve $x$ in (13). This yields

$$\begin{aligned} x^{(k+1)} = \underset{x}{\operatorname{argmin}} \;\; & - w^T(r - x) - y_\ell^T\big(u_\ell - \Phi_\ell^T x\big) - z^T(v - Dx) \\ & + \frac{\mu}{2}\|r - x\|_2^2 + \frac{\rho_\ell}{2}\|u_\ell - \Phi_\ell^T x\|_2^2 + \frac{\gamma}{2}\|v - Dx\|_2^2. \end{aligned} \tag{15}$$

Problem (15) can be solved by considering the first-order optimality condition, which yields the normal equation

$$\big(\rho_\ell \Phi_\ell \Phi_\ell^T + \mu I + \gamma D^T D\big) x^{(k+1)} = \Phi_\ell(\rho_\ell u_\ell - y_\ell) + (\mu r - w) + D^T(\gamma v - z). \tag{16}$$

The matrix in (16) can be simplified to $(\rho_\ell + \mu)I + \gamma D^T D$, because for any tight frame $\Phi_\ell$ it holds that $\Phi_\ell \Phi_\ell^T = I$. Now, since $D^T D$ is a circulant matrix, the matrix $(\rho_\ell + \mu)I + \gamma D^T D$ is diagonalizable by the Fourier transform.
This leads to the closed form solution

$$x^{(k+1)} = \mathcal{F}^{-1}\left[\frac{\mathcal{F}(\text{RHS})}{(\rho_\ell + \mu) + \gamma|\mathcal{F}(D)|^2}\right], \tag{17}$$

where RHS denotes the right hand side of (16), $\mathcal{F}(\cdot)$ denotes the 2D Fourier transform, $\mathcal{F}^{-1}(\cdot)$ denotes the 2D inverse Fourier transform, and $|\mathcal{F}(D)|^2$ denotes the magnitude squares of the eigenvalues of the difference operator $D$.
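The closed-form update (17) can be sketched as follows, assuming periodic boundary conditions so that $D^T D$ is diagonalized by the 2D FFT; the grid size and the parameter values below are illustrative:

```python
import numpy as np

def solve_x_subproblem(rhs, rho_plus_mu, gamma):
    """Solve ((rho + mu) I + gamma D^T D) x = rhs via the 2D FFT, Eq. (17),
    with periodic (circular) first differences."""
    m, n = rhs.shape
    # Circular first-difference kernels; their FFTs are the eigenvalues of D.
    dx = np.zeros((m, n)); dx[0, 0], dx[0, -1] = -1.0, 1.0
    dy = np.zeros((m, n)); dy[0, 0], dy[-1, 0] = -1.0, 1.0
    denom = rho_plus_mu + gamma * (np.abs(np.fft.fft2(dx)) ** 2 +
                                   np.abs(np.fft.fft2(dy)) ** 2)
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))

rng = np.random.default_rng(1)
rhs = rng.random((8, 8))            # stand-in for the RHS of Eq. (16)
x = solve_x_subproblem(rhs, 2.0, 0.5)
```

Applying the operator $(\rho_\ell+\mu)I + \gamma D^T D$ (with circular differences) to the returned `x` recovers `rhs` up to floating-point error, which is a quick way to check the kernel construction.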
Remark 3: If we do not split the quadratic function $\frac{1}{2}\|b - Sx\|_2^2$ using $r = x$, then the identity matrix $\mu I$ in (16) would become $\mu S^T S$. Since $S$ is a diagonal matrix containing 1's and 0's, the matrix $\rho_\ell \Phi_\ell \Phi_\ell^T + \mu S^T S + \gamma D^T D$ is not diagonalizable using the Fourier transform.

2) $u_\ell$-subproblem: The $u_\ell$-subproblem is given by

$$\underset{u_\ell}{\text{min}} \;\; \lambda_\ell\|W_\ell u_\ell\|_1 - y_\ell^T\big(u_\ell - \Phi_\ell^T x\big) + \frac{\rho_\ell}{2}\|u_\ell - \Phi_\ell^T x\|_2^2. \tag{18}$$

Since $W_\ell$ is a diagonal matrix, (18) is a separable optimization consisting of a sum of scalar problems. By using the standard shrinkage formula [5], one can show that the closed-form solution of (18) exists and is given by

$$u_\ell^{(k+1)} = \max\left(\left|\alpha_\ell + \frac{y_\ell}{\rho_\ell}\right| - \frac{\lambda_\ell \widetilde{w}_\ell}{\rho_\ell}, \, 0\right) \cdot \operatorname{sign}\left(\alpha_\ell + \frac{y_\ell}{\rho_\ell}\right), \tag{19}$$

where $\widetilde{w}_\ell = \operatorname{diag}(W_\ell)$ and $\alpha_\ell = \Phi_\ell^T x$.
Remark 4: If we do not split using $u_\ell = \Phi_\ell^T x$, then the $u_\ell$-subproblem is not separable and hence the shrinkage formula cannot be applied. Moreover, if we split $u_\ell = W_\ell \Phi_\ell^T x$, i.e., include $W_\ell$, then the $x$-subproblem will contain $\Phi_\ell W_\ell \Phi_\ell^T$, which is not diagonalizable using the Fourier transform.

3) $r$-subproblem: The $r$-subproblem is a standard quadratic minimization problem:

$$\underset{r}{\text{min}} \;\; \frac{1}{2}\|Sr - b\|_2^2 - w^T(r - x) + \frac{\mu}{2}\|r - x\|_2^2. \tag{20}$$

Taking the first-order optimality condition yields the normal equation

$$\big(S^T S + \mu I\big) r = S^T b + w + \mu x. \tag{21}$$

Since $S$ is a diagonal binary matrix, (21) can be evaluated via an element-wise computation.

Remark 5: Equation (21) shows that our splitting strategy of using $r = x$ is particularly efficient because $S$ is a diagonal matrix. If $S$ is a general matrix, e.g., an i.i.d. Gaussian matrix as in [48], then solving (21) will be less efficient.

4) $v$-subproblem: The $v$-subproblem is the standard total variation problem:

$$\underset{v}{\text{min}} \;\; \beta\|v\|_1 - z^T(v - Dx) + \frac{\gamma}{2}\|v - Dx\|_2^2. \tag{22}$$

The solution is given by

$$v^{(k+1)} = \max\left(\left|Dx + \frac{z}{\gamma}\right| - \frac{\beta}{\gamma}, \, 0\right) \cdot \operatorname{sign}\left(Dx + \frac{z}{\gamma}\right). \tag{23}$$

The overall ADMM algorithm is shown in Algorithm 1.
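Both (19) and (23) are instances of elementwise soft-thresholding, which can be sketched in a few lines; the coefficient values, multipliers, and parameters below are hypothetical:

```python
import numpy as np

def shrink(q, t):
    """Elementwise soft threshold: max(|q| - t, 0) * sign(q), as in
    Eqs. (19) and (23); t may be a vector of per-entry thresholds."""
    return np.maximum(np.abs(q) - t, 0.0) * np.sign(q)

# u-subproblem update (19): only highpass coefficients are thresholded,
# because w_tilde = diag(W) is 0 on the lowest passband and 1 elsewhere.
alpha = np.array([2.0, -0.3, 0.05, -2.5])   # alpha = Phi^T x (hypothetical)
y = np.zeros(4); rho, lam = 1.0, 0.2
w_tilde = np.array([0.0, 1.0, 1.0, 1.0])    # first entry: lowpass, kept intact
u = shrink(alpha + y / rho, lam * w_tilde / rho)
print(u)  # [ 2.  -0.1  0.  -2.3]
```

Note how the lowpass coefficient (first entry) passes through unchanged, while small highpass coefficients are zeroed, which is exactly the behavior motivated in Section II-D.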
Algorithm 1 ADMM Algorithm

Require: $b$, $S$
  $x^{(0)} = Sb$, $u_\ell^{(0)} = \Phi_\ell^T x^{(0)}$, $r^{(0)} = x^{(0)}$, $v^{(0)} = Dx^{(0)}$
  while $\|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\| \ge \text{tol}$ do
    Solve the $x$-subproblem by (17).
    Solve the $u_\ell$, $r$ and $v$ subproblems by (19), (21) and (23).
    Update the multipliers by (14a), (14b) and (14c).
  end while
  return $x^* \leftarrow x^{(k+1)}$

Parameter      Functionality
$\lambda_1$    Wavelet sparsity
$\lambda_2$    Contourlet sparsity
$\beta$        Total variation
$\rho_1$       Half quadratic penalty for wavelet
$\rho_2$       Half quadratic penalty for contourlet
$\mu$          Half quadratic penalty for $r = x$
$\gamma$       Half quadratic penalty for $v = Dx$

TABLE I: Summary of Parameters.
C. Parameters
The regularization parameters ($\lambda_\ell$, $\beta$) and the internal half quadratic penalty parameters ($\rho_\ell$, $\mu$, $\gamma$) are chosen empirically. Table I provides a summary of the parameters we use in this paper. These values are the typical values we found over a wide range of images and testing conditions. For detailed experiments of the parameter selection process, we refer the readers to our supplementary technical report [49].
Since (11) is convex, the standard convergence proof of ADMM applies (cf. [47]). Thus, instead of repeating the convergence theory, we compare our proposed algorithm with the subgradient algorithm proposed by Hawe et al. [24].

To set up the experiment, we consider the uniformly random sampling pattern $S$ with sampling ratios $\xi = 0.1, 0.15, 0.2$. For both our algorithm and the subgradient algorithm proposed in [24], we consider a single wavelet dictionary using the Daubechies wavelet "db2" with 2 decomposition levels. Other choices of wavelets are possible, but we observe that the difference is not significant.

Figure 4 shows the convergence results of our proposed algorithm and the subgradient algorithm. It is evident from the figure that the ADMM algorithm converges at a significantly faster rate than the subgradient algorithm. In particular, we see that the ADMM algorithm reaches a steady state in around 10 seconds, whereas the subgradient algorithm requires more than 90 seconds.

E. Multiscale ADMM
The ADMM algorithm shown in Algorithm 1 can be modified to incorporate a multiscale warm start. The idea works as follows.

Fig. 4: Comparison of the rate of convergence between the ADMM (proposed) and subgradient [24] algorithms for the single wavelet dictionary, at sampling ratios 10%, 15% and 20%. We used "Aloe" as a test image. The ADMM algorithm requires approximately 10 seconds to reach steady state; the subgradient algorithm requires substantially more running time than the ADMM algorithm to reach steady state.

First, given the observed data $b$, we construct a multiscale pyramid $\{b_q \mid q = 0, \ldots, Q-1\}$ of $Q$ levels, with a scale factor of 2 across adjacent levels. Mathematically, by assuming without loss of generality that $N$ is a power of 2, we define a downsampling matrix $A_q$ at the $q$th level as

$$A_q = [e_1, 0, e_2, 0, \ldots, 0, e_{N/2^q}],$$

where $e_k$ is the $k$th standard basis vector. Then, we define $b_q$ as

$$b_q = A_q b_{q-1}, \tag{24}$$

for $q = 1, \ldots, Q-1$, and $b_0 = b$. Correspondingly, we define a pyramid of sampling matrices $\{S_q \mid q = 0, \ldots, Q-1\}$, where

$$S_q = A_q S_{q-1}, \tag{25}$$

with the initial sampling matrix $S_0 = S$.

The above downsampling operation allows us to solve (11) at different resolution levels. That is, for each $q = 0, \ldots, Q-1$, we solve the problem

$$x_q = \underset{x}{\operatorname{argmin}} \;\; \frac{1}{2}\|S_q x - b_q\|_2^2 + \lambda_\ell\|W_\ell \Phi_\ell^T x\|_1 + \beta\|x\|_{TV}, \tag{26}$$

where $\Phi_\ell$ and $W_\ell$ are understood to have appropriate dimensions.

Once $x_q$ is computed, we feed an upsampled version of $x_q$ as the initial point to the $(q-1)$th level's optimization. More specifically, we define an upsampling and averaging operation

$$B_q = \big[e_1^T; e_1^T; e_2^T; e_2^T; \ldots; e_{N/2^q}^T; e_{N/2^q}^T\big], \tag{27}$$

and we feed $x_q$, the solution at the $q$th level, as the initial guess to the problem at the $(q-1)$th level:

$$x_{q-1}^{(0)} = B_q x_q. \tag{28}$$

A pictorial illustration of the operations of $A_q$ and $B_q$ is shown in Figure 5. The algorithm is shown in Algorithm 2.

Fig. 5: Schematic diagram showing the operations of $A_q$ and $B_q$: $A_q$ downsamples the observed data $b_q$ by a factor of 2; $B_q$ upsamples the solution $x_q$ by a factor of 2, followed by a two-tap averaging filter.
Algorithm 2 Multiscale ADMM Algorithm

Require: $S_0, \ldots, S_{Q-1}$ and $b_0, \ldots, b_{Q-1}$
  for $q = Q-1$ down to $0$ do
    $x_q = \text{ADMM}(b_q, S_q)$ with initial guess $x_q^{(0)}$
    Let $x_{q-1}^{(0)} = B_q x_q$, if $q \ge 1$.
  end for
  Output $x = x_0$.

To validate the effectiveness of the proposed multiscale warm start, we compare the convergence rate against that of the original ADMM algorithm for a combined dictionary case. In Figure 6, we observe that the multiscale ADMM converges at a significantly faster rate than the original ADMM algorithm. More specifically, at a fixed sampling ratio, the multiscale ADMM algorithm converges in 20 seconds whereas the original ADMM algorithm converges in 50 seconds, which corresponds to a factor of 2.5 in runtime reduction. For fairness, both algorithms are tested under the same platform of MATLAB 2012b / 64-bit Windows 7 / Intel Core i7 CPU 3.2 GHz (single thread) / 12 GB RAM.
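The pyramid operators of Fig. 5 can be sketched in 1D; the signal below and the helper names `A` and `B` are illustrative stand-ins for the matrices $A_q$ and $B_q$:

```python
import numpy as np

def A(b):
    """A_q of Eq. (24): factor-2 downsampling (keep every other sample)."""
    return b[::2]

def B(x):
    """B_q of Eqs. (27)-(28): factor-2 upsampling by replication, matching
    the duplicated rows e_k^T; e_k^T (a zero-order hold)."""
    return np.repeat(x, 2)

b0 = np.linspace(0.0, 1.0, 16)   # finest-level observations
b1 = A(b0)                        # coarse-level observations, Eq. (24)
x1 = b1                           # stand-in for the coarse-level ADMM solution
x0_init = B(x1)                   # warm start for the finer level, Eq. (28)
print(x0_init.shape == b0.shape)  # True
```

In the full algorithm, `x1` would be the output of Algorithm 1 at the coarse level, and `x0_init` would seed the next finer-level solve.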
Remark 6:
When propagating the $q$th solution $x_q$ to the $(q-1)$th level, we should also propagate the corresponding auxiliary variables $u_\ell$, $r$, $v$ and the Lagrange multipliers $y_\ell$, $w$ and $z$. The auxiliary variables can be updated according to $x_{q-1}^{(0)}$ as $u_{\ell,q-1}^{(0)} = \Phi_\ell^T x_{q-1}^{(0)}$, $r_{q-1}^{(0)} = x_{q-1}^{(0)}$, and $v_{q-1}^{(0)} = Dx_{q-1}^{(0)}$. For the Lagrange multipliers, we let $y_{\ell,q-1}^{(0)} = B_q y_{\ell,q}$, $w_{q-1}^{(0)} = B_q w_q$, and $z_{q-1}^{(0)} = B_q z_q$.

Remark 7:
The choice of the up/down sampling factor is not important. In our experiment, we choose a factor of 2 for simplicity of implementation. Other sampling factors, such as $\sqrt{2}$, are equally applicable. Furthermore, the two-tap averaging filter in Figure 5 can be replaced by any valid averaging filter. However, experimentally we find that other choices of filters do not make a significant difference compared to the two-tap filter.

V. SAMPLING SCHEME
In the above sections, we assumed that the sampling matrix $S$ is given and fixed. However, we have not yet discussed the design of the sampling probabilities $\{p_j\}_{j=1}^N$. The purpose of this section is to present an efficient design procedure.

A. Motivating Example
Before our discussion, perhaps we should first ask aboutwhat kind of sampling matrix S would work (or would notwork). To answer this question, we consider an exampleshown in Figure 7. In Figure 7 we try to recover a simple −3 −2 −1 running time / sec M SE Multiscale ADMM (proposed) 20%Multiscale ADMM (proposed) 10%ADMM (proposed) 20%ADMM (proposed) 10%Subgradient 20%Subgradient 10%
Fig. 6: Runtime comparison of the original ADMM algorithm, the multiscale ADMM algorithm, and the subgradient algorithm. All algorithms use the combined wavelet-contourlet dictionary. The testing image is "Aloe", and two sampling ratios (10% and 20%) are tested. Q = 3 multiscale levels are used in this experiment.

disparity map consisting of an ellipse of constant intensity on a plain background. We consider three sampling patterns of approximately equal sampling ratio xi: (a) a sampling pattern defined according to the magnitude of the disparity gradient; (b) a uniform grid with sampling ratio sqrt(xi) along each direction; (c) a random sampling pattern drawn from a uniform distribution with probability xi. The three sampling patterns generate three sampled disparity maps. For each sampled disparity map, we run the proposed ADMM algorithm and record the reconstructed disparity map. In all experiments, we use a wavelet dictionary for demonstration.

Fig. 7: Three sampling patterns and the corresponding reconstruction results using the proposed ADMM algorithm ((a) 45.527 dB; (b) 29.488 dB; (c) 30.857 dB). Here, xi denotes the actual sampling ratio. (a) Sampling along the gradient; (b) sampling from a grid; (c) sampling from a uniformly random pattern.

Figure 7 suggests a strong message: for a fixed sampling budget xi, one should pick samples along gradients. The pitfall is that this approach is impractical for two reasons. First, the gradient of the disparity map is not available prior to reconstructing the disparity, so all gradient information can only be inferred from the color image. Second, the gradients of a color image can be very different from the gradients of the corresponding disparity map, so inferring the disparity gradient from the color image gradient is a challenging task. In the following, we present a randomized sampling scheme that addresses these two issues.

B. Oracle Random Sampling Scheme
We first consider an oracle situation where the gradients are assumed known. The goal is to see how much improvement one should expect.

Let a = [a_1, ..., a_N]^T be the vector of magnitudes of the ground-truth disparity map's gradient. Given this oracle information about the disparity gradients, we consider a soft decision rule in which a pixel is sampled with a probability defined according to some function of {a_j}_{j=1}^N. The function is chosen based on the intuition that the sampled subset of gradients should carry the maximum amount of information relative to the full set of gradients. One way to capture this intuition is to require that the average gradient computed from all N samples be similar to the average gradient computed from a subset of xi*N samples.

More precisely, we define the average gradient computed from all N samples as

  mu def= (1/N) sum_{j=1}^N a_j.   (29)

Similarly, we define the average gradient computed from a random subset of xi*N samples as

  Y def= (1/N) sum_{j=1}^N (a_j / p_j) I_j,   (30)

where {I_j}_{j=1}^N is a sequence of Bernoulli random variables with Pr[I_j = 1] = p_j. Here, the division of a_j by p_j ensures that Y is unbiased, i.e., E[Y] = mu.

From (29) and (30), minimizing the difference between Y and mu can be achieved by minimizing the variance E[(Y - mu)^2]. We observe that

  E[(Y - mu)^2] = (1/N^2) sum_{j=1}^N (a_j^2 / p_j^2) Var[I_j] = (1/N^2) sum_{j=1}^N a_j^2 (1 - p_j) / p_j,

where the last equality holds because Var[I_j] = p_j (1 - p_j). Therefore, the optimal sampling probabilities {p_j}_{j=1}^N can be found by solving the optimization problem

  (P): minimize_{p_1,...,p_N} (1/N^2) sum_{j=1}^N a_j^2 / p_j,  subject to (1/N) sum_{j=1}^N p_j = xi, and 0 <= p_j <= 1,

of which the solution is given by [50, Lemma 2]

  p_j = min(tau * a_j, 1),   (31)

where tau is the root of the equation

  g(tau) def= sum_{j=1}^N min(tau * a_j, 1) - xi*N.   (32)

It is interesting to compare this random sampling scheme with a greedy sampling scheme that picks the xi*N pixels with the largest gradients. Figure 8 shows the result. For the greedy scheme, we first compute the gradient magnitude of the disparity map, grad(x) def= sqrt((D_x x)^2 + (D_y x)^2), and threshold it to obtain the set of samples Omega def= { j | [grad(x)]_j > alpha * ||grad(x)||_inf }, where alpha is the threshold. The actual sampling ratio is then |Omega|/N. For the randomized scheme, we let a = grad(x) and compute p_j according to (31). In this particular example, the randomized sampling scheme achieves a PSNR improvement of more than 4 dB.

Fig. 8: Comparison between (a) a deterministic sampling pattern obtained by greedily selecting samples according to the magnitudes {a_j}, and (b) a randomized sampling pattern obtained using the proposed scheme (35.5201 dB).

C. Practical Random Sampling Scheme
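The practical scheme presented next reuses the probability design of (31) and (32) as a building block. Since g(tau) is nondecreasing in tau, its root can be found by simple bisection; the sketch below is our own implementation, with an illustrative function name, and it assumes the budget xi*N does not exceed the number of nonzero a_j.

```python
import numpy as np

def oracle_probabilities(a, xi):
    # Solve (P): p_j = min(tau * a_j, 1), with tau the root of
    # g(tau) = sum_j min(tau * a_j, 1) - xi * N   (Eq. (32)).
    target = xi * a.size
    g = lambda tau: np.minimum(tau * a, 1.0).sum() - target
    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:            # grow the bracket; g is nondecreasing
        hi *= 2.0
    for _ in range(200):          # bisection to machine precision
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return np.minimum(hi * a, 1.0)
```

Pixels with larger gradient magnitude a_j receive proportionally larger sampling probabilities, capped at 1, and the expected number of samples equals xi*N.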
We now present a practically implementable sampling scheme. The challenge to overcome is that the gradient information of the disparity is unavailable. We therefore propose a two-stage sampling process: a pilot stage to obtain a rough estimate of the disparity, and a refinement stage to improve the estimate.

In the pilot stage, we pick xi*N/2 samples (i.e., half of the desired number of samples) using a uniformly random sampling pattern. This gives a sampling pattern {I_j^(1)}_{j=1}^N, where the superscript denotes the first stage. Correspondingly, we have a sampling matrix S^(1) and the sampled data b^(1). Given S^(1) and b^(1), we apply the ADMM algorithm to obtain a pilot estimate x^(1).

In the second stage, we use the pilot estimate x^(1) as a guide to compute the gradient grad(x^(1)). By (31), the optimal sampling probability is p_j = min(tau * [grad(x^(1))]_j, 1). However, to ensure that the xi*N/2 samples picked in the second stage do not overlap with those picked in the first stage, we instead let p_j = min(tau * a_j, 1), where

  a_j = [grad(x^(1))]_j if I_j^(1) = 0, and a_j = 0 if I_j^(1) = 1.   (33)

In words, a_j defined by (33) forces p_j = 0 whenever the jth pixel was picked in the first stage. Thus, the non-zero entries of {I_j^(1)} and {I_j^(2)} are mutually exclusive, and we can apply the ADMM algorithm to recover x^(2) from S^(1) + S^(2) and b^(1) + b^(2). The overall method is summarized in Algorithm 3.

Algorithm 3
Two-Stage Algorithm
Input: N, xi, b. Output: x^(2).
Stage 1:
  Let I_j^(1) = 1 with probability xi/2, for j = 1, ..., N.
  Define S^(1) and b^(1) according to {I_j^(1)}.
  Compute x^(1) = ADMM(S^(1), b^(1)).
Stage 2:
  Compute grad(x^(1)). For j = 1, ..., N, define a_j = [grad(x^(1))]_j if I_j^(1) = 0, and a_j = 0 if I_j^(1) = 1.
  Compute tau such that sum_{j=1}^N min{tau * a_j, 1} = N*xi/2.
  Let p_j = min{tau * a_j, 1}, for j = 1, ..., N.
  Let I_j^(2) = 1 with probability p_j, for j = 1, ..., N.
  Define S^(2) and b^(2) according to {I_j^(2)}.
  Compute x^(2) = ADMM(S^(1) + S^(2), b^(1) + b^(2)).

D. Further Improvement by PCA
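Before the PCA refinement, the baseline two-stage procedure of Algorithm 3 can be sketched end to end. This is a toy illustration under stated assumptions: `reconstruct` is a caller-supplied stand-in for the ADMM solver, `b_full` stands for the disparity source that can be queried pointwise, the budget is assumed achievable (enough nonzero pilot gradients), and all names are our own.

```python
import numpy as np

def solve_tau(a, budget, iters=200):
    # Bisection for sum_j min(tau * a_j, 1) = budget (cf. Eq. (32)).
    lo, hi = 0.0, 1.0
    while np.minimum(hi * a, 1.0).sum() < budget:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.minimum(mid * a, 1.0).sum() < budget:
            lo = mid
        else:
            hi = mid
    return hi

def two_stage_sample(b_full, xi, reconstruct, rng):
    # Algorithm 3 sketch. reconstruct(b, mask) -> pilot estimate.
    N = b_full.size
    # Stage 1: uniformly random samples, each with probability xi/2.
    I1 = rng.random(b_full.shape) < xi / 2.0
    x1 = reconstruct(np.where(I1, b_full, 0.0), I1)
    # Stage 2: sample according to the pilot gradient, Eq. (33).
    gy, gx = np.gradient(x1)
    a = np.hypot(gx, gy)
    a[I1] = 0.0                          # never re-pick stage-1 pixels
    tau = solve_tau(a.ravel(), xi * N / 2.0)
    p = np.minimum(tau * a, 1.0)
    I2 = rng.random(b_full.shape) < p
    return I1, I2
```

By construction the two index sets are disjoint, and the expected total number of samples is xi*N.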
The two-stage sampling procedure can be further improved by utilizing prior information from the color image. The intuition is that since the color image and the disparity map capture the same scene, strong gradients in the disparity map should align with those in the color image. However, since a color image typically contains complex gradients that are irrelevant to the disparity reconstruction, it is important to filter out these unwanted gradients while preserving the important ones. To this end, we consider the following patch-based principal component analysis.

Given a color image y in R^N, we construct a collection of patches {y_j}_{j=1}^N, where y_j in R^d denotes the vectorization of the jth patch of size sqrt(d) x sqrt(d) centered at pixel j of the image. For patches centered at corners or boundaries of the image, we apply symmetric padding so that their sizes remain sqrt(d) x sqrt(d). This gives a total of N patches. Next, we form the data matrix Y def= [y_1, y_2, ..., y_N]. This data matrix leads to the principal component decomposition

  Y Y^T = U Lambda U^T,   (34)

where U is the eigenvector matrix and Lambda is the eigenvalue matrix. Geometrically, the projection of any patch y_j onto the subspace spanned by an eigenvector u_i is equivalent to applying a finite impulse response filter to the patch, i.e., u_i^T y_j. In many cases, except for the first eigenvector u_1, all remaining eigenvectors u_2, ..., u_d take the form of differential operators of different orders and orientations (see examples in Figure 9).

Fig. 9: The first six eigenvectors of the data matrix Y Y^T, where Y is obtained from the color image corresponding to Figure 8. In this example the patch size is 19 x 19, so that d = 361.

More interestingly, these filters are typically bandpass filters, which suggests that both low-frequency components (e.g., smooth regions) and high-frequency components (e.g.
, complex textures) of the color image can be filtered out by applying these projections. Consequently, we consider the filtered signal

  a_j = sum_{i=2}^{d'} |<u_i, y_j>|,  j = 1, ..., N,   (35)

where d' < d is a tunable parameter (set to d' = 16 for d = 49 in this paper). The absolute value in (35) extracts the magnitude of <u_i, y_j>, since a_j must be non-negative.

To see how this PCA concept can be incorporated into our two-stage sampling scheme, we make the following observations. First, the uniform sampling in Stage 1 can be replaced by the PCA approach: instead of setting I_j^(1) = 1 with probability xi/2, we can define a_j according to (35) and let p_j = min(tau * a_j, 1), with tau the root of (32). We then let I_j^(1) = 1 with probability p_j.

In Stage 2, since we already have a pilot estimate of the disparity map, it is possible to replace Y in (34) by the data matrix X = [x_1^(1), ..., x_N^(1)], where each x_j^(1) is a d-dimensional patch centered at the jth pixel of x^(1). Thus, instead of setting a_j = [grad(x^(1))]_j as in (33), we can set a_j = sum_{i=2}^{d'} |<u_i, x_j^(1)>| as in (35). The advantage of this new a_j is that it spreads the sampling probability at object boundaries over a neighborhood surrounding each boundary, which reduces the risk of selecting irrelevant samples because of a bad pilot estimate.

E. Comparisons
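The comparisons reported next include the PCA-weighted variant; for concreteness, the patch-PCA weighting of (34) and (35) can be prototyped as follows. This is our own sketch, shown with an illustrative 7 x 7 patch so that d = 49; the function name is an assumption.

```python
import numpy as np

def pca_saliency(img, patch=7, d_prime=16):
    # Collect the d-dimensional patch around every pixel (symmetric
    # padding at the borders), one patch per column of Y -- Eq. (34).
    r, (H, W) = patch // 2, img.shape
    pad = np.pad(img, r, mode='symmetric')
    cols = [pad[i:i + patch, j:j + patch].ravel()
            for i in range(H) for j in range(W)]
    Y = np.stack(cols, axis=1)                    # d x N data matrix
    # Eigen-decomposition Y Y^T = U Lambda U^T, eigenvalues descending.
    _, U = np.linalg.eigh(Y @ Y.T)
    U = U[:, ::-1]
    # Eq. (35): skip the DC-like u_1, sum |<u_i, y_j>| for i = 2..d'.
    return np.abs(U[:, 1:d_prime].T @ Y).sum(axis=0).reshape(H, W)
```

On a constant image the response is numerically zero, while pixels near intensity edges receive large weights, which is exactly the behaviour wanted for the sampling probabilities.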
As a comparison between sampling patterns, we consider the disparity map shown in Figure 10. Setting xi = 0.1 (i.e., 10%), we study four sampling patterns, including two versions of our proposed two-stage method. We conduct a Monte-Carlo simulation of 32 independent trials and average the PSNRs. The results in Figure 10(c) are generated using the original two-stage sampling scheme without the PCA improvement, whereas the results in Figure 10(d) are generated using the improved two-stage sampling scheme in which the first stage is uniform and the second stage uses PCA. These results indicate that, for the same sampling ratio xi, the choice of sampling pattern has a strong influence on the reconstruction quality.

Fig. 10: (a) Uniform random; (b) uniform grid; (c) proposed w/o PCA; (d) proposed w/ PCA. The two-stage schemes pick xi*N/2 uniformly random samples in Stage 1 and pick the remaining xi*N/2 samples according to the pilot estimate from Stage 1. We conduct a Monte-Carlo simulation with 32 independent trials; the average PSNRs are reported in the table.

Method             Actual Sampling Ratio   Average PSNR (dB)   Standard Deviation
Uniform random     0.1001                  29.7495             0.3768
Uniform grid       0.1128                  30.2726             0.0000
Proposed w/o PCA   0.1000                  32.4532             0.8962
Proposed w/ PCA    0.1002

For example, compared to both uniform random sampling and grid sampling, the original two-stage sampling yields about 2.44 dB improvement, which can be further improved by almost 3.76 dB using the PCA idea.

VI. EXPERIMENTAL RESULTS
In this section we present additional results to illustrate theperformance of the proposed method.
A. Synthetic Data
We first compare the proposed algorithm with existing methods on the Middlebury dataset (http://vision.middlebury.edu/stereo/data/), for which ground-truth disparities are available. We consider two versions of the proposed algorithm: "Proposed WT+CT Grid" and "Proposed WT+CT 2-Stage", where "Proposed WT+CT Grid" is the ADMM algorithm presented in Section IV using both the wavelet and contourlet bases. Here, "Grid" refers to using a deterministic uniform grid sampling pattern, and "2-Stage" refers to using the two-stage randomized sampling scheme presented in Section V. We use the Daubechies wavelet "db2" with 2 decomposition levels for the wavelet dictionary, and the "bior9-7" wavelet function with [5 6] directional decompositions for the contourlet dictionary.

We also compare our method with [24], which differs from ours in three respects: (1) [24] uses a subgradient descent algorithm, whereas we use an ADMM algorithm; (2) [24] considers only a wavelet basis, whereas we consider a combined wavelet-contourlet basis; (3) [24] uses a combination of Canny edges and uniformly random samples, whereas we use a principled design process to determine the samples.

In this experiment we do not compare with depth super-resolution algorithms, e.g., [18], [51], [52]. These methods require a color image to guide the actual reconstruction process, which differs from our setting because we only use the color image to design the sampling pattern. As a reference for these methods, we show the results of bicubic interpolation with a uniform grid sampling pattern.

Table II shows the PSNR values of the various methods at different sampling ratios and sampling methods. It is clear that "Proposed WT+CT 2-Stage" outperforms the other methods by a significant margin. Moreover, as the sampling ratio increases, the PSNR gain of "Proposed WT+CT 2-Stage" grows faster than that of the other methods.
For example, increasing the sampling ratio from 5% to 25% on "Art", "Proposed WT+CT 2-Stage" demonstrates an 18 dB PSNR improvement, whereas bicubic demonstrates only a 3 dB improvement.

It is also instructive to compare the percentage of bad pixels (% Bad Pixel), a popular metric for the quality of disparity estimates [53]. Given a threshold tau > 0, the percentage of bad pixels is defined as

  % Bad Pixel def= (1/N) sum_{j=1}^N 1( |xhat_j - x*_j| > tau ),   (36)

where xhat is the reconstructed disparity and x* is the ground-truth disparity. The percentage of bad pixels can be regarded as an absolute-difference metric, as opposed to the mean-squared metric of PSNR.

Table III shows the percentage of bad pixels of the various methods at different sampling ratios and sampling methods. The results indicate that "Proposed WT+CT 2-Stage" has a relatively higher % Bad Pixel at tau = 1 than the other methods, but a lower % Bad Pixel at tau = 2 and tau = 3. This suggests that most of the errors of "Proposed WT+CT 2-Stage" are small, with very few outliers. In contrast, bicubic grid (for example) has a low % Bad Pixel at tau = 1 but a high % Bad Pixel at tau = 2 and tau = 3, implying that a significant portion of the bicubic results has large error. Intuitively, in the bicubic case some strong edges and corners are completely missed, whereas this information is retained by "Proposed WT+CT 2-Stage".

Finally, we show the performance of the proposed algorithm under additive i.i.d. Gaussian noise. The purpose of this experiment is to demonstrate the sensitivity and robustness of the algorithm in the presence of noise. While in reality the noise in disparity estimates is not i.i.d. Gaussian, the result presented here serves as a reference for the algorithm's performance; a more realistic experiment on real data is presented in the next subsection. The results are shown in Figure 11.
Using "Bicubic Grid" as the baseline, we observe that "Proposed WT+CT 2-Stage" on average yields a 5.79 dB improvement, "Proposed WT+CT Grid" a 3.60 dB improvement, whereas "[24] Grid" yields only a 3.02 dB improvement. This is a good indicator of the robustness of the proposed methods.
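For reference, the two evaluation metrics used above can be implemented directly. These are our own helper functions; `peak` denotes the dynamic range assumed for PSNR, e.g. 1 for disparity maps normalized to [0, 1].

```python
import numpy as np

def psnr(x_hat, x_true, peak=1.0):
    # Mean-squared-error based metric, in dB.
    mse = np.mean((np.asarray(x_hat) - np.asarray(x_true)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def bad_pixel_percentage(x_hat, x_true, tau=1.0):
    # Eq. (36): fraction of pixels with absolute error above tau,
    # reported as a percentage.
    err = np.abs(np.asarray(x_hat) - np.asarray(x_true))
    return 100.0 * np.mean(err > tau)
```

A method can have a high bad-pixel count at a small tau yet a low count at larger tau, which is how the tables distinguish many small errors from a few gross outliers.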
Fig. 11: Comparison of reconstruction performance (PSNR in dB versus noise standard deviation) with noisy samples. We use the "Art" disparity map as the test image, with the sampling ratio xi fixed.

B. Real Data
In this experiment we study the performance of the proposed algorithm on real data. The top-left part of Figure 12 shows a snapshot of a stereo video (30 fps). For this video sequence, we apply the block-matching algorithm of Lee et al. [54] to obtain the initial disparity estimates. However, instead of computing the full disparity map, we compute only 10% of the disparity pixel values and use the proposed reconstruction algorithm to recover the dense disparity map. The 10% of samples are selected according to the two stages of "Proposed WT+CT 2-Stage". In the first stage, we select the locations of 5% of the samples using our random sampling scheme with the PCA improvement applied to the color image. A pilot estimate of the disparity is then computed, and the remaining 5% of samples are located according to the second stage of "Proposed WT+CT 2-Stage". The results shown in the middle row of Figure 12 illustrate that "Proposed WT+CT 2-Stage" generates the disparity maps closest to an ideal dense estimate.

In addition to real video sequences, we also test the proposed algorithm on a stereo system we developed, consisting of a low-cost stereo camera with customized block-matching algorithms. The bottom row of Figure 12 shows the reconstructed disparity maps. Referring to the results of "[24] Grid" and "Bicubic Grid", we note serious stair-like artifacts at object boundaries. In contrast, the two proposed methods generally produce much smoother object boundaries, thanks to the superior modeling and the optimized sampling scheme. More interestingly, we observe that "Proposed WT+CT 2-Stage" indeed removes some undesirable noisy estimates in the recovered disparity maps, which suggests that the proposed method could potentially be further developed into a depth enhancement method.

VII. CONCLUSION
A framework for dense depth reconstruction from sparse samples is presented in this paper. Three contributions are made. First, we provide empirical evidence that depth data can be encoded more sparsely by a combination of wavelet and contourlet dictionaries, which provides a better understanding of the structure of depth data. Second, we propose a general optimization formulation, solved by an alternating direction method of multipliers (ADMM) with a multi-scale warm start for fast reconstruction; the ADMM algorithm converges faster than existing subgradient descent algorithms. Third, we propose an efficient method for selecting samples via a randomized sampling scheme, which achieves high-quality reconstruction at a given sampling budget. The tools developed in this paper are applicable to many depth data processing tasks, with applications in acquisition, compression, and enhancement. Future work will focus on extending the methods to the space-time data volume to further improve the consistency of the estimates.

TABLE II: Comparison of reconstruction algorithms in terms of PSNR. We put N/A when the algorithm does not converge in 1000 iterations.
Disparity   Algorithm /              Percentage of Samples / PSNR (dB)
Name        Sampling Strategy        5%        10%       15%       20%       25%
Aloe        Proposed WT+CT 2-Stage   27.5998
            Proposed WT+CT Grid      25.3236   28.9052   30.0940   31.2956   32.3548
            [24] Grid                25.1248   27.8941   28.9504   30.2371   31.6646
            Bicubic Grid
Art         Proposed WT+CT 2-Stage
            Proposed WT+CT Grid      27.5176   28.9528   30.8371   32.5150   33.7126
            [24] Grid                27.0300   N/A       N/A       N/A       N/A
            Bicubic Grid             29.1550   30.3536   31.1098   31.9473   32.8366
Baby        Proposed WT+CT 2-Stage
            Proposed WT+CT Grid      34.4421   36.7965   37.6708   39.0504   40.0689
            [24] Grid                33.6627   35.3166   36.2522   37.4513   38.7670
            Bicubic Grid             34.8368   36.2385   37.1749   37.5973   38.3961
Dolls       Proposed WT+CT 2-Stage
            Proposed WT+CT Grid      28.4858   29.0453   30.0949   30.8123   31.6725
            [24] Grid                28.4959   N/A       N/A       N/A       32.0521
            Bicubic Grid             29.0612   30.0475   30.4374   31.0053   31.8800
Moebius     Proposed WT+CT 2-Stage
            Proposed WT+CT Grid      27.6882   28.7245   29.8527   31.1663   32.2399
            [24] Grid                27.6851   28.7973   N/A       N/A       32.0990
            Bicubic Grid             28.3987   29.9338   30.6607   30.9427   32.0143
Rocks       Proposed WT+CT 2-Stage
            Proposed WT+CT Grid      25.5924   29.0848   30.4766   31.2311   32.9218
            [24] Grid                25.4444   28.7973   29.5364   30.2058   32.1672
            Bicubic Grid             28.7241   30.4212   30.7552   31.6722   32.6706
TABLE III: Comparisons of reconstruction algorithms in terms of % Bad Pixel.
Disparity   Algorithm /            % Bad Pixels [tau = 1]             % Bad Pixels [tau = 2]            % Bad Pixels [tau = 3]
Name        Sampling Strategy      5%    10%    15%    20%    25%     5%    10%   15%   20%   25%       5%    10%   15%   20%   25%
Aloe        Prop. WT+CT 2-Stage    41.47 21.37  14.00  8.85   5.81
            Prop. WT+CT Grid       36.88 22.96  15.61  11.62  8.69    21.16 10.11 6.87  5.17  3.92      15.80 7.62  5.55  4.25  3.27
            [24] Grid              31.44
            Prop. WT+CT Grid       15.80 8.27   5.80   4.12   3.14
            Prop. WT+CT Grid       20.67 11.74  8.03   5.79   4.44

REFERENCES

[1] M. A. Lefsky, W. B. Cohen, G. G. Parker, and D. J. Harding, "LIDAR remote sensing of above-ground biomass in three biomes,"
Global Ecology and Biogeography, vol. 11, pp. 393-399, Oct. 2002.
[2] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, "Building Rome in a day," in Proc. IEEE Int. Conf. Computer Vision (ICCV'09).
[4] IEEE Trans. Bio. Eng., vol. 59, no. 10, pp. 2859-2865, Oct. 2012.
[5] S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, "An augmented Lagrangian method for total variation video restoration," IEEE Trans. Image Process., vol. 20, no. 11, pp. 3097-3111, Nov. 2011.
[6] S. Foix, G. Alenya, and C. Torras, "Lock-in time-of-flight (ToF) cameras: A survey," IEEE Sensors Journal, vol. 11, no. 9, pp. 1917-1926, Sep. 2011.
[7] B. Schwarz, "LIDAR: Mapping the world in 3D," Nature Photonics, vol. 4, pp. 429-430, Jul. 2010.
[8] C. Niclass, M. Soga, H. Matsubara, S. Kato, and M. Kagami, "A 100-m range 10-frame/s 340 x um CMOS," IEEE Journal of Solid-State Circuits, vol. 48, no. 2, pp. 559-572, Feb. 2013.
[9] X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, and X. Zhang, "On building an accurate stereo matching system on graphics hardware," in Proc. IEEE Int. Conf. Computer Vision (ICCV'11), Nov. 2011, pp. 467-474.
[10] A. Klaus, M. Sormann, and K. Karner, "Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure," in Proc. IEEE Int. Conf. Pattern Recognition (ICPR'06), Aug. 2006, vol. 3, pp. 15-18.
[11] Z. Wang and Z. Zheng, "A region based stereo matching algorithm using cooperative optimization," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR'08), Jun. 2008, pp. 1-8.
[12] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister, "Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling," IEEE Trans. Pattern Anal. Machine Intell., vol. 31, no. 3, pp. 492-504, Mar. 2009.

Fig. 12: Examples of disparity map reconstruction from 10% measured samples using real data. [Top] Left and right view images of the "Newspaper" dataset, and a sequence captured by a stereo system we developed. [Middle] The reconstructed disparity maps of "Newspaper". [Bottom] The reconstructed disparity maps of our sequence. For the reconstructed disparity maps, we show zoomed-in results for better visualization. Methods under comparison: dense disparity estimation [54], used to acquire the initial estimate; "Proposed WT+CT 2-Stage", which applies the two-stage randomized scheme to determine sampling locations; "Proposed WT+CT Grid", which picks samples from a uniform grid; "[24] Grid", which applies a subgradient algorithm to samples picked from a uniform grid; and "Bicubic Grid", which applies bicubic interpolation to samples picked from a uniform grid.

[13] J. Heikkila and O. Silven, "A four-step camera calibration procedure with implicit image correction," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR'97), Jun. 1997, pp. 1106-1112.
[14] Z. Zhang, "Flexible camera calibration by viewing a plane from unknown orientations," in Proc. IEEE Int. Conf. Computer Vision (ICCV'99), Sep. 1999, vol. 1, pp. 666-673.
[15] R. S. Feris, J. Gemmell, K. Toyama, and V. Kruger, "Hierarchical wavelet networks for facial feature localization," in Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition (FG'02), May 2002, pp. 118-123.
[16] Y. Ke and R. Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR'04), Jun. 2004, vol. 2, pp. 506-513.
[17] J. Diebel and S. Thrun, "An application of Markov random fields to range sensing," in Advances in Neural Info. Process. Systems (NIPS'05), Dec. 2005, pp. 291-298.
[18] Q. Yang, R. Yang, J. Davis, and D. Nister, "Spatial-depth super resolution for range images," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR'07), Jun. 2007, pp. 1-8.
[19] J. Li, T. Xue, L. Sun, and J. Liu, "Joint example-based depth map super-resolution," in IEEE Int. Conf. Multimedia and Expo (ICME'12), Jul. 2012, pp. 152-157.
[20] O. M. Aodha, N. D. F. Campbell, A. Nair, and G. J. Brostow, "Patch based synthesis for single depth image super-resolution," in Proc. European Conf. Computer Vision (ECCV'12), Oct. 2012, pp. 71-84.
[21] E. J. Candes and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Magazine, vol. 25, no. 2, pp. 21-30, Mar. 2008.
[22] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Info. Theory, vol. 47, no. 7, pp. 2845-2862, Nov. 2001.
[23] M. Elad and A. M. Bruckstein, "A generalized uncertainty principle and sparse representation in pairs of bases," IEEE Trans. Info. Theory, vol. 48, no. 9, pp. 2558-2567, Sep. 2002.
[24] S. Hawe, M. Kleinsteuber, and K. Diepold, "Dense disparity maps from sparse disparity measurements," in Proc. IEEE Int. Conf. Computer Vision (ICCV'11), Nov. 2011, pp. 2126-2133.
[25] S. Schwartz, A. Wong, and D. A. Clausi, "Saliency-guided compressive sensing approach to efficient laser range measurement," J. Vis. Commun. Image R., vol. 24, no. 2, pp. 160-170, 2013.
[26] S. Schwartz, A. Wong, and D. A. Clausi, "Multi-scale saliency-guided compressive sensing approach to efficient robotic laser range measurement," in Proc. IEEE Computer Society Conf. Computer and Robot Vision, 2012, pp. 1-8.
[27] A. Kirmani, A. Colaco, F. N. C. Wong, and V. K. Goyal, "CODAC: A compressive depth acquisition camera framework," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP'12), Mar. 2012, pp. 5425-5428.
[28] A. Kirmani, D. Venkatraman, D. Shin, A. Colaco, F. N. C. Wong, J. H. Shapiro, and V. K. Goyal, "First photon imaging," Science Magazine, vol. 343, no. 6166, pp. 58-61, Nov. 2013.
[29] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, Mar. 2004.
[30] F. Chung and L. Lu, "Concentration inequalities and martingale inequalities: a survey," Internet Mathematics, vol. 3, no. 1, pp. 79-127, 2006.
[31] M. N. Do and M. Vetterli, "The contourlet transform: An efficient directional multiresolution image representation," IEEE Trans. Image Process., vol. 14, no. 12, pp. 2091-2106, Dec. 2005.
[32] M. Elad, P. Milanfar, and R. Rubinstein, "Analysis versus synthesis in signal priors," Inverse Problems, vol. 23, pp. 947-968, Apr. 2007.
[33] E. J. Candes and Y. Plan, "A probabilistic and RIPless theory of compressed sensing," IEEE Trans. Info. Theory, vol. 57, no. 11, pp. 7235-7254, Nov. 2011.
[34] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Const. Approx., vol. 28, no. 3, pp. 253-263, Dec. 2008.
[35] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311-4322, Nov. 2006.
[36] J. Mairal, M. Elad, and G. Sapiro, "Sparse representation for color image restoration," IEEE Trans. Image Process., vol. 17, no. 1, pp. 53-69, Jan. 2008.
[37] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, Academic Press, Dec. 2008.
[38] D. D. Y. Po and M. N. Do, "Directional multiscale modeling of images using the contourlet transform," IEEE Trans. Image Process., vol. 15, no. 6, pp. 1610-1620, Jun. 2006.
[39] E. J. Candes and D. L. Donoho, "Recovering edges in ill-posed inverse problems: Optimality of curvelet frames," Annals of Statistics, no. 3, pp. 784-842, Aug. 2002.
[40] E. J. Candes and D. L. Donoho, "New tight frames of curvelets and optimal representations of objects with piecewise C^2 singularities," Communications on Pure and Applied Mathematics, vol. 57, no. 2, pp. 219-266, Feb. 2004.
[41] E. Le Pennec and S. Mallat, "Bandelet image approximation and compression," Multiscale Model. Simul., vol. 4, no. 3, pp. 992-1039, 2005.
[42] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice Hall, 1995.
[43] J. J. Moreau, "Proximite et dualite dans un espace hilbertien," Bulletin de la Societe Mathematique de France, vol. 93, pp. 273-299, 1965.
[44] J. Eckstein and D. P. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Math. Program., vol. 55, no. 3, pp. 293-318, Jun. 1992.
[45] J. Yang, Y. Zhang, and W. Yin, "An efficient TVL1 algorithm for deblurring multichannel images corrupted by impulsive noise," SIAM J. Sci. Comput., vol. 31, no. 4, pp. 2842-2865, Jul. 2009.
[46] D. Han and X. Yuan, "A note on the alternating direction method of multipliers," J. Optim. Theory and Applications, vol. 155, no. 1, pp. 227-238, Oct. 2012.
[47] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, Jan. 2011.
[48] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Info. Theory, vol. 55, no. 5, pp. 2230-2249, May 2009.
[49] L. Liu, S. H. Chan, and T. Q. Nguyen, "Depth reconstruction from sparse samples: Representation, algorithm, and sampling (supplementary material)," available online at http://arxiv.org/abs/1407.3840.
[50] S. H. Chan, T. Zickler, and Y. M. Lu, "Monte Carlo non-local means: Random sampling for large-scale image filtering," IEEE Trans. Image Process., vol. 23, no. 8, pp. 3711-3725, Aug. 2014.
[51] F. Li, J. Yu, and J. Chai, "A hybrid camera for motion deblurring and depth map super-resolution," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR'08), Jun. 2008, pp. 1-8.
[52] J. Park, H. Kim, Y. Tai, M. S. Brown, and I. Kweon, "High quality depth map upsampling for 3D-TOF cameras," in Proc. IEEE Int. Conf. Computer Vision (ICCV'11), Nov. 2011, pp. 1623-1630.
[53] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. J. Computer Vision, vol. 47, no. 1-3, pp. 7-42, Apr. 2002.
[54] Z. Lee, J. Juang, and T. Q. Nguyen, "Local disparity estimation with three-moded cross census and advanced support weight," IEEE Trans. Multimedia, vol. 15, no. 8, pp. 1855-1864, Dec. 2013.

Depth Reconstruction from Sparse Samples: Representation, Algorithm, and Sampling (Supplementary Material)
Abstract
The purpose of this supplementary report is to present experimental results on selecting the parameters for the proposed depth reconstruction algorithm.
I. DEPTH RECONSTRUCTION ALGORITHM
For completeness we first recall the reconstruction algorithm. Referring to the main article, the optimization problem that we would like to solve is

  minimize_x (1/2)||Sx - b||^2 + sum_{l=1}^L lambda_l ||W_l Phi_l^T x||_1 + beta ||x||_TV,   (1)

where Phi_l is a dictionary, W_l is the corresponding weighting matrix, S is the sampling pattern, and the vector b is the observation. We consider L = 2 dictionaries, where Phi_1 and Phi_2 are the wavelet and contourlet dictionaries, respectively.

To solve (1), we apply the alternating direction method of multipliers (ADMM). The ADMM algorithm seeks a stationary point of the augmented Lagrangian function, defined as

  L(x, u_1, u_2, r, v, w, y_1, y_2, z) = (1/2)||b - Sr||^2 + lambda_1 ||W_1 u_1||_1 + lambda_2 ||W_2 u_2||_1 + beta ||v||_1   (2)
    - w^T (r - x) - y_1^T (u_1 - Phi_1^T x) - y_2^T (u_2 - Phi_2^T x) - z^T (v - Dx)
    + (mu/2)||r - x||^2 + (rho_1/2)||u_1 - Phi_1^T x||^2 + (rho_2/2)||u_2 - Phi_2^T x||^2 + (gamma/2)||v - Dx||^2.

Note that in (2) we need to specify the regularization parameters (lambda_1, lambda_2, beta) and the internal parameters (mu, rho_1, rho_2, gamma). The purpose of this supplementary report is to discuss how these parameters are chosen.

II. PARAMETER SELECTION AND EXPERIMENTS
We now present how to choose the regularization parameters (λ1, λ2, β) and the internal parameters (µ, ρ1, ρ2, γ).

A. Experimental Configurations
Before presenting the results, we first describe our experimental configuration. The test disparity maps are chosen from the Middlebury datasets (http://vision.middlebury.edu/stereo/data/). All disparity values are normalized to the range [0, 1]. Figure 1 shows some examples of the disparity maps. For the sampling patterns, we choose uniformly random samples to minimize any bias toward a particular sampling scheme. For the wavelet dictionary, we use the "db2" wavelet function with decomposition level 2, and for the contourlet dictionary, we set the frequency partition to "5, 6". These settings are fixed throughout the experiments.

Fig. 1: Example disparity maps from the Middlebury dataset: Aloe, Art, Baby2, Moebius, Dolls, Rocks1.

B. Regularization Parameters (λ1, λ2, β)

We empirically evaluate the mean squared error (MSE) by sweeping the parameters (λ1, λ2, β) from 10^-6 to 10^-1, with a fixed sampling rate of ξ = 0.2. The optimal values of the parameters are chosen to minimize the average MSE.

Figure 2 shows the MSE curves for various images. For each plot, the MSE is computed by sweeping one parameter while fixing the other parameters. Observing the top row of Figure 2, we see that the optimal λ1 across all images is approximately located in the range 10^-5 ≤ λ1 ≤ 10^-4. Therefore, we select λ1 = 4 × 10^-5. Similarly, we determine λ2 = 2 × 10^-5 and β = 2 × 10^-5.

We repeat the above analysis for ξ = 0.1. The results are shown in the bottom row of Figure 2. While there are some differences in the MSE compared to the top row, the optimal values do not change. Therefore, we keep the parameters at the above settings.

Fig. 2: Comparison of reconstruction performance with varying regularization parameters and depth images (Aloe, Art, Baby2, Moebius, Dolls, Rocks1). For each plot, we sweep one parameter from 10^-6 to 10^-1 while fixing the others at our typical values. The sampling rate is 20% (1st row) and 10% (2nd row). Typical values of the regularization parameters are λ1 = 4 × 10^-5, λ2 = 2 × 10^-5, and β = 2 × 10^-5.

C. Internal Parameters (µ, ρ1, ρ2, γ)

For µ, ρ1, ρ2, and γ, we conduct a set of similar experiments. The results are shown in Figure 3 and Figure 4, which plot the MSE against the number of iterations for a range of parameter values on the Aloe, Art, Baby2, Moebius, and Dolls images. The criterion for selecting these parameters is the convergence rate. This gives ρ1 = 0.001, ρ2 = 0.001, µ = 0.01, and γ = 0.1.

Fig. 3: MSE versus number of iterations for ξ = 0.2 (Aloe, Art, Baby2, Moebius, Dolls).

Fig. 4: MSE versus number of iterations for ξ = 0.1 (Aloe, Art, Baby2, Moebius, Dolls).

III. SUMMARY
We summarize our findings in Table I. We remark that the values in Table I are "typical" values that correspond to a reasonable MSE on average. Of course, for a specific problem there exists a set of truly optimal parameters. However, in our experience, this set of parameters is robust over a wide range of problems.

TABLE I: Typical parameter values.

Parameter   Functionality                        Value
λ1          Wavelet sparsity                     4 × 10^-5
λ2          Contourlet sparsity                  2 × 10^-5
β           Total variation                      2 × 10^-5
ρ1          Half quad. penalty for wavelet       0.001
ρ2          Half quad. penalty for contourlet    0.001
µ           Half quad. penalty for r = x         0.01
γ           Half quad. penalty for v = Dx        0.1
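To make the objective in (1) concrete, the following is a minimal numerical sketch, not the authors' implementation: a one-level Haar transform stands in for the db2 wavelet dictionary, the contourlet term is omitted, the weighting matrix W is the identity, and the toy depth map and parameter values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "depth map": a piecewise-constant image normalized to [0, 1].
x = np.zeros((16, 16))
x[:, 8:] = 1.0

# Sampling operator S: a uniformly random mask at rate xi = 0.2.
xi = 0.2
mask = rng.random(x.shape) < xi
mask[0, 12] = True            # guarantee at least one sample on the step
b = mask * x                  # observed sparse samples

def haar2(im):
    """One-level 2D Haar analysis -- a simplified stand-in for the
    db2 wavelet dictionary transform Phi^T used in the paper."""
    a = (im[0::2, :] + im[1::2, :]) / 2.0   # row averages
    d = (im[0::2, :] - im[1::2, :]) / 2.0   # row differences
    rows = np.vstack([a, d])
    a2 = (rows[:, 0::2] + rows[:, 1::2]) / 2.0
    d2 = (rows[:, 0::2] - rows[:, 1::2]) / 2.0
    return np.hstack([a2, d2])

def tv(im):
    """Anisotropic total variation: sum of absolute neighbor differences."""
    return np.abs(np.diff(im, axis=0)).sum() + np.abs(np.diff(im, axis=1)).sum()

def objective(im, lam1=4e-5, beta=2e-5):
    """Objective (1) with a single (wavelet) dictionary, W = identity,
    and illustrative parameter values."""
    data = 0.5 * np.sum((mask * im - b) ** 2)
    return data + lam1 * np.abs(haar2(im)).sum() + beta * tv(im)

# The true map fits the samples exactly and is sparse under both priors,
# so it scores far below a blank guess.
print(objective(x) < objective(np.zeros_like(x)))  # True
```

A full solver would minimize this objective via the ADMM splitting of (2); the sketch only shows how the data, sparsity, and TV terms combine.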
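The parameter-sweep procedure of Section II can likewise be sketched. Everything below is a simplified, hypothetical stand-in: 1D piecewise-constant signals replace the Middlebury disparity maps, and a smooth quadratic difference penalty replaces the l1/TV terms so that each reconstruction has a closed form; only the selection rule (pick the value minimizing the average MSE over a log-spaced sweep) mirrors the report.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 32

# Toy 1D "depth profiles" standing in for the test disparity maps.
signals = [np.repeat([0.2, 0.8], 16),
           np.repeat([0.1, 0.5, 0.9], [10, 12, 10])]

# One fixed uniformly random mask per signal at rate xi = 0.2,
# with the endpoints pinned so the linear system below stays invertible.
masks = []
for x in signals:
    m = rng.random(n) < 0.2
    m[[0, n - 1]] = True
    masks.append(m)

D = (np.eye(n) - np.eye(n, k=1))[:-1]   # first-difference operator, (n-1) x n

def reconstruct(b, mask, lam):
    """Closed-form minimizer of 0.5*||S x - b||^2 + lam*||D x||^2,
    a smooth quadratic surrogate for the nonsmooth terms of (1)."""
    S = np.diag(mask.astype(float))
    return np.linalg.solve(S + 2.0 * lam * D.T @ D, b)

# Sweep lam over 1e-6 ... 1e-1 and keep the value minimizing the average MSE,
# mirroring the rule used to pick (lambda1, lambda2, beta).
lams = [10.0 ** e for e in range(-6, 0)]
avg_mse = []
for lam in lams:
    errs = [np.mean((reconstruct(m * x, m, lam) - x) ** 2)
            for x, m in zip(signals, masks)]
    avg_mse.append(float(np.mean(errs)))

best = lams[int(np.argmin(avg_mse))]
print(best)
```

In the report the same loop is run per image and per parameter, with the remaining parameters frozen at their typical values, and the MSE curves are then compared across images.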