Compressive Deconvolution in Random Mask Imaging
Sohail Bahmani and Justin Romberg, Senior Member
Abstract
We investigate the problem of reconstructing signals from a subsampled convolution of their modulated versions and a known filter. The problem is studied as it applies to a specific imaging architecture that relies on spatial phase modulation by randomly coded "masks". The diversity induced by the random masks is deemed to improve the conditioning of the deconvolution problem while maintaining sampling efficiency. We analyze a linear model of the imaging system, where the joint effect of the spatial modulation, blurring, and spatial subsampling is represented concisely by a measurement matrix. We provide a bound on the conditioning of this measurement matrix in terms of the number of masks K, the dimension (i.e., the pixel count) of the scene image L, and certain characteristics of the blurring kernel and subsampling operator. The derived bound shows that stable deconvolution is possible with high probability even if the number of masks K is as small as (L/N) log L, where N is the number of sensors, meaning that the total number of (scalar) measurements is within a logarithmic factor of the image size. Furthermore, beyond a critical number of masks determined by the extent of blurring and subsampling, every additional mask improves the conditioning of the measurement matrix. We also consider a more interesting scenario where the target image is known to be sparse. We show that under mild conditions on the blurring kernel, with high probability the measurement matrix is a restricted isometry when the number of masks is within a logarithmic factor of the sparsity of the scene image. Therefore, the scene image can be reconstructed using any of the well-known sparse recovery algorithms such as basis pursuit. The bound on the required number of masks grows linearly in the sparsity of the scene image but logarithmically in its ambient dimension.
The bound provides a quantitative view of the effect of the blurring and subsampling on the required number of masks, which is critical for designing efficient imaging systems.

I. INTRODUCTION
In this paper, we investigate the mathematics of reconstructing a high-resolution image from measurements made by a low-resolution sensor array. A schematic of this type of imaging system is shown in Figure 1: an image is focused onto a spatial light modulator (SLM), passes through a blurring lens, and the resulting intensity image is integrated over a relatively large region in space. The blurring lens spreads the energy in the image out spatially, allowing the sensor array to have gaps. We show that under mild conditions, the resolution of this type of system is fundamentally limited by the resolution of the spatial light modulator; the spatial resolution of sensors and the
The authors are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA. E-mail: {sohail.bahmani, jrom}@ece.gatech.edu. This work was supported by ONR grant N00014-11-1-0459, and NSF grants CCF-1415498 and CCF-1422540.

blurring lens play only a minor role. We also show that the diversity provided by the spatial light modulator makes the deconvolution process stable.

Our mathematical analysis uses an idealized model for this imaging system. We model the imaging process as a linear operator that maps an image into a set of indirect measurements. Multiple batches of these measurements are taken, each with a different pattern on the SLM. The favorable conditioning in the inverse problem comes from using a diverse set of patterns; we show that choosing the patterns at random results in improved acquisition efficiency in both the least squares and sparse reconstruction scenarios. We show that the total number of measurements sufficient for least squares reconstruction is within a logarithmic factor of the number of pixels in the image, which is nearly optimal. For sparse reconstruction, a total number of masks within a logarithmic factor of the sparsity suffices to achieve a stable deconvolution. Mathematically, these are statements about the singular values of both the imaging matrix as a whole and the submatrices formed from subsets of its columns.

Our main results, Theorems 1 and 2 in Section III, give stable reconstruction guarantees in terms of the number of masks K used, the number of sensors N in the array, the number of pixels L in the high-resolution reconstruction, and parameters that characterize the joint spatial response of the blurring and sampling system. These mathematical results give credence to the idea that diverse spatial light modulation makes spatial deconvolution a well-posed problem, even in the presence of heavy subsampling.

A. Contributions

Stability of deconvolution via least squares:
The relation between the scene image and the measured samples in the illustrated system can be described mathematically by a matrix. Therefore, the conditioning of this measurement matrix determines the stability of an ordinary deconvolution based on standard least squares. We quantify the number of random masks that is sufficient to bound the conditioning of the measurement matrix. The obtained bounds show that the blurring and the subsampling determine a critical number of masks needed to guarantee that the measurement matrix is well-conditioned, and thus that the deconvolution is stable. With every additional mask beyond this critical number, the guaranteed conditioning of the measurement matrix improves.
Sparse deconvolution:
We also study the case where the scene image is sparse. In this scenario, as the number of samples can be much less than the number of scene pixels, we refer to the reconstruction of the scene image as compressive deconvolution, akin to Compressive Sensing (CS) [1]–[3]. We characterize the number of masks sufficient to guarantee the Restricted Isometry Property (RIP) for the (scaled) measurement matrix and thus successful reconstruction of the scene image using ℓ1-minimization. The established bound shows that, depending on the effect of the blurring and the subsampling, if the number of masks grows linearly with the sparsity of the scene image but merely logarithmically with its pixel count, we can have RIP constants suitable for successful ℓ1-minimization. Therefore, the number of measurements can potentially be significantly smaller than the pixel count of the scene image.

B. Background and Related Work
Classical deconvolution techniques can be broadly categorized in two frameworks based on their approaches to regularization of the inverse problem. Methods of the first category, including Wiener filtering and a variety of Bayesian methods, assume some stochastic model for the image or the blurring kernel that is often application specific. Methods of the second category, which are essentially variants of least squares, only use the deterministic spatial or spectral structures of the image, such as smoothness, for regularization. For a comprehensive survey of classic deconvolution methods for image restoration and reconstruction we refer the interested readers to [4] and [5].

In recent years there has been an increasing interest in the application of CS in various imaging modalities including, but not limited to, holography [6], coded aperture spectral imaging [7], fluorescent microscopy [8], and sub-wavelength imaging [9]. The CS-based imaging systems are particularly interesting in applications where the measurements are time-consuming or expensive. Furthermore, by exploiting the sparsity of the scene image, CS imaging methods can operate in SNR regimes where conventional imaging methods may perform poorly. A survey of practical advantages and challenges of various CS imaging systems can be found in [10]. The first CS imaging system was introduced as the "single-pixel camera" in [11], where a single sensor integrates the randomly masked versions of the scene image for a few different masks. Effectively, the single-pixel camera measures the inner product of the scene image and the randomly generated masks.
Using the fact that natural images are often (nearly) sparse in some basis, it is shown in [11] that CS allows accurate image reconstruction in this single-pixel architecture.

In this paper, we consider the deconvolution problem in a random mask imaging system that is very similar to the single-pixel camera; the main difference is that the reflections from the digital micromirror device (DMD) are blurred in a controlled manner in order to allow sampling with a few sensors. Our goal is to determine how the performance of the system depends on the number of masks and the extent of the blurring and the subsampling. Based on an idealized mathematical model of the considered imaging system, we tie the number of masks sufficient for reconstruction of the scene image to certain characteristics of the blurring kernel and the subsampling operator. Because the integration that occurs at each sensor involves the convolution with the Point Spread Function (PSF) of the lens, our analysis has similarities with the analyses used for CS with random convolution [12]–[14]. Those analyses address problems where convolution with a known random signal/filter is of interest. In our problem, however, the convolution is deterministic and the randomness occurs as spatial phase modulation. Our theoretical results are based on recent theoretical developments regarding random matrices and chaos processes.
C. Organization of the paper
Section II elaborates on the considered imaging architecture and the (idealized) formulation of the deconvolution problems. The main theoretical guarantees are stated in Section III. In Section III-A, we study the stability of the deconvolution for generic scene images by analyzing the behavior of the extreme singular values of the measurement matrix in terms of certain properties of the known blurring kernel and subsampling operator. Furthermore, in Section III-B, the performance guarantees of ℓ1-minimization for recovering sparse images are stated in terms of the Restricted Isometry Property (RIP) of the measurement matrix. The proofs of these guarantees are provided in the appendices. The theoretical guarantees are validated by the numerical simulations reported in Section IV. The concluding remarks are provided in Section V.

Fig. 1: Schematic of the masked imaging system. The scene image is focused on a DMD that acts as a spatial phase modulator. The masked image reflected from the DMD is blurred by a lens and then spatially subsampled by a few sensors. The result is one set of measurements corresponding to the chosen DMD pattern.

D. Notation
We use the following notation convention throughout this paper. Matrices and vectors are denoted by bold capital and small letters, respectively. The vectors e_i for i = 1, 2, ... denote the canonical basis vectors, which are zero except at their i-th entry, which is one. The diagonal matrix whose diagonal entries form a vector x is denoted by D_x. The matrix of diagonal entries of a matrix X is denoted by diag(X). The vector norms ‖·‖_p for p ≥ 1 are the standard ℓp-norms. The so-called ℓ0-norm, which counts the nonzero entries of its argument, is denoted by ‖·‖_0. The matrix norms ‖·‖ and ‖·‖_F are used to denote the operator norm and the Frobenius norm, respectively. The largest (or smallest) eigenvalue of a symmetric matrix is denoted by λ_max(·) (or λ_min(·)). Also, we use cond(·) to denote the condition number of a matrix. Occasionally, expressions of the form f ≳ g (or f ≲ g) are used, which should be interpreted as f ≥ cg (or cf ≤ g), where c > 0 is some absolute constant.

II. PROBLEM SETUP
The purpose of this paper is to develop a principled understanding of the type of imaging architecture depicted in Figure 1. The central abstraction we make is to model the acquisition process as the application of a matrix to an unknown vector. This puts the image reconstruction problem squarely into the realm of linear algebra, allowing us to connect it to recently developed mathematics in that field. In this section, we detail how this abstraction is made, and the algorithms we use to perform the reconstruction.

The main physical assumption we make is that the mapping between the true image I(t) and a single measurement is a linear functional; each measurement can be written as

y_m = ⟨I(t), h_m(t)⟩ = ∫ I(t) h_m(t) dt.

The measurement "test functions" h_m(t), which like the image are functions of a continuous 2D spatial index, are different for each sensor location and each pattern on the SLM. They also depend on the point spread function of the blurring lens, and the size and shape of the sensor. Our results will depend on general properties of the ensemble of these functions, but we do not assume that they have any particular form. We will assume throughout that the h_m(t) are known.

We model the action of the spatial light modulator as follows: the input image is divided into small square regions, and a weight is applied uniformly over each region. In the schematic above, the SLM is depicted as a DMD, implying that our weights are binary valued, either 0 or 1. The mathematical analysis in the next section will use ±1 for the binary weights; the analysis is smoother with weights that are zero-mean. But if we have measurements using 0/1 weights φ, we can use the linearity of the system to easily generate the corresponding measurements for 2φ − 1, which has entries of ±1. All that is needed to perform this transformation is a single measurement with all of the weights equal to 1.
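This 0/1-to-±1 conversion is easy to verify numerically. The sketch below uses a hypothetical random matrix `G` as a stand-in for the blur-and-sample operator, with small arbitrary dimensions; only the linearity argument above is being exercised.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 64, 8                      # hypothetical image size and sensor count
G = rng.standard_normal((N, L))   # stand-in for the blur-and-sample operator
x = rng.standard_normal(L)        # scene image coefficients

phi = rng.integers(0, 2, L)       # a 0/1 SLM pattern
y_phi = G @ (phi * x)             # measurement with 0/1 weights
y_all = G @ x                     # one extra shot: all weights equal to 1

# By linearity, 2*y_phi - y_all equals the measurement taken with +/-1 weights.
y_pm = G @ ((2 * phi - 1) * x)
assert np.allclose(2 * y_phi - y_all, y_pm)
```

The single all-ones measurement `y_all` is reused for every pattern, so the conversion costs one extra shot in total, not one per mask.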
While the choice of zero-mean random masks has some importance for the purpose of theoretical analysis, the functionality of the imaging system and the reconstruction algorithms are independent of this choice. We support this notion by a simulation on a 2D image with binary (i.e., 0/1) masks in Section IV.

As with any inverse problem whose solution we actually want to compute, we model the underlying image I(t) as lying in a finite dimensional subspace spanned by a set of known basis functions ψ_ℓ(t), ℓ = 1, ..., L. We can write

I(t) = Σ_{ℓ=1}^{L} x*_ℓ ψ_ℓ(t).

To reconstruct I, we estimate the vector of expansion coefficients x* ∈ R^L. With the basis model in place, we can now write the entire set of M measurements y as an M × L matrix H applied to x*:

y = Hx* + error, where H(m, ℓ) = ⟨ψ_ℓ(t), h_m(t)⟩ = ∫ h_m(t) ψ_ℓ(t) dt.

The error term above is meant to encapsulate errors from all sources, including modeling inaccuracies (the underlying image does not truly lie in the span of the ψ_ℓ) and the presence of noise in the measurements.

Our analysis uses a standard discretization: the ψ_ℓ(t) are indicator functions on the same small square regions over which the SLM divides the image. As we describe further below, this allows us to write the action of the SLM on the image expansion coefficients as multiplication by a diagonal matrix. This opens a path for the mathematical analysis of the architecture.
However, this is not necessarily the only basis that might be used for discretization; the algorithms used for the reconstruction only require us to provide the matrix H.

For a given pattern on the SLM, whose weights are entries in the vector φ_k, we divide the measurement operator into two parts: an L × L diagonal matrix D_{φ_k} which maps {x(i)} to {φ_k(i) x(i)}, and an N × L matrix G, which maps the modulated coefficients into the values measured at the sensor array:

y_k = G D_{φ_k} x* + error.

The measurements can be stacked and written compactly as

y = [y_1; y_2; ...; y_K] = Hx* + error,

where the measurement matrix H is given by

H = [G D_{φ_1}; G D_{φ_2}; ...; G D_{φ_K}].    (1)

The analysis below draws the entries in each φ_k independently, taking values ±1 with equal probability. The number of patterns used is K, for a total of M = KN measurements of the image.

The matrix G models the joint action of the blurring lens and the sensors. If, for example, we model the sensors as taking point samples, then the n-th row of G will contain the inner products ⟨ψ_ℓ(t), p(τ_n − t)⟩, where p(t) is the point spread function of the lens, and τ_n is the sample location for sensor n. Alternatively, if we model the sensors as integrating over a certain region, we replace p(·) in the previous expression with its convolution with an indicator function over this region. As mentioned before, the analysis below does not depend on the particular physical model that we use, but rather on general properties of G. Qualitatively, these amount to G not having any blind spots — each part of the image influences the reading on at least one of the sensors — and each sensor having a distinct view of the image.

In this paper, we will take G to be known, making the recovery of x* a certain kind of generalized deconvolution problem.
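For small problems the stacked measurement matrix in (1) can be formed explicitly. The sketch below uses hypothetical dimensions: G is built as N uniformly spaced rows of the circulant matrix of a stand-in point spread function p, and the K masked blocks G D_{φ_k} are stacked.

```python
import numpy as np

rng = np.random.default_rng(1)
L, N, K = 128, 8, 16              # hypothetical pixel count, sensors, masks

# G: blur by a point spread function p, then sample at N uniformly spaced
# locations (i.e., keep N uniformly spaced rows of the circulant matrix of p).
p = rng.standard_normal(L)        # stand-in PSF
C = np.stack([np.roll(p, j) for j in range(L)], axis=1)  # circulant: column j is p shifted by j
G = C[:: L // N, :]               # N x L

# Stack the K blocks G @ D_phi_k with i.i.d. +/-1 mask entries, as in (1).
masks = rng.choice([-1.0, 1.0], size=(K, L))
H = np.vstack([G * masks[k] for k in range(K)])
assert H.shape == (K * N, L)
```

Since D_{φ_k} is diagonal, G D_{φ_k} is just G with its columns scaled by the mask entries, which is exactly what the row-wise broadcast `G * masks[k]` computes; the full H never needs a dense diagonal matrix.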
We analyze the performance of two algorithms, one of which assumes no structure in the image, while the other is tailored to the case where the image is sparse.

1) Deconvolution of generic scene images: In this scenario, the aim is to estimate a scene image x* that has no specific structure. For this deconvolution problem we analyze the least squares estimator

x̂ := argmin_x ‖Hx − y‖₂² = Σ_{k=1}^{K} ‖G D_{φ_k} x − y_k‖₂².    (2)

The performance of the above least squares estimator depends solely on the conditioning of the cumulative measurement matrix H in (1). This is a matrix with structured randomness (since the φ_k are random); we will show that it is well-conditioned if the total number of measurements KN is slightly larger than the number of pixels L we use to discretize the image.

2) Deconvolution of sparse scene images: We also study the more interesting problem of recovering sparse images in the described imaging system. With the measurement matrix H defined in (1), we consider the estimator

x̂ := argmin_x ‖x‖₁ subject to Hx = y,    (3)

which is inspired by the CS framework. Note that we consider only error-free measurements; the extension to the case of corrupted measurements is the same as in well-known CS approaches. A common sufficient condition used in CS to guarantee accuracy of not only the ℓ1-minimization algorithm, but also a variety of greedy algorithms, is the Restricted Isometry Property (RIP) (see [15] and references therein). A matrix A is said to satisfy the RIP with constant δ_S ∈ (0, 1) if

(1 − δ_S)‖x‖₂² ≤ ‖Ax‖₂² ≤ (1 + δ_S)‖x‖₂²

holds for all S-sparse vectors x.
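As a concrete illustration of the estimator (3), the equality-constrained ℓ1 program can be cast as a linear program over the augmented variable z = [x; t] and handed to an off-the-shelf LP solver. The sketch below uses a generic Gaussian stand-in for the measurement matrix and hypothetical dimensions; with error-free measurements and enough of them, basis pursuit typically recovers the sparse signal exactly.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
L, M, S = 64, 32, 4               # hypothetical dimensions and sparsity

H = rng.standard_normal((M, L)) / np.sqrt(M)   # stand-in measurement matrix
x_true = np.zeros(L)
support = rng.choice(L, S, replace=False)
x_true[support] = rng.standard_normal(S)
y = H @ x_true                    # error-free measurements, as in (3)

# min ||x||_1 s.t. Hx = y, written as an LP over z = [x; t]:
#   min sum(t)  s.t.  x - t <= 0,  -x - t <= 0,  Hx = y.
I = np.eye(L)
A_ub = np.block([[I, -I], [-I, -I]])
b_ub = np.zeros(2 * L)
A_eq = np.hstack([H, np.zeros((M, L))])
c = np.concatenate([np.zeros(L), np.ones(L)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=(None, None))
x_hat = res.x[:L]
assert res.status == 0
assert np.linalg.norm(x_hat - x_true) < 1e-3 * max(1.0, np.linalg.norm(x_true))
```

The inequality rows enforce t_i ≥ |x_i|, so minimizing Σ t_i minimizes the ℓ1-norm of x; `bounds=(None, None)` is needed because `linprog` defaults to nonnegative variables.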
Exact recovery of S-sparse signals via ℓ1-minimization is shown in [16] under the condition δ_{2S} < √2 − 1, which was later improved to δ_{2S} < 3/(4 + √6) in [17]. We provide a sufficient condition on the number of masks that guarantees the RIP for the measurement matrix (1). Our goal is to show that the sparse scene image can be recovered through (3), even if the number of masks grows much more slowly than the dimension of the target image, at a rate that is almost linear in its sparsity. The constant factor in the rate is determined by the relative sensitivity of the measurements to different pixels of the scene image.

III. MAIN RESULTS
To state the main results we use the shorthand

ρ := ‖G‖, θ_max := max_{1≤i≤L} ‖Ge_i‖₂, θ_min := min_{1≤i≤L} ‖Ge_i‖₂,

respectively for the spectral norm, the largest column ℓ2-norm, and the smallest column ℓ2-norm of G. Intuitively, to ensure identifiability of every scene image in the imaging system described in Section II, the measurements need to retain the relative intensity of the scene pixels to a great extent. For instance, no scene image with only a single active pixel should be mapped to an all-zero measurement (i.e., θ_min > 0). Similarly, there should be no drastic amplification of any pixel with respect to other pixels. These requirements translate into the desirability of matrices G whose column ℓ2-norms are almost equal (i.e., θ_max/θ_min ≈ 1). In fact, straightforward calculations reveal that the expectation of the matrix HᵀH, which describes the system, has a condition number equal to (θ_max/θ_min)²: since the mask entries are independent and zero-mean, E[HᵀH] = K diag(‖Ge_1‖₂², ..., ‖Ge_L‖₂²).

A. Deconvolution by Least Squares
As discussed above, identifiability of the scene image in the considered imaging system depends on certain characteristics of the matrix HᵀH, where H is given by (1). The conditioning of the same matrix determines the stability of the estimate obtained by the least squares estimator (2). To verify this fact, it suffices to observe that the least squares solution can be written as

x̂ = (HᵀH)⁻¹Hᵀ(y* + z) = x* + (HᵀH)⁻¹Hᵀz,

where y* = Hx* denotes the vector of error-free measurements and z is a measurement perturbation. It is clear from the equation above that the conditioning of (HᵀH)⁻¹Hᵀ, or equivalently that of HᵀH, determines the stability against the perturbation. The following theorem establishes a relation between the number of applied masks (i.e., K) and the condition number of HᵀH.

Theorem 1 (Conditioning of the least squares for deconvolution). Suppose that

K ≥ (β + 1)/((log 4 − 1) δ²) · ρ²/θ_min² · log L,    (4)

for absolute constants δ ∈ [0, 1) and β > 0. Then with probability at least 1 − L^{−β} we have

cond(HᵀH) ≤ (1 + δ)/(1 − δ) · θ_max²/θ_min².

Remark 1. As the proof provided in the appendix shows, the bound in (4) can be improved slightly to

K ≥ (β + 1) max{ρ²/(ψ(−δ) θ_max²), ρ²/(ψ(δ) θ_min²)} log L,

where ψ(t) := (1 + t) log(1 + t) − t. The bound (4) is preferred merely because of its simpler expression.

Remark 2. It is worthwhile to inspect the tightness of the bound imposed by (4) qualitatively. We can express the term ρ²/θ_min² in (4) as the product of ‖G‖_F²/θ_min² and ρ²/‖G‖_F². As discussed above, for a fixed number of masks, the more uneven the column norms of G are, the more unbalanced the pixel amplification/attenuation and thereby the more unstable the least squares become.
The first term (i.e., ‖G‖_F²/θ_min²), which varies in the interval [L, ∞), measures how equally the energy of G is distributed among its columns. Another crucial factor that determines the required number of masks is the redundancy of the measurements, which can be captured by the rank of the matrix G. A severely rank-deficient G provides less information per mask than a full-rank G. The second term mentioned above (i.e., ρ²/‖G‖_F²), which varies in the interval [1/rank(G), 1], quantifies the dependence on the rank of G. In the ideal case of a full-rank G with equally normed columns, (4) imposes K ≳ (L/N) log L, which is suboptimal merely by a factor of log L. In the case that G has columns of equal norm but rank(G) = 1, (4) requires K ≳ L log L, which is also suboptimal by a mere factor of log L.

B. Sparse Deconvolution by ℓ1-Minimization

It is also desirable to ensure that the ℓ1-minimization (3) is stable and that its performance degrades gracefully with perturbations. Significant shrinkage or expansion of the distances between different sparse signals under the measurement matrix H can lead to ambiguity or noise amplification, respectively. The RIP provides a formal characterization of the measurement matrices that have the desired properties. The next theorem establishes the RIP for a properly scaled version of the matrix H. Below, the set of S-sparse vectors on the L-dimensional unit sphere is denoted by

D_{S,L} := {x ∈ R^L | ‖x‖₂ = 1 and ‖x‖₀ ≤ S}.

Theorem 2 (RIP for sparse deconvolution). Let µ := θ_max/θ_min ≥ 1, θ_avg² := (θ_max² + θ_min²)/2, and suppose that

K ≳ δ⁻² µ⁴ S log⁴ L,

for some parameter δ ∈ (0, 1). Then

1 − (µ² − 1 + 2δµ²)/(µ² + 1) ≤ ‖Hx‖₂²/(K θ_avg²) ≤ 1 + (µ² − 1 + 2δµ²)/(µ² + 1)    (5)

holds for all x ∈ D_{S,L} with probability at least 1 − e^{−CK min{δ/(2µ²), δ²/(2µ⁴)}}, where C > 0 is an absolute constant.

Corollary 3.
Let the true signal x* be S-sparse. If µ is sufficiently close to one and K ≳ µ⁴ S log⁴ L, then there exists a δ ∈ (0, 1) for which (3) recovers x* with probability at least 1 − e^{−CK min{δ/(2µ²), δ²/(2µ⁴)}}, where C > 0 is an absolute constant.

Proof: It is known from the standard compressed sensing literature that if a measurement matrix satisfies the RIP of order 2S with a sufficiently small constant, then the associated ℓ1-minimization exactly recovers any S-sparse signal [15, and references therein]. Note that (5) guarantees that if K ≳ δ⁻² µ⁴ S log⁴ L, then with probability at least 1 − e^{−CK min{δ/(2µ²), δ²/(2µ⁴)}} the matrix (1/(√K θ_avg)) H obeys the RIP of order 2S with a constant no more than (µ² + 2δµ² − 1)/(µ² + 1) = 1 − 2(1 − δµ²)/(µ² + 1). For a µ that is sufficiently close to one we can choose a δ ∈ (0, 1) for which the desired RIP constant can be achieved, and thus the ℓ1-minimization (3) successfully recovers x*.

Remark 3. Robustness of the proposed sparse deconvolution to noise can also be established using the standard RIP-based bounds known for ℓ1-minimization [15, and references therein], but we do not reproduce them here. Note that, as can be expected, for large values of µ the RIP, and thereby robustness, cannot be guaranteed.

IV. NUMERICAL EXPERIMENTS
In this section, we provide numerical experiments in an effort to understand the empirical relationship between the number of masks used and the success of our deconvolution methods.

In Section IV-A below, we describe the aggregate results of a large number of small numerical experiments. We present empirical phase transition diagrams of both the generic and the sparse deconvolution methods through a series of Monte Carlo simulations on one-dimensional signals. The purpose here is not necessarily to mimic a real imaging system as closely as possible, but rather to get a feel for how well random matrices of the general form (1) work for least squares and sparse recovery.

In Section IV-B, we describe a single simulation that more closely matches a physical acquisition. We use a realistic point spread function for the blurring operation, and {0, 1} weights for the spatial light modulator. The image is also not exactly sparse. We note that the recovery is successful despite the slight departure from what was analyzed theoretically.

A. Deconvolution for 1D signals
We evaluated the empirical success rate of the proposed generic and sparse deconvolution methods on synthetic 1D signals in relation to the number of masks K. In these simulations the subsampling operator, which models N = 4 sensors with uniform spacing, is fixed during the trials. Furthermore, the simulated blurring is modeled by a random circulant matrix whose first column p is a random blurring kernel drawn independently in each trial. Therefore, the matrix G consists of N = 4 uniformly spaced rows of the circulant matrix generated by p. Also, as mentioned in Section II, the masks are drawn independently, each with independent equiprobable ±1 entries.
1) Deconvolution by Least Squares:
We considered signals of dimension L = 2048 and varied the number of masks from K = 512 to K = 2048 in steps of size 128. For the (blurring) filter (i.e., h) we consider an all-pass model and a low-pass model. For the all-pass model the filter is drawn from a N(0, I) distribution. For the low-pass model, however, the filter is obtained by suppressing the DFT coefficients of a standard normal random vector whose indices are between 64 and 2048 − 64, leading to a random filter of "bandwidth" 127.

Given that the stability of the least squares approach can be characterized by the conditioning of the associated measurement matrix, we simply measured the ratio of the largest and smallest singular values of the matrix of interest. For each value of K we ran repeated trials of the simulation and computed the 10%, the 50%, and the 90% quantiles of the condition number.

Figures 2a and 2b show the condition number of the matrix associated with the least squares problem (2) in logarithmic scale versus the number of masks K for the considered all-pass and low-pass models, respectively. The first few values of K are cropped out because the matrix was severely ill-conditioned and the condition number reached extremely large values. As expected, increasing the number of masks improves the condition number. Furthermore, for any given number of masks the condition number for the all-pass model is better than that for the low-pass model by a factor of about two. Comparison between the quantiles of the condition number at any given value of K also suggests that the condition number fluctuates more in the case of the low-pass model.
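A scaled-down sketch of this experiment (L = 256 instead of 2048 for speed, all-pass kernel only, single trial per K rather than quantiles over many trials) reproduces the qualitative trend that the conditioning of the stacked matrix improves as K grows:

```python
import numpy as np

rng = np.random.default_rng(3)
L, N = 256, 4                     # smaller than the paper's L = 2048, for speed

# G: N = 4 uniformly spaced rows of the circulant matrix of a random
# all-pass kernel p, following the simulation setup above.
p = rng.standard_normal(L)
C = np.stack([np.roll(p, j) for j in range(L)], axis=1)
G = C[:: L // N, :]

def cond_H(K):
    # sigma_max / sigma_min of the stacked matrix [G D_phi_1; ...; G D_phi_K]
    masks = rng.choice([-1.0, 1.0], size=(K, L))
    H = np.vstack([G * m for m in masks])
    s = np.linalg.svd(H, compute_uv=False)
    return s[0] / s[-1]

# Conditioning should improve (decrease) as the number of masks grows.
conds = {K: cond_H(K) for K in (80, 160, 320)}
assert conds[320] < conds[80]
```

The smallest K here is chosen so that KN is already somewhat larger than L; below that threshold the stacked matrix is rank-deficient and the ratio diverges, matching the cropped-out region of Figure 2.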
[Figure 2 panels: σ_max/σ_min versus the number of masks K, with the 10%, 50%, and 90% quantiles plotted for (a) the all-pass model and (b) the low-pass model with a bandwidth of 127.]
Fig. 2: Conditioning of the measurement matrix in the least squares problem (2) vs. K , the number of masks.
2) Sparse Deconvolution by ℓ1-Minimization: For this set of simulations we considered signals of dimension L = 2048 whose nonzero entries are drawn independently from a standard normal distribution and are located on a support set drawn uniformly at random. We varied the number of masks from K = 8 to K = 256 in steps of size eight. The (blurring) filter is generated according to the random all-pass and low-pass models described above for the case of unstructured deconvolution. These models might not be reasonable in incoherent imaging, where the signal and the filter are both nonnegative. However, our theoretical guarantees, similar to other results in CS, suggest no significant dependence on the sign of the signal that could affect the simulation results. For each value of K the sparsity of the signal is increased as a multiple of eight until the successful recovery rate drops to near zero. Among the trials run for each pair of K and S (i.e., the ℓ0-norm of the signal), those that yield a sufficiently small relative error are counted as successful reconstructions.

Figures 3a and 3b show the phase transition diagram of the estimate produced by (3) in terms of the number of masks K and the sparsity level S, respectively, for the considered all-pass and low-pass models. As can be seen from the figures, the phase transition boundary is almost linear in both models, which is in agreement with the relation between K and S suggested by Theorem 2. While the phase transition boundary for the low-pass model is slightly lower (worse) than that of the all-pass model, the difference does not appear to be significant.

B. 2D Sparse Deconvolution
We also applied the ℓ1-minimization technique to recover a 2D image from (synthetic) measurements in the considered masked imaging system, and the result is depicted in Figure 4. Instead of reconstruction in the canonical basis, in this experiment we considered reconstruction of the image with respect to the 2D-DCT basis. Furthermore, the random masks are populated with i.i.d. equiprobable 0/1 entries rather than the ±1 entries used in the previous
simulations.

Fig. 3: The phase transition diagram of ℓ1-minimization in terms of K, the number of masks, and S, the ℓ0-norm of the signals of dimension L = 2048. (a) All-pass model. (b) Low-pass model.

The target image at the top left corner is a fluorescent microscopy image, 188 pixels along its first dimension, with 8 bits per color channel. Although the image appears to be sparse, only a fraction of its pixels are zero-valued. In the 2D-DCT basis, however, the image is more compressible as, for instance, its squared ℓ1-norm to squared ℓ2-norm ratio is a small fraction of L. The blurring kernel, a PSF generated using the PSFGenerator package [18], is depicted in the bottom left corner of Figure 4. For comparison, the blurred unmasked image is shown at the top center. We applied K = 50 random 0/1 masks and subsampled each of the blurred outputs in each direction, which yields N scalar measurements per mask, 18 along the first dimension. These measurements are shown at the center of the second row. The overall undersampling rate K × N/L is well below one. We applied the ℓ1-minimization with the same blur kernel for each color channel, independently. The relative error of the estimated image obtained by ℓ1-minimization, shown at the top right corner of Figure 4, is small. We repeated the same simulation with ±1 masks and the relative error and the quality of the estimate remained virtually the same.

V. CONCLUSION
In this paper we studied the deconvolution problem in an idealized random mask imaging system. We quantified the number of masks that suffices to solve the deconvolution problem stably. For generic scene images, and depending on the extent of the blurring and subsampling, we can have a well-conditioned deconvolution problem at the cost of oversampling at a rate logarithmic in the dimension of the target image. For sparse scene images, however, stable deconvolution through $\ell_1$-minimization is possible at much lower sampling rates. The number of required masks for stable sparse deconvolution can grow almost linearly in the sparsity, at a rate much slower than the dimension of the target image. The established bounds can be satisfactory in certain regimes, but the sharpness of these bounds needs further investigation. For example, the sample complexity stated in Theorem 2 might be pessimistic, as its dependence on the number of sensors is not ideal.

Considering more realistic mathematical models of the random mask imaging system introduces interesting and challenging problems that can be studied in the future. For example, analyzing the system under a Poisson noise model would be of great interest, as it is a more common and realistic model in optics. Furthermore, if the sensors only measure intensity, we would obtain a phase retrieval problem whose theoretical analysis may require significantly different approaches.

[Footnote: The image is an adaptation of an image on Wikimedia Commons that is available online at: http://commons.wikimedia.org/wiki/File:S_cerevisiae_septins.jpg]

Fig. 4: Sparse deconvolution applied to a fluorescent micrograph of cell outlines (red) and septins (green) in Saccharomyces. The first column shows the original … × … scene image (top) and the … × … blur kernel (bottom). The second column shows the blurred image (top) for comparison and the masked, blurred, and subsampled measurements (bottom). The last column shows the estimated image (top) and the absolute error (bottom).

APPENDIX A
PROOF OF THEOREM 1
Proof of Theorem 1: Let $Z_k = D_{\phi_k} G^T G D_{\phi_k}$. Our goal is to show that for sufficiently large $K$, the random matrix $\sum_{k=1}^K Z_k = H^T H$ is well-conditioned with high probability. It is straightforward to verify that $Z_k \succcurlyeq 0$ and $\|Z_k\| = \rho^2$. Furthermore, we have
$$\mathbb{E}\left[\sum_{k=1}^K Z_k\right] = K\,\mathrm{diag}\left(G^T G\right),$$
from which we can deduce that
$$\lambda_{\min}\left(\mathbb{E}\left[\sum_{k=1}^K Z_k\right]\right) = K\theta_{\min}^2 \quad\text{and}\quad \lambda_{\max}\left(\mathbb{E}\left[\sum_{k=1}^K Z_k\right]\right) = K\theta_{\max}^2.$$
Therefore, with $\psi(\delta) := (1+\delta)\log(1+\delta) - \delta$, we can apply the matrix Chernoff bound [19, Corollary 5.2] and obtain
$$\mathbb{P}\left(\lambda_{\min}\left(\sum_{k=1}^K Z_k\right) \le (1-\delta)K\theta_{\min}^2\right) \le L\,e^{-K\theta_{\min}^2\,\psi(-\delta)/\rho^2}$$
for any $\delta \in [0,1)$, and
$$\mathbb{P}\left(\lambda_{\max}\left(\sum_{k=1}^K Z_k\right) \ge (1+\delta)K\theta_{\max}^2\right) \le L\,e^{-K\theta_{\max}^2\,\psi(\delta)/\rho^2}$$
for any $\delta \ge 0$. In particular, using (4) and the fact that $\psi(\pm\delta) \ge (\log 4 - 1)\,\delta^2$ for all $\delta \in (0,1]$, it is straightforward to show that
$$\mathbb{P}\left(\lambda_{\min}\left(\sum_{k=1}^K Z_k\right) \le (1-\delta)K\theta_{\min}^2\right) \le L^{-\beta} \quad\text{and}\quad \mathbb{P}\left(\lambda_{\max}\left(\sum_{k=1}^K Z_k\right) \ge (1+\delta)K\theta_{\max}^2\right) \le L^{-\beta}.$$
Therefore, for values of $K$ that obey (4) we have
$$\frac{\lambda_{\max}\left(\sum_{k=1}^K Z_k\right)}{\lambda_{\min}\left(\sum_{k=1}^K Z_k\right)} \le \frac{1+\delta}{1-\delta}\cdot\frac{\theta_{\max}^2}{\theta_{\min}^2},$$
with probability at least $1 - 2L^{-\beta}$.

APPENDIX B
PROOF OF THEOREM 2

Let $A_x := G D_x$ and
$$\widetilde{A}_x = I_{K\times K} \otimes A_x = \begin{bmatrix} A_x & & \\ & \ddots & \\ & & A_x \end{bmatrix}.$$
Furthermore, define
$$\mathcal{A} := \left\{ \widetilde{A}_x \,\middle|\, x \in \mathcal{D}_{S,L} \right\}. \tag{6}$$
To state the proof, we also need to define certain quantities as follows. Let
$$d_F(\mathcal{A}) := \sup_{A\in\mathcal{A}} \|A\|_F = \sup_{x\in\mathcal{D}_{S,L}} \left\|\widetilde{A}_x\right\|_F = \sqrt{K}\sup_{x\in\mathcal{D}_{S,L}} \|A_x\|_F = \sqrt{K}\,\theta_{\max} \tag{7}$$
and
$$d(\mathcal{A}) := \sup_{A\in\mathcal{A}} \|A\| = \sup_{x\in\mathcal{D}_{S,L}} \left\|\widetilde{A}_x\right\| = \sup_{x\in\mathcal{D}_{S,L}} \|A_x\| \le \theta_{\max}. \tag{8}$$
Let $\gamma_2(\mathcal{A}, \|\cdot\|)$ denote Talagrand's $\gamma_2$ functional [20] for the set $\mathcal{A}$ with respect to the operator norm. It is known that this functional can be bounded from above by Dudley's integral as
$$\gamma_2(\mathcal{A}, \|\cdot\|) \le c \int_0^{d(\mathcal{A})} \sqrt{\log N(\mathcal{A}, \|\cdot\|, u)}\,\mathrm{d}u, \tag{9}$$
where $N(\mathcal{X}, \|\cdot\|_{\mathcal{X}}, \varepsilon)$ denotes the covering number of a set $\mathcal{X}$ with respect to $\varepsilon$-balls of the norm $\|\cdot\|_{\mathcal{X}}$ [20]. The proof of Theorem 2 follows from the results of [21]. In particular, we use the following theorem.

Theorem 4 (Krahmer et al. [21, Theorem 1.4]): Let $\mathcal{A}$ be a symmetric set of matrices (i.e., $\mathcal{A} = -\mathcal{A}$) and $\phi$ be a vector of i.i.d. Rademacher random variables. Then for some positive absolute constants $c_1$ and $c_2$ we have
$$\mathbb{P}\left(\sup_{A\in\mathcal{A}} \left| \|A\phi\|_2^2 - \mathbb{E}\|A\phi\|_2^2 \right| \ge c_1 E + t\right) \le 2\,e^{-c_2 \min\left\{\frac{t^2}{V^2},\,\frac{t}{U}\right\}},$$
where
$$E = \gamma_2(\mathcal{A},\|\cdot\|)\left(\gamma_2(\mathcal{A},\|\cdot\|) + d_F(\mathcal{A})\right) + d_F(\mathcal{A})\,d(\mathcal{A}), \quad U = d(\mathcal{A})^2, \quad\text{and}\quad V = d(\mathcal{A})\left(\gamma_2(\mathcal{A},\|\cdot\|) + d_F(\mathcal{A})\right).$$

Proof of Theorem 2: It can be easily verified that
$$\widetilde{A}_x \begin{bmatrix} \phi_1^T & \phi_2^T & \cdots & \phi_K^T \end{bmatrix}^T = Hx.$$
Therefore, the conditioning of $H$ on sparse vectors can be obtained by means of appropriate tail bounds for $\|\widetilde{A}_x\phi\|_2$ with $x \in \mathcal{D}_{S,L}$ and $\phi$ being a vector of $KL$ i.i.d. Rademacher random variables. Our goal is to apply Theorem 4 to the set $\mathcal{A}$ defined by (6), to obtain the desired tail bounds.
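As a numerical sanity check on this identity, the following Python sketch builds $A_x = GD_x$ for a sparse $x$, forms $Hx$ by stacking the per-mask products $A_x\phi_k$, and verifies that the empirical mean of $\|Hx\|_2^2$ matches $\mathbb{E}\|\widetilde{A}_x\phi\|_2^2 = K\|A_x\|_F^2$. The matrix `G` here is a small random stand-in for the actual blur-and-subsample operator, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, S = 64, 8, 4                             # hypothetical small problem sizes
G = rng.standard_normal((L, L)) / np.sqrt(L)   # random stand-in for the blur+subsample matrix

# an S-sparse vector x with unit Euclidean norm
x = np.zeros(L)
support = rng.choice(L, size=S, replace=False)
x[support] = rng.standard_normal(S)
x /= np.linalg.norm(x)

A_x = G @ np.diag(x)                           # A_x = G D_x

# Hx stacks A_x phi_k over the K Rademacher masks phi_1, ..., phi_K
def measure(masks):
    return np.concatenate([A_x @ phi for phi in masks])

# For a Rademacher phi, E ||A_x phi||^2 = ||A_x||_F^2, hence E ||Hx||^2 = K ||A_x||_F^2
trials = [
    np.sum(measure(rng.choice([-1.0, 1.0], size=(K, L))) ** 2)
    for _ in range(2000)
]
expected = K * np.linalg.norm(A_x, "fro") ** 2
print(np.mean(trials), expected)               # the two values should be close
```

The concentration of each trial around `expected` is exactly what the chaos-process tail bound of Theorem 4 quantifies uniformly over all sparse $x$.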
It follows from Theorem 4 that for some positive absolute constants $c_1$ and $c_2$ we have
$$\sup_{x\in\mathcal{D}_{S,L}} \|Hx\|_2^2 \le \sup_{x\in\mathcal{D}_{S,L}} \mathbb{E}\left\|\widetilde{A}_x\phi\right\|_2^2 + c_1 E + t = \sup_{x\in\mathcal{D}_{S,L}} K\|A_x\|_F^2 + c_1 E + t \le K\theta_{\max}^2 + c_1 E + t \tag{10}$$
and
$$\inf_{x\in\mathcal{D}_{S,L}} \|Hx\|_2^2 \ge \inf_{x\in\mathcal{D}_{S,L}} \mathbb{E}\left\|\widetilde{A}_x\phi\right\|_2^2 - c_1 E - t = \inf_{x\in\mathcal{D}_{S,L}} K\|A_x\|_F^2 - c_1 E - t \ge K\theta_{\min}^2 - c_1 E - t \tag{11}$$
with probability at least $1 - 2e^{-c_2\min\{t^2/V^2,\,t/U\}}$. It only remains to properly bound the quantities $U$, $V$, and $E$, and to choose a reasonable value for $t$.

First we need to bound $\gamma_2(\mathcal{A},\|\cdot\|)$. Using the special structure of $\widetilde{A}_x$ we deduce that
$$\left\|\widetilde{A}_x - \widetilde{A}_{x'}\right\| = \|A_x - A_{x'}\| \le \|G\|\,\|x - x'\|_\infty = \rho\,\|x - x'\|_\infty.$$
Therefore, $N(\mathcal{A}, \|\cdot\|, u) \le N(\mathcal{D}_{S,L}, \rho\|\cdot\|_\infty, u)$. Then a simple volumetric argument yields
$$N(\mathcal{D}_{S,L}, \rho\|\cdot\|_\infty, u) \le \binom{L}{S}\left(\frac{3\rho}{u}\right)^S \le \left(\frac{Le}{S}\cdot\frac{3\rho}{u}\right)^S.$$
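The Lipschitz estimate above, which drives the covering-number bound, can be checked numerically. The sketch below uses a small random stand-in for $G$ (the true blur-and-subsample operator is not fixed here) and verifies $\|A_x - A_{x'}\| = \|G(D_x - D_{x'})\| \le \|G\|\,\|x - x'\|_\infty$ on random pairs.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 48                                   # hypothetical dimension
G = rng.standard_normal((L, L))          # random stand-in for the blur+subsample matrix
rho = np.linalg.norm(G, 2)               # rho = ||G||, the spectral norm

# ||A_x - A_x'|| = ||G (D_x - D_x')|| <= ||G|| ||x - x'||_inf,
# since D_x - D_x' is diagonal with entries x - x'
for _ in range(100):
    x, xp = rng.standard_normal(L), rng.standard_normal(L)
    lhs = np.linalg.norm(G @ np.diag(x - xp), 2)
    rhs = rho * np.max(np.abs(x - xp))
    assert lhs <= rhs * (1 + 1e-12)      # small tolerance for floating point
print("Lipschitz bound verified on 100 random pairs")
```

Because the bound depends on $x - x'$ only through its $\ell_\infty$-norm, a covering of $\mathcal{D}_{S,L}$ in the (scaled) $\ell_\infty$ metric immediately covers $\mathcal{A}$ in the operator norm.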
Thus, using (9), for a sufficiently large absolute constant $\beta$ we can write
$$\begin{aligned}
\gamma_2(\mathcal{A},\|\cdot\|) &\le c \int_0^{\theta_{\max}} \sqrt{\log N(\mathcal{A},\|\cdot\|,u)}\,\mathrm{d}u \\
&\le c \int_0^{\theta_{\max}} \sqrt{\log N(\mathcal{D}_{S,L},\rho\|\cdot\|_\infty,u)}\,\mathrm{d}u \\
&\le c \int_0^{\theta_{\max}} \sqrt{S\log\frac{Le}{S} + S\log\left(\frac{3\rho}{u}\right)}\,\mathrm{d}u \\
&\le c \int_0^{\theta_{\max}} \sqrt{S\log\frac{Le}{S}} + \sqrt{S\log\left(\frac{3\rho}{u}\right)}\,\mathrm{d}u \\
&\le c\,\theta_{\max}\sqrt{S}\left(\sqrt{\log\frac{Le}{S}} + 2\sqrt{\log\left(\frac{3\rho}{\theta_{\max}}\right)}\right) \\
&\le \theta_{\max}\sqrt{\beta S\log L},
\end{aligned}$$
where the last two inequalities follow from the bounds
$$\int_0^{\alpha} \sqrt{\log\frac{1}{u}}\,\mathrm{d}u \le 2\alpha\sqrt{\log\frac{1}{\alpha}} \quad\text{with } \alpha = \frac{\theta_{\max}}{3\rho}, \qquad\text{and}\qquad \rho = \|G\| \le \|G\|_F \le \theta_{\max}\sqrt{L},$$
respectively. Then we can write $U = d(\mathcal{A})^2 \le \theta_{\max}^2$,
$$V = d(\mathcal{A})\left(\gamma_2(\mathcal{A},\|\cdot\|) + d_F(\mathcal{A})\right) \le \theta_{\max}^2\left(\sqrt{\beta S\log L} + \sqrt{K}\right),$$
and
$$E = \gamma_2(\mathcal{A},\|\cdot\|)\left(\gamma_2(\mathcal{A},\|\cdot\|) + d_F(\mathcal{A})\right) + d_F(\mathcal{A})\,d(\mathcal{A}) \le \theta_{\max}^2\left(\beta S\log L + \sqrt{\beta K S\log L} + \sqrt{K}\right).$$
Setting $t = \frac{\delta}{2}\,\theta_{\min}^2\left(\sqrt{\beta S\log L} + \sqrt{K}\right)\sqrt{K}$, we have
$$c_1 E + t \le c_1\theta_{\max}^2\left(\beta S\log L + \sqrt{\beta K S\log L} + \sqrt{K}\right) + \frac{\delta}{2}\,\theta_{\min}^2\left(\sqrt{\beta S\log L} + \sqrt{K}\right)\sqrt{K}. \tag{12}$$
Recall from the statement of the theorem that $\mu = \theta_{\max}^2/\theta_{\min}^2$. We would like to upper bound the right-hand side of (12) by $\delta K\theta_{\min}^2$. The desired bound can be interpreted as a quadratic polynomial in $\sqrt{K}$ being nonnegative; it suffices that $\sqrt{K}$ is greater than the largest root of the polynomial. Therefore, straightforward algebra shows that if $K$ obeys
$$K \ge \left(\left(\sqrt{\beta} + \frac{2c_1}{\delta}\,\mu\left(\sqrt{\beta} + 2\right)\right)^2 + \frac{4c_1}{\delta}\,\mu\beta\right) S\log L = O\!\left(\delta^{-2}\mu^2 S\log L\right),$$
then the right-hand side of (12) is bounded from above by $\delta K\theta_{\min}^2$, and thus
$$c_1 E + t \le \delta K\theta_{\min}^2. \tag{13}$$
Furthermore, we have $\frac{t}{U} \ge \frac{\delta}{2\mu}K$ and $\frac{t}{V} \ge \frac{\delta}{2\mu}\sqrt{K}$, which imply
$$e^{-c_2\min\left\{\frac{t^2}{V^2},\,\frac{t}{U}\right\}} \le e^{-c_2 K\min\left\{\frac{\delta}{2\mu},\,\frac{\delta^2}{(2\mu)^2}\right\}}. \tag{14}$$
Applying (13) to (10) and (11), and bounding the tail probability using (14), shows that with probability at least $1 - 2e^{-c_2 K\min\{\delta/(2\mu),\,\delta^2/(2\mu)^2\}}$ the inequalities
$$(1-\delta)K\theta_{\min}^2 \le \|Hx\|_2^2 \le K\left(\theta_{\max}^2 + \delta\theta_{\min}^2\right),$$
which are equivalent to (5), hold for all $x \in \mathcal{D}_{S,L}$.

REFERENCES

[1] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,"
IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[2] E. J. Candès and T. Tao, "Near optimal signal recovery from random projections: universal encoding strategies?" IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[3] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[4] M. Banham and A. Katsaggelos, "Digital image restoration," IEEE Signal Processing Magazine, vol. 14, no. 2, pp. 24–41, Mar. 1997.
[5] R. Puetter, T. Gosnell, and A. Yahil, "Digital image reconstruction: deblurring and denoising," Annual Review of Astronomy and Astrophysics, vol. 43, no. 1, pp. 139–194, 2005.
[6] D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim, "Compressive holography," Optics Express, vol. 17, no. 15, pp. 13040–13049, Jul. 2009.
[7] G. Arce, D. Brady, L. Carin, H. Arguello, and D. Kittle, "Compressive coded aperture spectral imaging: an introduction," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 105–115, Jan. 2014.
[8] V. Studer, J. Bobin, M. Chahid, H. S. Mousavi, E. Candès, and M. Dahan, "Compressive fluorescence microscopy for biological and hyperspectral imaging," Proceedings of the National Academy of Sciences, vol. 109, no. 26, pp. E1679–E1687, 2012.
[9] S. Gazit, A. Szameit, Y. C. Eldar, and M. Segev, "Super-resolution and reconstruction of sparse sub-wavelength images," Optics Express, vol. 17, no. 26, pp. 23920–23946, Dec. 2009.
[10] R. M. Willett, R. F. Marcia, and J. M. Nichols, "Compressed sensing for practical optical imaging systems: a tutorial," Optical Engineering, vol. 50, no. 7, pp. 072601-1–072601-13, 2011. [Online]. Available: http://dx.doi.org/10.1117/1.3596602
[11] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83–91, Mar. 2008.
[12] J. Romberg, "Compressive sensing by random convolution," SIAM Journal on Imaging Sciences, vol. 2, no. 4, pp. 1098–1128, 2009.
[13] H. Rauhut, "Compressive sensing and structured random matrices," in Theoretical Foundations and Numerical Methods for Sparse Recovery, ser. Radon Series on Computational and Applied Mathematics, vol. 9. deGruyter, 2010, pp. 1–92.
[14] H. Rauhut, J. Romberg, and J. A. Tropp, "Restricted isometries for partial random circulant matrices," Applied and Computational Harmonic Analysis, vol. 32, no. 2, pp. 242–254, 2012.
[15] S. Foucart, "Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants," in Approximation Theory XIII: San Antonio 2010, ser. Springer Proceedings in Mathematics, vol. 13. San Antonio, TX: Springer New York, 2012, pp. 65–77.
[16] E. J. Candès, "The restricted isometry property and its implications for compressed sensing," Comptes Rendus Mathématique, vol. 346, no. 9–10, pp. 589–592, 2008.
[17] S. Foucart, "A note on guaranteed sparse recovery via ℓ1-minimization," Applied and Computational Harmonic Analysis, vol. 29, no. 1, pp. 97–103, 2010.
[18] H. Kirshner, F. Aguet, D. Sage, and M. Unser, "3-D PSF fitting for fluorescence microscopy: implementation and localization application," Journal of Microscopy.
[19] J. A. Tropp, "User-friendly tail bounds for sums of random matrices," Foundations of Computational Mathematics, Aug. 2011.
[20] M. Talagrand, The Generic Chaining, ser. Springer Monographs in Mathematics. Springer, 2005.
[21] F. Krahmer, S. Mendelson, and H. Rauhut, "Suprema of chaos processes and the restricted isometry property,"