Compressive Spectral Image Reconstruction using Deep Prior and Low-Rank Tensor Representation
Jorge Bacca, Yesid Fonseca, and Henry Arguello*
Department of Systems Engineering, Universidad Industrial de Santander, Bucaramanga, Colombia
*[email protected]

Abstract:
Compressive spectral imaging (CSI) has emerged as an alternative spectral image acquisition technology, which reduces the number of measurements at the cost of requiring a recovery process. In general, the reconstruction methods are based on hand-crafted priors used as regularizers in optimization algorithms, or on recent deep neural networks employed as image generators to learn a non-linear mapping from the low-dimensional compressed measurements to the image space. However, these data-driven methods need many spectral images to obtain good performance. In this work, a deep recovery framework for CSI without training data is presented. The proposed method is based on the fact that the structure of some deep neural networks and an appropriate low-dimensional structure are sufficient to impose a structure on the underlying spectral image in CSI. We analyze the low-dimensional structure via the Tucker representation, modeled in the first net layer. The proposed scheme is obtained by minimizing the ℓ2-norm distance between the compressive measurements and the predicted measurements, and the desired recovered spectral image is formed just before the forward operator. Simulated and experimental results verify the effectiveness of the proposed method. © 2021 Optical Society of America
1. Introduction
Spectral imaging (SI) deals with capturing the spatial information of a target over a broader range of the electromagnetic spectrum compared to a conventional RGB imaging system. This additional information is useful for applications such as biomedical imaging [1], crop identification [2], and surveillance [3]. An SI can be denoted as a 3D tensor 𝒳 ∈ ℝ^{M×N×L}, with M × N spatial pixels and L spectral bands [2]. Traditional methods to acquire SI are based on scanning along one of its tensor modes, which results in time-consuming systems and therefore prohibits their usage in dynamic scenes [4]. Alternatively, based on compressive sensing (CS) theory, new snapshot imaging systems acquire 2D multiplexed projections of a scene instead of directly acquiring all voxels, resulting in image compression via hardware [5]. To date, different compressive spectral imaging (CSI) techniques have been proposed [6–11]. For instance, the pioneering coded aperture snapshot spectral imaging (CASSI) system [10] uses optical elements to encode and disperse the incoming light to acquire 2D intensity projections. Even though CSI yields efficient sensing, a reconstruction process from the compressed measurements is needed, since it amounts to finding a solution to an under-determined system [5]. This recovery problem is addressed by representing the 3D scene as a 1D vector and assuming particular spectral-image priors in different dimensions, used as regularization in an optimization problem [4, 12]. For instance, [13, 14] assume low total variation, [7, 9] explore the sparsity of the scene in some orthogonal basis, [15, 16] use non-local similarity, and [17, 18] employ low-rank structures.
However, these hand-crafted priors often do not represent the wide variety and non-linearity of spectral images, and the vectorization ignores the high-dimensional structure of the scene, resulting in low reconstruction quality [19]. On the other hand, data-driven recovery methods are based on the power of deep neural networks as image generators, where the goal is to learn a non-linear transformation that maps a low-dimensional feature into realistic spectral images [20]. In particular, with a vast spectral data set, [21–24] learn inverse networks that map the low-dimensional compressed measurements to the desired spectral image [25]. These methods have shown high performance in speed and reconstruction quality. However, they are very dependent on training data, and small variations in the sensing system would require re-training of the model [19]. Alternative solutions such as [26] take the sensing model into account when solving an optimization problem in which the prior is learned using a convolutional auto-encoder with a spectral data set; more recently, [19, 26–28] use unrolled-based methods that incorporate the sensing process into the deep network design, where the prior is intrinsically learned through end-to-end optimization. Although these methods have proven to be more general, they still depend on training data. In this paper, a deep recovery framework for reconstructing spectral images from CSI measurements without training data requirements is proposed. The method is based on the fact that deep convolutional neural networks and an appropriate low-dimensional representation are sufficient to learn/generate the image representation without any training data, and therefore, to recover a spectral image directly from the CSI measurements.
In particular, the proposed method designs a deep neural network where the first layer learns a low-dimensional 3D tensor, which is then refined by convolutional operations to generate the desired reconstruction. The weights of this neural network are randomly initialized and fitted to guarantee that the reconstruction suits the CSI measurements via ℓ2-norm minimization over the CSI measurements; therefore, the recovered image is formed just before the forward operator. The proposed method is expressed as an end-to-end optimization by modeling the forward compressive sensing model as a non-trainable layer; consequently, it can be solved using any deep learning algorithm, such as stochastic gradient descent. Additionally, we analyze the importance of the low-dimensional tensor structure in the first layer via the low-rank Tucker representation, which imposes a low-rank 3D prior. Since no information other than the compressive spectral measurements is available, the proposed method is more closely related to hand-crafted techniques. Results on simulated and real data demonstrate that the proposed method outperforms hand-crafted methods in many scenarios and obtains comparable results with data-driven approaches.
2. Related work
The traditional CS recovery algorithms are considered hand-designed since they use some expert knowledge of the signal, known as a signal prior [26]. These methods are based on optimization techniques that design a data-fidelity term and incorporate the prior as a regularization term [29]. The most common prior assumes that the signal is sparse on a given basis, such as the Wavelet [30] or discrete cosine transform (DCT) [5] bases, among others [5]. This sparsity assumption is imposed in different methods by applying ℓ0 or ℓ1 regularizers. Examples of algorithms that use sparsity priors include GPSR [29], ADMM [31], CSALSA [32], ISTA [33], and AMP [34], among others. In CSI, some specific kinds of priors are used. For instance, [9] assumes low total variation, [7] explores the spatial sparsity of the scene in the Wavelet domain and the spectral sparsity in the DCT domain [15, 16]; furthermore, [17, 18] employ low-rank structures based on the linear mixture model. Exploring the tensor structure, low-rank tensor recovery methods have also been proposed [12, 35]. However, these hand-crafted methods require expert knowledge of the target to select which prior to use. Therefore, they do not represent the wide variety and non-linearity of spectral image representations.

2.2. Data-Driven CS Reconstruction

Data-driven recovery methods are based on learning a non-linear inverse mapping from the compressive measurements to a realistic image. In particular, with a vast dataset of ground-truth and compressive measurement pairs, these methods learn a non-linear network by minimizing the distance between the output of the net and the ground truth. The main difference between the state-of-the-art methods is their network architecture. For instance, [36] learns a stacked auto-encoder, convolutional layers are applied in [37], and convolutional, residual, and fully-connected layers are also used in [38–41].
In particular, for CSI, [22] was the first work that used a data-driven approach, where an initialization obtained from TwIST [42] was refined using denoising networks; [19] proposed a particular model to explore the spatial and spectral information and to design the coded aperture usually included in CSI architectures. Furthermore, based on the structure of the U-net, [24] proposed a non-linear mapping replacing the 2D convolutions with 3D convolutions, and [23] developed a generative model based on the U-net. These methods have shown high performance in reconstruction quality and, once trained, they allow real-time reconstruction. However, these approaches are highly dependent on the data set used. Furthermore, small variations in the compressive measurements, such as the type of noise or changes in the sensing matrix, would require time-consuming re-training.
Recently, some works have considered the sensing model to propose a mixed approach that combines hand-crafted and data-driven CS reconstruction. In particular, these methods use a deep network or denoiser to replace the hand-crafted prior; then, this non-linear prior is employed in the optimization algorithm [38]. For instance, Plug-and-Play priors (PnP) use pre-existing denoisers as a proximal step [43, 44], [45] learns the proximal mapping using a convolutional network, and [26] learns an SI prior through a convolutional autoencoder, which is then incorporated into the optimization problem. More recently, D-AMP [46], ISTA-Net [47], ADMM-Net [48], and DNU [28] use the unrolling-based method that incorporates the optimization steps into the deep network architecture using residual networks; consequently, they can learn the prior and the parameters via end-to-end training. This strategy is also employed for CSI in [19, 27]. Although these methods have proven to be more general, they still depend on training data, which is limited in SI.
Generative models (GM) have also been used for CS recovery [49]. The goal in GM is to generate a realistic image from a low-dimensional latent representation. For instance, [49, 50] use a pre-trained deep neural network and obtain the low-dimensional representation that minimizes the distance between the compressive measurements and the output of the net. On the other hand, [51] shows that a pre-trained network is not necessary. Instead of finding the low-dimensional latent space, [51] uses a fixed random variable as the latent space; then the weights of the model are updated to obtain an optimal result. The drawback of this method is its sensitivity to changes in the application, the fixed latent space, or the network architecture, which usually requires small random disturbances to obtain good performance. The proposed method in this work is closely related to [50, 51], where the parameters of the network model and the low-dimensional representation (based on a Tucker representation, which is useful for SI) are optimized in an end-to-end approach for a CSI architecture.
Notation:
Throughout the paper, vectors are represented with boldface lowercase letters, e.g., 𝒙, and matrices are denoted with boldface capital letters, e.g., X. 3D tensors are denoted as 𝒳 ∈ ℝ^{M×N×L}, and the 1-mode product of a tensor 𝒳ₒ ∈ ℝ^{M_p×N_p×L_p} with a matrix U ∈ ℝ^{M×M_p} is written as 𝒳 = 𝒳ₒ ×₁ U, where 𝒳 ∈ ℝ^{M×N_p×L_p} and

𝒳(m, n, ℓ) = Σ_{m̂=1}^{M_p} U(m, m̂) 𝒳ₒ(m̂, n, ℓ).

In the same way, the 2-mode and 3-mode products can be defined. We introduce the function shift_ℓ(·) : ℝ^{M×N} → ℝ^{M×(N+L−1)}, which refers to a shifting operator, i.e., for a given X we have

shift_ℓ(X)(m, n) := X(m, n − ℓ) if 1 ≤ n − ℓ ≤ N, and 0 otherwise.

Finally, the function vect(·) : ℝ^{M×N×L} → ℝ^{MNL} represents the vectorization of a tensor.
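Since the mode products and the shifting operator drive everything that follows, a minimal numpy sketch of both may help (zero-based indices; the function names are illustrative, and `shift` takes the number of bands L explicitly to size the zero-padded output):

```python
import numpy as np

def mode1_product(X_o, U):
    """1-mode product X = X_o x1 U: X(m,n,l) = sum_mhat U(m,mhat) X_o(mhat,n,l)."""
    return np.einsum('im,mnl->inl', U, X_o)

def shift(X, ell, L):
    """shift_ell: place the M x N band ell columns to the right, zero-padded to width N+L-1."""
    M, N = X.shape
    out = np.zeros((M, N + L - 1))
    out[:, ell:ell + N] = X
    return out
```

The 2-mode and 3-mode products follow the same pattern with the contraction applied to the second and third tensor indices.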
3. Compressed Measurements Acquisition
The CASSI sensing approach is used to acquire the compressed measurements of a spectral scene [10]. This architecture is composed of three main optical elements: a coded aperture, a prism as a dispersive element, and a gray-scale detector, as illustrated in Fig. 1. The spatial-spectral data cube is represented as 𝒳 ∈ ℝ^{M×N×L}, with M × N spatial dimensions and L spectral bands, and X_ℓ ∈ ℝ^{M×N} denotes the 2D spectral intensity image of 𝒳 at the ℓ-th spectral band. As shown in Fig. 1, each spatial position of the scene is modulated by a coded aperture C ∈ {0, 1}^{M×N}, which blocks/unblocks the incoming light; then, the coded spectral scene passes through the prism, creating a horizontal shifting. Finally, the coded shifted spectral scene is integrated along the spectral axis by the detector, resulting in the 2D compressed measurement Y ∈ ℝ^{M×(N+L−1)}. In CSI, it is possible to acquire S < L different measurement snapshots of the same spectral data cube by employing S different patterns in the coded aperture. Therefore, the output of the sensing process at the s-th snapshot can be mathematically expressed as

Y^(s) = Σ_{ℓ=1}^{L} shift_{ℓ−1}( X_ℓ ⊙ C^(s) ),   (1)

where the ℓ-th spectral band X_ℓ of the tensor 𝒳 is shifted with the operator shift_{ℓ−1}(·), and ⊙ denotes the element-wise product with the 2D coded aperture C^(s). The CASSI sensing model can be seen as a linear operator after stacking the measurements of multiple shots as 𝒚 = [vect(Y^(1))^T, · · · , vect(Y^(S))^T]^T. Thus, the system matrix model can be expressed as

𝒚 = H vect(𝒳),   (2)

where H ∈ ℝ^{SM(N+L−1)×MNL} represents the linear sensing matrix of CASSI.
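A direct numpy sketch of the single-snapshot model in Eq. (1), band-wise coding, shifting, and spectral integration (the function name is illustrative, not the authors' code):

```python
import numpy as np

def cassi_forward(X, C):
    """Single CASSI snapshot: Y = sum_ell shift_{ell}(X_ell * C), as in Eq. (1)."""
    M, N, L = X.shape
    Y = np.zeros((M, N + L - 1))
    for ell in range(L):
        # code band ell with the aperture, shift it ell columns, and accumulate
        Y[:, ell:ell + N] += X[:, :, ell] * C
    return Y
```

Applying this operator for S different apertures C^(s) and stacking the vectorized outputs gives the linear model 𝒚 = H vect(𝒳) of Eq. (2) without ever forming H explicitly.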
Fig. 1. Physical sensing phenomena in CASSI, which is the CSI prototype used to validate the proposed approach.

Fig. 2. Visual representation of the proposed deep neural scheme, where the boxes with background color represent the learning parameters, the white box stands for the non-trainable CSI system, and the non-box blocks represent the outputs of the layers.
4. Compressive Spectral Reconstruction
The goal in CSI is to recover the spectral image 𝒳 ∈ ℝ^{M×N×L} from the compressive measurements 𝒚. Since SM(N+L−1) ≪ MNL, this problem consists in solving an under-determined system, which is addressed by restricting the feasible set of solutions using image priors as regularizers. A tensor formulation for addressing this problem is described below:

minimize_{𝒵'ₒ ∈ ℝ^{M×N×L}} ‖𝒚 − H vect(𝒳)‖² + λ·φ(𝒵'ₒ)   (3)
subject to 𝒳 = 𝒵'ₒ ×₁ U' ×₂ V' ×₃ W',

where the matrices U' ∈ ℝ^{M×M}, V' ∈ ℝ^{N×N}, and W' ∈ ℝ^{L×L} are fixed, known orthogonal matrices, usually the matrix representations of the Wavelet and Discrete Cosine transforms; 𝒵'ₒ is the representation of the spectral image in the given basis, and φ(·) : ℝ^{M×N×L} → ℝ is a regularization function that imposes particular image priors, with λ as the regularization parameter [29]. Unlike hand-crafted priors such as sparsity [5], we explore the power of some deep neural networks as image generators that map a low-dimensional feature tensor 𝒵 ∈ ℝ^{M×N×L} to the image as

𝒳 = M_θ(𝒵),   (4)

where M_θ(·) represents a deep network with θ as the net parameters. To ensure a low-dimensional structure over the feature tensor, this work uses the Tucker representation, i.e., 𝒵 = 𝒵ₒ ×₁ U ×₂ V ×₃ W, with 𝒵ₒ ∈ ℝ^{M_ρ×N_ρ×L_ρ} a 3D low-dimensional tensor, where M_ρ < M, N_ρ < N, and L_ρ < L.
This representation maintains the 3D structure of the spectral images, exploits the inherent low rank of this data [52, 53], and also implicitly constrains the output 𝒳 to a low-dimensional manifold via the architecture and the weights of the net [50]. In this paper, we focus on a blind representation: instead of having a pre-trained network or a huge amount of data to train this deep neural representation, we formulate an optimization problem that learns the weights θ of the generative network M_θ as well as the feature tensor 𝒵 through its Tucker factors 𝒵ₒ, U, V, and W. All the parameters of this optimization problem are randomly initialized, and the only available information is the compressive measurements and the sensing model, i.e., the optimization is independent of training data. In particular, we explore the prior implicitly captured by the choice of the generator network structure, which is usually composed of convolutional operations, and the importance of the low-rank feature representation; therefore, the proposed method consists of solving the following optimization problem:

minimize_{θ, 𝒵ₒ, U, V, W} ‖𝒚 − H vect(M_θ(𝒵))‖²   (5)
subject to 𝒵 = 𝒵ₒ ×₁ U ×₂ V ×₃ W,

where the recovery is 𝒳* = M_θ*(𝒵ₒ* ×₁ U* ×₂ V* ×₃ W*). This optimization problem can be solved using an end-to-end neural network framework, as shown in Fig. 2. In this way, the input that is common in all neural networks is replaced with a custom layer with 𝒵ₒ, U, V, W as learnable parameters, which constructs the low-rank Tucker representation of 𝒵; then this tensor 𝒵 is refined with convolutional layers via M_θ(𝒵). These optimization variables are represented by the first two blue blocks in Fig. 2. The final layer in the proposed method is a non-trainable layer that models the forward sensing operator H vect(M_θ(𝒵)) to obtain the compressive measurements 𝒚 as the output of the net.
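The Tucker expansion 𝒵 = 𝒵ₒ ×₁ U ×₂ V ×₃ W used in the first layer can be sketched with three einsum contractions (a hypothetical helper written for clarity, not the authors' implementation):

```python
import numpy as np

def tucker_feature(Z_o, U, V, W):
    """Expand a low-rank core Z_o (M_rho, N_rho, L_rho) to the full feature Z (M, N, L)."""
    Z = np.einsum('im,mnl->inl', U, Z_o)  # 1-mode product
    Z = np.einsum('jn,inl->ijl', V, Z)    # 2-mode product
    Z = np.einsum('kl,ijl->ijk', W, Z)    # 3-mode product
    return Z
```

In the proposed scheme, `Z_o`, `U`, `V`, and `W` would all be trainable parameters, so the deep-learning framework's automatic differentiation updates them jointly with the network weights θ.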
Therefore, the problem in (5) can be solved with state-of-the-art deep learning optimization algorithms, such as stochastic gradient descent. Once the parameters are optimized, the desired SI is recovered just before the non-trainable layer labeled as "CSI system" in Fig. 2.
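To illustrate the fitting loop, the sketch below minimizes ‖𝒚 − H vect(𝒳)‖² by plain gradient descent, with the generator M_θ omitted and 𝒳 optimized directly for brevity; in the actual method 𝒳 = M_θ(𝒵) and the gradients flow through the network and the Tucker factors by automatic differentiation. `cassi_adjoint` is the transpose of the single-snapshot forward operator:

```python
import numpy as np

def cassi_forward(X, C):
    """Single CASSI snapshot: code each band, shift it, and integrate (Eq. (1))."""
    M, N, L = X.shape
    Y = np.zeros((M, N + L - 1))
    for ell in range(L):
        Y[:, ell:ell + N] += X[:, :, ell] * C
    return Y

def cassi_adjoint(Y, C, L):
    """Transpose of cassi_forward: un-shift each band and re-apply the code."""
    M, W = Y.shape
    N = W - L + 1
    X = np.zeros((M, N, L))
    for ell in range(L):
        X[:, :, ell] = Y[:, ell:ell + N] * C
    return X

def fit(Y, C, L, steps=1000, lr=0.5):
    """Gradient descent on 0.5 * ||cassi_forward(X, C) - Y||_F^2."""
    M, Wd = Y.shape
    X = np.zeros((M, Wd - L + 1, L))
    for _ in range(steps):
        residual = cassi_forward(X, C) - Y
        X -= lr * cassi_adjoint(residual, C, L)  # step along H^T (H x - y)
    return X
```

The step size is stable here because ‖H‖² ≤ L for a binary coded aperture; in the full method the loss landscape is non-convex in θ, which is why random restarts of the initialization matter (Section 5).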
5. Simulation and Results
In this section, the performance of the proposed compressive spectral image reconstruction approach is presented. The performance metrics used are the peak signal-to-noise ratio (PSNR) [5], the structural similarity (SSIM) [54], and the spectral angle mapping (SAM) [17]. PSNR and SSIM are calculated as the average over each 2D spatial image through the bands, and the SAM is the average over all spectral pixels. Four different tests are presented to validate the proposed method. The first test evaluates the importance of the low-rank tensor representation; the second test compares the recovery of the data-driven methods with the proposed method; the third evaluates the proposed method in different noisy scenarios and for a different number of shots against the non-data-dependent state-of-the-art algorithms; and finally, the proposed method is evaluated using two compressive spectral images obtained with a real test-bed implementation. The code can be found at https://github.com/jorgebaccauis/Deep_Prior_Low_Rank.

This section evaluates the importance of the rank level of the 3D tensor using the Tucker representation, which is placed at the first block of our model, as illustrated in Fig. 2. For that, two spectral images with M × N = 256 × 256 spatial pixels and L = 10 spectral bands between 400 and 700 nm from [55] were chosen. Three different network architectures were tested as the "Convolutional Layers" of the second block in Fig. 2. The first network architecture is a simple ResNet-based model [56], with a single skip connection and four convolutional layers, as shown in Fig. 3, with 2,150 parameters. The second architecture, also shown in Fig. 3, is a convolutional Autoencoder-based model [57], with 8,160 training parameters and six convolutional layers. The third architecture tested, also depicted in Fig. 3, is a Unet-based model [58] without drop-out layers, in which, in the contracting part, the feature information is increased using multiples of L = 10, i.e., L, 2L, and 3L, as illustrated in Fig. 3, resulting in 92,190 training parameters. This test is focused on a single snapshot with a coded aperture randomly generated from a Bernoulli distribution with mean 0.5.

As mentioned, the tensor feature 𝒵 ∈ ℝ^{M×N×L} comes from a low-dimensional kernel 𝒵ₒ ∈ ℝ^{M_p×N_p×L_p}; then, to evaluate the importance of the rank level in the Tucker representation, we establish the following relationship

M_p/M = N_p/N = L_p/L = 𝜌,   (6)

where 𝜌 ∈ (0, 1] is referred to as the rank factor hyper-parameter. Furthermore, as the parameters of the problem in (5) are randomly initialized, we simulated five realizations. The average results for these five realizations are summarized in Fig. 4. Notice that for the three network architectures and the two datasets, the rank factor is a crucial hyper-parameter to obtain a good reconstruction. In particular, the optimal value of 𝜌 differs between the ResNet-based, AutoencoderNet-based, and Unet-based networks, and differs again between DataSet 1 and DataSet 2. Furthermore, notice that a small value of 𝜌 presents the worst case for all the networks. Also, notice that all the network configurations obtain around 31 dB, which is the best obtained result, for different 𝜌 values; however, the AutoencoderNet-based network is more stable than the other networks. This result shows the importance of the low-rank tensor representation in the first layer, where the optimal value changes for each dataset and each network architecture.

Fig. 3. Visual representation of the three network models used: U-Net-based, AutoencoderNet-based, and ResNet-based. The colors represent the different layers in each network.

Although the proposed method does not need data to work, this test compares its results with the data-driven approaches to demonstrate the quality achieved. In particular, we use five learning-based methods for comparison: HSCNN [22], ISTA-Net [47], Autoencoder [26], HIR-DSSP [19], and DNU [28].
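Referring back to the rank factor of Eq. (6), the kernel dimensions it implies can be computed as below; the rounding to the nearest integer (floored at 1) is our assumption, since the text only fixes the ratio:

```python
def core_dims(M, N, L, rho):
    """Kernel size (M_p, N_p, L_p) with M_p/M = N_p/N = L_p/L = rho, per Eq. (6)."""
    assert 0 < rho <= 1
    return tuple(max(1, round(rho * d)) for d in (M, N, L))
```

For example, `core_dims(256, 256, 10, 0.5)` gives a core of 128 × 128 × 5, i.e., 1/16 of the parameters of the full feature tensor.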
These methods were trained on the public ICVL [59], Harvard [60], and KAIST [26] hyperspectral image data sets using their available codes; the sensing process was evaluated for a single snapshot, according to [28]. For the proposed method, the two network architectures were evaluated, i.e., AutoEncoder-based and UNet-based. Two testing images of 512 × 512 spatial resolution and 31 spectral bands were chosen to evaluate the different methods, and the reconstruction results and ground truth are shown in Fig. 5. It can be observed that the two variants of the proposed method outperform HSCNN, ISTA-Net, AutoEncoder, and HIR-DSSP in visual and quantitative (PSNR/SSIM/SAM) results, and show comparable results with respect to the DNU method, which is the best data-driven method. However, the proposed method has the advantage that it does not require training data, i.e., only the compressive measurements are available to the proposed approach.

Fig. 4. PSNR box plot for the different network architectures varying the rank factor 𝜌, with 5 run trials.

Fig. 5. Two reconstructed scenes using the 5 learning-based methods and the two variations of the proposed method, i.e., (AutoEncoder, UNet)-based.

Numerical simulations were conducted to demonstrate the robustness of the proposed method at different levels of additive Gaussian noise and numbers of shots, using the two spectral images obtained from [55]. It is well known that, for data-driven methods, the distributions of training and test data must be similar to obtain good results; for this reason, in this experiment the proposed method was compared with the state-of-the-art non-data-driven methods. Specifically, we compare the proposed method with GPSR [29], using the sparsity assumption in the Wavelet Kronecker Discrete Cosine transform implemented as in [8]; ADMM [31], using the low-rank prior implemented as in [17]; CSALSA [32], using the 3D total variation; PnP-ADMM [43], using BM3D as the denoiser; and Deep Image Prior [51], using the ResNet-based network. Three different noise levels were evaluated: 20 and 30 dB of signal-to-noise ratio (SNR), and the noiseless case (∞ dB). Further, the number of snapshots is varied between 1, 2, 3, and 4 using the CASSI system, expressed mathematically as in (2). For this experiment, the ResNet-based network was used as the "Convolutional layers" in the proposed model, and the rank factor 𝜌 was fixed.

Table 1. Mean performance comparison for the different recovery methods varying the number of snapshots and noise in SNR dB.
Table 1 presents the comparison of the performance in terms of the PSNR, SSIM, and SAM metrics for the different methods (the results are the average over the two datasets). Boldface indicates the best result for each case, and the second-best result is underlined. From Table 1, it can be seen that the proposed method outperforms the other methods in almost all cases. Furthermore, the proposed method shows better noise robustness than the other methods, since its maximum quality difference between the noise levels studied is 3 dB, compared to 3 dB, 5 dB, 5 dB, 5 dB, and 10 dB for GPSR, ADMM, CSALSA, PnP-ADMM, and Deep Image Prior (DIP), respectively. Additionally, as expected, when the number of snapshots per image increases, all the methods
improve their reconstruction quality; in particular, the difference between 1 and 4 snapshots for the noiseless case is up to 5 dB, 6 dB, 6 dB, 10 dB, 5 dB, and 6 dB for GPSR, ADMM, CSALSA, PnP-ADMM, DIP, and the proposed method, respectively. To visualize the reconstructions and analyze the results in more detail, Figure 6 shows an RGB false-color image of the reconstruction of each method for a single CASSI shot, which is the extreme case in terms of compression. Note that the proposed method, in the zoomed insets, is much cleaner than its counterparts. Additionally, a single spatial point of each reconstruction for the two datasets is also presented in Figure 6. It can be seen that the spectral signatures obtained by the proposed method closely resemble the ground truth.

Fig. 6. Two RGB false-color reconstructed scenes using the non-data-driven methods and the proposed method, with their respective metrics. Additionally, the ground truth and a spectral point of each scene are shown.

Fig. 7. Testbed CASSI implementation, where the relay lens focuses the light encoded by the DMD onto the sensor after being dispersed by the prism.
This section evaluates the proposed method with real measurements acquired using a testbed implementation. For this section, the ResNet-based model was used. Specifically, two different scenarios of compressed projections were assessed, which are described as follows.
This scenario was carried out for one snapshot of the CASSI testbed laboratory implementation depicted in Fig. 7. This setup contains a 100-mm objective lens; a high-speed digital micro-mirror device (DMD) (Texas Instruments DLI4130), with a pixel size of 13.68 𝜇m, where the CA is implemented; an Amici prism (Shanghai Optics); and a CCD camera (AVT Stingray F-145B) with a spatial resolution of 1388 × 1038 pixels and a pixel size of 6.45 𝜇m. The CA spatial distribution for the snapshot comes from blue-noise patterns, i.e., this CA is designed according to [61]. Notice that the robustness analysis summarized in Table 1 showed that the three best recovery methods were PnP-ADMM, DIP, and the proposed method; therefore, we decided to also compare them using this real data. Figure 8 presents the RGB scene obtained with a traditional camera, and the false-colored RGB images corresponding to the spectral images reconstructed using the different solvers. Furthermore, the spectral responses at two particular spatial locations in the scene, indicated as red points in the images, are also included and compared with the spectral behavior measured with a commercially available spectrometer (Ocean Optics USB2000+). The visual results show that the proposed method yields better spatial and spectral reconstruction, since the reconstructed RGB is sharper for the proposed scheme and the spectral signatures are closer to those taken by the spectrometer; that is, the SAM of the normalized signatures obtained from the PnP-ADMM algorithm is 0.188, from Deep Image Prior it is 0.205, and from the proposed method it is 0.120. These numerical results validate the performance of the proposed method with real data for a real CASSI setup using a binary coded aperture.
Fig. 8. (Left) RGB visual representation of the scene obtained with the different methods; (right) two spectral signatures of the recovered scenes.
The real data for this second test was provided by [62]. In particular, the main difference from the data of Section 5.4.1 is that the spatial modulation is a colored CA, where each pixel can be seen as a filter with its own spectral response (further details regarding colored CAs can be found in [8, 62]). The optical elements in this testbed implementation were the same as in the previous setup, where the DMD was used to emulate the colored CA. The coding and the scene were implemented to have a spatial resolution of 256 × 256 pixels and L spectral bands.

Fig. 9. (Top) RGB visual representation of the scene obtained with the GPSR method used in [62] and the proposed method; (bottom) normalized spectral signatures of the recovered scenes.
6. Conclusions
A method for reconstructing spectral images from CSI measurements has been proposed. The proposed scheme is based on the fact that spectral images can be generated by a convolutional network whose input features come from a low-rank Tucker representation. Although the proposed method is based on a convolutional network framework, it does not require training data, only the compressed measurements. This method was evaluated in three scenarios: noiseless, noisy, and a real-data implementation. In all of them, the proposed method outperforms state-of-the-art methods in image reconstruction quality. In particular, with a 20 dB SNR noise level in the CSI measurements, the proposed method outperforms its counterparts by up to 4 dB in PSNR. Although the proposed method was tested on two variations of the CSI system, it can be extended and used in other compressive systems where the data set is limited.
References
1. G. Lu and B. Fei, “Medical hyperspectral imaging: a review,” J. Biomed. Opt., 010901 (2014).
2. L. Zhang, L. Zhang, D. Tao, and X. Huang, “Tensor discriminative locality alignment for hyperspectral image spectral–spatial feature extraction,” IEEE Trans. Geosci. Remote Sens., 242–256 (2012).
3. P. W. Yuen and M. Richardson, “An introduction to hyperspectral imaging and its application for security, surveillance and target acquisition,” The Imaging Sci. J., 241–253 (2010).
4. C. Hinojosa, J. Bacca, and H. Arguello, “Coded aperture design for compressive spectral subspace clustering,” IEEE J. Sel. Top. Signal Process., 1589–1600 (2018).
5. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle, “Compressive coded aperture spectral imaging: An introduction,” IEEE Signal Process. Mag., 105–115 (2014).
6. X. Cao, T. Yue, X. Lin, S. Lin, X. Yuan, Q. Dai, L. Carin, and D. J. Brady, “Computational snapshot multispectral cameras: Toward dynamic capture of the spectral world,” IEEE Signal Process. Mag., 95–108 (2016).
7. C. V. Correa, C. Hinojosa, G. R. Arce, and H. Arguello, “Multiple snapshot colored compressive spectral imager,” Opt. Eng., 041309 (2016).
8. H. Arguello and G. R. Arce, “Colored coded aperture design by concentration of measure in compressive spectral imaging,” IEEE Trans. Image Process., 1896–1908 (2014).
9. A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt., B44–B51 (2008).
10. M. Gehm, R. John, D. Brady, R. Willett, and T. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express, 14013–14027 (2007).
11. S. Shauli, O. Yaniv, A. Marwan, A. Ibrahim, D. G. Blumberg, and A. Stern, “Dual-camera design for hyperspectral and panchromatic imaging, using a wedge shaped liquid crystal as a spectral multiplexer,” Sci. Reports (Nature Publ. Group) (2020).
12. S. Zhang, L. Wang, Y. Fu, X. Zhong, and H. Huang, “Computational hyperspectral imaging based on dimension-discriminative low-rank tensor recovery,” in Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 10183–10192.
13. D. Kittle, K. Choi, A. Wagadarikar, and D. J. Brady, “Multiframe image estimation for coded aperture snapshot spectral imagers,” Appl. Opt., 6824–6833 (2010).
14. L. Wang, Z. Xiong, D. Gao, G. Shi, and F. Wu, “Dual-camera design for coded aperture snapshot spectral imaging,” Appl. Opt., 848–858 (2015).
15. Y. Fu, Y. Zheng, I. Sato, and Y. Sato, “Exploiting spectral-spatial correlation for coded hyperspectral image restoration,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 3727–3736.
16. L. Wang, Z. Xiong, G. Shi, F. Wu, and W. Zeng, “Adaptive nonlocal sparse representation for dual-camera compressive hyperspectral imaging,” IEEE Trans. Pattern Anal. Mach. Intell., 2104–2111 (2016).
17. J. Bacca, C. V. Correa, and H. Arguello, “Noniterative hyperspectral image reconstruction from compressive fused measurements,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 1231–1239 (2019).
18. T. Gelvez, H. Rueda, and H. Arguello, “Joint sparse and low rank recovery algorithm for compressive hyperspectral imaging,” Appl. Opt., 6785–6795 (2017).
19. L. Wang, C. Sun, Y. Fu, M. H. Kim, and H. Huang, “Hyperspectral image reconstruction using a deep spatial-spectral prior,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 8032–8041.
20. R. Hyder and M. S. Asif, “Generative models for low-rank video representation and reconstruction from compressive measurements,” (IEEE, 2019), pp. 1–6.
21. L. Wang, T. Zhang, Y. Fu, and H. Huang, “Hyperreconnet: Joint coded aperture optimization and image reconstruction for compressive hyperspectral imaging,” IEEE Trans. Image Process., 2257–2270 (2018).
22. Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu, “HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections,” in Proceedings of the IEEE International Conference on Computer Vision Workshops (2017), pp. 518–525.
23. X. Miao, X. Yuan, Y. Pu, and V. Athitsos, “𝜆-net: Reconstruct hyperspectral images from a snapshot measurement,” in IEEE/CVF Conference on Computer Vision (ICCV), vol. 1 (2019).
24. D. Gedalin, Y. Oiknine, and A. Stern, “DeepCubeNet: reconstruction of spectrally compressive sensed hyperspectral images with deep neural networks,” Opt. Express, 35811–35822 (2019).
25. J. Bacca, L. Galvis, and H. Arguello, “Coupled deep learning coded aperture design for compressive image classification,” Opt. Express, 8528–8540 (2020).
26. I. Choi, D. S. Jeon, G. Nam, D. Gutierrez, and M. H. Kim, “High-quality hyperspectral reconstruction using a spectral prior,” ACM Trans. Graph. (TOG), 1–13 (2017).
27. T. Zhang, Y. Fu, L. Wang, and H. Huang, “Hyperspectral image reconstruction using deep external and internal learning,” in Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 8559–8568.
28. L. Wang, C. Sun, M. Zhang, Y. Fu, and H. Huang, “DNU: Deep non-local unrolling for computational spectral imaging,” in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp.1661–1671.29. M. A. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application tocompressed sensing and other inverse problems,” IEEE J. selected topics signal processing , 586–597 (2007).30. E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE signal processing magazine ,21–30 (2008).31. S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al. , “Distributed optimization and statistical learning via thealternating direction method of multipliers,” Foundations Trends Mach. learning , 1–122 (2011).32. M. V. Afonso, J. M. Bioucas-Dias, and M. A. Figueiredo, “An augmented lagrangian approach to the constrainedoptimization formulation of imaging inverse problems,” IEEE Transactions on Image Process. , 681–695 (2010).33. I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with asparsity constraint,” Commun. on Pure Appl. Math. A J. Issued by Courant Inst. Math. Sci. , 1413–1457 (2004).34. D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Natl.Acad. Sci. , 18914–18919 (2009).35. S. Yang, M. Wang, P. Li, L. Jin, B. Wu, and L. Jiao, “Compressive hyperspectral imaging via sparse tensor andnonlinear compressed sensing,” IEEE Transactions on Geosci. Remote. Sens. , 5943–5957 (2015).36. A. Mousavi, A. B. Patel, and R. G. Baraniuk, “A deep learning approach to structured signal recovery,” in (IEEE, 2015), pp. 1336–1343.37. A. Mousavi and R. G. Baraniuk, “Learning to invert: Signal recovery via deep convolutional networks,” in (IEEE, 2017), pp. 2272–2276.38. A. Dave, A. K. Vadathya, R. Subramanyam, R. Baburajan, and K. Mitra, “Solving inverse computational imagingproblems using deep pixel-level prior,” IEEE Transactions on Comput. Imaging , 37–51 (2018).39. H. Palangi, R. Ward, and L. 
Deng, “Distributed compressive sensing: A deep learning approach,” IEEE Transactionson Signal Process. , 4504–4518 (2016).40. H. Yao, F. Dai, S. Zhang, Y. Zhang, Q. Tian, and C. Xu, “Dr2-net: Deep residual reconstruction network for imagecompressive sensing,” Neurocomputing , 483–493 (2019).41. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “Reconnet: Non-iterative reconstruction of imagesfrom compressively sensed measurements,” in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, (2016), pp. 449–458.42. J. M. Bioucas-Dias and M. A. Figueiredo, “A new twist: Two-step iterative shrinkage/thresholding algorithms forimage restoration,” IEEE Transactions on Image processing , 2992–3004 (2007).43. X. Yuan, Y. Liu, J. Suo, and Q. Dai, “Plug-and-play algorithms for large-scale snapshot compressive imaging,” in roceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 1447–1457.44. S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play admm for image restoration: Fixed-point convergence andapplications,” IEEE Transactions on Comput. Imaging , 84–98 (2016).45. J. Rick Chang, C.-L. Li, B. Poczos, B. Vijaya Kumar, and A. C. Sankaranarayanan, “One network to solve themall–solving linear inverse problems using deep projection models,” in Proceedings of the IEEE InternationalConference on Computer Vision, (2017), pp. 5888–5897.46. C. Metzler, A. Mousavi, and R. Baraniuk, “Learned d-amp: Principled neural network based compressive imagerecovery,” in
Advances in Neural Information Processing Systems, (2017), pp. 1772–1783.47. J. Zhang and B. Ghanem, “Ista-net: Interpretable optimization-inspired deep network for image compressive sensing,”in
Proceedings of the IEEE conference on computer vision and pattern recognition, (2018), pp. 1828–1837.48. J. Sun, H. Li, Z. Xu et al. , “Deep admm-net for compressive sensing mri,” in
Advances in neural informationprocessing systems, (2016), pp. 10–18.49. A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative models,” arXiv preprintarXiv:1703.03208 (2017).50. Y. Wu, M. Rosca, and T. Lillicrap, “Deep compressed sensing,” arXiv preprint arXiv:1905.06723 (2019).51. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in
Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition, (2018), pp. 9446–9454.52. Y. Wang, L. Lin, Q. Zhao, T. Yue, D. Meng, and Y. Leung, “Compressive sensing of hyperspectral images via jointtensor tucker decomposition and weighted total variation regularization,” IEEE Geosci. Remote. Sens. Lett. ,2457–2461 (2017).53. K. M. León-López and H. A. Fuentes, “Online tensor sparsifying transform based on temporal superpixels fromcompressive spectral video measurements,” IEEE Transactions on Image Process. , 5953–5963 (2020).54. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility tostructural similarity,” IEEE transactions on image processing , 600–612 (2004).55. M. Marquez, H. Rueda-Chacon, and H. Arguello, “Compressive spectral light field image reconstruction via onlinetensor representation,” IEEE Transactions on Image Process. , 3558–3568 (2020).56. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEEconference on computer vision and pattern recognition, (2016), pp. 770–778.57. J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked convolutional auto-encoders for hierarchical featureextraction,” in
International conference on artificial neural networks, (Springer, 2011), pp. 52–59.58. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in
International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp.234–241.59. B. Arad and O. Ben-Shahar, “Sparse recovery of hyperspectral signal from natural rgb images,” in
EuropeanConference on Computer Vision, (Springer, 2016), pp. 19–34.60. A. Chakrabarti and T. Zickler, “Statistics of real-world hyperspectral images,” in
CVPR 2011, (IEEE, 2011), pp.193–200.61. C. V. Correa, H. Arguello, and G. R. Arce, “Spatiotemporal blue noise coded aperture design for multi-shotcompressive spectral imaging,” JOSA A , 2312–2322 (2016).62. L. Galvis, E. Mojica, H. Arguello, and G. R. Arce, “Shifting colored coded aperture design for spectral imaging,”Appl. optics58