Hyperspectral Denoising Using Unsupervised Disentangled Spatio-Spectral Deep Priors
Yu-Chun Miao, Xi-Le Zhao, Xiao Fu, Jian-Li Wang, Yu-Bang Zheng
Yu-Chun Miao, Xi-Le Zhao∗, Xiao Fu∗, Jian-Li Wang, and Yu-Bang Zheng

Abstract—Image denoising is often empowered by accurate prior information. In recent years, data-driven neural network priors have shown promising performance for RGB natural image denoising. Compared to classic handcrafted priors (e.g., sparsity and total variation), the "deep priors" are learned using a large number of training samples, which can accurately model the complex image generating process. However, data-driven priors are hard to acquire for hyperspectral images (HSIs) due to the lack of training data. A remedy is to use the so-called unsupervised deep image prior (DIP). Under the unsupervised DIP framework, it is hypothesized and empirically demonstrated that proper neural network structures are reasonable priors of certain types of images, and the network weights can be learned without training data. Nonetheless, the most effective unsupervised DIP structures were proposed for natural images instead of HSIs. The performance of unsupervised DIP-based HSI denoising is limited by a couple of serious challenges, namely, network structure design and network complexity. This work puts forth an unsupervised DIP framework that is based on the classic spatio-spectral decomposition of HSIs. Utilizing the so-called linear mixture model of HSIs, two types of unsupervised DIPs, i.e., U-Net-like networks and fully connected networks, are employed to model the abundance maps and endmembers contained in the HSIs, respectively. This way, empirically validated unsupervised DIP structures for natural images can be easily incorporated for HSI denoising. Besides, the decomposition also substantially reduces network complexity. An efficient alternating optimization algorithm is proposed to handle the formulated denoising problem. Semi-real and real data experiments are employed to showcase the effectiveness of the proposed approach.
Index Terms—Hyperspectral image denoising, unsupervised deep image prior, spatio-spectral decomposition
I. INTRODUCTION

Hyperspectral images (HSIs) contain rich spectral and spatial information of areas/objects of interest. HSIs have been widely used across many disciplines, e.g., biology, ecology, geoscience, and food/medicine science [1]. However, the acquired HSIs are often corrupted by various types of noise. Heavy noise may affect the performance of downstream analytical tasks (e.g., hyperspectral pixel classification). In the past two decades, a plethora of HSI denoising techniques were proposed to address this challenge; see [2]–[5].

At a high level, the idea of many HSI denoising methods is to fit the acquired image using an estimated image under prior information-induced regularization. The rationale is that noise does not obey the HSI priors, and thus such a fitting process can effectively extract the "clean" HSI from the noisy version. Under this principle, early HSI denoising methods used spatial priors such as sparsity [6]–[8] and total variation (TV) [9]. Methods that exploit spectral priors were also proposed; see [10]–[12].

∗ Corresponding authors. Tel.: +86 28 61831016, Fax: +86 28 61831280. This work is supported by NSFC (No. 61876203, 61772003), the Applied Basic Research Project of Sichuan Province (No. 21YYJC3042), the Key Project of Applied Basic Research in Sichuan Province (No. 2020YJ0216), and the National Key Research and Development Program of China (No. 2020YFA0714001). The work of X. Fu is supported by NSF ECCS-2024058 and NSF ECCS-1808159. Y.-C. Miao, X.-L. Zhao, J.-L. Wang, and Y.-B. Zheng are with the Research Center for Image and Vision Computing, School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, P.R. China (e-mails: [email protected]; [email protected]; [email protected]; [email protected]). X. Fu is with the School of Electrical Engineering and Computer Science, Oregon State University (OSU), Corvallis, OR 97331, United States (e-mail: [email protected]).
A number of denoising methods incorporated implicit priors such as low matrix/tensor rank, which is a result of multi-dimensional correlations; some examples can be found in [2]–[4], [13]–[18].

More recently, data-driven priors have drawn much attention in the vision and imaging communities [19]. In a nutshell, deep neural networks are used to learn a generative model of images from a large number of training samples. Deep generative models have been successful in computer vision; see, e.g., [20]–[22]. In particular, these models are able to map low-dimensional random vectors to visually authentic images, which means that they capture the essence of the image generating process. Hence, the learned generative network is naturally a good prior of clean images. This idea has also been used in HSI denoising; see, e.g., [23]–[27].

Although the methods mentioned above have attained satisfactory results for HSI denoising, these models' expressive ability is limited by the training data's diversity and quantity. That is, there is a lack of training data for HSIs. This is because HSIs are, in general, much more costly to acquire relative to natural RGB images. In addition, different hyperspectral sensors often admit largely diverse specifications (e.g., the frequency band used, the spectral resolution, and the spatial resolution), so data acquired from one sensor may not be useful for training deep priors for images from other sensors.

Recently, Ulyanov et al. proposed an unsupervised image restoration framework, namely, the deep image prior (DIP) [28]. DIP directly learns a generator network from a single noisy image, instead of learning the generator from a large number of training samples. The work in [28] showed that proper deep neural network architectures, without training on any samples, can already "encode" much critical information in the natural image generating process. This discovery has helped design unsupervised DIPs for tasks such as image denoising, inpainting, and super-resolution. This work has thus attracted much attention. Since the DIP approach does not use any training data, it is particularly suitable for data-starved
[Fig. 1 here] Fig. 1. The LMM for HSI and the proposed unsupervised disentangled spatio-spectral deep priors (DS2DP). (The panels illustrate the deep spatial and spectral priors and compare denoised spectra of selected pixels, Original vs. LRTFL0, DIP2D, and DS2DP; axes: band index vs. pixel value.)

applications like hyperspectral imaging. Indeed, Sidorov et al. [29] extended the DIP idea to HSI denoising and observed positive results.

Nonetheless, capitalizing on the power of DIP for HSI denoising still faces a series of challenges. Unlike RGB images that only have three spectral channels, HSIs are often measured over hundreds of spectral channels. Therefore, directly using the DIP method that was originally proposed for RGB images to handle HSIs may not be as promising. First, it is unclear if the network structures used in [28] are still effective for HSIs. Second, due to the large size of HSIs, the scalability challenge is much more severe compared to the natural image case. Indeed, as one will see in Sec. IV, the two neural network structures used in [29] for modeling the generator of a standard HSI induce 2.150 and 2.342 million parameters, respectively, which makes the learning process challenging. Third, due to the special data acquisition process of HSIs, outlying pixels and structured noise (other than Gaussian noise) often arise. The DIP denoising loss functions used in [28], [29] did not take these aspects into consideration.

Contributions.
In this work, our interest lies in an unsupervised DIP-based denoising framework tailored for HSIs. Our detailed contributions are summarized as follows:

• Disentangled Spatio-Spectral Deep Prior for HSI. We propose an unsupervised DIP structure that is inspired by the well-established linear mixture model (LMM) for HSIs [30]; see Fig. 1. The LMM views every hyperspectral pixel as a linear combination of the spectral signatures of a number of materials (endmembers). The linear combination coefficients of the different endmembers across the image give rise to the abundance maps (i.e., spatial distribution patterns) of the endmembers [31]. The LMM is effective in capturing the vast majority of the information in HSIs (empirically, about 98% of the energy of typical HSI datasets can be explained by the LMM [32]). Using the LMM, the spatial and spectral information embedded in the HSI can be "disentangled". This way, the spectral and spatial priors can be designed and modeled individually. That is, one only needs to learn deep priors of the endmembers (1D vectors) and abundance maps (2D images), and the number of endmembers is often not large. As a result, the modeling and computational complexities can be substantially reduced, which often leads to improved accuracy. By our design, empirically validated unsupervised DIP structures for natural images can be much more easily capitalized on for HSI denoising.

• Structured Noise-Robust Optimization. To handle structured noise (e.g., stripes or deadlines), we propose a training loss that models the structured noise as sparse outliers. We use an alternating optimization process to handle the formulated structured noise-robust deep prior-based denoising problem. The algorithm alternates between learning generative models of the endmembers/abundance maps and structured-noise identification and removal, and both stages admit efficient and lightweight updates.

• Extensive Experiments. We test the proposed approach on a large variety of semi-real and real datasets. The experiments support our design: we observe substantially improved denoising performance relative to classic methods and more recent neural prior-based methods over all the datasets under test. In particular, due to our disentangled network design, the proposed method outperforms the existing unsupervised DIP-based HSI denoising methods in [29] in terms of both accuracy and memory/computational efficiency.
Notation. A scalar, a vector, a matrix, and a tensor are denoted as $x$, $\mathbf{x}$, $\mathbf{X}$, and $\mathcal{X}$, respectively. $[\mathbf{x}]_i$, $[\mathbf{X}]_{i,j}$, and $[\mathcal{X}]_{i,j,k}$ denote the $i$-th, $(i,j)$-th, and $(i,j,k)$-th elements of $\mathbf{x}\in\mathbb{R}^{I}$, $\mathbf{X}\in\mathbb{R}^{I\times J}$, and $\mathcal{X}\in\mathbb{R}^{I\times J\times K}$, respectively. The Frobenius norms of $\mathbf{X}$ and $\mathcal{X}$ are denoted as $\|\mathbf{X}\|_F=\sqrt{\sum_{i,j}[\mathbf{X}]_{i,j}^2}$ and $\|\mathcal{X}\|_F=\sqrt{\sum_{i,j,k}[\mathcal{X}]_{i,j,k}^2}$, respectively. Given $\mathbf{y}\in\mathbb{R}^{N}$ and a matrix $\mathbf{X}\in\mathbb{R}^{I\times J}$, the outer product $\mathbf{X}\circ\mathbf{y}\in\mathbb{R}^{I\times J\times N}$ is defined by $[\mathbf{X}\circ\mathbf{y}]_{i,j,n}=[\mathbf{X}]_{i,j}[\mathbf{y}]_n$. The matrix unfolding operator $\mathrm{mat}(\mathcal{X})$ denotes the mode-3 unfolding of $\mathcal{X}$ (see details of the unfolding of HSIs in [33]). The $\mathrm{vec}(\mathbf{X})$ operator stacks the columns of $\mathbf{X}$: $\mathrm{vec}(\mathbf{X})=[[\mathbf{X}]_{:,1}^T,\ldots,[\mathbf{X}]_{:,J}^T]^T$.

II. PRELIMINARIES

In this section, we briefly review pertinent background information.
A. HSI Denoising
The acquired HSIs are three-dimensional arrays (i.e., tensors [34]). Denote $\mathcal{X}\in\mathbb{R}^{I\times J\times K}$ as the HSI captured by a remotely deployed hyperspectral sensor, where $I\times J$ is the number of pixels in the 2D spatial domain and $K$ is the number of spectral bands. Unlike natural images that are measured with the R, G, and B channels (i.e., $K=3$), HSIs are measured over tens or hundreds of frequency bands, depending on the specifications of the employed sensors.

In general, $\mathcal{X}$ is a noise-contaminated version of the underlying "clean" HSI (denoted by $\mathcal{X}^\natural$). Many factors contribute to noise in the hyperspectral acquisition process, e.g., thermal electronics, dark current, and the stochastic error of photon counting. If the noise is additive, we have
$$\mathcal{X}=\mathcal{X}^\natural+\mathcal{V},\qquad(1)$$
where $\mathcal{V}\in\mathbb{R}^{I\times J\times K}$ denotes the noise. The objective of HSI denoising is to "extract" $\mathcal{X}^\natural$ from $\mathcal{X}$.

B. Prior-Regularization Based HSI Denoising
Note that even under the additive noise model in (1), this problem is ill-posed; it is essentially a disaggregation problem that admits an infinite number of solutions. To overcome such ambiguity, prior information about the HSI is used to confine the solution space. A generic formulation can be summarized as follows:
$$\widehat{\mathcal{X}}=\arg\min_{\mathcal{M}}\ \|\mathcal{X}-\mathcal{M}\|_F^2+\lambda R(\mathcal{M}),\qquad(2a)$$
$$\text{subject to }\mathcal{M}\in\mathbb{M},\qquad(2b)$$
where $\widehat{\mathcal{X}}$ denotes the estimate of $\mathcal{X}^\natural$ under the above estimator, $\mathbb{M}$ and $R(\cdot):\mathbb{R}^{I\times J\times K}\to\mathbb{R}_+$ are the constraint set and regularization function imposed according to prior knowledge about the clean image $\mathcal{X}^\natural$, respectively, and $\lambda\ge 0$ is the regularization parameter that balances the data fidelity term (i.e., the first term in (2a)) and the regularization.
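To see how $\lambda$ trades off fidelity against regularization in (2a), consider the Tikhonov choice $R(\mathcal{M})=\|\mathcal{M}\|_F^2$ (an illustrative pick, not one used in this paper), under which the unconstrained estimator has a simple closed form; the function name below is hypothetical:

```python
import numpy as np

# With R(M) = ||M||_F^2, the first-order condition for (2a),
# -2(X - M) + 2*lam*M = 0, gives M_hat = X / (1 + lam).
def denoise_tikhonov(X, lam):
    """Closed-form minimizer of ||X - M||_F^2 + lam * ||M||_F^2."""
    return X / (1.0 + lam)

rng = np.random.default_rng(0)
X = rng.random((4, 4, 3))      # toy noisy tensor

M0 = denoise_tikhonov(X, 0.0)  # lam = 0: pure data fidelity, M_hat = X
M1 = denoise_tikhonov(X, 1.0)  # larger lam: solution shrunk toward zero
```

Increasing $\lambda$ shrinks the estimate toward the regularizer's minimizer; useful priors replace the crude $\|\cdot\|_F^2$ with structure that noise does not satisfy.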
1) From Analytical Priors to Data-Driven Priors:
A variety of regularizations/constraints have been considered in the literature. For example, in [2], [35], $R(\cdot)=\|\cdot\|_{\rm TV}$ is the TV across the two spatial dimensions, since image data exhibit certain slowly changing properties over space. In [36], [37], $\mathbb{M}$ represents the nonnegative orthant, since HSIs are always nonnegative. In [13], [38]–[40], low tensor rank and low matrix rank constraints, respectively, are imposed through low-rank parameterization. Such parameterization-based regularization can be written as
$$\widehat{\mathbf{z}}=\arg\min_{\mathbf{z}}\ \|\mathcal{X}-G(\mathbf{z})\|_F^2,\qquad(3)$$
where $G:\mathbb{R}^{N}\to\mathbb{R}^{I\times J\times K}$ is a pre-specified parameterization function that represents the $I\times J\times K$ HSI using $N$ parameters, i.e., $\mathbf{z}$. For example, if $\mathrm{mat}(\mathcal{X})$ is believed to be a low-rank matrix, then $\mathrm{mat}(G(\mathbf{z}))=\mathbf{A}\mathbf{B}^T$ and $\mathbf{z}=[\mathrm{vec}(\mathbf{A})^T,\mathrm{vec}(\mathbf{B})^T]^T$. After estimating the parameters $\mathbf{z}$, the clean image can be simply estimated via $\widehat{\mathcal{X}}=G(\widehat{\mathbf{z}})$.

Classic priors are useful but often insufficient to capture the complex nature of the underlying structure of HSIs. A number of works used deep neural networks to parameterize the regularization; i.e., these works use a deep neural network $G_{\boldsymbol\theta}(\cdot):\mathbb{R}^{N}\to\mathbb{R}^{I\times J\times K}$, whose network weights are collected in $\boldsymbol\theta\in\mathbb{R}^{D}$, to act as the regularization in (2a) [23]–[27]:
$$\widehat{\mathbf{z}}=\arg\min_{\mathbf{z}}\ \|\mathcal{X}-G_{\boldsymbol\theta}(\mathbf{z})\|_F^2.\qquad(4)$$
Instead of having an analytical expression, such regularizers are "trained" using a large number of training samples. As deep neural networks are universal function approximators, such learned "deep priors" are believed to be able to approximate the complex generative processes of HSIs and thus are more effective priors for denoising.

However, unlike natural RGB images for which tens of thousands of training samples are available for learning $G_{\boldsymbol\theta}$, HSI (especially remotely sensed HSI) datasets are relatively rare due to their costly acquisition process.
Without a large amount of (diverse) HSIs, training such a regularizer may be out of reach.
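As a concrete toy instance of the parameterization idea in (3), the sketch below denoises a synthetic tensor whose mode-3 unfolding is low rank by truncating the SVD of that unfolding (i.e., $\mathrm{mat}(G(\mathbf{z}))=\mathbf{A}\mathbf{B}^T$); the sizes, the rank, and the noise level are illustrative choices, not this paper's settings:

```python
import numpy as np

def lowrank_unfold_denoise(X, rank):
    """Denoise an I x J x K tensor via a rank-`rank` approximation of its
    mode-3 unfolding (K x IJ), i.e., mat(G(z)) = A @ B.T in the spirit of (3)."""
    I, J, K = X.shape
    M = X.reshape(I * J, K).T                    # mode-3 unfolding: K x IJ
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_hat = U[:, :rank] * s[:rank] @ Vt[:rank]   # truncated SVD = A @ B.T
    return M_hat.T.reshape(I, J, K)

# Toy example: a tensor with mode-3 rank 2 plus Gaussian noise.
rng = np.random.default_rng(0)
S = rng.random((16, 16, 2))                      # two "abundance maps"
C = rng.random((2, 8))                           # two "spectral signatures"
clean = np.einsum('ijr,rk->ijk', S, C)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
den = lowrank_unfold_denoise(noisy, rank=2)
err_noisy = np.linalg.norm(noisy - clean)
err_den = np.linalg.norm(den - clean)
```

The rank-2 projection discards most of the noise energy, so `err_den` comes out well below `err_noisy` on this toy instance.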
2) Unsupervised Deep Image Prior:
Very recently, Ulyanov et al. proposed the so-called DIP [28] to circumvent the lack of training samples. The major discovery in [28] is that a proper neural network architecture (without knowing the neural network weights $\boldsymbol\theta$) can already encode much prior information of images. As a result, tasks such as image denoising can be done by learning a neural network $G_{\boldsymbol\theta}(\mathbf{z})$ to fit $\mathcal{X}$ with a random but known $\mathbf{z}$. With this idea, the denoising problem can be formulated as follows:
$$\widehat{\boldsymbol\theta}=\arg\min_{\boldsymbol\theta}\ \|\mathcal{X}-G_{\boldsymbol\theta}(\mathbf{z})\|_F^2,\qquad(5)$$
and the denoised image can be estimated via
$$\widehat{\mathcal{X}}=G_{\widehat{\boldsymbol\theta}}(\mathbf{z}).\qquad(6)$$
The idea of DIP is quite different from that of supervised deep prior-based approaches such as those in [23]–[26] [cf. Eq. (4)]. In DIP, the network weights $\boldsymbol\theta$ are learned from a single degraded image in an unsupervised manner, and $\mathbf{z}$ is given instead of learned.

At first glance, it may be surprising that an untrained neural network can be used for image denoising (and also inpainting and super-resolution, as revealed in [28]). The key rationale behind this approach may be understood as follows. First, some carefully designed neural network structures (e.g., convolutional neural networks with proper modifications) are able to capture much information in the generating process of some types of images of interest. That is, not all neural network structures work well for all types of images; different structures may need to be carefully handcrafted for different types of images. The handcrafted neural network structure is analogous to handpicked priors such as the $\ell_1$ norm, Tikhonov regularization, and TV regularization, which are also not learned from training samples. In the original paper [28], the U-Net-like "hourglass" architecture was shown to be powerful in natural RGB image restoration tasks under the DIP framework. In [29], various network structures (namely, DIP2D and DIP3D) were experimented with for HSI denoising, and the results can be quite different, as one will also see in Sec. IV. Second, in image restoration tasks, the degraded (noisy) $\mathcal{X}$ still contains much information about the underlying image. Hence, the fitting loss in (5) also "forces" $G_{\boldsymbol\theta}$ to faithfully capture the essential information in $\mathcal{X}$. In particular, since $G_{\boldsymbol\theta}$ has a structured underlying generative process (by construction), the learned $G_{\boldsymbol\theta}$ is more likely to capture the "structured signal part" (i.e., the clean image $\mathcal{X}^\natural$) of $\mathcal{X}$ than the random noise part.

Since the DIP procedure does not use any training examples, it is particularly attractive for data-starved applications such as hyperspectral imaging. In addition, although it involves careful structure handcrafting, DIP still inherits many good properties of neural networks, e.g., being capable of modeling complex generative processes. Consequently, it often exhibits more appealing image restoration performance compared to classic regularizer/parameterization-based methods (e.g., TV and low matrix/tensor rank); see [28], [29].

[Fig. 2 here] Fig. 2. Illustration of the proposed DS2DP. The generative networks $C_{\boldsymbol\zeta_r}$ and $S_{\boldsymbol\theta_r}$ are applied to capture the deep spectral prior of the spectral signatures and the deep spatial prior of the abundance matrices, respectively.

C. Challenges
The unsupervised DIP-based approaches are attractive since they are effective without using any training data. However, finding a proper network structure to serve as a prior of HSIs and learning the corresponding $\boldsymbol\theta$ is by no means a trivial task. A couple of notable new challenges that arise in the domain of hyperspectral imaging are as follows:
1) Challenge 1 - Network Structure:
Since HSIs are quite different from natural RGB images (in terms of sensors, sensing processes, resolutions, and frequency bands used), directly using the neural network structure in [28] in hyperspectral imaging may not be best practice. The work in [29] proposed two structures crafted for this purpose, but it is not clear if these two structures are "optimal" due to the lack of extensive experiments. In fact, as we will show in Sec. IV, these two unsupervised DIP structures are sometimes not as promising as some classic models (e.g., low-rank tensor decomposition-based denoising) in terms of denoising performance. To capitalize on the power of unsupervised DIP for HSI denoising, it is critical to design the structure of $G_{\boldsymbol\theta}$ so that it suits the nature of HSIs.
2) Challenge 2 - Network Size:
Another challenge that arises in unsupervised DIP-based HSI denoising is that HSIs are large-scale images due to the large number of spectral bands contained in the pixels. Directly modeling the generative process of a large-scale 3D image (or a third-order tensor) inevitably leads to an oversized neural network $G_{\boldsymbol\theta}$. Although the work in [29] employed a number of tricks for network size reduction, the final constructions still yield a large number of network parameters. This leads to a computationally heavy optimization problem [cf. Eq. (5)]. Since the problem is already nonconvex and challenging, the excessive scale of the optimization problem only makes the denoising procedure less efficient. The challenging nature of the numerical optimization may also affect the denoising performance, since "bad" local minima are more likely to arise.

III. PROPOSED APPROACH
To circumvent these challenges, we leverage the well-established LMM of HSIs to design our customized unsupervised DIPs. To this end, we first briefly review the main idea of the LMM.
A. Linear Mixture Model of HSI
The LMM of $\mathcal{X}$ is as follows (when noise is absent):
$$\mathcal{X}=\sum_{r=1}^{R}\mathbf{S}_r\circ\mathbf{c}_r,\qquad(7)$$
where $\mathbf{S}_r\in\mathbb{R}^{I\times J}$ and $\mathbf{c}_r\in\mathbb{R}^{K}$ represent the $r$-th endmember's abundance map and spectral signature, respectively, and $R$ is the number of endmembers contained in the HSI. The LMM can also be expressed as $[\mathcal{X}]_{i,j,k}=\sum_{r=1}^{R}[\mathbf{S}_r]_{i,j}[\mathbf{c}_r]_k$; see [1], [30]. Physically, it means that every pixel is a non-negative combination of the spectral signatures of the constituent endmembers in the HSI. Note that $\mathbf{S}_r\ge\mathbf{0}$ and $\mathbf{c}_r\ge\mathbf{0}$ according to their physical meanings, and thus the model in (7) is often related to non-negative matrix factorization (NMF) [41]. An illustration of the LMM can be found in Fig. 1. The LMM with a relatively small $R$ can often capture around 98% of the energy of an HSI [32]. Hence, it is a reliable model for HSIs. Indeed, the LMM has been utilized for a large variety of hyperspectral imaging tasks, e.g., hyperspectral unmixing [1], [31], [42]–[45], hyperspectral super-resolution [46], pansharpening [47], compression and recovery [48], and denoising [49], just to name a few. In this work, we propose to use the LMM to help design unsupervised DIP neural network structures and denoising algorithms.

B. LMM-Aided Unsupervised DIP for HSI
Notably, the LMM disentangles the spectral and spatial information into two sets of latent factors, i.e., $\{\mathbf{S}_r\}_{r=1}^{R}$ and $\{\mathbf{c}_r\}_{r=1}^{R}$. Our motivations for using the LMM representation to design unsupervised DIPs for HSIs are as follows.

First, the physical meaning of the latent factors offers the opportunity to employ known effective unsupervised DIP network structures. The abundance matrix $\mathbf{S}_r$ can be understood as how material $r$ spreads over space. The hypothesis is that the abundance maps exhibit properties similar to those of natural images, which focus on capturing and conveying spatial information. Under this hypothesis, it is reasonable to use unsupervised DIP neural network structures that are known to work well for natural images to model $\mathbf{S}_r$. Moreover, the vector $\mathbf{c}_r$ can be understood as the spectral signature of the $r$-th material, i.e., the variation of the reflectance or emittance of the material over different wavelengths. It is known that fully connected neural networks (FCNs) can approximate such relatively simple 1D continuous smooth functions well.

Second, through disentanglement and the LMM, the model size of the HSI is substantially reduced. Instead of directly imposing an unsupervised DIP on the whole HSI, we employ two types of unsupervised DIPs (i.e., the deep spatial and spectral priors) to model the abundance maps and spectral signatures, respectively. Since the number of endmembers is often not large, the computational complexity is substantially reduced.

Following the above argument, we model the HSI as
$$\mathcal{X}=\sum_{r=1}^{R}S_{\boldsymbol\theta_r}(\mathbf{z}_r)\circ C_{\boldsymbol\zeta_r}(\mathbf{w}_r),\qquad(8)$$
where $S_{\boldsymbol\theta_r}(\cdot):\mathbb{R}^{N_a}\to\mathbb{R}^{I\times J}$ is the unsupervised DIP neural network of the $r$-th endmember's abundance map and $\boldsymbol\theta_r$ collects all the corresponding network weights; similarly, $C_{\boldsymbol\zeta_r}(\cdot):\mathbb{R}^{N_s}\to\mathbb{R}^{K}$ and $\boldsymbol\zeta_r$ denote the unsupervised DIP of the $r$-th endmember and its corresponding network weights, respectively; and the vectors $\mathbf{z}_r\in\mathbb{R}^{N_a}$ and $\mathbf{w}_r\in\mathbb{R}^{N_s}$ are low-dimensional random vectors that are responsible for generating the $r$-th abundance map and endmember, respectively. Our detailed designs for $S_{\boldsymbol\theta_r}$ and $C_{\boldsymbol\zeta_r}$ are as follows:
1) Unsupervised DIP for Abundance Maps:
As mentioned, the abundance maps capture the spatial information of the corresponding materials. We propose to employ the U-Net-like "hourglass" architecture in [28] for modeling $S_{\boldsymbol\theta_r}$. Note that this network architecture was shown to be able to capture the spatial prior of natural images. The U-Net is an asymmetric autoencoder [50] with skip connections, whose structure is shown in Fig. 2.
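Shape-wise, the disentangled model in (7)/(8) composes $R$ abundance maps with $R$ spectral signatures via outer products. The sketch below uses random arrays as stand-ins for the network outputs $S_{\boldsymbol\theta_r}(\mathbf{z}_r)$ and $C_{\boldsymbol\zeta_r}(\mathbf{w}_r)$ (the sizes and $R$ are toy values) and checks that the mode-3 unfolding of the composed tensor has rank at most $R$:

```python
import numpy as np

# Toy sizes (hypothetical): I = J = 32 pixels, K = 100 bands, R = 4 endmembers.
I, J, K, R = 32, 32, 100, 4
rng = np.random.default_rng(1)
S = rng.random((R, I, J))      # abundance maps S_r (stand-ins for S_theta_r outputs)
c = rng.random((R, K))         # spectral signatures c_r (stand-ins for C_zeta_r outputs)

# X = sum_r S_r o c_r, with [S_r o c_r]_{i,j,k} = [S_r]_{i,j} [c_r]_k.
X = np.einsum('rij,rk->ijk', S, c)

# The mode-3 unfolding of X is a sum of R rank-one terms c_r vec(S_r)^T,
# so its rank is at most R.
rank3 = np.linalg.matrix_rank(X.reshape(I * J, K).T)
```

Note the parameter-count payoff: the factors hold $R(IJ+K)$ numbers rather than $IJK$, which is what allows the per-factor networks to stay small.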
2) Unsupervised DIP for Endmembers:
The endmembers are relatively simple to model, since they can be understood as one-dimensional smooth functions. Hence, we employ FCNs as the unsupervised DIP for $C_{\boldsymbol\zeta_r}$. We use FCNs with three layers; also see Fig. 2.

Besides the above unsupervised DIP design, in this work we also take into consideration impulsive noise and grossly corrupted pixels (outliers) that often arise in HSIs. Unlike natural images whose sensing environment can be well controlled, remotely sensed HSIs often suffer from heavily corrupted pixels or spectral bands due to various reasons; see [39], [40]. If not accounted for, such noise could severely hinder the HSI denoising performance. To this end, we consider the following noisy data acquisition model:
$$\mathcal{X}=\underbrace{\sum_{r=1}^{R}S_{\boldsymbol\theta_r}(\mathbf{z}_r)\circ C_{\boldsymbol\zeta_r}(\mathbf{w}_r)}_{\mathcal{X}^\natural}+\mathcal{Y}+\mathcal{V},\qquad(9)$$
where $\mathcal{V}$ represents ubiquitous noise, e.g., Gaussian noise, and $\mathcal{Y}$ denotes the impulsive noise or outliers. Accordingly, we propose the following denoising criterion:
$$\min_{\{\boldsymbol\theta_r,\boldsymbol\zeta_r\}_{r=1}^{R},\,\mathcal{Y}}\ \Bigl\|\mathcal{X}-\sum_{r=1}^{R}S_{\boldsymbol\theta_r}(\mathbf{z}_r)\circ C_{\boldsymbol\zeta_r}(\mathbf{w}_r)-\mathcal{Y}\Bigr\|_F^2+\lambda\|\mathcal{Y}\|_1,\qquad(10)$$
where $\lambda\ge 0$ and $\|\mathcal{Y}\|_1=\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}|[\mathcal{Y}]_{i,j,k}|$ is used to impose a sparsity prior on $\mathcal{Y}$, since outliers occur sparsely.

C. Optimization Algorithm
Let us denote the objective function in (10) using the following shorthand notation:
$$\min_{\{\boldsymbol\theta_r,\boldsymbol\zeta_r\}_{r=1}^{R},\,\mathcal{Y}}\ \mathrm{Loss}\bigl(\{\boldsymbol\theta_r,\boldsymbol\zeta_r\}_{r=1}^{R},\mathcal{Y}\bigr).\qquad(11)$$
We propose the following algorithmic structure:
$$\{\boldsymbol\theta_r^{t+1},\boldsymbol\zeta_r^{t+1}\}_{r=1}^{R}\leftarrow\arg\widetilde{\min}_{\{\boldsymbol\theta_r,\boldsymbol\zeta_r\}_{r=1}^{R}}\ \mathrm{Loss}\bigl(\{\boldsymbol\theta_r,\boldsymbol\zeta_r\}_{r=1}^{R},\mathcal{Y}^{t}\bigr),\qquad(12)$$
$$\mathcal{Y}^{t+1}\leftarrow\arg\min_{\mathcal{Y}}\ \mathrm{Loss}\bigl(\{\boldsymbol\theta_r^{t+1},\boldsymbol\zeta_r^{t+1}\}_{r=1}^{R},\mathcal{Y}\bigr),\qquad(13)$$
where the superscript "$t$" is the iteration index. In (12), we use $\widetilde{\min}$ to denote inexact minimization, since exactly solving the subproblem w.r.t. the network parameters may not be possible due to its large size and nonconvexity.
1) Solution for (12): Note that the subproblem w.r.t. $\{\boldsymbol\theta_r,\boldsymbol\zeta_r\}_{r=1}^{R}$ is nothing but a regression problem using neural models. Hence, any off-the-shelf neural network optimizer can be employed for updating $\{\boldsymbol\theta_r,\boldsymbol\zeta_r\}_{r=1}^{R}$. In this work, we use the (sub-)gradient descent algorithm with momentum, which has proven effective in complex network learning problems [51]:
$$\boldsymbol\theta_r^{t+1}\leftarrow\boldsymbol\theta_r^{t}-\alpha^{t}\nabla_{\boldsymbol\theta_r}\mathrm{Loss}\bigl(\{\boldsymbol\theta_r,\boldsymbol\zeta_r^{t}\}_{r=1}^{R},\mathcal{Y}^{t}\bigr),\qquad(14a)$$
$$\boldsymbol\zeta_r^{t+1}\leftarrow\boldsymbol\zeta_r^{t}-\alpha^{t}\nabla_{\boldsymbol\zeta_r}\mathrm{Loss}\bigl(\{\boldsymbol\theta_r^{t},\boldsymbol\zeta_r\}_{r=1}^{R},\mathcal{Y}^{t}\bigr),\qquad(14b)$$
for all $r=1,\ldots,R$. Note that the gradients w.r.t. $\boldsymbol\theta_r$ and $\boldsymbol\zeta_r$ can be computed by the standard back-propagation algorithm [52]. Here, $\alpha^{t}$ is the step size (i.e., learning rate) at iteration $t$. There are multiple ways of determining $\alpha^{t}$. In this work, we use the step size rule advocated in the Adam algorithm [51].
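To make the alternating scheme (12) and (13) concrete, the toy sketch below optimizes a single abundance map and signature directly (rather than through the networks $S_{\boldsymbol\theta}$ and $C_{\boldsymbol\zeta}$, a deliberate simplification) with classic heavy-ball momentum in place of Adam, and updates $\mathcal{Y}$ by the entrywise soft-thresholding that solves (13), which is made precise in (15) and (16). All sizes and hyperparameters are illustrative:

```python
import numpy as np

def soft_th(x, delta):
    """Entrywise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)

# Toy instance with R = 1; S and c are the optimization variables.
rng = np.random.default_rng(2)
I, J, K = 16, 16, 10
S_true, c_true = rng.random((I, J)), rng.random(K)
X = np.einsum('ij,k->ijk', S_true, c_true)
X.flat[::97] += 2.0                        # sparse outliers (the Y component)
lam, alpha, beta = 0.1, 1e-3, 0.9          # illustrative hyperparameters

S, c = rng.random((I, J)), rng.random(K)
Y = np.zeros_like(X)
vS, vc = np.zeros_like(S), np.zeros_like(c)
init_res = np.linalg.norm(X - np.einsum('ij,k->ijk', S, c) - Y)
for t in range(500):
    # Inexact minimization (12): one momentum gradient step, cf. (14a)-(14b).
    E = X - np.einsum('ij,k->ijk', S, c) - Y
    gS = -2.0 * np.einsum('ijk,k->ij', E, c)    # gradient w.r.t. S
    gc = -2.0 * np.einsum('ijk,ij->k', E, S)    # gradient w.r.t. c
    vS, vc = beta * vS + gS, beta * vc + gc     # momentum accumulation
    S, c = S - alpha * vS, c - alpha * vc
    # Exact minimization (13): proximal step = entrywise soft-thresholding.
    Y = soft_th(X - np.einsum('ij,k->ijk', S, c), lam / 2.0)

residual = np.linalg.norm(X - np.einsum('ij,k->ijk', S, c) - Y)
```

After the $\mathcal{Y}$-update, every residual entry is clipped to magnitude at most $\lambda/2$, so the outlier spikes are absorbed by $\mathcal{Y}$ rather than forced into the low-rank fit.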
2) Solution for (13): The subproblem (13) is convex, and its solution is given by the well-known soft-thresholding proximal operator [53]. Hence, the update of $\mathcal{Y}$ can be expressed as
$$\mathcal{Y}^{t+1}=\mathrm{soft\_th}_{\lambda/2}\Bigl(\mathcal{X}-\sum_{r=1}^{R}\widehat{\mathbf{S}}_r^{t+1}\circ\widehat{\mathbf{c}}_r^{t+1}\Bigr),\qquad(15)$$
where $\widehat{\mathbf{S}}_r^{t+1}=S_{\boldsymbol\theta_r^{t+1}}(\mathbf{z}_r)$, $\widehat{\mathbf{c}}_r^{t+1}=C_{\boldsymbol\zeta_r^{t+1}}(\mathbf{w}_r)$, and $\mathrm{soft\_th}_{\lambda/2}(\cdot)$ applies soft-thresholding to every entry of its input, in which the entry-wise thresholding is defined as
$$\mathrm{soft\_th}_{\delta}(x)=\mathrm{sgn}(x)\max(|x|-\delta,0).\qquad(16)$$

Algorithm 1 DS2DP for HSI Denoising
Input:
HSI $\mathcal{X}\in\mathbb{R}^{I\times J\times K}$.
  Sample random $\mathbf{z}_r$ and $\mathbf{w}_r$ from the uniform distribution;
  for $t=1$ to $T$ do (repeat until convergence)
    $\widehat{\mathbf{S}}_r=S_{\boldsymbol\theta_r^{t-1}}(\mathbf{z}_r)$, $\widehat{\mathbf{c}}_r=C_{\boldsymbol\zeta_r^{t-1}}(\mathbf{w}_r)$;
    update $\boldsymbol\theta_r,\boldsymbol\zeta_r$ for all $r$ using Adam [51];
    update $\mathcal{Y}$ according to (13);
  end for
  $\widehat{\mathcal{X}}=\sum_{r=1}^{R}\widehat{\mathbf{S}}_r\circ\widehat{\mathbf{c}}_r$;
Output: denoised HSI $\widehat{\mathcal{X}}$.

The algorithm is summarized in Algorithm 1, which we name the unsupervised disentangled spatio-spectral deep prior (DS2DP) algorithm. The algorithm falls into the category of inexact block coordinate descent [54]. Under some relatively mild conditions, the algorithm produces a solution sequence that converges to a stationary point of the optimization problem in (10); see the detailed discussions in [54].

Footnote: Since the ReLU activation functions used in the U-Net and the FCN are not differentiable at one point, the algorithm is subgradient based. Nonetheless, we use $\nabla$ (usually denoting the gradient) to denote the subgradient for notational simplicity.

IV. EXPERIMENTS

In this section, we use semi-real and real data to demonstrate the effectiveness of the proposed approach.
A. Baselines
To thoroughly evaluate the performance of DS2DP, we implemented five state-of-the-art methods as baselines. These methods include two unsupervised deep prior methods, i.e., deep image prior based on 2D convolution (DIP2D) [29] and deep image prior based on 3D convolution (DIP3D) [29]; a matrix optimization-based method, i.e., hyperspectral image restoration using low-rank matrix recovery (LRMR) [38]; and two tensor optimization-based methods, i.e., TV-regularized low-rank tensor decomposition (LRTDTV) [39] and hyperspectral restoration via $L_0$ gradient regularized low-rank tensor factorization (LRTFL0) [40].

For DIP2D and DIP3D, we set the maximum number of iterations to 6,000 and report the best results over the iterations. For LRMR, LRTDTV, and LRTFL0, the parameters are set as suggested in [38]–[40], with parameter fine-tuning to uplift their performance in some cases. The experiments of DIP2D, DIP3D, and DS2DP are executed in Python on a computer with a six-core Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, 32.0 GB of RAM, and an NVIDIA GeForce RTX 2070 GPU. The experiments of LRMR, LRTDTV, and LRTFL0 are implemented in Matlab (2019a) on the same computer.
B. Semi-Real Data Experiments
Evaluation Metrics.
We adopt three frequently used evaluation metrics, namely, the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and the spectral angle mapper (SAM) [40]. Generally, better denoising performance is reflected by higher PSNR and SSIM values and lower SAM values.
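For reference, minimal NumPy versions of PSNR and SAM are sketched below. Conventions vary across papers (per-band vs. global peak for PSNR, radians vs. degrees for SAM), so these are illustrative implementations under stated assumptions, not necessarily the exact ones used in [40]:

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Mean per-band PSNR (dB), assuming intensities scaled to [0, peak]."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))      # per-band MSE
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))

def sam(ref, est, eps=1e-12):
    """Mean spectral angle (radians) over all pixels."""
    R = ref.reshape(-1, ref.shape[2])
    E = est.reshape(-1, est.shape[2])
    cos = np.sum(R * E, axis=1) / (
        np.linalg.norm(R, axis=1) * np.linalg.norm(E, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

# Toy check: SAM is invariant to per-pixel scaling; PSNR degrades with noise.
rng = np.random.default_rng(0)
ref = rng.random((8, 8, 5))
mild = ref + 0.01 * rng.standard_normal(ref.shape)
heavy = ref + 0.10 * rng.standard_normal(ref.shape)
```

The scale invariance of SAM is why it complements PSNR/SSIM: it isolates spectral shape distortion from intensity error.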
Semi-Real Data.
For semi-real data, we use a number of HSIs to serve as the ground truth, which include Washington DC Mall (WDC Mall) of size 256 × × , Pavia Centre of size 200 × × 80 (clipped into 192 × × 80), and Pavia University of size 256 × × 87. The multispectral images (MSIs) in the CAVE dataset, of size 256 × × 31, are also used to serve as the clean data $\mathcal{X}^\natural$.

Scenarios.
We consider a series of scenarios with varioustypes of noise:
Case 1 (Gaussian Noise): In this basic scenario, i.i.d. zero-mean Gaussian noise is added to all bands with the variance set to be 0.1. The signal-to-noise ratios (SNRs) (see the definition in [55]) associated with the different datasets can be found in Table II. One can see that the noise levels in the different datasets are similar. Note that HSIs with SNR between 6 dB and 8 dB are considered severely corrupted data.

http://lesun.weebly.com/hyperspectral-data-set.html

TABLE I
QUANTITATIVE COMPARISON OF THE DENOISING RESULTS BY DIFFERENT METHODS. THE BEST AND SECOND BEST VALUES ARE HIGHLIGHTED IN BOLD AND UNDERLINED, RESPECTIVELY. (Each cell lists PSNR/SSIM/SAM.)

WDC Mall
Method   Case 1              Case 2              Case 3              Case 4              Case 5              Case 6
DIP2D    30.408/0.871/0.122  26.540/0.770/0.163  24.043/0.708/0.228  22.679/0.678/0.271  23.366/0.696/0.227  21.759/0.594/0.282
DIP3D    *                   *                   *                   *                   *                   *
LRMR     34.954/0.951/0.130  34.954/0.951/0.130  32.422/0.933/0.156  32.058/0.925/0.148  32.358/0.920/0.159  29.815/0.907/0.210
LRTDTV   35.293/0.952/0.106  35.087/0.950/0.106  33.307/0.925/0.148  33.024/0.919/0.136  33.464/0.914/0.113  31.691/0.894/0.136
LRTFL0   36.043/0.964/0.112  35.796/0.961/0.111  34.151/0.948/0.133  35.278/0.941/0.115  34.296/0.949/0.123  33.224/0.943/0.163
DS2DP

Pavia Centre
Method   Case 1              Case 2              Case 3              Case 4              Case 5              Case 6
DIP2D    31.965/0.897/0.068  29.603/0.876/0.072  25.319/0.758/0.186  23.587/0.728/0.232  24.885/0.768/0.164  22.175/0.551/0.180
DIP3D    26.969/0.694/0.075  26.338/0.691/0.078  25.421/0.651/0.094  23.445/0.637/0.104  24.173/0.672/0.091  23.039/0.627/0.131
LRMR     33.293/0.926/0.090  33.293/0.926/0.090  30.398/0.816/0.052  32.398/0.916/0.142  31.409/0.901/0.106  24.667/0.742/0.724
LRTDTV   33.511/0.921/0.095  33.608/0.923/0.065  31.465/0.901/0.104  33.096/0.903/0.147  31.415/0.881/0.104  31.882/0.894/0.101
LRTFL0   33.833/0.923/0.088  33.310/0.935/0.089  31.751/0.917/0.096  32.756/0.927/0.089  32.676/0.928/0.090  32.003/0.920/0.101
DS2DP

Pavia University
Method   Case 1              Case 2              Case 3              Case 4              Case 5              Case 6
DIP2D    33.103/0.852/0.107  25.818/0.770/0.177  25.157/0.727/0.223  24.047/0.714/0.269  24.024/0.719/0.283  21.549/0.574/0.382
DIP3D    30.070/0.804/0.111  24.968/0.705/0.151  25.307/0.701/0.156  24.198/0.683/0.166  24.265/0.701/0.166  23.509/0.640/0.173
LRMR     33.063/0.862/0.113  31.582/0.787/0.149  31.155/0.860/0.119  31.858/0.861/0.115  31.385/0.829/0.139  27.615/0.747/0.240
LRTDTV   33.136/0.875/0.108  32.223/0.861/0.110  31.497/0.841/0.151  32.190/0.866/0.112  32.123/0.851/0.136  31.027/0.830/0.187
LRTFL0   34.312/0.890/0.092  33.724/0.879/0.099  32.972/0.867/0.123  33.642/0.877/0.103  33.146/0.863/0.124  32.735/0.858/0.126
DS2DP

CAVE
Method   Case 1              Case 2              Case 3              Case 4              Case 5              Case 6
DIP2D    29.643/0.636/0.339  23.839/0.589/0.421  23.204/0.562/0.449  21.955/0.526/0.506  22.416/0.538/0.484  22.416/0.539/0.484
DIP3D    28.960/0.709/0.332  23.397/0.571/0.447  23.377/0.566/0.449  22.157/0.534/0.471  22.435/0.549/0.460  21.405/0.509/0.501
LRMR     30.633/0.661/0.418  30.633/0.661/0.418  27.724/0.607/0.466  31.809/0.807/0.334  29.015/0.680/0.445  26.404/0.659/0.536
LRTDTV   35.529/0.883/0.165  34.769/0.877/0.210  32.792/0.843/0.260  34.036/0.862/0.232  31.779/0.772/0.361  31.063/0.773/0.430
LRTFL0   33.241/0.877/0.233  33.191/0.891/0.262  32.978/0.846/0.209  33.743/0.852/0.264  32.139/0.781/0.352  30.956/0.855/0.301
DS2DP
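For concreteness, the Case-1 degradation and the SNR definition of [55] can be simulated as follows; the clean cube X here is a random stand-in rather than one of the datasets above, and its shape is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((64, 64, 31))                     # stand-in clean HSI in [0, 1]
noise = rng.normal(0.0, np.sqrt(0.1), X.shape)   # zero-mean Gaussian, variance 0.1
Y = X + noise                                    # Case-1 observation

# SNR in dB: ratio of signal energy to noise energy
snr_db = 10.0 * np.log10(np.sum(X ** 2) / np.sum(noise ** 2))
```

For a uniform-valued cube like this stand-in, the SNR lands around 5 dB, consistent with the "severely corrupted" regime mentioned above.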
Fig. 3. PSNR and SSIM values of all bands obtained by different methods on HSI WDC Mall under Cases 1-6 (each panel plots PSNR or SSIM against band index, bands 1-191, for one case).

TABLE II
THE SNR OF THE DEGRADED IMAGES UNDER CASE 1.
Case 2 (Gaussian Noise + Impulse Noise): In this case, the Gaussian noise in Case 1 is kept. We additionally consider impulse noise, which often appears in real HSI analysis. The impulse noise is added to each band; it is generated following an i.i.d. zero-mean Laplacian distribution with the density parameter being 0.1.

Fig. 4. Denoising results obtained by different methods. (From Left to Right) The observed image, the denoising results by DIP2D, LRMR, LRTDTV, LRTFL0, DS2DP (proposed), and the ground truth, respectively. The first two rows are the denoising results of the WDC Mall under Cases 4 and 6, respectively. The second two rows are the denoising results of the Pavia Centre under Cases 4 and 6, respectively. The last two rows are the denoising results of the Pavia University under Cases 4 and 6, respectively.

Case 3 (Gaussian Noise + Impulse Noise + Deadlines): To make the case more challenging, we include deadlines on top of Case 2; see Fig. 4 for an illustration of deadlines. The deadlines are generated by nullifying some selected pixels and bands. We assume that the deadlines randomly affect 30% of the bands. Moreover, for each selected band, the number of deadlines is randomly generated from 10 to 15, and the spatial width of the deadlines is randomly selected from 1 to 3 pixels.

Case 4 (Gaussian Noise + Impulse Noise + Diagonal Stripes): In this case, we replace the deadlines in Case 3 by diagonal stripes; see Fig. 4 for an illustration. The elements of the diagonal stripes are all ones, which is used to simulate constant brightness. As before, we assume that the stripes affect 30% of the bands. Moreover, for each selected band, the number of diagonal stripes is randomly generated from 15 to 30.

Case 5 (Gaussian Noise + Impulse Noise + Vertical Stripes): In this case, we use the same setting as in Case 4, except that vertical (rather than diagonal) stripes are added; see Fig. 4. For each affected band, the number of vertical stripes is randomly generated from 10 to 15. In this case, the elements of each vertical stripe are set to a value randomly generated from the range [0.6, 0.8], to diversify our simulated scenarios.

Case 6 (Gaussian Noise + Impulse Noise + Deadlines + Diagonal Stripes + Vertical Stripes): To create an extra challenging case, Gaussian noise, impulse noise, and deadlines are added as in Case 3. Moreover, diagonal stripes and vertical stripes are added as in Case 4 and Case 5, respectively.

Parameter Setting. In DS2DP, there are two parameters to be manually tuned, namely, λ and R.

Fig. 5. Denoising results obtained by different methods under Case 6. (From Top to Bottom) Band 4 in Beads, band 4 in Pompoms, and band 31 in Flowers, respectively. (From Left to Right) The observed image, the denoising results of DIP2D, LRMR, LRTDTV, LRTFL0, DS2DP, and the ground truth, respectively.
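A hedged NumPy sketch of how degradations of this kind can be synthesized (Laplacian impulse noise on every band plus Case-3-style deadlines on 30% of the bands); the shapes, helper name, and clean cube are illustrative assumptions, not the authors' simulation code:

```python
import numpy as np

rng = np.random.default_rng(1)

def add_mixed_noise(X, impulse_scale=0.1, band_ratio=0.3):
    """Add Laplacian impulse noise everywhere, then deadlines
    (zeroed pixel columns) on a random 30% of the bands."""
    Y = X + rng.laplace(0.0, impulse_scale, X.shape)   # impulse noise
    H, W, B = Y.shape
    hit = rng.choice(B, size=int(band_ratio * B), replace=False)
    for b in hit:
        for _ in range(rng.integers(10, 16)):          # 10-15 deadlines per band
            w = rng.integers(1, 4)                     # width of 1-3 pixels
            c = rng.integers(0, W - w)
            Y[:, c:c + w, b] = 0.0                     # nullified pixels
    return Y
```

The stripe corruptions of Cases 4-6 follow the same pattern, writing constant values along diagonal or vertical lines instead of zeroing columns.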
Fig. 6. Spectral curves of the denoising results by the different compared methods under Case 6 (pixel value versus band index at a selected pixel; the original spectrum is overlaid in each panel). (From Left to Right) The results by DIP2D, DIP3D, LRMR, LRTDTV, LRTFL0, and DS2DP, respectively. (From Top to Bottom) The results of the MSI Beads, Flowers, and Pompoms, respectively.

For the parameter λ, we generally set it as i × j (i = 2, …, j = −, −, −, −, −) under Cases 1-6. The parameter R is the number of endmembers in the HSI and can be determined by many existing algorithms, e.g., [32].
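As noted above, R can be estimated from the data itself. As a rough illustration only (this is a simple SVD-energy heuristic, not the subspace identification algorithm of [32]), one can pick the smallest rank of the unfolded HSI that captures most of its spectral energy:

```python
import numpy as np

def estimate_rank(X, energy=0.999):
    """Smallest rank whose singular values of the (pixels x bands)
    unfolding capture the given fraction of the total energy."""
    M = X.reshape(-1, X.shape[-1])                 # pixels x bands
    s = np.linalg.svd(M, compute_uv=False)         # singular values, descending
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)      # cumulative energy fraction
    return int(np.searchsorted(frac, energy) + 1)
```

On clean low-rank data this recovers the exact number of independent spectra; on noisy data the energy threshold trades off between capturing signal and absorbing noise.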
Quantitative Comparison. Table I lists the quantitative comparisons of the competing methods in Cases 1-6. The symbol "*" in Table I means that the corresponding method has exhausted the computational resources (memory or time) but still could not produce sensible results. For the CAVE dataset, we report the averaged evaluation results over 32 images. From Table I, it is easy to see that DS2DP outperforms the state-of-the-art approaches in most cases in terms of PSNR, SSIM, and SAM. For example, in Case 1, DS2DP achieves around a 1.4 dB gain in PSNR compared to the second-best method (LRTFL0) on Pavia Centre. In Case 5, where the clean image is corrupted by Gaussian noise, impulse noise, and vertical stripes, the proposed method also achieves around a 1.2 dB gain in PSNR against the same second-best method (LRTFL0).

To test our method's performance on every band, each band's PSNR and SSIM values on WDC Mall in Cases 1-6 are shown in Fig. 3. As observed, DS2DP achieves the highest SSIM and PSNR values on most bands in all cases.
Visual Comparison.
Figs. 4 and 5 show the denoising results on HSIs and MSIs by the different methods, respectively. The low-rank matrix model based approach LRMR cannot effectively remove the stripes and deadlines. Additionally, LRTDTV achieves noise removal in some bands but fails to remove the stripes and deadlines in all bands. Besides, LRTFL0 removes almost all of the noise but fails to capture the detailed information. Although some residual structured noise remains in the result produced by DS2DP, the overall visual perception largely outperforms the baselines. We conjecture that this performance boost is mainly due to the deep spatial prior's ability to preserve local spatial details, empowered by the expressiveness of appropriately crafted neural network structures.

Fig. 6 visualizes the denoising results of the algorithms in the spectral domain. One can see that, among all algorithms, the DS2DP-produced spectral signatures (at a randomly selected pixel) exhibit the highest visual similarity with those from the ground-truth image. This is consistent with its good performance in the spatial domain.

Fig. 7. Denoising results by different methods on the Urban dataset and the Pavia University dataset. (From Top to Bottom) Band 203 in the Urban dataset and band 132 in the Pavia University dataset, respectively. (From Left to Right) The observed image, the results of DIP2D, LRMR, LRTDTV, LRTFL0, and DS2DP, respectively.
C. Real Data Experiments
For the real-data experiments, we choose two real-world HSI datasets to test real noise removal, i.e., the Urban dataset and the Pavia University dataset. The Urban dataset is of size 307 × 307 × 210. In DS2DP, the parameter R is set as 3 and 2 for Urban and Pavia University, respectively, and λ is set as 0.01 for both real datasets.

The denoising results on the Urban dataset and the Pavia University dataset are shown in Fig. 7. One can see that all algorithms offer reasonable results on the Urban data, perhaps because the data is not severely corrupted. Nevertheless, the proposed method produces the visually sharpest results. In particular, in the zoomed-in area, one can see that the proposed method's result does not have horizontal stripes, while such stripes still appear in the results given by most of the baselines. For the Pavia University dataset, since the selected band was severely damaged by sparse noise, the denoising task is particularly challenging. One can see that traditional methods can hardly produce satisfactory results. Nonetheless, DS2DP removes almost all of the noise, at the price of blurring the image to a certain extent, and offers the most visually pleasing result.

V. FURTHER DISCUSSIONS
A. Analysis of Algorithm Complexity
In this part, we analyze the algorithm complexity of the proposed method on HSI WDC Mall and MSI Superballs under Case 6. DIP2D and DIP3D are selected as the baseline models since they stand for the unsupervised HSI denoising models. For DIP2D and DIP3D, we select the network structure with the best performance according to the original implementation.

For a fair comparison, the network structure utilized in DS2DP, which is expected to capture the spatial prior information, is simply designed as a U-Net-like "hourglass" architecture. Moreover, we do not focus on meticulous designs for reducing the model scale in this work, e.g., depth-wise separable convolution, model pruning, and model compression [56]. These techniques may be used to reduce the network complexity of all methods (including ours), but this is beyond the scope of this work. Table III lists the number of parameters of different methods on HSI WDC Mall and MSI Superballs. The corresponding PSNR and SSIM values are also reported in Table III.

As shown in Table III, the proposed DS2DP achieves significantly better performance with roughly the same number of parameters compared with the baseline models. More precisely, DS2DP outperforms DIP2D by 12.593 dB and 14.136 dB in terms of PSNR on HSI WDC Mall and MSI Superballs, respectively. DS2DP achieves performance gains over DIP3D of about 14 dB on MSI Superballs.

In our original implementation, to push DS2DP to attain the (empirically) achievable "best" performance, we use several parallel networks with the same architecture to generate the abundance maps. To reduce the number of parameters, we let the parameters be shared between the parallel networks. This method is denoted by DS2DP* and its performance is also shown in Table III. This way, the parameter amount reduces by 3/4, while the PSNR is essentially unaffected.

TABLE III
THE RELEVANT INDICATORS OF DIP3D, DIP2D, AND DS2DP ON HSI WDC MALL AND MSI SUPERBALLS UNDER CASE 6. THE BEST AND SECOND BEST VALUES ARE HIGHLIGHTED IN BOLD AND UNDERLINED, RESPECTIVELY.

Data           Method   Params   PSNR    SSIM
HSI: WDC Mall  DIP3D    6.275M   20.705  0.399
               DIP2D    2.138M   20.901  0.408
               DS2DP    2.150M
               DS2DP*
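To make the parameter-sharing effect concrete, here is a small back-of-the-envelope sketch; the two-layer generator and R = 4 parallel networks are hypothetical choices, picked only to mirror the 3/4 reduction quoted above:

```python
def conv_params(c_in, c_out, k=3):
    """Parameters of one k x k conv layer (weights + biases)."""
    return (k * k * c_in + 1) * c_out

# hypothetical tiny abundance generator: two 3x3 conv layers
one_net = conv_params(1, 16) + conv_params(16, 1)

R = 4                     # number of parallel abundance generators
unshared = R * one_net    # separate weights for each network
shared = one_net          # DS2DP*-style: one set of weights reused R times

reduction = 1 - shared / unshared   # fraction of parameters saved
```

With R = 4 the saving is 1 − 1/R = 3/4, independent of the generator's size, since sharing keeps only a single copy of the weights.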
B. Effectiveness of the Deep Spectral and Spatial Priors
In this part, we take a deeper look at the deep spectral and spatial priors in DS2DP. To verify these two priors' effectiveness, we conduct ablation studies under Case 6 using the WDC Mall data. The impacts of our designed priors in the spectral and spatial domains are shown in Fig. 8 and Fig. 9, respectively.

Fig. 8. Effectiveness of the deep prior in the spectral domain. The red curve is the ground truth of a selected pixel for illustration. The blue curves correspond to: (a) the estimated spectrum by DS2DP without the deep spectral prior; (b) the estimated spectrum by DS2DP without the deep spatial prior; and (c) the proposed DS2DP.

Fig. 8 (a) shows that when only employing the deep spatial prior in DS2DP (i.e., without the deep spectral prior), the estimated spectrum of the selected pixel is not accurate. In contrast, when considering both types of priors in DS2DP, the results become much more promising; see Fig. 8 (c). Besides, DS2DP without the deep spatial prior and the complete DS2DP both achieve satisfactory performance on most bands. This supports our idea of disentangling the spatial and spectral information and modeling them individually.

Fig. 9 shows similar effects in the spatial domain. One can see that there is clearly visible noise in the results when only employing the deep spectral prior. However, when considering the two priors, the result is clearly much more visually pleasing. In addition, Fig. 9 (c) also clearly demonstrates the disentanglement between the spatial and spectral effects.
C. Effectiveness of the Sparsity Regularization
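Before the experiment, it is worth recalling the operator commonly used to handle such an ℓ1 sparsity term: Donoho's soft-thresholding [53]. This is a minimal sketch of that operator, not necessarily the exact update used in the proposed algorithm:

```python
import numpy as np

def soft_threshold(S, tau):
    """Proximal operator of tau * ||S||_1: shrink each entry
    toward zero by tau, zeroing anything smaller than tau."""
    return np.sign(S) * np.maximum(np.abs(S) - tau, 0.0)
```

Applied to a sparse-noise estimate S in each iteration, it keeps only entries whose magnitude exceeds the threshold, which is what prevents the sparse term from absorbing the clean image.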
To verify the sparsity regularization's effect, we design a comparative experiment, also using Case 6 and the WDC Mall data. The result is shown in Fig. 10. One can see that when the sparsity regularization is not applied, the PSNR first increases and then declines slowly as the number of iterations increases. In contrast, when sparsity regularization is employed, the PSNR maintains an upward trend during the iterations and eventually exhibits a large PSNR improvement relative to the former case.

Fig. 9. Effectiveness of the deep prior in the spatial domain. The four figures correspond to: (a) the denoising results by DS2DP without the deep spectral prior; (b) the denoising results by DS2DP without the deep spatial prior; (c) the denoising results by DS2DP; and (d) the observed image.

Fig. 10. The history of PSNR values and the corresponding denoising results by DS2DP with and without sparsity regularization.

Fig. 10 also shows the visualization of the algorithm with and without the sparsity regularization at the 1,000th iteration. One can see that the proposed method produces a relatively clean image, which clearly shows an advantage over the case without the sparsity term.

D. Sensitivity Analysis of the Parameters R and λ

In this subsection, we conduct an empirical sensitivity analysis of the parameters R and λ, using the WDC Mall data and Case 5. As previously illustrated, R is related to the number of prominent materials in the HSI [1]. For HSI WDC Mall, we set it as 5 in our experiment. Fig. 11 (left) presents the PSNR values by DS2DP with different R values under Case 5. One can see that the PSNR peaks at R = 5, which means that there are 5 prominent endmembers in this particular HSI. In practice, R may be estimated by many existing R-estimation methods for HSIs, e.g., [31], [32].

Fig. 11. Sensitivity analysis of R and λ on HSI WDC Mall in Case 5 (PSNR versus R, left, and PSNR versus λ with log10 λ ranging from −6 to −1, right).

Fig. 11 (right) shows the PSNRs under various λ. One can observe that the PSNR peaks at λ = 0.01. This makes sense, showing that there is a balance between data fitting and sparsity regularization that one needs to strike.

E. Impact of the Random Input to DS2DP
As illustrated previously, the input of our proposed DS2DP is random but known noise sampled from a uniform distribution. One may wonder whether the input z_r has a significant impact on the results. The answer is no. We show this by calculating the means and standard deviations of the algorithm outputs' PSNR under Cases 1-6 on WDC Mall. For each case, we run ten trials with different z_r's that are randomly generated from U(-0.05, 0.05), where U stands for the uniform distribution. The results are shown in Table IV. One can see that, perhaps a bit surprisingly, the standard deviations of the results are fairly small, which means the method is essentially not affected by the random input.

TABLE IV
THE DENOISING RESULTS' PSNR VALUES (MEAN ± STD. DEV.) UNDER CASES 1-6 ON WDC MALL.

Case    Case 1      Case 2  Case 3  Case 4  Case 5  Case 6
PSNR    36.213 ±    ±       ±       ±       ±       ±

VI. CONCLUSIONS
We proposed an unsupervised deep prior-based HSI denoising framework. Unlike existing methods that directly learn deep generative networks for the entire HSI, our method leverages the classic LMM to disentangle the spatial and spectral information, and learns two types of deep priors for the abundance maps and the spectral signatures of the endmembers, respectively. Our design is driven by the challenge that network structures used in deep priors for different types of images (in particular, HSIs) may be hard to search for. Using our information-disentangled framework, empirically validated unsupervised deep image prior structures for natural images can be easily incorporated for HSI denoising. Besides, the network complexity can be substantially reduced with proper parameter sharing, making the learning process more affordable than existing approaches. We also proposed a structured-noise-robust optimization criterion that is tailored for HSI denoising. We tested our method using extensive experiments with various cases and ablation studies. The numerical results demonstrated promising HSI denoising performance of the proposed approach.
REFERENCES
[1] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, "Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 2, pp. 354–379, 2012.
[2] W. He, H. Zhang, L. Zhang, and H. Shen, "Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 1, pp. 178–188, 2016.
[3] Y. Wang, J. Peng, Q. Zhao, Y. Leung, X. Zhao, and D. Meng, "Hyperspectral image restoration via total variation regularized low-rank tensor decomposition," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 4, pp. 1227–1243, 2018.
[4] F. Xiong, J. Zhou, and Y. Qian, "Hyperspectral restoration via ℓ0 gradient regularized low-rank tensor factorization," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10410–10425, 2019.
[5] L. Zhuang and M. K. Ng, "Hyperspectral mixed noise removal by ℓ1-norm-based subspace representation," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 1143–1157, 2020.
[6] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007.
[7] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local sparse models for image restoration," in Proc. IEEE Int. Conf. Comput. Vis., pp. 2272–2279, 2009.
[8] W. Dong, L. Zhang, G. Shi, and X. Li, "Nonlocally centralized sparse representation for image restoration," IEEE Trans. Image Process., vol. 22, no. 4, pp. 1620–1630, 2012.
[9] Y. Chen, J. Li, and Y. Zhou, "Hyperspectral image denoising by total variation-regularized bilinear factorization," Signal Process., vol. 174, p. 107645, 2020.
[10] G. Chen and S. Qian, "Denoising of hyperspectral imagery using principal component analysis and wavelet shrinkage," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 3, pp. 973–980, 2011.
[11] P. Zhong and R. Wang, "Multiple-spectral-band CRFs for denoising junk bands of hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 4, pp. 2260–2275, 2013.
[12] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi, "Nonlocal transform-domain filter for volumetric data denoising and reconstruction," IEEE Trans. Image Process., vol. 22, no. 1, pp. 119–133, 2013.
[13] Y. Chen, X. Cao, Q. Zhao, D. Meng, and Z. Xu, "Denoising hyperspectral image with non-i.i.d. noise structure," IEEE Trans. Cybern., vol. 48, no. 3, pp. 1054–1066, 2018.
[14] Y. Chang, L. Yan, X. L. Zhao, H. Fang, Z. Zhang, and S. Zhong, "Weighted low-rank tensor recovery for hyperspectral image restoration," IEEE Trans. Cybern., vol. 50, no. 11, pp. 4558–4572, 2020.
[15] Y. Chen, Y. Guo, Y. Wang, D. Wang, C. Peng, and G. He, "Denoising of hyperspectral images using nonconvex low rank matrix approximation," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 9, pp. 5366–5380, 2017.
[16] F. Xu, Y. Chen, C. Peng, Y. Wang, X. Liu, and G. He, "Denoising of hyperspectral image using low-rank matrix factorization," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 7, pp. 1141–1145, 2017.
[17] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan, "Hyperspectral image restoration using low-rank matrix recovery," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 4729–4743, 2014.
[18] L. Zhuang and J. M. Bioucas-Dias, "Hyperspectral image denoising based on global and non-local low-rank factorizations," in Proc. Int. Conf. Image Process., pp. 1900–1904, 2017.
[19] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2016.
[20] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proc. Int. Conf. Learn. Representations, 2014.
[21] W. Wang, Y. Huang, Y. Wang, and L. Wang, "Generalized autoencoder: A neural network framework for dimensionality reduction," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 496–503, 2014.
[22] Z. Wang, Q. She, and T. E. Ward, "Generative adversarial networks in computer vision: A survey and taxonomy," arXiv preprint arXiv:1906.01529, 2019.
[23] Q. Yuan, Q. Zhang, J. Li, H. Shen, and L. Zhang, "Hyperspectral image denoising employing a spatial–spectral deep residual convolutional neural network," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 1205–1218, 2019.
[24] W. Dong, H. Wang, F. Wu, G. Shi, and X. Li, "Deep spatial–spectral representation learning for hyperspectral image denoising," IEEE Trans. Comput. Imag., vol. 5, no. 4, pp. 635–648, 2019.
[25] Q. Yuan, Y. Wei, X. Meng, H. Shen, and L. Zhang, "A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 3, pp. 978–989, 2018.
[26] Q. Zhang, Q. Yuan, J. Li, X. Liu, H. Shen, and L. Zhang, "Hybrid noise removal in hyperspectral imagery with a spatial–spectral gradient network," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7317–7329, 2019.
[27] Y. Chang, M. Chen, L. Yan, X. Zhao, Y. Li, and S. Zhong, "Toward universal stripe removal via wavelet-based deep convolutional neural network," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 4, pp. 2880–2897, 2020.
[28] V. Lempitsky, A. Vedaldi, and D. Ulyanov, "Deep image prior," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 9446–9454, 2018.
[29] O. Sidorov and J. Y. Hardeberg, "Deep hyperspectral prior: Single-image denoising, inpainting, super-resolution," in Proc. IEEE Int. Conf. Comput. Vis., pp. 3844–3851, 2019.
[30] W.-K. Ma, J. M. Bioucas-Dias, T.-H. Chan, N. Gillis, P. Gader, A. J. Plaza, A. Ambikapathi, and C.-Y. Chi, "A signal processing perspective on hyperspectral unmixing: Insights from remote sensing," IEEE Signal Process. Mag., vol. 31, no. 1, pp. 67–81, 2014.
[31] X. Fu, W.-K. Ma, J. M. Bioucas-Dias, and T.-H. Chan, "Semiblind hyperspectral unmixing in the presence of spectral library mismatches," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 9, pp. 5171–5184, 2016.
[32] J. M. Bioucas-Dias and J. M. P. Nascimento, "Hyperspectral subspace identification," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 8, pp. 2435–2445, 2008.
[33] C. I. Kanatsoulis, X. Fu, N. D. Sidiropoulos, and W.-K. Ma, "Hyperspectral super-resolution: A coupled tensor factorization approach," IEEE Trans. Signal Process., vol. 66, no. 24, pp. 6503–6517, 2018.
[34] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, "Tensor decomposition for signal processing and machine learning," IEEE Trans. Signal Process., vol. 65, no. 13, pp. 3551–3582, 2017.
[35] J. Liu, Y. Sun, X. Xu, and U. S. Kamilov, "Image restoration using total variation regularized deep image prior," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., pp. 7715–7719, 2019.
[36] M. Ye, Y. Qian, and J. Zhou, "Multitask sparse nonnegative matrix factorization for joint spectral–spatial hyperspectral imagery denoising," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2621–2639, 2014.
[37] M. A. Veganzones, J. E. Cohen, R. C. Farias, J. Chanussot, and P. Comon, "Nonnegative tensor CP decomposition of hyperspectral data," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 5, pp. 2577–2588, 2015.
[38] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan, "Hyperspectral image restoration using low-rank matrix recovery," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 4729–4743, 2014.
[39] Y. Wang, J. Peng, Q. Zhao, Y. Leung, X. Zhao, and D. Meng, "Hyperspectral image restoration via total variation regularized low-rank tensor decomposition," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 4, pp. 1227–1243, 2018.
[40] F. Xiong, J. Zhou, and Y. Qian, "Hyperspectral restoration via ℓ0 gradient regularized low-rank tensor factorization," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10410–10425, 2019.
[41] E. Wycoff, T.-H. Chan, K. Jia, W.-K. Ma, and Y. Ma, "A non-negative sparse promoting algorithm for high resolution hyperspectral imaging," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., pp. 1409–1413, 2013.
[42] N. Yokoya, T. Yairi, and A. Iwasaki, "Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 528–537, 2012.
[43] H. K. Aggarwal and A. Majumdar, "Hyperspectral unmixing in the presence of mixed noise using joint-sparsity and total variation," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp. 4257–4266, 2016.
[44] Y. Qian, F. Xiong, S. Zeng, J. Zhou, and Y. Y. Tang, "Matrix-vector nonnegative tensor factorization for blind unmixing of hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 3, pp. 1776–1792, 2017.
[45] F. Xiong, Y. Qian, J. Zhou, and Y.-Y. Tang, "Hyperspectral unmixing via total variation regularized nonnegative tensor factorization," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 4, pp. 2341–2357, 2019.
[46] C. Lanaras, E. Baltsavias, and K. Schindler, "Hyperspectral super-resolution by coupled spectral unmixing," in Proc. IEEE Int. Conf. Comput. Vis., pp. 3586–3594, 2015.
[47] L. Loncan, J. Chanussot, S. Fabre, and X. Briottet, "Hyperspectral pan-sharpening based on unmixing techniques," in Workshop Hyperspectral Image Signal Process.: Evol. Remote Sens., pp. 1–4, 2015.
[48] A. Karami, R. Heylen, and P. Scheunders, "Hyperspectral image compression optimized for spectral unmixing," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 5884–5894, 2016.
[49] Y. Zhao, J. Yang, C. Yi, and Y. Liu, "Joint denoising and unmixing for hyperspectral image," in Workshop Hyperspectral Image Signal Process.: Evol. Remote Sens., pp. 1–4, 2014.
[50] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer International Publishing, 2015.
[51] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Representations, 2015.
[52] R. Rojas, The Backpropagation Algorithm, pp. 149–182. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996.
[53] D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 613–627, 1995.
[54] Q. Shi, H. Sun, S. Lu, M. Hong, and M. Razaviyayn, "Inexact block coordinate descent methods for symmetric nonnegative matrix factorization," IEEE Trans. Signal Process., vol. 65, no. 22, pp. 5995–6008, 2017.
[55] H. Othman and S.-E. Qian, "Noise reduction of hyperspectral imagery using hybrid spatial-spectral derivative-domain wavelet shrinkage," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 2, pp. 397–408, 2006.
[56] S. Ge, Z. Luo, S. Zhao, X. Jin, and X. Zhang, "Compressing deep neural networks for efficient visual inference," in