Learning deep multiresolution representations for pansharpening
Hannan Adeel, Syed Sohaib Ali, Muhammad Mohsin Riaz, Syed Abdul Mannan Kirmani, Muhammad Imran Qureshi, Junaid Imtiaz
Abstract—Retaining the spatial characteristics of the panchromatic image and the spectral information of the multispectral bands is a critical issue in pansharpening. This paper proposes a pyramid based deep fusion framework that preserves spectral and spatial characteristics at different scales. The spectral information is preserved by passing the corresponding low resolution multispectral image as the residual component of the network at each scale. The spatial information is preserved by training the network at each scale with the high frequencies of the panchromatic image alongside the corresponding low resolution multispectral image. The parameters of the networks are shared across the pyramid in order to add spatial details consistently across scales. The parameters are also shared across the fusion layers within the network at a specific scale. Experiments suggest that the proposed architecture outperforms state of the art pansharpening models. The proposed model, code and dataset are publicly available at GitHub.

Index Terms—Pansharpening, deep learning, image fusion, deep pyramid networks, multiresolution learning
I. INTRODUCTION

Satellite imagery is mostly acquired from two types of sensors for the purpose of remote sensing. The panchromatic sensor provides a single channel high resolution panchromatic image (HR-PAN) that has high spatial resolution but lacks spectral colors. The second sensor provides a low resolution multispectral image (LR-MSI) containing several bands but lacking high spatial resolution. Combining the spatial and spectral characteristics of both sensors (as shown in figure 1) is a challenging task in pansharpening. The process of pansharpening aims to produce a high-resolution multispectral image (HR-MSI) by fusing the high spatial resolution of the panchromatic image with the rich spectral information of the multispectral image. This results in a tradeoff between spectral and spatial information in the output HR-MSI [1].
Hannan Adeel is currently pursuing his PhD in the Electrical Engineering department, COMSATS University Islamabad, Pakistan. E-mail: [email protected]
Syed Sohaib Ali is serving as Assistant Professor in the Computer Science department, COMSATS University Islamabad, Pakistan. E-mail: [email protected]
Muhammad Mohsin Riaz is serving as Assistant Professor in the Center for Advanced Studies in Telecommunication (CAST), COMSATS University Islamabad, Pakistan.
Syed Abdul Mannan Kirmani is serving as Assistant Professor in the Center for Advanced Studies in Telecommunication (CAST), COMSATS University Islamabad, Pakistan.
Muhammad Imran Qureshi is serving as Assistant Professor in the Dept. of Electrical Engineering, FICT, BUITEMS, Quetta, Pakistan.
Junaid Imtiaz is serving as Head, Dept. of Electrical Engineering, Bahria University Islamabad, Pakistan.
https://github.com/sohaibali01/deep_pyramid_fusion
Fig. 1: Pansharpening example on WorldView-3 satellite imagery. Left: panchromatic image; middle: interpolated multispectral image (originally acquired at a quarter of the panchromatic spatial resolution); right: fused output.

II. LITERATURE REVIEW
The traditional pansharpening methods can be classified into three classes: spectral transformation based component substitution (CS) methods, spatial transformation based multiresolution analysis (MRA) methods, and model based methods. In CS based methods, a component such as the intensity, obtained through a spectral transformation of the LR-MSI, is merged with the HR-PAN. Intensity hue saturation (IHS) [2], [3], multivariate statistical methods such as principal component substitution (PCS) [4], Gram-Schmidt (GS) spectral transformation [5], and band-dependent spatial-detail (BDSD) schemes [6] are representative methods of this class. The non-linear (NL) IHS method proposed in [2] optimizes the intensity component. The fast generalized IHS (FGIHS) method in [3] uses variational optimization to increase the accuracy of the estimated intensity image in the IHS space and to minimize the associated spectral distortion. The authors in [4] recently proposed a CS based approach that targets the HR-MSI by replacing the histogram matched luminance component of the upscaled LR-MSI represented in
the CIELab color space with the original HR-PAN. In [5], spectral sensitivity is used in conjunction with the IHS and Gram-Schmidt transformations. These methods achieve comparable spatial sharpness and improved spectral information in the fused image as compared to conventional CS methods.
Vivone [6] recently proposed regression and physical constraints based variants of the band-dependent spatial-detail (BDSD) approach. In general, CS based methods suffer from distortion owing to the difference between the estimated component values and the PAN data.

The MRA based methods commonly use multiresolution representations such as Laplacian pyramids (LP) [7], wavelet family transforms [8], non-subsampled transforms [9], [10], latent low rank (LLR) decomposition [11] and morphological pyramids (MP) [12] to extract information from the spatially enriched PAN image. MRA based methods are generally superior in terms of spectral preservation, but artifacts appear due to subsampling in the PAN image [13]. The regression based generalized LP framework proposed in [7] achieves high quality results on different satellite imagery at a heavy computational cost. The MRA based method proposed in [8] decomposes the HR-PAN image using the non-decimated "à trous" wavelet transform (ATWT) to produce a low resolution PAN image (LR-PAN). A linear weighting of the LR-PAN and the re-sampled multispectral image is then used to achieve color preservation. In [9], the non-subsampled contourlet transform (NSCT) has shown comparable performance to BDSD. Kumar [14] uses the non-subsampled shearlet transform (NSST) in order to avoid frequency aliasing and achieve better directional selectivity and shift invariance. In [10], NSST is used with a guided filtering based MRA approach; however, the process introduces blurry edges, consequently affecting spatial quality. MRA of an LLR based composite image is proposed in [11]. This framework extracts details from an image reconstructed using LLR decomposition of the HR-PAN and LR-MSI data. In [12], morphological gradients are used in a nonlinear pyramidal setting. This scheme shows good results for various satellite imagery. Recently, an image segmentation based pansharpening method was proposed in [15] to reduce the spectral distortion caused by the fused spectra of mixed pixels in MRA based methods.

In addition to CS and MRA based methods, model based methods pose pansharpening as a local optimization problem. In [16], sparse representation and low rank regularization are used in a variational optimization setup; however, this scheme requires empirical parameter setting and suffers from the computational cost of high quality pansharpening.
Khateri et al. [17] proposed a model based pansharpening that uses sparse coefficients with a patch dictionary to generate the HR-MSI. The fused output preserves the image and patch energy ratio of the LR-MSI. In [18], spatial information regularization is carried out to combat local dissimilarities by optimizing a convex energy function. A spectral consistency based variational optimization model built on half quadratic optimization [19], presented in [20], aims to reduce spectral distortion during pansharpening. This model shows good spectral preservation while also preserving the spatial smoothness of the HR-PAN.
Tian et al. [21] reported that a general assumption of sparse representation based methods is that multi-resolution images are represented by the same sparse coding under some dictionaries. They proposed a gradient sparse coding based variational model and used gradient similarity based pansharpening. In [22], a Bayesian model is used to jointly express the LR-MSI, HR-PAN and HR-MSI using multiorder gradients (MoG), optimized by the alternating direction method of multipliers (ADMM). In [23], spectral and intensity modulation coefficients obtained from spectral and statistical measures of the LR-MSI and HR-PAN are combined in an adaptive linear model to strike a balance between spectral and spatial details in the target HR-MSI.
Fu et al. [24] formulated the HR-MSI problem as a variational optimization problem based on regression and the gradient difference of the HR-PAN and LR-MSI for spatial preservation. In general, model based methods focus on certain aspects and have high computational cost due to local optimization solvers.

Deep learning (DL) based methods assume a non-linear mapping between the LR-MSI and HR-PAN images. Such methods can be used in both supervised and unsupervised settings. Recently,
Guo et al. [25] proposed a multiscale recursive block based convolutional neural network (CNN) trained in the MoG domain for pansharpening. Residual blocks with a multi-scale LP and parameter sharing along network branches are used in [26] to improve fusion performance. In [27], convolutional auto-encoders (CAE) learn the mapping between the LR and HR spaces using degraded PAN and HR-PAN patches. Afterwards, the CAE based estimated HR-MSI is used to preserve spectral-spatial details in the target HR-MSI. In [28], the mapping is estimated by a CNN based on a pyramid structure to minimize the spectral and spatial dissimilarity between the HR-PAN and LR-MSI.
Masi et al. [29] proposed the CNN based pansharpening (PNN) method using a three-layered architecture. Their stacked network uses the interpolated LR-MSI along with the HR-PAN image, allowing training at the target resolution.
Scarpa et al. [30] proposed the target-adaptive enhanced version of PNN, abbreviated as PNN+. The deeper network is trained to produce a residual pansharpened image corrected by the up-sampled LR-MSI through skip connections. In comparison to PNN [29], this network achieves better performance using the $L_1$ loss function for different types of satellite imagery. Inspired by the residual network (ResNet) [31], Wei et al. [32] proposed the deep residual PNN (DRPNN) with multiple sparse residual layers to boost the network's efficiency. The pansharpening network (PanNet) [33] is a ResNet based DL scheme that incorporates the up-sampled LR-MSI for spectral correction in the target super-resolved image, and high frequency components for edge preservation and for avoiding inconsistencies between the HR-PAN and HR-MSI image data. PanNet shows good quality fusion for a variety of satellite imagery.
Liu et al. [34] and
Ma et al. [35] demonstrated the use of generative adversarial networks (GANs) for pansharpening.
Luo et al. [36] formulated a new loss function based on the original HR-PAN and LR-MSI input pair and the target HR-MSI, achieving good spatio-spectral performance for small scale objects. The differential information residual CNN [37] learns mappings between the HR-PAN and LR-MSI and between the HR-PAN and HR-MSI, based on residual blocks and attention modules, to fully cascade high and low frequency components and refine features. A deep learning with CS method is proposed in [38], where stacked self-attention modules are used and sub-pixel level spectral details are injected into the LR-MSI to obtain the target HR-MSI. Similarly,
Ozcelik et al. [39] have shown with their PSColorGAN that, along with color injection, adopting a random scale down-sampling strategy during training enhances the overall performance of the network and minimizes the blurring effect.

III. PROPOSED SCHEME
Since most satellite sensors, including WorldView-4, IKONOS and QuickBird, provide a multispectral image acquired at a quarter of the spatial resolution of the panchromatic band, the proposed pyramid framework in figure 2 uses a two stage decomposition and reconstruction. The proposed pansharpening procedure inherently assumes that:

• The low frequencies of the output fused image can be directly obtained from the input multispectral image.

• At each scale, the high frequencies of the output can be estimated from the corresponding high frequencies of the panchromatic band, but this estimation should be coherent with the low frequencies obtained at the coarser scale.

• The spatial consistency assumption should be followed. For instance, once the output multispectral image is downsampled by a factor of two, the residual high frequencies should be proportional to those obtained when the result is downsampled by a factor of two again. Figure 2 ensures this by sharing the network parameters across the two levels of the pyramid.
A. Pyramid based fusion

Let the input panchromatic image ($P$) be decomposed into $J$ levels using Laplacian pyramids such that

$$P_j = \left(P_{j-1} * h\right)\downarrow_2, \quad \forall j \in \{1, 2, \ldots, J\} \qquad (1)$$

where $P_j$ represents the low pass output of the $j$th level. For initialization, $P_0 = P$, while $h$ can be a symmetric and separable low pass filter such as that used in [40]. The high frequencies at the $j$th level are estimated as an input-output difference, i.e.,

$$\hat{P}_j = P_{j-1} - \left(P_j\right)\uparrow_2 \qquad (2)$$

Similarly, let the $j$th approximation of the $b$th band of the multispectral image ($M_b$) be represented as $M_{b,j}$. The upsampled version of the input multispectral image provides the approximation for the first scale, i.e.,

$$M_{b,1} = \left(M_b\right)\uparrow_2 \qquad (3)$$

At each scale, the corresponding high frequencies of the panchromatic image are stacked together with the low pass approximations of the coarser scale, and the image stack is passed through the fusion network to obtain the corresponding scale approximation of the fused image, i.e.,

$$S_j = \left[\hat{P}_j, M_{1,j}, M_{2,j}, \ldots, M_{B,j}\right] \qquad (4)$$

where $B$ is the total number of input multispectral bands and $S_j$ represents the image stack at the $j$th scale, obtained by channel-wise concatenation of the high frequency and low frequency bands. This stack is then passed through the fusion network that estimates the $j$th scale approximation of the fused image,

$$M_{b,j+1} = \left(f_{\text{FuseNet}}(S_j) + M_{b,j}\right)\uparrow_2 \qquad (5)$$

where $f_{\text{FuseNet}}(\cdot)$ provides a deep convolutional mapping discussed in the next section. Note that this mapping does not vary across scales due to parameter sharing. The skip connection in the form of $M_{b,j}$ in eq. (5) preserves the identity mapping and allows the framework to output the interpolated version of the low resolution input multispectral image in the worst case scenario. The output image $M_{b,J}$ is obtained via eq. (5) in a recursive fashion.

Given a ground truth image $M_b$, the parameters of $f_{\text{FuseNet}}$ are optimized by minimizing the following loss function,

$$\mathcal{L} = \sum_j \left\| \bar{M}_{b,j} - M_{b,j} \right\| \qquad (6)$$

where $\bar{M}_{b,j}$ represents the $j$th scale approximation of the ground truth, obtained by summing the low and high frequency components of its pyramid,

$$\bar{M}_{b,j} = \hat{M}_{b,j} + \tilde{M}_{b,j} \qquad (7)$$

where $\tilde{M}_{b,j}$ and $\hat{M}_{b,j}$ represent the low and high pass components, obtained by following eq. (1) and eq. (2) respectively.
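To make the decomposition and the fusion recursion concrete, the following is a minimal NumPy sketch of eqs. (1)-(5). The 5-tap kernel follows Burt and Adelson [40]; the bilinear upsampling operator, the level ordering and the factor-of-2 sampling are our reading of the equations, and `fuse_net` is a stub standing in for the trained network rather than the released implementation.

```python
import numpy as np
from scipy.ndimage import convolve, zoom

def lowpass_kernel():
    # Separable 5-tap kernel of Burt and Adelson [40] (a = 0.4).
    w = np.array([0.05, 0.25, 0.4, 0.25, 0.05])
    return np.outer(w, w)

def down(img):
    # eq. (1): filter with h, then drop every other row and column.
    return convolve(img, lowpass_kernel(), mode="nearest")[::2, ::2]

def up(img):
    # Bilinear upsampling by 2 (assumed form of the paper's operator).
    factors = (2, 2) + (1,) * (img.ndim - 2)
    return zoom(img, factors, order=1)

def pan_details(pan, levels=2):
    # eq. (2): per-level high frequencies, returned coarse to fine.
    highs, p = [], pan
    for _ in range(levels):
        p_low = down(p)
        highs.append(p - up(p_low))
        p = p_low
    return highs[::-1]

def fuse(pan, ms, fuse_net, levels=2):
    # ms: (H/4, W/4, B) multispectral bands; pan: (H, W) panchromatic image.
    highs = pan_details(pan, levels)
    m = up(ms)                                          # eq. (3): coarsest guess
    for j, hp in enumerate(highs):
        s = np.concatenate([hp[..., None], m], axis=-1)  # eq. (4): stack S_j
        m = fuse_net(s) + m                              # eq. (5): residual + skip
        if j < levels - 1:
            m = up(m)                                    # move to the finer scale
    return m
```

With a zero stub, `fuse(pan, ms, lambda s: np.zeros(s.shape[:2] + (s.shape[-1] - 1,)))` returns the plain interpolated multispectral image, matching the worst case behaviour noted after eq. (5).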
B. Proposed FuseNet

The proposed fully convolutional FuseNet (shown in figure 3) provides a residual learning framework while simultaneously combining local and global features in a hierarchical fashion. The output $F_n$ at the $n$th layer is calculated as

$$F_n = \sigma\left(W_n * F_{n-1} + b_n\right) \qquad (8)$$

where $W_n$ and $b_n$ represent the filter weights and bias at the $n$th layer, while $*$ and $\sigma$ represent the convolution and activation function respectively. The first layer extracts a shallow representation from the $j$th input stack $S_j$, i.e., $F_0 = S_j$. The output $F_1$ is then passed onto a stack of $K$ residual learning blocks, each of which learns the local features for fusion.
C. Local Feature Fusion Block

The local feature fusion is obtained by passing the input of the block through two consecutive convolutional layers and then concatenating the outputs of the two layers alongside the input. The final output feature map of the block is estimated by a linear weighting of the concatenated channels, which is done using $1 \times 1$ convolution.

Let the output of the $n$th layer at the $k$th local fusion block be represented as $F_{k,n}$, such that

$$F_{k,n} = \sigma\left(W_{k,n} * F_{k,n-1} + b_{k,n}\right) \qquad (9)$$

Since the filter weights and biases are shared across the $K$ blocks, we have

$$W_{k,n} = W_{k+1,n} = W_{k+2,n} = \ldots = W_{K,n} \qquad (10)$$

Likewise,

$$b_{k,n} = b_{k+1,n} = b_{k+2,n} = \ldots = b_{K,n} \qquad (11)$$

The third layer is the channel-wise concatenation of the preceding layers, i.e.,

$$F_{k,3} = \left[F_{k,0}, F_{k,1}, F_{k,2}\right] \qquad (12)$$

The output of the $k$th block ($F_{k,4}$) is finally determined as a channel-wise linear weighting of the concatenated layers.
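As an illustration of the block and its weight sharing, here is a hedged TensorFlow/Keras sketch of eqs. (9)-(12). The 3x3 kernel size and the feature-map count are illustrative assumptions; reusing the same layer objects in every block is what realizes the sharing of eqs. (10)-(11), and the LeakyReLU slope is left at the Keras default.

```python
import tensorflow as tf
from tensorflow.keras import layers

FILTERS = 64  # illustrative feature-map count

# Shared layer objects: one set of weights reused by all K blocks (eqs. 10-11).
conv_a = layers.Conv2D(FILTERS, 3, padding="same")
conv_b = layers.Conv2D(FILTERS, 3, padding="same")
fuse_1x1 = layers.Conv2D(FILTERS, 1, padding="same")

def local_fusion_block(x):
    f1 = layers.LeakyReLU()(conv_a(x))        # eq. (9), first conv layer
    f2 = layers.LeakyReLU()(conv_b(f1))       # eq. (9), second conv layer
    cat = layers.Concatenate()([x, f1, f2])   # eq. (12): channel-wise concat
    return fuse_1x1(cat)                      # linear weighting, block output F_{k,4}
```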
D. Global Feature Fusion

Just like the fusion of feature maps within a block, the output feature maps of multiple blocks are also fused in a hierarchical manner. The shallow feature map ($F_1$) is used as a skip connection and passed as an additional input to each local fusion block, i.e.,

$$F_{k,0} = F_{k-1,4} + F_1 \qquad (13)$$

Finally, the output feature maps of all the blocks are concatenated,

$$F_G = \left[F_{1,4}, F_{2,4}, \ldots, F_{K,4}\right] \qquad (14)$$

The feature maps in $F_G$ are then linearly weighted via $1 \times 1$ convolution and passed through another convolutional layer, which outputs $B$ channels.

Fig. 2: Proposed architecture for pansharpening.

Fig. 3: Proposed network.
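Continuing the Keras sketch above, the block can be assembled into the full FuseNet of eqs. (8), (13) and (14). The number of blocks, the kernel sizes of the first and last layers, and feeding the shallow map directly to the first block are assumptions of this sketch.

```python
def build_fusenet(num_bands, k_blocks=3):
    # k_blocks (K) is an illustrative choice.
    s_j = layers.Input(shape=(None, None, num_bands + 1))  # stack S_j of eq. (4)
    shallow = layers.Conv2D(FILTERS, 3, padding="same")(s_j)
    f1 = layers.LeakyReLU()(shallow)                       # eq. (8): F_1

    block_outputs, x = [], f1
    for _ in range(k_blocks):
        out = local_fusion_block(x)                        # block output F_{k,4}
        block_outputs.append(out)
        x = layers.Add()([out, f1])                        # eq. (13): shallow skip
    cat = layers.Concatenate()(block_outputs)              # eq. (14): F_G
    weighted = layers.Conv2D(FILTERS, 1, padding="same")(cat)   # 1x1 weighting
    residual = layers.Conv2D(num_bands, 3, padding="same")(weighted)  # B channels
    return tf.keras.Model(s_j, residual)

fusenet = build_fusenet(num_bands=8)  # eight-band WorldView imagery
```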
IV. EXPERIMENTS

Fig. 4: WorldView-2 pansharpening: (a) original image, (b) PNN [29], (c) PNN+ [30], (d) DRPNN [32], (e) PanNet [33], (f) VPLG [24], (g) proposed FuseNet.

Fig. 5: Absolute difference between the reference image and each of the outputs presented in figure 4: (a) PNN, (b) PNN+, (c) DRPNN, (d) PanNet, (e) VPLG, (f) proposed.

Every convolutional layer of FuseNet uses the same number of feature maps, except the last layer which outputs $B$ channels. The parameters of each layer are initialized using Xavier initialization [41], while the total loss of eq. (6) is minimized using the ADAM optimizer [42]. Training is done with a fixed batch size and patch size. Leaky rectified linear units (Leaky ReLU) are used as the activation function throughout the network. The TensorFlow code, along with the training and testing dataset, is available at GitHub.
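As a sketch of how the total loss of eq. (6) could be optimized, the step below assumes the `fusenet` model built above. For clarity, the per-scale input stacks $S_j$ are treated as given (in the full pipeline they come from the recursion of eq. (5)), and the squared-error norm and learning rate are illustrative choices.

```python
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # illustrative rate

@tf.function
def train_step(stacks, gt_scales):
    # stacks: per-scale inputs S_j (channel 0 holds the PAN high frequencies);
    # gt_scales: matching ground-truth approximations from eq. (7).
    with tf.GradientTape() as tape:
        loss = 0.0
        for s_j, gt in zip(stacks, gt_scales):
            pred = fusenet(s_j) + s_j[..., 1:]            # eq. (5) before upsampling
            loss += tf.reduce_mean(tf.square(gt - pred))  # assumed norm
    grads = tape.gradient(loss, fusenet.trainable_variables)
    optimizer.apply_gradients(zip(grads, fusenet.trainable_variables))
    return loss
```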
A. Reduced Scale Quality Assessment

Since satellite sensors do not provide a ground truth multispectral image at the same high resolution as the panchromatic image, a common approach is to consider the available low resolution multispectral image as ground truth and simulate the input panchromatic and multispectral images by following Wald's protocol [1], [43]. Like [33], the input panchromatic and multispectral images are simulated by downsampling the original images by a factor of 4. The output is then compared against the ground truth multispectral image using the spectral angle mapper (SAM) [44], the relative dimensionless global error in synthesis (ERGAS) [45], the universal image quality index averaged over the bands (QAVE) [46] and the spatial correlation coefficient (SCC) [47].
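Of these metrics, SAM has a compact closed form: the mean per-pixel angle between the reference and fused spectral vectors. A minimal NumPy sketch, with the epsilon guard and the degree conversion as implementation choices:

```python
import numpy as np

def sam(reference, fused, eps=1e-12):
    # Inputs: (H, W, B) arrays; output: mean spectral angle in degrees.
    dot = np.sum(reference * fused, axis=-1)
    norms = np.linalg.norm(reference, axis=-1) * np.linalg.norm(fused, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return np.degrees(angles.mean())
```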
1) 8-band Pansharpening:
The current work focuses on sensors that provide eight multispectral bands along with the panchromatic band. These include WorldView-2 and WorldView-3, which acquire images at different spatial resolutions. Free sample imagery is collected from the internet and a dataset of fixed-size images is created, part of which belongs to the WorldView-2 satellite while the remainder comes from the WorldView-3 satellite. Instead of training the model separately for these two sensors, we train it jointly on a randomly selected subset of the images. Testing is performed on all of the remaining samples.

The proposed scheme is compared with state of the art deep learning models, including pansharpening with convolutional neural networks (PNN) [29], PNN+ [30], the deep residual pansharpening neural network (DRPNN) [32] and PanNet [33].

Fig. 6: Full scale WorldView-3 pansharpening: (a) panchromatic image, (b) interpolated multispectral image, (c) PNN [29], (d) PNN+ [30], (e) DRPNN [32], (f) PanNet [33], (g) VPLG [24], (h) proposed FuseNet.
TABLE I: Reduced scale pansharpening quality assessment. Best value according to each metric is highlighted.

               WorldView-2 (average on 171 images)      WorldView-3 (average on 100 images)
Scheme         QAVE     SAM      ERGAS    SCC           QAVE     SAM      ERGAS    SCC
PNN [29]       0.7300   5.5836   4.0532   0.8440        0.7711   8.3305   7.3133   0.7888
PNN+ [30]      0.7193   5.7245   4.2630   0.8167        0.6385   9.4245   7.7605   0.4968
DRPNN [32]     0.7569   5.1577   4.0459   0.8539        0.8269   7.0739   5.1512   0.8331
PanNet [33]    0.7458   4.7030   3.9146   0.8394        0.8607   4.8948   4.2456   0.8674
VPLG [24]      0.7479   3.9255   3.1048   0.8771        0.9117   4.3710   3.2945   0.8944
Proposed

The pre-trained models of all of these schemes are available online. In addition to the deep learning models, the recently proposed variational pansharpening with local gradient constraints (VPLG) [24] is also included in the comparison.

Figure 4 presents a sample visual comparison of state of the art pansharpening schemes on a WorldView-2 satellite image, while figure 5 shows the difference of each scheme's output from the reference image. PNN produces color distortion in figure 4(b), which is confirmed by the difference image in figure 5(a) being biased towards the red channel. Visually, figure 4(e) appears to be the crispest image, but the corresponding difference image in figure 5(d) suggests that PanNet produces extra sharp edges that are not present in the reference image. VPLG produces blurring artifacts, highlighted by the red box in figure 4(f). Compared with the rest, the proposed scheme in figure 5(f) produces the least amount of differential edges.

Table I quantifies the state of the art pansharpening schemes using SAM [44], ERGAS [45], QAVE [46] and SCC [47]. QAVE and ERGAS measure global distortion, while SAM and SCC measure spectral and spatial distortion respectively. Table I suggests that the proposed scheme outperforms the state of the art in minimizing both spatial and spectral distortions.
B. Full Scale Quality Assessment
The models pretrained on reduced scale images are also tested on input images at full scale. In the absence of ground truth, the input low resolution multispectral image is used as the spectral reference and the panchromatic image is used as the spatial reference. QNR [48] is used as an evaluation metric that does not need a ground truth reference. QNR itself measures global distortion but internally measures the spectral distortion ($D_\lambda$) and the spatial distortion ($D_s$) separately. The values of $D_\lambda$ and $D_s$ should be low, resulting in a high value of QNR in the ideal scenario.
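For reference, QNR combines the two distortion indices multiplicatively; a minimal sketch, assuming the common exponent choice of alpha = beta = 1:

```python
def qnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    # QNR = (1 - D_lambda)^alpha * (1 - D_s)^beta; 1 is the ideal value.
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta
```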
TABLE II: Full scale pansharpening quality assessment. Best value according to each metric is highlighted.

               WorldView-2 (average on 254 images)      WorldView-3 (average on 177 images)
Scheme         D_λ      D_s      QNR                    D_λ      D_s      QNR
PNN [29]       0.1318   0.2189   0.6908                 0.0870   0.1044   0.8221
PNN+ [30]      0.0829   0.1624   0.7734                 0.0558   0.1098   0.8410
DRPNN [32]     0.1192   0.2207   0.7043                 0.0661   0.0662   0.8742
PanNet [33]    0.0779   0.1719   0.7726                 0.0402   0.0576   0.9058
VPLG [24]
Proposed       0.0504
Figure 6 shows a visual comparison of different pansharpening schemes on WorldView-3 satellite images at full scale. One can see artifacts around the edges of the rooftop, most noticeably in PNN+ (figure 6d) and VPLG (figure 6g). Table II suggests that VPLG performs quite close to the proposed scheme. A major difference in Table II is that VPLG minimizes the spectral distortion ($D_\lambda$) better, while the proposed scheme minimizes the spatial distortion ($D_s$) better in comparison with the rest.

V. CONCLUSION
This paper proposes a pyramid based deep pansharpening network that minimizes spatial and spectral distortions in a hierarchical fashion. Each level of the pyramid is trained by stacking the low resolution multispectral image obtained at the previous level with the corresponding high frequencies of the panchromatic image at that level. The residual component, in the form of the low resolution multispectral bands, preserves the spectral mapping between input and output. Experiments suggest that the proposed multiresolution framework outperforms state of the art pansharpening models.
REFERENCES

[1] L. Loncan, L. B. De Almeida, J. M. Bioucas-Dias, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. A. Licciardi, M. Simoes et al., "Hyperspectral pansharpening: A review," IEEE Geoscience and Remote Sensing Magazine, vol. 3, no. 3, pp. 27–46, 2015.
[2] M. Ghahremani and H. Ghassemian, "Nonlinear IHS: A promising method for pan-sharpening," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 11, pp. 1606–1610, 2016.
[3] W. Li, L. Ying, H. Qiujun, and Z. Liping, "Model-based variational pansharpening method with fast generalized intensity–hue–saturation," Journal of Applied Remote Sensing, vol. 13, no. 3, p. 2804, 2019.
[4] A. Rahimzadeganasl, U. Alganci, and C. Goksel, "An approach for the pan sharpening of very high resolution satellite images using a CIELab color based component substitution algorithm," Applied Sciences, vol. 9, no. 23, p. 5234, 2019.
[5] X. Li, H. Chen, J. Zhou, and Y. Wang, "Improving component substitution pan-sharpening through refinement of the injection detail," Photogrammetric Engineering & Remote Sensing, vol. 86, no. 5, pp. 317–325, 2020.
[6] G. Vivone, "Robust band-dependent spatial-detail approaches for panchromatic sharpening," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 9, pp. 6421–6433, 2019.
[7] G. Vivone, S. Marano, and J. Chanussot, "Pansharpening: Context-based generalized Laplacian pyramids by robust regression," IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 9, pp. 6152–6167, 2020.
[8] S. Wady, Y. Bentoutou, A. Bengermikh, A. Bounoua, and N. Taleb, "A new IHS and wavelet based pansharpening algorithm for high spatial resolution satellite imagery," Advances in Space Research, vol. 66, no. 7, pp. 1507–1521, 2020.
[9] X. Lu, J. Zhang, and Y. Zhang, "An improved non-subsampled contourlet transform-based hybrid pan-sharpening algorithm," IEEE, 2017, pp. 3393–3396.
[10] J. Jiao and L. Wu, "Pansharpening with a gradient domain GIF based on NSST," Electronics, vol. 8, no. 2, p. 229, 2019.
[11] H. Hallabia and A. B. Hamida, "A pan-sharpening method based latent low-rank decomposition model," IEEE, pp. 1–4.
[12] R. Restaino, G. Vivone, M. Dalla Mura, and J. Chanussot, "Fusion of multispectral and panchromatic images based on morphological operators," IEEE Transactions on Image Processing, vol. 25, no. 6, pp. 2882–2895, 2016.
[13] X. Meng, H. Shen, H. Li, L. Zhang, and R. Fu, "Review of the pansharpening methods for remote sensing images based on the idea of meta-analysis: Practical discussion and challenges," Information Fusion, vol. 46, pp. 102–113, 2019.
[14] U. Kumar, "Pan-sharpening using spatial-frequency method," in Satellite Information Classification and Interpretation. IntechOpen, 2019.
[15] H. Li and L. Jing, "Image fusion framework considering mixed pixels and its application to pansharpening methods based on multiresolution analysis," Journal of Applied Remote Sensing, vol. 14, no. 3, p. 038501, 2020.
[16] Y. Chen, T. Wang, F. Fang, and G. Zhang, "A pan-sharpening method based on the ADMM algorithm," Frontiers of Earth Science, vol. 13, no. 3, pp. 656–667, 2019.
[17] M. Khateri, H. Ghassemian, and F. Mirzapour, "A model-based method for pan-sharpening of multi-spectral images using sparse representation," IEEE, 2019, pp. 219–224.
[18] W. Wang, H. Liu, L. Liang, Q. Liu, and G. Xie, "A regularised model-based pan-sharpening method for remote sensing images with local dissimilarities," International Journal of Remote Sensing, vol. 40, no. 8, pp. 3029–3054, 2019.
[19] J. Yang, W. Yin, Y. Zhang, and Y. Wang, "A fast algorithm for edge-preserving variational multichannel image restoration," SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 569–592, 2009.
[20] M. Khateri, F. Shabanzade, and F. Mirzapour, "Regularised IHS-based pan-sharpening approach using spectral consistency constraint and total variation," IET Image Processing, vol. 14, no. 1, pp. 94–104, 2019.
[21] X. Tian, Y. Chen, C. Yang, X. Gao, and J. Ma, "A variational pansharpening method based on gradient sparse representation," IEEE Signal Processing Letters, vol. 27, pp. 1180–1184, 2020.
[22] T. Wang, F. Fang, F. Li, and G. Zhang, "High-quality Bayesian pansharpening," IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 227–239, 2018.
[23] Y. Yang, L. Wu, S. Huang, Y. Tang, and W. Wan, "Pansharpening for multiband images with adaptive spectral–intensity modulation," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 9, pp. 3196–3208, 2018.
[24] X. Fu, Z. Lin, Y. Huang, and X. Ding, "A variational pan-sharpening with local gradient constraints," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10265–10274.
[25] P. Guo, P. Zhuang, and Y. Guo, "Bayesian pan-sharpening with multiorder gradient-based deep network constraints," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 950–962, 2020.
[26] J. Jiang, H. Sun, X. Liu, and J. Ma, "Learning spatial-spectral prior for super-resolution of hyperspectral imagery," arXiv preprint arXiv:2005.08752, 2020.
[27] A. Azarang, H. E. Manoochehri, and N. Kehtarnavaz, "Convolutional autoencoder-based multispectral image fusion," IEEE Access, vol. 7, pp. 35673–35683, 2019.
[28] Z. Li and C. Cheng, "A CNN-based pan-sharpening method for integrating panchromatic and multispectral images using Landsat 8," Remote Sensing, vol. 11, no. 22, p. 2606, 2019.
[29] G. Masi, D. Cozzolino, L. Verdoliva, and G. Scarpa, "Pansharpening by convolutional neural networks," Remote Sensing, vol. 8, no. 7, p. 594, 2016.
[30] G. Scarpa, S. Vitale, and D. Cozzolino, "Target-adaptive CNN-based pansharpening," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 9, pp. 5443–5457, 2018.
[31] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[32] Y. Wei, Q. Yuan, H. Shen, and L. Zhang, "Boosting the accuracy of multispectral image pansharpening by learning a deep residual network," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 10, pp. 1795–1799, 2017.
[33] J. Yang, X. Fu, Y. Hu, Y. Huang, X. Ding, and J. Paisley, "PanNet: A deep network architecture for pan-sharpening," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5449–5457.
[34] X. Liu, Y. Wang, and Q. Liu, "PSGAN: A generative adversarial network for remote sensing image pan-sharpening," IEEE, 2018, pp. 873–877.
[35] J. Ma, W. Yu, C. Chen, P. Liang, X. Guo, and J. Jiang, "Pan-GAN: An unsupervised learning method for pan-sharpening in remote sensing image fusion using a generative adversarial network," Information Fusion, 2020.
[36] S. Luo, S. Zhou, Y. Feng, and J. Xie, "Pansharpening via unsupervised convolutional neural networks," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 4295–4310, 2020.
[37] "A differential information residual convolutional neural network for pansharpening."
[38] Y. Qu, R. K. Baghbaderani, H. Qi, and C. Kwan, "Unsupervised pansharpening based on self-attention mechanism," IEEE Transactions on Geoscience and Remote Sensing, 2020.
[39] F. Ozcelik, U. Alganci, E. Sertel, and G. Unal, "Rethinking CNN-based pansharpening: Guided colorization of panchromatic images via GANs," IEEE Transactions on Geoscience and Remote Sensing, 2020.
[40] P. Burt and E. Adelson, "The Laplacian pyramid as a compact image code," IEEE Transactions on Communications, vol. 31, no. 4, pp. 532–540, 1983.
[41] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[42] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[43] L. Wald, T. Ranchin, and M. Mangolini, "Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images," Photogrammetric Engineering and Remote Sensing, vol. 63, pp. 691–699, 1997.
[44] R. H. Yuhas, A. F. Goetz, and J. W. Boardman, "Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm," 1992.
[45] L. Wald, Data Fusion: Definitions and Architectures: Fusion of Images of Different Spatial Resolutions. Presses des MINES, 2002.
[46] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, 2002.
[47] J. Zhou, D. Civco, and J. Silander, "A wavelet transform method to merge Landsat TM and SPOT panchromatic data," International Journal of Remote Sensing, vol. 19, no. 4, pp. 743–757, 1998.
[48] L. Alparone, B. Aiazzi, S. Baronti, A. Garzelli, F. Nencini, and M. Selva, "Multispectral and panchromatic data fusion assessment without reference," Photogrammetric Engineering & Remote Sensing, vol. 74, no. 2, pp. 193–200, 2008.