Deep Learning-Based Detail Map Estimation for MultiSpectral Image Fusion in Remote Sensing
Arian Azarang, Nasser Kehtarnavaz
Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, USA
Corresponding Author: Arian Azarang, E-mail: [email protected]
Arian Azarang received the BS degree and the first rank award from Shiraz University, Iran, in 2015, and the MS degree in electrical engineering from Tarbiat Modares University, Iran, in 2017. He is currently pursuing the PhD degree in electrical engineering at the University of Texas at Dallas. His research interests include signal and image processing, deep learning, remote sensing, and chaos theory. He has authored or co-authored 15 journal and conference papers in these areas. (E-mail: [email protected])
Nasser Kehtarnavaz is an Erik Jonsson Distinguished Professor in the Department of Electrical and Computer Engineering and the Director of the Embedded Machine Learning Laboratory at the University of Texas at Dallas. His research interests include signal and image processing, machine learning, and real-time implementation on embedded processors. He has authored or co-authored ten books and more than 400 journal papers, conference papers, patents, manuals, and editorials in these areas. He is a Fellow of IEEE, a Fellow of SPIE, and a Licensed Professional Engineer. He is currently serving as Editor-in-Chief of the Journal of Real-Time Image Processing. (E-mail: [email protected])
This paper presents a deep learning-based estimation of the intensity component of MultiSpectral bands by considering the joint multiplication of neighboring spectral bands. This estimation is conducted as part of the component substitution approach for the fusion of PANchromatic (PAN) and MultiSpectral (MS) images in remote sensing. After computing the band-dependent intensity components, a deep neural network is trained to learn the nonlinear relationship between a PAN image and the nonlinear intensity components. Low Resolution MultiSpectral bands are then fed into the trained network to obtain an estimate of the High Resolution MultiSpectral bands. Experiments conducted on three datasets show that the developed deep learning-based estimation approach provides improved performance compared to the existing methods based on three objective metrics.

Keywords: deep learning-based detail map estimation, multispectral image fusion, pansharpening in remote sensing.
1. Introduction
In remote sensing, different types of images are captured via different sensors. Earth observation satellites normally collect spatial and spectral attributes of the earth surface in the form of so-called PANchromatic (PAN) and MultiSpectral (MS) images. The fusion, or combining, of these two types of images, known as pansharpening, has been extensively studied in the literature (e.g., Lolli et al., 2017). This paper presents an improvement of our previous work on the estimation of detail maps in (Azarang and Kehtarnavaz, 2020). The intensity component of the MS image is estimated by taking into consideration the joint multiplication of the neighboring bands of a spectral band as part of the fusion approach known as component substitution. The nonlinear relationship between the estimated nonlinear intensity components and the PAN image is modeled by using a deep neural network. After training the network, the LRMS image is fed into it to provide an estimation of the HRMS image. It is shown that this approach provides a more accurate estimation of the detail map for each spectral band, generating improved fusion outcomes. The rest of the paper is organized as follows: Section 2 provides a description of the developed deep learning-based method. In Section 3, the datasets and the evaluation metrics used are stated, followed by the experimental results in Section 4. Finally, the paper is concluded in Section 5.
2. Developed Deep Learning-Based Pansharpening Method
The general framework of component substitution fusion can be expressed as follows:

$\hat{M}_k = \tilde{M}_k + g_k (P - I)$  (1)

where $\hat{M}_k$ and $\tilde{M}_k$ denote the $k$-th band of the High Resolution MS (HRMS) and Low Resolution MS (LRMS) images, respectively, $g_k$ the injection gain corresponding to the $k$-th spectral band, $P$ the PAN image, and $I$ the intensity image, which is obtained via:

$I = \sum_{i=1}^{L} w_i \tilde{M}_i$  (2)

where $L$ indicates the number of spectral bands covering the spectral signature of the PAN image, and the $w_i$'s the band weights. The detail map of the $k$-th band, $D_k = P - I_k$, with a band-dependent intensity $I_k$, can be estimated more accurately for the four spectral bands shown in Fig. 1 as follows:

$D_1 = P - w_{1,1}\tilde{M}_1 - w_{1,2}\tilde{M}_2 - w_{1,3}\tilde{M}_3 - w_{1,4}\tilde{M}_4 - w_{1,5}(\tilde{M}_1 \otimes \tilde{M}_2)$
$D_2 = P - w_{2,1}\tilde{M}_1 - w_{2,2}\tilde{M}_2 - w_{2,3}\tilde{M}_3 - w_{2,4}\tilde{M}_4 - w_{2,5}(\tilde{M}_1 \otimes \tilde{M}_2) - w_{2,6}(\tilde{M}_2 \otimes \tilde{M}_3)$
$D_3 = P - w_{3,1}\tilde{M}_1 - w_{3,2}\tilde{M}_2 - w_{3,3}\tilde{M}_3 - w_{3,4}\tilde{M}_4 - w_{3,5}(\tilde{M}_2 \otimes \tilde{M}_3) - w_{3,6}(\tilde{M}_3 \otimes \tilde{M}_4)$
$D_4 = P - w_{4,1}\tilde{M}_1 - w_{4,2}\tilde{M}_2 - w_{4,3}\tilde{M}_3 - w_{4,4}\tilde{M}_4 - w_{4,5}(\tilde{M}_3 \otimes \tilde{M}_4)$  (3)

where $\otimes$ denotes element-wise multiplication, and the $w_{k,i}$'s are the weights of the LRMS bands and of the joint multiplication terms. In this equation, the joint multiplications of neighboring bands are considered in the computation of the detail maps. The weights of the LRMS bands as well as the weights of the joint multiplication terms are obtained via the minimization of the Mean Squared Error (MSE) between the intensity component of a spectral band and the Modulation Transfer Function (MTF)-filtered PAN image. The detail map for each band is then obtained via Eq. (3). A flowchart of the fusion process is illustrated in Fig. 2. The nonlinear intensity component of each spectral band and the corresponding histogram-matched PAN image are partitioned into overlapping patches to serve as the input and the target of a Denoising AutoEncoder (DAE) network, as shown in Fig. 3. This network learns the nonlinear relationship between the intensity components of the spectral bands and the corresponding histogram-matched PAN image. From a contextual point of view, the network attempts to inject the spatial information while preserving the spectral information.
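As a rough sketch of how the weights of Eqs. (2) and (3) can be obtained, the joint multiplication terms may be treated as extra regressors in a least-squares fit against the MTF-filtered PAN image. The NumPy code below is a minimal illustration under assumed function and variable names; the MTF filtering itself and the DAE training are not shown.

```python
import numpy as np

def fit_band_weights(ms_bands, pan_target, pairs):
    """Least-squares fit of the Eq. (3) weights: the LRMS bands plus the
    element-wise products of neighboring band pairs are the regressors,
    and the (MTF-filtered) PAN image is the regression target."""
    cols = [b.ravel() for b in ms_bands]
    cols += [(ms_bands[i] * ms_bands[j]).ravel() for i, j in pairs]
    A = np.stack(cols, axis=1)
    w, *_ = np.linalg.lstsq(A, pan_target.ravel(), rcond=None)
    return w

def nonlinear_intensity(ms_bands, pairs, w):
    """Band-dependent nonlinear intensity I_k of Eq. (5)."""
    terms = list(ms_bands) + [ms_bands[i] * ms_bands[j] for i, j in pairs]
    return sum(wi * t for wi, t in zip(w, terms))

def detail_map(pan, intensity):
    """Detail map D_k = P - I_k of Eq. (3)."""
    return pan - intensity
```

For a 4-band image, band 1 would use `pairs = [(0, 1)]`, band 2 `pairs = [(0, 1), (1, 2)]`, and so on, following the neighboring-band structure of Eq. (3).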
After the training phase, or during the fusion operation, the LRMS bands are partitioned into overlapping patches and fed into the trained network to obtain an estimate of the HRMS image patches. Then, the patches are tiled back to obtain a fused image as follows:

$\hat{M}_k = \tilde{M}_k + g_k (P_k - I_k)$  (4)

where $\hat{M}_k$ indicates the $k$-th tiled version of the estimated HRMS band, $P_k$ the histogram-matched PAN image of the $k$-th band, and the nonlinear intensity components $I_k$'s are given by:

$I_1 = w_{1,1}\tilde{M}_1 + w_{1,2}\tilde{M}_2 + w_{1,3}\tilde{M}_3 + w_{1,4}\tilde{M}_4 + w_{1,5}(\tilde{M}_1 \otimes \tilde{M}_2)$
$I_2 = w_{2,1}\tilde{M}_1 + w_{2,2}\tilde{M}_2 + w_{2,3}\tilde{M}_3 + w_{2,4}\tilde{M}_4 + w_{2,5}(\tilde{M}_1 \otimes \tilde{M}_2) + w_{2,6}(\tilde{M}_2 \otimes \tilde{M}_3)$
$I_3 = w_{3,1}\tilde{M}_1 + w_{3,2}\tilde{M}_2 + w_{3,3}\tilde{M}_3 + w_{3,4}\tilde{M}_4 + w_{3,5}(\tilde{M}_2 \otimes \tilde{M}_3) + w_{3,6}(\tilde{M}_3 \otimes \tilde{M}_4)$
$I_4 = w_{4,1}\tilde{M}_1 + w_{4,2}\tilde{M}_2 + w_{4,3}\tilde{M}_3 + w_{4,4}\tilde{M}_4 + w_{4,5}(\tilde{M}_3 \otimes \tilde{M}_4)$  (5)

Note that the detail injection gains of the intensity components are computed as described in (Vivone et al., 2014) via the following equation:

$g_k = \frac{\mathrm{cov}(\tilde{M}_k, I_k)}{\mathrm{var}(I_k)}$  (6)

where $\mathrm{cov}(X, Y)$ denotes the covariance between two images $X$ and $Y$, and $\mathrm{var}(X)$ the variance of $X$.

Fig. 2 Flowchart of the first part of the fusion process.
Fig. 3 Illustration of the Denoising AutoEncoder (DAE) network.

3. Datasets and Evaluation Metrics
Three datasets associated with the GeoEye-1 (Washington, USA), QuickBird (Sundarbans, Bangladesh), and Pleiades-1A (Melbourne, Australia) sensors were examined to evaluate the developed deep learning-based method. The PAN and LRMS images of all three datasets are of size 1024×1024 and 256×256 pixels, respectively. The developed method (labeled IDM-DAE) is compared with popular methods in the literature, namely AIHS (Rahmani et al., 2010), GSA (Aiazzi et al., 2007), GS2-GLP (Kallel, 2014), DNN (Huang et al., 2015), MTF-GLP-HPM (Vivone et al., 2013), BDSD (Garzelli et al., 2007), FDIF (Azarang and Ghassemian, 2018), GLP-HRI (Vivone, 2019), and IDM-Base (Azarang and Kehtarnavaz, 2020). To have an objective evaluation of the fusion outcomes, both full reference and no reference quality metrics are computed. For the full reference case, the Spectral Angle Mapper (SAM) (Vivone et al., 2013) and Q4 (Vivone et al., 2013) metrics are computed. For the no reference case, the $D_s$ (Palsson et al., 2013), $D_\lambda$ (Vitale, 2019), and Quality of No Reference (QNR) (Vivone et al., 2019) metrics are computed. The average values of these metrics are listed in Table I. $D_s$ and $D_\lambda$ reflect the spatial and spectral distortion, respectively, with lower values indicating better performance. QNR combines $D_s$ and $D_\lambda$ in the form of an overall distortion. A description of the above objective metrics is provided next. One way to measure the spectral deviation of the fused product from the original MS image is to utilize the SAM metric, which is expressed as

$\mathrm{SAM}(A_i, B_i) = \arccos\left(\frac{\langle A_i, B_i \rangle}{\lVert A_i \rVert \, \lVert B_i \rVert}\right)$  (7)

where $A_i$ and $B_i$ denote the spectral vectors at pixel $i$ of the fused and reference images, respectively, $\langle \cdot, \cdot \rangle$ the inner product, and $\lVert \cdot \rVert$ the vector $\ell_2$-norm. By averaging the SAM values over all the pixels in an image, an overall SAM value is acquired. The ideal value of the overall SAM is 0.
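As a minimal illustration, the SAM metric of Eq. (7) can be computed per pixel and then averaged; the NumPy sketch below assumes (H, W, L) band stacks and the function names are ours.

```python
import numpy as np

def sam_map(fused, reference, eps=1e-12):
    """Per-pixel Spectral Angle Mapper of Eq. (7), in radians.
    fused, reference: (H, W, L) stacks of L spectral bands."""
    dot = np.sum(fused * reference, axis=-1)
    norms = np.linalg.norm(fused, axis=-1) * np.linalg.norm(reference, axis=-1)
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.arccos(cos)

def overall_sam(fused, reference):
    """Average of the per-pixel angles; 0 is the ideal value."""
    return float(np.mean(sam_map(fused, reference)))
```

Since the angle is invariant to per-pixel scaling of the spectral vector, a purely radiometric gain leaves the SAM value unchanged, which is why it is read as a spectral (not spatial) distortion measure.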
In addition to SAM, the Q4 metric is also computed, which is an extension of the Universal Image Quality Index (UIQI) (Wang and Bovik, 2002). This index is computed as follows:

$\mathrm{UIQI}(A, B) = \frac{\sigma_{AB}}{\sigma_A \sigma_B} \cdot \frac{2 \mu_A \mu_B}{\mu_A^2 + \mu_B^2} \cdot \frac{2 \sigma_A \sigma_B}{\sigma_A^2 + \sigma_B^2}$  (8)

where $\sigma_{AB}$ is the covariance of $A$ and $B$, $\sigma_A$ the standard deviation of $A$, and $\mu_A$ the mean of $A$. Expressing Eq. (8) in vector format for up to four bands gives Q4. For the no reference quality case, the spectral and spatial distortions are quantified using $D_\lambda$ and $D_s$, respectively, which are computed as follows:

$D_\lambda = \sqrt[p]{\frac{1}{L(L-1)} \sum_{i=1}^{L} \sum_{j=1, j \neq i}^{L} \left| \mathrm{UIQI}(\hat{M}_i, \hat{M}_j) - \mathrm{UIQI}(\tilde{M}_i, \tilde{M}_j) \right|^p}$  (9)

$D_s = \sqrt[q]{\frac{1}{L} \sum_{i=1}^{L} \left| \mathrm{UIQI}(\hat{M}_i, P) - \mathrm{UIQI}(\tilde{M}_i, P_{\mathrm{LP}}) \right|^q}$  (10)

where $P_{\mathrm{LP}}$ denotes the low resolution PAN image at the scale of the MS image. It is worth mentioning that the optimal value for both $D_\lambda$ and $D_s$ is 0. The QNR metric combines these two metrics to provide a global average distortion as follows:

$\mathrm{QNR} = (1 - D_\lambda)(1 - D_s)$  (11)

with its range being 0 to 1.
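The no reference indices of Eqs. (8)-(11) can be sketched as below. For simplicity, this version computes UIQI globally over each image pair, takes p = q = 1, and assumes all inputs are resampled to a common grid; these are our simplifying assumptions, as the indices are normally computed blockwise.

```python
import numpy as np

def uiqi(a, b):
    """Universal Image Quality Index of Eq. (8), computed globally.
    Uses the equivalent single-fraction form of the three-factor product."""
    a, b = a.ravel(), b.ravel()
    cov = np.cov(a, b)[0, 1]
    va, vb = np.var(a, ddof=1), np.var(b, ddof=1)
    ma, mb = a.mean(), b.mean()
    return (4 * cov * ma * mb) / ((va + vb) * (ma ** 2 + mb ** 2))

def qnr_indices(fused, lrms, pan, pan_lp):
    """D_lambda (Eq. 9), D_s (Eq. 10), and QNR (Eq. 11).
    fused, lrms: (H, W, L) stacks; pan, pan_lp: (H, W) images."""
    L = fused.shape[-1]
    # Spectral distortion: inter-band UIQI should be preserved by fusion.
    d_lambda = sum(
        abs(uiqi(fused[..., i], fused[..., j]) - uiqi(lrms[..., i], lrms[..., j]))
        for i in range(L) for j in range(L) if i != j) / (L * (L - 1))
    # Spatial distortion: band-to-PAN UIQI compared across scales.
    d_s = sum(
        abs(uiqi(fused[..., i], pan) - uiqi(lrms[..., i], pan_lp))
        for i in range(L)) / L
    return d_lambda, d_s, (1 - d_lambda) * (1 - d_s)
```

With a perfect fusion (no change in inter-band or band-to-PAN relationships), both distortions are 0 and QNR reaches its ideal value of 1.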
4. Experimental Results
The fusion products at both reduced and full resolution are examined in this section. As seen from Table I, from an objective point of view, the improvements in the metrics are significant in comparison to the existing methods. This table contains both the reduced and full resolution results. To see the differences at reduced resolution, sample fusion products for each sensor are shown in Figs. 4 through 6. To see a comparison among the methods, the outcome corresponding to Pleiades-1A is shown in Fig. 6. The spatial superiority of the developed method is evident in Fig. 6(l). There are some regions in Fig. 6(d) and Fig. 6(h) that suffer from spectral distortion. The green area on the right side of the fused outcome is turned to light green in Fig. 6(c), Fig. 6(e), and Fig. 6(g). In Fig. 6(i) and Fig. 6(j), the level of detail injection is not sufficient. Our previous method generated similar results to those of the method developed in this paper except for some regions such as the overpass at the bottom of the images. In terms of visual assessment at full resolution, sample fusion outcomes for each dataset are displayed in Figs. 7 through 9. In these figures, the arrows point to visual differences. For instance, in Fig. 7, a part of the image is magnified for comparison purposes. It can be seen that the AIHS method suffers from color distortion, especially in the green area. GS2-GLP has a blurry outcome at the harbor edges. The green area in the FDIF image turned into dark green in comparison to the green color of the LRMS image. The methods BDSD, GSA, and DNN oversharpened the outcome with a slight color distortion. The methods MTF-GLP-HPM, IDM-Base, and IDM-DAE produced similar outcomes in terms of visual perception, but the developed method preserved the spectral information better than the MTF-GLP-HPM and baseline methods. Another example is shown in Fig. 9.
In this figure, one can see that the methods AIHS and GSA suffered from spectral distortion in a few regions. The building edges appeared blurred when using the methods GS2-GLP, DNN, and FDIF. The color information in the MTF-GLP-HPM and BDSD outcomes was lost in some regions. The method GLP-HRI suffered from oversharpening. The colors associated with the developed deep learning-based method (IDM-DAE) appeared better preserved across the different datasets in comparison to its baseline version (IDM-Base).

Fig. 4 Fusion outcomes of different methods for the GeoEye-1 sensor at reduced scale: (a) LRMS, (b) PAN, (c) BDSD, (d) AIHS, (e) GSA, (f) GS2-GLP, (g) DNN, (h) MTF-GLP-HPM, (i) FDIF, (j) GLP-HRI, (k) IDM-Base, and (l) IDM-DAE.
Fig. 5 Fusion outcomes of different methods for the QuickBird sensor at reduced scale: (a) LRMS, (b) PAN, (c) BDSD, (d) AIHS, (e) GSA, (f) GS2-GLP, (g) DNN, (h) MTF-GLP-HPM, (i) FDIF, (j) GLP-HRI, (k) IDM-Base, and (l) IDM-DAE.
Fig. 6 Fusion outcomes of different methods for the Pleiades-1A sensor at reduced scale: (a) LRMS, (b) PAN, (c) BDSD, (d) AIHS, (e) GSA, (f) GS2-GLP, (g) DNN, (h) MTF-GLP-HPM, (i) FDIF, (j) GLP-HRI, (k) IDM-Base, and (l) IDM-DAE.
Fig. 7 Fusion outcomes of different methods for the GeoEye-1 sensor at full scale: (a) LRMS, (b) PAN, (c) BDSD, (d) AIHS, (e) GSA, (f) GS2-GLP, (g) DNN, (h) MTF-GLP-HPM, (i) FDIF, (j) GLP-HRI, (k) IDM-Base, and (l) IDM-DAE.
Fig. 8 Fusion outcomes of different methods for the QuickBird sensor at full scale: (a) LRMS, (b) PAN, (c) BDSD, (d) AIHS, (e) GSA, (f) GS2-GLP, (g) DNN, (h) MTF-GLP-HPM, (i) FDIF, (j) GLP-HRI, (k) IDM-Base, and (l) IDM-DAE.
Fig. 9 Fusion outcomes of different methods for the Pleiades-1A sensor at full scale: (a) LRMS, (b) PAN, (c) BDSD, (d) AIHS, (e) GSA, (f) GS2-GLP, (g) DNN, (h) MTF-GLP-HPM, (i) FDIF, (j) GLP-HRI, (k) IDM-Base, and (l) IDM-DAE.

Table I. Average Values of Objective Evaluation Metrics for the Three Datasets
[Table I layout: rows are the methods AIHS, GSA, GS2-GLP, DNN, BDSD, FDIF, GLP-HRI, IDM-Base, and IDM-DAE; columns are SAM and Q4 (reduced resolution analysis) and D_s, D_λ, and QNR (full resolution analysis), each reported for the GeoEye-1, QuickBird, and Pleiades-1A datasets. The numerical entries were not recoverable from the source.]

5. Conclusion
A new band-dependent detail injection pansharpening method has been developed in this paper by using a deep neural network. In this method, the computation of the intensity component of each MS band is carried out by considering the joint multiplications of neighboring bands in order to estimate the PAN image more accurately. The results obtained indicate that the developed deep learning-based method outperforms the existing methods in terms of three objective metrics.
References
Aiazzi, B., Baronti, S. and Selva, M., 2007. Improving component substitution pansharpening through multivariate regression of MS+Pan data. IEEE Transactions on Geoscience and Remote Sensing, (10), pp. 3230-3239.
Azarang, A. and Ghassemian, H., 2017, April. A new pansharpening method using multi resolution analysis framework and deep neural networks. pp. 1-6.
Azarang, A. and Ghassemian, H., 2018. Application of fractional-order differentiation in multispectral image fusion. Remote Sensing Letters, (1), pp. 91-100.
Azarang, A. and Kehtarnavaz, N., 2020. Multispectral image fusion based on map estimation with improved detail. Remote Sensing Letters, in press.
Azarang, A., Manoochehri, H.E. and Kehtarnavaz, N., 2019. Convolutional autoencoder-based multispectral image fusion. IEEE Access, pp. 35673-35683.
Eghbalian, S. and Ghassemian, H., 2018. Multi spectral image fusion by deep convolutional neural network and new spectral loss function. International Journal of Remote Sensing, (12), pp. 3983-4002.
Garzelli, A., Nencini, F. and Capobianco, L., 2007. Optimal MMSE pan sharpening of very high resolution multispectral images. IEEE Transactions on Geoscience and Remote Sensing, (1), pp. 228-236.
Huang, W., Xiao, L., Wei, Z., Liu, H. and Tang, S., 2015. A new pan-sharpening method with deep neural networks. IEEE Geoscience and Remote Sensing Letters, (5), pp. 1037-1041.
Kallel, A., 2014. MTF-adjusted pansharpening approach based on coupled multiresolution decompositions. IEEE Transactions on Geoscience and Remote Sensing, (6), pp. 3124-3145.
Liu, Y., Chen, X., Wang, Z., Wang, Z.J., Ward, R.K. and Wang, X., 2018. Deep learning for pixel-level image fusion: Recent advances and future prospects. Information Fusion, pp. 158-173.
Lolli, S., Alparone, L., Garzelli, A. and Vivone, G., 2017. Haze correction for contrast-based multispectral pansharpening. IEEE Geoscience and Remote Sensing Letters, (12), pp. 2255-2259.
Palsson, F., Sveinsson, J.R. and Ulfarsson, M.O., 2013. A new pansharpening algorithm based on total variation. IEEE Geoscience and Remote Sensing Letters, (1), pp. 318-322.
Rahmani, S., Strait, M., Merkurjev, D., Moeller, M. and Wittman, T., 2010. An adaptive IHS pan-sharpening method. IEEE Geoscience and Remote Sensing Letters, (4), pp. 746-750.
Scarpa, G., Vitale, S. and Cozzolino, D., 2018. Target-adaptive CNN-based pansharpening. IEEE Transactions on Geoscience and Remote Sensing, (9), pp. 5443-5457.
Vitale, S., 2019, July. A CNN-based pansharpening method with perceptual loss. In IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 3105-3108.
Vivone, G., 2019. Robust band-dependent spatial-detail approaches for panchromatic sharpening. IEEE Transactions on Geoscience and Remote Sensing.
Vivone, G., Addesso, P. and Chanussot, J., 2018. A combiner-based full resolution quality assessment index for pansharpening. IEEE Geoscience and Remote Sensing Letters, (3), pp. 437-441.
Vivone, G., Alparone, L., Chanussot, J., Dalla Mura, M., Garzelli, A., Licciardi, G.A., Restaino, R. and Wald, L., 2014. A critical comparison among pansharpening algorithms. IEEE Transactions on Geoscience and Remote Sensing, (5), pp. 2565-2586.
Vivone, G., Restaino, R., Dalla Mura, M., Licciardi, G. and Chanussot, J., 2013. Contrast and error-based fusion schemes for multispectral image pansharpening. IEEE Geoscience and Remote Sensing Letters, (5), pp. 930-934.
Wang, Z. and Bovik, A.C., 2002. A universal image quality index. IEEE Signal Processing Letters, (3), pp. 81-84.
Wei, Y., Yuan, Q., Shen, H. and Zhang, L., 2017. Boosting the accuracy of multispectral image pansharpening by learning a deep residual network. IEEE Geoscience and Remote Sensing Letters, 14.