Underwater Color Restoration Using U-Net Denoising Autoencoder
Yousif Hashisho, Mohamad Albadawi, Tom Krause, Uwe Freiherr von Lukas
Yousif Hashisho
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD)
Rostock, Germany
[email protected]

Tom Krause
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD)
Rostock, Germany
[email protected]

Mohamad Albadawi
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD)
Rostock, Germany
[email protected]

Uwe Freiherr von Lukas
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD); Department of Computer Science, University of Rostock
Rostock, Germany
[email protected]
Abstract—Visual inspection of underwater structures by vehicles, e.g. remotely operated vehicles (ROVs), plays an important role in scientific, military, and commercial sectors. However, the automatic extraction of information using software tools is hindered by the characteristics of water, which degrade the quality of captured videos. As a contribution to restoring the color of underwater images, the Underwater Denoising Autoencoder (UDAE) model is developed using a denoising autoencoder with a U-Net architecture. The proposed network takes into consideration both accuracy and computation cost to enable real-time use in underwater visual tasks with an end-to-end autoencoder network. The perception of underwater vehicles is improved by reconstructing captured frames, hence obtaining better performance in underwater tasks. Related learning methods use generative adversarial networks (GANs) to generate color-corrected underwater images, and to our knowledge this paper is the first to show that a single autoencoder is capable of producing the same or better results. Moreover, image pairs are constructed for training the proposed network, since it is hard to obtain such a dataset from underwater scenery. Finally, the proposed model is compared to a state-of-the-art method.
Index Terms—autoencoders, underwater image, image restoration, generative adversarial networks, real-time
I. INTRODUCTION
Marine robots, such as remotely operated vehicles (ROVs), are being increasingly used in the scientific, military, and commercial sectors. They are critical for collecting data and performing certain underwater operations. Due to safety and health concerns, human intervention can be risky and limited when executing underwater missions [1]. Thus, underwater vehicles are supplied with camera systems for performing numerous vision tasks. For instance, Choi et al., 2017 [2] operated an ROV manually for inspecting harbour structures and acquiring high quality videos. Manjunatha et al., 2018 [3] built a robot equipped with a high definition camera for visual inspection at a specified depth in a water body. However, the automatic extraction of information using software tools is hindered by underwater image degradation caused by the water medium and light behaviour.

Contrast loss and color distortion affect the algorithms and ultimately the vehicle's performance in gathering and processing data. An image enhancement technique is needed for vehicle navigation by a human operator to facilitate underwater tasks. Furthermore, the processing speed should be taken into consideration for a real-time implementation.

This paper proposes the Underwater Denoising Autoencoder (UDAE), a deep learning network based on a single denoising autoencoder [4] using U-Net [5] as a CNN architecture, for improving the quality of underwater imagery and video material. The contributions presented in this paper can be summarized as follows:
• A UDAE network is proposed which is specialized in underwater color restoration.
• A faster processing speed is achieved than the state-of-the-art method, which improves the real-time capability.
• A new dataset with a combination of different underwater scenarios (turbidity, depth, temperature, attenuation type, etc.) is synthesized for training the proposed network. The synthetic dataset is generated using a generative deep learning method.
• The fully end-to-end proposed model generalizes well to real underwater images with different degradation types.

The rest of the paper is organized as follows: §II discusses relevant work; §III describes the experiments and methods followed to restore underwater images; §IV presents corrected underwater images and the performance of the proposed network; finally, §V summarizes the paper.

II. RELATED WORK
Numerous attempts have been made with different image improvement methods for restoring the color of raw underwater images. These methods fall into two categories [6]: hardware-based methods [7], [8] and software-based methods [9]–[11]. Software-based methods invert the formation of underwater images and construct physical models for image enhancement, in addition to modifying the image pixel values. Hardware-based methods capture multiple images with the help of polarization filters, stereo setups or specialized hardware devices and use the obtained additional information [6], [12]. Both categories show good performance; however, they are limited to certain scenarios and do not match various underwater lighting conditions. They are also expensive to implement, since some of them use specialized sensors and multiple images for the enhancement. Recent approaches have focused on Generative Adversarial Networks (GANs) as a new way of achieving better results.

When improving underwater imagery using deep learning models (e.g. GANs), image pairs consisting of clean and distorted underwater images are needed for training the model. It is hard to capture clean underwater images without the attenuation of light and other underwater effects. Thus, several works have been done to synthesize training images.

Li et al., 2018 [13] used two types of networks: a Water Generative Adversarial Network (WaterGAN) for generating realistic underwater images and an Underwater Image Restoration Network for correcting the color. The generator of WaterGAN models the formation of an underwater image using three stages: Attenuation, Scattering, and Camera Model. After that, the learned generator is used to generate training image samples for the color restoration network. First, a relative depth map is estimated and reconstructed from the input image, and both are then used for the color restoration. They showed efficiency for real-time applications; however, their network is limited to certain degradation type appearances due to the way of generating images. Figure 1 shows the images that were used for training the network, which do not reflect underwater structures. The clean images consist of in-air images, whereas the corruption process is limited to certain degradation types (e.g. a greenish mask).

Fig. 1. The synthesized underwater images by WaterGAN. The color restoration might be limited to certain types of color degradation appearance. [13]
As an improvement over the aforementioned data generation method, Li et al., 2018 [14] and Fabbri et al., 2018 [15] used CycleGAN [16] for generating underwater images. After synthesizing the data, it was used for training their color restoration models.

The previously mentioned deep learning methods showed good performance in restoring the color. However, in certain scenarios they led to an unrealistic color correction of underwater images, as in Li et al., 2018 [14]. The training dataset lacked the true colors of underwater structures such as coral reefs and fish. Furthermore, a drawback of the color restoration model of Fabbri et al., 2018 [15], the Underwater Generative Adversarial Network (UGAN), is the efficiency of a real-time implementation with high resolution images, as the model's architecture makes it computationally costly.

We follow the same procedure as Fabbri et al., 2018 [15] for generating synthetic images. However, a different set of images is used for the training of CycleGAN. Fabbri et al., 2018 [15] collected clear underwater images and style-transferred the characteristics of degradation from distorted underwater images to them. Our generated dataset is composed of various underwater locations with different degradation types, leading to a better generalization than their network.

III. METHODOLOGY
Two important aspects are discussed in building the UDAE model. The first aspect is the methodology followed to generate the underwater dataset for training the network. The second one is the architecture of the UDAE model and the benefits of using a denoising autoencoder.
A. Dataset
A dataset is gathered and filtered to be used for the generation process of the image pairs. This section is divided into two subsections. The first subsection discusses the data collection of underwater images, while the second discusses generating data for obtaining underwater image pairs.
1) Data Collection:
To train a network capable of restoring the true underwater color from distorted images, clear images were gathered without light scattering in them. These images were taken from different sources on the Internet. As it is hard to get clear images, it was possible to obtain them from:
• Large fish aquariums such as the ones in museums and touristic towers.
• Underwater images that were captured at a close distance to the structures with artificial light exposure.
• Various images and frames taken from videos that were enhanced and processed by commercial software tools.

The clean images were chosen based on the contrast loss and degradation present in underwater images. After that, distorted images were gathered with different attenuation types from various locations. Some of them were captured by Fraunhofer IGD in the Baltic Sea, while the others were gathered from the Internet, corresponding to different locations, depths, temperatures and other degradation factors.
2) Image Pairs Data Generation: Images composed of clear and distorted underwater scenes were collected. After that, the collected images were filtered, based on a subjective quality evaluation, into two categories: A (clear images) and B (distorted images). The two categories are shown in Figure 2. All images were resized to a common resolution using the area interpolation method.

After gathering suitable images, the CycleGAN generative model was used for the style transfer. It uses an adversarial loss for learning a mapping from a source domain X to a target domain Y (G : X → Y) [16]. It was used to transfer the underwater style from the B images to the A ones, and the result was the category A′, Figure 3. The image pairs in A and A′ were then used to train the autoencoder. The training of the CycleGAN model took several days on NVIDIA TITAN X GPU devices; after that, the generated image pairs were filtered by removing failure cases. The failure cases are due to limitations in the style transfer of CycleGAN.

(a) A (clean image samples) (b) B (distorted image samples)
Fig. 2. Samples of images used for the style transfer.

(a) A (clean image samples) (b) A′ (distorted image samples)
Fig. 3. Samples of the UDAE dataset after the generation of image pairs.
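As a rough sketch of this pairing step, the snippet below assumes a CycleGAN generator already trained to map clean images (A) to the distorted underwater style (A′); the file names, directory layout, 256×256 working resolution, and [-1, 1] normalization are assumptions of the sketch, not details taken from the paper.

import glob

import cv2
import numpy as np
import tensorflow as tf

# Hypothetical artifact: a trained CycleGAN generator mapping A -> A'.
g_a2b = tf.keras.models.load_model("cyclegan_generator_A2B.h5", compile=False)

for path in sorted(glob.glob("dataset/A/*.jpg")):
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (256, 256), interpolation=cv2.INTER_AREA)
    x = img.astype(np.float32) / 127.5 - 1.0          # CycleGAN works in [-1, 1]
    fake = g_a2b.predict(x[None, ...], verbose=0)[0]  # underwater-styled copy
    out = ((fake + 1.0) * 127.5).clip(0, 255).astype(np.uint8)
    # Store A' under the same file name so that (A, A') pairs line up.
    cv2.imwrite(path.replace("/A/", "/A_prime/"),
                cv2.cvtColor(out, cv2.COLOR_RGB2BGR))

Failure cases would still have to be filtered out manually afterwards, as described above.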
B. Proposed Network

A denoising autoencoder is used for restoring the color of underwater images. We consider the problem of color restoration as the reconstruction of a corrupted input. Consider that x is the clean image and x̃ is the corrupted version of it produced by the style transfer c(x̃ | x). We then try to reconstruct a repaired input by learning a decoding distribution p_θ(x | z) from an encoded distribution q_φ(z | x̃). Denoising autoencoders are expected to capture implicit invariances in the data and extract the key features from the input images [4], [17]. U-Net is used as the CNN architecture due to its efficiency in computation and training, in addition to its ability to propagate context information to higher resolution layers [5].

For a better illustration of the proposed UDAE network, refer to Figure 4. The same kernel sizes and layers were used as in U-Net [5]. First of all, a distorted RGB underwater image is fed into the encoder of the denoising autoencoder. In the encoder part, subsequent convolutions downsample the image gradually to a latent variable. Each downsampling stage applies 3×3 convolutions followed by 2×2 max-pooling with a stride of 2, and the number of feature maps is doubled in each stage. In the decoder part, upsampling is done from the latent variable back to the original input size sequentially. After each upsampling, the tensor (image) is concatenated with the output of the corresponding symmetric layer in the encoder side, and consecutive convolutions follow. The feature maps are reduced gradually to 3 channels. The concatenation of the output of layers combines the contextual information from the downsampling step [5].

The reconstructed image should bear resemblance to the clean images; therefore, inspired by the work of Zhao et al. [18], the Multi-scale Structural SIMilarity (MS-SSIM) index and absolute value (L1) loss functions were used. The loss function can be expressed as:

L = α · L_MS-SSIM + (1 − α) · L_L1,   (1)

where L represents the loss of the reconstructed image and α was set after conducting several experiments and observing the best reconstruction. The objective of the autoencoder is to minimize the loss function as much as possible. Weight decay is omitted in the proposed network, since the noise present in the input images has a regularization effect similar to weight decay, with faster training dynamics [19]. The Tensorflow framework was used for the training.
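Since the paper states that TensorFlow was used, Eq. 1 can be sketched directly with tf.image.ssim_multiscale. The α value below is an assumption for illustration (the tuned value is not preserved in this copy), and the plain mean over the L1 term simplifies the Gaussian-weighted formulation of Zhao et al. [18].

import tensorflow as tf

ALPHA = 0.8  # assumed weighting; the paper tunes alpha experimentally


def ms_ssim_l1_loss(y_true, y_pred):
    """MS-SSIM + L1 loss of Eq. 1 (a sketch). Inputs are floats in [0, 1]."""
    # tf.image.ssim_multiscale returns a per-image similarity in [0, 1];
    # 1 - similarity turns it into a loss to be minimized.
    l_ms_ssim = 1.0 - tf.reduce_mean(
        tf.image.ssim_multiscale(y_true, y_pred, max_val=1.0))
    l_l1 = tf.reduce_mean(tf.abs(y_true - y_pred))
    return ALPHA * l_ms_ssim + (1.0 - ALPHA) * l_l1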
IV. RESULTS AND DISCUSSION

The training of UDAE took around a day on an NVIDIA Quadro M5000. The network was then tested on a held-out set of images; the average processing time per image was measured on an NVIDIA RTX 2080ti. The selected loss function was capable of preserving details when reconstructing the image. SSIM is sensitive to various types of image degradation [20], whereas L1 preserves colors and luminance [18]. The proposed network produced good results, as shown in Figure 5, with a suitable speed for a real-time implementation. In certain scenarios where the clean image is only partially clear, such as the one in subfigure 5c, the reconstructed image showed a better recovery from the distorted color than the clean image itself. The reason is that the network learned an encoding and decoding distribution capable of reconstructing color-recovered images in general. Additionally, the UDAE network was tested on real data, such as underwater videos extracted from YouTube, to evaluate its generalization ability. Figure 6 shows samples of the reconstructed frames of the following videos: Baltic Sea, Scuba Diving, and Fish Hunting. The color of the input underwater images with different degradation types was restored and the details were preserved.
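As an illustration of the real-time use case, restoring a video stream frame by frame could look like the sketch below; the model file, the video file name, and the 256×256 working resolution are assumptions made for the example.

import cv2
import numpy as np
import tensorflow as tf

# Hypothetical artifacts: a trained UDAE model and a test video.
model = tf.keras.models.load_model("udae.h5", compile=False)
cap = cv2.VideoCapture("baltic_sea.mp4")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR uint8; the network expects RGB floats in [0, 1].
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    rgb = cv2.resize(rgb, (256, 256), interpolation=cv2.INTER_AREA)
    restored = model.predict(rgb[None, ...], verbose=0)[0]
    out = (restored * 255.0).clip(0.0, 255.0).astype(np.uint8)
    cv2.imshow("UDAE", cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()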
[Figure 4: U-Net-style encoder-decoder diagram of UDAE with feature map counts 3, 32, 64, 128, 256; the legend distinguishes 2D convolution, max-pooling, upsampling with 2D convolution, and concatenation.]
Fig. 4. Architecture of the UDAE proposed model.
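For concreteness, a compact Keras sketch of such a U-Net-shaped autoencoder is given below. The channel counts (3 → 32 → 64 → 128 → 256 and back) mirror the numbers legible in the figure; everything else (activations, sigmoid output, optimizer) is an assumption of this sketch rather than a detail confirmed by the paper.

import tensorflow as tf
from tensorflow.keras import layers


def conv_block(x, filters):
    """Two 3x3 convolutions, as in the U-Net building block [5]."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x


def build_udae(input_shape=(256, 256, 3), base_filters=32, depth=3):
    inputs = tf.keras.Input(shape=input_shape)
    x, skips = inputs, []
    # Encoder: each stage doubles the feature maps and halves the resolution.
    for d in range(depth):
        x = conv_block(x, base_filters * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, base_filters * 2 ** depth)  # bottleneck (latent variable)
    # Decoder: upsample, concatenate with the symmetric encoder output,
    # then convolve; the feature maps are halved at each stage.
    for d in reversed(range(depth)):
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(base_filters * 2 ** d, 2, padding="same",
                          activation="relu")(x)
        x = layers.Concatenate()([x, skips[d]])
        x = conv_block(x, base_filters * 2 ** d)
    outputs = layers.Conv2D(3, 1, activation="sigmoid")(x)  # back to 3 channels
    return tf.keras.Model(inputs, outputs)


model = build_udae()
model.compile(optimizer="adam", loss=ms_ssim_l1_loss)  # Eq. 1 sketch above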
[Figure 5 panels (a)–(d)]
Fig. 5. Testing images with their clean and reconstructed versions using UDAE. The images from left to right correspond to the clean image, the input image, and the output image, respectively.

A. Comparison with UGAN
UDAE was compared to the Underwater Generative Adversarial Network (UGAN) [15]. First, both networks were tested on the dataset described in Section III-A, due to the availability of the clean images and for an objective evaluation. Three metrics were used for the evaluation: MSE, SSIM, and MS-SSIM-L1 (Eq. 1); note that MSE and MS-SSIM-L1 give a score of 0 for identical images, while SSIM gives a score of 1. On all three metrics, UDAE showed a better reconstruction error than UGAN, see Table I. For a better comparison of the images, it is best to view them in digital form.

[Figure 6 panels: (a), (c) Case - "Baltic Sea" video; (b), (d) Case - "Fish Hunting" video; (e), (f) Case - "Scuba Diving" video]
Fig. 6. Samples of the reconstructed frames of the color restoration task, extracted from YouTube videos. UDAE performs well on real data even though it was trained on a synthetic dataset.

TABLE I
EVALUATION OF UGAN AND UDAE USING THREE METRICS OVER THE TESTING IMAGES.

Objective Evaluation
Metrics    MSE    SSIM    MS-SSIM-L1
UDAE       –      –       –
UGAN       –      –       –

For a fair comparison, both networks were then evaluated on the testing images that the authors of UGAN published in their paper, Figure 7. The average processing time was calculated over the testing images, resized to a common resolution; the processing was conducted on an NVIDIA RTX 2080ti, where UDAE achieved a shorter average time per image than UGAN. Since clean images were not available, the evaluation was based only on human perception. UDAE showed good generalization, where the color was restored and the details were preserved. UGAN achieved good performance in restoring the colors of some images, such as subfigure 7a; however, UDAE had better color brightness. Another inference drawn from the images concerns the background reconstruction. In many images, UGAN failed to reconstruct the background properly, for example images with a plain background as in subfigure 7b, whereas our proposed network was capable of restoring the color of both the foreground and the background without any artifacts. An example of such artifacts is the halo effect visible in the UGAN reconstructed image. As for the high frequencies, the images of subfigure 7c were zoomed in using the bilinear interpolation method in Figure 8. UDAE outperforms the UGAN network in preserving and reconstructing details. The coral reefs in the reconstructed image of UGAN were blurry and many details were lost. The details are important for object detection and tracking by underwater vehicles. Some failure cases were noticed for our proposed network, such as subfigure 7d. This is left for future work, where a better dataset with more degradation types would be established for a better generalization.

[Figure 7 panels (a)–(d)]
Fig. 7. Comparison to testing images produced by UGAN. The first column represents the original image, the second column the reconstructed image using UGAN, and the third column the reconstructed image using UDAE.

[Figure 8 panels: (a) Input, (b) UGAN, (c) UDAE]
Fig. 8. The input image of subfigure 7c zoomed in using bilinear interpolation. The details in the reconstructed image of UDAE are preserved.
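For reference, the three scores reported in Table I could be computed per image pair along the lines of the following sketch, which reuses the ms_ssim_l1_loss function from the earlier Eq. 1 sketch; this illustrates the metrics, it is not the authors' evaluation code.

import tensorflow as tf

def evaluate_pair(clean, restored):
    """Score one (clean, restored) pair with the three Table I metrics.

    Both inputs are float32 RGB tensors in [0, 1] with shape (H, W, 3).
    """
    mse = tf.reduce_mean(tf.square(clean - restored))
    ssim = tf.reduce_mean(tf.image.ssim(clean, restored, max_val=1.0))
    # ms_ssim_l1_loss is the Eq. 1 sketch from Section III-B.
    ms_ssim_l1 = ms_ssim_l1_loss(clean, restored)
    return float(mse), float(ssim), float(ms_ssim_l1)

For identical inputs this returns (0, 1, 0), matching the score convention noted above.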
V. CONCLUSION

This paper proposed the Underwater Denoising Autoencoder (UDAE), a new way of restoring the color of underwater images using a single denoising autoencoder with real-time capability. We showed that it is possible to reconstruct underwater images using a network based on a single denoising autoencoder, where it gave the same or better results than a network based on a GAN. Moreover, a single autoencoder is better suited for a real-time implementation. Additionally, as an improvement over previous networks, UDAE is capable of restoring better color in images while preserving the details.

We believe that there is space for improving the network, where a better generalization ability should be achieved. The network was trained on a relatively small dataset; obtaining a larger one with various color distortions would lead to a great improvement. The processing speed could also be improved by trying a different CNN baseline or latent space size.
REFERENCES
[1] R. B. Wynn, V. A. Huvenne, T. P. Le Bas, B. J. Murton, D. P. Connelly, B. J. Bett, H. A. Ruhl, K. J. Morris, J. Peakall, D. R. Parsons et al., "Autonomous underwater vehicles (AUVs): Their past, present and future contributions to the advancement of marine geoscience," Marine Geology, vol. 352, pp. 451–468, 2014. [Online]. Available: https://doi.org/10.1016/j.margeo.2014.03.012
[2] J. Choi, Y. Lee, T. Kim, J. Jung, and H.-T. Choi, "Development of a ROV for visual inspection of harbor structures," in 2017 IEEE Underwater Technology (UT). IEEE, 2017, pp. 1–4. [Online]. Available: https://doi.org/10.1109/UT.2017.7890285
[3] M. Manjunatha, A. A. Selvakumar, V. P. Godeswar, and R. Manimaran, "A low cost underwater robot with grippers for visual inspection of external pipeline surface," Procedia Computer Science, vol. 133, pp. 108–115, 2018. [Online]. Available: https://doi.org/10.1016/j.procs.2018.07.014
[4] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 1096–1103. [Online]. Available: https://doi.org/10.1145/1390156.1390294
[5] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241. [Online]. Available: https://doi.org/10.1007/978-3-319-24574-4_28
[6] H. Lu, Y. Li, Y. Zhang, M. Chen, S. Serikawa, and H. Kim, "Underwater optical image processing: a comprehensive review," Mobile Networks and Applications, vol. 22, no. 6, pp. 1204–1211, 2017. [Online]. Available: https://doi.org/10.1007/s11036-017-0863-4
[7] Y. Y. Schechner and N. Karpel, "Recovery of underwater visibility and structure by polarization analysis," IEEE Journal of Oceanic Engineering, vol. 30, no. 3, pp. 570–587, 2005. [Online]. Available: https://doi.org/10.1109/JOE.2005.850871
[8] T. Treibitz and Y. Y. Schechner, "Active polarization descattering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 385–399, 2009. [Online]. Available: https://doi.org/10.1109/TPAMI.2008.85
[9] C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, and P. Bekaert, "Color balance and fusion for underwater image enhancement," IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 379–393, 2018. [Online]. Available: https://doi.org/10.1109/TIP.2017.2759252
[10] F. Farhadifard, Z. Zhou, and U. F. von Lukas, "Learning-based underwater image enhancement with adaptive color mapping," in 2015 9th International Symposium on Image and Signal Processing and Analysis (ISPA). IEEE, 2015, pp. 48–53. [Online]. Available: https://doi.org/10.1109/ISPA.2015.7306031
[11] J. Y. Chiang and Y.-C. Chen, "Underwater image enhancement by wavelength compensation and dehazing," IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1756–1769, 2012. [Online]. Available: https://doi.org/10.1109/TIP.2011.2179666
[12] R. Schettini and S. Corchs, "Underwater image processing: state of the art of restoration and image enhancement methods," EURASIP Journal on Advances in Signal Processing, vol. 2010, no. 1, p. 746052, 2010. [Online]. Available: https://doi.org/10.1155/2010/746052
[13] J. Li, K. A. Skinner, R. M. Eustice, and M. Johnson-Roberson, "WaterGAN: unsupervised generative network to enable real-time color correction of monocular underwater images," IEEE Robotics and Automation Letters, vol. 3, no. 1, pp. 387–394, 2018. [Online]. Available: https://doi.org/10.1109/LRA.2017.2730363
[14] C. Li, J. Guo, and C. Guo, "Emerging from water: Underwater image color correction based on weakly supervised color transfer," IEEE Signal Processing Letters, vol. 25, no. 3, pp. 323–327, 2018. [Online]. Available: https://doi.org/10.1109/LSP.2018.2792050
[15] C. Fabbri, M. J. Islam, and J. Sattar, "Enhancing underwater imagery using generative adversarial networks," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7159–7165. [Online]. Available: https://doi.org/10.1109/ICRA.2018.8460552
[16] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232. [Online]. Available: https://doi.org/10.1109/ICCV.2017.244
[17] A. Creswell and A. A. Bharath, "Denoising adversarial autoencoders," IEEE Transactions on Neural Networks and Learning Systems, no. 99, pp. 1–17, 2018.
[18] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017. [Online]. Available: https://doi.org/10.1109/TCI.2016.2644865
[19] A. Pretorius, S. Kroon, and H. Kamper, "Learning dynamics of linear denoising autoencoders," in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. Stockholmsmässan, Stockholm, Sweden: PMLR, 2018, pp. 4141–4150. [Online]. Available: http://proceedings.mlr.press/v80/pretorius18a.html
[20] A. Hore and D. Ziou, "Image quality metrics: PSNR vs. SSIM," in 2010 20th International Conference on Pattern Recognition. IEEE, 2010, pp. 2366–2369.