A Convolutional Neural Networks Denoising Approach for Salt and Pepper Noise
Bo Fu, Xiao-Yang Zhao, Yi Li, Xiang-Hai Wang and Yong-Gong Ren
School of Computer and Information Technology, Liaoning Normal University, [email protected]
Abstract.
The salt and pepper noise, especially at extremely high impulse percentages, poses a significant challenge to image denoising. In this paper, we propose a non-local switching filter convolutional neural network denoising algorithm, named NLSF-CNN, for salt and pepper noise. As its name suggests, NLSF-CNN consists of two steps, i.e., an NLSF pre-processing step and a CNN training step. First, we develop an NLSF pre-processing step that repairs noisy pixels using non-local information. Then, the pre-processed images are divided into patches and used for CNN training, leading to a CNN denoising model for future noisy images. We conduct a number of experiments to evaluate the effectiveness of NLSF-CNN. Experimental results show that NLSF-CNN outperforms state-of-the-art denoising algorithms while using only a few training images.

Keywords:
Image denoising, Salt and pepper noise, Convolutional neuralnetworks, Non-local switching filter.
Images are often polluted by noise during acquisition and transmission, resulting in low-quality images. Removing such noise is a prerequisite for subsequent image analysis tasks. To this end, researchers have proposed many denoising algorithms [1-7]; representative ones include non-local means (NLM) [1] and block-matching 3D filtering (BM3D) [2]. In this paper, we focus on the salt and pepper noise, where pixels take the maximal or minimal gray value with some pre-defined probability. To our knowledge, the most commonly used denoising algorithms for salt and pepper noise belong to the nonlinear filter family, such as the median filter with a noise detection step [10-15]. Empirical results reported in previous studies show that these traditional algorithms are effective for salt and pepper noise to some extent [16, 22, 30-33]. Wang et al. applied hash retrieval and low-rank methods to compute similarity and coding [34-42]. However, such traditional algorithms depend heavily on local image information. Thus, they often perform poorly on high-density noise, where little exact local information survives. This is problematic in real-world image applications.
In view of the above problems, our research has three motivations. First, we design switching templates to avoid noise disturbance in the process of measuring similarity. Second, we extract repair information from a non-local region instead of a local patch. Based on these two points, we propose a novel non-local switching filter (NLSF) as the pre-processing part. Because NLSF removes most of the noise, the CNN can more effectively discover the non-linear relationship between the details lost in the noisy image and the original image. Third, we propose a convolutional neural network denoising algorithm, named NLSF-CNN, that combines our NLSF with a CNN.

Under the framework of NLSF-CNN, we improve the analysis of repair information by expanding the search scope to a non-local one, and build switching templates to avoid noise disturbance in the process of computing weights. A set of patches filtered by NLSF is input to the CNN, which learns a mapping from the filtered image to the ground-truth image. In the front part of NLSF-CNN, we use NLSF to pre-process the noisy image so that the learning process does not take place in a high-noise environment. This step greatly benefits the subsequent network learning. At the same time, the multi-layer structure of the CNN increases the capacity and flexibility for recovering details lost in the preceding filtering process.

The contributions of our work are summarized as follows. First, we propose a new salt and pepper noise denoising algorithm based on convolutional neural networks. In contrast to traditional denoising CNNs, which directly take noisy images as input and estimate the clean image, our network adds a pre-filtering step (NLSF) and learns the clean image from a mapping between the filtered image (not the noisy image) and the corresponding ground-truth image. Second, we propose a new non-local switching filter. It serves well as the pre-processing part of the network.
This pre-processing step reduces the interference of high-intensity noise in the deep learning process. A series of experiments shows that our algorithm obtains higher PSNR scores than competing methods, and the visual results indicate that our algorithm better recovers details.

The remainder of this paper is organized as follows. In Section 2, we first introduce the patch model and network structure, and then describe the denoising process of our neural network. In Section 3, we report several simulation experiments and compare with state-of-the-art methods. In Section 4, we conclude and discuss future work.
With the continuous development of deep learning theory, more and more image processing problems have been addressed by various types of networks with good results. Considering the problems mentioned above, some researchers have applied more powerful deep learning methods to the image denoising field. V. Jain first added a CNN model to a denoising method in 2008 [17]. Their algorithm not only achieves better results than the traditional wavelet and hidden Markov models, but also shows that a particular form of CNN can be regarded as an approximation of Markov model inference for image denoising. At the same time, the CNN model avoids the computational difficulty of Markov models in probabilistic learning and inference. The Rectified Linear Unit [18] and batch normalization [19] have also been proposed, improving on the traditional methods [21]. In 2012, Xie used stacked denoising auto-encoders for image denoising and image restoration [23]. In 2017, Zhang proposed a deeper CNN network called DnCNN [20].

These methods have made great progress in the image denoising field, but they all target the removal of Gaussian white noise. At present, there is still no effective network structure mapping an image polluted by salt and pepper noise to the corresponding ground-truth image.
In this section, we briefly review image denoising with salt and pepper noise, and then introduce the proposed non-local switching filter convolutional neural network denoising algorithm, named NLSF-CNN.
To facilitate later use, we first name some commonly used variables. The input noisy image is cut into a series of overlapping patches; the generation process is shown in Figure 1. For a patch P, let (i, j) be the coordinates of a pixel and I(i, j) the value of pixel (i, j). We first detect salt and pepper noise pixels using a threshold δ. For each pixel I(i, j), the noise detection follows the rule:

I(i, j) is noisy if I(i, j) ∈ [0, δ] ∪ [255 − δ, 255], and clean otherwise,   (1)

where δ denotes a threshold. The values of salt and pepper noise are distributed at the two ends of the gray-scale range, i.e. [0, δ] or [255 − δ, 255]. Because most noise values are concentrated at both ends of the gray scale, pixels whose gray value lies in the middle of the range are most likely to be normal pixels. Classical non-linear filters therefore usually repair a noisy pixel with a median value, and many advanced algorithms have followed up using more valid pixels. However, these methods face a key problem: the repair step often depends on analysis and measurement, yet these analyses and measurements are themselves disturbed by the noise. How to extract details effectively without noise interference thus remains an open problem. Here, we empirically set δ to 1.

Fig. 1.
Patches generation of size L=3
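The detection rule (1) and the overlapping patch generation of Fig. 1 are simple to implement. The sketch below is a minimal NumPy illustration; the function names and the unit stride are our own choices, not part of the paper:

```python
import numpy as np

def detect_salt_pepper(img, delta=1):
    """Flag pixels whose gray value lies in [0, delta] or [255 - delta, 255] (rule 1)."""
    return (img <= delta) | (img >= 255 - delta)

def extract_patches(img, L=3, stride=1):
    """Cut a grayscale image into overlapping L x L patches (cf. Fig. 1)."""
    H, W = img.shape
    return np.stack([img[r:r + L, c:c + L]
                     for r in range(0, H - L + 1, stride)
                     for c in range(0, W - L + 1, stride)])

# Toy example: four of the nine pixels are flagged as salt or pepper.
img = np.array([[0, 120, 255],
                [130, 254, 10],
                [90, 1, 200]], dtype=np.uint8)
mask = detect_salt_pepper(img, delta=1)
```

With δ = 1, as in the paper, only the gray values 0, 1, 254 and 255 are treated as impulses.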
We now introduce the proposed NLSF-CNN denoising algorithm for salt and pepper noise. In NLSF-CNN, we train a patch-based CNN model over training images with artificial noise, and then use the model as a denoiser for future noisy images. NLSF-CNN consists of two steps: a pre-processing step and a CNN training step. In the pre-processing step, we first detect the salt and pepper noisy pixels and then smooth them using a non-local switching filter. In the CNN training step, the pre-processed images are divided into overlapping patches. We use these patches as the input to the CNN and train its optimal parameters.
Fig. 2.
Network structure
We now present the details of each step in NLSF-CNN; the selection of parameters is discussed in Section 4.1.
Pre-processing Step.
Given training images with artificial salt and pepper noise, we pre-process the noisy pixels before feeding them into CNN training. This is because salt and pepper noise is a pure noise, which is harmful to CNN training. In this work, we propose a non-local switching filter for this pre-processing step.

We use non-local information to remove the noisy pixels one by one. For each noisy pixel, we generate R point-wise patches of size L around it and replace the noisy pixel with a weighted sum of the medians of these R patches:

Ĩ(i, j) = Σ_(k,l) w_(k,l) · med(P_(k,l)),   (2)

where w_(k,l) is the weight of patch P_(k,l) with respect to the patch centered at the current noisy pixel, and 0 < w_(k,l) < 1. The weight depends on the Euclidean distance between each patch and the patch P_n centered at the current noisy pixel:

w_(k,l) = s_(k,l) / Σ_(k,l) s_(k,l),   (3)

s_(k,l) = e^(−||P_(k,l) − P_n|| / n).   (4)

To obtain more accurate distances between patches, we replace all noisy pixels with the patch mean during distance measurement.

Generating the Non-linear Mapping and Reconstruction.
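Before the mapping step, the NLSF repair rule of Eqs. (2)-(4) can be sketched for a single noisy pixel. This assumes the R candidate patches have already been gathered from the non-local region with noisy entries replaced by patch means; the function name and the value of the normalization constant n are illustrative:

```python
import numpy as np

def nlsf_repair_pixel(patches, center_patch, n=9.0):
    """Repair one noisy pixel by a weighted sum of patch medians, Eqs. (2)-(4).

    patches      : (R, L, L) array of candidate patches from the non-local region
    center_patch : (L, L) patch centered at the noisy pixel
    n            : normalization constant in Eq. (4) (illustrative value)
    """
    flat = patches.reshape(len(patches), -1)
    medians = np.median(flat, axis=1)
    # Eq. (4): similarity decays with the Euclidean distance to the center patch.
    s = np.exp(-np.linalg.norm(flat - center_patch.ravel(), axis=1) / n)
    # Eq. (3): normalize similarities into weights in (0, 1).
    w = s / s.sum()
    # Eq. (2): weighted sum of the patch medians.
    return float(np.dot(w, medians))
```

Since the weights sum to one, the repaired value always lies within the range of the candidate medians.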
So far, all patches in the set P are pre-filtered. We input these patches into our network. The network has three layers; in each layer, we define a set of filters and operators to generate mappings.

In the first layer, we represent these patches by a set of bases. The operation can be described as follows:

F1(P) = max(0, W1 * P + B1),   (5)

where W1 and B1 represent the basis filter set and the biases, respectively, and the filter size is f1. We denote the number of basis filters by n1; each filter has the same size as the patch, so each patch undergoes n1 convolutions, and the convolutional result of each patch is a corresponding n1-dimensional feature map. We then apply the Rectified Linear Unit (ReLU, max(0, x)) [24] to the filter responses to obtain more non-linearity.

Based on the n1-dimensional features extracted in the first layer, we map each of them into a new n2-dimensional feature. The operation of the second layer is:

F2(P) = max(0, W2 * F1(P) + B2),   (6)

where W2 is of size n1 × 1 × 1 × n2, which is equivalent to applying filters with a 1 × 1 spatial support, and B2 contains the biases. We again apply the Rectified Linear Unit. The second layer increases the non-linearity between the pre-filtered patch and the ground truth.

In the third layer, we define a convolutional layer to predict the enhanced patches and reconstruct them into a result image. The operation of the third layer is:

F3(P) = max(0, W3 * F2(P) + B3),   (7)

where W3 is a set of filters and B3 contains the biases.

In the process of learning the mapping, we estimate the mapping function F with parameters Θ = {W1, W2, W3, B1, B2, B3}. This is a process of minimizing the loss between the result images F(Y; Θ) and the corresponding ground truth. We use the Mean Squared Error (MSE) as the loss function:

L(Θ) = (1 / MN) Σ_{i=1}^{M} Σ_{j=1}^{N} (Ĩ(i, j) − I(i, j))²,   (8)

where Ĩ is the network output and I the ground-truth image. We give NLSF-CNN a set of noisy images and their ground-truth images, and obtain a mapping describing their non-linear relationship.

For each noisy image, we first use the NLSF method to compute a pre-processed version of the image. Then, we divide it into patches of legal size and feed them into the pre-trained CNN model to obtain clean output patches. Finally, we integrate these clean patches, leading to the clean image.
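Equations (5)-(8) amount to a three-layer convolutional forward pass followed by an MSE loss. The sketch below is a plain NumPy illustration of that computation on one patch; the 'valid' convolution, random weights and toy filter shapes are our own assumptions, and a real implementation would learn Θ with a deep learning framework:

```python
import numpy as np

def conv2d(x, kernels):
    """'Valid' 2-D correlation of an (H, W, C_in) input with
    (f, f, C_in, C_out) kernels."""
    f, _, _, c_out = kernels.shape
    H, W, _ = x.shape
    out = np.empty((H - f + 1, W - f + 1, c_out))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Contract the f x f x C_in window against every kernel.
            out[r, c] = np.tensordot(x[r:r + f, c:c + f, :], kernels, axes=3)
    return out

def relu(x):
    return np.maximum(0.0, x)

def nlsf_cnn_forward(patch, params):
    """Eqs. (5)-(7): three convolutional layers, each followed by ReLU."""
    W1, B1, W2, B2, W3, B3 = params
    F1 = relu(conv2d(patch, W1) + B1)   # Eq. (5): feature extraction
    F2 = relu(conv2d(F1, W2) + B2)      # Eq. (6): 1x1 non-linear mapping
    F3 = relu(conv2d(F2, W3) + B3)      # Eq. (7): reconstruction
    return F3

def mse_loss(pred, target):
    """Eq. (8): mean squared error over all pixels."""
    return float(np.mean((pred - target) ** 2))

# Shape check with toy sizes f1 = 3, n1 = 4, n2 = 4, f3 = 3.
rng = np.random.default_rng(0)
patch = rng.random((8, 8, 1))
params = (rng.normal(size=(3, 3, 1, 4)) * 0.1, np.zeros(4),
          rng.normal(size=(1, 1, 4, 4)) * 0.1, np.zeros(4),
          rng.normal(size=(3, 3, 4, 1)) * 0.1, np.zeros(1))
out = nlsf_cnn_forward(patch, params)
```

Each 'valid' convolution shrinks the patch by f − 1 pixels per side, so an 8 × 8 input with two 3 × 3 layers and one 1 × 1 layer yields a 4 × 4 output.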
This section presents empirical results. We compare NLSF-CNN against state-of-the-art baseline algorithms, and also empirically evaluate its crucial parameters.
Dataset
We employ a public training set from [25-28], which contains 91 training images. During testing, we use two different test sets to evaluate our algorithm: (1) a small test set, i.e., the standard test images, containing 11 images commonly used for denoising, e.g., "Lena", "Baboon" and "Pepper"; and (2) a big test set, i.e., the Berkeley segmentation dataset (BSD300), containing 300 images. For our NLSF-CNN, in the process of training the networks, we set f1 = 9, f3 = 5, n1 = 64 and n2 = 32 in the main evaluations.

Baseline algorithms
We use four existing denoising algorithms as baselines, including three traditional algorithms and a neural network based one. The traditional algorithms are the decision based algorithm (DBA) [10], the patch-based approach to removing impulse-Gaussian noise from images (PARIGI) [22] and the noise adaptive switching non-local filter (NASNLM) [24]. The neural network based algorithm is the multi-layer perceptron (MLP) proposed in [21]. Here, we use the pre-trained MLP model provided by its authors. For a fair comparison, we apply our NLSF pre-processing step to the test noisy images before applying the pre-trained MLP model; we refer to this as NLSF-MLP.

https://github.com/gkh178/noise-adaptive-switching-non-local-means/tree/master/NASNLM
https://github.com/urbste/MLPnP_matlab_toolbox

Besides, we use the proposed NLSF pre-processing method alone as a supplementary baseline. For our NLSF-CNN, the parameters are empirically set as follows: (1) in the NLSF pre-processing step, the patch size is set to 3 when the noise density is lower than 30%, and to 5 otherwise; (2) in the CNN training step, the size of the input patch is set to 64.
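Setting (1) above can be written as a one-line helper (the function name is ours):

```python
def nlsf_patch_size(noise_density):
    """NLSF patch size: 3 below 30% noise density, 5 otherwise."""
    return 3 if noise_density < 0.30 else 5
```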
Evaluation Metric
The peak signal-to-noise ratio (PSNR) is adopted to measure the objective performance of our algorithm. PSNR is defined as follows:

PSNR = 10 · log10(255² / MSE),   (9)

where MSE is the Mean Square Error defined in Formula 8.

This subsection shows the evaluation results. We examine our NLSF-CNN at different noise densities, i.e., 30%, 50% and 70%.
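For 8-bit images, Formula 9 is computed directly from the MSE of Formula 8. A short sketch (the infinite return value for identical images is our own guard, not from the paper):

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB, Formula 9."""
    mse = np.mean((np.asarray(clean, dtype=np.float64)
                   - np.asarray(denoised, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```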
Table 1.
PSNR scores of standard test images.
Image            Noise level   DBA     NASNLM   PARIGI   NLSF    NLSF-MLP   NLSF-CNN
Lena             30%           34.42   28.09    33.90    34.20   30.80
Lena             50%           30.11   26.15    29.91    30.12   29.28
Lena             70%           25.84   25.97    25.22    25.79   27.63
Bridge           30%           28.07   23.68    25.19    28.21   25.19
Bridge           50%           24.24   22.91    22.61    24.45   23.86
Bridge           70%           21.12   22.63    20.06    21.02   22.61
Girl             30%           29.41   20.61    29.74    32.88   29.64
Girl             50%           27.47   16.69    27.25    29.66   28.28
Girl             70%           24.99   16.32    24.29    26.33   26.90
Pepper           30%           26.85   22.38    28.88    32.27   30.01
Pepper           50%           25.27   21.82    25.44    27.99   28.57
Pepper           70%           22.11   21.58    21.46    23.04   27.04
Average of 11    30%           31.79   27.07    30.86    32.28   29.77
Average of 11    50%           28.27   26.38    27.47    29.28   28.09
Average of 11    70%           24.38   26.98    23.87    25.09   26.36
Table 1 shows the PSNR results on the standard test set, including four example standard test images, e.g., "Lena" and "Bridge", and the average results over all 11 images of this test set. Overall, we observe that our NLSF-CNN outperforms the baseline algorithms in all settings. Some detailed observations follow. (1) Compared with the three traditional denoising algorithms, the gain of NLSF-CNN is significant. For example, the PSNR scores of NLSF-CNN are about 5~7 dB higher than those of NASNLM on "Lena" and about 4~6 dB higher than those of PARIGI on "Pepper". (2) Compared with NLSF-MLP, our NLSF-CNN also performs better. This indicates that NLSF-CNN is much more practical, since we use a very small training set, i.e., only 91 training images, whereas MLP is trained on hundreds of thousands of images. (3) The gain of NLSF-CNN becomes more significant at high noise densities. For example, the average PSNR of NLSF-CNN is about 3~6 dB higher than those of the baseline algorithms when the noise density reaches 70%. This implies that NLSF-CNN is more robust and practical in real-world image applications.

We also run the comparative methods on BSD300 and report the average PSNR scores over its 300 images in Table 2.
Table 2.
The average results of PSNR (dB) on the BSD300 dataset.
Dataset   Noise level   DBA     NASNLM   NLSF    NLSF-MLP   NLSF-CNN
BSD300    30%           29.92   25.74    30.01   29.77
BSD300    50%           26.32   24.50    26.25   26.19
BSD300    70%           22.81   24.65    22.85   24.72
Table 2 shows the average PSNR results on the BSD300 test set with different noise densities. We again observe that NLSF-CNN performs better than the baseline algorithms, especially at higher noise densities. This further indicates that our NLSF-CNN is more effective for salt and pepper denoising. Besides, NLSF-CNN outperforms NLSF-MLP while using far fewer training images, saving much training time.

The visual results are also compared in Figures 3 and 4. In Figure 3, we compare the original clean image, the 50% noisy image, and the result images of DBA, PARIGI, NASNLM, NLSF-MLP, NLSF and NLSF-CNN.
Fig. 3.
The test results of all comparative algorithms
The zoomed details are added to the lower right corner of these images. At 50% noise density, DBA and NASNLM do not completely remove the noise, and some tiny artifacts remain in the NLSF result. NLSF-MLP over-smooths details; for example, it loses the two black nails in the boat image and some tiny black shadow on the stem of a pepper. It can be seen that NLSF-CNN not only removes the noise but also obtains satisfactory visual effects and preserves the details.
Fig. 4.
Results and details of NLSF-CNN on "Lena" and "Baboon" at 30%, 50% and 70% noise levels

In Figure 4, we apply our algorithm to images of different texture types at 30%, 50% and 70% noise density levels. It can be seen that satisfactory visual effects are obtained; in particular, the edges are well preserved.
In this subsection, we empirically evaluate the patch size in the NLSF pre-processing step. To achieve this, we show the PSNR results of different patch sizes, i.e., 3, 5 and 7, under different noise densities. Figure 5 shows the results. We can see that the PSNR scores of size 3*3 are the highest when the noise density is lower than 30%, while size 5*5 is better at higher noise densities. The likely reason is that small patches contain insufficient local information at high noise densities.
Fig. 5.
Performance of different patch sizes at various noise intensity levels
In this paper, we propose a convolutional neural network denoising algorithm, namely NLSF-CNN, for salt and pepper noise. A new non-local switching median filter, a learning algorithm and a neural network architecture are combined in our image denoising algorithm. Our algorithm is innovative in three aspects. First, we define a patch model to describe the local image contaminated by noise. Second, we design a switching median filter based on the self-similarity model above. Third, we design a CNN structure that incorporates our switching median filter, so that the mapping is learned from the filtered image to the target image.

Simulation results demonstrate that the proposed algorithm is effective at various salt and pepper noise densities. As seen in Tables 1 and 2 and Figures 3 and 4, our algorithm obtains satisfactory visual effects and higher PSNR scores while using only 91 images to train the networks. The reason for this better performance is that our network can mine details more efficiently and add them to the NLSF result. These results show that our algorithm is effective and meaningful.

This work is supported by the National Natural Science Foundation of China (NSFC) No. 61702246, Liaoning Province of China General Project of Scientific Research No. L2015285, and Liaoning Province of China Doctoral Research Start-Up Fund No. 201601243.