Blind Image Denoising and Inpainting Using Robust Hadamard Autoencoders
Rasika Karkare, Data Science, Worcester Polytechnic Institute, Worcester, MA 01609. Email: [email protected]
Randy Paffenroth, Mathematical Sciences, Computer Science and Data Science, Worcester Polytechnic Institute, Worcester, MA 01609. Email: [email protected]
Gunjan Mahindre, Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523. Email: [email protected]
Abstract—In this paper, we demonstrate how deep autoencoders can be generalized to the case of inpainting and denoising, even when no clean training data is available. In particular, we show how neural networks can be trained to perform all of these tasks simultaneously. While deep autoencoders implemented by way of neural networks have demonstrated potential for denoising and anomaly detection, standard autoencoders have the drawback that they require access to clean data for training. However, recent work in Robust Deep Autoencoders (RDAEs) shows how autoencoders can be trained to eliminate outliers and noise in a dataset without access to any clean training data. Inspired by this work, we extend RDAEs to the case where data are not only noisy and contain outliers, but are also only partially observed. Moreover, the dataset we train the neural network on has the properties that all entries have noise, some entries are corrupted by large mistakes, and many entries are not even known. Given such an algorithm, many standard tasks, such as denoising, image inpainting, and unobserved entry imputation, can all be accomplished simultaneously within the same framework. Herein we demonstrate these techniques on standard machine learning tasks, such as image inpainting and denoising for the MNIST and CIFAR10 datasets. However, these approaches are not only applicable to image processing problems, but also have wide-ranging impacts on datasets arising from real-world problems, such as manufacturing and network processing, where noisy, partially observed data naturally arise.
Index Terms—Autoencoders, Robust Deep Autoencoders, Blind Denoising, Inpainting, Anomaly Detection
I. INTRODUCTION
Recent advances in deep learning and neural networks have shown the effectiveness of these techniques in numerous fields such as object detection, image recognition, drug discovery, and genomics, to name but a few [1]. By using backpropagation, deep learning demonstrates how the parameters of the network can be changed to compute the representations in each layer. Such models have proven able to learn faithful representations of a dataset by learning non-linear features in the data. Multiple layers are used to learn different levels of abstraction in the data and are essentially what make the network deep [2].

For image reconstruction, standard deep autoencoder variations such as denoising autoencoders are widely used [3]. However, the drawback of using such models is that they require access to clean data for training, which is not readily available in all real-world problems. In this work we extend such techniques to the task of image denoising and inpainting where we do not have access to the noise-free version of the data. RDAEs [4], as mentioned previously, do not require clean data for training overall; however, these models isolate the noise and outliers in the input, and the autoencoder is trained after this isolation.

Moreover, an important issue with RDAEs [4], in addition to the fact that standard RDAEs cannot be applied when the data is only partially observed, is that while the overall method does not require clean training data, the neural network that has been embedded in the larger method does require clean training data. The structure of the RDAE equations that leads to this issue is perhaps not readily apparent, and we will discuss it in detail in the background section below. While not an issue if data is always processed in a single batch, this does require the retraining of the entire neural network if the data should ever change.
The ability to denoise new data that the network was not originally trained on, which we herein call inference, is not possible using the RDAE formalism [4]. Accordingly, herein we extend the framework of the RDAE model in a way such that it can be tested on unseen data that it was not originally trained on.

A main focus of this paper is an attempt to overcome the above drawback in RDAEs so that they can be used in a more general context. As we explain in the later sections of the paper, our model is trained only with the corrupt data as input to the neural network, as opposed to RDAEs [4]. This makes our model inductive, in that it can infer the noise in the data, as opposed to RDAEs. Our model, which we call a Robust Hadamard Autoencoder (RHA), can also handle data that is only partially observed, which is a feature that is lacking in standard RDAEs. It is interesting to note that our RHA is the first method, of which we are aware, that provides all of the functionality of a classic linear Robust Principal Component Analysis (RPCA) type algorithm [5], [6], but in the context of a non-linear autoencoder that can simultaneously inpaint and denoise data in a blind manner. All the code that generates the results from this paper is freely available at github.

II. CONTRIBUTION
In this work, we aim to generalize RDAEs to a case where we make the model testable by providing only corrupt data as input to the neural network. We demonstrate the superior performance of our method as compared to state-of-the-art techniques, using the standard MNIST and CIFAR10 datasets in the presence of noise as well as missing regions in the data. Moreover, our model is able to infer the noise in the data and successfully eliminate it, without being given any information about the noise-free version of the data. As we demonstrate in this paper, our model is robust and can be used in the presence of random impulse noise as well as coherent corruptions, such as inferring random blocks of missing data.

Our model differs from existing techniques as follows:

1) Our model is able to obtain a non-linear projection of the data, unlike RPCA, which can only obtain a linear projection to a lower dimension.
2) We do not need clean training data to train our model, unlike standard techniques such as denoising autoencoders [3].
3) We do not provide our model with low-dimensional input, as is the case with RDAEs. The primary difference between the two models is that our neural network model infers the noise and successfully eliminates it, whereas, in the case of RDAEs, the neural network model is given data in which the noise has been subtracted off during the training process.
4) We also address the issue of filling in missing blocks of data based on the surrounding information [7], [8]. We demonstrate the superior performance of our model on denoising as well as image inpainting, as compared to the RDAE model, which cannot handle coherent corruptions in the data, such as missing blocks of data in images.
We also compare our model to a state-of-the-art inpainting algorithm, namely the context encoder (CE) [9], and show that our model achieves comparable performance with the CE model for the task of inpainting in the presence of missing data, and outperforms CE in the case of denoising and inpainting simultaneously. This is despite the fact that in the CE model the network is given the clean block of data as a reference for training the generator in the Generative Adversarial Network (GAN), whereas our model does not have any information about the missing part of the data (i.e., the complete version of the image).
5) We compare our model with a standard autoencoder (SAE) and a variational autoencoder (VAE) [10] for the task of image denoising and inpainting on the CIFAR10 and MNIST datasets. Our model outperforms both of the other models in this task, as will be demonstrated in the results section.

https://github.com/rskarkare/Robust-Hadamard-Autoencoders

III. BACKGROUND
Here we cover some background information about deep autoencoders and RPCA, which are the foundations of our model.
A. Autoencoders:
An autoencoder is a type of neural network that is primarily used to learn an identity map for the input data. Autoencoders can be used to obtain a compressed representation of the data by projecting it to a low-dimensional latent layer [1]. Along with learning a low-dimensional representation, deep autoencoders are made non-trivial by the non-linearities that are introduced in the hidden layers by using various activation functions. The network learns a map of the input through encoding and decoding network pairs, and the reconstruction is given by:

    X̄ = D(E(X)).   (1)

Furthermore, autoencoders give us a solution to the optimization problem below:

    min_{D,E} ||X − D(E(X))||_F.   (2)

Here, E is the encoding of the input data to the hidden layer and D is the decoding from the hidden layer to the output layer. X̄ is the reconstructed version of the input data. The goal is to minimize the difference between the original input data and the reconstructed data. The loss function is commonly chosen to be the Frobenius norm between the original and reconstructed data. In order to learn complicated distributions, deep learning models use multiple hidden layers to learn higher-order features in the data [1]. Unfortunately, the presence of noise and outliers in the data affects the quality of the features that are learnt by the autoencoder [11], [12].

B. Robust Principal Component Analysis (RPCA):
Principal Component Analysis (PCA) is a linear dimensionality reduction technique that is widely used in numerous applications. However, a drawback of PCA is that it is grossly sensitive to outliers and noise, and hence a variation, namely Robust Principal Component Analysis (RPCA), is used in the presence of outliers and noise. RPCA allows separation of the outliers and anomalies in such a way that the noise-free data can be recovered [5]. RPCA splits the input data X as L + S, where L is the low-rank matrix and S is the sparse matrix. Here X, L, S ∈ R^{m×n}. L contains the low-dimensional representation of X and S contains the element-wise outliers in the data. The optimization problem can be written as follows:

    min_{L,S} ρ(L) + λ||S||_0   s.t.   ||X − L − S||_F = 0.   (3)

Here ρ(L) is the rank of L, ||S||_0 is the number of non-zero entries in S (the ℓ_0-norm of S), and ||·||_F is the Frobenius norm [13]. However, since the above optimization is NP-hard, a relaxed version of this class of problems is obtained by replacing the rank with the nuclear norm ||·||_* and the ℓ_0-norm with the ℓ_1-norm [5]. The convex version of the problem is thus given as follows:

    min_{L,S} ||L||_* + λ||S||_1   s.t.   ||X − L − S||_F = 0.   (4)

Thus, RPCA allows one to learn a more faithful representation of the noise-free low-dimensional structure of the data by carefully separating out the sparse outliers [6]. In our model, the nuclear norm of RPCA is replaced by the reconstruction error of an autoencoder. This gives us a non-linear projection to a low-dimensional hidden layer, as opposed to the linear projection given by RPCA.

C. Robust Deep Autoencoders (RDAEs):
Robust Deep Autoencoders combine the salient features of autoencoders and RPCA to solve the following optimization problem:

    min_θ ||L_D − D_θ(E_θ(L_D))||_2 + λ||S||_0   s.t.   X − L_D − S = 0.   (5)

However, since the above ℓ_0 term is not computationally tractable, the ℓ_0-norm is replaced with the ℓ_1-norm, and the convex relaxation of this problem, similar to RPCA, is given below [4]:

    min_θ ||L_D − D_θ(E_θ(L_D))||_2 + λ||S||_1   s.t.   X − L_D − S = 0.   (6)

As can be seen in the above optimization, the RDAE model requires the input to the neural network to be the low-rank matrix L_D. Moreover, it has the limitation that it is not inductive and cannot be used to filter out noise and anomalies on unseen data without retraining the network on the new data. In this work, we aim to generalize RDAEs by developing a model that is robust to noise and anomalies, while at the same time being inductive, so that it can be tested on unseen corrupt images, because the only input to the neural network is a corrupt image, as opposed to a low-rank one in the case of RDAEs. Our model is inspired by RPCA and RDAEs, wherein the input data X is comprised of two parts, X = L + S, where L can be effectively reconstructed by the autoencoder and S contains the outliers and noise in the data. Our model also addresses the issue of filling in missing blocks of data by using a Hadamard, or element-wise, product in the cost function [7]. This has important applications in fields such as missing value imputation in manufacturing datasets. Moreover, in such datasets some instances do not have the complete set of features available during training. Rather than eliminate the rows that lack this information, we want to be able to reproduce values for the missing features that are as close to the original as possible.
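The Hadamard-masked cost just described can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the array names, sizes, and the mean-fill initialization of missing entries are assumptions made for the sketch.

```python
import numpy as np

def rha_loss(X, S, recon, Omega, lam):
    """Masked robust loss: ||((X - S) - D(E(X))) ⊙ Ω||_2 + λ||S||_1.
    Omega is 1 where an entry is observed and 0 where it is missing,
    so unobserved entries contribute nothing to the reconstruction term."""
    residual = ((X - S) - recon) * Omega   # Hadamard (element-wise) product
    return np.linalg.norm(residual) + lam * np.abs(S).sum()

# Build the indicator matrix from missing (NaN) entries and mean-fill them,
# mirroring the mean-value initialization described for the model.
rng = np.random.default_rng(0)
X_raw = rng.random((8, 8))
X_raw[2:5, 2:5] = np.nan                   # a coherent block of missing data
Omega = (~np.isnan(X_raw)).astype(float)
X = np.where(Omega == 1, X_raw, np.nanmean(X_raw))

# With a perfect reconstruction on the observed entries and S = 0,
# the loss is exactly zero: the missing block is ignored by the mask.
S = np.zeros_like(X)
loss = rha_loss(X, S, X, Omega, lam=0.1)
```

Note how the mask makes imputation implicit: the network is free to fill the masked block with anything, since those entries never enter the cost.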
Typically, the values of certain parameters found in such datasets are missing at random. Hence, we demonstrate the results of our model on standard benchmark image datasets, namely MNIST and CIFAR10. It can be seen that our model performs better than the state-of-the-art techniques on these datasets in the case of denoising and inpainting simultaneously.

IV. METHODOLOGY
In this section we explain our objective function, for both denoising and the task of image inpainting. The key idea, as inspired by the above models, is that the autoencoder generates a low-dimensional non-linear representation of the low-rank input and filters out the noise, which is incompressible. The noise and anomalies are essentially filtered out into the sparse matrix, as per the analogy with RPCA. Thus, we combine the non-linearity capability of an autoencoder with the outlier detection capability of RPCA to obtain an objective function for our model. The key difference is that the input to the neural network, which is embedded inside the entire method as a whole, is a corrupt image. This gives rise to the blind denoising capabilities of our model, wherein no information about the noise in the data is explicitly provided to the neural network.
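As a point of reference, the linear RPCA decomposition that our objective generalizes can be computed with singular-value and soft thresholding. Below is a compact numpy sketch of the standard inexact augmented Lagrangian iteration for Eq. 4; the iteration counts and step parameters are illustrative choices, not values from the paper.

```python
import numpy as np

def soft_threshold(M, tau):
    """Element-wise shrinkage: the proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Shrink the singular values: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca(X, n_iter=100):
    """Split X into a low-rank L and a sparse S via inexact ALM."""
    m, n = X.shape
    lam = 1.0 / np.sqrt(max(m, n))        # standard weight from RPCA theory
    mu = 1.25 / np.linalg.norm(X, 2)      # penalty on the constraint X = L + S
    Y = np.zeros_like(X)                  # Lagrange multiplier
    S = np.zeros_like(X)
    for _ in range(n_iter):
        L = svd_threshold(X - S + Y / mu, 1.0 / mu)
        S = soft_threshold(X - L + Y / mu, lam / mu)
        Y = Y + mu * (X - L - S)
        mu *= 1.2                         # gradually tighten the constraint
    return L, S

# Recover a planted rank-2 matrix corrupted by a few large sparse errors.
rng = np.random.default_rng(0)
L0 = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 50))
S0 = np.zeros((50, 50))
S0.flat[rng.choice(2500, size=50, replace=False)] = 5.0
L, S = rpca(L0 + S0)
```

Our model replaces the `svd_threshold` step (the nuclear-norm proximal operator) with an autoencoder reconstruction, which is exactly where the non-linearity enters.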
A. Robust Hadamard Autoencoders with ℓ_1 regularization:

Figure 1 shows the developmental ideas of our model. Inspired by RPCA, we want to obtain a low-dimensional representation of our data without the few data points that are considered to be outliers or noise. These points are filtered out into the sparse matrix, as they are incompressible. The nuclear norm in the RPCA objective function is replaced by the reconstruction error of an autoencoder in order to obtain a non-linear representation of the input. Our objective function thus becomes

    min_θ ||((X − S) − D(E(X))) ⊙ Ω||_2 + λ||S||_1.   (7)

Here, similar to RPCA, we replace the ℓ_0-norm on S with the ℓ_1-norm in order to make the problem computationally tractable. The ⊙ represents the Hadamard, or element-wise, product between the cost function and Ω [7], Ω being an indicator matrix of ones and zeros. Every entry in the dataset that is missing has a corresponding entry of zero in the indicator matrix, whereas every entry in the dataset that is observed has a corresponding entry of one. The idea is that the cost function is calculated with respect to only the data that is observed. For data that is missing, the corresponding entries in the indicator matrix are zero, and thus the missing entries are ignored by the cost function. The assumption here is that, while imputing the data, if the autoencoder does well on the data that is observed, then the autoencoder will also do well on the data that is missing. The λ parameter in the objective function, similar to RDAEs, needs to be tuned in order to change the penalty on the sparse matrix to filter out the noise. As seen in Eq. 7, the input to the neural network in this case is the matrix X, which is the corrupt version of the data. The target or reference clean image is not provided, as is the case with a standard denoising autoencoder. Moreover, the autoencoder learns the L and S matrices by inferring the noise during training, and the noise is then filtered out from the input to obtain a low-rank representation of the data (i.e., L = X − S). Note that in Eq. 6 for the RDAE model, the input to the neural network is the low-rank matrix, given by L_D in the objective function. This illustrates the difference between the two models. For the task of pure image inpainting, the above objective function can easily be modified to the one below:

    min_θ ||(X − D(E(X))) ⊙ Ω||_2.   (8)

Here the S term is not needed because there is no noise to be filtered out. This objective function ignores the regions of the data that are missing during optimization and finds the optimal values for imputation based only on the regions that are observed during training.

Fig. 1. Our model combines the non-linearity capability of an autoencoder with the denoising capability of RPCA. In addition to random impulse noise, it is also robust to coherent corruptions in the data, such as blocks of missing values, which we impute using the Hadamard or element-wise product in our objective function. Every entry in the dataset that is unobserved has a corresponding entry of 0 in the indicator matrix, and every entry that is observed has a corresponding entry of 1. Thus, we mask the input by multiplying element-wise with the indicator matrix. We then replace all the missing entries with the mean value of the observed data before providing the corrupt input to the autoencoder network. Note that the mean values are purely for initialization and are updated to the imputed values by the neural network.

V. ALGORITHM TRAINING
For training our model we performed hyper-parameter tuning to find the optimal set of parameters. This section details the parameters that we used to build our model, as well as the training method that we used to optimize the network. We use an autoencoder network with two hidden layers. The number of nodes in the hidden layers is 200 and 50, respectively, and the batch size is 40. It can be seen in Fig. 2 that our model is fairly robust across multiple batch sizes. We use the sigmoid activation function, owing to the fact that the data was normalized between 0 and 1 during the pre-processing phase. It is interesting to note that even such a simple network, when armed with additional terms from RPCA [5], [7], [14], can lead to superior performance over much deeper networks. Of course, studying how such approaches work with even deeper networks is a direction for future research. The next section details the method used for corruption and training of the algorithm.

For corrupting the images, we add salt and pepper noise to each of the images in the input and also mask random blocks by replacing the pixels in these blocks with the mean pixel value, i.e., 0.5. The mean-initialized values are then imputed using our model. Salt and pepper noise, also known as impulse noise, is a form of sparsely occurring noise in an image signal where randomly selected pixels are set to black or white [15], [16]. The learning rate used for all our experiments is 0.01 and the optimizer is Adam. The pre-processed corrupt data is then fed as input to the neural network. We train for 50 epochs and then test on a different set of corrupt images, which we call our test dataset. We use an optimization similar to that used by RDAEs, wherein we use a combination of backpropagation and proximal gradients, as suggested in [4]. The ℓ_1 norm can be optimized efficiently through the use of a proximal method, as given in [4]. Below is the pseudo-code of the algorithm that is used to train our model. For additional details refer to [14].
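The corruption process just described can be sketched as follows. The function name, the per-image noise count, and the seed handling are illustrative assumptions; pixel values are assumed to be normalized to [0, 1], as in our pre-processing.

```python
import numpy as np

def corrupt(images, n_noise=70, block=10, fill=0.5, seed=0):
    """Apply salt and pepper noise to n_noise random pixels per image, then
    mask one random block x block region with the mean pixel value (0.5)."""
    rng = np.random.default_rng(seed)
    X = images.copy()
    n, h, w = X.shape
    for i in range(n):
        # Impulse noise: randomly chosen pixels forced to black (0) or white (1).
        idx = rng.choice(h * w, size=n_noise, replace=False)
        X[i].flat[idx] = rng.integers(0, 2, size=n_noise).astype(X.dtype)
        # Coherent corruption: a missing block initialized to the mean value,
        # to be imputed by the network later.
        r = rng.integers(0, h - block + 1)
        c = rng.integers(0, w - block + 1)
        X[i, r:r + block, c:c + block] = fill
    return X

rng = np.random.default_rng(1)
clean = rng.random((5, 28, 28))   # a small batch of MNIST-sized images
noisy = corrupt(clean)
```

The masked block is filled with 0.5 purely as an initialization; as noted above, these entries are ignored by the cost function and overwritten by the imputed values.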
Algorithm 1: Proposed Training Method

Input: X ∈ R^{m×n}, the corrupt image. We define L_S = X as the initial state, which will be updated during training.
Initialize L_D ∈ R^{m×n} and S ∈ R^{m×n} to be zero matrices. Here L_D is the low-rank matrix and S is the sparse matrix.
while true do
  • Provide the autoencoder with X as the input for training.
  • Minimize ||(X − S) − D(E(X))||_2 using backpropagation.
  • Set L_D to be the reconstruction from the trained autoencoder: L_D = D(E(X)).
  • Set S to be the difference between X and L_D: S = X − L_D.
  • Optimize S using a proximal operator: S = prox_{λ,ℓ_{2,1}}(S) or S = prox_{λ,ℓ_1}(S).
  • Check the convergence condition that L_D and S are close to the input X, thereby satisfying the constraint: c_1 = ||X − L_D − S||_2 / ||X||_2.
  • Check the convergence condition that L_D and S have converged to a fixed point: c_2 = ||L_S − L_D − S||_2 / ||X||_2.
  • if c_1 < ε or c_2 < ε: break
  • Update L_S for convergence checking in the next iteration: L_S = L_D + S.
end
Return L_D and S.

VI. R
ESULTS
Table I shows a comparison between all the modelsmentioned above including SAE, RDAEs, VAEs and CE withour model for the task of image denoising and inpainting onthe CIFAR10 dataset.
TABLE I
Comparison of all models on the task of simultaneous image inpainting and denoising on the CIFAR10 dataset.
Corruption SAE VAE RDAE CE RHA
10 0.02992 0.06718 0.01895 0.01763
70 0.05672 0.10521 0.04795 0.06176
350 0.27894 0.19901 0.15712 0.45005
We have used three values of corruption to demonstrate the superior performance of our model as compared to any of the other models on the task of denoising and inpainting simultaneously. The corruption column indicates the number of pixels that are corrupted by the addition of random salt and pepper noise. Along with this, we also add random 10x10 blocks of missing regions to all the input images. The choice of corruption values shown in the table is based on using the smallest, largest, and an intermediate value of corruption, to show that our model outperforms the other methods across a wide range of corruption values. For the methods that have a λ parameter in their objective function, we use the λ that gives the best performance, i.e., the λ value with the smallest RMSE, as the final metric reported in the table.

TABLE II
Comparison of all models on the task of simultaneous denoising and inpainting with different sizes of missing data on the CIFAR10 dataset.

Corruption Size SAE VAE RDAE CE RHA
70 5x5 0.03887 0.06183 0.03896 0.04523
70 10x10 0.05672 0.10521 0.04795 0.06176
70 20x20 0.23886 0.37617 0.19521 0.17782
Table II summarizes a comparison of all the models on the task of denoising and inpainting simultaneously on the CIFAR10 dataset. Here, we fix the level of corruption and test how well our model compares to the state-of-the-art techniques on different-sized blocks of missing data. Corruption indicates the level of corruption in terms of the addition of salt and pepper noise. The size column indicates the block sizes of missing data. We can see that, across multiple sizes of missing data, our model shows superior performance compared to any of the other techniques in terms of the reconstruction error between the original image and the image recovered using the respective models. Now we demonstrate the results of our model on both the MNIST and CIFAR10 datasets [17], [18]. For the grayscale MNIST dataset, each image has a shape of 28x28, whereas for the CIFAR10 colored dataset each image has a shape of 32x32x3, owing to the three color channels. The metric used for comparison in each of the results shown is the percentage difference in Root Mean Squared Error (RMSE) between the two models. It is given by the formula:
    Percentage (%) difference in RMSE = [(RMSE(model A) − RMSE(model B)) / RMSE(model A)] × 100.   (9)

Here model A refers to the model that we are comparing against and model B refers to our model. Thus, the higher the difference between the two, the better our model performs. In the colormaps shown, the corruption level indicates the number of pixels that are corrupted with salt and pepper noise in each image. For example, in the MNIST dataset a corruption level of 10 indicates that 10 out of 784 pixels are corrupt in every image. In addition, for inpainting and denoising simultaneously, we use random blocks of size 10x10 pixels and replace them with the mean value, i.e., 0.5, in addition to the salt and pepper corruption, before feeding the corrupt images as input to the neural network. In order to have a fair comparison among the methods, we use the same amount of corruption and maintain a constant size of the missing blocks, i.e., 10x10, in all the models. To evaluate how RHA performs under different sizes of missing data, we also tested it under the conditions of different sizes of random missing blocks. The results for this are given in Table II.

A. Comparison with RDAEs for denoising
Here we compare the denoising capabilities of our model with those of RDAEs. Figure 2 shows a comparison of our model with the RDAE model for the task of denoising on the MNIST test dataset. It can be seen that with the right selection of the parameter λ, our model is able to outperform the RDAE model across all levels of corruption. The code used for the RDAEs can be obtained at github.

Fig. 2. Comparison of RHA with RDAEs for the task of denoising on the MNIST dataset. The metric used for comparison in this case is the Root Mean Squared Error (RMSE) between the reconstructed image and the noise-free image. We calculate the percentage (%) differences between the RDAE model [4] and RHA for different values of corruption and λ on the test dataset. Thus, the higher the difference between the two models, the better our model performs. It can be seen that with the right selection of the parameter λ, our model is able to outperform the RDAE model across all levels of corruption.

B. Comparison with Context Encoder (CE) for inpainting
Figure 3 shows a comparison of our model with that of a state-of-the-art inpainting algorithm, namely the Context Encoder (CE), which essentially uses a GAN model for inpainting missing blocks in the CIFAR10 dataset. It can be seen from the comparison of the models, showing the inpainted images in the third row, that our model outperforms the CE algorithm despite not having any information about the missing regions in the data, as opposed to the CE model. In the CE framework, the missing regions are regressed onto the ground-truth version of the image during training.
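The comparisons in this and the following subsections are reported via the RMSE and the percentage difference of Eq. 9. Computing the metric is straightforward; the inputs below are illustrative placeholders, not values from our experiments.

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two images or batches."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def pct_diff_rmse(rmse_a, rmse_b):
    """Eq. 9: percentage improvement of model B (ours) over model A.
    Positive values mean model B reconstructs more accurately."""
    return (rmse_a - rmse_b) / rmse_a * 100.0

truth = np.zeros((4, 4))
recon_a = truth + 0.2     # baseline model's reconstruction (illustrative)
recon_b = truth + 0.1     # our model's reconstruction (illustrative)
improvement = pct_diff_rmse(rmse(truth, recon_a), rmse(truth, recon_b))
```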
C. Comparison with a Standard Autoencoder (SAE)
In order to assess the denoising and inpainting capabilities of a standard autoencoder, owing to its ability to go low-dimensional in the hidden layers, we compare the performance of an SAE with our model. Figure 4 shows a comparison of our model with an SAE across multiple values of λ and corruption, in terms of the RMSE on the MNIST dataset. It can be seen that our model outperforms the SAE in terms of the reconstruction error across all levels of corruption, as seen on the colormap. The SAE, although faithful to the original input, incorrectly reproduces the noisy data.

D. Comparison with RDAEs for the task of denoising and inpainting simultaneously
Figure 5 shows a comparison of our model with the RDAE model in the presence of noise and missing data on the MNIST dataset. It can be seen that we outperform the RDAE model in terms of RMSE across all levels of corruption. Here, the higher the difference, the better our model performs compared to the RDAE model.
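The reconstructions compared here come from the alternating procedure of Algorithm 1. A minimal runnable sketch is below; as a simplifying assumption, a rank-k SVD projection stands in for the trained autoencoder D(E(·)) so that no deep-learning stack is needed, and the data sizes and λ are illustrative.

```python
import numpy as np

def soft_threshold(M, tau):
    """prox_{tau, l1}: element-wise shrinkage used to update S."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rank_k_recon(M, k=3):
    """Stand-in for the trained autoencoder D(E(M)): a rank-k SVD projection."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def train_rha(X, lam=0.5, eps=1e-6, max_iter=100):
    """Alternate between fitting the low-rank part and shrinking the sparse
    part, mirroring Algorithm 1 (the SVD stand-in replaces backpropagation)."""
    L_S = X.copy()
    S = np.zeros_like(X)
    for _ in range(max_iter):
        L_D = rank_k_recon(X - S)            # "fit" to the de-noised target
        S = soft_threshold(X - L_D, lam)     # proximal update of S
        c1 = np.linalg.norm(X - L_D - S) / np.linalg.norm(X)
        c2 = np.linalg.norm(L_S - L_D - S) / np.linalg.norm(X)
        if c1 < eps or c2 < eps:
            break
        L_S = L_D + S                        # fixed-point check for next pass
    return L_D, S

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 30))
spikes = np.zeros((30, 30))
idx = rng.choice(900, size=15, replace=False)
spikes.flat[idx] = 6.0                       # planted salt-and-pepper-style outliers
L_D, S = train_rha(low_rank + spikes)
```

The proximal step guarantees that every entry of X − L_D − S is no larger than λ in magnitude, which is why tuning λ controls how aggressively noise is pushed into S.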
E. Comparison with the CE model for the task of denoising and inpainting simultaneously
Figure 6 shows a comparison between our model and a state-of-the-art inpainting algorithm, namely the CE model, in terms of RMSE on the CIFAR10 dataset. It can be seen that although the CE model shows almost comparable performance with ours for the task of inpainting alone, the network does not perform well with the addition of noise, and the generator fails to reconstruct the clean images. The architecture of the CE network is the same as given in [9], [19], with the change that we add random salt and pepper noise to the input images, along with random 10x10 blocks of missing data, before training the generator network [16]. It can be seen that as the corruption level increases, our model significantly outperforms the CE model in terms of reconstruction quality. The architecture that we used for the CE model can be found at github.

F. Comparison with a Variational Autoencoder (VAE) on the task of image inpainting and denoising simultaneously
Variational Autoencoders (VAEs) are generative models that are modifications of vanilla autoencoders. VAEs inherit the architecture of a standard vanilla autoencoder, which allows them to sample randomly from the latent space and generate samples from the same probability distribution as the input data [20]. Recent advances in the field have used the VAE network for the task of image denoising [21]. Im et al. [21] use the VAE network for the task of image denoising by adding corruptions to the image before providing it as input to the network. Figure 7 shows a comparison of our model with a VAE model on the MNIST dataset. Following the same corruption process described earlier, we corrupt the input to the VAE model with random salt and pepper noise and also add missing 10x10 blocks of pixels at random. It can be seen that our model outperforms the VAE model in terms of denoising and inpainting simultaneously across the lower levels of corruption shown, while showing comparable performance at higher corruption levels. The corruption level here indicates the number of pixels that have been corrupted by random salt and pepper noise. It is important to note here that the VAE model input is corrupted; however, it is trained and validated using clean data. The architecture that we used for the VAE comparison can be found at github.

Fig. 3. Comparison of our model with the Context Encoder (CE) [19] for image inpainting on the CIFAR10 dataset. This comparison uses RMSE as a metric between the inpainted image and the original image. It can be seen that our model outperforms the CE model in the task of image inpainting. It is important to note that in the case of the CE model, the original image is used as a reference for training the generator in the GAN network, whereas our model does not use the original image as a reference and fills in the missing regions based only on the surrounding pixel information.

Fig. 4. Comparison of our model with an SAE in the presence of noise and missing blocks of data in the MNIST dataset. The metric used for comparison is the percentage (%) difference in RMSE between the reconstructed image and the original noise-free image. It can be seen from the colormap that we outperform the SAE across all corruption levels shown.

G. Reconstruction of the noisy MNIST images using RHA
Figure 8 shows the reconstruction of the MNIST dataset images using our RHA model, in the presence of salt and pepper noise and random 10x10 blocks of missing data. The image on the left is X, which is the corrupt input to the neural network, and the image on the right is the low-rank image reconstruction that is learned by the neural network for the test dataset.

H. Results on CIFAR10 data using RHA
In Figure 9, the first row shows the original image; the second row is the masked image with the addition of random salt-and-pepper noise [16], which is fed as the input to the neural network; the third row is the low-rank matrix learned by the neural network; and the fourth row is the sparse matrix in which the noise is filtered out. The figure shows the low-rank and sparse matrices learned by the neural network for different values of λ. It can be seen that for low values of λ, the penalty to filter out the noise in the sparse matrix is very small and the normal data is filtered out in the sparse matrix along with the noise. Similarly, for very high values of λ, the penalty to filter out the noise and outliers in the sparse matrix is very high and the noise fails to be filtered out in the sparse matrix, leading to poor reconstruction of the clean low-rank image. Hence, it is important to tune λ and find the right value of the penalty so that only the noise is filtered out in the sparse matrix, while the clean data is retained in the low-rank matrix. Thus, intermediate values of λ are suitable and filter out the right amount of data in the sparse matrix, generating a low-rank matrix reconstruction that is more faithful to the original noise-free image.

Code: https://github.com/zc8340311/RobustAutoencoder; https://github.com/eriklindernoren/Keras-GAN/tree/master/context_encoder; https://github.com/lyeoni/keras-mnist-VAE

Fig. 5. Comparison of our model with the RDAEs model on the task of denoising and inpainting simultaneously on the MNIST dataset. The metric used for comparison is the RMSE between the original noise-free image and the reconstructed image from each model; the colormap shows the percentage (%) difference in RMSE between the RDAEs model and our model. Thus, the higher the difference, as indicated by the positive values on the colormap, the better our model performs. The results on the test dataset show that we outperform the RDAEs on the task of denoising and inpainting simultaneously for all levels of corruption in the testing data when the right value of the parameter λ is chosen.

Fig. 6. Comparison of our model with the CE in the presence of noise and missing blocks of data on the CIFAR10 dataset. Despite using a clean, noise-free version of the image as the reference for training the generator in the GAN network, the CE model fails to reconstruct the clean image when noise is added to the input. Our model shows superior performance and outperforms the CE model when the same amount of noise is added to both models. The metric used for comparison is the RMSE between the original noise-free image and the reconstructed image from each model. Although the differences are not significant at lower levels of corruption, our model notably outperforms the CE model as the corruption level increases. The colormap indicates the percentage difference in RMSE between the CE model and our model; the higher the difference, the better our model performs compared to the CE model. The architecture used to train the CE model is as given in [19].

Fig. 7. Comparison of our model with a variational autoencoder (VAE) network on the reconstruction of the MNIST images in the presence of noise and random 10x10 blocks of missing data. The VAE architecture and loss function are kept the same as a standard VAE [10], with the modification that the input to the network is corrupted with random salt-and-pepper noise and missing data blocks before training. As can be seen from the colormap, our model handles the reconstruction of the image, in the presence of both random noise and missing blocks of data, better than the VAE model at lower levels of corruption, while showing comparable performance at higher levels. The colormap indicates the percentage (%) difference in RMSE between the VAE model and our model; the higher the difference, the better our model performs. It should be noted that the VAE model is validated on a clean test dataset.

I. Results on the CIFAR10 dataset using the CE model
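The role of λ in the preceding discussion can be made concrete: in the proximal-gradient training used by RDAE-style models [4], [14], the sparse matrix S is updated by elementwise soft-thresholding of the residual between the input and the low-rank reconstruction, so a small λ lets nearly everything leak into S while a large λ forces S toward zero. A minimal sketch (the function name and toy residual are ours, for illustration only):

```python
import numpy as np

def soft_threshold(residual, lam):
    """Proximal operator of lam * ||S||_1: shrink each entry of the
    residual toward zero by lam, zeroing entries smaller than lam."""
    return np.sign(residual) * np.maximum(np.abs(residual) - lam, 0.0)

residual = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
# Intermediate lam keeps only large (noise-like) entries in S:
print(soft_threshold(residual, 1.0))   # → [-1.  0.  0.  0.  2.]
# Very large lam forces S to all zeros, so no noise is filtered out:
print(soft_threshold(residual, 10.0))  # → [-0. -0.  0.  0.  0.]
```

With λ near zero the threshold vanishes and S absorbs essentially the whole residual, matching the behavior seen in the figure.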
The poor reconstruction quality of the CIFAR10 data using the CE network is seen in Figure 10. We regress the entire corrupt image onto the ground-truth version to evaluate the performance of the CE model.

VII. CONCLUSIONS
In this paper we demonstrate a novel approach for denoising as well as image inpainting, in which we develop a testable version of the Robust Deep Autoencoders. We have also extended the capabilities of the model to handle coherent corruptions, such as random blocks of missing values in a dataset, by imputing them based on the values of the surrounding pixels. This has proven useful for missing value imputation in manufacturing datasets, where we often have to deal with random blocks of missing process or chemistry data and cannot always use naïve imputation techniques such as mean imputation [22]–[24]. Moreover, in such scenarios we deal with a small dataset and do not want to reduce its size even further by simply eliminating the rows that lack the complete set of information. Another important application of our model is imputing and making predictions on graph network data, for example in social networks.

VIII. SCOPE FOR FUTURE WORK
We wish to extend this framework to anomaly detection in the future. We want to test the performance of the autoencoder network on anomaly detection in the presence of random missing blocks of data of different sizes and compare it to existing state-of-the-art approaches. In our experiments, we used the mean value to replace the pixels in the missing regions before providing the corrupt image as input to the autoencoder. In the future, we also want to experiment with different initialization values, such as 0, -1, and random values, to find the most suitable starting value before training the autoencoder for imputation, and to evaluate the robustness of our model to different initializations. In addition, the above optimization was solved using the same technique as the RDAEs model [4], namely a combination of backpropagation and proximal gradients [14], [25], [26]. We would also like to provide a theoretical justification for the failure of optimizing the model using gradient descent alone, as well as convergence rates for the type of alternating methods we use. As noted in our initial experiments, ℓ1-norm optimization using backpropagation alone fails to filter out the anomalies or noise. This would be a direction for future work and investigation.

IX. ACKNOWLEDGEMENTS
We would like to thank the Advanced Casting Research Center at WPI for providing us with the funding for this project. We would also like to thank Prof. Diran Apelian for his support with this work.

Fig. 8. A sample reconstruction of the MNIST images using the RHA model. It can be seen that our model successfully inpaints and eliminates the noise simultaneously to develop a noise-free and complete version of the image.

Fig. 9. Results on CIFAR10 data using our RHA model in the presence of noise and random missing blocks of data of size 10x10. The first row shows the original image; the second row shows the noisy image with masking in the regions that are missing. The third and fourth rows are the low-rank and sparse reconstructions of the noisy images learned by our model. It can be seen that, depending on the value of λ, our model is able to filter out noise in the sparse matrix and generate a low-rank representation of the image, while at the same time inpainting the missing regions in the data. For low values of λ we impose a very small penalty on the sparse matrix and the model filters out everything in the sparse matrix. For very high values of λ, we impose a high penalty on the sparse matrix and the model fails to filter out any noise in the sparse matrix. Thus, it is important to tune λ so that the noise is filtered out in the sparse matrix, while at the same time the original noise-free and complete data is retained in the low-rank matrix.

Fig. 10. Results on CIFAR10 data using the CE model when noise is added to the input along with blocks of missing data. We add salt-and-pepper noise before training the model. It can be seen that the model cannot handle the noise in the input and is unable to reconstruct the noise-free image on the task of denoising and inpainting simultaneously. The noisy image is regressed onto the ground truth while training the generator in the GAN network.

REFERENCES

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[2] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[3] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096–1103.
[4] C. Zhou and R. C. Paffenroth, "Anomaly detection with robust deep autoencoders," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 665–674.
[5] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM (JACM), vol. 58, no. 3, pp. 1–37, 2011.
[6] R. C. Paffenroth, P. C. Du Toit, L. L. Scharf, A. P. Jayasumana, V. Banadara, and R. Nong, "Space-time signal processing for distributed pattern detection in sensor networks," in Signal and Data Processing of Small Targets 2012, vol. 8393. International Society for Optics and Photonics, 2012, p. 839309.
[7] R. A. Horn, "The Hadamard product," in Proc. Symp. Appl. Math., vol. 40, 1990, pp. 87–169.
[8] J. Tóth, A. L. Nagy, and D. Papp, "Mathematical background," in Reaction Kinetics: Exercises, Programs and Theorems. Springer, 2018, pp. 359–379.
[9] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, "Context encoders: Feature learning by inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2536–2544.
[10] Variational autoencoder architecture. [Online]. Available: https://github.com/lyeoni/keras-mnist-VAE
[11] O. Lyudchik, "Outlier detection using autoencoders," Aug. 2016. [Online]. Available: http://cds.cern.ch/record/2209085
[12] Y. Ma, P. Zhang, Y. Cao, and L. Guo, "Parallel auto-encoder for efficient outlier detection." IEEE, 2013, pp. 15–17.
[13] G. López, V. Martín-Márquez, F. Wang, and H.-K. Xu, "Solving the split feasibility problem without prior knowledge of matrix norms," Inverse Problems, vol. 28, no. 8, p. 085004, 2012.
[14] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[15] B. Efron, "Missing data, imputation, and the bootstrap," Journal of the American Statistical Association, vol. 89, no. 426, pp. 463–475, 1994.
[16] R. H. Chan, C.-W. Ho, and M. Nikolova, "Salt-and-pepper noise removal by median-type noise detectors and detail-preserving regularization," IEEE Transactions on Image Processing, vol. 14, no. 10, pp. 1479–1485, 2005.
[17] L. Deng, "The MNIST database of handwritten digit images for machine learning research [best of the web]," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
[18] A. Krizhevsky and G. Hinton, "Convolutional deep belief networks on CIFAR-10," Unpublished manuscript, vol. 40, no. 7, pp. 1–9, 2010.
[19] Context encoder architecture. [Online]. Available: https://github.com/eriklindernoren/Keras-GAN
[20] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin, "Variational autoencoder for deep learning of images, labels and captions," in Advances in Neural Information Processing Systems, 2016, pp. 2352–2360.
[21] D. I. J. Im, S. Ahn, R. Memisevic, and Y. Bengio, "Denoising criterion for variational auto-encoding framework," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[22] L. Monostori, "AI and machine learning techniques for managing complexity, changes and uncertainties in manufacturing," Engineering Applications of Artificial Intelligence, vol. 16, no. 4, pp. 277–291, 2003.
[23] B. Komer, J. Bergstra, and C. Eliasmith, "Hyperopt-sklearn: Automatic hyperparameter configuration for scikit-learn," in ICML Workshop on AutoML, vol. 9. Citeseer, 2014.
[24] K. Lakshminarayan, S. A. Harp, R. P. Goldman, T. Samad et al., "Imputation of missing data using machine learning techniques," in KDD, 1996, pp. 140–145.
[25] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[26] L. Vinet and A. Zhedanov, "A 'missing' family of classical orthogonal polynomials,"