A Generative Model Based Adversarial Security of Deep Learning and Linear Classifier Models

A Preprint
Ferhat Ozgur Catak
Simula Research Laboratory, Oslo, Norway
[email protected]

Samed Sivaslioglu
TUBITAK BILGEM, Kocaeli, Turkey
[email protected]

Kevser Sahinbas
Department of Management Information Systems, Istanbul Medipol University, Istanbul, Turkey
[email protected]
October 20, 2020

Abstract
In recent years, machine learning algorithms have been applied widely in various fields such as health, transportation, and autonomous driving. With the rapid development of deep learning techniques, it is critical to take security concerns into account when these algorithms are deployed. While machine learning offers significant advantages, the issue of security is often ignored. Since these algorithms have many real-world applications, security is a vital part of them. In this paper, we propose a mitigation method against adversarial attacks on machine learning models that uses an autoencoder, which is a generative model. The main idea behind adversarial attacks against machine learning models is to produce erroneous results by manipulating trained models. We also present the performance of autoencoder models against various attack methods, from deep neural networks to traditional algorithms, using non-targeted and targeted attacks on multi-class logistic regression and the fast gradient sign method, the targeted fast gradient sign method, and the basic iterative method on neural networks, all on the MNIST dataset.
With the help of artificial intelligence technology, machine learning has been widely used in classification, decision making, voice and face recognition, games, financial assessment, and other fields [1, 2]. Machine learning methods consider players' choices in the animation industry for games and analyze diseases to contribute to decision-making mechanisms [3-6]. With the successful implementation of machine learning, attacks on the machine learning process, counter-attack methods, and increasing the robustness of learning have become hot research topics in recent years [7-11]. The presence of negative data samples or an attack on the model can lead to incorrect predictions and classifications, even in advanced models. It is more challenging to recognize an attack in machine learning applications than in other cybersecurity fields because of the use of big data. Therefore, it is essential to create components for machine learning that are resistant to this type of attack. However, recent works conducted in this area have demonstrated that the existing resistance is not very robust to attacks [12, 13].
Previous methods have shown success against specific sets of attack methods but have generally failed to provide complete and generic protection [14]. This field has been spreading rapidly, and many dangers have attracted increasing attention, from evading the filters for unwanted and phishing e-mails to poisoning the sensor data of a self-driving car or aircraft [15, 16]. Disaster scenarios can occur if no precautions are taken in these systems [17].

The main contribution of this work is to explore autoencoder-based generative models against adversarial machine learning attacks on models. Adversarial machine learning has been used to study these attacks and reduce their effects [18, 19]. Previous works point out the fundamental equilibrium in designing algorithms and aim to create new algorithms and methods that are resistant and robust against attacks that would negatively affect this balance. However, most of these works have been implemented successfully only for specific situations. In Section 3, we present some applications of these works.

This work aims to propose a method that not only provides resistance to specific attack methods but also provides robustness to machine learning models in general. Our goal is to find an effective method that can be used by model trainers. For this purpose, we process the data with an autoencoder before it reaches the machine learning model. In our previous works [20, 21], we applied a generative model based mitigation approach to attacks on deep learning models.

We use non-targeted and targeted attacks on multi-class logistic regression models to observe the change and difference between attack methods, as well as various attack methods on neural networks such as the fast gradient sign method (FGSM), the targeted fast gradient sign method (T-FGSM), and the basic iterative method (BIM). We select the MNIST dataset, which consists of handwritten digits, so that readers can easily understand and see the changes in the data.

The study is organized as follows. In Section 2, we present the related works. In Section 3, we introduce several adversarial attack types, attack environments, and autoencoders. In Section 4, we present the selection of the autoencoder model, activation function, and tuning parameters. In Section 5, we provide observations on the robustness of autoencoders for adversarial machine learning with different machine learning algorithms and models. In Section 6, we conclude this study.
In recent years, with the increase in machine learning attacks, various studies have proposed defensive measures against these attacks. Data sterility and learning endurance are recommended as countermeasures when defining a machine learning process [18]. Most of the studies in this field have focused on specific adversarial attacks and have generally presented theoretical discussions of the adversarial machine learning area [22, 23].

Bo Li and Yevgeniy Vorobeychik study binary domains and classifications. Their approach starts with mixed-integer linear programming (MILP) with constraint generation and builds suggestions on top of it. They also use the Stackelberg game multi-adversary model algorithm and another algorithm that feeds the generated adversarial examples back into the training model, called RAD (Retraining with Adversarial Examples) [24]. On the other hand, their work is particular and works only for specific methods, even though it is presented as a general protection method. Nevertheless, the method they propose achieves successful results. Similarly, Xiao et al. provide a method to speed up adversarial robustness verification around the rectified linear unit (ReLU) [25]. They use weight sparsity and ReLU stability for robust verification. It can be said that their methodology does not provide a general approach.

Yu et al. propose a study that evaluates a neural network's features under hostile attacks. In their study, the connection between the input space and hostile examples is presented, as well as the connection between the network strength and the decision surface geometry as an indicator of the hostile strength of the neural network. By extending the loss surface to the decision surface, together with various other methods, they provide adversarial robustness through the decision surface. However, the geometry of the decision surface cannot be demonstrated most of the time, and there is no explicit decision boundary between correct and wrong predictions. Robustness can be increased by constructing a good model, but it changes with attack intensity [26].

Madry et al. investigate adversarially resistant artificial neural networks, increase accuracy rates with different methods, mainly optimization, and show that more robust machine learning models are possible [14].

Pinto et al. provide a method to address this problem with reinforcement learning. In their study, they formulate learning as a zero-sum, minimax objective function. They show that machine learning models that are more resistant to disturbances that are hard to model during training also cope better with differences between training and test conditions.
They generalize reinforcement learning for machine learning models, proposing "Robust Adversarial Reinforcement Learning" (RARL), where they train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. However, in their work, RARL may overfit, and sometimes it may mispredict even without any adversary present [27].

Carlini and Wagner show that the self-logic and the strength of a machine learning model can be affected by a strong attack. They prove that these types of attacks can often be used to evaluate the effectiveness of potential defenses, and they study defensive distillation as a general-purpose procedure intended to increase robustness [12].

Harding et al. similarly investigate the effects of hostile samples produced by targeted and non-targeted attacks on decision making. They show that non-targeted samples are more effective than targeted samples in human perception and categorization decisions [28].

Bai et al. present a convolutional autoencoder model with adversarial decoders to automate the generation of adversarial samples. They produce adversarial examples with a convolutional autoencoder model, using pooling computations and sampling tricks; an adversarial decoder then automates the generation of adversarial samples. Adversarial sampling is useful, but it cannot provide adversarial robustness on its own, and the sampling tricks are too specific [29].

Sahay et al. apply the FGSM attack and use an autoencoder to denoise the test data. The autoencoder is trained with both corrupted and healthy data, and the dimension of the denoised data is then reduced. These autoencoders are specifically designed to compress data effectively and reduce dimensions. Hence, the approach may not be wholly generalizable, and training with corrupted data requires a lot of adjustment to obtain better test results [30].

I-Ting Chen et al. also apply the FGSM attack to denoising autoencoders. They analyze the attacks from the perspective that they can be applied stealthily. They use autoencoders to filter the data before it is applied to the model and compare this with the model without an autoencoder filter. Their autoencoders focus mainly on the stealth aspect of these attacks and are used specifically against FGSM with specific parameters [31].

Gondim-Ribeiro et al. propose attacks on autoencoders. In their work, they attack three types of autoencoders: simple variational autoencoders, convolutional variational autoencoders, and DRAW (Deep Recurrent Attentive Writer). They propose a scheme to attack autoencoders, but as they accept, "No attack can both convincingly reconstruct the target while keeping the distortions on the input imperceptible." This method cannot be used to achieve robustness against adversarial attacks [32].

Table 1 summarizes the strengths and weaknesses of each paper.
In this section, we consider attack types, data poisoning attacks, model attacks, attack environments, and autoencoders.
Machine learning attacks can be categorized into data poisoning attacks and model attacks; the difference between the two lies in what they influence. Data poisoning attacks mainly focus on influencing the data, while model evasion attacks influence the model to obtain the desired attack outcomes. Both aim to disrupt the machine learning structure, evading filters, causing wrong predictions, misdirection, and other problems for the machine learning process. In this paper, we mainly focus on machine learning model attacks.
In machine learning, algorithms are trained and tested with datasets. Data poisoning has a significant impact on a dataset and can cause problems for the algorithm and confusion for developers. By poisoning the data, adversaries can compromise the whole machine learning process; hence, data poisoning can cause serious problems for machine learning algorithms.
Machine learning model attacks are the most common form of adversarial attacks, with evasion attacks being used most extensively in this category. Adversaries apply model evasion attacks for spam e-mails, phishing attacks, and executing malware code. Misclassification and misdirection also bring benefits to adversaries. In this type of attack, the attacker does not change the training data but disrupts or changes its own input data, diverting it from the training data distribution or making it appear safe. This study mainly concentrates on model attacks.
Table 1: Related Work Summary
Adversarial Machine Learning [18]. Strength: introduces the emerging field of adversarial machine learning. Weakness: discusses countermeasures against attacks without suggesting a method.

Evasion-Robust Classification on Binary Domains [24]. Strength: demonstrates methods that can be used on binary domains, based on MILP. Weakness: very specific about robustness, even though it is presented as a general method.

Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability [25]. Strength: uses weight sparsity and ReLU stability for robust verification. Weakness: does not provide a general approach or the universality suggested in the paper.

Interpreting Adversarial Robustness: A View from Decision Surface in Input Space [26]. Strength: by extending the loss surface to the decision surface and other various methods, they provide adversarial robustness through the decision surface. Weakness: the geometry of the decision surface cannot be shown most of the time, and there is no explicit decision boundary between correct and wrong predictions; robustness can be increased by constructing a good model, but it changes with attack intensity.

Robust Adversarial Reinforcement Learning [27]. Strength: generalizes reinforcement learning to machine learning models; RARL trains an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. Weakness: RARL may overfit and can sometimes mispredict even without any adversary present.

Alleviating Adversarial Attacks via Convolutional Autoencoder [29]. Strength: produces adversarial examples via a convolutional autoencoder model using pooling computations and sampling tricks, after which an adversarial decoder automates the generation of adversarial samples. Weakness: adversarial sampling is useful but cannot provide adversarial robustness on its own, and the sampling tricks are too specific.

Combatting Adversarial Attacks through Denoising and Dimensionality Reduction: A Cascaded Autoencoder Approach [30]. Strength: uses an autoencoder, trained with both corrupted and normal data, to denoise the test data and then reduces the dimension of the denoised data. Weakness: the autoencoders are specifically designed to compress data and reduce dimensions, so the approach may not be completely generalizable, and training with corrupted data requires a lot of adjustment.

A Comparative Study of Autoencoders against Adversarial Attacks [31]. Strength: uses autoencoders to filter data before it is applied to the model and compares this with the model without an autoencoder filter. Weakness: the autoencoders focus mainly on the stealth aspect of these attacks and are used specifically against FGSM with specific parameters.

Adversarial Attacks on Variational Autoencoders [32]. Strength: proposes a scheme to attack autoencoders and validates the experiments on three autoencoder models: simple, convolutional, and DRAW (Deep Recurrent Attentive Writer). Weakness: as the authors accept, "No attack can both convincingly reconstruct the target while keeping the distortions on the input imperceptible.", so it cannot provide robustness against adversarial attacks.

Understanding Autoencoders with Information Theoretic Concepts [33]. Strength: examines the data processing inequality with stacked autoencoders and two types of information planes, analyzing DNN learning from a joint geometric and information-theoretic perspective and emphasizing the role of pairwise mutual information in understanding DNNs with autoencoders. Weakness: the accurate and tractable estimation of information quantities from large data is a problem, since Shannon's definition and other information measures are hard to estimate, which severely limits the power of this analysis for machine learning algorithms.

Adversarial Attacks and Defences Competition [34]. Strength: Google Brain organized the NIPS 2017 competition to accelerate research on adversarial examples and the robustness of machine learning classifiers; Kurakin, Goodfellow, et al. present the structure and organization of the competition and the solutions developed by several of the top-placing teams. Weakness: we experimented with the proposed methods of this competition, but they do not provide a generalized solution for robustness against adversarial machine learning model attacks.

Explaining and Harnessing Adversarial Examples [35]. Strength: Goodfellow et al. make considerable observations about gradient-based optimization and introduce FGSM. Weakness: models may mislead for the efficiency of optimization; the paper focuses explicitly on identifying similar types of problematic points in the model.
There are two significant threat models for adversarial attacks: the white-box and black-box models.
Under the white-box setting, the internal structure, design, and application of the tested item are accessible to the adversaries. In this model, attacks are based on an analysis of the internal structure. It is also known as open box attacks. Programming knowledge and application knowledge are essential. White-box tests provide a comprehensive assessment of both internal and external vulnerabilities and are the best choice for computational tests.
In the black-box model, the internal structure and the software under test are hidden from the adversaries. It is also known as behavioral attacks. In these tests, the internal structure does not have to be known by the tester, and they provide a comprehensive assessment of errors. Without changing the learning process, black-box attacks allow changes to be observed as external effects on the learning process rather than as changes in the learning algorithm. In this study, the main reason for selecting this setting is the observation of the learning process.
Figure 1: Autoencoder Layer Structure

An autoencoder neural network is an unsupervised learning algorithm that takes inputs and sets the target values to be equal to the input values [33]. Autoencoders are generative models that apply backpropagation and can be trained without labels. While a supervised learning model is used in the form model.fit(X, Y), autoencoders work as model.fit(X, X). The autoencoder learns an approximation of the identity function so that its output x' corresponds to the input x. The identity function seems a trivial function to learn; however, placing restrictions on the network, such as limiting the number of hidden units, reveals interesting structure in the data [33]. Autoencoders are neural networks with an input layer, hidden layers, and an output layer, but instead of predicting Y as in model.fit(X, Y), they reconstruct X as in model.fit(X, X). Because this reconstruction is unsupervised, autoencoders are unsupervised learning models. The structure consists of an encoder and a decoder. We define the encoding transition as \phi and the decoding transition as \psi:

\phi : \mathcal{X} \rightarrow \mathcal{F}, \qquad \psi : \mathcal{F} \rightarrow \mathcal{X}

\phi, \psi = \arg\min_{\phi, \psi} \lVert X - (\psi \circ \phi) X \rVert^2

With one hidden layer, the encoder takes the input x \in \mathbb{R}^d = \mathcal{X} and maps it to h \in \mathbb{R}^p = \mathcal{F}, where h is referred to as the latent variables. \sigma is an activation function such as ReLU or sigmoid, which were used in this study [36, 37]. b is the bias vector and W is the weight matrix; both are usually initialized randomly and then updated iteratively through training [38]:

h = \sigma(Wx + b)

After the encoder transition is completed, the decoder maps h to the reconstruction x':

x' = \sigma'(W'h + b')

where \sigma', W', and b' of the decoder are unrelated to \sigma, W, and b of the encoder. Autoencoders are trained to minimize the reconstruction loss L:

L(x, x') = \lVert x - x' \rVert^2 = \lVert x - \sigma'(W'(\sigma(Wx + b)) + b') \rVert^2

The loss function measures the reconstruction error, which is averaged over the training inputs x and minimized over several iterations. In conclusion, autoencoders can be seen as neural networks that reconstruct their inputs instead of predicting labels. In this paper, we use them to reconstruct our dataset inputs.
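To make the notation above concrete, the following is a minimal NumPy sketch of a single-hidden-layer autoencoder forward pass and its reconstruction loss; the dimensions, random initialization, and activation choices are illustrative assumptions, not the configuration used later in the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: d input features (a flattened 28x28 image), p latent units.
d, p = 784, 64
rng = np.random.default_rng(0)
W, b = rng.normal(scale=0.01, size=(p, d)), np.zeros(p)     # encoder parameters
W2, b2 = rng.normal(scale=0.01, size=(d, p)), np.zeros(d)   # decoder parameters

def encode(x):
    return relu(W @ x + b)            # h = sigma(W x + b)

def decode(h):
    return sigmoid(W2 @ h + b2)       # x' = sigma'(W' h + b')

def reconstruction_loss(x):
    x_rec = decode(encode(x))         # L(x, x') = ||x - x'||^2
    return float(np.sum((x - x_rec) ** 2))

x = rng.random(d)                     # one input vector with values in [0, 1]
print(reconstruction_loss(x))
```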
This section presents the selection of the autoencoder model, activation function, and tuning parameters.
In this paper, we select the MNIST dataset so that changes can be observed easily. Therefore, the layer sizes of the autoencoder model are chosen as 28 and its multiples to match the MNIST dataset, which represents each digit as a 28 by 28 matrix. Figure 2 presents the structure of the model. The MNIST data encoded with the autoencoder is presented in Figure 3. In the training of the model, the encoded data is used instead of the raw MNIST data. As the training method, multi-class logistic regression is selected, and the attacks are applied to this model. We train the autoencoder for 35 epochs. Figure 4 provides the process diagram.
Figure 2: Autoencoder Activation Functions. Note that layer sizes are given according to the dataset, which is the MNIST dataset.
Figure 3: Normal and Encoded Data Set of MNIST
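A rough sketch of the process described above (pass the data through the autoencoder, then train the multi-class logistic regression on the autoencoder's output) is given below; the autoencoder here is deliberately simplified, and the layer sizes, epoch count, and training subset are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.linear_model import LogisticRegression

# Load and flatten MNIST.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# A small stand-in autoencoder (the paper's architecture differs; see Figure 2).
inputs = keras.Input(shape=(784,))
h = layers.Dense(128, activation="relu")(inputs)
outputs = layers.Dense(784, activation="sigmoid")(h)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=1024, verbose=0)

# Pass the data through the autoencoder, then train the classifier on that output.
x_train_enc = autoencoder.predict(x_train, verbose=0)
x_test_enc = autoencoder.predict(x_test, verbose=0)
clf = LogisticRegression(max_iter=200)
clf.fit(x_train_enc[:20000], y_train[:20000])     # subset to keep the example fast
print("accuracy on encoded test data:", clf.score(x_test_enc, y_test))
```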
In machine learning and deep learning algorithms, the activation function is used for the computations between the hidden and output layers [39]. The loss values obtained with different activation functions are compared, and Figure 5 shows the comparison. Sigmoid and ReLU have the best performance among these functions and give the best results; sigmoid has higher losses at lower epochs than ReLU, but it eventually reaches better values. Therefore, we aim to use the best-performing activation function in each layer. The model with the lowest loss uses the ReLU function in the encoding part and the exponential and softplus functions in the decoding part, respectively. These functions are used in our study. Figure 6 illustrates the resulting loss history, and Figure 2 presents the structure of the model with the activation functions.
Figure 4: Process Diagram
Figure 5: Loss histories of different activation functions: (a) ReLU, (b) Sigmoid, (c) Softsign, (d) Tanh
The tuning parameters for autoencoders depend on the dataset we use and what we try to achieve. As previously mentioned, ReLU and the sigmoid function are selected as activation functions for our model [37, 39]. ReLU is used as the activation function throughout the autoencoder, while the exponential function and softplus are used at the end of the decoder, which yields the minimal loss. Figure 2 presents the input size as 784, since the MNIST dataset contains 28x28-pixel images [40]. The layer sizes of the encoding and decoding parts are derived from this input dimension.
Figure 6: Optimized ReLU Loss History

This structure is selected from various neural network structures that start from the square of the matrix size, reduce it, and finally return to the original dimension. The last hidden layer of the decoding part, with a size of 504, uses the exponential activation function, and the output layer, with a size of 784, uses the softplus activation function [41, 42]. We use the Adam optimizer with categorical cross-entropy [43, 44]. We find that a small number of epochs is enough for training, so we set the autoencoder epoch number to 35. This is the best epoch value to obtain meaningful accuracy results for both the model with the autoencoder and the model without it. At lower values, the accuracy scores of both models are too low to see the difference between them, even though some models are structurally stronger than others.
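A minimal Keras sketch of the configuration described here is shown below. The paper fixes the 784-unit input and output, ReLU in the encoding layers, a 504-unit last decoder hidden layer with the exponential activation, a softplus output layer, the Adam optimizer with categorical cross-entropy, and 35 epochs; the remaining intermediate layer sizes are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))                       # flattened 28x28 MNIST images

# Encoding part: ReLU activations throughout (intermediate sizes are assumed).
x = layers.Dense(392, activation="relu")(inputs)
x = layers.Dense(196, activation="relu")(x)

# Decoding part: ReLU, then exponential on the 504-unit last hidden layer
# and softplus on the 784-unit output layer, as described in the text.
x = layers.Dense(392, activation="relu")(x)
x = layers.Dense(504, activation="exponential")(x)
outputs = layers.Dense(784, activation="softplus")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="categorical_crossentropy")

# Training as described: 35 epochs, batches of 1024, reconstructing the inputs.
# autoencoder.fit(x_train, x_train, epochs=35, batch_size=1024)
```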
We examine the robustness of the autoencoder for adversarial machine learning with different machine learning algorithms and models, to show that autoencoding can be a generalized solution and an easy-to-use defense mechanism against most adversarial attacks. We use various linear machine learning models and neural network models against adversarial attacks.
In this section, we look at the robustness provided by autoencoding. We select a linear model and a neural network model to demonstrate this effectiveness. In these models, we also observe the robustness against different attack methods. We use the MNIST dataset for these examples.
For the linear machine learning model, we mainly use two attack methods: non-targeted and targeted attacks. The non-targeted attack is not concerned with how the machine learning model makes its predictions and simply tries to force the model into misprediction. Targeted attacks, on the other hand, focus on turning some correct predictions into specific mispredictions. We use three methods for targeted attacks: natural, non-natural, and one selected target. First, natural targets are derived from the most common mispredictions made by the machine learning model. For example, guessing number 5 as 8, and number 7 as 1, are common mispredictions. Natural targets take these non-targeted attack results into account and attack directly towards these most common mispredictions, so when the number 5 is seen, the attack tries to make it guessed as 8. Second, non-natural targeted attacks are the opposite of natural targeted attacks: they take the least common misprediction made by the machine learning model, using the feedback provided by the non-targeted attacks. For example, if number 1 is least often mispredicted as 0, the non-natural target for number 1 is 0. This lets us see how much the attack affects the machine learning model beyond its common mispredictions. Lastly, the one-targeted attack focuses on a single randomly chosen number: the aim is to make the machine learning model mispredict the same number for all inputs. For the linear classification, we select multi-class logistic regression to analyze the attacks. Because we do not interact with these linear classification algorithms beyond calling their defined functions from the scikit-learn library, we use a black-box environment for these attacks. In our study, the attack method against multi-class classification models developed for NIPS 2017 is used [34].
An epsilon value is used to determine the severity of the attack; we select 50 in this study to demonstrate the results more clearly. We apply a non-targeted attack to a multi-class logistic regression model trained on the MNIST dataset without an autoencoder. The confusion matrix of this attack is presented in Figure 9.
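For illustration only, a gradient-sign perturbation of this kind against a scikit-learn multinomial logistic regression model can be sketched as follows. This is not the exact NIPS 2017 implementation used in the paper: the gradient is read directly from the fitted coefficients (a simplification of the black-box procedure), and the epsilon of 0.2 roughly corresponds to the paper's value of 50 on the 0-255 pixel scale.

```python
import numpy as np
from tensorflow import keras
from sklearn.linear_model import LogisticRegression

# Fit a plain multi-class logistic regression on a subset of MNIST.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float64") / 255.0
x_test = x_test.reshape(-1, 784).astype("float64") / 255.0
clf = LogisticRegression(max_iter=200).fit(x_train[:10000], y_train[:10000])

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def non_targeted_attack(x, y, eps):
    """One gradient-sign step that increases the cross-entropy of the true class."""
    probs = softmax(x @ clf.coef_.T + clf.intercept_)   # model predictions
    grad = (probs - np.eye(10)[y]) @ clf.coef_          # d(loss)/d(x) for softmax cross-entropy
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

x_adv = non_targeted_attack(x_test[:100], y_test[:100], eps=0.2)
print("clean accuracy   :", clf.score(x_test[:100], y_test[:100]))
print("attacked accuracy:", clf.score(x_adv, y_test[:100]))
```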
Figure 7: Confusion matrix of the model without any attack and without autoencoder
Figure 8: Confusion matrix of the model without any attack and with autoencoder

The findings from Figures 9 and 10 show that an autoencoder model provides robustness against non-targeted attacks. The change of accuracy with epsilon is presented in Figure 13. Figure 11 illustrates the change and perturbation of the selected attack with an epsilon value of 50. We apply the non-targeted attack on the multi-class logistic regression model both with and without the autoencoder; Figure 13 shows the difference in the accuracy metric. A detailed graph of the non-targeted attack on the model with the autoencoder is presented in Figure 14. The changes in the MNIST dataset after the autoencoder are provided in Figure 3, and the value change and perturbation of an epsilon value of 50 on the data are indicated in Figure 12. The overall process is presented in Figure 4: in the examples with the autoencoder, the data is passed through the autoencoder and then given to the training model, in our current case a multi-class logistic regression classifier, which uses the encoded dataset for training. Figure 10 shows the improvement as a confusion matrix. For the targeted attacks, we select three methods. The first one is natural targets for the MNIST dataset, which are also defined in NIPS 2017 [34].
Figure 9: Confusion matrix of non-targeted attack to model without autoencoder
Figure 10: Confusion matrix of non-targeted attack to model with autoencoder
Figure 11: Value change and perturbation of a non-targeted attack on model without autoencoder
Figure 12: Value change and perturbation of a non-targeted attack on model with autoencoder
Figure 13: Comparison of accuracy with and without autoencoder for non-targeted attack

Natural targets take the non-targeted attack results into account and attack directly towards the most common mispredictions. For example, the natural target for number 3 is 8. We obtain these targets when we apply the non-targeted attack; the heat map of these numbers is shown in Figure 15. The second targeted attack method uses non-natural targets, which are the opposite of natural targets: we select the least mispredicted numbers as the targets. These numbers are also indicated in the heat map in Figure 15. The third method is to select one number and make all numbers be predicted as that number; we randomly choose 7 as the target. The targets for these methods are presented in Figure 16, and the confusion matrices for these methods are presented below.
We use neural networks with the same principles as with multi-class logistic regression and attack the machine learning model. For simplicity, we use the same structure, layers, activation functions, and epochs for these neural networks as in our autoencoder. Although this robustness will also work with other neural network structures, we do not demonstrate them in this study because structure designs vary between developers. We also compare the results of these attacks on both the raw MNIST data and the encoded MNIST data. As attack methods, we select three: FGSM, T-FGSM, and BIM. The Cleverhans library is used to apply these attack methods to the neural network, which is built with the Keras library.
Figure 14: Details of accuracy with autoencoder for non-targeted attack
Figure 15: Heatmap of actual numbers and mispredictions
Figure 16: Actual numbers and their target values for each targeted attack method (natural targets, non-natural targets, one number targeted)
Figure 17: Confusion matrix of natural targeted attack to model without autoencoder
Figure 18: Confusion matrix of natural targeted attack to model with autoencoder
Figure 19: Confusion matrix of non-natural targeted attack to model without autoencoder
Figure 20: Confusion matrix of non-natural targeted attack to model with autoencoder
Figure 21: Confusion matrix of one number targeted attack to model without autoencoder
Figure 22: Confusion matrix of one number targeted attack to model with autoencoder
Figure 23: Comparison of accuracy with and without autoencoder for targeted attacks. AE stands for the models with autoencoder, WO stands for models without autoencoder
Figure 24: Details of accuracy with autoencoder for targeted attacks

We examine the differences between the neural network model with the autoencoder and the neural network model that takes data directly from the MNIST dataset, using confusion matrices and classification reports. First, the model without the autoencoder gives the results shown in Figure 25 (confusion matrix and classification report); the results with the autoencoder are presented in Figure 26. Note that these confusion matrices and classification reports are obtained before any attack.
Figure 25: Confusion matrix and classification report of the neural network model without autoencoder
Figure 26: Confusion matrix and classification report of the neural network model with autoencoder
Fast Gradient Sign Method:
There is only a slight difference between the neural network models with and without the autoencoder before any attack. We apply the FGSM attack to both models. FGSM uses the gradients of the loss with respect to the input image to create a new image that maximizes the loss; in other words, the perturbation is generated directly from the input gradients. For this reason, FGSM causes a wide variety of models to misclassify their input [35].
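A minimal TensorFlow sketch of the FGSM step described above is shown below, written directly with GradientTape rather than through the Cleverhans wrappers used in the paper; `model` is assumed to be a Keras classifier that outputs class probabilities, and the epsilon value is a placeholder.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def fgsm(model, images, labels, eps):
    """One-step FGSM: move every pixel in the direction of the sign of the loss gradient."""
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(images)
        loss = loss_fn(labels, model(images))
    grad = tape.gradient(loss, images)
    adv = images + eps * tf.sign(grad)          # maximize the loss of the true class
    return tf.clip_by_value(adv, 0.0, 1.0)      # keep pixels in a valid range

# Example with placeholder names: x_adv = fgsm(model, x_test[:128], y_test[:128], eps=0.1)
```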
Figure 27: Confusion matrix and classification report of the neural network model without autoencoder after FGSM attack
Figure 28: Confusion matrix and classification report of the neural network model with autoencoder after FGSM attack

As we expect from the multi-class logistic regression results, the autoencoder also gives robustness to the neural network model. After the FGSM attack, the neural network without an autoencoder suffers an immense drop in its accuracy, so the FGSM works as intended, but the neural network model with the autoencoder only suffers a 0.01 percent accuracy drop.
Targeted Fast Gradient Sign Method:
There is a targeted type of FGSM, called T-FGSM. It uses the same principles, but the gradient step is computed with respect to a chosen target class so that different inputs are driven towards the same misprediction.
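A sketch of this targeted variant, using the same assumptions as the FGSM snippet above: the gradient of the loss towards one chosen target class is computed, and the step is taken against it so that every input is pushed towards the same misprediction.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def targeted_fgsm(model, images, target_class, eps):
    """One-step T-FGSM: step against the gradient of the target-class loss."""
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    targets = tf.fill([tf.shape(images)[0]], target_class)   # same target for every input
    with tf.GradientTape() as tape:
        tape.watch(images)
        loss = loss_fn(targets, model(images))
    grad = tape.gradient(loss, images)
    adv = images - eps * tf.sign(grad)          # minimize the loss of the target class
    return tf.clip_by_value(adv, 0.0, 1.0)

# Example with placeholder names: x_adv = targeted_fgsm(model, x_test[:128], target_class=5, eps=0.1)
```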
Figure 29: Confusion matrix and classification report of the neural network model without autoencoder after T-FGSM attack

In the confusion matrix, the target value for this attack is number 5. The neural network model with the autoencoder still reaches an accuracy of 0.98. The individual differences can be seen by comparing with Figure 26.
Figure 30: Confusion matrix and classification report of the neural network model with autoencoder after T-FGSM attack
Basic Iterative Method:
BIM is an extension of FGSM that applies the attack multiple times in small steps, recalculating the gradient at each iteration.
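A sketch of BIM built from the same one-step update: the step size, iteration count, and the clipping to an epsilon-ball around the original image are illustrative defaults, not the exact Cleverhans parameters used in the paper.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def bim(model, images, labels, eps, alpha, steps):
    """Basic Iterative Method: repeated small FGSM steps, clipped to an eps-ball."""
    x0 = tf.convert_to_tensor(images, dtype=tf.float32)
    adv = tf.identity(x0)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(adv)
            loss = loss_fn(labels, model(adv))
        grad = tape.gradient(loss, adv)
        adv = adv + alpha * tf.sign(grad)                   # one small gradient-sign step
        adv = tf.clip_by_value(adv, x0 - eps, x0 + eps)     # stay within the eps-ball
        adv = tf.clip_by_value(adv, 0.0, 1.0)               # stay a valid image
    return adv

# Example with placeholder names:
# x_adv = bim(model, x_test[:128], y_test[:128], eps=0.1, alpha=0.02, steps=10)
```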
Figure 31: Confusion matrix and classification report of the neural network model without autoencoder after basic iterative method attack
Figure 32: Confusion matrix and classification report of the neural network model with autoencoder after basic iterative method attack

This is the most damaging attack for the neural network model that takes its inputs directly from the MNIST dataset without an autoencoder. The findings from Figure 31 show that its accuracy drops to between 0.01 and 0.02. The neural network model with the autoencoder stays at 0.97 accuracy, losing only 0.1 percent. These findings indicate that autoencoding the dataset before giving it as input to linear models and neural network models improves robustness against adversarial attacks significantly. So far we have used vanilla autoencoders, i.e., basic autoencoders without modification. In the following sections, we apply the same attacks with the same machine learning models using different autoencoder types.
Sparse autoencoders present improved performance on classification tasks. A sparse autoencoder can have more hidden units than input units; the significant part is that only a small number of the hidden units are allowed to be active at once, which encourages sparsity. This constraint forces the training model to respond uniquely to the characteristics of the input and to use its statistical features. Because of this, sparse autoencoders include a sparsity penalty \Omega(h) in their training loss:

L(x, x') + \Omega(h)

This penalty makes the model activate specific areas of the network depending on the input data while keeping all other neurons inactive. We can create this sparsity via relative entropy, also known as the Kullback-Leibler divergence. Let

\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} [h_j(x_i)]

be the average activation of hidden unit j, averaged over the m training examples. To increase sparsity, in the sense of making the number of active neurons as small as possible, we want \rho to be close to zero. The sparsity penalty term \Omega(h) punishes \hat{\rho}_j for deviating from \rho, essentially exploiting the Kullback-Leibler divergence, where KL(\rho \,\|\, \hat{\rho}_j) is the divergence between a random variable with mean \rho and a random variable with mean \hat{\rho}_j:

\sum_{j=1}^{s} KL(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{s} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right]

Sparsity can also be achieved in other ways, such as applying L1 or L2 regularization terms on the activation of the hidden layer, where L is the loss function and \lambda is the scale parameter:

L(x, x') + \lambda \sum_{i} |h_i|

This section presents multi-class logistic regression with sparse autoencoders; the only difference from the previous autoencoder section is the autoencoder type. The findings from Figure 6 and Figure 33 show that the loss is higher for the sparse autoencoder than for the vanilla autoencoder.

Figure 33: Optimized ReLU Loss History for Sparse Autoencoder

The difference in perturbation is presented in Figure 35 and Figure 36, compared to the perturbation in Figure 11 and Figure 12; the perturbation is sharper with the sparse autoencoder. Figure 37 indicates that sparse autoencoders perform poorly compared to vanilla autoencoders in multi-class logistic regression.
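The L1 activity-penalty form of the sparsity constraint above maps directly onto Keras activity regularization; the following is a minimal sketch in which the layer sizes, the reconstruction loss, and the lambda value are assumptions rather than the paper's configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

inputs = keras.Input(shape=(784,))
# The L1 activity regularizer penalizes the hidden activations themselves,
# i.e., it adds lambda * sum_i |h_i| to the reconstruction loss (lambda assumed 1e-5).
h = layers.Dense(128, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5))(inputs)
outputs = layers.Dense(784, activation="sigmoid")(h)

sparse_autoencoder = keras.Model(inputs, outputs)
sparse_autoencoder.compile(optimizer="adam", loss="mse")
# sparse_autoencoder.fit(x_train, x_train, epochs=35, batch_size=1024)
```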
The sparse autoencoder results for neural networks indicate that the vanilla autoencoder is slightly better than the sparse autoencoder for neural networks. Sparse autoencoders also do not perform as well for linear machine learning models, in our case multi-class logistic regression.
Denoising autoencoders take partially corrupted input and are trained to recover the original, undistorted input. In this study, corrupted input is not actually used; the aim is to achieve a good design by changing the reconstruction principle to that of denoising autoencoders.
Figure 34: Comparison of accuracy with and without sparse autoencoder for non-targeted attack
Figure 35: Value change and perturbation of a non-targeted attack on model without sparse autoencoder
Figure 36: Value change and perturbation of a non-targeted attack on model with sparse autoencoder
Figure 37: Comparison of accuracy with and without sparse autoencoder for targeted attacks. AE stands for the models with sparse autoencoder, WO stands for models without autoencoder
Figure 38: Confusion matrix and classification report of the neural network model without sparse autoencoder
Figure 39: Confusion matrix and classification report of the neural network model with sparse autoencoder
Figure 40: Confusion matrix and classification report of the neural network model without sparse autoencoder after FGSM attack
Figure 41: Confusion matrix and classification report of the neural network model with sparse autoencoder after FGSM attack
Figure 42: Confusion matrix and classification report of the neural network model without sparse autoencoder after T-FGSM attack
Figure 43: Confusion matrix and classification report of the neural network model with sparse autoencoder after T-FGSM attack
Figure 44: Confusion matrix and classification report of the neural network model without sparse autoencoder after basic iterative method attack
Figure 45: Confusion matrix and classification report of the neural network model with sparse autoencoder after basic iterative method attack

To achieve this denoising properly, the model is required to extract features that capture useful structure in the distribution of the input. Denoising autoencoders pass corrupted data through a stochastic mapping: the input is x, the corrupted data is \tilde{x}, and the stochastic mapping is

\tilde{x} \sim q_D(\tilde{x} \mid x)

As in a standard autoencoder, the corrupted data \tilde{x} is mapped to a hidden representation

h = f_\theta(\tilde{x}) = s(W\tilde{x} + b)

and from this the model reconstructs

z = g_{\theta'}(h)

In the denoising autoencoder for multi-class logistic regression, the loss does not improve with each epoch. Although it starts better at lower epoch values, in the end the vanilla autoencoder seems to be better, and the sparse autoencoder's loss is slightly worse. Just like the sparse autoencoder, the denoising autoencoder also applies a sharp perturbation, which is presented in Figure 48 and Figure 49. We observe that the accuracy results for the denoising autoencoder with multi-class logistic regression are similar to the sparse autoencoder results: the natural fooling accuracy drops drastically with the denoising autoencoder, while the non-targeted and one-targeted attacks behave much like the sparse autoencoder, with the one-targeted attack having lower accuracy with the denoising autoencoder.
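A minimal sketch of the corrupt-then-reconstruct training scheme described above, using additive Gaussian noise as an assumed corruption process q_D (the paper does not fix one, since corrupted input is not actually used in its experiments):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def corrupt(x, noise_std=0.3):
    """Assumed stochastic corruption q_D: add Gaussian noise and clip to [0, 1]."""
    return np.clip(x + np.random.normal(scale=noise_std, size=x.shape), 0.0, 1.0)

inputs = keras.Input(shape=(784,))
h = layers.Dense(128, activation="relu")(inputs)         # h = f_theta(x_tilde)
outputs = layers.Dense(784, activation="sigmoid")(h)     # z = g_theta'(h)
denoising_autoencoder = keras.Model(inputs, outputs)
denoising_autoencoder.compile(optimizer="adam", loss="mse")

# Train to map the corrupted input back to the clean input:
# denoising_autoencoder.fit(corrupt(x_train), x_train, epochs=35, batch_size=1024)
```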
We find that the neural network accuracy for the denoising autoencoder is worse than both the sparse autoencoder results and the vanilla autoencoder results. It is still a useful autoencoder for denoising corrupted data and other purposes; however, it is not the right choice purely for robustness against adversarial examples.
Figure 46: Optimized ReLU Loss History for Denoising Autoencoder
Figure 47: Comparison of accuracy with and without denoising autoencoder for non-targeted attack
Figure 48: Value change and perturbation of a non-targeted attack on model without denoising autoencoder
Figure 49: Value change and perturbation of a non-targeted attack on model with denoising autoencoder
Figure 50: Comparison of accuracy with and without denoising autoencoder for targeted attacks. AE stands for the models with denoising autoencoder, WO stands for models without autoencoder
Figure 51: Confusion matrix and classification report of the neural network model without denoising autoencoder
Figure 52: Confusion matrix and classification report of the neural network model with denoising autoencoder
Figure 53: Confusion matrix and classification report of the neural network model without denoising autoencoder after FGSM attack
Figure 54: Confusion matrix and classification report of the neural network model with denoising autoencoder after FGSM attack
Figure 55: Confusion matrix and classification report of the neural network model without denoising autoencoder after T-FGSM attack
Figure 56: Confusion matrix and classification report of the neural network model with denoising autoencoder after T-FGSM attack
Figure 57: Confusion matrix and classification report of the neural network model without denoising autoencoder after basic iterative method attack
Figure 58: Confusion matrix and classification report of the neural network model with denoising autoencoder after basic iterative method attack
In this study, we examine variational autoencoders as the final autoencoder type. Variational autoencoders also have an encoder and a decoder, although their mathematical formulation differs significantly. They are associated with Generative Adversarial Networks due to their architectural similarity; in short, variational autoencoders are also generative models. Unlike sparse autoencoders, denoising autoencoders, and vanilla autoencoders, all of which aim at discriminative modeling, generative modeling tries to simulate how the data is generated and to understand the underlying causal relations, and it considers these relations when generating new data. Variational autoencoders are trained with an estimator algorithm called Stochastic Gradient Variational Bayes. This algorithm assumes the data is generated by a directed graphical model p_\theta(x \mid h), where \theta are the parameters of the decoder (in the variational autoencoder's case, the parameters of the generative model). The encoder learns an approximation q_\phi(h \mid x) to the posterior distribution, where \phi are the parameters of the encoder (in the variational autoencoder's case, the parameters of the recognition model). We again use the Kullback-Leibler divergence, denoted D_{KL}:

L(\phi, \theta, x) = D_{KL}(q_\phi(h \mid x) \,\|\, p_\theta(h)) - E_{q_\phi(h \mid x)}(\log p_\theta(x \mid h))
The variational and likelihood distributions are chosen as factorized Gaussians. The encoder outputs are p(x) and w(x), and the decoder outputs are \mu(h) and \sigma(h). The likelihood terms of the variational objective are defined as:

q_\phi(h \mid x) = N(p(x), w(x)I)

p_\theta(x \mid h) = N(\mu(h), \sigma(h)I)

The findings from Figure 59 show that the variational autoencoder has the best loss value. However, Figure 60 shows that its accuracy is low, especially at low epsilon values, where the autoencoded data even gives worse accuracy than the normal learning process.

Figure 59: Optimized ReLU Loss History for Variational Autoencoder
Figure 60: Comparison of accuracy with and without variational autoencoder for non-targeted attack

The perturbation applied with the variational autoencoder is not as sharp as with the sparse and denoising autoencoders; it looks similar to the vanilla autoencoder's perturbation.
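A compact Keras sketch of a variational autoencoder with a two-dimensional latent space (matching the latent-plane visualizations referenced below) is given here; the layer sizes and loss weighting are assumptions, not the paper's exact settings.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 2                              # 2-D latent plane, as in the visualizations

class Sampling(layers.Layer):
    """Reparameterization trick: h = mean + sigma * eps, with eps ~ N(0, I)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

class VAELoss(layers.Layer):
    """Adds the reconstruction + KL terms of the variational objective as a model loss."""
    def call(self, inputs):
        x, x_rec, z_mean, z_log_var = inputs
        recon = 784.0 * tf.reduce_mean(keras.losses.binary_crossentropy(x, x_rec))
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1))
        self.add_loss(recon + kl)
        return x_rec

# Encoder: q_phi(h | x), a factorized Gaussian with learned mean and (log-)variance.
enc_in = keras.Input(shape=(784,))
e = layers.Dense(256, activation="relu")(enc_in)
z_mean = layers.Dense(latent_dim)(e)
z_log_var = layers.Dense(latent_dim)(e)
z = Sampling()([z_mean, z_log_var])

# Decoder: p_theta(x | h).
d = layers.Dense(256, activation="relu")(z)
dec_out = layers.Dense(784, activation="sigmoid")(d)

outputs = VAELoss()([enc_in, dec_out, z_mean, z_log_var])
vae = keras.Model(enc_in, outputs)
vae.compile(optimizer="adam")
# vae.fit(x_train, epochs=35, batch_size=1024)
```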
Figure 61: Value change and perturbation of a non-targeted attack on model without variational autoencoder
Figure 62: Value change and perturbation of a non-targeted attack on model with variational autoencoder

The variational autoencoder has the worst results. It performs badly at low epsilon values, where it makes the autoencoded data even less accurate than the normal data, and it gives only a slight improvement over the normal data at high epsilon values.

Figure 63: Comparison of accuracy with and without variational autoencoder for targeted attacks. AE stands for the models with variational autoencoder, WO stands for models without autoencoder
Figure 64: Because of the MNIST dataset, our latent space is two-dimensional. One way to inspect it is to look at the neighborhoods of the different classes on the latent 2D plane. Each colored cluster is a type of digit; close clusters are digits that are structurally similar, i.e., digits that share information in the latent space.
Figure 65: Since the VAE is a generative model, we can also generate new MNIST digits from the latent plane by sampling latent points at regular intervals and generating the corresponding digit for each point.
The variational autoencoder with neural networks also shows the worst results compared to the other autoencoder types: where the other autoencoders keep the accuracy of the autoencoded data under attack at around 0.96 to 0.99, the variational autoencoder achieves only around 0.65 to 0.70.
Figure 66: Confusion matrix and classification report of the neural network model without variational autoencoder
Figure 67: Confusion matrix and classification report of the neural network model with variational autoencoder
Figure 68: Confusion matrix and classification report of the neural network model without variational autoencoder after FGSM attack
In this paper, we have presented the results of pre-filtering the data with an autoencoder before sending it to the machine learning model, as a defense against adversarial machine learning attacks. We have investigated how the classifier accuracy changes for linear and neural network machine learning models. Non-targeted and targeted attacks were applied to multi-class logistic regression, and FGSM, T-FGSM, and BIM attacks were applied to the neural network machine learning model. The effects of these attacks when an autoencoder is implemented as a filter have been analyzed for both machine learning models. We have observed that, with the robustness provided by the autoencoder, the accuracy drop after adversarial attacks stays between 0.1 and 0.2 percent, while the models without the autoencoder suffer tremendous accuracy drops, falling to accuracy scores between 0.6 and 0.3, and in some cases even 0.1. We have proposed a general, generic, and easy-to-implement protection against adversarial machine learning model attacks.
Figure 69: Confusion matrix and classification report of the neural network model with variational autoencoder after FGSM attack
Figure 70: Confusion matrix and classification report of the neural network model without variational autoencoder after T-FGSM attack
Figure 71: Confusion matrix and classification report of the neural network model with variational autoencoder after T-FGSM attack
Figure 72: Confusion matrix and classification report of the neural network model without variational autoencoder after basic iterative method attack
Figure 73: Confusion matrix and classification report of the neural network model with variational autoencoder after basic iterative method attack

Figure 74: Because the MNIST latent space in our setup is two-dimensional, we can inspect the neighborhoods of the different digit classes on the latent 2D plane; clusters that lie close together are digits that are structurally similar and share information in the latent space.

We have proposed a general, generic, and easy-to-implement protection against adversarial machine learning model attacks. It is worth noting that all autoencoders in this study were trained for 35 epochs with a batch size of 1024, so the results can likely be improved by increasing the number of epochs. In conclusion, autoencoders provide robustness against adversarial attacks on both linear models and neural network models. Standard autoencoders, usually called vanilla autoencoders, give the best results; sparse autoencoders are the second most accurate, and denoising autoencoders are the third, with results similar to the sparse ones. The worst autoencoder type for this purpose is the variational autoencoder, since variational autoencoders are generative models designed for different tasks. In summary, the simple practice of placing an autoencoder between the data and the machine learning model can provide considerable defense and robustness against attacks. Such autoencoders can be implemented easily with libraries such as TensorFlow and Keras, and because they act as a separate pre-processing layer, they can be used with virtually any machine learning model.
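Since the defense is implemented as a separate pre-filtering step, the idea can be sketched in a few lines of Keras. The snippet below is a minimal illustration under our own assumptions: the dense (vanilla) autoencoder architecture and the stand-in classifier are illustrative choices, and only the 35-epoch, 1024-batch training setting is taken from the text above.

```python
# Minimal sketch of autoencoder pre-filtering; layer sizes and the classifier
# are illustrative assumptions, not the exact models evaluated in this paper.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Vanilla autoencoder: 784 -> 64 -> 784, trained on clean data only.
inputs = keras.Input(shape=(784,))
encoded = layers.Dense(64, activation="relu")(inputs)
decoded = layers.Dense(784, activation="sigmoid")(encoded)
autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_train, x_train, epochs=35, batch_size=1024,
                validation_data=(x_test, x_test))  # epoch/batch settings from the study

# Any downstream model can sit behind the filter; a small dense net as a stand-in.
classifier = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dense(10, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
classifier.fit(x_train, y_train, epochs=5, batch_size=128)

def predict_filtered(x):
    """Reconstruct inputs with the autoencoder before classifying them."""
    return classifier.predict(autoencoder.predict(x))

# At inference time, possibly adversarial inputs pass through the same filter.
preds = np.argmax(predict_filtered(x_test), axis=1)
```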
Acknowledgement
Acknowledgement text.
Figure 75: Since the VAE is a generative model, we can also generate new MNIST digits from the latent plane by sampling latent points at regular intervals and decoding the corresponding digit for each point.