Adversarial Machine Learning: Perspectives from Adversarial Risk Analysis
David Rios Insua∗, Roi Naveiro†, Victor Gallego‡ and Jason Poulos
Institute of Mathematical Sciences (ICMAT-CSIC), Spain
The Statistical and Applied Mathematical Sciences Institute (SAMSI), NC, USA
Department of Statistical Science, Duke University, NC, USA
Abstract:
Adversarial Machine Learning (AML) is emerging as a major field aimed at the protection of automated ML systems against security threats. The majority of work in this area has built upon a game-theoretic framework by modelling a conflict between an attacker and a defender. After reviewing game-theoretic approaches to AML, we discuss the benefits that a Bayesian Adversarial Risk Analysis perspective brings when defending ML-based systems. A research agenda is included.
Keywords and phrases:
Adversarial Machine Learning, Bayesian Methods, Adversarial Risk Analysis, Security.
1. Introduction
Over the last decade, an increasing number of processes have been automated through machine learning (ML) algorithms (Breiman, 2001). It is essential that these algorithms are robust and reliable if we are to trust operations based on their output. State-of-the-art ML algorithms perform extraordinarily well on standard data, but have recently been shown to be vulnerable to adversarial examples, data instances targeted at fooling those algorithms (Goodfellow et al., 2015). The presence of adaptive adversaries has been pointed out in areas such as spam detection (Zeager et al., 2017); fraud detection (Kołcz and Teo, 2009); and computer vision (Goodfellow et al., 2015). In those contexts, algorithms should acknowledge the presence of possible adversaries to defend against their data manipulations. Comiter (2019) provides a review from the policy perspective showing how many AI societal systems, including content filters, military systems, law enforcement systems and autonomous driving systems (ADS), are susceptible and vulnerable to attacks. As a motivating example, consider the case of fraud detection (Bolton and Hand, 2002). As ML algorithms are incorporated into such tasks, fraudsters learn how to evade them. For instance, they could find out that making a huge transaction increases the probability of being detected and start issuing several smaller transactions rather than a single large one.

As a fundamental underlying hypothesis, ML systems rely on using independent and identically distributed (iid) data for both training and testing. However, security aspects in ML, part of the field of adversarial machine learning (AML), question this iid hypothesis given the presence of adaptive adversaries ready to intervene in the problem to modify the data and obtain a benefit. These perturbations induce differences between the training and test distributions.

Stemming from the pioneering work in adversarial classification (AC) in Dalvi et al. (2004), the prevailing paradigm in AML to model the confrontation between learning-based systems and adversaries has been game theory (Menache and Ozdaglar, 2011). This entails well-known common knowledge (CK) hypotheses (Hargreaves-Heap and Varoufakis, 2004), which are doubtful in security applications as adversaries try to hide and conceal information.

∗ DRI is supported by the AXA-ICMAT Chair and the Spanish Ministry of Science program MTM2017-86875-C3-1-R. This work was supported by the Severo Ochoa Excellence Programme SEV-2015-0554, the EU's Horizon 2020 project 815003 Trustonomy, as well as the US NSF grant DMS-1638521. † RN acknowledges support from grant FPU15-03636. ‡ VG acknowledges support from grant FPU16-05034.
As pointed out in Fan et al. (2019), there is a need for a framework that guarantees robustness of ML against adversarial manipulations in a principled way. In this paper we aim at formulating such a framework. After providing an overview of key concepts and methods in AML, emphasising the underlying game-theoretic assumptions, we suggest an alternative formal Bayesian decision-theoretic approach based on Adversarial Risk Analysis (ARA), illustrating it in supervised and reinforcement learning problems. We end with a research agenda for the proposed framework.
2. Motivating examples
Two motivating examples illustrate key issues in AML.
Vision algorithms are at the core of many AI systems such as ADS (Bojarski et al., 2016; McAllister et al., 2017). The simplest and most notorious attacks against such algorithms consist of modifications of images in such a way that the alteration becomes imperceptible to the human eye, yet drives a model trained on millions of images to misclassify the modified ones, with potentially relevant security consequences.
Example.
With a relatively simple convolutional neural network model, we are able to accurately predict 99% of the handwritten digits in the MNIST data set (LeCun et al., 1998). However, if we attack those data with the fast gradient sign method in Szegedy et al. (2014), its accuracy is reduced to 62%. Figure 1 provides an example of an original MNIST image and a perturbed one. Although to our eyes both images look like a 2, our classifier rightly identifies a 2 in the first case (Fig. 1a), whereas it suggests a 7 after the perturbation (Fig. 1b).
Fig 1: An original input (a) and its attacked version (b). △
This type of attack is easily built through low-cost computational methods. However, it requires the attacker to have precise knowledge about the architecture and weights of the corresponding predictive model. This is debatable in security settings, a main driver of this paper.
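To make such attacks concrete, the following is a minimal sketch of a fast gradient sign perturbation in PyTorch; the trained classifier net, the cross-entropy loss and the budget eps are illustrative assumptions, not the exact setup behind Figure 1.

```python
# Minimal FGSM sketch (assumes a trained classifier `net` and inputs scaled to [0, 1]).
import torch
import torch.nn.functional as F

def fgsm_attack(net, x, y, eps=0.1):
    """Return x perturbed in the direction that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(net(x_adv), y)        # attacker's objective: the defender's loss
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()  # one signed-gradient step per pixel
        x_adv = x_adv.clamp(0.0, 1.0)            # keep pixels in a valid range
    return x_adv.detach()

# Usage sketch: x_adv = fgsm_attack(net, images, labels, eps=0.25)
```

The attack needs only one gradient evaluation per input, which explains its low computational cost.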
Consider spam detection problems, an example of content filter systems. We employ the utility-sensitive naive Bayes (NB) classifier, a standard spam detection approach (Song et al., 2009), studying its degraded performance under the Good Word Insertion attacks in Naveiro et al. (2019a). In this case, the adversary attacks only spam emails, adding to them a good word, that is, a word which is common in legitimate emails but not in spam ones.
Example.
We use the Spambase Data Set from the UCI ML repository (Lichman, 2013). Accuracy and False Positive and False Negative Rates (FPR and FNR, respectively) are reported in Table 1, estimated via repeated hold-out validation over 100 repetitions (Kim, 2009). We provide as well the standard deviation of each metric, also estimated through repeated hold-out validation.
              Accuracy         FPR              FNR
NB-Plain      0.··· ± 0.009    0.··· ± 0.100    0.··· ± 0.···
NB-Tainted    0.··· ± 0.101    0.··· ± 0.100    0.··· ± 0.···
Table 1
Performance of utility-sensitive NB on clean and attacked data.
NB-Plain and NB-Tainted refer to results of NB on clean and attacked data, respectively. Observe how the presence of an adversary degrades NB accuracy: FPR coincides for NB-Plain and NB-Tainted, as the adversary is not modifying innocent instances. However, false negatives undermine NB-Tainted performance, as the classifier is not able to identify a large proportion of attacked spam emails. △
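As an illustration of the kind of degradation reported in Table 1, the sketch below trains a plain multinomial naive Bayes filter and evaluates it after inserting good words into spam instances at test time; the count-matrix representation, the scikit-learn classifier and the good_word_idx indices are illustrative assumptions, standing in for the utility-sensitive NB and Spambase pipeline used above.

```python
# Sketch: effect of a good-word-insertion attack on a naive Bayes spam filter.
# Assumes X is an (n_samples, n_words) count matrix and y holds labels 1 (spam) / 0 (ham).
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

def good_word_attack(X_spam, good_word_idx, n_insertions=1):
    """Add occurrences of words common in legitimate email to spam feature vectors."""
    X_att = X_spam.copy()
    X_att[:, good_word_idx] += n_insertions
    return X_att

def evaluate(X, y, good_word_idx, reps=100, seed=0):
    rng = np.random.default_rng(seed)
    accs_clean, accs_tainted = [], []
    for _ in range(reps):                         # repeated hold-out validation
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.3, random_state=int(rng.integers(10**6)))
        clf = MultinomialNB().fit(Xtr, ytr)
        accs_clean.append(clf.score(Xte, yte))    # performance on clean data
        Xte_att = Xte.copy()
        spam = yte == 1
        Xte_att[spam] = good_word_attack(Xte_att[spam], good_word_idx)
        accs_tainted.append(clf.score(Xte_att, yte))  # performance on attacked data
    return (np.mean(accs_clean), np.std(accs_clean),
            np.mean(accs_tainted), np.std(accs_tainted))
```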
Both examples show how the performance of ML-based systems may degrade under attacks. This suggests the need to take into account the presence of adversaries, as done in AML.
3. Adversarial Machine Learning: a review
In this section, we present key results and concepts in AML. Further perspectives may be found in recent reviews by Vorobeychik and Kantarcioglu (2018), Joseph et al. (2019), Biggio and Roli (2018), Dasgupta and Collins (2019) and Zhou et al. (2018). We focus on key ideas in the brief history of the field to motivate a reflection leading to our alternative approach in Sections 4 and 5. We remain at a conceptual level, with relevant modelling and computational ideas in Section 4.
Classification is one of the most widely used instances of supervised learning (Bishop, 2006). Most efforts in this field have focused on obtaining more accurate algorithms which, however, have largely ignored the presence of adversaries who actively manipulate data to fool a classifier in search of a benefit. Dalvi et al. (2004) introduced adversarial classification (AC), a pioneering approach to enhance classification algorithms when an adversary is present. They view AC as a game between a classifier, also referred to as defender (D, she), and an adversary (A, he). The classifier aims at finding an optimal classification strategy against A's optimal attacks. Computing Nash equilibria (NE) in such general games quickly becomes very complex. Thus, they propose a forward myopic version in which D first assumes that the data is untainted, computing her optimal classifier; then, A deploys his optimal attack against it; subsequently, D implements the best response classifier against such attack, and so on. This approach assumes CK, that is, the assumption that all parameters of both players are known to each other. Although standard in game theory, this assumption is actually unrealistic in the security settings typical of AML.

Stemming from it, there has been an important literature in AC, reviewed in Biggio et al. (2014) or Li and Vorobeychik (2014). Subsequent approaches have focused on analyzing attacks over algorithms and upgrading their robustness against such attacks. To that end, strong assumptions about the adversary are typically made. For instance, Lowd and Meek (2005) consider that the adversary can send membership queries to the classifier to issue optimal attacks, then proving the vulnerability of linear classifiers. A few methods have been proposed to robustify classification algorithms in adversarial environments, most of them focused on application-specific domains, as Kołcz and Teo (2009) in spam detection. Vorobeychik and Li (2014) study the impact of randomization schemes over classifiers against attacks, proposing an optimal randomization defense. Other approaches have focused on improving the Dalvi et al. (2004) model but, as far as we know, none got rid of the unrealistic CK assumptions. As an example, Kantarcıoğlu et al. (2011) use a Stackelberg game in which both players know each other's payoff functions.
An important source of AML cases are Adversarial Prediction Problems (APPs) (Brückner and Scheffer, 2011). They focus on building predictive models where an adversary exercises some control over the data generation process, jeopardising standard prediction techniques. APPs model the interaction between the predictor and the adversary as a two-agent game: a defender that aims at learning a parametric predictive model and an adversary trying to transform the distribution governing the data at training time. The agents' costs depend on both the predictive model and the adversarial data transformation. Both agents aim at optimizing their operational costs, which are functions of their expected losses under the perturbed data distribution. As this distribution is unknown, the agents actually optimize their regularized empirical costs, based on the training data. The specific optimization problem depends on the case considered.
First, in Stackelberg prediction games, Brückner and Scheffer (2011) assume full information of the attacker about the predictive model used by the defender who, in addition, is assumed to have perfect information about the adversary's costs and action space. D acts first, choosing her parameter; then A, after observing this decision, chooses the optimal data transformation. Finding NE in these games requires solving a bi-level optimization problem: optimizing the defender's cost function subject to the adversary optimizing his, after observing the defender's choice. As nested optimization problems are intrinsically hard, the authors restrict attention to simple classes where analytical solutions can be found.
On the other hand, in Nash prediction games (Brückner et al., 2012) both agents act simultaneously. The main concern is then seeking NE. The authors provide conditions for their existence and uniqueness in specific classes of games.
Much less AML work is available in relation with unsupervised learning. A relevant proposal is Kos et al. (2018), who describe adversarial attacks to generative models such as variational autoencoders (VAEs) or generative adversarial networks used in density estimation. Their focus is on slightly perturbing the input to the models so that the reconstructed output is completely different from the original input.

To the best of our knowledge, Biggio et al. (2013) first studied clustering under adversarial disturbances. They suggest a framework to create attacks during training that significantly alter cluster assignments, as well as an obfuscation attack that slightly perturbs an input to be clustered in a predefined assignment, showing that single-link hierarchical clustering is sensitive to these attacks.

Lastly, adversarial attacks on autoregressive (AR) models have started to attract interest. Alfeld et al. (2016) describe an attacker manipulating the inputs to drive the latent space of a linear AR model towards a region of interest. Papernot et al. (2016) propose adversarial perturbations over recurrent neural networks. More recently, Naveiro and Insua (2019) provide efficient gradient methods to approximate solutions in more general problems.
The prevailing solution approach in reinforcement learning (RL) is Q-learning (Sutton et al., 1998). Deep RL has faced an incredible growth (Silver et al., 2017); however, the corresponding systems may be targets of attacks and robust methods are needed (Huang et al., 2017). An AML-related field of interest is multi-agent RL (Buşoniu et al., 2010). Single-agent RL methods fail in multi-agent settings, as they do not take into account the non-stationarity due to the other agents' actions: Q-learning may lead to suboptimal results. Thus, we must reason about and forecast the adversaries' behaviour. Several methods have been proposed in the AI literature (Albrecht and Stone, 2018). A dominant approach draws on fictitious play (Brown, 1951) and consists of assessing the other agents by computing their frequencies of choosing various actions. Using explicit representations of the other agents' beliefs about their opponents could lead to an infinite hierarchy of decision making problems, as explained in Section 5.3.

The application of these tools to Q-learning in multi-agent settings remains largely unexplored. Relevant extensions have rather focused on Markov games. Three well-known solutions are minimax-Q learning (Littman, 1994), which solves a minimax problem at each iteration (a sketch of this computation is given below); Nash-Q learning (Hu and Wellman, 2003), which generalizes it to the non-zero sum case; and friend-or-foe-Q learning (Littman, 2001), in which the agent knows in advance whether her opponent is an adversary or a collaborator.

One of the most influential concepts triggering the recent interest in AML are adversarial examples. They were introduced by Szegedy et al. (2014) within neural network (NN) models, as perturbed data instances obtained through solving certain optimization problems. NNs are highly sensitive to such examples (Goodfellow et al., 2015). The usual framework for robustifying models against these examples is adversarial training (AT) (Madry et al., 2018), based on solving a bi-level optimization problem whose objective function is the empirical risk of a model under worst case data perturbations. AT approximates the inner optimization through a projected gradient descent (PGD) algorithm, ensuring that the perturbed input falls within a tolerable boundary. (It is also possible to frame the inner optimization problem as a mixed integer linear program and use general purpose optimizers to search for adversarial examples, see Kolter and Madry (2018); note, though, that for moderate to large networks in mainstream tasks, exact optimization is still computationally intractable.) The complexity of this attack depends on the chosen norm. However, recent pointers urge modellers to depart from using norm-based approaches (Carlini et al., 2019) and develop more realistic attack models, as in the adversarial patches of Brown et al. (2017).
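The following is a minimal sketch of the per-state computation behind the minimax-Q scheme mentioned above: the value and maximin policy of the stage matrix game defined by the defender's Q-values are obtained by linear programming. The array shapes and the use of scipy are illustrative assumptions.

```python
# Sketch of the per-state computation in minimax-Q learning (Littman, 1994):
# the defender's policy at state s solves  max_pi min_a sum_d pi(d) Q[s, d, a].
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q_s):
    """Q_s[d, a]: defender's Q-values at one state. Returns (maximin policy, game value)."""
    n_d, n_a = Q_s.shape
    # Variables: pi_1, ..., pi_{n_d}, v.  Maximizing v is minimizing -v.
    c = np.concatenate([np.zeros(n_d), [-1.0]])
    # For every opponent action a:  v - sum_d pi_d Q[d, a] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_a, 1))])
    b_ub = np.zeros(n_a)
    A_eq = np.concatenate([np.ones(n_d), [0.0]]).reshape(1, -1)  # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_d + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_d], res.x[-1]

# In minimax-Q, the game value above plays the role of the next-state value V(s')
# in the temporal-difference update Q[s, d, a] <- (1 - alpha) Q[s, d, a] + alpha (r + gamma V(s')).
```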
The above subsections reviewed key developments in AML from a historical perspective. We now summarise the pipeline typically followed to improve the security of ML systems (Biggio and Roli, 2018).
Modelling threats.
This activity is critical to ensure ML security in adversarial environments. In general, we should assess three attacker features. First, we should consider his goals, which vary depending on the setting, ranging from money to causing fatalities, going through damaging reputation (Couce-Vieira et al., 2019). Prior to deploying an ML system, it is crucial to guarantee its robustness against attackers with the most common goals. For instance, in fraud detection the attacker usually obfuscates fraudulent transactions to make the system classify them as legitimate ones in search of an economic benefit.
A fraud detection system should be robust against such attacks. In general, attacks are classified following two criteria regarding their goals: violation type and attack specificity. For the first criterion, we distinguish between integrity violations, aimed at moving the prediction of particular instances towards the attacker's target, e.g. having malicious samples misclassified as legitimate; availability violations, aimed at increasing the predictive error to make the system unusable; and privacy violations (a.k.a. exploratory attacks), aimed at gaining information about the ML system. In relation with the second criterion, we distinguish between targeted attacks, which address just a few, even one, defenders, and indiscriminate attacks, affecting many defenders in a random manner.

Next, we consider that, at the time of attacking, the adversary could have knowledge about different aspects of the ML system, such as the training data used or the features. Thus, we classify adversarial threats depending on which aspects the attacker is assumed to have knowledge about. At one end of the spectrum, we find white box or perfect knowledge attacks: the adversary knows every aspect of the ML system. This is almost never the case in actual scenarios, except perhaps for insiders. Yet they could be useful in sequential settings where the ML system moves first, training an algorithm to find its specific parameters; the adversary, who moves afterwards, has some time to observe the behaviour of the system and learn about it (although the adversary may have some knowledge, assuming that this is perfect is not realistic and has been criticized, even in the pioneering Dalvi et al. (2004)). At the other end, black box or zero knowledge attacks assume that the adversary has capabilities to query the system but does not have any information about the data, the feature space or the particular algorithms used. This is the most reasonable assumption when attacking and defending decisions are eventually made simultaneously. In-between attacks are called gray box or limited knowledge attacks. These are the most common type in security settings, especially when attacking and defending decisions are made sequentially but there is private information that the intervening agents are not willing to share.

Finally, we classify the attacks depending on the capabilities of the adversary to influence the data. In some cases, he may obfuscate training data to induce errors during operation; these are called poisoning attacks. On the other hand, evasion attacks have no influence on training data, but perform modifications during operation, for instance when trying to evade a detection system. These data alteration or crafting activities are the typical ones in AML and we designate them as coming from a data-fiddler. But there could be attackers capable of changing the underlying structure of the problem, affecting process parameters, called structural attackers. Moreover, some adversaries could be making decisions in parallel to those of the defender, with the agents' losses depending on both decisions; we term these parallel attackers. Some attackers could combine the three capabilities in certain scenarios: for example, in a cybersecurity problem an attacker might add spam, modifying its proportion (structural); alter some spam messages (data-fiddler); and, in addition, undertake his own business decisions (parallel).

Simulating attacks.
The standard approach (Biggio and Roli, 2018) formalizes poisoning and evasion attacks in terms of constrained optimization problems with different assumptions about the adversary's knowledge, goals and capabilities. In general, the objective function in such problems assesses attack effectiveness, taking into account the attacker's goals and knowledge. The constraints frame assumptions such as the adversary wanting to avoid detection or having a maximum attacking budget. (A more natural and general formulation of the attacker's problem is through a statistical decision theoretic perspective (French and Rios Insua, 2000); see Sections 4 and 5.)

Protecting learning algorithms.
Two types of defence methods have been proposed.
Reactive defences aim to mitigate or eliminate the effects of an eventual attack. They include timely detection of attacks, e.g. Naveiro et al. (2019b); frequent retraining of learning algorithms; or verification of algorithmic decisions by experts.
Proactive defences aim to prevent attack execution. They can entail security-by-design approaches, such as explicitly accounting for adversarial manipulations, e.g. Naveiro et al. (2019a), or developing provably secure algorithms against specific perturbations (e.g., Gowal et al., 2018); or security-by-obscurity techniques, such as randomization of the algorithm response or gradient obfuscation, to make attacks less likely to succeed (Athalye et al., 2018).
We have provided an overview of key concepts in AML. Practically all ML methodologies have been touched upon from an adversarial perspective including, to quote a few approaches not mentioned above, logistic regression (Feng et al., 2014); support vector machines (Biggio et al., 2012); or latent Dirichlet allocations (Mei and Zhu, 2015). As mentioned, it is very relevant from an applied point of view in areas like national security and cybersecurity. Of major importance in AML is the cleverhans library (Papernot et al., 2018), built on top of the tensorflow framework and available at https://github.com/tensorflow/cleverhans, aimed at accelerating research in developing new attack threats and more robust defences for deep neural models.

This is a difficult area which is rapidly evolving and leading to an arms race in which the community alternates a cycle of proposing attacks and implementing defences that deal with the previous ones (Athalye et al., 2018). Thus, it is important to develop sound techniques. Note, though, that stemming from the pioneering Dalvi et al. (2004), most AML research has been framed within a standard game theory approach pervaded by NE and their refinements. However, these entail CK assumptions which are hard to maintain in the security contexts typical of AML applications. We could argue that CK is too commonly assumed. We propose a decision theoretic methodology to solve AML problems, adopting an ARA perspective to model the confrontation between attackers and defenders, mitigating questionable CK assumptions (Rios Insua et al., 2009; Banks et al., 2015).
4. ARA templates for AML
ARA makes operational the Bayesian approach to games (Kadane and Larkey, 1982; Raiffa, 1982), facilitating a procedure to predict adversarial decisions, simulating them to obtain forecasts and protecting the learning system, as we formalise in Section 4.4. We first provide a comparison of game-theoretic and ARA concepts over three template AML models associated, respectively, with white, black and gray box attacks. They constitute basic structures which may be simplified or made more complex, by removing or adding nodes, and combined to accommodate specific AML problems. In all models, there is a defender who chooses her decision $d \in \mathcal{D}$ and an attacker who chooses his attack $a \in \mathcal{A}$. In the AML jargon, the defender would be the learning system (or the organisation deploying it) and the decisions she makes could refer to the various choices required, including the data set used, the models chosen, or the algorithms employed to estimate the parameters. In turn, the attacker would be the attacking organisation (or their corresponding attacking system); his decisions refer to the data sets chosen to attack or the period at which he decides to attack, but could also refer to structural or parallel decisions as in Section 3.6. The involved agents are assumed to maximise expected utility (or minimise expected loss) (French and Rios Insua, 2000).

We use bi-agent influence diagrams (BAIDs) (Banks et al., 2015) to describe the problems: they include circular nodes representing uncertainties; hexagonal utility nodes, modelling preferences over consequences; and square nodes portraying decisions. Arrows pointing to decision nodes mean that such decisions are made knowing the values of their predecessors; arrows pointing to chance and value nodes mean that the corresponding events or consequences are influenced by their predecessors. Different colours suggest issues relevant to just one of the agents (white, defender; gray, attacker); striped nodes are relevant to both agents.
We start with sequential games: D chooses her decision d and then A chooses his attack a, after having observed d. This exemplifies white-box attacks, as the attacker has full information about D's action at the time of making his decision. As an example, a classifier chooses and estimates a parametric classification algorithm and an attacker, who has access to the specific algorithm, sends examples to try to fool the classifier. These games have received various names, like sequential Defend-Attack (Brown et al., 2006) or Stackelberg games (Gibbons, 1992). Their BAID is in Figure 2. The arc D - A reflects that D's choice is observed by A. The consequences for both agents depend on an attack outcome $\theta \in \Theta$. Each agent has its own assessment of the probability of θ, which depends on d and a, respectively called $p_D(\theta \mid d, a)$ and $p_A(\theta \mid d, a)$. Similarly, their utility functions are $u_D(d, \theta)$ and $u_A(a, \theta)$.

Fig 2: Basic two-player sequential Defend-Attack game.

The basic game-theoretic solution does not require A to know D's judgements, as he observes her decisions. However, D must know those of A, the CK condition in this case. For its solution, we compute both agents' expected utilities at node Θ: $\psi_A(d, a) = \int u_A(a, \theta)\, p_A(\theta \mid d, a)\, d\theta$ and $\psi_D(d, a) = \int u_D(d, \theta)\, p_D(\theta \mid d, a)\, d\theta$. Next, we find $a^*(d) = \arg\max_{a \in \mathcal{A}} \psi_A(d, a)$, A's best response to D's action d. Then, D's optimal action is $d^*_{GT} = \arg\max_{d \in \mathcal{D}} \psi_D(d, a^*(d))$. The pair $(d^*_{GT}, a^*(d^*_{GT}))$ is a NE and, indeed, a subgame perfect equilibrium.
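For finite sets of defences, attacks and outcomes, the above game-theoretic solution amounts to a direct backward induction; a minimal sketch follows, where the utility and probability arrays are placeholders that a modeller would supply.

```python
# Backward induction for the sequential Defend-Attack game with discrete d, a, theta.
import numpy as np

def solve_sequential_game(u_D, u_A, p_D, p_A):
    """u_D[d, t], u_A[a, t]: utilities; p_D[d, a, t], p_A[d, a, t]: outcome probabilities.
    Returns the subgame perfect equilibrium (d*, a*(d*))."""
    n_d, n_a, n_t = p_D.shape
    psi_A = np.einsum('at,dat->da', u_A, p_A)          # attacker's expected utility psi_A(d, a)
    a_best = psi_A.argmax(axis=1)                      # attacker's best response a*(d)
    psi_D = np.einsum('dt,dat->da', u_D, p_D)          # defender's expected utility psi_D(d, a)
    d_star = int(np.argmax(psi_D[np.arange(n_d), a_best]))  # defender anticipates a*(d)
    return d_star, int(a_best[d_star])

# Illustrative random instance with 3 defences, 4 attacks and 2 outcomes.
rng = np.random.default_rng(0)
p = rng.random((3, 4, 2)); p /= p.sum(axis=-1, keepdims=True)
d_star, a_star = solve_sequential_game(rng.random((3, 2)), rng.random((4, 2)), p, p)
```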
Example.
The Stackelberg game in Brückner and Scheffer (2011), Section 3.2, modelling the confrontation in an APP, is a particular instance in which costs are minimized and there is no uncertainty about the outcome θ. D chooses the parameters d of a predictive model. A observes d and chooses the transformation, converting data $\mathcal{T}$ into $a(\mathcal{T})$. Let $\hat{c}_i(d, a(\mathcal{T}))$ be the (regularized) empirical cost of the i-th agent, $i \in \{A, D\}$. Then, they propose a pair $[d^*, a^*(\mathcal{T}(d^*))]$ solving
$$\arg\min_d \hat{c}_D(d, a^*(\mathcal{T}(d))) \quad \text{s.t.} \quad a^*(\mathcal{T}(d)) \in \arg\min_{a(\mathcal{T})} \hat{c}_A(d, a(\mathcal{T})),$$
which provides a NE. △
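When no analytical solution is available, such bi-level problems can be tackled numerically by nesting two optimizers: for each candidate d, solve the attacker's inner problem and evaluate the defender's cost at his best response. The quadratic costs below are toy placeholders for the regularized empirical costs of the example.

```python
# Nested (bi-level) optimization sketch for a Stackelberg prediction game.
from scipy.optimize import minimize_scalar

def c_A(d, a):              # toy attacker cost (stands in for his regularized empirical cost)
    return (a - 2.0) ** 2 + d * a

def c_D(d, a):              # toy defender cost
    return (d - 1.0) ** 2 + 0.5 * a ** 2

def best_response(d):       # inner problem: the attacker reacts to the observed d
    return minimize_scalar(lambda a: c_A(d, a)).x

def defender_objective(d):  # outer problem: the defender anticipates the best response
    return c_D(d, best_response(d))

d_star = minimize_scalar(defender_objective, bounds=(-5, 5), method='bounded').x
a_star = best_response(d_star)
```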
Example.
Adversarial examples can also be cast through sequential games. A finds the best attack, which leads to perturbed data instances obtained from solving the problem
$$\min_{\|\delta\| \leq \epsilon} \hat{c}_A(h_\theta(a(x)), y),$$
with $a(x) = x + \delta$ a suitable perturbation of the original data instance x; $h_\theta(x)$ the output of a predictive model with parameters θ; and $\hat{c}_A(h_\theta(x), y) = -\hat{c}_D(h_\theta(x), y)$ the cost of classifying x as being of class $h_\theta(x)$ when the actual label is y.

In turn, robustifying models against those perturbations through AT aims at solving the problem $\min_\theta \mathbb{E}_{(x,y) \sim \mathcal{D}} \big[ \max_{\|\delta_x\| \leq \epsilon} \hat{c}_D(h_\theta(a(x)), y) \big]$: we minimize the empirical risk of the model under worst case perturbations of the data $\mathcal{D}$. The inner maximization problem is solved through PGD, with iterations $x^{t+1} = \Pi_{B(x)}\big(x^t - \alpha \nabla_x \hat{c}_A(h_\theta(x^t), y)\big)$, where $\Pi$ is a projection operator ensuring that the perturbed input falls within a tolerable boundary $B(x)$. After T iterations, we set $a(x) = x^T$ and optimize with respect to θ. Madry et al. (2018) argue that the PGD attack is the strongest one using only gradient information from the target model; however, there is evidence that it is not sufficient for a full defence of neural models (Gowal et al., 2018). The complexity of the above attack depends on the chosen norm; for instance, if we resort to small perturbations under the $\ell_\infty$ norm, the update simplifies to $x := x - \epsilon\, \mathrm{sign}\, \nabla_x \hat{c}_A(h_\theta(x), y)$, making it attractive because of its low computational burden. The $\ell_2$ norm can also be considered, leading to updates $x := x - \epsilon\, \nabla_x \hat{c}_A(h_\theta(x), y) / \|\nabla_x \hat{c}_A(h_\theta(x), y)\|$. △
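A minimal sketch of the AT loop, with an ℓ∞ PGD inner maximization in PyTorch, follows; the network, data loader, optimizer and the values of ε, α and the number of steps are illustrative assumptions.

```python
# Adversarial training sketch: PGD inner maximization, gradient-based outer minimization.
import torch
import torch.nn.functional as F

def pgd_attack(net, x, y, eps=0.3, alpha=0.01, steps=40):
    """Approximate the inner maximization of the AT problem with projected gradient ascent."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)   # random start in the eps-ball
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(net(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()                # ascent step on the defender's loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)           # project back onto B(x)
        x_adv = x_adv.clamp(0.0, 1.0)                      # assumes inputs scaled to [0, 1]
    return x_adv.detach()

def adversarial_training_epoch(net, loader, optimizer):
    for x, y in loader:
        x_adv = pgd_attack(net, x, y)                      # approximate worst-case perturbation
        optimizer.zero_grad()
        F.cross_entropy(net(x_adv), y).backward()          # outer minimization over theta
        optimizer.step()
```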
The above CK condition is weakened if we assume only partial information, leading to games under incomplete information (Harsanyi, 1967), but we defer their discussion until Section 4.3. These CK conditions, and those used under incomplete information, are doubtful, as we do not actually have available the attacker's judgements. Moreover, they could lack robustness to perturbations in such judgements (Ekin et al., 2019).

Alternatively, we adopt a Bayesian decision theoretic approach based on ARA. We weaken the CK assumption: the defender does not know $(p_A, u_A)$ and faces the problem in Figure 3a. To solve it, she needs $p_D(a \mid d)$, her assessment of the probability that A will implement attack a after observing d. Then, her expected utility would be $\psi_D(d) = \int \psi_D(d, a)\, p_D(a \mid d)\, da$, with optimal decision $d^*_{ARA} = \arg\max_{d \in \mathcal{D}} \psi_D(d)$. This solution does not necessarily correspond to a NE, as both solutions are based on different information and assumptions.

Fig 3: Influence diagrams for the defender and attacker problems: (a) decision problem seen by the defender; (b) defender's analysis of the attacker problem.

To elicit $p_D(a \mid d)$, D benefits from modelling A's problem, with his ID in Figure 3b. For this, she would use all information available about $p_A$ and $u_A$; her uncertainty about $(p_A, u_A)$ is modelled through a distribution $F = (U_A, P_A)$ over the space of utilities and probabilities. This induces a distribution over A's expected utility, where his random expected utility would be $\Psi_A(d, a) = \int U_A(a, \theta)\, P_A(\theta \mid d, a)\, d\theta$. Then, D would find $p_D(a \mid d) = \mathbb{P}_F\big[a = \arg\max_{x \in \mathcal{A}} \Psi_A(d, x)\big]$ in the discrete case and, similarly, in the continuous one. In general, we would use Monte Carlo (MC) simulation to approximate $p_D(a \mid d)$.
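In the discrete case, the MC approximation of $p_D(a \mid d)$ and the resulting ARA defence can be sketched as follows; the samplers for the random utilities and probabilities $(U_A, P_A)$ are placeholders for the judgements the defender would actually elicit.

```python
# ARA sketch for the sequential Defend-Attack game: simulate the attacker's random
# optimal decision to approximate p_D(a | d), then maximize the defender's expected utility.
import numpy as np

def ara_defence(defences, attacks, u_D, p_D, sample_U_A, sample_P_A, n_mc=1000):
    """u_D[d, t], p_D[d, a, t]: the defender's own judgements.
    sample_U_A() -> array [a, t]; sample_P_A() -> array [d, a, t]: draws from (U_A, P_A)."""
    n_d, n_a = len(defences), len(attacks)
    p_attack = np.zeros((n_d, n_a))                     # estimate of p_D(a | d)
    for _ in range(n_mc):
        U_A, P_A = sample_U_A(), sample_P_A()
        Psi_A = np.einsum('at,dat->da', U_A, P_A)       # attacker's random expected utility
        p_attack[np.arange(n_d), Psi_A.argmax(axis=1)] += 1.0 / n_mc
    psi_D = np.einsum('dt,dat->da', u_D, p_D)           # psi_D(d, a)
    expected_utility = (psi_D * p_attack).sum(axis=1)   # psi_D(d)
    return defences[int(expected_utility.argmax())], p_attack
```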
Consider next simultaneous games: the agents decide their actions without knowing the one chosen by the other. Black-box attacks are assimilated to them. As an example, a defender fits a classification algorithm; an attacker, who has no information about it, sends tainted examples to try to outguess the classifier. Their basic template is in Fig. 4; see a cybersecurity example in Rios Insua et al. (2019).

Fig 4: Basic two-player simultaneous Defend-Attack game.

Suppose the judgements from both agents, $(u_D, p_D)$ and $(u_A, p_A)$ respectively, are disclosed. Then, both A and D know the expected utility that a pair $(d, a)$ would provide them, $\psi_A(d, a)$ and $\psi_D(d, a)$. A NE $(d^*, a^*)$ in this game satisfies $\psi_D(d^*, a^*) \geq \psi_D(d, a^*)$ for all $d \in \mathcal{D}$ and $\psi_A(d^*, a^*) \geq \psi_A(d^*, a)$ for all $a \in \mathcal{A}$.
Example.
Nash prediction games are particular instances of simultaneous Defend-Attack games with sure outcomes. As in an APP, the agents minimize (regularized empirical) costs, $\hat{c}_D(d, a(\mathcal{T}))$ and $\hat{c}_A(d, a(\mathcal{T}))$, with $a(\mathcal{T})$ being the attacked dataset. Under CK of both players' cost functions, a NE $[d^*, a^*(\mathcal{T})]$ satisfies $d^* \in \arg\min_d \hat{c}_D(d, a^*(\mathcal{T}))$ and $a^*(\mathcal{T}) \in \arg\min_{a(\mathcal{T})} \hat{c}_A(d^*, a(\mathcal{T}))$. △

If utilities and probabilities are not CK, we may proceed by modelling the game as one with incomplete information using the notion of types: each player has a type known to him but not to the opponent, representing private information. Such type $\tau_i \in T_i$ determines the agent's utility $u_i(d, \theta, \tau_i)$ and probability $p_i(\theta \mid d, a, \tau_i)$, $i \in \{A, D\}$. Harsanyi proposes Bayes-Nash equilibria (BNE) as their solution, still under a strong CK assumption: the adversaries' beliefs about types are CK through a common prior $\pi(\tau_D, \tau_A)$ (moreover, the players' beliefs about other uncertainties in the problem are also CK). Define strategy functions by associating a decision with each type, $d: \tau_D \rightarrow d(\tau_D) \in \mathcal{D}$, $a: \tau_A \rightarrow a(\tau_A) \in \mathcal{A}$. D's expected utility associated with a pair of strategies $(d, a)$, given her type $\tau_D \in T_D$, is
$$\psi_D(d(\tau_D), a, \tau_D) = \int\!\!\int u_D(d(\tau_D), \theta, \tau_D)\, p_D(\theta \mid d(\tau_D), a(\tau_A), \tau_D)\, \pi(\tau_A \mid \tau_D)\, d\tau_A\, d\theta.$$
Similarly, we compute the attacker's expected utility $\psi_A(d, a(\tau_A), \tau_A)$. Then, a BNE is a pair $(d^*, a^*)$ of strategy functions satisfying
$$\psi_D(d^*(\tau_D), a^*, \tau_D) \geq \psi_D(d(\tau_D), a^*, \tau_D)\ \forall \tau_D, \qquad \psi_A(d^*, a^*(\tau_A), \tau_A) \geq \psi_A(d^*, a(\tau_A), \tau_A)\ \forall \tau_A,$$
for every d and every a, respectively.

The common prior assumptions are still unrealistic in AML security contexts. We thus weaken them in supporting D. She should maximize her expected utility through
(1) $d^* = \arg\max_{d \in \mathcal{D}} \int\!\!\int u_D(d, \theta)\, p_D(\theta \mid d, a)\, \pi_D(a)\, d\theta\, da,$
where $\pi_D(a)$ models her beliefs about the attacker's decision a, which we need to assess. Suppose D thinks that A maximizes expected utility, $a^* = \arg\max_{a \in \mathcal{A}} \int \big[ \int u_A(a, \theta)\, p_A(\theta \mid d, a)\, d\theta \big]\, \pi_A(d)\, dd$. In general, she will be uncertain about A's required inputs $(u_A, p_A, \pi_A)$. If we model all information available to her about them through a probability distribution $F \sim (U_A, P_A, \Pi_A)$, mimicking (1), we propagate such uncertainty to compute the distribution
(2) $A \mid D \sim \arg\max_{a \in \mathcal{A}} \int \Big[ \int U_A(a, \theta)\, P_A(\theta \mid d, a)\, d\theta \Big]\, \Pi_A(D = d)\, dd.$
$(U_A, P_A)$ could be directly elicited from D. However, eliciting $\Pi_A$ may require further analysis, leading to an upper level of recursive thinking: she would need to think about how A analyzes her problem (this is why we condition in (2) by the distribution of D). In the above, in order for D to assess (2), she would elicit $(U_A, P_A)$ from her viewpoint, and assess $\Pi_A(D)$ through the analysis of her decision problem, as thought by A. This reduces the assessment of $\Pi_A(D)$ to computing $D \mid A \sim \arg\max_{d \in \mathcal{D}} \int\!\!\int U_D(d, \theta)\, P_D(\theta \mid d, a)\, d\theta\ \Pi_D(A = a)\, da$, assuming she is able to assess $\Pi_D(A)$, where A represents A's random decision within D's second level of recursive thinking.
For this, D needs to elicit $(U_D, P_D) \sim G$, representing her knowledge about how A estimates $u_D(d, \theta)$ and $p_D(\theta \mid d, a)$, when she analyzes how the attacker thinks about her decision problem. Again, eliciting $\Pi_D(A)$ might require further thinking from D, leading to a recursion of nested models, connected with the level-k thinking concept in Stahl and Wilson (1994), which would stop at a level at which D lacks the information necessary to assess the corresponding distributions. At that point, she could assign a non-informative distribution (French and Rios Insua, 2000). Further details may be seen in Rios and Rios Insua (2012).

Our final template is the sequential Defend-Attack model with defender private information. Gray-box attacks are assimilated to it. As an example, a defender estimates the parameters of a classification algorithm and an attacker, with no access to the algorithm but knowing the data used to train it, sends examples to try to fool the classifier. Fig. 5 depicts the template, with private information represented by V. The arc V - D reflects that v is known by D when she makes her decision; the lack of an arc V - A, that v is not known by A when making his decision. The uncertainty about the outcome θ depends on the actions by A and D, as well as on V. The utility functions are $u_D(d, \theta, v)$ and $u_A(a, \theta, v)$.

Fig 5: Basic template for the sequential Defend-Attack game with private information.

Standard game theory solves this model as a signaling game (Aliprantis and Chakrabarti, 2002). For a more realistic approach, we weaken the required CK assumptions. Assume for now that D has assessed $p_D(\theta \mid d, a, v)$, $u_D(d, \theta, v)$ and $p_D(a \mid d)$. Then, she obtains her optimal defence through:

At node Θ, compute for each $(d, a, v)$, $\psi_D(d, a, v) = \int u_D(d, \theta, v)\, p_D(\theta \mid d, a, v)\, d\theta$.
At node A, compute $(d, v) \rightarrow \psi_D(d, v) = \int \psi_D(d, a, v)\, p_D(a \mid d)\, da$.
At node D, solve $v \rightarrow d^*(v) = \arg\max_{d \in \mathcal{D}} \psi_D(d, v)$.

To assess $p_D(a \mid d)$, D could solve A's problem from her perspective. As A does not know v, his uncertainty is represented through $p_A(v)$, describing his (prior) beliefs about v. The arc V - D can be inverted to obtain $p_A(v \mid d)$; note that we would still need to assess $p_A(d \mid v)$ for this.
Should D know A ’s utility function u A ( a, θ, v ) and probabilities p A ( θ | d, a, v ) and p A ( v | d ), shewould anticipate his attack a ∗ ( d ) for any d ∈ D by At node Θ , compute for each ( d, a, v ) , ψ A ( d, a, v ) = (cid:82) u A ( d, θ, v ) p i ( θ | d, a, v ) d θ .At node V , compute for each ( d, a ) , ψ A ( d, a ) = (cid:82) ψ A ( d, a, v ) p A ( v | d ) d v .At node A , solve d → a ∗ ( d ) = arg max a ∈A ψ A ( d, a ) . However, D does not know ( p A , u A ). She has beliefs about them, say F ∼ ( P A , U A ), whichinduce distributions Ψ A ( d, a, v ) and Ψ A ( d, a ) on A ’s expected utilities throughΨ A ( d, a, v ) = (cid:90) U A ( a, θ, v ) P A ( θ | d, a, v ) d θ, Ψ A ( d, a ) = (cid:90) Ψ A ( d, a, v ) P A ( v | d ) d v. Then, D ’s predictive distribution about A ’s response to her defense choice d would be definedthrough p D ( a | d ) = P F (cid:104) a = arg max x ∈A Ψ A ( d, x ) (cid:105) , ∀ a ∈ A . To sum up, the elicitation of ( P A ( θ | d, a, v ) , P A ( v | d ) , U A ( a, θ, v )) allows the defender to solveher problem of assessing p D ( a | d ). The defender may have enough information to directly assess P A ( θ | d, a, v ) and U A ( a, θ, v ). Yet the assessment of P A ( v | d ) requires a deeper analysis, since ithas a strategic component, and would lead to a recursion as in Section 4.2, as detailed in Riosand Rios Insua (2012). Based on the three templates, we revisit now the AML pipeline in Section 3.6 proposing anARA based decision theoretic approach with the same steps.
1. Model system threats.
This entails modelling the attacker problem from the defender's perspective through an influence diagram (ID). The attacker's key features are his goals, knowledge and capabilities. Assessing his goals requires determining which actions he may undertake and the utility that he perceives when performing a specific action, given a defender's strategy. The output is the set of the attacker's decision nodes, together with the value node and the arcs indicating how his utility depends on his decisions and those of the defender. Assessing the attacker's knowledge entails looking for relevant information that he may have when performing the attack, and his degree of knowledge about this information, as we do not assume CK. This entails not only a modelling activity, but also a security assessment of the ML system to determine which of its elements are accessible to the attacker. The outputs are the uncertainty nodes of the attacker's ID, the arcs connecting them and those ending in the decision nodes, indicating the information available to A when attacking. Finally, identifying his capabilities requires determining which parts of the defender's problem the attacker has influence on. This provides the way the attacker's ID connects with that of the defender.
2. Simulating attacks.
Based on step 1, a mechanism is required to simulate reasonable attacks. The state-of-the-art solution assumes that the attacker will be acting according to a NE, given strong CK hypotheses. The ARA methodology relaxes such assumptions through a procedure to simulate adversarial decisions. Starting with the adversary model (step 1), our uncertainty about his probabilities and utilities is propagated to his problem and leads to the corresponding random optimal adversarial decisions, which provide the required simulation.
3. Adopting defences.
In this final step, we augment the defender's problem incorporating the attacker's one produced in step 1. As output, we generate a BAID reflecting the big picture of the confrontation. Finally, we solve the defender's problem maximizing her subjective expected utility, integrating out all attacker decisions, which are random from the defender's perspective given the lack of CK. In general, the corresponding integrals are approximated through MC, simulating attacks consistent with our level of knowledge about the attacker using the mechanism of step 2.
5. AML from an ARA perspective
We now illustrate how the previous templates can be combined and adapted through the proposed pipeline to provide support to AML models.
Almost every supervised ML problem entails the tasks reflected in Figure 6: an inference (learning) stage, in which relevant information is extracted from training data $\mathcal{T}$, and a decision (operational) stage, in which a decision $y_D$ is made based on the gathered information.

Fig 6: Influence diagram for a supervised learning problem.

In a general supervised learning problem under a Bayesian perspective, the first stage requires computing the posterior $p(\beta \mid \mathcal{T}) \propto p(\beta)\, p(\mathcal{T} \mid \beta)$, where $p(\beta)$ is the prior on the model parameters; $\mathcal{T}$, the data; and $p(\mathcal{T} \mid \beta)$, the likelihood. Based on it, the predictive distribution is $p(y \mid x, \mathcal{T}) = \int p(y \mid x, \beta)\, p(\beta \mid \mathcal{T})\, d\beta$. At the second stage, given an input x, the response $y_D$ has to be decided. If the actual value is y, the attained utility is $u_D(y, y_D)$ and, globally, the expected utility is
$$\psi(y_D \mid x, \mathcal{T}) = \int_y u_D(y, y_D)\, p(y \mid x, \mathcal{T})\, dy = \int_y u_D(y, y_D) \Big[ \int p(y \mid x, \beta)\, p(\beta \mid \mathcal{T})\, d\beta \Big] dy.$$
We aim at finding $\arg\max_{y_D} \psi(y_D \mid x, \mathcal{T})$.

An attacker might be interested in modifying the inference stage, through poisoning attacks, and/or the decision stage, through evasion attacks. Figure 7 represents both possibilities (step 3). In it, $\mathcal{T}'$ denotes the poisoned data and $a_T$, the poisoning attack; $a_O$ denotes the evasion attack decision and $x'$, the attacked feature vector actually observed by the defender. Finally, $u_A$ designates the attacker's utility function. If we just consider evasion attacks, $a_T$ is the identity, $\mathcal{T}' = \mathcal{T}$, and inference is as in Figure 6; when considering only poisoning attacks, $a_O$ is the identity, the observed instance $x'$ coincides with x, and the decision stage is that in Figure 6. Assume that attacks are deterministic transformations of the data: applying an attack to a given input will always lead to the same output.

Suppose the defender is not aware of the presence of the adversary. Then, she would receive the training data $\mathcal{T}'$, use the posterior $p(\beta \mid \mathcal{T}')$, compute the predictive $p(y \mid x, \mathcal{T}')$, receive the data $x'$ at operation time and solve $\arg\max_{y_D} \psi_D(y_D \mid x', \mathcal{T}')$. This will typically differ from $\arg\max_{y_D} \psi_D(y_D \mid x, \mathcal{T})$, leading to an undesired performance degradation, as shown in Section 2. Should the defender know how she has been attacked, and assuming that attacks are invertible, she would know $a_O^{-1}(x')$ and $a_T^{-1}(\mathcal{T}')$ and maximise $\psi_D(y_D \mid a_O^{-1}(x'), a_T^{-1}(\mathcal{T}'))$. However, she will typically know neither the attacks nor the original data. Thus, upon becoming adversary aware, she would deal with the problem in Figure 8, where now A's decisions appear as random.

Fig 7: Influence diagram for a generic adversarial supervised learning problem.
Fig 8: Supervised learning from the defender's perspective.

In such a case, if her uncertainty is modelled through $p(a_T \mid \mathcal{T}')$ and $p(a_O \mid x')$, the defender would optimise
(3) $\int \Big[ \int \psi_D(y_D \mid a_O^{-1}(x'), a_T^{-1}(\mathcal{T}'))\, p(a_T \mid \mathcal{T}')\, da_T \Big]\, p(a_O \mid x')\, da_O.$
We describe a procedure to assess the required distributions following the methodology in Section 4.4.
To that end, step 1 considers the problem that the attacker would be solving; step 2 assesses our uncertainty about his problem to simulate from it; then, in step 3, the optimal defence (3) is proposed.

The attacker aims at modifying data to maximize his utility $u_A(y_D, y, a_T, a_O)$, which depends on $y_D$, y, and the attacks $a_T$ and $a_O$, as these may have implementation costs. The form of his utility function depends on his goals (Section 3.6). Given that the adversary observes $(\mathcal{T}, x, y)$, if we assume that he aims at maximising expected utility when trying to confuse the defender, he would find his best attacks through
(4) $\max_{a_T, a_O} \int u_A(y_D, y, a_T, a_O)\, p_A(y_D \mid a_O(x), a_T(\mathcal{T}))\, dy_D,$
where $p_A(y_D \mid a_O(x), a_T(\mathcal{T}))$ describes the probability, from the adversary's perspective, that the defender says $y_D$ if she observes the training data $a_T(\mathcal{T})$ and features $a_O(x)$. At this point, D must model her uncertainty about A's utilities and probabilities. She will use random utilities $U_A$ and probabilities $P_A$ and look for the random optimal adversarial transformations
(5) $(A_T, A_O)^*(\mathcal{T}, x, y) = \arg\max_{a_T, a_O} \int U_A(y_D, y, a_T, a_O)\, P_A(y_D \mid a_O(x), a_T(\mathcal{T}))\, dy_D.$
Finally, she computes the probability of the attacker choosing attacks $a_T$ and $a_O$ when observing $\mathcal{T}, x, y$ as
(6) $p(a_T, a_O \mid \mathcal{T}, x, y) = \mathbb{P}\big[(A_T, A_O)^*(\mathcal{T}, x, y) = (a_T, a_O)\big].$
Using this, she would compute the required quantities $p(a_T \mid \mathcal{T}')$ and $p(a_O \mid x')$. This last step is problem dependent. We illustrate some specificities through an application to AC under evasion attacks.

Naveiro et al. (2019a) introduce ACRA, an approach to AC from an ARA perspective. This is an application of the sketched framework to binary classification problems with malicious (+) and benign (−) labels. ACRA considers only evasion attacks ($a_T = \mathrm{id}$) that are integrity violations, thus exclusively affecting malicious examples. To unclutter notation, we remove the dependence on the training data $\mathcal{T}$, clean by assumption. In addition, we eliminate the subscript in evasion attacks and refer to them as a.

Figure 9a represents the problem faced by the defender. This BAID is the same as in Figure 8, removing the inference phase, standard by assumption.

Fig 9: (a) Classifier problem. (b) Adversary problem.

D aims at choosing the class $y_D$ with maximum posterior expected utility based on the observed instance $x'$: she must find $c(x') = \arg\max_{y_D} \sum_{y \in \{+,-\}} u_D(y_D, y)\, p_D(y \mid x')$. Under the assumed type of attacks, this problem is equivalent to
(7) $c(x') = \arg\max_{y_D} \Big[ u_D(y_D, +)\, p_D(+) \sum_{x \in \mathcal{X}'} p_D(a_{x \rightarrow x'} \mid x, +)\, p_D(x \mid +) + u_D(y_D, -)\, p_D(x' \mid -)\, p_D(-) \Big],$
where $\mathcal{X}'$ is the set of potentially originating instances x leading to $x'$ and $p_D(a_{x \rightarrow x'} \mid x, +)$ designates the probability, according to D, that A will launch an attack that transforms x into $x'$ when he receives $(x, y = +)$. All required elements are standard except for $p_D(a_{x \rightarrow x'} \mid x, y)$, which demands strategic thinking from D. To make the approach operational, consider A's decision making (Figure 9b).
Let $u_A(y_D, y, a)$ be A's utility when D says $y_D$, the actual label is y and the attack is a; and let $p = p_A(c(a(x)) = + \mid a(x))$ be the probability that A concedes to D saying the instance is malicious, given that she observes $x' = a(x)$. He will have uncertainty about it; denote its density $f_A(p \mid a(x))$, with expectation $p^A_{a(x)}$. Among all attacks, A would choose
$$a^*(x, y) = \arg\max_a \int \big[ u_A(c(a(x)) = +, y, a) \cdot p + u_A(c(a(x)) = -, y, a) \cdot (1 - p) \big]\, f_A(p \mid a(x))\, dp.$$
Under integrity violation attacks, we only consider the case y = +. A's expected utility, when he adopts attack a and the instance is $(x, y = +)$, is $[u_A(+, +, a) - u_A(-, +, a)]\, p^A_{a(x)} + u_A(-, +, a)$. However, D does not know the ingredients $u_A$ and $p^A_{a(x)}$. If her uncertainty is modelled through a random utility function $U_A$ and a random expectation $P^A_{a(x)}$ (relevant ways of modelling the random utilities and probabilities are available in Naveiro et al. (2019a)), the random optimal attack is
$$A^*(x, +) = \arg\max_a \Big( [U_A(+, +, a) - U_A(-, +, a)]\, P^A_{a(x)} + U_A(-, +, a) \Big),$$
and $p_D(a_{x \rightarrow x'} \mid x, +) = \mathbb{P}(A^*(x, +) = a_{x \rightarrow x'})$, which may be estimated using MC. This would then feed problem (7).
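The strategic component $p_D(a_{x \rightarrow x'} \mid x, +)$ can thus be approximated by simulation. A minimal sketch for a finite attack set follows; the samplers for the attacker's random utilities and for his random expectation of being detected are illustrative placeholders.

```python
# ACRA-style Monte Carlo estimate of p_D(a | x, y = +) for a finite set of evasion attacks.
import numpy as np

def attack_probabilities(attacks, sample_u_A, sample_p_detect, n_mc=1000):
    """attacks: list of candidate transformations of x.
    sample_u_A(label, a) -> random utility U_A(c = label, y = +, a).
    sample_p_detect(a)   -> random expectation of x' = a(x) being classified as malicious."""
    counts = np.zeros(len(attacks))
    for _ in range(n_mc):
        utilities = [
            (sample_u_A('+', a) - sample_u_A('-', a)) * sample_p_detect(a) + sample_u_A('-', a)
            for a in attacks
        ]                                    # attacker's random expected utility per attack
        counts[int(np.argmax(utilities))] += 1
    return counts / n_mc                     # estimate of p_D(a | x, +), feeding problem (7)
```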
Example.
Applying the ACRA framework to the spam detection example in Section 2.2 leads to the results in Table 2. Observe that ACRA is robust to attacks and identifies most of the spam. Its overall accuracy is above 0.9, identifying most non-spam emails as well. ACRA presents a smaller FNR than NB-Tainted and a significantly lower FPR than NB, raising the overall performance.
              Accuracy         FPR              FNR
ACRA          0.··· ± 0.010    0.··· ± 0.008    0.··· ± 0.···
NB-Plain      0.··· ± 0.009    0.··· ± 0.100    0.··· ± 0.···
NB-Tainted    0.··· ± 0.101    0.··· ± 0.100    0.··· ± 0.···
Table 2
Performance comparison of ACRA and utility-sensitive Naive Bayes. △
Interestingly, ACRA beats NB-Plain in accuracy. This effect was observed in Dalvi et al. (2004) and Goodfellow et al. (2015) in different settings. The latter argues that taking into account the presence of an adversary has a regularizing effect that improves the original accuracy of the underlying algorithm, making it more robust.
Section 3.4 described how the presence of other agents may make the environment in RL non-stationary, rendering Q-learning ineffective. To properly deal with the problem, consider the extension of the simultaneous game in Section 4.2 to multiple stages (Figure 10). Figure 11 represents the problem from the perspective of just the defender.

To deal with this problem, the Markov Decision Process framework (Howard, 1960) has to be augmented to account for the presence of adversaries. Consider the case of one agent (DM, D) facing just one opponent (A).
Fig 10: Multi-agent RL problem.
Fig 11: Defender view on multi-agent RL.

A Threatened Markov Decision Process (TMDP) (Gallego et al., 2019a,b) is a tuple $(\Theta, \mathcal{D}, \mathcal{A}, P, u_D, p_D)$ such that Θ designates the state space; $\mathcal{D}$ denotes the set of actions d available to the supported agent; $\mathcal{A}$ designates the set of actions a available to the adversary; $P: \Theta \times \mathcal{D} \times \mathcal{A} \rightarrow \Delta(\Theta)$ is the transition distribution; $u_D: \Theta \times \mathcal{D} \times \mathcal{A} \rightarrow \Delta(\mathbb{R})$ is D's utility distribution (denoted reward in the RL literature); and $p_D(a \mid \theta)$ models D's beliefs about her opponent's moves given the state $\theta \in \Theta$.

The effect of averaging over opponent actions is translated to the sequential setting, extending the Q-learning iteration to
(8) $Q(\theta, d, a) := (1 - \alpha)\, Q(\theta, d, a) + \alpha \Big( u_D(\theta, d, a) + \gamma \max_d \mathbb{E}_{p_D(a \mid \theta')} \big[ Q(\theta', d, a) \big] \Big)$
and its expectation, $Q(\theta, d) := \mathbb{E}_{p_D(a \mid \theta)}[Q(\theta, d, a)]$. This is used to compute an ε-greedy policy for the DM, i.e., choosing with probability $(1 - \epsilon)$ an action $d^* = \arg\max_d Q(\theta, d)$ or, with probability ε, a uniformly random action. The previous rule provides a fixed point of a contraction mapping, leading to a defender's optimal policy, assuming her opponent's behaviour is described by $p_D(a \mid \theta)$. Thus, it is key to model the uncertainty regarding the adversary's policy through this distribution. In Gallego et al. (2019a) we use a level-k scheme (Section 4.2) to learn the opponent model: both agents maximize their respective expected cumulative utilities, though we start with a case in which the adversary is considered non-strategic. Then, we also consider the adversary being a level-1 agent and the DM a level-2 one. Gallego et al. (2019b) goes up in the level-k hierarchy, considering also mixtures of opponents whose beliefs can be dynamically adapted by the DM. In most situations, D will not know which type of opponent she is facing. For this, she can place a prior $p_D(M_i)$ modelling her beliefs about her opponent using model $M_i$, $i = 1, \ldots, m$, with $\sum_{i=1}^m p_D(M_i) = 1$, $p_D(M_i) \geq 0$.
As an example, she might place a Dirichlet prior. At each iteration, after having observed a, she updates her beliefs $p_D(M_i)$, increasing the count of the model causing that action, as in a standard Dirichlet-Categorical model. For this, the defender maintains an estimate $p_{M_i}(a \mid \theta)$ of the opponent's policy under each model $M_i$.
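A minimal sketch of update (8), with the opponent model learned through a Dirichlet-Categorical scheme over observed actions, could look as follows; the state and action space sizes and the learning parameters are illustrative.

```python
# TMDP Q-learning sketch: Q(theta, d, a) update with an expectation over the opponent model.
import numpy as np

class TMDPAgent:
    def __init__(self, n_states, n_d, n_a, alpha=0.1, gamma=0.95, eps=0.1):
        self.Q = np.zeros((n_states, n_d, n_a))
        self.counts = np.ones((n_states, n_a))       # Dirichlet(1,...,1) prior over opponent actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def opponent_policy(self, s):                    # p_D(a | theta), posterior mean
        return self.counts[s] / self.counts[s].sum()

    def act(self, s):                                # epsilon-greedy on Q(theta, d)
        q_d = self.Q[s] @ self.opponent_policy(s)
        if np.random.rand() < self.eps:
            return np.random.randint(self.Q.shape[1])
        return int(q_d.argmax())

    def update(self, s, d, a, r, s_next):
        self.counts[s, a] += 1                       # Dirichlet-Categorical update of the opponent model
        q_next = self.Q[s_next] @ self.opponent_policy(s_next)
        target = r + self.gamma * q_next.max()       # max_d E_{p_D(a|theta')}[Q(theta', d, a)]
        self.Q[s, d, a] = (1 - self.alpha) * self.Q[s, d, a] + self.alpha * target
```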
Example.
Consider the benefits of opponent modelling in a repeated matrix game, specifically the iterated variant of the Chicken game (IC), with payoff matrix in Table 3. For example, if the row agent plays C and the column one plays E, the agents respectively obtain utilities of −2 and 1; the pure Nash equilibria of the game are (C, E) and (E, C).
          C           E
C      (0, 0)      (-2, 1)
E      (1, -2)     (-4, -4)
Table 3
Payoff Matrix of Chicken
E, C ). (cid:52)
6. Conclusions: A research agenda
We have provided a review of key approaches, models and concepts in AML. This area is of major importance in security and cybersecurity to protect systems increasingly based on ML algorithms (Comiter, 2019). The pioneering AC work by Dalvi et al. (2004) has framed most of this research within the game-theoretic realm, with entailed CK conditions which actually hardly hold in the security contexts typical of AML. Our alternative ARA framework does not entail such conditions, being therefore much more realistic. As a byproduct, we obtain more robust algorithms. We illustrated its application in adversarial supervised learning, AC and RL problems. We end by providing several lines for further research.
We argued how Bayesian methods provide enhanced robustness in AML. Ekin et al. (2019) show how game theory solutions based on point estimates of preferences and beliefs potentially lead to unstable solutions, while ARA solutions tend to be more robust, acknowledging uncertainties in such judgements. Here are some additional ideas.
Bayesian methods and AML.
It has been empirically shown that model ensembling is an effective mechanism for designing powerful attack models (Tramèr et al., 2018). The Bayesian
treatment of ML models offers a way of combining the predictions of different models via the predictive distribution. Gal and Smith (2018) show how certain idealized Bayesian NNs, properly trained, lack adversarial examples. Thus, a promising research line consists of developing efficient algorithms for approximate Bayesian inference with robustness guarantees.

Indeed, there are several ways in which the Bayesian approach may increase the security of ML systems. Regarding opponent modelling in sequential decision making, an agent has uncertainty over her opponent's type initially; as information is gathered, she might become less uncertain about her model via Bayesian updating (Gallego et al., 2019b). Uncertainty over attacks in supervised models can also be considered to obtain a more robust version of AT, as in Ye and Zhu (2018), who sample attacks using an SG-MCMC method. Combining their approach with ARA opponent modelling may further increase robustness. Lastly, there are alternative approaches to achieve robustness in the presence of outliers and perturbed observations, such as the robust divergences for variational inference (Futami et al., 2018).

Robust Bayesian methods.
Robust Bayesian methods.
The robust Bayesian literature burgeoned in the period 1980-2000 (Berger, 1982; Rios Insua and Ruggeri, 2000). In particular, there has been relevant work in Bayesian likelihood robustness (Shyamalkumar, 2000), referring to likelihood imprecision, reminiscent of the impact of attacks on the data received. Note that Bayesian likelihood robustness focuses on random or imprecise perturbations and contaminations, in contrast to the purposeful perturbations in AML. Not taking into account the presence of an adversary affecting data generation is an example of model misspecification; the robustness of Bayesian inference to this issue has been revisited recently by Miller and Dunson (2019). Their ideas could be used to robustify ML algorithms in adversarial contexts.
We discuss modelling and computational enhancements aimed at improving operational aspects of the proposed framework.
Characterizing attacks.
A core element in the AML pipeline is the choice of the attacker's perturbation domain. This is highly dependent on the nature of the data attacked. In computer vision, a common choice is an ℓ_p ball of radius δ centered at the original input; for instance, an ℓ_∞ constraint implies that an attacker may modify any pixel by at most δ (a small sketch of such domains follows this paragraph). These perturbations, imperceptible to the human eye, may not be representative of threats actually deployed. As an example, Brown et al. (2017) designed a circular sticker that may be printed and deployed to fool state-of-the-art classifiers. Thus, it is important to develop threat models that go beyond ℓ_p norm assumptions. Moreover, we discussed only problems with two agents. It is relevant to deal with multiple agents in several variants (one defender vs. several attackers, several defenders vs. several attackers), including cases in which agents on one of the sides somehow cooperate.
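A minimal sketch of the two standard perturbation domains, assuming a flattened input x and budget δ; projecting a candidate perturbation back onto the ball is the step most attack algorithms iterate. Pixel-range clipping is omitted for brevity.

```python
import numpy as np

def project_linf(x_adv, x, delta):
    """Project x_adv onto the l_inf ball of radius delta around x."""
    return x + np.clip(x_adv - x, -delta, delta)

def project_l2(x_adv, x, delta):
    """Project x_adv onto the l_2 ball of radius delta around x."""
    d = x_adv - x
    norm = np.linalg.norm(d)
    return x + d * min(1.0, delta / norm) if norm > 0 else x

rng = np.random.default_rng(0)
x = rng.random(784)                            # e.g., a flattened grey-scale image
x_adv = x + rng.normal(scale=0.1, size=784)    # some candidate perturbation
print(np.abs(project_linf(x_adv, x, 0.03) - x).max())   # <= 0.03
print(np.linalg.norm(project_l2(x_adv, x, 1.0) - x))    # <= 1.0
```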
New algorithmic approaches.
Exploring gradient-based techniques for bi-level optimization problems arising in AML is a fruitful line of research (Naveiro and Insua, 2019). However, the focus has been on white-box attacks. It would be interesting to extend those results to the framework proposed here. In addition, Bayesian methods are hard to scale to high-dimensional problems or large datasets. Recent advances in accelerating SG-MCMC samplers, e.g. Gallego and Insua (2018, 2019), are crucial to leverage the benefits of a Bayesian treatment.
Single stage computations.
The framework proposed in Sections 4 and 5 essentially proceeds by simulating from the attacker's problem to forecast attacks and then optimizing the defender's expected utility to find her optimal decision. This may be computationally demanding, and we could explore single-stage approaches to alleviate the computations. Initial ideas based on augmented probability simulation are in Ekin et al. (2019).
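Schematically, the two-stage computation looks as in the sketch below: simulate attacks from the defender's (random) model of the attacker's problem, then choose the defence maximizing Monte Carlo expected utility. The distributions, utilities and candidate defences are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
defences = [0.0, 0.5, 1.0]            # candidate defensive actions (placeholder)

def sample_attack(defence, n):
    """Stage 1: simulate attacks from the model of the attacker's problem.
    Placeholder: attack intensity decreases with the defence level."""
    return rng.gamma(shape=2.0, scale=1.0 / (1.0 + defence), size=n)

def utility(defence, attack):
    """Defender's utility: damage from the attack plus cost of defending."""
    return -attack - 0.3 * defence

# Stage 2: Monte Carlo expected utility for each defence, then optimize.
expected = {d: utility(d, sample_attack(d, 10_000)).mean() for d in defences}
best = max(expected, key=expected.get)
print(expected, "best defence:", best)
```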
As mentioned in Section 4, our approach could be used to study adversarial versions of ML problems (classification, regression, unsupervised learning and RL). We are particularly interested in the following.
Adversarial classification.
A main limitation of ACRA, Section 5.2, is the requirement of computing the conditional probabilities p(x | y): this is only compatible with explicit generative models, such as Naive Bayes. Such a model can be weak, as it does not account for feature correlations. A straightforward extension would be to use a deep Bayes classifier based on a VAE; see Gallego and Insua (2019) for an implementation in a non-adversarial setting. Likewise, extensions to discriminative models and multi-class problems are ongoing.
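To fix ideas, the sketch below implements the generative route p(y | x) proportional to p(x | y) p(y), with class-conditional Gaussians standing in for the per-class likelihoods that a VAE-based deep Bayes classifier would estimate; data and parameters are synthetic.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)

# Toy training data: two classes with different means (placeholder for real data).
X0 = rng.normal([0, 0], 1.0, size=(200, 2))
X1 = rng.normal([2, 2], 1.0, size=(200, 2))

# Class-conditional generative models p(x | y); a deep Bayes classifier would
# replace these Gaussians with (approximate) per-class likelihoods from a VAE.
models = {
    0: multivariate_normal(X0.mean(axis=0), np.cov(X0, rowvar=False)),
    1: multivariate_normal(X1.mean(axis=0), np.cov(X1, rowvar=False)),
}
prior = {0: 0.5, 1: 0.5}

def posterior(x):
    """p(y | x) proportional to p(x | y) p(y)."""
    joint = np.array([models[y].pdf(x) * prior[y] for y in (0, 1)])
    return joint / joint.sum()

print(posterior(np.array([1.8, 1.5])))
```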
Unsupervised learning.
Further research is required. As for non-hierarchical methods such as k-means clustering, we could treat f(x) = argmax_y -‖x - μ_y‖ as a score function, so standard evasion techniques (Section 3) could straightforwardly be applied; a toy sketch follows this paragraph. The security of AR models (Alfeld et al., 2016) and natural language models (Wang et al., 2019) is attracting attention, so custom defences tailored to these models are needed.
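A toy sketch of such an evasion on k-means, using the score above together with an ℓ_∞ budget; the centroids, target cluster, step size and budget are illustrative.

```python
import numpy as np

def assign(x, centroids):
    """Cluster assignment as a score: argmax_y -||x - mu_y||."""
    return int(np.argmax(-np.linalg.norm(centroids - x, axis=1)))

def evade(x, centroids, target, delta, steps=50):
    """Toy evasion: nudge x toward a target centroid within an l_inf budget."""
    x_adv = x.copy()
    for _ in range(steps):
        direction = centroids[target] - x_adv
        # take a step toward the target centroid, then project onto the budget
        x_adv = x + np.clip(x_adv + 0.1 * direction - x, -delta, delta)
        if assign(x_adv, centroids) == target:
            break
    return x_adv

centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
x = np.array([0.5, 0.4])                       # originally assigned to cluster 0
x_adv = evade(x, centroids, target=1, delta=1.2)
print(assign(x, centroids), assign(x_adv, centroids))   # assignment flips 0 -> 1
```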
As presented in Comiter (2019), applications abound. We mention five of particular interest to us.
Fake news.
The rise in computing power used in deep learning is leading to quasi-realistic automatic text generation (Radford et al., 2019), which can be used for malicious purposes such as fake review generation (Juuti et al., 2018). At present, state-of-the-art defences consist mostly of statistical analyses of token distributions (Gehrmann et al., 2019). These are attacker-agnostic. An ARA treatment may be beneficial, informing the model about the most likely attack patterns.
Autonomous driving systems.
ADS directly benefit from developments in computer vision and RL. However, accidents still occur because of the lack of guarantees when verifying the correctness of deep NNs. Alternative solutions include the use of Bayesian NNs, as there is evidence that uncertainty measures can predict crashes up to five seconds in advance (Michelmore et al., 2018). McAllister et al. (2017) propose propagating uncertainty through the network to achieve a safer driving style. As mentioned, adversaries may interact with the visual system through adversarial examples. Thus, developing stronger defences is of major importance.
Forecasting time series.
Industrial and business settings are pervaded by data with strong temporal dependencies and structured data-generating processes. An attacker might be interested in intruding into a defender's network without alerting a monitoring system, extending the framework in Naveiro et al. (2019b), based on dynamic linear models (West and Harrison, 1989). Thus, an adversarial framework for these models would be of great interest.
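For concreteness, here is a minimal local-level DLM (West and Harrison, 1989) with Kalman-filter updating that a monitoring system might use to flag observations with large standardized forecast errors; an attacker constrained to keep these errors small faces exactly the kind of trade-off such an adversarial framework would formalize. Variances, threshold and the injected spike are illustrative.

```python
import numpy as np

def local_level_filter(y, v=1.0, w=0.1, m0=0.0, c0=10.0, threshold=3.0):
    """Kalman filtering for the local-level DLM:
    y_t = mu_t + nu_t,  mu_t = mu_{t-1} + omega_t.
    Flags observations whose standardized one-step forecast error exceeds the threshold."""
    m, c = m0, c0
    flags = []
    for obs in y:
        r = c + w                     # prior variance of mu_t
        q = r + v                     # one-step forecast variance
        e = obs - m                   # one-step forecast error
        flags.append(abs(e) / np.sqrt(q) > threshold)
        k = r / q                     # Kalman gain
        m, c = m + k * e, k * v       # posterior mean and variance (C_t = A_t V)
    return np.array(flags)

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(0, 0.3, 300)) + rng.normal(0, 1.0, 300)
y[150] += 8.0                         # an abrupt, easily detectable manipulation
print(np.where(local_level_filter(y))[0])   # flagged indices, including t = 150
```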
Malware detection.
Its methods are traditionally classified into three categories (Nath and Mehtre, 2014): static, dynamic and hybrid approaches. Shabtai et al. (2009) describe how ML algorithms may accomplish accurate classification to detect new malware. Note, though, that no ARA-based AML methods have been used yet in this domain.
Causal inference.
The problem of causal inference in observational studies is inevitably confronted with sample selection bias, which arises when units choose whether they are exposed to treatment (Heckman, 1979). In ML terminology, such bias occurs when the test set of counterfactual data is drawn from the true distribution and the training set of factual data is drawn from a biased distribution, whose support is included in that of the true distribution (Cortes et al., 2008). AML can be used to learn balanced representations of factual and counterfactual data, thereby minimizing the risk of selection bias. Bica et al. (2020), for instance, use domain-adversarial training (Ganin et al., 2015) to trade off between building balanced representations and estimating counterfactual outcomes in a longitudinal data setting.

Glossary

AC : Adversarial Classification
ACRA : AC based on adversarial risk analysis
ADS : Autonomous Driving System
AI : Artificial Intelligence
AML : Adversarial Machine Learning
APP : Adversarial Prediction Problem
AR : Autoregressive
ARA : Adversarial Risk Analysis
AT : Adversarial Training
BAID : Bi-agent Influence Diagram
BNE : Bayes Nash equilibrium
CK : Common Knowledge
DM : Decision Maker
FNR : False Negative Rate
FPR : False Positive Rate
ID : Influence Diagram
ML : Machine Learning
MC : Monte Carlo
NB : Naive Bayes
NE : Nash Equilibrium
NN : Neural Network
PGD : Projected Gradient Descent
RL : Reinforcement Learning
SG-MCMC : Stochastic Gradient Markov Chain Monte Carlo
TMDP : Threatened Markov Decision Process
VAE : Variational Autoencoder
References
Albrecht, S. V. and Stone, P. (2018). Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence, 258:66-95.
Alfeld, S., Zhu, X., and Barford, P. (2016). Data poisoning attacks against autoregressive models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 1452-1458.
Aliprantis, C. D. and Chakrabarti, S. K. (2002). Games and Decision Making. Oxford University Press.
Athalye, A., Carlini, N., and Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pages 274-283.
Banks, D. L., Aliaga, J. M. R., and Insua, D. R. (2015). Adversarial Risk Analysis. Chapman and Hall/CRC.
Berger, J. (1982). The robust Bayesian point of view. In Robustness. Springer Verlag.
Bica, I., Alaa, A. M., Jordon, J., and van der Schaar, M. (2020). Estimating counterfactual treatment outcomes over time through adversarially balanced representations. arXiv preprint arXiv:2002.04083.
Biggio, B., Fumera, G., and Roli, F. (2014). Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering, 26:984-996.
Biggio, B., Nelson, B., and Laskov, P. (2012). Poisoning attacks against support vector machines. In , pages 1807-1814.
Biggio, B., Pillai, I., Rota Bulò, S., Ariu, D., Pelillo, M., and Roli, F. (2013). Is data clustering in adversarial settings secure? In Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pages 87-98. ACM.
Biggio, B. and Roli, F. (2018). Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317-331.
Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.
Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., et al. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
Bolton, R. J. and Hand, D. J. (2002). Statistical fraud detection: A review. Statistical Science, pages 235-249.
Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16:199-231.
Brown, G., Carlyle, M., Salmerón, J., and Wood, K. (2006). Defending critical infrastructure. Interfaces, 36(6):530-544.
Brown, G. W. (1951). Iterative solution of games by fictitious play. Activity Analysis of Production and Allocation, pages 374-376.
Brown, T. B., Mané, D., Roy, A., Abadi, M., and Gilmer, J. (2017). Adversarial patch. arXiv preprint arXiv:1712.09665.
Brückner, M., Kanzow, C., and Scheffer, T. (2012). Static prediction games for adversarial learning problems. Journal of Machine Learning Research, 13(Sep):2617-2654.
Brückner, M. and Scheffer, T. (2011). Stackelberg games for adversarial prediction problems. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 547-555. ACM.
Buşoniu, L., Babuška, R., and De Schutter, B. (2010). Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications - 1, pages 183-221. Springer.
Carlini, N., Athalye, A., Papernot, N., Brendel, W., Rauber, J., Tsipras, D., Goodfellow, I., and Madry, A. (2019). On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705.
Comiter, M. (2019). Attacking Artificial Intelligence. Belfer Center Paper.
Cortes, C., Mohri, M., Riley, M., and Rostamizadeh, A. (2008). Sample selection bias correction theory. In International Conference on Algorithmic Learning Theory, pages 38-53. Springer.
Couce-Vieira, A., Rios Insua, D., and Kosgodagan, A. (2019). Assessing and forecasting cybersecurity impacts. Technical Report.
Dalvi, N., Domingos, P., Mausam, Sumit, S., and Verma, D. (2004). Adversarial classification. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 99-108.
Dasgupta, P. and Collins, J. B. (2019). A survey of game theoretic approaches for adversarial machine learning in cybersecurity tasks. AI Magazine, 40(2).
Ekin, T., Naveiro, R., Torres-Barrán, A., and Ríos-Insua, D. (2019). Augmented probability simulation methods for non-cooperative games. arXiv preprint arXiv:1910.04574.
Fan, J., Ma, C., and Zhong, Y. (2019). A selective overview of deep learning. arXiv preprint arXiv:1904.05526.
Feng, J., Xu, H., Mannor, S., and Yan, S. (2014). Robust logistic regression and classification. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, pages 253-261. MIT Press.
French, S. and Rios Insua, D. (2000). Statistical Decision Theory. Wiley.
Futami, F., Sato, I., and Sugiyama, M. (2018). Variational inference based on robust divergences. In International Conference on Artificial Intelligence and Statistics, pages 813-822.
Gal, Y. and Smith, L. (2018). Sufficient conditions for idealised models to have no adversarial examples: a theoretical and empirical study with Bayesian neural networks. arXiv preprint arXiv:1806.00667.
Gallego, V. and Insua, D. R. (2018). Stochastic gradient MCMC with repulsive forces. In Bayesian Deep Learning Workshop, Neural Information and Processing Systems (NIPS).
Gallego, V. and Insua, D. R. (2019). Variationally inferred sampling through a refined bound. In Advances in Approximate Bayesian Inference (AABI).
Gallego, V., Naveiro, R., and Insua, D. R. (2019a). Reinforcement learning under threats. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9939-9940.
Gallego, V., Naveiro, R., Insua, D. R., and Oteiza, D. G.-U. (2019b). Opponent aware reinforcement learning. arXiv preprint arXiv:1908.08773.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. (2015). Domain-adversarial training of neural networks. arXiv preprint arXiv:1505.07818.
Gehrmann, S., Strobelt, H., and Rush, A. (2019). GLTR: Statistical detection and visualization of generated text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 111-116, Florence, Italy. Association for Computational Linguistics.
Gibbons, R. (1992). A Primer in Game Theory. Harvester Wheatsheaf.
Goodfellow, I., Shlens, J., and Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
Gowal, S., Dvijotham, K., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Arandjelovic, R., Mann, T. A., and Kohli, P. (2018). On the effectiveness of interval bound propagation for training verifiably robust models. CoRR, abs/1810.12715.
Hargreaves-Heap, S. and Varoufakis, Y. (2004). Game Theory: A Critical Introduction. Taylor & Francis.
Harsanyi, J. C. (1967). Games with incomplete information played by "Bayesian" players, I-III. Part I. The basic model. Management Science, 14(3):159-182.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica: Journal of the Econometric Society, pages 153-161.
Howard, R. A. (1960). Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
Hu, J. and Wellman, M. P. (2003). Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4(Nov):1039-1069.
Huang, S., Papernot, N., Goodfellow, I., Duan, Y., and Abbeel, P. (2017). Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284.
Joseph, A., Nelson, B., Rubinstein, B., and Tygar, J. (2019). Adversarial Machine Learning. Cambridge University Press.
Juuti, M., Sun, B., Mori, T., and Asokan, N. (2018). Stay on-topic: Generating context-specific fake restaurant reviews. In European Symposium on Research in Computer Security, pages 132-151. Springer.
Kadane, J. B. and Larkey, P. D. (1982). Subjective probability and the theory of games. Management Science, 28:113-120.
Kantarcıoğlu, M., Xi, B., and Clifton, C. (2011). Classifier evaluation and attribute selection against active adversaries. Data Mining and Knowledge Discovery, 22:291-335.
Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53(11):3735-3745.
Kołcz, A. and Teo, C. H. (2009). Feature weighting for improved classifier robustness. In CEAS'09: Sixth Conference on Email and Anti-Spam.
Kolter, Z. and Madry, A. (2018). Adversarial Robustness - Theory and Practice. https://adversarial-ml-tutorial.org/adversarial_examples/.
Kos, J., Fischer, I., and Song, D. (2018). Adversarial examples for generative models. In , pages 36-42. IEEE.
LeCun, Y., Cortes, C., and Burges, C. (1998). THE MNIST DATABASE of handwritten digits. http://yann.lecun.com/exdb/mnist/.
Li, B. and Vorobeychik, Y. (2014). Feature cross-substitution in adversarial classification. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, pages 2087-2095.
Lichman, M. (2013). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml.
Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994, pages 157-163. Elsevier.
Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum games. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 322-328. Morgan Kaufmann Publishers Inc.
Lowd, D. and Meek, C. (2005). Adversarial learning. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD '05, pages 641-647.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations.
McAllister, R., Gal, Y., Kendall, A., Van Der Wilk, M., Shah, A., Cipolla, R., and Weller, A. V. (2017). Concrete problems for autonomous vehicle safety: Advantages of Bayesian deep learning. In International Joint Conferences on Artificial Intelligence, Inc.
Mei, S. and Zhu, X. (2015). The security of latent Dirichlet allocation. In Artificial Intelligence and Statistics, pages 681-689.
Menache, I. and Ozdaglar, A. (2011). Network games: Theory, models, and dynamics. Synthesis Lectures on Communication Networks, 4(1):1-159.
Michelmore, R., Kwiatkowska, M., and Gal, Y. (2018). Evaluating uncertainty quantification in end-to-end autonomous driving control. arXiv preprint arXiv:1811.06817.
Miller, J. W. and Dunson, D. B. (2019). Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 114(527):1113-1125.
Nath, H. V. and Mehtre, B. M. (2014). Static malware analysis using machine learning methods. In International Conference on Security in Computer Networks and Distributed Systems, pages 440-450. Springer.
Naveiro, R. and Insua, D. R. (2019). Gradient methods for solving Stackelberg games. In International Conference on Algorithmic Decision Theory, pages 126-140. Springer.
Naveiro, R., Redondo, A., Insua, D. R., and Ruggeri, F. (2019a). Adversarial classification: An adversarial risk analysis approach. International Journal of Approximate Reasoning, 113:133-148.
Naveiro, R., Rodríguez, S., and Ríos Insua, D. (2019b). Large-scale automated forecasting for network safety and security monitoring. Applied Stochastic Models in Business and Industry, 35(3):431-447.
Papernot, N., Faghri, F., Carlini, N., Goodfellow, I., Feinman, R., Kurakin, A., Xie, C., Sharma, Y., Brown, T., Roy, A., Matyasko, A., Behzadan, V., Hambardzumyan, K., Zhang, Z., Juang, Y.-L., Li, Z., Sheatsley, R., Garg, A., Uesato, J., Gierke, W., Dong, Y., Berthelot, D., Hendricks, P., Rauber, J., and Long, R. (2018). Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768.
Papernot, N., McDaniel, P., Swami, A., and Harang, R. (2016). Crafting adversarial input sequences for recurrent neural networks. In MILCOM 2016 - 2016 IEEE Military Communications Conference, pages 49-54. IEEE.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
Raiffa, H. (1982). The Art and Science of Negotiation. Harvard University Press.
Rios, J. and Rios Insua, D. (2012). Adversarial risk analysis for counterterrorism modeling. Risk Analysis, 32(5):894-915.
Rios Insua, D., Couce-Vieira, A., Rubio, J. A., Pieters, W., Labunets, K., and G. Rasines, D. (2019). An adversarial risk analysis framework for cybersecurity. Risk Analysis.
Rios Insua, D., Rios, J., and Banks, D. (2009). Adversarial risk analysis. Journal of the American Statistical Association, 104(486):841-854.
Rios Insua, D. and Ruggeri, F. (2000). Robust Bayesian Analysis. Lecture Notes in Statistics, 152.
Shabtai, A., Moskovitch, R., Elovici, Y., and Glezer, C. (2009). Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey. Information Security Technical Report, 14(1):16-29.
Shyamalkumar, N. (2000). Likelihood robustness. In Robust Bayesian Analysis.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676):354.
Song, Y., Kołcz, A., and Giles, C. L. (2009). Better naive Bayes classification for high-precision spam detection. Software: Practice and Experience, 39:1003-1024.
Stahl, D. O. and Wilson, P. W. (1994). Experimental evidence on players' models of other players. Journal of Economic Behavior & Organization, 25(3):309-327.
Sutton, R. S., Barto, A. G., et al. (1998). Introduction to Reinforcement Learning, volume 2. MIT Press, Cambridge.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2014). Intriguing properties of neural networks. In International Conference on Learning Representations.
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., and McDaniel, P. (2018). Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations.
Vorobeychik, Y. and Kantarcioglu, M. (2018). Adversarial machine learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 12(3):1-169.
Vorobeychik, Y. and Li, B. (2014). Optimal randomized classification in adversarial settings. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '14, pages 485-492.
Wang, C., Bunel, R., Dvijotham, K., Huang, P.-S., Grefenstette, E., and Kohli, P. (2019). Knowing when to stop: Evaluation and verification of conformity to output-size specifications. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12260-12269.
West, M. and Harrison, J. (1989). Bayesian Forecasting and Dynamic Models. Springer Series in Statistics. Springer.
Ye, N. and Zhu, Z. (2018). Bayesian adversarial learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 6892-6901. Curran Associates Inc.
Zeager, M. F., Sridhar, A., Fogal, N., Adams, S., Brown, D. E., and Beling, P. A. (2017). Adversarial learning in credit card fraud detection. In Systems and Information Engineering Design Symposium (SIEDS), 2017, pages 112-116. IEEE.
Zhou, Y., Kantarcioglu, M., and Xi, B. (2018). A survey of game theoretic approach for adversarial machine learning.