Foureye: Defensive Deception based on Hypergame Theory Against Advanced Persistent Threats
Zelin Wan, Jin-Hee Cho (Senior Member, IEEE), Mu Zhu, Ahmed H. Anwar, Charles Kamhoua (Senior Member, IEEE), Munindar P. Singh (Fellow, IEEE)
Abstract—Defensive deception techniques have emerged as a promising proactive defense mechanism to mislead an attacker and thereby achieve attack failure. However, most game-theoretic defensive deception approaches have assumed that players maintain consistent views under uncertainty. They do not consider players' possible, subjective beliefs formed due to asymmetric information given to them. In this work, we formulate a hypergame between an attacker and a defender where they can interpret the same game differently and accordingly choose their best strategy based on their respective beliefs. This gives a chance for defensive deception strategies to manipulate an attacker's belief, which is the key to the attacker's decision making. We consider advanced persistent threat (APT) attacks, which perform multiple attacks in the stages of the cyber kill chain, where both the attacker and the defender aim to select optimal strategies based on their beliefs. Through extensive simulation experiments, we demonstrate how effectively the defender can leverage defensive deception techniques while dealing with multi-staged APT attacks in a hypergame in which the imperfect information is reflected based on perceived uncertainty, cost, and expected utilities of both attacker and defender, the system lifetime (i.e., mean time to security failure), and improved false positive rates in detecting attackers.
Index Terms—Defensive deception, hypergame theory, uncertainty, attacker, defender, advanced persistent threat
1 INTRODUCTION
The key purpose of a defensive deception technique is to mislead an attacker's view and make it choose a suboptimal or poor action, leading to attack failure [33]. When both the attacker and defender are constrained in their resources, strategic interactions can be the key to beating an opponent. In this sense, non-game-theoretic defense approaches have inherent limitations due to a lack of efficient and effective strategic tactics. Forms of deception techniques have been discussed based on certain classifications, such as hiding the truth vs. providing false information, or passive vs. active techniques for increasing attackers' ambiguity or confusion [3, 9].

• Zelin Wan and Jin-Hee Cho are with the Department of Computer Science, Virginia Tech, Falls Church, VA 22043, USA. Email: {zelin, jicho}@vt.edu. Mu Zhu and Munindar P. Singh are with the Department of Computer Science, North Carolina State University, Raleigh, NC 27695, USA. Email: {mzhu5, mpsingh}@ncsu.edu. Ahmed H. Anwar and Charles A. Kamhoua are with the US Army Research Laboratory, Adelphi, MD 20783, USA. Email: [email protected]; [email protected].

Game theory has been substantially used for dynamic decision making under uncertainty, assuming that players have consistent views. However, this assumption fails as players may often subjectively process asymmetric information available to them [22]. Hypergame theory [5] is a variant of game theory that provides a form of analysis considering each player's subjective belief, misbelief, and perceived uncertainty, and accordingly their effect on decision making in choosing a best strategy [22].

This paper leverages hypergame theory to resolve conflicts of views of multiple players as a robust decision-making mechanism under uncertainty where the players may have different beliefs towards the same game. Hypergame theory models players, such as attackers and defenders in cybersecurity, to deal with advanced persistent threat (APT) attacks. We dub this effort Foureye after the foureye butterflyfish, which demonstrates deceptive defense in nature [40].

To be specific, we identify the following nontrivial challenges in obtaining a solution. First, it is not trivial to derive realistic game scenarios and develop defensive deception techniques to deal with APT attacks beyond the reconnaissance stage; this aspect has not been explored in the state-of-the-art. Second, quantifying the degree of uncertainty in the views of attackers and defenders is challenging, although it is critical because how each player frames a game significantly affects the strategies it takes. Third, given a number of possible choices under dynamic situations, dealing with a large solution space is not trivial, whereas the deployment and maintenance of defensive deception techniques is costly in contested environments. We partly addressed these challenges in our prior work [12]; however, its contribution is limited to a small-scale network and a small set of strategies with a highly simplified probability model developed using a Stochastic Petri Net.

To be specific, this paper has the following new key contributions:

• We modeled an attack-defense game under uncertainty based on hypergame theory where an attacker and a defender have different views of the situation and are uncertain about strategies taken by their opponents.
• We reduced a player's action space by using a subgame determined based on a set of available strategies, where each subgame is formulated based on each stage of the cyber kill chain (CKC) according to a player's belief under uncertainty.
• We considered multiple defense strategies, including defensive deception techniques whose performance can be significantly affected by an attacker's belief and perceived uncertainty, which impacts its choice of a strategy.
• We modeled an attacker's and a defender's uncertainty towards its opponent (i.e., the defender and the attacker, respectively) based on how long each player has monitored the opponent and its chosen strategy. To the best of our knowledge, prior research on hypergame theory uses a predefined constant probability to represent a player's uncertainty. In this work, we estimated the player's uncertainty based on the dynamic, strategic interactions between an attacker and a defender.
• We conducted comparative performance analyses with or without a defender using defensive deception (DD) strategies and with or without perfect knowledge available towards actions taken by the opponent. We measured the effectiveness and efficiency of DD techniques in terms of a system's security and performance, such as perceived uncertainty, hypergame expected utility, action cost, mean time to security failure (MTTSF, or system lifetime), and the improvement in the false positive rate (FPR) of intrusion detection by the DD strategies taken by the defender.
2 RELATED WORK
Garg and Grosu [15] proposed a game-theoretic deception framework in honeynets with imperfect information to find optimal actions of an attacker and a defender and investigated the mixed strategy equilibrium. Carroll and Grosu [10] used deception in attacker-defender interactions in a signaling game based on perfect Bayesian equilibria and hybrid equilibria. They considered defensive deception techniques, such as honeypots, camouflaged systems, or normal systems. Yin et al. [41] considered a Stackelberg attack-defense game where both players make decisions based on their perceived observations and identified an optimal level of deceptive protection using fake resources. Casey et al. [11] examined how to discover Sybil attacks based on an evolutionary signaling game where a defender can use a fake identity to lure the attacker to facilitate cooperation. Schlenker et al. [32] studied sophisticated and naïve APT attackers in the reconnaissance stage to identify an optimal defensive deception strategy in a zero-sum Stackelberg game by solving a mixed integer linear program.

Unlike the works cited above [10, 11, 15, 32, 41], our work uses hypergame theory, which offers the powerful capability to model uncertainty, different views, and bounded rationality by different players. This reflects more realistic scenarios between the attacker and defender.

Hypergame theory has emerged to better reflect real-world scenarios by capturing players' subjective and imperfect beliefs, aiming to mislead them into adopting uncertain or non-optimized strategies. Although other game theories deal with uncertainty by considering probabilities that a certain event may happen, they assume that all players play the same game [34]. Hypergame theory has been used to solve decision-making problems in military and adversarial environments (House and Cybenko [20]; Vane [37]; Vane and Lehner [39]).
Several studies [16, 17] investigated how players' beliefs evolve based on hypergame theory by developing a misbelief function measuring the differences between a player's belief and the ground truth payoff of other players' strategies. Kanazawa et al. [21] studied an individual's belief in an evolutionary hypergame and how this belief can be modeled by interpreter functions. Sasaki [31] discussed the concept of subjective rationalizability, where an agent believes that its action is a best response to the other agents' choices based on its perceived game. Putro et al. [30] proposed an adaptive, genetic learning algorithm to derive optimal strategies by players in a hypergame. Ferguson-Walter et al. [13] studied the placement of decoys based on a hypergame; this work developed a game tree and investigated an optimal move for both an attacker and a defender in an adaptive game. Aljefri et al. [2] studied a first-level hypergame involving misbeliefs to resolve conflicts for two and then more decision makers. Bakker et al. [4] modeled a repeated hypergame in a dynamic stochastic setting against APT attacks, primarily in cyber-physical systems.

Unlike the works using hypergame theory above [2, 4, 13, 16, 17, 20, 21, 30, 31, 37, 39], our work considers an APT attacker performing multi-staged attacks where attack-defense interactions are modeled based on repeated hypergames. In addition, we show the effectiveness of defensive deception techniques in increasing the attacker's uncertainty, leading it to choose non-optimal actions, and in increasing the quality of intrusion detection (i.e., a network-based intrusion detection system, NIDS) through the collection of attack intelligence using defensive deception strategies.
3 SYSTEM MODEL
This work concerns a software-defined network (SDN)-based Internet-of-Things (IoT) environment characterized by servers and/or IoT devices, such as an SDN-based smart environment [7]. The key benefit of using SDN technology is decoupling the network control plane from the data plane (e.g., packet forwarding) for higher flexibility, robust security/performance, and programmability for a networked system, in which an SDN controller can efficiently and effectively manage security and performance mechanisms. We use the SDN controller to make packet forwarding decisions and to deploy defense mechanisms, such as firewalls or NIDSs. SDN-enabled switches handle packet forwarding: they encapsulate packets without exact matching flow rules in their flow tables, and the encapsulated packets, 'OFPT_PACKET_IN' packets in the OpenFlow (OF) protocol (i.e., a standard communication protocol between SDN-enabled switches and the SDN controller), are provided to the SDN controller handling the flow.

The nodes in this environment collect data and perform a periodic delivery of the collected data to the servers via multi-hop communications, where the servers may need to process the data further to provide queried services. The nodes may be highly heterogeneous in their types and functionalities and spread over different Virtual Local Area Networks (VLANs) of the IoT environment. Each VLAN may have one or more servers and is assigned a set of nodes based on the common characteristics of their functionalities. We leverage the advanced SDN technology [27] for the effective and efficient management of IoT nodes with the help of an SDN controller.
A node, including web servers, databases, honeypots, and IoT devices, is characterized by the following set of features:

• Criticality: This metric, c_i, indicates how critical node i is in terms of its given role for security and reachability (i.e., influence) in a network to maintain network connectivity, and is given by:

c_i = importance_i × reachability_i,  (1)

where importance_i is given as an integer in [0, 10] during the network deployment phase. reachability_i is computed by the SDN controller based on the faster betweenness centrality metric [8]. Note that the algorithmic complexity of the faster betweenness computation in this work is O(|V|²) as the given network follows the Erdős–Rényi (ER) network model [28]. reachability_i is estimated as a real number in the range [0, 1].

• Security vulnerability: A node's vulnerabilities to various types of attacks are considered based on three types of vulnerabilities: (1) vulnerabilities associated with software installed in each node, denoted by sv_i; (2) vulnerabilities associated with encryption keys (e.g., secret or private keys), denoted by ev_i; as a longer-term key exposes higher security vulnerability, the attacker can exploit encryption vulnerability over time with êv_i = ev_i · e^{−1/T_rekey}, where T_rekey is the time elapsed since the attacker has investigated a given key; and (3) an unknown vulnerability, denoted by uv_i, representing the average unknown vulnerability. We assume that all the vulnerabilities are computed based on the Common Vulnerability Scoring System (CVSS) [1] with the severity value as an integer in [0, 10]. We measure the average vulnerability associated with node i by:

vulnerability_i = (Σ_{v_j ∈ V_i} v_j) / |V_i|,  (2)

where V_i is the set of vulnerabilities associated with node i (e.g., {sv_1, sv_2, sv_3, ev_1, ev_2, ev_3, uv}), and v_j refers to one of the vulnerabilities associated with node i, where v_j is measured in [0, 10] following the CVSS. We denote P^v_i = vulnerability_i / 10 as a normalized vulnerability probability. P^v_i is used as the probability for an attacker to exploit (i.e., compromise) node i.
• Mobility: We model the mobility rate of node i by considering a rewiring probability P^r_i only for IoT devices, where node i can be connected with a new IoT node with probability P^r_i. For rewiring connections, node i will select one of its neighbors with P^r_i to disconnect and then select a new node to connect to, maintaining the same number of neighbors (nodes being directly connected).

TABLE 1: Example Node Characteristics

Node type   | Importance | Software Vul. | Encryption Vul.
Web servers | [8, 10]    | [3, 7]        | [1, ]
Databases   | [8, 10]    | [3, 7]        | [1, ]
Honeypots   | 0          | [7, 10]       | [9, 10]
IoT devices | [1, 5]     | [1, 5]        | [5, ]

Table 1 shows an example set of node characteristics, showing the ranges of each node type's attributes; the shown values are used as default settings for our experiments in Section 6. We select each attribute value at random based on a uniform distribution in a given range. Notice that we consider zero importance for honeypots, implying no performance degradation and no security damage upon their compromise. In addition, we put a fairly high range of vulnerabilities in the honeypots in order to lure attackers with high attack utility. Since a legitimate user can be compromised by the attacker, cp_i refers to the status of a node's compromise (i.e., cp_i = 1 for compromised; 0 otherwise). We summarize node i's profile as:

n_i = [c_i, cp_i, evict_i, V_i, P^v_i, P^r_i].  (3)

Recall that c_i, cp_i, evict_i, V_i, P^v_i, and P^r_i are node i's criticality, the status of being compromised (= 1) or not (= 0), the status of being evicted (= 1) or not (= 0), the vulnerability vector in software, encryption, and unknowns, the probability of the overall vulnerability, and the rewiring probability for mobility in [0, 1], respectively.

We assume that the SDN controller and control channel are trusted; considering their security vulnerabilities is beyond the scope of this work. Since each SDN controller should be well informed of basic network information under its control and other SDN controllers' control, each SDN controller periodically updates the network topology and software vulnerabilities of nodes under its control to other SDN controllers. Via this process, each SDN controller can periodically check the overall system security state and take actions accordingly.

We also assume that a network-based IDS (NIDS) is deployed in the SDN controller and is characterized by the probabilities of false positives (P_fp) and false negatives (P_fn). The NIDS runs throughout the system lifetime.
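The node-profile quantities in Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration; the node attributes and CVSS scores below are made-up example values, not the paper's defaults.

```python
import math

def criticality(importance: float, reachability: float) -> float:
    """Eq. (1): c_i = importance_i * reachability_i."""
    return importance * reachability

def avg_vulnerability(cvss_scores: list) -> float:
    """Eq. (2): mean CVSS severity over a node's vulnerability set V_i."""
    return sum(cvss_scores) / len(cvss_scores)

def exploit_probability(cvss_scores: list) -> float:
    """Normalized vulnerability P^v_i = vulnerability_i / 10 (CVSS max is 10)."""
    return avg_vulnerability(cvss_scores) / 10.0

def aged_encryption_vul(ev: float, t_rekey: float) -> float:
    """Time-decayed encryption vulnerability: ev_hat = ev * e^(-1/T_rekey);
    it grows toward ev as the key stays un-rekeyed longer."""
    return ev * math.exp(-1.0 / t_rekey)

# Illustrative node: software vul. 5, encryption vul. 3, unknown vul. 2
node = {"importance": 9, "reachability": 0.4, "V": [5, 3, 2]}
c_i = criticality(node["importance"], node["reachability"])  # 9 * 0.4 = 3.6
p_v = exploit_probability(node["V"])                         # (10/3)/10 ≈ 0.333
```

The exploit probability `p_v` is what the attacker model later uses as the per-node compromise probability.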
The NIDS's P_fp and P_fn will be dynamically updated as it receives more attack intelligence from the defensive deception techniques used in this work. We assume that the signatures collected from the deception-based monitoring mechanisms can decrease P_fn due to the increased volume of additional signatures. We simply use the Beta distribution to derive Beta(P_fn; α, β), where α refers to the number of false negatives (FN) and β to the number of true positives (TP), with P_fn = FN/(TP + FN). As more attack intelligence is forwarded to the NIDS via defensive deception-based monitoring, β (TP) increments by 1 per monitoring interval. Similarly, false positives will be reduced as defensive deception techniques are used, where P_fp = FP/(TN + FP) and TN increases by 1.

We assume that legitimate users use a secret key for secure group communications among internal, legitimate users while prohibiting outsiders from accessing secured network resources. If an outsider wants to access a target network and become an inside attacker with legitimate credentials, it needs to be authenticated and given the secret key to access the target network. In addition, network resources are accessed according to the privilege of each user. Therefore, to compromise a legitimate node, the attacker should obtain appropriate privileges to access it.

We consider APT attackers performing multi-staged attacks following the cyber kill chain (CKC) to compromise a target node and exfiltrate confidential information to the outside [29]. We consider the APT attacks as follows.
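The running update of the NIDS error rates described above can be sketched as follows. The initial counts here are illustrative assumptions; the paper does not specify starting values.

```python
class NIDSErrorModel:
    """Tracks P_fp and P_fn as Beta-distributed rates whose counts are
    updated by attack intelligence from deception-based monitoring."""

    def __init__(self, fn=1, tp=1, fp=1, tn=1):
        # Illustrative prior counts, not values from the paper.
        self.fn, self.tp, self.fp, self.tn = fn, tp, fp, tn

    @property
    def p_fn(self):
        # P_fn = FN / (TP + FN)
        return self.fn / (self.tp + self.fn)

    @property
    def p_fp(self):
        # P_fp = FP / (TN + FP)
        return self.fp / (self.tn + self.fp)

    def on_monitoring_interval(self):
        """Per monitoring interval with deception in place: TP and TN each
        increase by 1, lowering both P_fn and P_fp."""
        self.tp += 1
        self.tn += 1

nids = NIDSErrorModel()
before = (nids.p_fn, nids.p_fp)   # (0.5, 0.5)
nids.on_monitoring_interval()
after = (nids.p_fn, nids.p_fp)    # (1/3, 1/3)
```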
APT Attack Procedure to Achieve Data Exfiltration: We define an APT attacker's goal as having reached and compromised a target node and successfully exfiltrated its confidential data. We assume that nodes with a higher importance (i.e., holding more important, credential information) are more likely to be targeted.

To reach a target node, the attacker needs to compromise other intermediate nodes along the way; the path to the target node is often called 'an attack path.' In reality, the attacker may not have an exact, complete view of the network topology. We assume that the attacker only knows its adjacent nodes (i.e., nodes directly connected to it) and needs to choose which node to compromise next. The attacker will consider how easily a given adjacent node i can be exploited according to an attack cost metric, ac_k, for attack strategy k. Moreover, if the attacker finds an already compromised adjacent node, it can leverage it without additional effort to compromise it. We call this 'the value of an intermediate node i in an attack path,' denoted by APV(i, k), where k refers to the attack strategy ID. The node with the highest APV(i, k) will be added to the attacker's attack path to the target node. APV(i, k) is given by:

APV(i, k) = (1 − âc_k) · P^v_i, if cp_i == 0; 1, otherwise.  (4)

Here âc_k = e^{−1/ac_k} ∈ [0, 1] represents a normalized attack cost, where ac_k is a predefined attack cost ranged in [0, 3] (see 'Attack Strategy Attributes' later in this section), and vulnerability_i is the overall vulnerability in Eq. (2). Given a node to be compromised next, its vulnerability degree can be computed as P^v_i (= vulnerability_i / 10). If cp_i = 1 (i.e., node i is compromised), the attacker may add it to the attack path at no cost, which gives APV(i, k) = 1. The attacker may need to compromise more than one intermediate node before reaching a target node.
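The next-hop selection by APV in Eq. (4) can be sketched as follows; the neighbor attributes are illustrative examples.

```python
import math

def apv(ac_k: int, p_v: float, compromised: bool) -> float:
    """Eq. (4): value of adjacent node i under attack strategy k."""
    if compromised:
        return 1.0  # already-compromised neighbor joins the path at no cost
    ac_hat = math.exp(-1.0 / ac_k)  # normalized attack cost in (0, 1)
    return (1.0 - ac_hat) * p_v

# Illustrative adjacent nodes: name -> (P^v_i, compromised?)
neighbors = {"n1": (0.6, False), "n2": (0.3, True), "n3": (0.9, False)}
ac = 3  # a high-cost strategy

# The attacker extends its attack path with the highest-APV neighbor.
best = max(neighbors, key=lambda n: apv(ac, *neighbors[n]))
# best == "n2": a compromised neighbor (APV = 1) dominates any non-compromised one
```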
Attack Strategy Attributes: An APT attacker can perform multiple attacks through the stages of the CKC. Each attack strategy k is characterized by: (1) attack cost, ac_k, indicating how much time/effort is needed to launch the attack; and (2) the expected impact (i.e., attack effectiveness) upon attack success, ai_k. ac_k is a predefined constant, an integer in [0, 3] reflecting no, low, medium, and high cost, respectively. ai_k is obtained from victim j's criticality, c_j (see Eq. (1)); this captures the attack benefit from compromising a set of exploitable nodes. If multiple nodes have been compromised by taking a given attack k, ai_k captures the criticalities of the compromised nodes by:

ai_k = (Σ_{j ∈ C_k} c_j) / N,  (5)

where C_k is the set of nodes compromised by a given attack k and N is the total number of nodes. If node j is already compromised, then there is no additional attack impact (ai_k = 0) introduced by attack strategy k. Compromising more important nodes with highly confidential information leads to early system failure (see Eq. (8)).

TABLE 2: Characteristics of APT Attack Strategies

AS  | CKC stage | Attack cost (ac) | Node compromise | Exploited vulnerability
AS1 | R – DE    | 1 | No       | UV
AS2 | D – DE    | 3 | Yes (SN) | SV + EV
AS3 | E – DE    | 3 | Yes (MN) | SV
AS4 | E – DE    | 3 | Yes (SN) | SV + UV
AS5 | E – DE    | 1 | Yes (SN) | UV
AS6 | C2 – DE   | 3 | Yes (SN) | EV
AS7 | E – DE    | 2 | Yes (SN) | EV
AS8 | DE        | 3 | Yes (SN) | SV + EV

Note: Each CKC stage is indicated by Reconnaissance (R), Delivery (D), Exploitation (E), Command and Control (C2), Lateral Movement (M), and Data Exfiltration (DE). Attack cost is an integer in [1, 3], representing low, medium, and high, respectively. Node compromise may involve a single node compromise (SN) or multiple nodes compromise (MN). Exploited vulnerability is indicated by Overall (O: average vulnerability across all three types of vulnerabilities), Software (SV: software vulnerability), Encryption (EV: vulnerability by compromising encryption key(s)), and Unknown (UV: unknown vulnerability).

Attack Strategies: Attackers in IoT environments have their own characteristics. We consider several types of attacks at the different stages of the CKC by an APT attacker. The CKC consists of six stages, denoted by R, D, E, C2, M, and DE (see Table 2). Each attack strategy is characterized by: (1) which CKC stage the attacker is in; (2) whether the attacker will compromise other nodes in an attack path to reach a target; (3) what attack cost (ac_k) and attack impact (ai_k) are associated with each attack strategy k; and (4) what vulnerability an attacker can exploit to perform a given attack strategy (AS_k). For simplicity, when an attacker exploits more than one vulnerability, the average security vulnerability is used to compute the normalized vulnerability, P^v_i.
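The attack impact of Eq. (5), and the defense impact later defined as its complement in Eq. (7), can be sketched as follows; the node criticalities are illustrative values.

```python
def attack_impact(compromised_by_k: list, criticality: dict, n_total: int) -> float:
    """Eq. (5): ai_k = sum of criticalities of nodes compromised by attack k,
    normalized by the total number of nodes N."""
    return sum(criticality[j] for j in compromised_by_k) / n_total

# Illustrative criticalities c_j and a 20-node network
criticality = {"a": 3.6, "b": 1.2, "c": 8.0}
ai = attack_impact(["a", "c"], criticality, n_total=20)  # (3.6 + 8.0) / 20 = 0.58
di = 1.0 - ai  # Eq. (7): defense impact as the complement of attack impact
```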
In addition, each attack strategy k's attack impact, ai_k, is obtained based on Eq. (5). Note that an attacker can select a non-compromised adjacent victim with the highest APV value (see Eq. (4)) to maximize the attack success probability while minimizing the attack cost. We describe each attack strategy as follows:

• AS1 – Monitoring attack: This attack collects useful system information and identifies a vulnerable node to compromise as a target. It can be performed inside or outside the network from the R to DE stages. No node compromise process is involved, and accordingly its attack cost is low, ac = 1.

• AS2 – Social engineering: Typical examples of this attack include email phishing, pretexting, baiting, or tailgating [23]. We assume that an inside attacker can compromise an adjacent node if the attack is successful. If the attacker is an outside attacker, it can identify a node as vulnerable during its reconnaissance stage. This attack can be performed from the D to DE stages by an outside or inside attacker. Since it is highly challenging to deceive a human user, who can easily detect a social engineering attack, the associated attack cost for AS2 is high, ac = 3.

• AS3 – Botnet-based attack: A botnet consists of compromised machines (or bots) running malware under the C2 of a botmaster. When this attack is chosen, all compromised nodes (including original attackers) will launch epidemic attacks (e.g., spreading malware) against their adjacent, legitimate nodes [6]. This attack can be used from the E to DE stages and incurs a high attack cost, ac = 3.

• AS4 – Distributed Denial-of-Service (DDoS): A set of compromised nodes can form a botnet and perform DDoS by sending multiple requests [6].
When an attacker tries to compromise one of its adjacent nodes as a potential victim, if all compromised nodes send service requests to the potential victim node, the potential victim node's vulnerability may increase because it cannot properly handle all operations due to the large volume of requests received (e.g., not properly executing underlying security operations). This allows the attacker to more easily compromise the potential victim node or exfiltrate confidential data from it. To model this, the unknown vulnerability, uv_i, of a given victim node i will increase so the attacker can more easily compromise the node via unknown vulnerability (e.g., increasing UV by ε%). This attack can be performed from the E to DE stages, with high attack cost, ac = 3.

• AS5 – Zero-day attacks: This attack exploits unknown vulnerabilities of software which are not yet patched. The attacker can compromise a chosen adjacent node j based on its normalized uv_j. This attack can be performed from the E to DE stages at low cost, ac = 1.

• AS6 – Breaking encryption: Examples include compromising a legitimate node's private or secret key. The attacker with the encryption key is considered an inside attacker with the privilege to exploit system resources. This attack can be launched from the C2 to DE stages to collect system configurations or confidential information. Upon attack success, the attacker can intercept all the information sent to a victim node whose private key is compromised. This attack exploits the vulnerabilities êv_i associated with encryption keys and involves high attack cost, ac = 3. We assume that if a legitimate node's private key is compromised, the node is compromised. Hence, the attacker can escalate its attack by reauthenticating itself with a new password and steal confidential information or implant malware into file downloads.
• AS7 – Fake identity: This attack can be performed when packets are transmitted without authentication or when internal nodes spoof the ID of a source node, such as MAC/IP/Virtual LAN tag spoofing in an SDN-based IoT by an SDN switch [26]. This attack involves compromising a node with a fake ID and can be performed from the E to DE stages with cost ac = 2. It increases the encryption vulnerabilities of the node's adjacent nodes (e.g., increasing EV by ε%).

• AS8 – Data exfiltration: This attack also allows the attacker to compromise one of its adjacent nodes. The attacker checks all data it has compromised up to the DE stage. Then, if the accumulated importance of compromised data exceeds a certain threshold (i.e., Σ_{j ∈ C_A} c_j > Th_c), the attacker can decide whether to exfiltrate the collected intelligence to the outside. This attack costs high, with ac = 3.

We summarize the characteristics of all attack strategies in terms of the CKC stages involved, attack cost, node compromise, and exploited vulnerability in Table 2. Except for AS1, the success of attacks AS2 to AS8 is determined by whether all nodes on the attack path to reach a target node have been successfully compromised. For AS1, the attack success is determined based on how long the attacker has monitored a target system; this is computed by the probability vulnerability_i · e^{−1/T_A}, where T_A is the time elapsed since the attacker started monitoring a given target system. This implies that the attack is more likely to succeed when the attacker has scanned the targeted system longer and found more vulnerabilities. After the attacker exfiltrates data successfully and leaves the system, a new attacker will arrive. Otherwise, the attacker may be evicted by the NIDS or may need to try other attack strategies to escalate its attack to the next level.
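The monitoring-attack (AS1) success probability above can be sketched as follows; the vulnerability value and monitoring times are illustrative.

```python
import math

def as1_success_prob(overall_vulnerability: float, t_monitor: float) -> float:
    """AS1 success probability: vulnerability_i * e^(-1/T_A); it grows toward
    the node's overall (normalized) vulnerability as monitoring time T_A increases."""
    return overall_vulnerability * math.exp(-1.0 / t_monitor)

# Longer monitoring -> higher success probability, saturating at the vulnerability.
short = as1_success_prob(0.5, t_monitor=1)    # 0.5 * e^-1    ≈ 0.184
long_ = as1_success_prob(0.5, t_monitor=50)   # 0.5 * e^-0.02 ≈ 0.490
```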
An Attacker's Deception Detectability: Depending on its capability, an attacker may have a different level of intelligence to detect defensive deception techniques. We denote by ad an attacker's probability (omitting the attacker's ID for simplicity) of detecting deception used by the defender. An attacker can use this probability, ad, to detect honeypots or honey information, as described under DS5 and DS6 in the next section.

Attack Intelligence Collection: Different types of defense strategies can be deployed by the defender to counter APT attackers. At the same time, the NIDS will run periodically (see Section 3.1). Note that we do not count triggering the NIDS as one of the defense strategies, in order to meet a high standard of system integrity. When an attacker arrives in the system as an inside attacker (i.e., after the E stage), it can be detected by the NIDS. However, the system aims to collect more attack intelligence (e.g., attack signatures), which can improve the NIDS as a long-term goal. Thus, depending on the perceived risk level from the attacker, the system will determine whether to keep the detected attacker in the system or evict it. We estimate the perceived system risk level based on the criticality level of the compromised node, c_i, and determine whether the system will allow the attacker to reside in the system or be evicted according to a predefined risk threshold, Th_risk. The decision to evict node i, which is detected as compromised, is given by:

Evict_i = 1 if c_i > Th_risk; 0 otherwise.  (6)

Here Evict_i = 1 means evicting node i, while Evict_i = 0 means allowing node i to reside in the system. Note that this rule is applied when node i is detected as compromised by the NIDS, regardless of the detection's correctness.
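The keep-or-evict decision of Eq. (6) can be sketched as follows; the threshold value is an illustrative assumption, not a value from the paper.

```python
def evict(criticality_i: float, th_risk: float) -> int:
    """Eq. (6): evict a node detected as compromised only if its criticality
    (the perceived risk) exceeds the defender's risk threshold."""
    return 1 if criticality_i > th_risk else 0

TH_RISK = 4.0  # illustrative threshold

# A low-criticality detected node is kept to gather more attack intelligence;
# a high-criticality one is evicted immediately.
kept    = evict(1.2, TH_RISK)  # 0
evicted = evict(8.0, TH_RISK)  # 1
```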
Hence, false-positive nodes can also be assessed by this rule, while false-negative nodes can safely reside in the system without being assessed against Th_risk.

When nodes detected as compromised (i.e., true and false positives) are evicted, all associated edges will be disconnected, which may leave some non-compromised nodes isolated from the network. To maintain connectivity of non-compromised but isolated nodes, we reconnect them to the network based on P^r_i to maintain node i's mean degree based on the ER network model [28].

To deal with the attackers (or compromised nodes) residing in the system, which are either false negatives or attackers kept to collect further attack intelligence, the defender system can take several defense strategies. Each strategy k is represented by: (i) defense cost (dc_k) in time/complexity and expense, where dc_k ∈ [0, 3] is an integer for no, low, medium, and high cost, respectively; (ii) defense impact (di_k) for its defense effectiveness; (iii) the stage of the CKC (i.e., R, D, E, C2, M, or DE) in which strategy k is used; and (iv) system change, describing what actual changes are made in the system (e.g., what vulnerabilities are reduced, or whether the network topology or cryptographic keys are changed). The defense impact, di_k, is computed by:

di_k = 1 − ai_k,  (7)

where ai_k is the attack impact introduced by attack strategy k in Eq. (5). We measure the effectiveness of a defense strategy as the opposite of the impact of attack success (i.e., successfully compromising a node). That is, attack failure will increase the impact of the defense strategy.

Defense Strategies: This work considers the following defense strategies:

• DS1 – Firewalls: We assume that firewalls are implemented in the SDN controller to monitor and control the incoming and outgoing packet flows according to predefined rules.
We model the effectiveness of firewalls as lowering unknown vulnerabilities (uv_i) over the whole network. Specifically, the firewall is assumed to reduce vulnerabilities to outside attackers by a certain percentage (i.e., ε%).

• DS2 – Patch Management: Known vulnerabilities can be patched by a given defense system [25]. A patch is used to temporarily fix software vulnerabilities or to provide updates in a full software package; it refers to a software update, such as code to be installed in a software program. This strategy decreases the software vulnerabilities (sv_i) of all nodes, e.g., by a certain percentage (ε%).

• DS3 – Rekeying Cryptographic Keys: Cryptographic keys used by all nodes in the network are rekeyed, which lowers the encryption vulnerability by resetting T_rekey = 1, which reduces êv_i = ev_i · e^{−1/T_rekey}.

• DS4 – Eviction: Recall that an attacker with low risk (see Eq. (6)) is allowed to stay in the system for collecting attack intelligence. However, when the system is at risk due to high security vulnerability in terms of the amount of compromised confidential information (i.e., importance; see Eq. (8)), all inside attackers (or compromised nodes) will be evicted from the system. The false negatives will nevertheless remain in the system while a substantial number of compromised nodes is evicted using DS4.

• DS5 – Low/high-interaction honeypots (LHs/HHs) [24]: LHs and HHs can be activated as a defense strategy. LHs and HHs differ in their deception detectability and cost. In a given network, we deploy a set of LHs and HHs which are deactivated in the deployment phase.
When this strategy is selected, they will be activated, which changes the network topology as LHs and HHs become connected with a number of nodes in the network.

TABLE 3: CHARACTERISTICS OF DEFENSE STRATEGIES

DS | CKC stage | Defense cost (dc) | System change (dsc)
DS1 | R – D | 1 | Lowering UV
DS2 | D – DE | 2 | Lowering SV
DS3 | E – DE | 3 | Lowering EV
DS4 | E – DE | 3 | Evict all compromised nodes
DS5 | E – DE | 3 | Lure attackers with LHs and HHs
DS6 | C2 – DE | 1 | Disseminate fake system vulnerability information
DS7 | E – DE | 2 | Plant a fake key
DS8 | R – DE | 2 | Hide critical network edges

Note: Each CKC stage is indicated by Reconnaissance (R), Delivery (D), Exploitation (E), Command and Control (C2), Lateral Movement (LM), and Data Exfiltration (DE). Defense cost ranges in [1, 3] as an integer, representing low, medium, and high, respectively. System change may involve lowering unknown vulnerabilities (UV), software vulnerabilities (SV), or encryption vulnerabilities (EV).

Hence, DS5 changes attack paths and lures attackers to the honeypots. To be specific, when DS5 is selected, LHs and HHs are activated and connected to highly vulnerable nodes based on vulnerability_i, where HHs are connected to nodes of higher vulnerability than the nodes connected to LHs. So that the attacker cannot reach legitimate nodes through a honeypot, we allow only incoming connections (i.e., in-degree) from legitimate nodes to the honeypots. Once the attacker is caught by one of the deployed honeypots, it is diverted to a fake network for monitoring purposes. Recall that an attacker can detect the deception with probability ad for an LH and ad/2 for an HH. • DS6 – Honey information: This defense strategy can lure attackers by disseminating false information, such as honey tokens, fake patches, honey files, or bait files. It involves the dissemination of false system vulnerability information, such as advertising high (low) vulnerabilities for less (more) vulnerable nodes.
The attacker will need to detect whether a known vulnerability of a potential victim node is true or fake according to its deception detectability, ad. If the attacker is successfully deceived, it will make its attack strategy decision based on incorrect vulnerability information. • DS7 – Fake keys [3]: Fake keys can be planted for potential inside attackers, which may use a fake key, obtained by compromising another legitimate inside node, to communicate with other nodes and obtain more confidential information. As a result, even if the attacker compromises a cryptographic key (e.g., via the key-compromising attack strategies), a potential victim targeted by the attacker may not be compromised. We model this using the probability that the attacker obtains a fake key implanted in a node, P_fake. When the attacker obtains the fake key of a node, the node is not compromised. • DS8 – Hiding network topology edges: This strategy hides c_NT% of network edges in order to conceal the actual network topology from an attacker. We use a simple rule in which each node hides the edge to its most critical adjacent node based on the criticality value, c_i. All defense strategies have corresponding defense costs (dc_k's) and are believed useful when the attacker is in certain CKC stages according to the defender's belief. This is used by the defender to choose each subgame based on hypergame theory. We summarize the characteristics of each defense strategy in Table 3. System Failure Conditions: We define that a system failure (SF) occurs when the following condition is met:

SF = 1 if ρ1 ≤ (Σ_{i∈G} cp_i · Importance_i) / (Σ_{i∈G} Importance_i) || ρ2 ≥ |G_t| / |G|; 0 otherwise. (8)

Here G_t refers to the network at time t, which does not include evicted nodes, while G is the original network. Hence |G| and |G_t| are the number of original nodes and the number of current nodes in the system at time t, respectively.
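As a minimal sketch (not the paper's implementation), the failure condition of Eq. (8) can be checked as follows. The node representation and the names `rho1` and `rho2` (the two thresholds written ρ in Eq. (8)) are assumptions for illustration:

```python
# Minimal sketch of the system-failure check in Eq. (8).
# Assumptions (not from the paper's code): each node is a dict with a
# compromise flag cp in {0, 1} and an importance value; rho1 bounds the
# fraction of compromised importance, rho2 the fraction of surviving nodes.

def system_failure(nodes, n_original, rho1=0.5, rho2=0.5):
    """Return True when either failure condition of Eq. (8) holds."""
    total_imp = sum(n["importance"] for n in nodes)
    comp_imp = sum(n["cp"] * n["importance"] for n in nodes)
    # Condition 1: compromised importance reaches the threshold rho1.
    confidentiality_loss = total_imp > 0 and comp_imp / total_imp >= rho1
    # Condition 2: too few active nodes remain (|G_t| / |G| <= rho2).
    availability_loss = len(nodes) / n_original <= rho2
    return confidentiality_loss or availability_loss

nodes = [{"cp": 1, "importance": 9}, {"cp": 0, "importance": 1}]
print(system_failure(nodes, n_original=4))  # high-importance node compromised -> True
```

Either condition alone suffices, matching the disjunction in Eq. (8).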
ρ1 is a threshold, expressed as a fraction, that determines whether the system fails based on the sum of compromised nodes' importance values over the sum of all nodes' importance values. SF mainly captures system failure caused by the loss of the three security goals: confidentiality, integrity, and availability. ρ2 is a threshold that determines whether the system can functionally operate based on a sufficient number of active nodes at time t.

ATTACK-DEFENSE HYPERGAME
First, the attacker selects strategy AS1 to monitor a target system in the reconnaissance (R) stage, aiming to penetrate it as a legitimate user. If the attack is successful based on the success probability vulnerability_i · e^{−1/T_A}, the attacker can proceed to the delivery (D) stage of the CKC. In the D stage, the attacker can choose one of the two strategies AS2 and AS3. If the attacker successfully compromises a targeted victim node, which is one of its adjacent nodes, it penetrates the system and becomes an inside attacker with legitimate credentials. Now the attacker is in the exploitation (E) stage. From the E to data exfiltration (DE) stages, any detected inside attacker can be assessed by the defender on whether it may stay in the system based on the risk assessment in Eq. (6). Hence, depending on the criticality of the attacked node, the attacker can be detected by the NIDS or be kept in the system if the defense system intends to collect attack intelligence from it. To assess such risk, the attacker must first be detected as an attacker (i.e., true and false positives) by the NIDS. If not (i.e., false negatives), the attacker can safely stay without being detected. From E to DE, an attack is determined as successful if ai_i > 0 (see Eq. (5)). If the original attacker (i.e., the node the attacker is on) is evicted, a new attacker arrives. If an attacker succeeds by taking AS8 (data exfiltration), it leaves the system and a new attacker arrives. This process continues until the system fails based on Eq. (8). Next we formulate the hypergame between the attacker and defender and define the game components. We provide a detailed explanation of the hypergame theory formulation and its related equations in Appendix A, which are used in the sections below. An attacker's utility (u^A_pq) corresponding to attack strategy p (AS_p) can be expressed as the difference between attack
TABLE 4: POSSIBLE STRATEGIES UNDER EACH STAGE OF THE CKC

Subgame | CKC stage | Attack strategies | Defense strategies
0 | Full game | AS1 – AS8 | DS1 – DS8
1 | R | AS1 | DS1, DS8
2 | D | AS2, AS3 | DS1, DS2, DS8
3 | E | AS2 – AS5, AS7 | DS2 – DS5, DS7, DS8
4 | C2 | AS4 – AS7 | DS2 – DS8
5 | LM | AS4 – AS7 | DS2 – DS8
6 | DE | AS4 – AS8 | DS2 – DS8

gain and attack loss. The attacker's utility (u^A_pq) when the attacker takes AS_p and the defender takes DS_q is calculated by:

u^A_pq = G^A_pq − L^A_pq, where G^A_pq = ai_p + dc_q and L^A_pq = ac_p + di_q, (9)

where the attack and defense costs (i.e., ac_p and dc_q) and the attack and defense impacts (i.e., ai_p and di_q) are discussed in Sections 3.4 and 3.5, respectively. A defender's utility (u^D_qp) for selecting DS_q when the attacker takes AS_p is computed as the difference between its gain and loss:

u^D_qp = G^D_qp − L^D_qp, where G^D_qp = di_q + ac_p and L^D_qp = dc_q + ai_p. (10)

As with u^A_pq, the attack and defense costs (i.e., ac_p and dc_q) and the attack and defense impacts (i.e., ai_p and di_q) are computed accordingly. We consider a zero-sum game between the attacker and the defender (i.e., u^A_pq + u^D_qp = 0). As in Eq. (22) in Appendix A, the attacker's and defender's hypergame expected utilities (HEUs) are estimated based on the level of uncertainty, g, perceived by each player. In this section, we show how the level of g is estimated by the attacker (i.e., g_A) and the defender (i.e., g_D). Note that we omit the IDs of the attacker and defender for simplicity. We model the attacker's perceived uncertainty based on whether defensive deception is used and how long the attacker has monitored the target system. That is, given the time period the attacker has monitored the target system (T_A) and the defense strategy taken (df), the attacker's uncertainty (g_A) is estimated by:

g_A = 1 − exp(−λ · df / T_A). (11)

Here λ is a parameter representing the amount of initial knowledge of the given system configuration (higher λ increases uncertainty, and vice versa) and df = 1 + (1 − ad) · dec. df returns 1 when no defensive deception is used (i.e., dec = 0); it returns 1 + (1 − ad) · dc, where ad refers to the attacker's deception detectability in [0, 1], when defensive deception is used (i.e., dec = dc, where dc is the defense cost, implying that a higher defense cost allows a higher quality of deception). The formulation of g_A implies that the attacker has lower uncertainty when it has monitored the target system longer. On the other hand, the attacker has higher uncertainty when it has lower deception detectability and the defender uses a defensive deception strategy. Hence, we set dec = dc (defense cost) when the defensive deception strategies DS5 – DS8 are taken, and dec = 0 when the non-deception defense strategies DS1 – DS4 are taken.
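The utilities of Eqs. (9) and (10) and the attacker's uncertainty of Eq. (11) can be sketched in a few lines; all numeric inputs below are illustrative placeholders, not values from the paper's tables:

```python
import math

# Toy sketch of Eqs. (9)-(11); inputs are illustrative placeholders only.

def attacker_utility(ai_p, ac_p, di_q, dc_q):
    # Eq. (9): u^A_pq = (ai_p + dc_q) - (ac_p + di_q)
    return (ai_p + dc_q) - (ac_p + di_q)

def defender_utility(ai_p, ac_p, di_q, dc_q):
    # Eq. (10): u^D_qp = (di_q + ac_p) - (dc_q + ai_p)
    return (di_q + ac_p) - (dc_q + ai_p)

def attacker_uncertainty(lam, dec, ad, T_A):
    # Eq. (11): g_A = 1 - exp(-lam * df / T_A), with df = 1 + (1 - ad) * dec.
    # dec = 0 for non-deception defenses; dec = dc (defense cost) for deception.
    df = 1 + (1 - ad) * dec
    return 1 - math.exp(-lam * df / T_A)

u_a = attacker_utility(ai_p=0.6, ac_p=0.3, di_q=0.4, dc_q=0.2)
u_d = defender_utility(ai_p=0.6, ac_p=0.3, di_q=0.4, dc_q=0.2)
assert abs(u_a + u_d) < 1e-12  # zero-sum: u^A_pq + u^D_qp = 0

# Using deception (dec > 0) raises the attacker's uncertainty for the same T_A.
assert attacker_uncertainty(0.5, dec=3, ad=0.2, T_A=5) > attacker_uncertainty(0.5, dec=0, ad=0.2, T_A=5)
```

The final assertion mirrors the intuition above: a deception strategy inflates df, and hence g_A, unless the attacker's deception detectability ad is close to 1.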
A defender’s uncertainty towards an attacker increasesas it has monitored the attacker for a longer period where7he defender’s monitoring time towards the attacker isdenoted by T D . In addition, if the attacker has not beendeceived by defense strategies, it is assumed to be intelligentnot to expose its information to the defender. Consideringthese two, we model g D by: g D = 1 − exp( − µ · ad /T D ) , (12)where µ is a parameter of representing an amount of initialknowledge towards an attacker (higher λ increases uncer-tainty, and vice versa), ad is an attacker’s deception de-tectability, and T D is a defender’s accumulated monitoringtime towards the attacker. In g D , the defender perceiveslower uncertainty at longer T D while perceiving higheruncertainty at higher ad . In order to calculate the HEU for each player (see Eq. (22)in Appendix A), we need to obtain P κ (i.e., the probabilitya row player chooses subgame κ ), r κp (i.e., the probabilitythat a row player takes strategy k in subgame κ ), and c κj (i.e., the probability that a column player takes strategy h in subgame κ based on a row player’s belief) because S q isestimated based on P κ and c κh while r κp is needed when arow player considers strategy k . Computation of P κ : Recall that P κ refers to the probabil-ity that subgame κ is played by a row player. We notate thisfor an attacker and a defender by P Aκ and P Dκ , respectively.We define a subgame based on where an attacker is locatedin the stages of the CKC which will determine a set ofavailable strategies for both parties. We assume that theattacker clearly knows where it is located in the CKC whilethe defender is not certain about the stage of the attackerin the CKC. We model the defender’s P Dκ based on itsuncertainty g D . Thus, the defender can know the CKCstage of the attacker with − g D (certainty) and correctlychoose a subgame based on the attacker’s actual stage inthe CKC. 
With probability g_D, the defender chooses subgame 0 (i.e., the full game with all available strategies). The set of available strategies may differ depending on which subgame is played, as shown in Table 4. Computation of r_κh and c_κh: r_κh is the probability that a row player plays strategy h. We denote this for the attacker and defender by r^A_κp and r^D_κq for attack strategy p and defense strategy q, respectively. c_κh is the probability that a column player takes strategy h based on the row player's belief. We denote this for the attacker and defender by c^A_κq and c^D_κp for defense strategy q and attack strategy p, respectively. In the very beginning, since no historical information is available, each player uses a uniform probability, choosing one of the available strategies in a chosen subgame with equal probability (i.e., choosing a strategy at random). As players participate in repeated games, a recorded history of the strategies taken becomes available. We then use the Dirichlet distribution [35] to model multinomial probabilities based on the strategies taken in past repeated games. If either the attacker or the defender is certain about the opponent's strategy, it estimates the corresponding r^A_κp, r^D_κq, c^A_κq, and c^D_κp as:

r^A_κp = γ^A_κp / Σ_{p∈AS_κ} γ^A_p,  c^A_κq = γ^D_κq / Σ_{q∈DS_κ} γ^D_q, (13)

r^D_κq = γ^D_κq / Σ_{q∈DS_κ} γ^D_q,  c^D_κp = γ^A_κp / Σ_{p∈AS_κ} γ^A_p. (14)

Note that AS_κ and DS_κ are the sets of attack and defense strategies, respectively. γ^D_q (γ^A_p) is the number of times the defender (attacker) has taken strategy q (p), based on the attacker's (defender's) belief, up to time t − 1, where the current state is at time t.
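The belief update of Eqs. (13) and (14) can be sketched as follows. The function and strategy labels are hypothetical names; the plain count normalization shown here corresponds to the mean of a Dirichlet posterior built from the observed play counts, with a uniform fallback when no history exists:

```python
from collections import Counter

# Sketch of Eqs. (13)-(14): the probability that a player takes strategy s
# in a subgame is estimated from counts gamma_s of past plays; with no
# history, all strategies in the subgame are equally likely.

def strategy_probs(strategies, history):
    counts = Counter(h for h in history if h in strategies)
    total = sum(counts.values())
    if total == 0:  # no observations yet: uniform over the subgame's strategies
        return {s: 1 / len(strategies) for s in strategies}
    return {s: counts[s] / total for s in strategies}

# e.g., a defender's belief c^D over attack strategies in the D-stage subgame
subgame_AS = ["AS2", "AS3"]
print(strategy_probs(subgame_AS, []))                      # uniform: 0.5 each
print(strategy_probs(subgame_AS, ["AS2", "AS2", "AS3"]))   # weighted by counts
```

With a symmetric Dirichlet prior added to the counts, the uniform case would fall out automatically rather than needing a special branch.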
Since the probability of a column player playing a particular strategy is estimated from the row player's belief, the ground-truth c^A_κq and c^D_κp (as shown in the equations above) are only obtained with probability (1 − g_A) or (1 − g_D) when the row player is the attacker or the defender, respectively. Otherwise, the row player selects one of the available strategies in a given subgame κ at random due to the uncertainty. An attacker's HEU (AHEU) is computed with: (1) the attack utilities (i.e., u^A_pq's in Eq. (9)); (2) the attacker's belief about defense strategy q (i.e., S^A_q in Eq. (21) in Appendix A); and (3) the attacker's perceived uncertainty (i.e., g_A in Eq. (11)). Similarly, a defender's HEU (DHEU) is estimated using: (1) the defense utilities (i.e., u^D_qp's in Eq. (10)); (2) the defender's belief about attack strategy p (i.e., S^D_p in Eq. (21) in Appendix A); and (3) the defender's perceived uncertainty (i.e., g_D in Eq. (12)). Both AHEU and DHEU are obtained based on Eq. (22) in Appendix A. Since the row player selects each strategy based on r_κh for a given subgame κ, we calculate AHEU and DHEU as follows:

AHEU(rs^A_p, g_A) = HEU(rs^A_p, g_A), DHEU(rs^D_q, g_D) = HEU(rs^D_q, g_D). (15)

A player plays a strategy according to the probability distribution over the strategies available in a given subgame κ.

EXPERIMENTAL SETTING
In this work, we use the following metrics: • Perceived Uncertainty Level (g_A or g_D): An attacker's or a defender's mean uncertainty level, measured as shown in Eqs. (11) and (12), respectively. • Hypergame Expected Utility (HEU): This metric measures the HEU of the played strategy profile according to Eq. (22) in Appendix A of the supplement document. • Cost for Taking a Chosen Strategy (C_A or C_D): This metric measures the average attack (or defense) cost paid by the attacker (or defender) to play a specific strategy. The attack costs (C_A) and defense costs (C_D) of all available strategies are summarized in Tables 2 and 3, respectively. For a given scenario consisting of a series of games until the system fails based on Eq. (8), the average attack or defense cost per game is reported. • Mean Time to Security Failure (MTTSF): This metric measures the system lifetime as the time during which the system does not fall into the failure states of Eq. (8). • TPR of an NIDS: This metric measures the true positive rate of the NIDS, in order to observe how much defensive deception can improve the quality of the NIDS based on the attack intelligence collected while defensive deception is in use. Our work compares the performance of the following schemes: • Game with defensive deception and perfect information (DD-PI): This scheme plays a game where each player has perfect information regarding which strategy is played by its opponent, meaning there is no uncertainty, g = 0 (i.e., g_A = g_D = 0), while the defender uses all defensive deception (DD) strategies. • Game without defensive deception and perfect information (No-DD-PI): This scheme plays a game where each player has perfect information regarding what strategy its opponent plays (i.e., g = 0) and the defender does not use DD strategies.
• Hypergame with defensive deception and imperfect information (DD-IPI): This scheme plays a game where each player does not have perfect information regarding the strategy of its opponent (i.e., g > 0 with g_A > 0 and g_D > 0) while the defender uses DD strategies. This is our proposed scheme, which considers uncertainty g (i.e., imperfect information, IPI) and DD. • Hypergame without defensive deception and imperfect information (No-DD-IPI): This scheme plays a game where each player does not have perfect information about what strategy its opponent takes (i.e., g > 0) and the defender does not use DD strategies. We consider 500 nodes in a given network whose topology is generated by the ER random graph model G(N, P_r), where N is the total number of nodes and P_r (= P_ri) is the connection probability between any pair of nodes [28]. To consider honeypots with low or high interactions, we also assign 75 nodes as honeypots: 50 LHs and 25 HHs. For honeypots, we maintain a directed network where the outgoing edges (i.e., out-degree) of each honeypot lead to all other honeypots, ensuring that an attacker cannot reach other legitimate nodes. When a honeypot is activated (i.e., DS5), highly vulnerable nodes are connected to the honeypot via incoming edges (i.e., in-degree). However, outgoing edges from the honeypot always lead to other honeypots, not to real legitimate nodes, which are thus protected from the attacker. In our experiment, when the honeypots are activated, the top 225 vulnerable nodes are connected to honeypots: the top 75 vulnerable nodes are connected to the 25 HHs and the next 150 vulnerable nodes are connected to the 50 LHs. We assume that the defender has inherently higher uncertainty regarding an attacker, while the attacker has a certain level of knowledge regarding the system due to its reconnaissance effort before becoming an inside attacker. This is reflected by setting λ = 0. and µ = 8 in Eqs. (11) and (12), respectively.
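The topology setup described above can be sketched as follows. This is a simplified illustration: the stand-in vulnerability scores, the round-robin assignment of vulnerable nodes to individual honeypots, and the connection probability value are assumptions, not the paper's exact procedure:

```python
import random

random.seed(1)
N, P_r = 500, 0.05                                # ER model G(N, P_r); P_r illustrative
vuln = {i: random.random() for i in range(N)}     # stand-in vulnerability scores

# Undirected ER edges among legitimate nodes (Erdos-Renyi G(N, p) [28]).
edges = {(i, j) for i in range(N) for j in range(i + 1, N) if random.random() < P_r}

# Honeypots: 25 high-interaction (HH) and 50 low-interaction (LH), initially inactive.
HH = [f"HH{k}" for k in range(25)]
LH = [f"LH{k}" for k in range(50)]

def activate_honeypots():
    """Wire the top-225 most vulnerable nodes to honeypots (in-edges only):
    top 75 to the 25 HHs, the next 150 to the 50 LHs."""
    ranked = sorted(range(N), key=lambda i: vuln[i], reverse=True)
    wiring = []
    for idx, node in enumerate(ranked[:75]):
        wiring.append((node, HH[idx % 25]))       # round-robin assignment (assumed)
    for idx, node in enumerate(ranked[75:225]):
        wiring.append((node, LH[idx % 50]))
    # Honeypot out-edges go only to other honeypots, never to legitimate nodes.
    wiring += [(h, h2) for h in HH + LH for h2 in HH + LH if h != h2]
    return wiring

print(len(activate_honeypots()))  # 225 lure edges + 75*74 honeypot-to-honeypot edges
```

The key invariant, checked below, is that every edge leaving a honeypot ends at another honeypot, so a trapped attacker can never re-enter the legitimate network.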
We summarize the notation of key design parameters, their meanings, and the default values used in Table 5 in Appendix B of the supplement document.

RESULTS & ANALYSES
In Fig. 1a, the attacker's perceived uncertainty is plotted for the four defense schemes. When imperfect information (IPI) is considered, the attacker has fairly high uncertainty at the beginning regardless of whether defensive deception (DD) is used (DD-IPI starts from over 0.7 and No-DD-IPI from over 0.55). DD-IPI starts higher because using DD strategies provides a high chance of increasing the attacker's uncertainty by misleading it. On the other hand, under perfect information (PI), the attacker's uncertainty is zero. Without DD (i.e., No-DD-PI and No-DD-IPI), the system lifetime (i.e., MTTSF) is short, so the curves stop at around 80 rounds of games. Under DD-IPI, the system lifetime is much longer than under the No-DD schemes. However, the attacker's uncertainty under DD-IPI decreases as more rounds are played, because playing more games results in more compromised nodes, and a new attacker can leverage this situation to get into the system more quickly. This lets the inside attacker stay longer in the system, lowering its uncertainty as more rounds of games are played. The fluctuations in later games are due to the small number of simulation runs that exhibit long system lifetimes. Notice that using DD prolongs the system lifetime even though it allows some compromised nodes to reside in the system. This is because the system does not evict detected intrusions immediately after detecting them but reassesses them to reduce false positives while collecting more attack intelligence. This process gives the NIDS a chance to improve its detection rate. In addition, the attacker perceives lower uncertainty as more games are played since it has been in the system for a while, which was intentionally allowed by the defender to collect attack intelligence. In Figs.
1b and 1c, we plot AHEU and attack cost for each defense scheme under varying node vulnerability. We did not observe any noticeable sensitivity to the extent of node vulnerability. This is because uncertainty, AHEU, and attack cost do not depend on network conditions but rather on the strategies chosen by the attacker and their corresponding impact and cost in the HEU. In terms of AHEU in Fig. 1b, the attacker overall performs better under the DD-based schemes than under the No-DD-based schemes. The reason is that under the DD-based schemes, the attacker can use more strategies by becoming an insider of the system, whereas it performs only monitoring attacks as an outside attacker under the No-DD-based schemes. This also naturally leads the attacker to perform better under PI than under IPI, which explains why the attacker obtains the highest AHEU under DD-PI and the lowest AHEU under No-DD-IPI. In terms of attack cost, the attacker incurs a higher cost under IPI and a lower cost under PI, as shown in Fig. 1c: under uncertainty, the attacker cannot choose its optimal, cost-effective strategy. Moreover, the attacker pays a higher cost under DD and a lower cost under No-DD. This implies that DD strategies are effective in misleading the attacker into choosing less cost-effective strategies by increasing its uncertainty. We also discuss the attack cost and AHEU with respect to the number of games in Fig. 4(a)-(b) in Appendix B of the supplement document, with detailed explanations of the observed trends. In Fig. 2a, the defender's uncertainty is shown with respect to the number of attack-defense hypergames. Overall, under IPI, the defender's uncertainty is much lower than the attacker's uncertainty. This is because the defender can collect more attack intelligence while the attacker is disrupted by DD strategies, which increase the attacker's uncertainty.
In addition, there are more fluctuations under No-DD-IPI because attackers are not allowed to stay in the system once they are detected as compromised. This keeps resetting the time the attacker has stayed in the system, which prevents the defender from observing the same attacker for a long time.

Fig. 1. An attacker's uncertainty, hypergame expected utility (AHEU), and attack cost: (a) attacker's uncertainty; (b) attacker's HEU; (c) attack cost. The 'vulnerability upper bound' (U_v) refers to the CVSS-based software vulnerability score of IoT devices, Web servers, and databases, which is scaled in [1, U_v].

Fig. 2. A defender's uncertainty, hypergame expected utility (DHEU), and defense cost: (a) defender's uncertainty; (b) defender's HEU; (c) defense cost. The 'vulnerability upper bound' (U_v) is defined as in Fig. 1.

Fig. 3. System lifetime (i.e., MTTSF) and true positive rate (TPR) of an NIDS under varying levels of system vulnerability: (b) MTTSF; (c) TPR of an NIDS.

In Figs. 2b and 2c, we further show DHEU and defense cost under a varying vulnerability upper bound of nodes (U_v) in the network. In Fig. 2b, compared to AHEU (i.e., 2.5 to 6), we observe much higher DHEU (i.e., 6 to 8.5). Since HEU is estimated based on the impact and cost of taking a chosen strategy, using DD costs more, lowering DHEU. Besides, under IPI, the defender may not choose its optimal strategy all the time, which lowers DHEU due to the reduced benefit of the chosen strategy. Hence, it is reasonable that the highest DHEU is obtained with No-DD-PI while the lowest DHEU is observed with DD-IPI. In Fig. 2c, as expected, the highest defense cost is incurred under DD-IPI while the lowest defense cost is observed under No-DD-PI. This also reflects the role of defense cost in DHEU. DHEU and defense cost with respect to the number of games are also discussed in Fig. 4(c)-(d) in Appendix B of the supplement document. In Fig. 3b, we show how the four schemes perform in terms of MTTSF under varying extents of node vulnerability. Regardless of whether PI or IPI is considered, the DD-based schemes outperform the non-DD-based schemes. Again, this is because DD-based schemes allow the reassessment of detected intrusions, reducing false positives while improving the TPR of the NIDS. Moreover, DD-IPI outperforms all other schemes in MTTSF because IPI allows the defender to effectively leverage the nature of DD strategies, misleading the attacker and making it choose non-optimal strategies.
Moreover, we notice that the behavior of the DD schemes is sensitive to node vulnerability, showing a reduced MTTSF under high vulnerability because the attacker can better exploit vulnerable nodes and compromise them more efficiently. Except for this sensitivity under highly vulnerable nodes, the performance trends in the TPR of the NIDS align well with those in MTTSF under the four schemes, as shown in Fig. 3c. TPR can be improved under DD-IPI due to the high effectiveness of DD under IPI. We also discuss the probability of each strategy taken by the attacker and the defender in Figs. 5 and 6, and the TPR of the NIDS in Fig. 7 of Appendix B, with respect to the number of attack-defense games played under each scheme in the supplement document.

CONCLUSION & FUTURE WORK
From this study, we obtained the following key findings: • An attacker's and a defender's perceived uncertainty can both be reduced over repeated games when defensive deception (DD) is used. The attacker gains more knowledge about the system as it performs attacks as an inside attacker, while the defender reduces its uncertainty by collecting more attack intelligence through DD while allowing the attacker to remain in the system. • Attack cost and defense cost are two critical factors in determining HEUs (hypergame expected utilities). Therefore, a high DHEU (defender's HEU) is not necessarily related to high system performance in MTTSF (mean time to security failure) or TPR (true positive rate), which can also be key indicators of system security. Indeed, using DD under imperfect information (IPI) yields the best performance in MTTSF (i.e., the longest system lifetime) while giving the minimum DHEU among all schemes. • DD can effectively increase the TPR of the NIDS based on the attack intelligence collected through the DD strategies. This work brings up several important directions for future research: (1) considering multiple attackers arriving in a system simultaneously for more realistic scenarios; (2) estimating each player's belief based on machine learning to better predict the opponent's next move; (3) dynamically adjusting the risk threshold in Eq. (6) depending on the system's security state; (4) introducing a recovery mechanism that restores a compromised node to a healthy node while accounting for the recovery delay; (5) developing an intrusion response system that reassesses a detected intrusion to minimize false positives while identifying an optimal response strategy for intrusions with high urgency; and (6) considering other intrusion prevention mechanisms, such as moving target defense, among the defense strategies.

ACKNOWLEDGEMENT
This research was partly sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-19-2-0150. In addition, this research is also partly supported by the Army Research Office under Grant Contract Number W91NF-20-2-0140. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

REFERENCES
[2] IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 48, no. 12, pp. 2158–2175, 2017.
[3] H. Almeshekah and H. Spafford, "Cyber security deception," in Cyber Deception. Springer, 2016, pp. 25–52.
[4] C. Bakker, A. Bhattacharya, S. Chatterjee, and D. L. Vrabie, "Learning and information manipulation: Repeated hypergames for cyber-physical security," IEEE Control Systems Letters, vol. 4, no. 2, pp. 295–300, 2019.
[5] P. G. Bennett, "Toward a theory of hypergames," Omega, vol. 5, no. 6, pp. 749–751, 1977.
[6] E. Bertino and N. Islam, "Botnets and Internet of Things security," Computer, vol. 50, no. 2, pp. 76–79, Feb. 2017.
[7] M. Boussard, D. T. Bui, L. Ciavaglia, R. Douville, M. L. Pallec, N. L. Sauze, L. Noirie, S. Papillon, P. Peloso, and F. Santoro, "Software-defined LANs for interconnected smart environment," Sep. 2015, pp. 219–227.
[8] U. Brandes, "A faster algorithm for betweenness centrality," Jour. Mathematical Sociology, vol. 25, no. 2, pp. 163–177, 2001.
[9] J. W. Caddell, "Deception 101 - primer on deception," DTIC Document, Tech. Rep., 2004.
[10] T. E. Carroll and D. Grosu, "A game theoretic investigation of deception in network security," Security and Communication Networks, vol. 4, no. 10, pp. 1162–1172, 2011.
[11] W. Casey, A. Kellner, P. Memarmoshrefi, J. A. Morales, and B. Mishra, "Deception, identity, and security: The game theory of Sybil attacks," Comms. of the ACM, vol. 62, no. 1, pp. 85–93, 2018.
[12] J.-H. Cho, M. Zhu, and M. P. Singh, Modeling and Analysis of Deception Games based on Hypergame Theory. Cham, Switzerland: Springer Nature, 2019, ch. 4, pp. 49–74.
[13] K. Ferguson-Walter, S. Fugate, J. Mauger, and M. Major, "Game theory for adaptive defensive cyber deception," in Proc. 6th Annual Symp. on Hot Topics in the Science of Security. ACM, 2019, p. 4.
[14] N. M. Fraser and K. W. Hipel, Conflict Analysis: Models and Resolutions. North-Holland, 1984.
[15] N. Garg and D. Grosu, "Deception in honeynets: A game-theoretic analysis," in Proc. IEEE Information Assurance and Security Workshop (IAW). IEEE, 2007, pp. 107–113.
[16] B. Gharesifard and J. Cortés, "Evolution of the perception about the opponent in hypergames," in Proc. 49th IEEE Conf. Decision and Control (CDC), Dec. 2010, pp. 1076–1081.
[17] ——, "Evolution of players' misperceptions in hypergames under perfect observations," IEEE Trans. Automatic Control, vol. 57, no. 7, pp. 1627–1640, Jul. 2012.
[18] I. GmbH. MindNode. [Online]. Available: https://mindnode.com/
[19] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques. Elsevier, 2011.
[20] J. T. House and G. Cybenko, "Hypergame theory applied to cyber attack and defense," in Proc. SPIE Conf. Sensors, and Command, Control, Comms., and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense IX, vol. 766604, May 2010.
[21] T. Kanazawa, T. Ushio, and T. Yamasaki, "Replicator dynamics of evolutionary hypergames," IEEE Trans. Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 37, no. 1, pp. 132–138, Jan. 2007.
[22] N. S. Kovach, A. S. Gibson, and G. B. Lamont, "Hypergame theory: A model for conflict, misperception, and deception," Game Theory, 2015, Article ID 570639, 20 pages.
[23] K. Krombholz, H. Hobel, M. Huber, and E. Weippl, "Advanced social engineering attacks," Jour. Information Security and Applications, vol. 22, pp. 113–122, 2015.
[24] S. Kyung, W. Han, N. Tiwari, V. H. Dixit, L. Srinivas, Z. Zhao, A. Doupé, and G. Ahn, "HoneyProxy: Design and implementation of next-generation honeynet via SDN," Oct. 2017, pp. 1–9.
[25] O. Leiba, Y. Yitzchak, R. Bitton, A. Nadler, and A. Shabtai, "Incentivized delivery network of IoT software updates based on trustless proof-of-distribution," Apr. 2018, pp. 29–39.
[26] Y. Liu, Y. Kuang, Y. Xiao, and G. Xu, "SDN-based data transfer security for Internet of Things," IEEE Internet of Things Journal, vol. 5, no. 1, pp. 257–268, Feb. 2018.
[27] D. F. Macedo, D. Guedes, L. F. M. Vieira, M. A. M. Vieira, and M. Nogueira, "Programmable networks - from software-defined radio to software-defined networking," IEEE Comms. Surveys Tutorials, vol. 17, no. 2, pp. 1102–1125, Second Quarter 2015.
[28] M. E. J. Newman, Networks: An Introduction, 1st ed. Oxford University Press, 2010.
[29] H. Okhravi, M. A. Rabe, W. G. Leonard, T. R. Hobson, D. Bigelow, and W. W. Streilein, "Survey of cyber moving targets," Lexington Lincoln Lab, MIT, TR 1166, 2013.
[30] U. S. Putro, K. Kijima, and S. Takahashi, "Adaptive learning of hypergame situations using a genetic algorithm," IEEE Trans. Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 30, no. 5, pp. 562–572, Sep. 2000.
[31] Y. Sasaki, "Subjective rationalizability in hypergames," Advances in Decision Sciences, vol. 2014, Article ID 263615, 7 pages, 2014.
[32] A. Schlenker, O. Thakoor, H. Xu, F. Fang, M. Tambe, L. Tran-Thanh, P. Vayanos, and Y. Vorobeychik, "Deceiving cyber adversaries: A game theoretic approach," in Proc. 17th Int'l Conf. Autonomous Agents and MultiAgent Systems, 2018, pp. 892–900.
[33] W. L. Sharp, "Military deception," Joint War-Fighting Center, Doctrine and Education Group, Norfolk, VA, Pub. 3-13.4, 2006.
[34] S. Tadelis, Game Theory. Princeton University Press, 2013.
[35] Y. W. Teh, Dirichlet Process. Boston, MA: Springer US, 2010, pp. 280–287.
[36] R. Vane, Hypergame Theory for DTGT Agents. AAAI, 2000.
[37] ——, "Planning for terrorist-caused emergencies," in Proc. Winter Simulation Conf., Dec. 2005.
[38] ——, "Advances in hypergame theory," in Proc. AAMAS Workshop on Game-Theoretic and Decision Theoretic Agents, 2006.
[39] R. Vane and P. E. Lehner, "Using hypergames to select plans in adversarial environments," in Proc. 1st Workshop on Game Theoretic and Decision Theoretic Agents, 1999, pp. 103–111.
[40] Wikipedia. (2018) Foureye butterflyfish. Available at https://en.wikipedia.org/wiki/Foureye_butterflyfish.
[41] Y. Yin, B. An, Y. Vorobeychik, and J. Zhuang, "Optimal deceptive strategies in security games: A preliminary study," in Proc. AAAI Conf. Artificial Intelligence, 2013.
APPENDIX A
HYPERGAME THEORY
In this section, we briefly discuss hypergame theory, which we leverage to build the hypergame-theoretic defensive deception framework against APT attacks proposed in this work. This section mainly supports Section 4 of the main paper.

Hypergame theory offers two levels of hypergames that can be used to analyze games perceived differently by multiple players [14]. We adopt first-level hypergames for simplicity. Although hypergame theory applies to any number of players, we consider a game of two players, an attacker and a defender.
A.1 First-Level Hypergame
Given two players, p and q, vectors of their preferences, denoted by V_p and V_q, define a game G that can be represented by G = {V_p, V_q} [14]. Note that V_p and V_q are player p's and player q's actual preferences (i.e., ground truth), respectively. If all players exactly know all other players' preferences, all players are playing the same game because their views of the game coincide. In reality, however, that assumption may fail. Player p can perceive player q's preferences differently from what they are, leading to differences between p's view and q's view. The game perceived by player p based on its perceived preferences about q's preferences, V_p^q, and the game perceived by player q based on its perceived preferences about p's preferences, V_q^p, can be given by

G_p = {V_p^q},  G_q = {V_q^p}.  (16)

Hence, the first-level hypergame H perceived by each player is written as H = {G_p, G_q}. In a first-level hypergame, analysis is performed at the level of each player's perceived game because each player plays the game based on its belief. Even if the player does not know all outcomes of the game, an outcome can be stable for the player because the player may not unilaterally change its belief. If a game includes an unknown outcome, the unknown outcome is caused by uncertainty. The stability of an outcome of a game is determined by each player's reaction to the opponent's action. An outcome is stable for p's game if it is stable in each of p's perceived preference vectors, i.e., in each V_p^q. The equilibrium of p's game is determined by the outcome that p believes resolves the conflict [14].

A.2 Hypergame Normal Form (HNF)
Vane [38] provides a hypergame normal form (HNF) that can succinctly model hypergames based on players' beliefs and the possible strategies of their opponents. HNF is formulated similarly to the normal strategic form in game theory. HNF consists of the following four key aspects: (1) the full game; (2) row-mixed strategies (RMSs); (3) column-mixed strategies (CMSs); and (4) belief contexts.

The full game is the grid form consisting of row and column strategies, which are associated with the utilities ru_{11}, ..., ru_{mn} and cu_{11}, ..., cu_{mn}, where m is the number of the row player's strategies and n is the number of the column player's strategies. The full game's grid form U can be represented by an m × n matrix with element (ru_{ij}, cu_{ij}) for i = 1, ..., m and j = 1, ..., n:

U = [ (ru_{11}, cu_{11})  ···  (ru_{1n}, cu_{1n}) ]
    [         ⋮           ⋱            ⋮          ]
    [ (ru_{m1}, cu_{m1})  ···  (ru_{mn}, cu_{mn}) ],  (17)

where R and C denote the full-game strategies of the row and column players, respectively.

Row-mixed strategies (RMSs) are the m strategies the row player considers based on its belief about the column player's strategies. A player's subgame is defined as a subset of the full game (i.e., the set of all possible strategies by all players) because the player may limit the number of strategies it considers based on its belief. Therefore, depending on the situation, the player can choose a subgame to play. The RMS for the κ-th subgame a player perceives is given by:

RMS^κ = [r_1^κ, ..., r_m^κ], where \sum_{i=1}^{m} r_i^κ = 1,  (18)

where the probability that a particular strategy i is chosen is estimated by player p's belief based on learning from past experience. Since a subgame consists of a subset of the strategies in the full game, if a particular strategy i is not in subgame κ, the probability of the row player taking strategy i in subgame κ is zero, i.e., r_i^κ = 0.
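To make Eq. (18) concrete, the short Python sketch below (our own illustration, not code from the paper) builds a row-mixed strategy for a subgame: strategies outside the subgame get probability zero, and the remaining belief weights are renormalized so they sum to one. The strategy names and weights are hypothetical.

```python
def rms_for_subgame(full_strategies, subgame, belief_weights):
    """Row-mixed strategy RMS^k (Eq. (18)): zero probability for
    strategies outside subgame k, normalized over those inside."""
    raw = [belief_weights.get(s, 0.0) if s in subgame else 0.0
           for s in full_strategies]
    total = sum(raw)
    # Divide by the total so the in-subgame probabilities sum to 1
    return [x / total for x in raw]

# Hypothetical row player with three strategies, playing a subgame
# that contains only "scan" and "exploit"
strategies = ["scan", "exploit", "exfiltrate"]
weights = {"scan": 3.0, "exploit": 1.0, "exfiltrate": 2.0}  # learned beliefs
rms = rms_for_subgame(strategies, {"scan", "exploit"}, weights)
# rms == [0.75, 0.25, 0.0]: "exfiltrate" is outside the subgame
```

The same construction applies to column-mixed strategies, with the weights learned from observations of the opponent instead of the player's own preferences.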
Column-mixed strategies (CMSs) are the column player's n strategies, as believed by the row player for the κ-th subgame, denoted by:

CMS^κ = [c_1^κ, ..., c_n^κ], where \sum_{j=1}^{n} c_j^κ = 1,  (19)

where the probability that a particular strategy j is chosen is obtained from player p's observations of (or learning about) q's strategies. Similar to the row-mixed strategies, if strategy j is not in subgame κ, we set c_j^κ = 0.

Belief contexts are the row player's belief probabilities that each subgame κ will be played and are represented by:

P = [P_0, ..., P_K], where \sum_{κ=0}^{K} P_κ = 1.  (20)

P_0 is the probability that the full game is played, where the full game considers all possible strategies of a player based on the ground-truth view of a situation. If the row player is not sure which subgame κ is played due to perceived uncertainty, the unknown belief probability is treated simply as the probability that the full game is played, denoted by P_0 = 1 − \sum_{κ=1}^{K} P_κ.

The row player's belief about the column player's strategy j, denoted by S_j, is computed by:

S_j = \sum_{κ=0}^{K} P_κ c_j^κ, where \sum_{j=1}^{n} S_j = 1.  (21)

The summary of the row player's belief on the column player's n strategies is represented by C^Σ = [S_1, S_2, ..., S_n].

A.3 Hypergame Expected Utility

The hypergame expected utility (HEU) can be calculated based on EU(·) and the uncertainty probability perceived by the row player, denoted by g, representing the level of uncertainty about what is guessed about a given game. g affects the degree of the EU(·) of a given hyperstrategy by the row player.
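As an illustration (our own sketch, not the authors' implementation), the belief aggregation of Eq. (21) and the HEU computation of Eqs. (22)-(24) below can be written as follows; the probabilities and utilities used are hypothetical.

```python
def column_belief(P, CMS):
    """S_j = sum_{k=0}^{K} P_k * c_j^k (Eq. (21)); P[0] is the
    probability that the full game (subgame 0) is played."""
    n = len(CMS[0])
    return [sum(P[k] * CMS[k][j] for k in range(len(P))) for j in range(n)]

def heu(u_row, S, g):
    """HEU(rs_i, g) = (1-g)*EU(rs_i, C^S) + g*EU(rs_i, CMS_w)
    (Eqs. (22)-(24)) for one row strategy with utility row u_row."""
    n = len(S)
    eu_full = sum(S[j] * u_row[j] for j in range(n))   # Eq. (23)
    w = min(range(n), key=lambda j: u_row[j])          # worst column w
    eu_worst = n * S[w] * u_row[w]                     # Eq. (24)
    return (1.0 - g) * eu_full + g * eu_worst          # Eq. (22)

# Hypothetical example: the full game (index 0) plus two subgames
P = [0.2, 0.5, 0.3]          # belief contexts, sum to 1
CMS = [[0.5, 0.5],           # full game
       [0.8, 0.2],           # subgame 1
       [0.1, 0.9]]           # subgame 2
S = column_belief(P, CMS)    # ≈ [0.53, 0.47]
value = heu([4.0, 1.0], S, g=0.3)  # blends Eqs. (23) and (24)
```

At g = 0 the result reduces to the full expected utility of Eq. (23); at g = 1 it reduces to the pessimistic worst-case term of Eq. (24).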
HEU for the given row player's strategy rs_i with uncertainty g is given by [36]:

HEU(rs_i, g) = (1 − g) · EU(rs_i, C^Σ) + g · EU(rs_i, CMS_w),  (22)

where rs_i is a given strategy i of the row player. EU(rs_i, C^Σ) refers to the row player's expected utility in choosing strategy i when the column player can take a strategy among all n available strategies. EU(rs_i, CMS_w) indicates the row player's expected utility in choosing strategy i when the column player chooses strategy w, which gives the minimum utility to the row player. EU(rs_i, C^Σ) and EU(rs_i, CMS_w) are computed by:

EU(rs_i, C^Σ) = \sum_{j=1}^{n} S_j · u_{ij},  (23)
EU(rs_i, CMS_w) = n · S_w · u_{iw},  (24)

where g = 0 means complete confidence (i.e., complete certainty) in a given strategy while g = 1 implies that the row player is completely occupied with the fear of being outguessed (i.e., complete uncertainty) [38].

The calculation of EU(rs_i, CMS_w) takes a pessimistic perspective in that when a player is uncertain, it estimates utility based on the worst-case scenario. In our work, we consider a realistic scenario in which the player simply chooses a random strategy among the strategies in a given subgame. When the defender is uncertain, it simply plays the full game because it does not know which subgame the attacker plays. Note that the utility values are normalized using the min-max normalization method [19].

APPENDIX B
ADDITIONAL EXPERIMENTAL RESULTS
B.1 Strategy Cost and Hypergame Expected Utility of the Attacker and Defender
Fig. 4 shows the attack cost, defense cost, and the HEUs of the attacker and defender, respectively, with respect to the number of games played between the attacker and defender under the default setting in Table 5. Note that we show results only from the second game onward for meaningful analysis. The main reason for the high fluctuations in later games is that only a small number of simulation runs have long system lifetimes, resulting in high variances. We can clearly see trends similar to those observed in Fig. 2 of the main paper in that No-DD-based schemes have shorter lifetimes while DD-based schemes show much longer system lifetimes, measured by MTTSF in Fig. 4 of this document. Aligned with the trends observed in Fig. 2 of the main paper, in Fig. 4a we can observe that when DD-IPI is used, the attacker incurs a higher cost than under other schemes, as it is less likely to choose an optimal, cost-effective strategy due to the confusion or uncertainty introduced by DD-IPI.

TABLE 5: TABLE OF NOTATIONS

Symbol | Meaning | Default
ac_k | Cost of attack strategy k | Table 2 (main paper)
dc_k | Cost of defense strategy k | Table 3 (main paper)
ρ_1, ρ_2 | Thresholds for SF in Eq. (8) in the main paper | 1/3, 1/2
N_LH | Number of low-interaction honeypots deployed but not activated in a network | 50
N_HH | Number of high-interaction honeypots deployed but not activated in a network | 25
N_WS | Number of Web servers deployed in a network | 25
N_DB | Number of databases deployed in a network | 25
N_IoT | Number of IoT nodes deployed in a network | 450
N | Total number of nodes | 500
nv_i | Total number of security vulnerabilities of node i, including encryption (5), software (5), and unknown (1) | 11
P_r | Probability of two nodes being connected in an Erdős–Rényi random network | 0.05
ad | An attacker's deception detectability | [0, 1]
λ, μ | Constants for normalization of the attacker's and defender's uncertainty, respectively | 0.8, 8
P_fp, P_fn | Probabilities of false positives and false negatives in the NIDS | 0.01, 0.1
Th_risk | Risk threshold used in Eq. (6) in the main paper | 0.2
Th_c | Threshold used in AS (Data exfiltration) | 150
ε_1, ε_2 | Increased or decreased percent of a given vulnerability probability by taking attack or defense strategies | 0.1, 0.01
P_fake | Probability that the attacker obtains a fake key | 1 − ad
c_NT | Percentage (%) of edges that are hidden by defense strategy DS |

But under DD-PI, by taking the benefit of the perfect information (PI) available, the attacker can take a better action, incurring less cost. When No-DD is used, the attacker does not get to use the various attack strategies that can be useful as an insider attack because it is less likely to become an insider attacker due to immediate eviction by the NIDS. In addition, there is no chance for the system to intentionally allow attackers to stay in the system for collecting further attack intelligence. Hence, the attacker can use only a limited set of attack strategies that do not incur high cost.

In Fig. 4b, we show the attacker's hypergame expected utility (AHEU) with respect to the number of games played between the attacker and the defender under the default setting. Under DD-based schemes, when PI is used, a higher AHEU is obtained. On the other hand, imperfect information (IPI) hinders the attacker from choosing optimal strategies, leading to a lower AHEU. This indirectly shows that the proposed DD strategies under uncertainty were effective in confusing the attacker.
When No-DD is used, PI helps the attacker make better attack decisions than IPI.

Similarly, we show the defense cost in Fig. 4c and the defender's HEU (DHEU) in Fig. 4d. Under DD-based schemes, in terms of data availability with respect to the number of games, the system lifetime is longer than under No-DD-based schemes for the same reasons explained for the attack cost and AHEU above. As expected, PI helps the defender choose better strategies than IPI because no uncertainty is perceived. However, using DD strategies introduces additional cost to achieve better security than using No-DD-based strategies. Hence, DD-IPI incurs the minimum DHEU overall. Note, however, that this does not mean DD-IPI has less benefit for system security; rather, it implies that there is a cost to achieving enhanced security.

Fig. 4. Attack cost and defense cost along with the attacker's hypergame expected utility (AHEU) and the defender's HEU (DHEU): (a) attack cost; (b) attacker's HEU; (c) defense cost; (d) defense HEU. The performance is shown from the second game.
B.2 Probabilities of Attack Strategies
Fig. 5 shows the probabilities of each attack strategy taken, denoted by P_AS, with respect to the number of games played between the attacker and defender under the four schemes. Under all four schemes, attack strategy 1, AS_1 (scanning attack), is dominantly taken. This is because every attacker starts by scanning a target system in the reconnaissance (R) stage of the cyber kill chain (CKC), which is the first step of advanced persistent threat (APT) attacks. In addition, due to the presence of the NIDS, when the attacker successfully penetrates the system, it is highly likely to be detected by the NIDS, given its fairly high detection rate (i.e., P_fp = 0.01 and P_fn = 0.1). Moreover, only a compromised node or an attacker that is not detected by the NIDS or does not exceed the risk threshold (Th_risk; see Eq. (6) in the main paper) can remain in the system. Hence, there will not be many insider attackers compared to outsider attackers, and observing the highest probability for the scanning attack (AS_1) is natural. In addition, as the attacker tries to get into the system by using social engineering attacks (AS_2) exploiting both software and encryption vulnerabilities, it is reasonable to observe the attacker taking AS_2 as the second most frequent attack strategy.

In Figs. 5a and 5b, when DD strategies are used, AS_1 and AS_2 are commonly taken. This is because these two attack strategies incur relatively low cost (attack costs of 1 and 2, respectively). When No-DD strategies are used, it is reasonable to observe that the attacker mainly uses AS_1 and AS_2, as it does not have many chances to use other strategies before being detected by the NIDS. When PI is used, the defender also knows more about the attacker due to no uncertainty. This makes the attacker get evicted more quickly, and more new attackers are likely to attempt to access the defended system.
Hence, the attacker uses social engineering attacks (AS_2) more frequently in the later games when PI is used than when IPI is used.

B.3 Probabilities of Defense Strategies
Fig. 6 shows the probabilities of the eight defense strategies (P_DS) used under the four schemes with respect to the number of games played between the attacker and defender. Under all four schemes, defense strategy 1, DS_1 (firewall), is dominantly used, as it deals with outside attackers. However, when DD-IPI is used, DS (patch management) and DS (honey information) are used more commonly than the other defense strategies. This is because these two defense strategies cost less, which makes the corresponding DHEU higher, ultimately leading the defender to choose them more often than other strategies. When No-DD strategies are used, DS and DS are mainly used while DS and DS are marginally used.

B.4 TPR of the NIDS with Respect to the Number of Games
Fig. 7 shows the TPR of the NIDS with respect to the number of games played between the attacker and defender. As expected, since DD-based schemes allow the defender to learn additional attack intelligence, which is incorporated into the NIDS, they naturally lead to an improvement in the TPR of the NIDS.
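For reference, the TPR reported in Fig. 7 follows the standard definition, sketched below (our own illustrative helper with hypothetical counts, not the paper's code):

```python
def true_positive_rate(true_positives, false_negatives):
    """TPR = TP / (TP + FN): the fraction of actual attacks
    that the NIDS correctly flags."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts: 90 attacks detected, 10 missed
tpr = true_positive_rate(90, 10)  # 0.9
```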
Fig. 5. The probabilities of attack strategies under the four schemes: (a) under DD-IPI; (b) under DD-PI; (c) under No-DD-IPI; (d) under No-DD-PI. The performance is shown from the second game.

APPENDIX C
REPRESENTATIONS OF MODELED ATTACK AND DEFENSE STRATEGIES, HEUS, AND NIDS USING THE MIND MAP

In Figs. 8-11, we demonstrate the Mind Maps of how the attack and defense strategies are designed, how AHEU and DHEU are estimated, and how the NIDS operates in the given system. These Mind Maps are based on Eqs. (19)-(24) in this supplementary document and Eqs. (9) and (10) in the main paper. In Fig. 11, we describe the workflow of how the NIDS operates in the considered system, where each number refers to an execution step. We present these Mind Maps so that readers can clearly follow the algorithms used in this work for easy reproducibility. The Mind Maps were created with the MindNode tool [18].
Fig. 6. Probabilities of defense strategies under the four schemes: (a) under DD-IPI; (b) under DD-PI; (c) under No-DD-IPI; (d) under No-DD-PI. The performance is shown from the second game.
Fig. 7. True positive rate (TPR) of the NIDS.
Fig. 8. Modeling Attack Strategies.
Fig. 9. Modeling Defense Strategies.
Fig. 10. Modeling HEU.
Fig. 11. Modeling NIDS.