An Argumentation-Based Reasoner to Assist Digital Investigation and Attribution of Cyber-Attacks
Erisa Karafili, Linna Wang, and Emil C. Lupu
Department of Computing, Imperial College London, 180 Queens Gate, SW7 2AZ, London, UK
{e.karafili, linna.wang15, e.c.lupu}@imperial.ac.uk

Abstract.
We expect an increase in the frequency and severity of cyber-attacks that comes along with the need for efficient security countermeasures. The process of attributing a cyber-attack helps to construct efficient and targeted mitigating and preventive security measures. In this work, we propose an argumentation-based reasoner (ABR) as a proof-of-concept tool that can help a forensics analyst during the analysis of forensic evidence and the attribution process. Given the evidence collected from a cyber-attack, our reasoner can assist the analyst during the investigation process, by helping him/her to analyze the evidence and identify who performed the attack. Furthermore, it suggests to the analyst where to focus further analyses by giving hints of the missing evidence or new investigation paths to follow. ABR is the first automatic reasoner that can combine both technical and social evidence in the analysis of a cyber-attack, and that can also cope with incomplete and conflicting information. To illustrate how ABR can assist in the analysis and attribution of cyber-attacks we have used examples of cyber-attacks and their analyses as reported in publicly available reports and online literature. We do not mean to either agree or disagree with the analyses presented therein or reach attribution conclusions.
1 Introduction

The increase in cyber-attacks we are currently facing [33] is expected to continue, especially given the exponential increase in the usage of IoT and smart devices, which drastically increases the attack surface of systems. The increasing dependency users have on these connected devices raises the users' exposure to cyber-attacks. The growth in frequency and severity of cyber-attacks comes along with the increased economic costs associated with the damages caused by such cyber-attacks [16]. Existing protective and mitigating measures are not sufficient to cope with the sophistication of current attacks. This brings the need to enforce efficient preventive and mitigating measures that are attacker-oriented, i.e., countermeasures that are specific to the attacker or group of attackers performing the attack. Furthermore, discovering who performed an attack and bringing the perpetrators to justice can act as a deterrent for future cyber-attacks.

Attacker-oriented countermeasures require discovering the perpetrator of the attack or the entity related to it. Attribution is the process of assigning an action of a cyber-attack to a particular entity/attacker/group of attackers. Currently, the attribution of cyber-attacks is mainly a manual process, performed by the forensic analyst, and is strictly related to the knowledge of the analyst; thus, it is easily human-biased and error-prone.
Attributing cyber-attacks is not trivial, as attackers often use deceptive and anti-forensics techniques [18], and the analysts need to analyze an enormous amount of data, and filter [23,38] and classify it. The increasing use of IoT devices aggravates the work of the analysts and makes the attribution process more expensive, as the analysts might need to physically access the devices to retrieve their data.

Digital forensics helps during the attribution process, as it collects and analyzes the evidence left by the attack, but it is not able to deal with conflicting or incomplete information. It only works with technical evidence, and fails to consider other aspects such as geopolitical situations and social-cultural contexts that provide useful leads during an investigation. Digital forensics tools mainly focus on collecting the evidence, which is then given to the analyst for analysis. This makes the process often extremely human-intensive, requiring many skilled analysts to work for weeks or even months [31,45]. The problem is aggravated by the large proportion of unstructured data, which makes automated analysis challenging.

In this work, we propose an automatic reasoner (ABR), based on argumentation and abductive reasoning, that helps the forensic analyst during the evidence analysis and attribution process. Given the pieces of cyber forensic and social evidence of a cyber-attack, the proposed reasoner analyzes them and derives new information that is provided to the analyst. In particular, ABR can answer queries such as: who is a possible perpetrator of an attack, who has the motives to perform it, what are the capabilities needed to perform an attack, or what are the similarities with past attacks. Furthermore, ABR can suggest to the analyst other paths of investigation, by giving hints on what other pieces of evidence can be collected to arrive at a conclusion, thus enabling a prioritized evidence collection. Our reasoner is based on our preliminary work [25,24], where we briefly presented the main intuition behind ABR. To the best of our knowledge, this is the first automatic reasoner that helps with the analysis of cyber-attacks using both technical and social evidence, and that is able to reason with conflicting and incomplete knowledge.

The reasoner uses a set of reasoning rules, preferences between them, and background knowledge. The rules of the reasoner are constructed from the input provided by the expert user and from the analyses of past attacks. To illustrate ABR, we have used in this paper rules extrapolated from the analysis of well-known cyber-attacks as published in the public literature (e.g., APT1 [27], Wannacry [32]). In particular, the rules were generalised so that they can be applied across different attack scenarios. The background knowledge incorporates typical common knowledge that analysts may use during the analysis process. In this work, we have used some of the knowledge extracted from the examples, as well as from other public reports such as [19,7,8]. Our reasoner can assist in the attribution of an attack by using both technical evidence and social considerations that are represented thanks to the use of a social model [39].
ABR is able to work with incomplete and conflicting evidence. We decided to base ABR on an argumentation framework, in particular, a preference-based argumentation framework [22], which permits reasoning with conflicting pieces of evidence by introducing preferences between the applied rules. We use preference-based argumentation as it is similar to the decision-making process followed by digital forensics investigators. ABR is constructed using the Gorgias [20] tool, which uses abductive reasoning [21] combined with preference-based argumentation. The use of abduction allows us to reach conclusions even with incomplete information, as the missing information is abduced (hypothesized) and then suggested to the analyst as hints of possible further evidence to be collected.

ABR is a proof-of-concept tool that aims to assist the analyst during the analysis process. Therefore, together with the answer to a query, it also provides the explanation of the reasoning process, the applied rules and the information used to reach that conclusion. Furthermore, ABR gives hints to the analyst about missing evidence that, if provided, allows other investigation paths to be pursued. ABR is flexible and adaptable to user requests and changes. The use of ABR helps to promote best practices and to share lessons learned from past experience, as rules and background knowledge can be constructed with expert input and then shared and re-used across investigations.

In Section 2 we present the relevant related work. We introduce our argumentation-based reasoner (ABR) in Section 3. In Sections 4 and 5 we present ABR's main components, correspondingly its reasoning rules and its background knowledge. We give an overall evaluation and discussion in Section 6. In Section 7 we conclude and present some interesting future research directions.
2 Related Work

Attribution of a cyber-attack is the process of "determining the identity or location of an attacker or an attacker's intermediary" [46]. Tracing the origin of a cyber-attack is difficult as attackers can easily forge or obscure information sources, and use anti-forensics tools, to avoid being detected and identified [18]. Digital forensics plays a significant role in attribution by collecting, examining, analyzing and reporting the evidence [26]. Other techniques created for protecting the systems are also used to collect forensic data, e.g., traceback techniques [46], honeypots [4], or other deception techniques [1,2,44].

Digital forensics comes with its own challenges [6], which can mainly be categorised into: complexity problems, as the collected data are in the lowest raw format and require substantial resources to analyze; and quantity problems, as the enormous amount of collected data is too large to be analyzed manually [9]. Forensics techniques identify and collect the evidence that is later managed and analyzed by the forensic analyst. Since the data are often collected from different sources and the attackers can plant false evidence to lead the investigator off his/her trail, the latter is likely to be in a situation with multiple pieces of conflicting evidence. Digital forensics techniques can deal with conflicting information during the evidence collection phase [5,15], but lack the ability to work with conflicting pieces of evidence during the analysis and attribution process. These techniques can collect pieces of evidence [42], but have difficulties reasoning with incomplete information, and reaching conclusions without having all the needed pieces of evidence. Digital forensics only uses technical evidence [45] and fails to consider other factors such as geopolitical situations and social-cultural contexts, which could provide useful leads during the investigations.

A theoretical social science model, called the Q-Model, is proposed in [39]; it describes how analysts combine technical and social evidence during the attribution process. In this model, attribution is described as an incremental process passing from one level of attribution to the other. The Q-Model represents how forensic investigators perform the attribution process, and particular attention is placed on the social evidence, where contextual knowledge, such as ongoing conflicts between countries or rivalry between corporations, is very useful in detecting the motives of potential culprits.

We decided to use argumentation for our reasoner, as argumentation helps during the analysis and attribution process because it is transparent and encourages the evaluation of the arguments, by assessing the relative importance of various factors when making decisions [35]. Argumentation captures the fact that the final decision might change if more information becomes available (i.e., non-monotonic reasoning [12]), where more information may reveal new arguments that are in conflict with the original ones and are stronger than them. Non-monotonic reasoning has previously been proposed to tackle the attribution challenge. For example, in [34,43] the authors propose the DeLP3E framework to attribute operations of cyber-attacks. This theoretical framework is based on the extension of Defeasible Logic Programming with probabilistic uncertainty. The DeLP3E framework does not deal with incomplete evidence, and thus cannot make assumptions to reach a conclusion and cannot suggest new paths of investigation or new evidence to be collected. It also lacks general technical and social common knowledge, e.g., ongoing conflicts/rivalries between countries/corporations, information about past attacks, cyber-security capabilities of entities, which can be very useful in detecting motives, capabilities and potential culprits. DeLP3E uses as a measure for its conclusions the probability of an event being true. However, this requires the user to provide the probability of being true for each of the given pieces of evidence. It also does not distinguish the different levels of reasoning that can be applied to reach certain conclusions.

Despite the advances in using digital forensics or defeasible reasoning in attribution, some shortcomings still remain to be addressed. The most important one is that none of the current works considers the social aspects of attribution. The current state of the art does not deal with incomplete evidence, which is an important aspect of forensic investigations, as usually not all evidence can be collected due to time/resource constraints, and anti-forensics tools used by attackers can hide some of the evidence. We believe our reasoner is the first attempt to use a social model to categorize evidence and rules in an argumentation-based framework, which leads to a more accurate and explainable attribution that helps the investigator during the analysis process also in cases of conflicting and incomplete evidence.
3 The Argumentation-Based Reasoner (ABR)

Let us now introduce our argumentation-based reasoner (ABR), which is based on a preference-based argumentation framework. ABR is composed of two main components, the reasoning rules and the background knowledge, see Figure 1. Given the evidence presented in input, ABR analyzes it and attempts to answer queries about the possible perpetrators of the attack, or provides suggestions for further pieces of evidence needed to reach a conclusion or to perform a more precise or a different analysis. The reasoning rules used by ABR were extracted from public reports about past cyber-attacks and formalized in the argumentation framework. In actual use, rules could be specified by expert users or extracted automatically from different analyses and then reviewed by expert analysts. Rules are divided into three layers, the technical, operational and strategic layer, following the social model structure proposed in [39]. In this paper, we have used background knowledge based on the information extracted from online analyses of past cyber-attacks and relevant information for these attacks.

ABR takes as input from the user the pieces of evidence (technical and social evidence) relevant to the current investigation and then analyzes them by using the reasoning rules and the background knowledge. It returns to the user answers to the user's queries, e.g., whether a given entity is a possible culprit of the attack, together with an explanation of how the conclusion was reached, and hints about what other pieces of evidence the user can provide to perform a more precise or a new analysis.
We base our reasoner on a preference-based argumentation framework [22,20], as it permits the user to take decisions while working with conflicting evidence, and it naturally encodes the different reasoning layers with its preference relations between rules. The used framework best simulates the analysis and attribution process performed by an investigator, who needs to use different reasoning rules that work with technical and social aspects of the attack, have exceptions, and can derive conflicting conclusions. (Currently the extraction of the rules is done manually, by analyzing various reports and articles about the analysis and attribution of past cyber-attacks. The rules extracted have not been evaluated for correctness and might not be complete, i.e., they might not capture the complexity of the situations encountered.)

[Fig. 1. ABR Overview: the user provides the input evidence to ABR, whose reasoning rules are organized into the technical, operational and strategic layers; the layers import the background knowledge (general and domain-specific knowledge), and ABR returns the output to the user.]

Our framework allows the investigator to work with conflicting evidence and reasoning rules that derive conflicting conclusions, by introducing preferences between them. The introduced preferences can be considered as exceptions to other rules, or preferences that are context dependent. The use of argumentation permits an explanation of the given results to be provided. Let us briefly introduce the used framework.

An argumentation theory is a pair (T, P) of argument rules T and preference rules P. The argument rules T are a set of labeled formulas of the form:

    rule_i : L ← L1, ..., Ln

where L, L1, ..., Ln are positive or negative ground literals, and rule_i is the label denoting the rule name. In the above argument rule, L denotes the conclusion of the argument rule and L1, ..., Ln denote its premises. The premises of an argument rule are the set of conditions required for the conclusion to be true. In our framework, the argument rules are the reasoning rules used by ABR. Let us show below a reasoning rule that is part of ABR:

    str1 : isCulprit(C, Att) ← claimResp(C, Att)

where the rule name is the label of the rule, in this case str1; the head is the second argument and represents the conclusion of the rule, in this case isCulprit(C, Att); the body predicates are the literals following the head, and represent the premises of the rule, in this case claimResp(C, Att).

The preference rules P are a set of labelled formulas of the form:

    p_i : rule1 > rule2

where p_i is the label denoting the rule name, the head of the rule is rule1 > rule2, rule1 and rule2 are labels of rules defined in T, and > refers to an irreflexive, transitive and antisymmetric higher-priority relation between rules. The above rule means that rule1 has higher priority than rule2, or better, rule1 is preferred over rule2. The preference rules, also called priority rules, are true always or in certain conditions or contexts. We show below a priority rule, p1, denoting that rule str2 is preferred over rule str1:

    p1 : str2 > str1

We have priority rules between rules that are in conflict with each other, or better, that derive conflicting conclusions.
Preference-based argumentation allows the investigator to handle non-monotonic reasoning [12] in attribution, where the introduction of new evidence might change the result of the attribution (due to conflicting arguments) and the investigator's confidence in the results. Argumentation is particularly useful as it permits the reasoning rules to be represented in an intuitive and simple way. Let us introduce the following rule that is part of ABR:

    str2 : ¬isCulprit(X, Att) ← ¬hasCap(X, Att).

Rule str2 describes that entity X is not a possible culprit for the attack Att, because it does not have the capabilities for performing it. Rules str1 and str2 are in conflict with each other because, when both preconditions are met, they derive conflicting conclusions. Given the above preference rule p1, rule str2 is preferred over rule str1. Thus, in case both preconditions for str1 and str2 hold, we take into consideration only the conclusion from str2, ¬isCulprit(C, Att).
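The conflict resolution just described can be sketched as a small toy model in Python (a simplified illustration of the mechanism, not the Gorgias implementation; the rule and predicate names follow the text, and negative literals are modeled with a hypothetical "neg_" prefix):

```python
# Toy model of preference-based conflict resolution between ABR-style rules.
# Each rule is a triple: (label, conclusion, premises).
RULES = [
    ("str1", ("isCulprit", "C", "Att"), [("claimResp", "C", "Att")]),
    ("str2", ("neg_isCulprit", "C", "Att"), [("neg_hasCap", "C", "Att")]),
]

# p1: str2 has higher priority than str1.
PREFERENCES = {("str2", "str1")}

def applicable(rules, facts):
    """Return (label, conclusion) for rules whose premises all hold."""
    return [(label, concl) for label, concl, prems in rules
            if all(p in facts for p in prems)]

def resolve(app, prefs):
    """Keep only conclusions of rules not beaten by a higher-priority
    applicable rule (simplification: preferences here are only stated
    between rules that derive conflicting conclusions)."""
    return [concl for label, concl in app
            if not any((other, label) in prefs for other, _ in app)]

# Both rules are applicable: C claimed responsibility but lacks the capability.
facts = {("claimResp", "C", "Att"), ("neg_hasCap", "C", "Att")}
print(resolve(applicable(RULES, facts), PREFERENCES))
# Only str2's conclusion survives: neg_isCulprit(C, Att)
```

The preference set makes the resolution deterministic: when both rules fire, the conclusion of the lower-priority rule str1 is discarded, mirroring the p1 priority rule above.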
ABR are pieces of evidence that are used together with thebackground knowledge by the reasoning rules to derive new information. Thereasoning rules and the preferences used in this paper were extracted from realcyber-attacks analyses and attribution taken from online public reports, suchas [27,32].
ABR is the first tool that is able to work with incomplete evidence. It provideshints of missing evidence or new investigation paths to the user, thanks to the useof abductive reasoning [21]. The use of abducible predicates permits to fill the knowledge gaps in the reasoning, by allowing
ABR to perform the analysis andto reach a conclusion even when there are insufficient pieces of evidence. Thisfeature is extremely important to the investigator who is provided with newpossible conclusions and new evidence to be collected. To construct
ABR we usethe Gorgias [20] tool, which is a preference-based argumentation reasoning toolthat uses abduction.Let us now introduce the following rule from
ABR : op : hasM otive ( X, Att ) ← target ( T, Att ) , industry ( T ) ,hasEconM ot ( X, T ) ,contextOf Att ( econ, Att ) ,specif icT arget ( Att ) . which states that X has the motives to perform attack Att , when it has econom-ical motives against the target T of Att , where T is an industrial company, the7ontext of Att was economical ( econ ), and
Att had a specific target.
ABR treats specif icT arget as an abducible predicate. For every abducible predicate we havethe rules that derive the predicate or its negation. For the specif icT arget ab-ducible we can prove that it is not true by using the following rule. op : ¬ specif icT arget ( Att ) ← target ( T , Att ) ,target ( T , Att ) , T (cid:54) = T . In case, we are not able to derive ¬ specif icT arget ( Att ), then we can abduce(hypothesize) that specif icT arget ( Att ) is true, and we can use this result toderive hasM otive ( X, Att ), in case we have the rest of the preconditions.
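This abduction step can be sketched as a toy Python model (an illustration of the behavior described above, not the Gorgias implementation; the predicate names follow the op2/op3 rules, while the tuple-based fact encoding is our own assumption):

```python
# Sketch of abduction over the specificTarget abducible: if its negation is not
# derivable, it is hypothesized and reported as a hint of evidence to collect.

def neg_specific_target(att, facts):
    """op3: ¬specificTarget(Att) is derivable if Att has two distinct targets."""
    targets = {f[1] for f in facts
               if len(f) == 3 and f[0] == "target" and f[2] == att}
    return len(targets) > 1

def has_motive(x, att, facts):
    """op2 with abduction. Returns (conclusion_holds, abduced_hypotheses)."""
    if neg_specific_target(att, facts):
        return False, []            # negation derivable: op2 cannot fire
    for f in facts:
        if len(f) == 3 and f[0] == "target" and f[2] == att:
            t = f[1]
            if (("industry", t) in facts and ("hasEconMot", x, t) in facts
                    and ("contextOfAtt", "econ", att) in facts):
                # specificTarget(att) is abduced, not proven: it is returned
                # to the analyst as missing evidence worth collecting.
                return True, [("specificTarget", att)]
    return False, []

facts = {("target", "industryY", "att1"), ("industry", "industryY"),
         ("hasEconMot", "countryC", "industryY"),
         ("contextOfAtt", "econ", "att1")}
print(has_motive("countryC", "att1", facts))
# hasMotive holds, with specificTarget(att1) abduced as a hint for the analyst
```

Adding a second, distinct target fact for att1 would make ¬specificTarget(att1) derivable, blocking the rule entirely; this mirrors how new evidence can retract an abduced conclusion.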
The main goal of the ABR reasoner is to assist the forensic analyst during the evidence analysis. Given the pieces of evidence of an attack, the reasoner analyzes the evidence and derives new information, if possible attributes the attack to one or more possible entities, or provides suggestions on other pieces of evidence that the user can provide to better analyze and attribute the attack. To perform the attribution process, ABR also needs to work with non-technical evidence, usually called social evidence. To deal with these aspects, we have used a social model for attribution, called the Q-Model [39]. This model represents how investigators perform the attribution process of cyber-attacks. Following the Q-Model, we categorize the evidence and the reasoning rules into three layers: technical, operational and strategic. The combination of information in these layers permits the attribution of a cyber-attack, as it aims to emulate the investigator's attribution process. Depending on the layer a rule/evidence is part of, we call it a technical, operational, or strategic rule/evidence and denote its name starting correspondingly with t, op, or str.

The technical layer is composed of rules that deal with pieces of evidence obtained from digital forensics processes, related to technical evidence of the attack and how it was carried out, e.g., the IP address from which the attack originated, the time of the attack, logs, the type of attack, the code used. Let us give below an example of a technical layer reasoning rule that is part of ABR:

    t1 : reqHighRes(Att) ← usesZeroDay(Att).

Rule t1 denotes that if the attack Att uses zero-day vulnerabilities, usesZeroDay(Att), then this attack requires a lot of resources, reqHighRes(Att).

The operational layer is composed of rules that deal with non-technical pieces of evidence that relate to the social aspects where the attack took place, e.g., the motives of the attack, the capabilities needed to perform it, the political or economical context in which it took place. Let us give below an operational layer reasoning rule that is part of ABR:

    op1 : hasCap(X, Att) ← reqHighRes(Att), hasResources(X).

op1 denotes that if Att requires a large amount of resources, and an entity X has (large amounts of) resources, hasResources(X), then X has the capability to carry out the attack, hasCap(X, Att).

The strategic layer is composed of rules that deal with who performed the attack, or who is obtaining an advantage from it. Let us give below a strategic layer reasoning rule that is part of ABR:

    str3 : isCulprit(X, Att) ← hasMotive(X, Att), hasCap(X, Att).

Rule str3 denotes that if X has both the capability, hasCap(X, Att), and the motive, hasMotive(X, Att), to carry out the attack Att, then X is a possible culprit of the attack, isCulprit(X, Att).

As shown in Figure 1, the operational rules use information derived from the technical layer, and the strategic rules use information derived from the technical and operational layers. All three layers use the evidence given by the user and the background knowledge. This categorization of the evidence and rules into three layers, following from the Q-Model, aims to emulate the forensic investigator's analysis during the attribution process, where s/he moves from the technical layer, to the operational, and finally to the strategic one, by using the conclusions from the previous layers. Furthermore, this categorization improves ABR's usability, given the investigator's familiarity with these three layers.

4 ABR's Reasoning Rules
To illustrate the use of ABR we have extracted around 200 reasoning rules from the analyses of different cyber-attacks reported in the public literature (e.g., APT1 [27] and Wannacry [32]). These rules have then been translated into generic argumentation rules to be used within the framework. These reasoning rules are considered one of the main components of ABR, as they permit the reasoning behind the analysis and attribution of cyber-attacks to be performed. We briefly present some of these rules in this section.

As described in the previous section, the reasoning rules, also called simply rules, are divided into three layers: technical, operational and strategic. Let us give an overview of some of the strategic rules of the reasoner and show how the rules of the different layers are related to each other. The following rules describe some of the circumstances in which we can derive that an entity X is a possible culprit (isCulprit(X, Att)) or not (¬isCulprit(X, Att)) of an attack Att.

    str3 : isCulprit(X, Att) ← hasMotive(X, Att), hasCap(X, Att).
    str4 : isCulprit(X, Att) ← malwareUsed(M1, Att), similar(M1, M2),
                               notBlackMarket(M1), notBlackMarket(M2),
                               malwareLinked(M2, X).
    str5 : ¬isCulprit(X, Att) ← ¬attackOrig(X, Att).
    str6 : ¬isCulprit(X, Att) ← target(X, Att).

Let us use the strategic rule str3 to show the relations of the reasoning rules between the different layers. Rule str3 uses the predicates hasMotive(X, Att) and hasCap(X, Att), where the first is a derived predicate of the operational layer, indicating that X has motives to perform the attack Att, and hasCap(X, Att) is a derived predicate of the technical and operational layers, indicating that entity X has the capabilities to perform Att.

The hasMotive predicate can be derived using the rule introduced in Section 3.1, represented as below:

    op2 : hasMotive(X, Att) ← target(T, Att), industry(T),
                              hasEconMot(X, T),
                              contextOfAtt(econ, Att),
                              specificTarget(Att).

The above rule says that an entity X has the motives to perform Att when X has economical motives to attack a particular entity T, which is an industry, the attack was designed to target entity T, and the context of Att was economical. The predicates used in op2 are: target(T, Att) is a piece of evidence, stating that T is the target of Att; industry(T) is a background fact, stating that T is an industry; hasEconMot(X, T) is a piece of evidence, stating that entity X benefits economically from attacking industry T (for example, if countryC has identified industryY as a strategic industry, we say that hasEconMot(countryC, industryY) is true); specificTarget(Att) is a piece of evidence that is true when Att was constructed to attack a particular target; contextOfAtt(Y, Att) is a piece of evidence stating that: if the target of the attack was a "normal" industry, then the context was economical (econ); if the target was a "political" industry, then the context was political (pol). ("Normal" industries are companies that are not closely related to a country's national interests. A "political" industry is a company that is closely related to a country's national interests, e.g., the defence or energy sector.)

We introduce below one of the rules that derives the hasCap predicate. (Numerous factors can be used to determine the capability. However, for the sake of space, we introduce only one of the possible rules that can derive the capability (hasCap) predicate.)

    op1 : hasCap(X, Att) ← reqHighRes(Att), hasResources(X).

op1 states that X has the capability to perform Att when Att requires high resources and X has the needed resources. Predicate reqHighRes(Att) can be derived from the following technical rules:

    t2 : reqHighRes(Att) ← target(T, Att), highSecurity(T).
    t3 : reqHighRes(Att) ← highVolAtt(Att), longDurAtt(Att).
    t4 : reqHighRes(Att) ← highLevelSkill(Att).

where highSecurity(T) means that entity T has high security measures in place; highVolAtt(Att) means that Att has a high volume; longDurAtt(Att) means that Att was performed over a long duration (a few months or even years); and highLevelSkill(Att) means that Att is a complex attack and requires high-level skills to be performed. Rule t2 states that Att requires high resources if its target has put in place high security measures, rule t3 states that Att requires high resources if the attack has a high volume and a long duration, and rule t4 states that Att requires high resources if it requires advanced skills.
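The chaining of rules across the three layers, from a technical fact up to a strategic conclusion, can be sketched as a tiny forward-chaining loop in Python (a toy model using the rule and predicate names from the text, not the Gorgias implementation):

```python
# Toy forward chaining across ABR's layers, using rules t4 (technical),
# op1 (operational) and str3 (strategic) from the text.

def derive(facts):
    """Apply t4, op1 and str3 repeatedly until a fixpoint is reached."""
    facts = set(facts)
    while True:
        new = set()
        for f in facts:
            if f[0] == "highLevelSkill":                       # t4: high skill
                new.add(("reqHighRes", f[1]))                  # => high resources
            if f[0] == "reqHighRes":                           # op1: resources
                for g in facts:                                # => capability
                    if g[0] == "hasResources":
                        new.add(("hasCap", g[1], f[1]))
            if f[0] == "hasMotive" and ("hasCap", f[1], f[2]) in facts:
                new.add(("isCulprit", f[1], f[2]))             # str3
        if new <= facts:                                       # nothing new
            return facts
        facts |= new

evidence = {("highLevelSkill", "att"), ("hasResources", "x"),
            ("hasMotive", "x", "att")}
print(("isCulprit", "x", "att") in derive(evidence))  # → True
```

Note how the strategic conclusion isCulprit only becomes derivable after the technical layer has produced reqHighRes and the operational layer has produced hasCap, mirroring the layered flow of Figure 1.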
ABR's rules are used to analyze the evidence and to derive new conclusions, in order to offer new insights to the analyst, as shown in the example below.

Example 1. Let us consider the example of the US bank hack [17], which occurred in 2012. During this attack, US banks faced denial of service (DoS) attacks, causing the websites of many banks to suffer slowdowns and even be unreachable for many customers. The banks' web hosting services were infected by a sophisticated malware called Itsoknoproblembro (denoted as itsOKnp). Earlier that year, the US government had placed economic sanctions against Iran. Some of the pieces of evidence provided to ABR for this attack (usBHack) are as below:

    target(us_banks, usBHack).
    targetCountry(usa, usBHack).
    attackPeriod(usBHack, [2012, …]).
    highLevelSkill(usBHack).
    malwareUsed(itsOKnp, usBHack).
    imposedSanc(usa, iran, [2012, …]).

By using rule t4 and the evidence that this attack required a high level of skill, ABR derives that this attack requires high resources, reqHighRes(usBHack). Another rule, capturing that an entity might have a political motive if it has been the target of sanctions, can be written as follows:

    op4 : hasPolMotive(C, T, Date) ← imposedSanc(T, C, Date).

Rule op4 would then derive that Iran might have political motives against the US because of the sanctions imposed by the US against Iran [17], hasPolMotive(iran, us, [2012, …]). □

(ABR's derived evidence results from the application of the rules to the input evidence, and thus depends on both the rules and the evidence being correct.)

5 ABR's Background Knowledge
ABR uses background knowledge comprising non-case-specific information, divided into general knowledge and domain-specific knowledge. Some of the background knowledge predicates used in this paper are shown in Table 1. The use of the background knowledge alleviates the analysts' work and helps avoid human errors and bias. It comprises pieces of information that are used as preconditions by the reasoning rules to answer the users' queries. ABR's background knowledge can be updated and enriched by the user. Note that reaching meaningful conclusions through the application of the rules to the background knowledge relies on the correctness of the background knowledge given.
Table 1. Some of ABR's background knowledge, divided into general knowledge (yellow) and domain-specific knowledge (orange):

    Predicate example                              | Explanation
    -----------------------------------------------|--------------------------------------
    industry(infocomm)                             | Type predicate for industries
    polIndustry(military)                          | Political industries
    norIndustry(infocomm)                          | Non-political industries
    country(united_states)                         | Type predicate for countries
    cybersuperpower(united_states)                 | List of cyber superpowers
    gci_tier(afghanistan, initiating)              |
    gci_tier(poland, maturing)                     | Global Cybersecurity Index (GCI)
    gci_tier(russian_federation, leading)          |
    firstLanguage(english, united_states)          | First language used in the country
    goodRelation(united_states, australia)         | Good relations between countries
    poorRelation(united_states, north_korea)       | Poor relations between countries
    prominentGroup(fancyBear)                      | Prominent hacker groups
    groupOrigin(fancyBear, russian_federation)     | Country of origin of a group
    pastTargets(fancyBear, [france, ..., poland])  | Past targets of a hacker group
    malwareLinked(trojanMiniduke, cozyBear)        | Past attribution of malware
    malwareUsedInAttack(flame, flameattack)        | Malware used in a past attack
    ccServer(gowin7, flame)                        | C&C servers of malware
    domainRegisteredDetails(gowin7,                | Domain registration details of
        adolph_dybevek, prinsen_gate_6)            |   C&C servers

The general knowledge consists of information about countries' characteristics, international relations between nations, and the classification of the types of industry. This information is used, together with the given pieces of evidence, to perform the analysis. Below we illustrate how these predicates are used by ABR's rules.
Language indicators in malware can provide useful clues regarding the possible origin of attacks. We use two language artifacts: the default system language settings, sysLang, and the language used in the code, langInCode. We present below two of ABR's rules, t1 and t2, that use the language evidence to derive the possible origin of the attack, attackPOrig, when the country's first language, firstLang, matches the one found in the system/code.

t1 : attackPOrig(X, Att) ← sysLang(L, Att), firstLang(L, X).
t2 : attackPOrig(X, Att) ← langInCode(L, Att), firstLang(L, X).

The cyber capability of a nation is another useful piece of information, as it limits the type of attacks that an entity can carry out. We have used the Global Cybersecurity Index (GCI) [19] and the cyber capabilities of countries in cyberwar [7] as sources for this information. There are three GCI groups, leading, maturing and initiating, according to which we classify the countries by their capabilities. Furthermore, based on the cyber capabilities in cyberwar [7], we identify some countries as cyber "superpowers". We show below three of ABR's rules that use the countries' cyber capability.

t3 : hasResources(X) ← gci tier(X, leading).
t4 : hasResources(X) ← cybersuperpower(X).
t5 : hasNoResources(X) ← gci tier(X, initiating).

A country hasResources if it is in the 'leading' GCI group or is a cyber "superpower". Countries in the 'initiating' GCI group are considered as hasNoResources.

Example 2. Let us continue with the usBHack introduced in Example 1. In the background knowledge, we have that Iran is a cyber "superpower". Thus, using rule t4, ABR derives that Iran has the resources to carry out sophisticated attacks, hasResources(iran). Similarly, the application of the operational rule op2 derives that Iran has the capabilities to perform the US bank hack, as shown below.

op2 : hasCap(iran, usBHack) ← reqHighRes(usBHack), hasResources(iran). □

Good international relations between two countries can indicate that a state-sponsored attack is unlikely. We encoded this information in
ABR by creating a list of countries that have good relations with each other (goodRelation) and a list of countries that may have poor relations with each other (poorRelation), according to [47,8]. This information can then be used to narrow down the list of possible culprits. (Iran is mentioned as a "notable player" in [7]; thus, we identify it as a cyber "superpower". Note also that we have a list of countries that have the resources to perform the attack, given their cyber capabilities, and the capability rule is applied to all these countries; for the sake of simplicity, we only show the entities that are of interest to the discussed example.)

op3 : ¬hasMotive(C, Att) ← targetC(T, Att), country(T), country(C), goodRelation(C, T).

Rule op3 derives that country C does not have a motive to perform Att, as it has good relations with T, the country that Att targets (targetC(T, Att)).
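As an illustration, the derivations of Example 2 can be sketched with a minimal forward-chaining loop over facts and rules. This is our own simplification, not ABR's actual argumentation engine; the tuple encoding of predicates and the helper names (`unify`, `chain`) are ours:

```python
# Hedged sketch: forward chaining over Prolog-style rules encoded as tuples.
# Facts and rules mirror the example in the text (skill -> high resources,
# cybersuperpower -> resources, and the capability rule); the encoding is ours.

facts = {
    ("highLevelSkill", "usBHack"),
    ("cybersuperpower", "iran"),
}

# Each rule: (list of premise templates, conclusion template); "?x" marks a variable.
rules = [
    ([("highLevelSkill", "?a")], ("reqHighRes", "?a")),        # skill -> high resources
    ([("cybersuperpower", "?x")], ("hasResources", "?x")),     # superpower -> resources
    ([("reqHighRes", "?a"), ("hasResources", "?x")],
     ("hasCap", "?x", "?a")),                                  # capability rule
]

def unify(template, fact, env):
    """Match a premise template against a fact, extending the binding env."""
    if len(template) != len(fact):
        return None
    env = dict(env)
    for t, f in zip(template, fact):
        if t.startswith("?"):
            if env.get(t, f) != f:
                return None
            env[t] = f
        elif t != f:
            return None
    return env

def chain(facts, rules):
    """Apply all rules to a fixpoint and return the set of derived facts."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            envs = [{}]
            for p in premises:
                envs = [e2 for e in envs for f in derived
                        if (e2 := unify(p, f, e)) is not None]
            for env in envs:
                new = tuple(env.get(t, t) for t in conclusion)
                if new not in derived:
                    derived.add(new)
                    changed = True
    return derived

print(("hasCap", "iran", "usBHack") in chain(facts, rules))  # True
```

Running the loop derives reqHighRes(usBHack), hasResources(iran) and finally hasCap(iran, usBHack), mirroring the derivation steps described in the example.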
Domain-specific knowledge consists of information about prominent groups of attackers and past attacks. These facts are primarily used in the strategic and technical layers. We encoded information on prominent APT groups taken from [14,29], where for each group we have: their name or ID; country of origin; countries/organisations targeted by the group in the past; malware or pieces of malicious software (suspected or confirmed) linked to the group; as well as relations of this group with other entities (e.g., governments). We assume these groups have the capabilities of conducting long-term and significant attacks. Thus, we derive that an entity X has the capabilities to perform an attack if X is a prominent group of attackers, as shown by rule op4 below.

op4 : hasCap(X, Att) ← prominentGroup(X).

Another important part of the domain-specific knowledge is the similarity with past attacks. For example, similarity to an APT-linked malware may indicate that the same APT group may be responsible. This is expressed in rule str1.

str1 : isCulprit(X, Att) ← malwareUsed(M1, Att), similar(M1, M2), malwareLinked(M2, X), notBlackMarket(M1), notBlackMarket(M2).

In rule str1, we derive that the attacker of Att is most likely entity X, because the malware used is similar to another malware linked to X, and neither malware's code was found on the black market (notBlackMarket). (Currently, ABR is not able to derive the evidence notBlackMarket; this evidence is either provided by the user or given in the background knowledge.) We use the predicate similar(M1, M2) to denote that two malwares are similar to each other. In the technical rules below, we define that M1 and M2 are similar if they use a similar code obfuscation mechanism (similarCodeObf), or they share code, or M1 was modified from M2, or they have similar command-and-control (C&C) servers, where the similarity of the C&C servers of two different malwares can be established, e.g., from their domain registration details. (For the sake of simplicity, in this paper we introduce only a subset of ABR's rules that identify similarities between malwares or malicious software.)

t6 : similar(M1, M2) ← similarCodeObf(M1, M2).
t7 : similar(M1, M2) ← sharedCode(M1, M2).
t8 : similar(M1, M2) ← modifiedFrom(M1, M2).
t9 : similar(M1, M2) ← similarCCServer(M1, M2).

ABR aims to be a flexible tool designed to be part of an iterative process, where the user can add further pieces of evidence, rules, or preferences after evaluating the answers produced by the tool.
ABR's input is given manually by the analyst, but could also be collected, in part, automatically, through an extraction process using digital forensics tools.
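The malware-similarity rules and the isCulprit rule introduced earlier can be made concrete with the sketch below. This is our own simplification with invented indicator data; the function names and the evidence encoding are ours:

```python
# Hedged sketch: two pieces of malware are 'similar' if any of the four
# technical indicators links them; similarity to malware previously linked
# to a group suggests that group as a possible culprit (provided neither
# malware was found on the black market). All data below is illustrative.

def similar(m1, m2, evidence):
    """Any of the four technical indicators makes two malwares similar."""
    indicators = ["similarCodeObf", "sharedCode", "modifiedFrom", "similarCCServer"]
    return any((ind, m1, m2) in evidence for ind in indicators)

def possible_culprits(attack, evidence, malware_linked, black_market):
    """Groups linked to malware similar to the malware used in `attack`."""
    culprits = set()
    used = [e for e in evidence if e[0] == "malwareUsed" and e[2] == attack]
    for (_, m1, _) in used:
        for m2, group in malware_linked.items():
            # neither malware may have been found on the black market
            if similar(m1, m2, evidence) and m1 not in black_market and m2 not in black_market:
                culprits.add(group)
    return culprits

evidence = {("malwareUsed", "mal_a", "attX"), ("sharedCode", "mal_a", "mal_b")}
malware_linked = {"mal_b": "groupG"}   # past attribution of mal_b to groupG
print(possible_culprits("attX", evidence, malware_linked, black_market=set()))
# {'groupG'}
```

If either malware turns up on the black market, the black-market precondition fails and the group is no longer suggested, matching the intent of the notBlackMarket preconditions.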
We have tested
ABR's performance and usability using examples of cyber-attacks published in the online literature. During the evaluation,
ABR used the reasoning rules correctly to identify possible attackers. The explanations provided, in the textual and the graphical representations, helped to improve the usage of ABR, as they provided information that the user could use in the next iterations.
ABR answered the queries requested (e.g., whether a country had the motives or capabilities to perform the attack, or whether a particular group of attackers could be related to the attack following the technical evidence of the used malware) as expected, given the pieces of evidence provided as input. Note that ABR assumes that the input evidence is correct, and providing inaccurate or incomplete information may lead to incorrect conclusions.

For every tested example, we ran
ABR using a subset of the input evidence. Depending on the use-case and the provided evidence, ABR was able to reach some conclusions by abducing (hypothesizing) some of the missing predicates.
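The abduction step can be sketched as follows: when a queried conclusion cannot be derived, the premises missing from the evidence are returned as hypotheses to investigate. The rule used below (a culprit needs both motive and capability) is a simplified stand-in for ABR's strategic rules; all names are illustrative.

```python
# Hedged sketch of abduction: for each rule that concludes the queried goal,
# return the premises that are not (yet) in the evidence, i.e. the missing
# facts under which the goal would become derivable.

def abduce(goal, rules, facts):
    """Return, per matching rule, the list of missing premises for `goal`."""
    hypotheses = []
    for premises, conclusion in rules:
        if conclusion == goal:
            missing = [p for p in premises if p not in facts]
            hypotheses.append(missing)
    return hypotheses

# Illustrative rule: isCulprit needs both motive and capability.
rules = [([("hasMotive", "c", "att"), ("hasCap", "c", "att")],
          ("isCulprit", "c", "att"))]
facts = {("hasCap", "c", "att")}       # motive evidence not yet collected

print(abduce(("isCulprit", "c", "att"), rules, facts))
# [[('hasMotive', 'c', 'att')]]  -> suggests collecting motive evidence
```

The returned hypothesis is exactly the kind of "missing evidence" suggestion described in the text: the analyst is pointed at the motive evidence still to be collected.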
ABR gave interesting results when asked to provide suggestions for missing evidence, as it proposed useful missing evidence and also new (not predicted) investigation paths. When a significant part of the evidence was provided to ABR, its results coincided with those in the publicly available analyses, or the entity attributed in the publicly available analyses was contained in ABR's list of possible culprits, which also contained other possibilities.

Let us now briefly introduce some of the cyber-attacks used to evaluate
ABR and its conclusions. For the sake of space, we decided to show some well-knownattacks where
ABR was tested, as they do not need a detailed introduction.
ABR analyzed evidence of the
Stuxnet attack [48,30] and derived two different entities as possible culprits: the US and Israel. The
Stuxnet attack was first discovered in 2010 at the uranium enrichment plant in Iran. The code used was complex, exploited four zero-day vulnerabilities, and mainly targeted Iran. ABR explained the conclusion based on the high resources required to perform such a sophisticated attack, and the political conflicts that existed in that period between Iran and the US, and between Iran and Israel. In this case, the social evidence provided to
ABR mainly included the political conflicts between Iran and these two countries. However, ABR would have listed as a possible source of the attack any entity for which it could derive a motive and that had the resources to perform such sophisticated attacks.
ABR analyzed evidence of the
Sony Pictures attack [40]. The Sony Pictures attack refers to the 2014 attack in which hackers infiltrated Sony's computers and stole data from Sony's servers. A group called "Guardians of Peace" claimed credit for the attack, but several US government organisations claimed that the attack was state-sponsored by North Korea [10,11,40].
ABR attributed this attack to three possible culprits: the attacker group called "Guardians of Peace", and two countries, Iran and North Korea. The attribution to Iran came as a consequence of the poor diplomatic relations between the US and Iran. ABR's results were unexpected with respect to the attributions given in [10,11,40]. In this case we see that
ABR can suggest new possible paths of investigation.
ABR analyzed evidence from the
Conficker [41] attack and, using the evidence provided, was not able to reach a result. Some of
ABR ’s suggestionswere to find entities that operated/worked in Ukraine or that had interest inUkraine, as the first version of the attack was constructed to avoid machineswith Ukrainian keyboards [37], thus to avoid a specific country (Ukraine).
ABR suggested also to find evidence about political or economical motivations forthe attack, as the attack was sophisticated and it could either be a nation-stateattack or performed by a cyber-criminal organization.
Example 3.
Let us now show
ABR's final steps of the analysis and attribution of the usBHack. Following from Examples 1 and 2, ABR derived the following predicates: hasCap(iran, usBHack) and hasPolMotive(iran, us, [2012, ...]). ABR can now apply the following operational rule:

op5 : hasMotive(C, Att) ← targetCountry(T, Att), attackPeriod(Att, Date1), hasPolMotive(C, T, Date2), specificTarget(Att), dateApplicable(Date1, Date2).

Rule op5 permits ABR to derive that Iran has motives to perform the attack, as it has political motives. Furthermore, these motives are applicable to the attack, as they occurred less than 1 year before the attack took place. By applying a strategic rule that combines motive and capability, ABR derives that Iran is a possible culprit for this attack (isCulprit(iran, usBHack)). This result is in line with the attribution reported in [28], while it does not match another attribution reported in [36], which attributed the attack to a group of hackers. ABR provides its conclusion to the analyst together with its derivation tree, with all the rules and evidence used.
ABR provides further possible culprits when new information is provided. For example, when new evidence is provided that a leader of the al-Qassam Cyber Fighters hacker group publicly claimed this attack [36], ABR derives that this group is also one of the possible attackers. □
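The dateApplicable precondition used in the operational rule of Example 3 can be sketched as a simple one-year window test. The window follows the text above; the date encoding and the concrete dates are illustrative:

```python
# Hedged sketch of the dateApplicable check: a political motive is considered
# applicable if it arose at most ~1 year before the attack took place.
# The one-year window follows the text; the date values are illustrative.

from datetime import date

def date_applicable(attack_date: date, motive_date: date) -> bool:
    """Motive must predate the attack by at most ~1 year."""
    delta = (attack_date - motive_date).days
    return 0 <= delta <= 365

# A motive arising months before the attack applies; one from years earlier,
# or one arising after the attack, does not.
print(date_applicable(date(2012, 9, 1), date(2012, 1, 15)))  # True
print(date_applicable(date(2012, 9, 1), date(2010, 1, 1)))   # False
```

The lower bound also rejects "motives" that only arose after the attack, which clearly cannot explain it.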
Together with the answers to the queries,
ABR also provides the different ways in which the result was derived. Furthermore, every result comes with its explanation, composed of the rules and pieces of evidence used. The explanations are given both as text and as a graphical representation. The given explanations make ABR's result and analysis process transparent to the user and provide her/him with further information that can be used for the analysis. ABR does not require the user to be familiar with the underlying argumentation framework, as the user only needs to provide the evidence (in some cases the evidence is automatically extracted) and to launch the queries. The main goal of ABR is to help the investigator during the analysis process and provide useful information.
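One way to realise such explanations is to record, for each derived fact, the rule and the premises that produced it, and then print the resulting derivation tree. This sketch is our own simplification; the rule labels (e.g., t_skill) are hypothetical names, not ABR's actual rule identifiers:

```python
# Hedged sketch: a provenance map from each derived fact to the rule that
# produced it and the premises it was derived from, printed as an indented
# derivation tree (the textual explanation that accompanies an answer).

def explain(fact, provenance, depth=0):
    """Recursively print how `fact` was derived; facts without provenance
    are input evidence."""
    pad = "  " * depth
    if fact not in provenance:
        print(f"{pad}{fact}  [given]")
        return
    rule, premises = provenance[fact]
    print(f"{pad}{fact}  [by {rule}]")
    for p in premises:
        explain(p, provenance, depth + 1)

# Illustrative provenance for the running example; rule labels are invented.
provenance = {
    "hasCap(iran, usBHack)": ("op2", ["reqHighRes(usBHack)", "hasResources(iran)"]),
    "reqHighRes(usBHack)": ("t_skill", ["highLevelSkill(usBHack)"]),
    "hasResources(iran)": ("t4", ["cybersuperpower(iran)"]),
}
explain("hasCap(iran, usBHack)", provenance)
```

The printed tree bottoms out at the pieces of input evidence (marked "[given]"), making every step of the derivation visible to the analyst.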
ABR's results include hypothesized but missing evidence and suggestions about other investigation paths that could be followed by the analyst. The suggested missing evidence can be collected by the analyst at a later stage and given to ABR as part of an iterative process. We decided to provide only the first list of results of the "missing" pieces of evidence, together with the conclusions that could be derived from them, to keep ABR's running time and complexity polynomial. Hence, ABR does not provide an exhaustive list of all the possible hypotheses/missing evidence. On the other hand, limiting the suggested evidence to be collected can be beneficial for the analyst, who can focus his/her attention on particular evidence, instead of spending time and resources on checking an exhaustive list.
ABR promotes best practice and helps to share lessons learned between analysts, as its reasoning rules can be constructed using the analysts' reasoning process and be used by multiple investigators across different events. It also helps investigators cope with large numbers of rules and large knowledge bases. The attribution process is mainly human-based, and thus can be easily biased, e.g., by the resources invested [13]. In some cases, it may be difficult for the analyst to abandon a path of investigation when substantial resources have been devoted to it. ABR helps reduce human bias through the rigorous application of rules and by suggesting new paths of investigation.
ABR relies on the reasoning rules with which it has been provided. Thus,
ABR can fail to deal with new evidence that has not been encountered before and which is not included in the reasoning rules. Furthermore, ABR relies on the rules being correct and complete. To illustrate its operation, we have extracted 200 rules from public reports and analyses of cyber-attacks. However, the rules need to be validated with expert analysts, and the rule base would need to be enriched with further rules for broader use. To facilitate the extraction and update of the reasoning rules and the background knowledge, we plan to investigate the automated extraction of rules and knowledge through the use of NLP techniques, in conjunction with ontologies for cyber-attack investigations.

As ABR uses information from past attacks and past attributions in its reasoning, it will derive wrong conclusions if this information is incorrect. In particular, if a past attribution was incorrect, the error can be propagated to ABR's new results. For example, the Sony attack attribution [40] was built on the (alleged) claim that North Korea was responsible for the assault on South Korean banks in 2013 [3]. We could avoid this problem by not using past attribution decisions as part of the knowledge, but this would make the attribution more difficult or cumbersome, as it would require a larger amount of additional evidence. Furthermore, using the results of past attributions is a common practice adopted by forensic analysts during their analysis and attribution process, as it permits identifying existing groups of attackers and using their modus operandi as an important factor for the attribution.
In this work, we proposed a method and a proof-of-concept argumentation-based reasoner (ABR) that aims to help forensic investigators during the analysis and attribution process of cyber-attacks. Our aim was to demonstrate how such a tool can be constructed using the proposed argumentation framework.
ABR aims to help attribute cyber-attacks by leveraging both social and technical evidence. It provides explanations of the given results and hints of new investigation paths. The use of preference-based argumentation and abductive reasoning permits ABR to work with conflicting pieces of evidence and to fill the knowledge gaps that derive from incomplete ones. We introduced
ABR's main components: its reasoning rules (based on past analyses and expert knowledge) and its background knowledge. We improve ABR's usability by applying the Q-Model and categorizing the evidence and rules in three layers, thus following a model familiar to forensic analysts. Our reasoner emphasises the incremental and iterative nature of attribution by making the derivations of the solutions fully transparent to the user.

In our future work, we plan to increase
ABR's reasoning capabilities by adding new reasoning rules and new background knowledge. In this work, we mainly focused on constructing the ABR reasoner, addressing its usability and showing its possible use. We leave a careful and possibly semi-automated population of the reasoning rules and background knowledge for future work. In particular, we plan to use NLP techniques to automatically extract the reasoning rules and social evidence used by forensic analysts. Furthermore, we intend to enhance the expressive power of
ABR's reasoner using ontologies. We also aim to address its integration with forensic tools and data mining techniques. We plan to apply ABR to other cyber-attacks across a broad range of threats, and to improve its usability using feedback from forensic analysts. Another interesting direction for future work is to include probabilities for our pieces of evidence and reasoning rules, in order to provide probabilistic measures for the analysis and attribution results.

Acknowledgments
Erisa Karafili was supported by the European Union's H2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 746667.
References
1. Mohammed H. Almeshekah and Eugene H. Spafford. Planning and integratingdeception into computer security defenses. In
NSPW , pages 127–138. ACM, 2014.2. Mohammed H. Almeshekah and Eugene H. Spafford. Cyber security deception. InSushil Jajodia, V.S. Subrahmanian, Vipin Swarup, and Cliff Wang, editors,
CyberDeception: Building the Scientific Foundation , pages 23–50. Springer, 2016.3. Alex Altman and Zeke J Miller. Sony Hack: FBI Accuses North Korea in AttackThat Nixed The Interview. http://time.com/3642161/sony-hack-north-korea-the-interview-fbi/ , 2014. Last accessed: 2019-12-19.4. Kostas G. Anagnostakis, Stelios Sidiroglou, Periklis Akritidis, Konstantinos Xini-dis, Evangelos P. Markatos, and Angelos D. Keromytis. Detecting targeted attacksusing shadow honeypots. In
USENIX , pages 129–144, 2005.5. Benjamin Aziz. Modelling and refinement of forensic data acquisition specifica-tions.
Digital Investigation , 11(2):90–101, 2014.6. Nicole Beebe. Digital forensic research: The good, the bad and the unaddressed. In
Advances in Digital Forensics V - Fifth IFIP WG 11.9 International Conferenceon Digital Forensics , pages 17–36, 2009.7. Keith Breene. Who are the cyberwar superpowers? , 2016. Last accessed:2019-12-19.8. Brilliant Maps. Who Americans Consider Their Allies, Friends and Enemies. http://brilliantmaps.com/us-allies-enemies/ , 2017. Last accessed: 2019-12-19.9. Brian Carrier. Defining digital forensic examination and analysis tools using ab-straction layers.
Intern. Journal of Digital Evidence , 1(4):1–12, 2003.10. Department of Justice. North korean regime-backed programmer chargedwith conspiracy to conduct multiple cyber attacks and intrusions. , 2018. Lastaccessed: 2019-12-19.11. Antonio DeSimone and Nicholas Horton. SONY’s nightmare before christmas. , 2017. Last accessed: 2019-12-19.12. Phan Minh Dung. On the acceptability of arguments and its fundamental role innonmonotonic reasoning, logic programming and n-person games.
Artif. Intell. ,77(2):321–358, 1995.13. Diane Felmlee and Susan Sprecher. Close relationships and social psychology:Intersections and future paths.
Social Psychology Quarterly , 63:365–376, 2000.14. FireEye. Advanced Persistent Threat Groups. , 2019.15. M. Fontani, T. Bianchi, A. De Rosa, A. Piva, and M. Barni. A framework fordecision fusion in image forensics based on dempster-shafer theory of evidence.
IEEE Transactions on Information Forensics and Security , 8(4):593–607, 2013.
16. Josh Fruhlinger. Top cybersecurity facts, figures and statistics for 2018, 2018. Last accessed: 2019-12-19. 17. David Goldman. Major banks hit with biggest cyberattacks in history. http://money.cnn.com/2012/09/27/technology/bank-cyberattacks/index.html, 2012. Last accessed: 2019-12-19. 18. Rajesh Kumar Goutam. The problem of attribution in cyber security.
Intern. J.of Computer Applications, Foundation of computer science , 131(7):34–36, 2015.19. International Telecommunication Union. Global Cybersecurity Index (GCI)2017. , 2017. Last accessed: 2019-12-19.20. Antonis Kakas and Pavlos Moraitis. Argumentation based decision making forautonomous agents. In
AAMAS ’03 , pages 883–890, 2003.21. Antonis C. Kakas, Robert A. Kowalski, and Francesca Toni. Abductive logic pro-gramming.
J. Log. Comput. , 2(6):719–770, 1992.22. Antonis C. Kakas, Paolo Mancarella, and Phan Minh Dung. The acceptabilitysemantics for logic programs. In
ICLP , pages 504–519, 1994.23. Erisa Karafili, Matteo Cristani, and Luca Vigan`o. A formal approach to analyz-ing cyber-forensics evidence. In
ESORICS (1) , volume 11098 of
Lecture Notes inComputer Science , pages 281–301. Springer, 2018.24. Erisa Karafili, Antonis C. Kakas, Nikolaos I. Spanoudakis, and Emil C. Lupu.Argumentation-based Security for Social Good. In
AAAI Fall Symposium Series ,pages 164–170, 2017.25. Erisa Karafili, Linna Wang, Antonis C. Kakas, and Emil Lupu. Helping forensicanalysts to attribute cyber-attacks: An argumentation-based reasoner. In
PRIMA ,volume 11224, pages 510–518. Springer, 2018.26. Karen Kent, Suzanne Chevalier, Timothy Grance, and Hung Dang. SP 800-86.Guide to Integrating Forensic Techniques into Incident Response. Technical report,NIST, 2006.27. Mandiant. Exposing One of China’s Cyber Espionage Units. Technical report,Mandiant, 2013.28. Steve Mansfield-Devine. Us banks attacked - but by whom?
Network Security ,2013(1):2, 2013.29. Sean Martin. 8 Active APT Groups To Watch. , 2016. Last ac-cessed: 2019-12-19.30. McAfee. What is Stuxnet. , 2019. Last accessed:2019-12-19.31. L. F. D. C. Nassif and E. R. Hruschka. Document clustering for forensic analysis:an approach for improving computer inspection.
IEEE Trans. Inf. Forensic Secur. ,8(1):46–54, 2013.32. National Audit Office. Investigation: WannaCry cyber attack and theNHS. , 2017. Last accessed: 2019-12-19.33. Lily Hay Newman. The Biggest Cybersecurity Disasters of 2017 So Far. , 2017. Last accessed: 2019-12-19.34. Eric Nunes, Paulo Shakarian, and Gerardo I. Simari. Toward argumentation-basedcyber attribution. In
AAAI Workshops , pages 177–184, 2016.
35. Wassila Ouerdane, Nicolas Maudet, and Alexis Tsoukias. Argumentation theory and decision aiding. In
Trends in Multiple Criteria Decision Analysis , pages 177–208. Springer, 2010.36. Nicole Perlroth and Quentin Hardy. Bank Hacking Was the Work of Irani-ans, Officials Say. , 2013. Last ac-cessed: 2019-12-19.37. Phillip Porras, Hassen Saidi, and Vinod Yegneswaran. An analysis of Conficker’slogic and rendezvous points. , 2009. Last accessed: 2019-12-19.38. Jo˜ao Rasga, Cristina Sernadas, Erisa Karafili, and Luca Vigan`o. Time-stampedclaim logic. https://arxiv.org/abs/1907.06541 , 2019.39. Thomas Rid and Ben Buchanan. Attributing Cyber Attacks.
Journal of StrategicStudies , 38(1-2):4–37, 2015.40. Jeffrey Roman. FBI Defends Sony Hack Attribution. , 2015. Last accessed: 2019-12-19.41. Jason Sattler. What we’ve learned from 10 years of the Conficker mys-tery. https://blog.f-secure.com/what-weve-learned-from-10-years-of-the-conficker-mystery/ , 2019. Last accessed: 2019-12-19.42. Bradley L. Schatz. Wirespeed: Extending the AFF4 forensic container format forscalable acquisition and live analysis.
Digital Investigation , 14:45 – 54, 2015.43. Paulo Shakarian, Gerardo I. Simari, Geoffrey Moores, Damon Paulo, Simon Par-sons, Marcelo A. Falappa, and Ashkan Aleali. Belief revision in structured proba-bilistic argumentation - model and application to cyber security.
Ann. Math. Artif.Intell. , 78(3-4):259–301, 2016.44. Fulvio Valenza, Cataldo Basile, Daniele Canavese, and Antonio Lioy. Classificationand analysis of communication protection policy anomalies.
IEEE/ACM Transac-tions on Networking , 25(5):2601–2614, 2017.45. Timothy Vidas, Brian Kaplan, and Matthew Geiger. OpenLV: Empowering inves-tigators and first responders in the digital forensics process.
Digital Investigation ,11:45–53, 2014.46. David A Wheeler and Gregory N Larsen. Techniques for cyber attack attribution.Technical report, Institute for Defense Analyses Alexandria VA, 2003.47. YouGov. America’s Friends and Enemies. https://today.yougov.com/topics/politics/articles-reports/2017/02/02/americas-friends-and-enemies ,2017. Last accessed: 2019-12-19.48. Kim Zetter.
Countdown to Zero Day: Stuxnet and the Launch of the World's First Digital Weapon. Crown Publishing Group, 2014.