An approach to predictively securing critical cloud infrastructures through probabilistic modeling
Satvik Jain
Dept. of Computer Engineering, Netaji Subhas Institute of Technology, University of Delhi, New Delhi
[email protected]

Arun Balaji Buduru
Dept. of Computer Science, IIIT Delhi, New Delhi
[email protected]

Anshuman Chhabra
Dept. of Computer Science, University of California, Davis
[email protected]
Abstract — Cloud infrastructures are being increasingly utilized in critical infrastructures such as banking/finance, transportation and utility management. The sophistication and resources used in recent security breaches, including those on critical infrastructures, show that attackers are no longer limited by monetary/computational constraints; in fact, they may be aided by entities with large financial and human resources. Hence, there is an urgent need to develop predictive approaches for cyber defense to strengthen the cloud infrastructures used by critical infrastructures. Extensive research has been done in the past on applying techniques such as Game Theory, Machine Learning and Bayesian Networks, among others, for the predictive defense of critical infrastructures. However, a major drawback of these approaches is that they do not incorporate probabilistic human behavior, which limits their predictive ability. In this paper, a stochastic approach is proposed to predict less secure states in critical cloud systems which might lead to potential security breaches. These less secure states are deemed 'risky' states in our approach. A Markov Decision Process (MDP) is used to accurately incorporate user behavior(s) as well as the operational behavior of the cloud infrastructure through a set of features. The developed reward/cost mechanism is then used to select appropriate 'actions' to identify risky states at future time steps by learning an optimal policy. Experimental results show that the proposed framework performs well in identifying future 'risky' states. Through this work we demonstrate the effectiveness of using probabilistic modeling (MDP) to predictively secure critical cloud infrastructures.
Keywords:
Cloud security, critical infrastructures, predictive security, stochastic environments, probabilistic modeling

I. INTRODUCTION

Effective abstraction of services such as computing, networking and storage through efficient virtualization and distributed system technologies has ensured the massive success of cloud applications. In cloud applications, since data is stored on servers and the services are provided through software tools (such as web browsers), problems of scalability and flexibility in fast-paced and dynamic real-world environments are ameliorated. With minimal investment, organizations and users can leverage services such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS), among others.

However, despite the obvious gains of utilizing cloud infrastructures in today's service ecosystem, there are many potential security issues that plague these systems. Securing cloud infrastructures is a tougher challenge than securing standalone systems due to the inherent 'shared' nature of the cloud. Cloud systems not only have to deal with traditional network-based attacks such as Denial of Service, Man-in-the-Middle and Phishing attacks, but must also counteract specifically personalized attacks. These include exploits targeting the use of shared technology [1], attacks made via insecure interfaces and APIs [2] and cloud malware injection attacks [3], among others.

Since cloud systems are being increasingly adopted in critical infrastructures such as military, finance, utilities and transportation, the need of the hour is to build robust security frameworks. In this paper, we propose a predictive security framework for critical cloud infrastructures at the sub-system level. A Markov Decision Process (MDP) is used to model a sub-system and has representational features that can capture user behavior(s) and the operational behavior of the sub-system. The contributions of the proposed approach can be summarized in the following three areas: (1) a generalizable security framework with the capability to continuously learn and predict future 'risky' states, which in turn can be used to efficiently deploy and enforce security configurations; (2) novel techniques to estimate the reward and utility of the states; and (3) incorporation and utilization of probabilistic user behavior(s), which enables the system to be more robust and usable.

In Section 2 we discuss the existing approaches to protecting critical cloud infrastructures and their limitations. In Section 3 we outline the steps in our approach to predicting 'risky' states in a subsystem of a cloud infrastructure. In Section 4 we detail the approach proposed in this work by discussing the implementation of each step of the general approach mentioned in the previous section. In Section 5, we describe the different experiments performed and the results obtained by evaluating the accuracy of the proposed approach. In Section 6, we conclude by discussing the usefulness of our work along with directions for future work.

II. RELATED WORK
Given the sophistication and resources used in some of the recent successful security breaches on critical cloud infrastructures, relying on reactive security techniques [4], [5] is no longer sufficient. Providing sufficient deterrence-based security cover requires a large amount of computational resources, since these techniques must be actively deployed at all times. Further, they lack the ability to detect new-generation threats which are targeted, persistent, stealthy and unknown. Hence, there is a pressing need to rely on intelligent cyber security approaches with predictive ability to better protect critical cloud infrastructures.

Game theory is one of the most commonly used approaches with predictive ability for protecting cyber infrastructures [6], [7], [8]. Game-theoretic techniques are based on assumptions such as rationality of players, existence of a Nash equilibrium and synchronization between the actions of attackers and defenders. However, these constraints may not always hold in real scenarios, which is one major pitfall of game-theoretic techniques. Further, these techniques do not scale to realistic infrastructure sizes and complexity.

Recent work has also shown the effectiveness of applying machine learning and data mining techniques to cyber security [9], [10], [11], [12]. However, in these techniques, models trained on a particular dataset become specific to mimicking the observations from history and therefore take time to adapt to unforeseen patterns. Chung et al. [13] in turn proposed a Q-learning model which reacts automatically to the adversarial behavior of a suspicious user. However, the model does not provide any measure for quantifying the rewards of a successful attack or attack detection, and relies on expert knowledge to enumerate the damage of taking a particular action.

Probabilistic techniques used to protect cyber infrastructures include Bayesian networks [14] and Markov chains [15], [16], [17]. Bayesian methods are able to deal with complex traffic distributions by using probabilities obtained from historical data to calculate the probability of specific events. However, it is very difficult to obtain prior distributions of a normal and an attack state. In the case of Markov chain models, the computations involved are relatively simple, but constructing the state profile of complex systems is daunting, as all transition probabilities between possible states need to be calculated. Further, the predictive ability of these approaches is limited as they do not incorporate probabilistic human behavior in their attacker model. Yau et al. [18] introduced the possibility of using a combination of Bayesian networks and MDPs for predictive security. However, the authors presented a very brief overview of their vision without illustrating how the MDP should be modeled, and did not show any empirical evaluation to validate the proposed idea.

In an attempt to address the limitations of the predictive cyber security techniques mentioned above, the most crucial being their inability to capture probabilistic human behavior and in turn the system's operational behavior, we propose an MDP-based predictive approach for securing cloud infrastructures. To the best of our knowledge, an MDP-based approach has never been used in past work to model cloud infrastructure subsystems and further learn a policy to predict 'risky' states at future time instants. Details of the proposed framework are presented in the following sections.

III. PROPOSED APPROACH
In this paper, we aim to predict less secure states in a critical cloud sub-system which carry a potential security breach risk. These less secure states of the cloud sub-system are termed 'risky' states in our approach. By identifying these future 'risky' states prior to their occurrence, the proposed model enables the cloud/security administrator to deploy the necessary security provisions in time. We provide a predictive security approach by using a Markov Decision Process (MDP) to model the sub-system and capture the user and operational behavior(s). The MDP is solved to obtain an optimal policy which can lead the administrator to future 'risky' states, given that the subsystem is currently in a 'safe/non-risky' state. In contrast to previous techniques, which do not incorporate user behavior/attacker modeling, our framework includes features representing user behavior. Since probabilistic human behavior affects the system's operational behavior, it is imperative that predictive approaches have some mechanism to facilitate capturing and analyzing probabilistic human behavior accurately and efficiently. The policy learnt using our framework is based on information such as normal usage patterns, malicious behavior and subsystem performance metrics. Further, since the list of features is expandable, the proposed framework offers flexibility by allowing the incorporation of more information in the future. Rather than proposing a model that completely eliminates human intervention, our solution intends to assist administrators in better deploying and enforcing security. This allows for more robust security of the infrastructure.

In this section, an outline of the approach with the steps involved in predicting risky states is provided. The implementation and specifics of each step in our model are given in Section 4. These steps are sequentially represented in Figure 1. We now describe each step as shown in Figure 1:
A. Gathering data from cloud infrastructure
Critical cloud infrastructures can be thought of as a composition of multiple subsystems or functional units, each serving a specific purpose. Each of these subsystems may be distributed over one or more virtual machines. To represent each sub-system as an MDP, data is collected to construct the features for the state space. Data for a specific time interval is collected for a cloud sub-system during which it has both malicious and non-malicious incoming traffic.
B. Constructing State Features
Fig. 1. Proposed approach as a flowchart

Each MDP state is defined by a set of features/attributes. The data collected for a subsystem in the previous step is used to populate these features. The feature set can be divided into a finite number of categories, with the features in each category representative of a particular class of information. To model the cloud subsystem as an MDP we define the following three categories of features:

1) Normal User Behavior: This category contains features which characterize the normal (non-malicious) traffic incoming on the cloud subsystem.
2) Subsystem Performance: This category contains features which characterize the performance of the Virtual Machine(s) hosting the subsystem.
3) Malicious Activity: Contains features which describe different types of attacks on the subsystem, such as DoS, CSRF, XSS and SQL injections.

The above three categories are not exhaustive; however, each of them has a variety of features providing sufficient information to define a state of the MDP.
C. State space construction
To construct a finite state space, continuous data values under each feature need to be discretized. Different discretization techniques can be used, such as standard binning, Fayyad and Irani's MDL method [19], Class-Attribute Interdependence Maximization (CAIM) [20] and the Class-Attribute Contingency Coefficient (CACC) [21]. Selection of the method depends on the problem and dataset at hand. Once each feature has been discretized into values corresponding to each time step, the size of the generated state space for N features is evaluated as follows:

|S| = n_1 * n_2 * n_3 * ... * n_N

In the above expression, n_1, n_2, n_3, ..., n_N are the numbers of unique values taken by each feature.

D. State Space Abstraction
Even after the feature set comprising continuous data has been discretized in the previous step, the state space obtained is generally very large. State space abstraction is used to reduce the size of the state space of the MDP. The need for state space abstraction stems from the fact that most dynamic-programming-based algorithms used to solve the MDP have high overheads for large state spaces [22], [23]. Commonly used techniques for state space abstraction are clustering-based abstraction [24], model-irrelevance abstraction [25] and a*-irrelevance abstraction [25].

E. MDP tuple construction
A Markov Decision Process (MDP) is defined by the tuple (S, A, P, R), where S represents the states, A represents the actions, P : S × A × S → [0, 1] represents a transition probability function and R : S × A → ℝ represents a reward function for each state-action pair. Actions are chosen to maximize an expected cumulative discounted reward:

V(s_n) = E[ Σ_{n=0}^{∞} γ^n R(s_n, a_n) ]    (1)

Here, a_n ∈ A is the action selected for the current state s_n ∈ S and γ ∈ (0, 1) is the discount factor.

F. Generating optimal policy
Dynamic programming (DP) based algorithms such as value iteration [26], policy iteration [27], modified policy iteration [28], relative value iteration [29] and Gauss-Seidel value iteration [30] are used to solve the MDP and generate the optimal policy. The optimal policy aims to maximize the set of rewards or 'value' that can be obtained at each state. A minimal sketch of value iteration is given below.
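To make the mechanics of these DP solvers concrete, the following is a minimal value-iteration sketch in Python; the array shapes mirror the (S, A, P, R) tuple defined above, and all names are illustrative rather than drawn from our implementation.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Minimal value iteration. P: (A, S, S) transition probabilities,
    R: (S, A) rewards, gamma: discount factor from equation (1)."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum over s' of P(s'|s, a) * V(s')
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # The optimal policy picks the value-maximizing action in each state.
    return Q.argmax(axis=1), V
```

Policy iteration and its variants differ mainly in alternating explicit policy-evaluation and policy-improvement steps rather than sweeping over values until convergence.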
G. Evaluating optimal policy
The evaluation of the obtained optimal policy involves first mapping the optimal policy from the abstracted state space to the original large state space, and then determining its accuracy. The accuracy of the obtained policy can be calculated as the percentage of instances where the policy correctly identified 'risky' states. More details on how the optimal policy is evaluated are presented in Section 4-G.
H. Identify future risky states
If the accuracy achieved by the model at the previous step is satisfactory, it can be deployed to predict risky states at future time steps. The technique for identifying risky states of the critical cloud infrastructure subsystem at future time steps using the evaluated policy is illustrated in detail in Section 4-H.

IV. IMPLEMENTATION AND SPECIFICS
We now describe the specifics of each step of the proposed solution outlined in the previous section. Each step of the previous section is covered here, with details concerning its implementation, algorithm evaluation and design decisions.

In this paper, to simulate the critical cloud infrastructure, a banking application was deployed on Amazon Web Services using the Elastic Beanstalk platform. The banking system consisted of three subsystems: an Admin portal, a Customer portal and a Staff portal, which were differentiated on the basis of the user group accessing them. The cloud resources allocated to each subsystem included an EC2 instance (web server) and an RDS instance (database server).

Since the aim of this work is to be able to predict potentially vulnerable states at the subsystem level of a critical cloud infrastructure, the admin portal was chosen as the subsystem for experimentation. This choice of subsystem was not biased, and the approach described in the following steps can be replicated for any of the three subsystems.
A. Simulating traffic to obtain data
The first step involves collecting data for the state features of the MDP. Traffic simulations were performed on the cloud subsystem in which both normal user traffic and malicious traffic were injected simultaneously. We now describe the definition, configuration and tools used to simulate the two categories of traffic:

• Normal user traffic: The term 'normal user traffic' indicates the absence of any malicious activity in the form of attacks, and comprises only the standard GET/POST HTTP requests needed to perform various operations in the admin subsystem. Apache JMeter was used to simulate the normal traffic on the EC2 instance hosting the subsystem.

• Malicious traffic: Three different types of flood attacks, SYN floods, UDP floods and ICMP floods, were simulated on the EC2 instance hosting the subsystem. The Hping3 tool was used to generate these flood attacks. At the target server (that is, the EC2 instance hosting the subsystem), Snort was deployed to detect the presence of a flood attack and generate corresponding alerts. The attack simulation is sketched below.

Data for 300 seconds was collected from the server. This data was then used to generate features using the procedure described in the next step.
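As an illustration, the three flood attacks could be scripted as follows. This is a hedged sketch: hping3 requires root privileges, the target address is a placeholder, and the exact flags used in our experiments may have differed.

```python
import subprocess

TARGET = "ec2-xx-xxx-xx-xx.compute-1.amazonaws.com"  # placeholder address

# Standard hping3 invocations for the three flood types.
FLOODS = {
    "syn":  ["hping3", "-S", "--flood", "-p", "80", TARGET],
    "udp":  ["hping3", "--udp", "--flood", "-p", "80", TARGET],
    "icmp": ["hping3", "--icmp", "--flood", TARGET],
}

def launch_flood(kind: str, duration_s: int = 30) -> None:
    """Run one flood attack for duration_s seconds.
    hping3 floods until killed, so we bound it with a timeout."""
    try:
        subprocess.run(FLOODS[kind], timeout=duration_s)
    except subprocess.TimeoutExpired:
        pass  # expected: the flood ran for the full duration
```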
B. Generate features using data
We now discuss the description and evaluation of the features in each of the three feature categories explained previously. Every feature value was evaluated for successive time steps of 1 second over the 300-second time window:

1) Normal user behavior:
   a) Number of HTTP requests: total number of HTTP GET/POST requests on the EC2 server hosting the subsystem.
   b) Number of unique users: number of users logged in to the subsystem and performing various operations.
   c) Requests-user distribution: distribution of HTTP requests among the unique users logged into the subsystem. It is the ratio of the number of HTTP requests to the number of unique users.
   d) Average bytes sent: the average amount of data in bytes sent to the server through HTTP requests.

2) Subsystem performance:
   a) Average latency: measures the average server latency during each 1-second time step.
   b) Average response time: measures the average server response time during each 1-second time step.

3) Malicious activity:
   a) DoS attack: this feature describes whether or not any Denial-of-Service (DoS) attack attempts have been made on the host server. If any, it also differentiates between the three categories of flood attacks simulated previously (SYN, UDP or ICMP based) and indicates the presence/absence of each during a time step. The presence of an attack is determined using the corresponding alert generated by Snort. The reason for selecting DoS attacks specifically among other attack types in these experiments is the large-scale harm that DoS attacks are capable of causing. This feature is discrete-valued and takes 8 different values corresponding to all 8 occurrence combinations of the three flood attacks.

It is important to note that apart from the last feature (DoS attacks), all the other features have continuous values. In order to define a state space having a finite number of states, each of these 6 features would need to be discretized. This is described in the next step.
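A sketch of how the seven features could be computed over 1-second windows is shown below; the access-log and alert schemas (column names such as user_id and bytes_sent) are hypothetical, since the paper does not specify them.

```python
import pandas as pd

def per_second_features(requests: pd.DataFrame, alerts: pd.DataFrame) -> pd.DataFrame:
    """requests: one row per HTTP request, indexed by timestamp, with
    hypothetical columns user_id, bytes_sent, latency_ms, response_ms.
    alerts: per-second 0/1 flags syn, udp, icmp parsed from Snort logs."""
    g = requests.resample("1s")
    feats = pd.DataFrame({
        "n_requests":    g["user_id"].count(),       # feature 1
        "n_users":       g["user_id"].nunique(),     # feature 2
        "avg_bytes":     g["bytes_sent"].mean(),     # feature 4
        "avg_latency":   g["latency_ms"].mean(),     # feature 5
        "avg_resp_time": g["response_ms"].mean(),    # feature 6
    })
    # Feature 3: ratio of requests to unique users (guard against 0 users).
    feats["req_per_user"] = feats["n_requests"] / feats["n_users"].clip(lower=1)
    # Feature 7: encode the 8 SYN/UDP/ICMP presence combinations as 0-7.
    feats["dos_attack"] = alerts["syn"] * 4 + alerts["udp"] * 2 + alerts["icmp"]
    return feats
```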
C. State space construction
The standard binning technique was used to convert the continuous range of values of the first six features into discrete integral values. This technique was chosen over the other techniques because of its simplicity of evaluation, thus adding the least computational overhead at this intermediate step. Further, this technique neither makes any assumptions on the structure of the data obtained post-discretization nor introduces any bias like the other context-based methods. The number of discrete values obtained for the 7 features were:

• Number of HTTP requests: 5 (Range: 0-50, Bin size: 10)
• Number of unique users: 5 (Range: 0-50, Bin size: 10)
• Requests-user distribution: 4 (Range: 1-4, Bin size: 0.75)
• Average bytes sent: 4 (Range: 800-1300, Bin size: 125)
• Average latency: 4 (Range: 100-3500, Bin size: 850)
• Average response time: 4 (Range: 0-8000, Bin size: 2000)
• DoS attacks: 8

Based on the number of discrete values obtained for each feature, the size of the state space is 5 * 5 * 4 * 4 * 4 * 4 * 8 = 51200.
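Under the ranges and bin sizes above, the discretization can be sketched as follows; the interior bin edges are derived directly from the listed ranges, and np.digitize then yields the stated number of discrete values per feature.

```python
import numpy as np

# Interior bin edges implied by the ranges/bin sizes listed above.
BIN_EDGES = {
    "n_requests":    np.arange(10, 50, 10),        # 0-50, size 10   -> 5 values
    "n_users":       np.arange(10, 50, 10),        # 0-50, size 10   -> 5 values
    "req_per_user":  np.arange(1.75, 4.0, 0.75),   # 1-4, size 0.75  -> 4 values
    "avg_bytes":     np.arange(925, 1300, 125),    # 800-1300        -> 4 values
    "avg_latency":   np.arange(950, 3500, 850),    # 100-3500        -> 4 values
    "avg_resp_time": np.arange(2000, 8000, 2000),  # 0-8000          -> 4 values
}

def discretize(feats):
    """Map each continuous feature column to its bin index; the DoS
    feature is already discrete (8 values) and is passed through."""
    out = {name: np.digitize(feats[name], edges)
           for name, edges in BIN_EDGES.items()}
    out["dos_attack"] = feats["dos_attack"]
    return out

# Size of the resulting state space:
assert 5 * 5 * 4 * 4 * 4 * 4 * 8 == 51200
```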
D. State space abstraction

As evaluated in the last step, we have constructed a state space of 51200 states for the Markov Decision Process. However, as mentioned before, such a large state space is not practically solvable using most dynamic programming approaches, which is why it becomes necessary to reduce the state space.

In this paper, we opt for the clustering-based abstraction method. Clustering-based techniques perform unsupervised abstraction based on just the distribution of data points, while most of the other state abstraction techniques, such as model-irrelevance abstraction and a*-irrelevance abstraction, make assumptions on the importance of states in order to aggregate them. As we cannot label some states as either important or irrelevant in our model, it is appropriate to use a fair algorithm that aggregates states without any bias. We use a number of clustering approaches for generating the abstracted state space in this step and find the one which gives the optimal performance.

We use the following algorithms to perform the abstraction: K-Means using the Euclidean distance metric (KME), K-Means using the Mahalanobis distance metric (KMM) and Gaussian Mixture Models (GMM). To find the clustering algorithm best suited to our use case among KME, KMM and GMM, we employ each of the algorithms and run through our proposed solution to find the highest accuracy achieved in the evaluation step (Section 4-G). The algorithm which gives the highest performance is chosen as the algorithm for carrying out the state space abstraction. The results and details of the experiments regarding the choice of clustering algorithm are described in Section 5.
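A sketch of the three abstraction variants with scikit-learn is given below. Note that a Mahalanobis K-Means is not built into scikit-learn, so the sketch approximates it by whitening the data first: Euclidean distance in the whitened space equals Mahalanobis distance in the original space.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def abstract_states(X: np.ndarray, n_abstract: int = 1000, method: str = "kme"):
    """X: one row per observed original state (the 7 feature values).
    Returns the abstract-state (cluster) index of each row."""
    if method == "kme":   # K-Means, Euclidean metric
        return KMeans(n_clusters=n_abstract, random_state=0).fit_predict(X)
    if method == "gmm":   # Gaussian Mixture Model
        return GaussianMixture(n_components=n_abstract,
                               random_state=0).fit_predict(X)
    if method == "kmm":   # K-Means, Mahalanobis metric (via whitening)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        L = np.linalg.cholesky(np.linalg.inv(cov))
        return KMeans(n_clusters=n_abstract, random_state=0).fit_predict(X @ L)
    raise ValueError(f"unknown method: {method}")
```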
E. MDP Tuple construction

Now we define the action set, reward matrix and transition matrices for the MDP representing a subsystem of the cloud banking infrastructure.

Action set: In our action set we define 2 actions:

• Remain in the same state: When this action is performed, there is a higher probability of remaining in the same state than of jumping to any different state. In the remaining text, this action is denoted by the integer '0'.

• Jump to a different state: When this action is performed, no probability restriction is imposed on either jumping to a different state or remaining in the same state. In the remaining text, this action is denoted by the integer '1'.

The distinction between the above two actions becomes clearer with the description of the reward function and the transition function given below.

Reward function: This function returns the reward value for performing an action a in state s. In our model, we define this reward value as follows:

R(s, a) = (w_1 * F_1) + (w_2 * F_2) + (w_3 * F_3) + (w_4 * F_4) + (w_5 * F_5) + (w_6 * F_6) + (w_7 * F_7) + (w_a * R_a)    (2)

In the above expression, w_1, ..., w_7 are the weights given to each of the features, proportional to their potential contribution in determining the 'risk' associated with a state. From the description of the features presented in Section 4-B, it can be seen that feature category 3 (Malicious activity) is the most prominent indicator of potential security risk in a state, followed by the features in category 2 (Subsystem performance) and finally category 1 (Normal user behavior). Based on this logic, the assigned weight values are w_1 = 1000, w_2 = 1000, w_3 = 1000, w_4 = 1000, w_5 = 2000, w_6 = 2000, w_7 = 3000.

F_1, ..., F_7 can take two values: either 0 or 1. If the value of a feature at the current state lies in the 'safe range', the corresponding value of F_i is 0; otherwise it is 1. The 'safe range' for each of the 7 features is taken as the first half of the range of values taken by that feature. This is based on the logic that as the integral value taken by a particular feature increases, the corresponding contribution of the feature to increasing the 'risk' of a state also increases. Accordingly, an unbiased approach to assigning the 'safe' values is to take the first half of the range of discretized values of a feature.

The last term in the reward expression, (w_a * R_a), takes into account the effect of action a performed in a state. Thus, w_a is the weight and R_a is the reward term associated with action a. To define the term R_a, we first define two terms: 'risk metric' and 'risk threshold'.

The risk metric RM_s gives the measure of 'risk' associated with a particular state and is given by:

RM_s = w_1 * F_1 + w_2 * F_2 + w_3 * F_3 + w_4 * F_4 + w_5 * F_5 + w_6 * F_6 + w_7 * F_7    (3)

The 'risk threshold' (R_th) is defined as:

R_th = α * (w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7)    (4)

The risk threshold is a fixed value such that if the risk metric is greater than the risk threshold, the state is termed 'risky', and if the risk metric is less than the risk threshold, the state is considered 'non-risky'. Moreover, α can be any fixed value between 0 and 1, depending on the criterion that the admin managing the cloud infrastructure sets to classify a state as 'risky' or 'non-risky'. A higher value of α implies a stricter criterion, and vice versa.
For our experiments, we set the value of α to 0.5, since dividing the risk metric into two ranges about the half-way mark introduces the least amount of bias towards either 'riskiness' or 'non-riskiness' of the state.

Now that we have the risk metric and risk threshold, R_a is defined using the following conditions:

• If RM_s > R_th:
  1) If action a is 0, then R_a is +1
  2) If action a is 1, then R_a is -1
• If RM_s < R_th:
  1) If action a is 0, then R_a is -1
  2) If action a is 1, then R_a is +1

Since we want our optimal policy to identify the risky states, a positive reward of +1 is given in case the current state is 'risky' and the action taken is 'remain in the same state', or if the current state is 'non-risky' and the action taken is 'jump to a different state'. Apart from these two scenarios, a negative reward of -1 is given corresponding to the action taken. In order to make the range of the reward term R_a comparable to RM_s, so that the outcome of the action taken has an observable effect on the value of R(s, a), a weight w_a is multiplied with R_a. The value of w_a is taken as the average of the other seven weights (w_1 to w_7) associated with RM_s. The average has been taken to ensure a fair contribution of the action in determining the reward value. This average value is approximately 1500.

The reward R(s, a) is defined for a single state in the original state space. For abstract states (containing one or more states from the original state space) we calculate the average reward R(s', a). If s' is an abstract state containing the states s_1, s_2, s_3, ..., s_N from the original state space, then R(s', a) is given by the following expression:

R(s', a) = (R(s_1, a) + R(s_2, a) + R(s_3, a) + ... + R(s_N, a)) / N    (5)

Finally, a reward matrix of dimensions 1000 x 2 is generated. The 1000 rows correspond to the 1000 abstract states and the columns correspond to the actions '0' and '1'. The value of R(s', a) derived above is used to generate the values that are inserted into the matrix.

Transition function: This function generates the probability of reaching state s' from state s by performing an action a.

In order to calculate the transition probabilities, we use the data for the 300 time steps generated from the traffic simulations on our cloud banking infrastructure. Each time step corresponds to a state in the original state space, which in turn belongs to one of the abstracted states in the abstract state space. Hence, we generate an abstract state transition vector for the 300 time steps from the original data.

From the abstract state transition data, the probability of transitioning from an abstract state s to an abstract state s' is given by:

P(s' | s) = N(s' | s) / N((s'' ≠ s') | s)    (6)

In the above equation, N(s' | s) represents the number of transitions from s to s' in the abstract state transition vector. Here, s is the abstract state at time step t and s' is the abstract state at time step t+1. N((s'' ≠ s') | s) represents the number of transitions from s to any abstract state s'' other than s' in the abstract state transition vector.

Thus, the transition function P(s' | s, a) is calculated from P(s' | s) as follows:

• If a is 1, then P(s' | s, a) = P(s' | s)
• If a is 0, then:
  - If s' = s, then P(s' | s, a) = t_s, where t_s ∈ (0.5, 1]
  - If s' ≠ s, then P(s' | s, a) = (1 - t_s) * P(s' | s)

P(s' | s, a) when s' = s is the case of self-transitions; hence, if the action taken is '0' ('remain in the same state'), there is a very high stochastic probability of making the jump from s to s itself. Since this probability of self-transition should be directly dependent on the risk associated with the state s (the risk metric), we linearly map the risk metric of a particular state to the self-transition stochastic probability associated with it. The linear transformation is done such that t_s takes values greater than 0.5 and less than or equal to 1. Thus, the linear transformation involves mapping from the RM_s range, i.e. from [min(RM_s), max(RM_s)], to the range of t_s, which is (0.5, 1]. Considering this linear transformation function to be f(x), we can write:

f(x) = ((x - min(x)) / (max(x) - min(x)) * 0.49) + 0.51    (7)

Using the above function, we get f(RM_s) = t_s. This function gives the self-transition probability for 'risky' states as greater than 0.75, enforcing the logic that for these states the action to 'remain' in the same state is accompanied by a very high stochastic probability.

The non-self-transition probability when a is '0' is distributed from (1 - t_s) on the basis of P(s' | s). When a is '1', the action taken is 'jump to a different state'; there is no imposed restriction, and therefore all probabilities are calculated directly using P(s' | s).

Finally, two transition probability matrices of dimensions 1000 x 1000, corresponding to the actions '0' and '1', are created. The rows and columns correspond to the 1000 abstract states, and the values in both matrices are filled from the above derived value of P(s' | s, a). A sketch of this construction is given below.
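The reward and transition matrices of equations (2)-(7) can be assembled as sketched below; P_emp stands for the empirical abstract-state transition probabilities of equation (6), and the row renormalization for action '0' is our own addition to keep each row a valid probability distribution.

```python
import numpy as np

W = np.array([1000, 1000, 1000, 1000, 2000, 2000, 3000])  # w_1 .. w_7
W_A = W.mean()              # action weight w_a, approximately 1500
R_TH = 0.5 * W.sum()        # risk threshold, eq. (4) with alpha = 0.5

def risk_metric(F):
    """F: (S, 7) binary matrix, F[s, i] = 1 if feature i of state s
    is outside its 'safe range'. Returns RM_s per state, eq. (3)."""
    return F @ W

def reward_matrix(F):
    """Reward matrix of shape (S, 2), eq. (2)."""
    rm = risk_metric(F)
    risky = rm > R_TH
    # R_a = +1 for 'remain' (0) at risky states or 'jump' (1) at safe ones.
    r_a0 = np.where(risky, 1.0, -1.0)
    return np.stack([rm + W_A * r_a0, rm - W_A * r_a0], axis=1)

def transition_matrices(P_emp, rm):
    """P_emp: (S, S) empirical transition probabilities, eq. (6)."""
    t_s = (rm - rm.min()) / (rm.max() - rm.min()) * 0.49 + 0.51  # eq. (7)
    P0 = (1 - t_s)[:, None] * P_emp          # action 0: scale non-self jumps
    np.fill_diagonal(P0, t_s)                # self-transition probability t_s
    P0 /= P0.sum(axis=1, keepdims=True)      # renormalize rows (our addition)
    return np.stack([P0, P_emp])             # shape (2, S, S) for actions 0, 1
```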
F. Optimal policy generation

Once the MDP is constructed, an optimal policy needs to be generated. In our case, the optimal policy is a set of actions that can achieve the task of identifying risky states, i.e. it should aim to 'jump' from secure states and 'remain' at risky states. Since we construct the MDP on the abstracted state space consisting of 1000 states, the optimal policy is a vector of length 1000 containing actions from the action set of the MDP.

We intend to find the DP algorithm which gives us the 'best' optimal policy for the MDP framework and also takes the minimum time to do so. Thus, in the results section (Section 5), we evaluate the DP algorithms not only on the basis of their accuracy but also on the time they take to solve the MDP. The algorithm at the intersection of these two performance metrics is chosen as the MDP-solving algorithm for our framework. The results for these are obtained through the experiments described in the next section.
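The DP solvers compared in Section 5 are all available in the pymdptoolbox package; a sketch of the comparison, assuming that library, follows. Note that relative value iteration targets the average-reward criterion and takes no discount factor.

```python
import mdptoolbox.mdp as mdp  # assuming the pymdptoolbox package

SOLVERS = {
    "value iteration":           mdp.ValueIteration,
    "policy iteration":          mdp.PolicyIteration,
    "modified policy iteration": mdp.PolicyIterationModified,
    "relative value iteration":  mdp.RelativeValueIteration,
    "Gauss-Seidel value iter.":  mdp.ValueIterationGS,
}

def solve_all(P, R, gamma=0.1):
    """P: (2, S, S) transition matrices, R: (S, 2) reward matrix.
    Returns each solver's policy and its reported solve time."""
    results = {}
    for name, Solver in SOLVERS.items():
        if name == "relative value iteration":
            solver = Solver(P, R)          # average-reward: no gamma
        else:
            solver = Solver(P, R, gamma)
        solver.run()
        results[name] = (solver.policy, solver.time)
    return results
```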
G. Evaluation of optimal policy
The optimal policy for each of the 1000 states consists of either a 'jump' to the next state or 'remain' in the current state. To map this logic back to the original state space, we award the same 'optimal' action obtained in the policy for an abstracted (cluster) state to all the states in the original state space that belong to that cluster.

With the reverse mapping available, we can evaluate the optimal policy on both the abstracted state space and the original large state space. First, we find the 'risky' states using the risk metric RM_s (equation (3)) and the risk threshold R_th (equation (4)). If RM_s is greater than R_th, the state is termed 'risky'.

With the states labeled as either 'risky' or 'non-risky', we evaluate the performance of the optimal policy by calculating its accuracy. This is done by calculating the ratio of states where the optimal policy gave a favourable outcome to the total number of states. Favourable outcomes are 'remaining' at 'risky' states and 'jumping' from 'non-risky' states to identify the risky states. The numerical results of the accuracy obtained from the experiments performed are presented in Section 5.
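A sketch of this evaluation: the abstract policy is reverse-mapped through the cluster assignment, and accuracy is the fraction of favourable outcomes (the variable names are illustrative).

```python
import numpy as np

def policy_accuracy(abstract_policy, cluster_of, risky):
    """abstract_policy: action (0/1) per abstract state.
    cluster_of: abstract-state index of each original state.
    risky: boolean per original state (RM_s > R_th)."""
    # Each original state inherits the action of its cluster.
    actions = np.asarray(abstract_policy)[cluster_of]
    # Favourable: 'remain' (0) at risky states, 'jump' (1) at non-risky ones.
    favourable = np.where(risky, actions == 0, actions == 1)
    return favourable.mean()
```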
H. Using the obtained optimal policy to predict risky states

Once the optimal policy is obtained with sufficient accuracy using the evaluation criterion mentioned in the previous step, it is used to identify future 'risky' states. It is assumed that at the current time instant (taken as t = 0), the subsystem is in a 'non-risky' or 'safe' state. At the current instant, the cloud infrastructure administrator would deploy the optimal policy starting from that safe state. As mentioned previously, the expected optimal policy is such that it generates the action 'jump to a different state' in a non-risky state and 'remain in the same state' in a risky state. Accordingly, once this optimal policy is deployed starting from the current safe state, a state transition tree is generated (according to the transition function defined in Section 4-E) extending into future time instants. This is shown in Figure 2.

Fig. 2. Finding risky states using optimal policy

As seen from Figure 2, the risky states at any time step t end up becoming the leaf nodes. This is because, according to the transition function defined in Section 4-E, the 'remain in the same state' action taken by the optimal policy at risky states ensures a probability of at least 0.75 of remaining in the same state, and hence any further state transitions from that state would hold negligible probability. However, in the case of non-risky states, the action taken is 'jump to a different state', and therefore that state becomes the root of a tree representing transitions from that state to other states. Therefore, from the tree generated in this manner, the risky states can be easily identified for any future time step, as they constitute all the leaf nodes.
Furthermore, the probability of reaching a risky state can also be easily calculated by taking the product of the transition probabilities depicted on the branches of the tree, starting from the root node down to the risky leaf node itself.

Hence, at any given time instant when the subsystem is secure, the proposed approach can help the infrastructure administrator identify the risky states at future time steps as well as the probability of reaching those risky states. This information can be used to alert the admin so that the necessary action can be taken in time to prevent potential security breaches. A sketch of this procedure is given below.
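A sketch of the transition-tree expansion: starting from the current safe state, branches are expanded under the learnt policy, and risky states are returned as leaves together with their path probabilities. The horizon and pruning threshold are our own illustrative additions.

```python
def risky_leaves(s0, policy, P, risky, horizon=5, p_min=1e-3):
    """Enumerate future risky states reachable from safe state s0.
    P: (2, S, S) transition matrices; policy: action per abstract state.
    Returns (state, time step, path probability) triples."""
    found, frontier = [], [(s0, 0, 1.0)]
    while frontier:
        s, t, p = frontier.pop()
        if risky[s]:
            found.append((s, t, p))   # risky states become leaf nodes
            continue
        if t == horizon:
            continue
        a = policy[s]                 # expected: 'jump' (1) at safe states
        for s_next, p_step in enumerate(P[a][s]):
            if p * p_step > p_min:    # prune negligible branches
                frontier.append((s_next, t + 1, p * p_step))
    return found
```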
V. RESULTS AND ANALYSIS

In this section, we present the results obtained by our framework using the experimental set-up described in the previous section. We undertake three experiments as part of our proposed solution:
A. Experiment 1: Choice of clustering algorithm for state abstraction
We choose policy iteration (with discount factor γ = 0.1) as our MDP solving algorithm to obtain the optimal policy and then compute the accuracies on the original state space while varying the clustering algorithm for state space abstraction. We run experiments using K-Means with the Euclidean distance metric (KME), K-Means with the Mahalanobis distance metric (KMM) and Gaussian Mixture Models (GMM). We also vary the number of clusters between 250, 500, 750 and 1000 and see where we obtain the highest accuracy.

Fig. 3. Finding optimal clustering algorithm

It can be seen in Figure 3 that KME with 1000 abstracted states obtains an accuracy of 96.516%, compared to KMM with 1000 clusters, which obtains an accuracy of 88.775%, and GMM with 1000 clusters, which achieves an accuracy of just 74.601%. Here, policy iteration (γ = 0.1) is used to obtain the optimal policy (although value iteration, Gauss-Seidel value iteration and modified policy iteration give the same optimal policies). We can also see in Figure 3 that for 1000 clusters we obtain the highest original state space accuracy compared to the other cluster sizes in KME. However, to reaffirm the choice of 1000 clusters, we plot the mean square error (MSE) obtained while increasing cluster sizes. This is known as an elbow curve and is shown in Figure 4. It can be seen that the error decreases steeply as the cluster size approaches 1000 and does not decrease much after that. Therefore, we choose KME with 1000 abstracted states as our state space abstraction algorithm.

Fig. 4. The Elbow curve to find optimal cluster size

B. Experiment 2: Finding the best-suited MDP solving algorithm
Through this experiment, we seek to find the optimal choice of MDP solving algorithm, giving the highest accuracy with minimal computation time. Here we choose KME with 1000 states, as found in the previous experiment, as our clustering algorithm, and compare modified policy iteration, policy iteration, value iteration, relative value iteration and Gauss-Seidel value iteration to obtain the optimal policies and see which algorithm gives the highest original state space and abstracted state space accuracy. For all DP algorithms the value of the discount factor (γ) is kept at 0.1.

We observe that modified policy iteration, policy iteration, value iteration and Gauss-Seidel value iteration give the same optimal policy, which achieves the highest original state space accuracy of 96.516% and the highest abstracted space accuracy of 98.5%. We then choose the best-suited algorithm out of these by finding the one that takes the minimum time to compute the optimal policy. As can be seen in Figure 5, modified policy iteration is the fastest of the four, with a completion time of 0.0335149 seconds. This is much smaller than even the next fastest algorithm, value iteration, which has a completion time of 0.81714 seconds. Therefore, modified policy iteration is chosen as the algorithm to compute the optimal policy for our framework.

Fig. 5. Completion times for MDP solving algorithms

C. Experiment 3: Finding the optimal value of the discount factor (γ)

In the previous two experiments, γ was chosen to be 0.1. However, we also need to experimentally find whether a better choice of γ exists that can yield higher accuracy on the original state space. In this experiment, we choose KME with 1000 states as the clustering algorithm and run modified policy iteration to solve the MDP, but with varying values of γ.

As can be seen in Figure 6, we vary γ between 0.1 and 0.9 and find that on increasing γ, the accuracy on the original state space starts to decrease. The earlier obtained accuracy of 96.516% for γ set to 0.1 is the highest accuracy. Therefore, the optimal value for the discount factor is chosen to be 0.1.

We have therefore been able to find the best parameters and choice of algorithms for our framework. These are:

• K-Means with the Euclidean metric for state space abstraction
• Modified policy iteration for solving the MDP
• Discount factor (γ) set to 0.1

These settings give state-of-the-art results, with an accuracy of 98.5% on the abstracted state space and 96.516% on the original large state space.

Since the obtained optimal policy is highly accurate (performance accuracy is 96.516%) with respect to the expected optimal policy of 'remaining' in risky states and 'jumping' to different states while in a non-risky state, it can be confidently used to identify future risky states along with their probabilities using the approach mentioned in Section 4-H.

VI. CONCLUSION AND FUTURE WORK
Fig. 6. Finding best value of γ

In this paper, an approach for predictively securing critical cloud infrastructures is developed and evaluated. The framework utilizes an MDP at the sub-system level to capture probabilistic user behavior and the operational behavior of the sub-system through a set of features which define the MDP state space. Further, a suitable reward function is defined through which the learnt optimal policy is able to identify future 'risky' states, which can lead to potential security breaches. The step-wise procedure of the proposed approach has been detailed in the preceding sections. Various experimental evaluations were performed in order to maximize the prediction accuracy of the generated policy. These include: (1) comparing different clustering algorithms for state-space abstraction; and (2) empirically determining that modified policy iteration provides the highest accuracy and has the least convergence time among the DP techniques considered. The resulting framework is designed based on the above evaluations and achieves an accuracy of 96.516% in identifying future 'risky' states. This reflects the effectiveness of using probabilistic modeling through MDPs to predictively secure critical cloud infrastructures.

Future work primarily aims at expanding the feature set, especially in the 'malicious activity' category, by including more sophisticated and varied attack models to ensure further robustness of the security framework. In addition, for experimental feasibility, traffic simulations on the cloud infrastructure were performed at a relatively small scale; the step-wise approach of our proposed framework will be undertaken for larger-scale cloud systems. Furthermore, the proposed MDP framework for predicting security breaches at the subsystem level of the cloud infrastructure can be extended by using it in combination with Bayesian networks to predict security breaches at the system-wide level. Each subsystem can send the predicted risky states to the system-wide Bayesian network, either directly or indirectly. A Bayesian network has much lower overhead and is better equipped to monitor all the events which occur across subsystems, and thus predict system-wide security breaches.

REFERENCES

[1] K. Kortchinsky, "Cloudburst," Black Hat USA (BHUSA09-Kortchinsky-Cloudburst-SLIDES.pdf), 2009.
[2] H. Shah, S. S. Anandane, and Shrikanth, "Security issues on cloud computing,"
CoRR, vol. abs/1308.5996, 2013. [Online]. Available: http://arxiv.org/abs/1308.5996
[3] M. Jensen, J. Schwenk, N. Gruschka, and L. L. Iacono, "On technical security issues in cloud computing," Sept 2009, pp. 109–116.
[4] A. Benameur, N. S. Evans, and M. C. Elder, "Cloud resiliency and security via diversified replica execution and monitoring," in Resilient Control Systems (ISRCS), 2013 6th International Symposium on. IEEE, 2013, pp. 150–155.
[5] K. W. Ullah, A. S. Ahmed, and J. Ylitalo, "Towards building an automated security compliance tool for the cloud," in Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on. IEEE, 2013, pp. 1587–1593.
[6] S. Shiva, S. Roy, and D. Dasgupta, "Game theory for cyber security," in Proceedings of the Sixth Annual Workshop on Cyber Security and Information Intelligence Research, ser. CSIIRW '10. New York, NY, USA: ACM, 2010, pp. 34:1–34:4. [Online]. Available: http://doi.acm.org/10.1145/1852666.1852704
[7] A. Pătraşcu and E. Simion, "Game theory in cyber security defence," in Proceedings of the International Conference on ELECTRONICS, COMPUTERS and ARTIFICIAL INTELLIGENCE - ECAI-2013, June 2013, pp. 1–6.
[8] W. Jiang, Z. h. Tian, H. l. Zhang, and X. f. Song, "A stochastic game theoretic approach to attack prediction and optimal active defense strategy decision," April 2008, pp. 648–653.
[9] O. Thonnard and M. Dacier, "Actionable knowledge discovery for threats intelligence support using a multi-dimensional data mining methodology," Dec 2008, pp. 154–163.
[10] H. Farhadi, M. Amir Haeri, and M. Khansari, "Alert correlation and prediction using data mining and HMM," vol. 3, pp. 77–102, 01 2011.
[11] C. Tang, Y. Xie, B. Qiang, X. Wang, and R. Zhang, "Security situation prediction based on dynamic BP neural with covariance," Procedia Engineering.
[12] IEICE Trans. Inf. Syst., vol. E91-D, no. 5, pp. 1234–1241, May 2008. [Online]. Available: http://dx.doi.org/10.1093/ietisy/e91-d.5.1234
[13] K. Chung, C. A. Kamhoua, K. A. Kwiat, Z. T. Kalbarczyk, and R. K. Iyer, "Game theory with learning for cyber security monitoring," Jan 2016, pp. 1–8.
[14] J. Wu, L. Yin, and Y. Guo, "Cyber attacks prediction model based on Bayesian network," Dec 2012, pp. 730–731.
[15] D. H. Kim, T. Lee, S. O. D. Jung, H. P. In, and H. J. Lee, "Cyber threat trend analysis model using HMM," in Third International Symposium on Information Assurance and Security, Aug 2007, pp. 177–182.
[16] D. Man, Y. Wang, W. Yang, and W. Wang, "A combined prediction method for network security situation," Dec 2010, pp. 1–4.
[17] D. S. Fava, S. R. Byers, and S. J. Yang, "Projecting cyberattacks through variable-length Markov models," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 359–369, Sept 2008.
[18] S. S. Yau, A. B. Buduru, and V. Nagaraja, "Protecting critical cloud infrastructures with predictive capability," June 2015, pp. 1119–1124.
[19] U. Fayyad and K. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning," 1993.
[20] L. A. Kurgan and K. J. Cios, "CAIM discretization algorithm," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 145–153, 2004.
[21] C.-J. Tsai, C.-I. Lee, and W.-P. Yang, "A discretization algorithm based on class-attribute contingency coefficient," Information Sciences, vol. 178, no. 3, pp. 714–731, 2008.
[22] M. L. Littman, T. L. Dean, and L. P. Kaelbling, "On the complexity of solving Markov decision problems," in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 1995, pp. 394–402.
[23] C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of Markov decision processes," Mathematics of Operations Research, vol. 12, no. 3, pp. 441–450, 1987.
[24] P. Berkhin, "A survey of clustering data mining techniques," in Grouping Multidimensional Data. Springer, 2006, pp. 25–71.
[25] L. Li, T. J. Walsh, and M. L. Littman, "Towards a unified theory of state abstraction for MDPs."
[26] R. Bellman, Dynamic Programming, ser. Dover Books on Computer Science. Dover Publications, 2013. [Online]. Available: https://books.google.co.in/books?id=CG7CAgAAQBAJ
[27] L. G. Telser, "Dynamic programming and Markov processes. Ronald A. Howard," Journal of Political Economy, vol. 69, no. 3, pp. 296–297, 1961. [Online]. Available: https://doi.org/10.1086/258477
[28] M. L. Puterman and M. C. Shin, "Modified policy iteration algorithms for discounted Markov decision problems," Management Science, vol. 24, no. 11, pp. 1127–1137, 1978.
[29] D. White, "Dynamic programming, Markov chains, and the method of successive approximations," Journal of Mathematical Analysis and Applications