Publication


Featured research published by Ankit Shah.


ACM Transactions on Intelligent Systems and Technology | 2016

Dynamic Scheduling of Cybersecurity Analysts for Minimizing Risk Using Reinforcement Learning

Rajesh Ganesan; Sushil Jajodia; Ankit Shah; Hasan Cam

An important component of the cyber-defense mechanism is the adequate staffing of its cybersecurity analyst workforce and their optimal assignment to sensors for investigating the dynamic alert traffic. The ever-increasing cybersecurity threats faced by today's digital systems require a strong cyber-defense mechanism that is both reactive in its response to mitigate known risks and proactive in being prepared to handle unknown risks. To be proactive in handling unknown risks, this workforce must be scheduled dynamically so that the system adapts to the day-to-day stochastic demands on its workforce (both size and expertise mix). The stochastic demands stem from the varying rates of alert generation and alert significance, which create uncertainty for the scheduler attempting to schedule analysts for work and allocate sensors to analysts. Sensor data are analyzed by automatic processing systems, and alerts are generated. A portion of these alerts is categorized as significant and requires thorough examination by a cybersecurity analyst. Risk, in this article, is defined as the percentage of significant alerts that are not thoroughly analyzed by analysts. To minimize risk, the cyber-defense system must accurately estimate the future significant-alert generation rate and dynamically schedule its workforce to meet the stochastic workload demand. The article presents a reinforcement learning-based stochastic dynamic programming optimization model that incorporates these estimates of future alert rates and responds by dynamically scheduling cybersecurity analysts to minimize risk (i.e., maximize significant-alert coverage by analysts) while keeping risk under a predetermined upper bound. The article tests the dynamic optimization model and compares the results to an integer programming model that optimizes static staffing needs based on a daily-average alert generation rate with no estimation of future alert rates (the static workforce model). Results indicate that over a finite planning horizon, the learning-based optimization model, through a dynamic (on-call) workforce in addition to the static workforce, (a) balances risk between days and reduces overall risk better than the static model, (b) is scalable and capable of identifying the quantity and the right mix of analyst expertise in an organization, and (c) determines the dynamic (on-call) schedule and the sensor-to-analyst allocation needed to keep risk below a given upper bound. Several meta-principles derived from the optimization model are presented, which serve as guiding principles for hiring and scheduling cybersecurity analysts. Days-off scheduling was performed to determine weekly analyst work schedules that meet the cybersecurity system's workforce constraints and requirements.
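
The scheduling model itself is not reproduced in the abstract, but the core loop it describes, learning how many on-call analysts to activate so that the risk metric defined above stays low, can be sketched. Everything below (alert volumes, analyst throughput, state buckets, learning rates) is an invented toy stand-in, not the authors' model:

```python
# Toy Q-learning sketch: choose how many on-call analysts to add each day.
# All rates and parameters are invented for illustration.
import random
from collections import defaultdict

CAPACITY_PER_ANALYST = 40   # significant alerts one analyst clears per day (assumed)
STATIC_ANALYSTS = 5
ACTIONS = [0, 1, 2, 3]      # on-call analysts activated for the day
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(float)

def risk(alerts, analysts):
    """Risk = fraction of significant alerts not thoroughly analyzed."""
    return max(0.0, (alerts - analysts * CAPACITY_PER_ANALYST) / alerts)

def bucket(alerts):
    return min(alerts // 50, 10)   # coarse state: bucketed alert volume

state = bucket(250)
for day in range(20_000):
    action = (random.choice(ACTIONS) if random.random() < EPS
              else max(ACTIONS, key=lambda a: Q[(state, a)]))
    alerts = max(1, int(random.gauss(250, 60)))   # stochastic daily demand
    reward = -risk(alerts, STATIC_ANALYSTS + action) - 0.01 * action  # action has a cost
    nxt = bucket(alerts)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = nxt

for s in range(11):   # learned policy: busier states call in more on-call analysts
    print(s, max(ACTIONS, key=lambda a: Q[(s, a)]))
```

The paper's model also chooses expertise mixes and sensor-to-analyst allocations; this sketch covers only the staffing-level decision.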


International Journal of Information Security | 2018

A methodology to measure and monitor level of operational effectiveness of a CSOC

Ankit Shah; Rajesh Ganesan; Sushil Jajodia; Hasan Cam

In a cybersecurity operations center (CSOC), under normal operating conditions, sufficient numbers of analysts are available each day to analyze the alert workload generated by intrusion detection systems (IDSs). For the purpose of this paper, this means that the cybersecurity analysts can fully investigate each and every alert generated by the IDSs in a reasonable amount of time. However, a number of disruptive factors can adversely impact normal operating conditions, such as (1) higher alert generation rates from a few IDSs, (2) new alert patterns that decrease the throughput of the alert analysis process, and (3) analyst absenteeism. The impact of all of these factors is that alerts wait a long time before being analyzed, which degrades the readiness of the CSOC. It is imperative that the readiness of the CSOC be quantified, which in this paper is defined as the level of operational effectiveness (LOE) of a CSOC. LOE can be quantified and monitored by knowing the exact deviation of the CSOC conditions from normal and how long it takes for the condition to return to normal. In this paper, we quantify LOE by defining a new metric called total time for alert investigation (TTA), which is the sum of the waiting time in the queue and the analyst investigation time of an alert after its arrival in the CSOC database. A dynamic TTA monitoring framework is developed in which a nominal average TTA per hour (avgTTA/hr) is established as the baseline for the normal operating condition, using the individual TTAs of alerts investigated in that hour. At the baseline value of avgTTA/hr, LOE is considered ideal. Also, an upper-bound (threshold) value for avgTTA/hr is established, below which the LOE is considered optimal. Several case studies illustrate the impact of the above disruptive factors on the dynamic behavior of avgTTA/hr, providing useful insights about the current LOE of the system. The effect of actions taken to return the CSOC to its normal operating condition is also studied by varying both the amount and the timing of action, which in turn affects the dynamic behavior of avgTTA/hr. Results indicate that by using the insights learned from measuring, monitoring, and controlling the dynamic behavior of avgTTA/hr, a manager can quantify and color-code the LOE of the CSOC. Furthermore, these insights allow for a deeper understanding of the acceptable downtime for the IDS, acceptable levels of absenteeism, and the recovery time and effort needed to return the CSOC to its ideal LOE.
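
As a concrete reading of the metric (the TTA formula below follows directly from the definition; the baseline and threshold values are invented placeholders), avgTTA/hr and a color-coded LOE could be computed like this:

```python
# Sketch of the TTA metric and LOE color-coding; the baseline and threshold
# numbers here are invented, and the data model is assumed.
from dataclasses import dataclass

@dataclass
class Alert:
    arrival: float        # hours since shift start, when the alert enters the database
    start_service: float  # when an analyst picks it up
    end_service: float    # when investigation finishes

    @property
    def tta(self) -> float:
        # TTA = waiting time in queue + analyst investigation time
        return self.end_service - self.arrival

def avg_tta_per_hour(alerts, hour):
    """avgTTA/hr over alerts whose investigation finished in [hour, hour + 1)."""
    done = [a.tta for a in alerts if hour <= a.end_service < hour + 1]
    return sum(done) / len(done) if done else 0.0

def loe_color(avg_tta, baseline=0.5, threshold=1.5):
    if avg_tta <= baseline:
        return "green"    # ideal LOE (at or below the nominal baseline)
    if avg_tta <= threshold:
        return "yellow"   # still below the upper bound: optimal but degrading
    return "red"          # above threshold: corrective action needed

alerts = [Alert(0.2, 0.5, 0.9), Alert(0.4, 1.1, 1.6), Alert(1.0, 2.5, 3.4)]
for h in range(4):
    a = avg_tta_per_hour(alerts, h)
    print(h, round(a, 2), loe_color(a))
```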


Proceedings of the 2017 Workshop on Moving Target Defense | 2017

Detecting Stealthy Botnets in a Resource-Constrained Environment using Reinforcement Learning

Sridhar Venkatesan; Massimiliano Albanese; Ankit Shah; Rajesh Ganesan; Sushil Jajodia

Modern botnets can persist in networked systems for extended periods of time by operating in a stealthy manner. Despite the progress made in botnet prevention, detection, and mitigation, stealthy botnets continue to pose a significant risk to enterprises. Furthermore, existing enterprise-scale solutions require significant resources to operate effectively, making them impractical. To address this problem in a resource-constrained environment, we propose a reinforcement learning-based approach to optimally and dynamically deploy a limited number of defensive mechanisms, namely honeypots and network-based detectors, within the target network. The ultimate goal of the proposed approach is to reduce the lifetime of stealthy botnets by maximizing the number of bots identified and taken down through a sequential decision-making process. We provide a proof of concept of the proposed approach and study its performance in a simulated environment. The results show that the proposed approach is promising in protecting against stealthy botnets.
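
A minimal sketch of the sequential decision-making idea, with a multi-armed-bandit learner standing in for the paper's reinforcement learning formulation; the network, detection probabilities, and detector budget are all hypothetical:

```python
# Illustrative only: learn where a limited budget of detectors catches the
# most bots. Detection probabilities are invented and unknown to the learner.
import random

NODES = list(range(20))
BUDGET = 3                                   # resource-constrained: 3 detectors
true_p = {n: random.uniform(0.01, 0.3) for n in NODES}

counts = {n: 0 for n in NODES}
values = {n: 0.0 for n in NODES}             # estimated detection reward per node

def choose(eps=0.1):
    """Pick BUDGET distinct nodes, epsilon-greedily by estimated value."""
    ranked = sorted(NODES, key=lambda n: values[n], reverse=True)
    picks = set(ranked[:BUDGET])
    if random.random() < eps:                # occasionally explore a new node
        picks.pop()
        picks.add(random.choice([n for n in NODES if n not in picks]))
    return picks

for episode in range(5_000):
    for n in choose():
        detected = 1.0 if random.random() < true_p[n] else 0.0
        counts[n] += 1
        values[n] += (detected - values[n]) / counts[n]   # running mean reward

best = sorted(NODES, key=lambda n: values[n], reverse=True)[:BUDGET]
print("learned placement:", best)
```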


Service-Oriented Computing and Applications | 2018

Adaptive reallocation of cybersecurity analysts to sensors for balancing risk between sensors

Ankit Shah; Rajesh Ganesan; Sushil Jajodia; Hasan Cam

A Cyber Security Operations Center (CSOC) is a service-oriented system. Analysts work in shifts, and the goal at the end of each shift is to ensure that all alerts from each sensor (client) are analyzed. The goal is often not met because the CSOC faces adverse conditions such as variations in alert generation rates or in the time taken to thoroughly analyze new alerts. Current practice at many CSOCs is to pre-assign analysts to sensors based on their expertise, and the alerts from the sensors are triaged, queued, and presented to analysts. Under adverse conditions, some sensors accumulate more unanalyzed alerts (backlogs) than others, which results in a major security gap for the clients if left unattended. Hence, there is a need to dynamically reallocate analysts to sensors; however, no mechanism exists to ensure the following objectives: (i) balancing the number of unanalyzed alerts among sensors while maximizing the number of alerts investigated, by optimally reallocating analysts to sensors in a shift, and (ii) ensuring desirable properties of the CSOC: minimizing the disruption to the analyst-to-sensor allocation made at the beginning of the shift when analysts report to work, balancing the workload among analysts, and maximizing analyst utilization. The paper presents a technical solution to achieve these objectives and answers two important research questions: (i) detecting triggers, which determines when to reallocate, and (ii) how to optimally reallocate analysts to sensors. Together, these enable a CSOC manager to effectively use reallocation as a decision-making tool.
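
The abstract does not give the reallocation algorithm; as an illustration of the when-to-reallocate trigger and a deliberately simple how-to-reallocate step (a greedy heuristic, not the paper's optimization model), consider:

```python
# Hypothetical trigger + greedy rebalancing sketch; thresholds and capacities
# are invented, and the paper solves the real problem as an optimization.
def trigger(backlogs, tol=50):
    """Fire when the backlog spread across sensors exceeds a tolerance."""
    return max(backlogs.values()) - min(backlogs.values()) > tol

def reallocate(allocation, backlogs, capacity=40, max_moves=10):
    """Move one analyst at a time from the least- to the most-backlogged
    sensor, capping moves to limit disruption to the shift-start allocation."""
    moves = []
    while trigger(backlogs) and len(moves) < max_moves:
        hi = max(backlogs, key=backlogs.get)
        lo = min(backlogs, key=backlogs.get)
        if allocation[lo] <= 1:          # keep every sensor covered
            break
        allocation[lo] -= 1
        allocation[hi] += 1
        backlogs[hi] -= capacity         # the extra analyst burns down backlog
        backlogs[lo] += capacity
        moves.append((lo, hi))
    return moves

alloc = {"s1": 3, "s2": 3, "s3": 3}
backlog = {"s1": 20, "s2": 180, "s3": 40}
print(reallocate(alloc, backlog))   # e.g. [('s1', 's2'), ('s3', 's2')]
print(alloc, backlog)
```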


International Journal of Information Security | 2018

A methodology for ensuring fair allocation of CSOC effort for alert investigation

Ankit Shah; Rajesh Ganesan; Sushil Jajodia

A Cyber Security Operations Center (CSOC) often sells services by entering into a service level agreement (SLA) with various customers (organizations) whose network traffic is monitored through sensors. The sensors produce data that are processed by automated systems (such as the intrusion detection system), which issue alerts. All alerts need further investigation by human analysts. The alerts are triaged into high-, medium-, and low-priority alerts, and the high-priority alerts are investigated first by cybersecurity analysts, a process known as priority queueing. In unexpected situations such as (i) higher-than-expected high-priority alert generation from some sensors, (ii) too few analysts at the CSOC in a given time interval, and (iii) a new type of alert that increases the time to analyze alerts from some sensors, the priority queueing mechanism leads to two major issues: (1) some sensors with normal levels of alert generation are analyzed less than those with excessive high-priority alerts, with the potential for complete starvation of alert analysis for sensors with only medium- or low-priority alerts, and (2) this ad hoc allocation of CSOC effort to sensors with excessive high-priority alerts over other sensors results in SLA violations, and there is no enforcement mechanism to ensure that the actual service provided by a CSOC matches the SLA. This paper develops a new dynamic weighted alert queueing mechanism (DWQ) that relates the CSOC effort specified in the SLA to the effort actually allocated in practice, and ensures via a technical enforcement system that the total CSOC effort is divided proportionally among customers such that fairness is guaranteed in the long run. The results indicate that the DWQ mechanism outperforms the priority queueing method by not only analyzing high-priority alerts first but also ensuring fairness in the CSOC effort allocated to all customers and providing a starvation-free alert investigation process.
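
The abstract leaves the internals of DWQ unspecified; the sketch below shows one way a weighted, starvation-free split of analyst effort could look, with the SLA weights, per-alert cost, and carry-over rule all assumed for illustration:

```python
# Illustrative DWQ-style sketch: SLA weights, alert cost, and the carry-over
# rule are assumed; the paper's actual mechanism may differ.
from collections import deque

SLA_WEIGHTS = {"custA": 0.5, "custB": 0.3, "custC": 0.2}
MINUTES_PER_ALERT = 10

queues = {c: deque() for c in SLA_WEIGHTS}   # pending alerts per customer
deficit = {c: 0.0 for c in SLA_WEIGHTS}      # under-service carried forward

def serve_hour(total_minutes=180):
    served = {c: 0 for c in SLA_WEIGHTS}
    for c, w in SLA_WEIGHTS.items():
        budget = total_minutes * w + deficit[c]    # weighted share + carry-over
        while queues[c] and budget >= MINUTES_PER_ALERT:
            queues[c].popleft()
            budget -= MINUTES_PER_ALERT
            served[c] += 1
        deficit[c] = budget if queues[c] else 0.0  # carry over only unmet demand
    return served

for i in range(12):
    queues["custA"].append(i)
for i in range(30):
    queues["custB"].append(i)   # custB surges; weights keep long-run shares fair
print(serve_hour())
print(serve_hour())
```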


ACM Transactions on Privacy and Security (TOPS) | 2018

VULCON: A System for Vulnerability Prioritization, Mitigation, and Management

Katheryn A. Farris; Ankit Shah; George Cybenko; Rajesh Ganesan; Sushil Jajodia

Vulnerability remediation is a critical task in operational software and network security management. In this article, an effective vulnerability management strategy, called VULCON (VULnerability CONtrol), is developed and evaluated. The strategy is based on two fundamental performance metrics: (1) time-to-vulnerability remediation (TVR) and (2) total vulnerability exposure (TVE). VULCON takes as input real vulnerability scan reports, metadata about the discovered vulnerabilities, asset criticality, and personnel resources. VULCON uses a mixed-integer multiobjective optimization algorithm to prioritize vulnerabilities for patching, such that the above performance metrics are optimized subject to the given resource constraints. VULCON has been tested on multiple months of real scan data from a cybersecurity operations center (CSOC). Results indicate an overall TVE reduction of 8.97% when VULCON optimizes a realistic security analyst workforce's effort. Additionally, VULCON demonstrates that it can determine the monthly resources required to maintain a target TVE score. As such, VULCON provides valuable operational guidance for improving vulnerability response processes in CSOCs.
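
The exact TVE formula and the mixed-integer program are in the paper, not the abstract; as a rough illustration only, the sketch below scores exposure as severity x age x asset weight and substitutes a greedy budget-constrained selection for VULCON's multiobjective optimization:

```python
# Assumed TVE proxy and a greedy heuristic standing in for VULCON's
# mixed-integer multiobjective optimizer. All fields and numbers are invented.
from dataclasses import dataclass

@dataclass
class Vuln:
    cvss: float        # severity from the scan report
    age_days: int      # how long it has been exposed
    effort_hrs: float  # estimated patching effort
    asset_weight: float = 1.0

def tve(vulns):
    """Assumed proxy for total vulnerability exposure."""
    return sum(v.cvss * v.age_days * v.asset_weight for v in vulns)

def prioritize(vulns, budget_hrs):
    """Greedy: highest exposure reduction per hour of patching effort first."""
    ranked = sorted(vulns,
                    key=lambda v: v.cvss * v.age_days * v.asset_weight / v.effort_hrs,
                    reverse=True)
    plan, spent = [], 0.0
    for v in ranked:
        if spent + v.effort_hrs <= budget_hrs:
            plan.append(v)
            spent += v.effort_hrs
    return plan

vulns = [Vuln(9.8, 30, 4), Vuln(5.0, 120, 2), Vuln(7.5, 10, 8, 2.0)]
plan = prioritize(vulns, budget_hrs=6)
print("patch now:", plan, "| TVE removed:", tve(plan))
```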


ACM Transactions on Intelligent Systems and Technology | 2018

Dynamic Optimization of the Level of Operational Effectiveness of a CSOC Under Adverse Conditions

Ankit Shah; Rajesh Ganesan; Sushil Jajodia; Hasan Cam

The analysts at a cybersecurity operations center (CSOC) analyze the alerts that are generated by intrusion detection systems (IDSs). Under normal operating conditions, sufficient numbers of analysts are available to analyze the alert workload. For the purpose of this article, this means that the cybersecurity analysts in each shift can fully investigate each and every alert generated by the IDSs in a reasonable amount of time and still perform their normal tasks in a shift. Normal tasks include alert analysis, attending training programs, report writing, personal breaks, and updating signatures for new alert patterns detected by the IDS. Several disruptive factors occur randomly and can adversely impact the normal operating condition of a CSOC, such as (1) higher alert generation rates from a few IDSs, (2) new alert patterns that decrease the throughput of the alert analysis process, and (3) analyst absenteeism. The impact of the preceding factors is that alerts wait a long time before being analyzed, which degrades the level of operational effectiveness (LOE) of the CSOC. To return the CSOC to normal operating conditions, the manager of a CSOC can take several actions, such as increasing the alert analysis time spent by analysts in a shift by canceling a training program, spending some of their own time assisting analysts with alert investigation, and calling upon the on-call analyst workforce to boost the service rate of alerts. However, additional resources are limited in quantity over a 14-day work cycle, and the CSOC manager must determine when and how much action to take in the face of uncertainty, which arises from both the intensity and the random occurrence of the disruptive factors. This decision is nontrivial and is often made in an ad hoc manner based on prior experience. This work develops a reinforcement learning (RL) model for optimizing the LOE throughout the entire 14-day work cycle of a CSOC in the face of uncertainties due to disruptive events. Results indicate that the RL model provides the CSOC manager with a decision support tool that makes better decisions than current practice in determining when and how many resources to allocate when the LOE of a CSOC deviates from the normal operating condition.
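
To make the budgeted when-and-how-much decision concrete, here is a toy finite-horizon dynamic program; it replaces the paper's RL model and its uncertainty with an assumed deterministic LOE-deviation state, so it illustrates only the shape of the decision problem:

```python
# Toy finite-horizon DP over a 14-day cycle: each day, spend some of a limited
# extra-resource budget given the current LOE deviation. All dynamics assumed.
import functools

DAYS, BUDGET = 14, 6
ACTIONS = range(3)                 # extra-resource units spent today

def next_dev(dev, act):
    # Assumption: deviation drifts up by 1 without action, is pushed down by action.
    return max(0, min(4, dev + 1 - act))

@functools.lru_cache(maxsize=None)
def value(day, dev, budget):
    if day == DAYS:
        return 0.0
    best = float("-inf")
    for a in ACTIONS:
        if a > budget:
            continue
        # reward = -deviation: the further from normal, the worse the LOE
        best = max(best, -dev + value(day + 1, next_dev(dev, a), budget - a))
    return best

def policy(day, dev, budget):
    return max((a for a in ACTIONS if a <= budget),
               key=lambda a: -dev + value(day + 1, next_dev(dev, a), budget - a))

print(policy(0, 3, BUDGET), value(0, 3, BUDGET))
```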


Book chapter | 2017

A Novel Metric for Measuring Operational Effectiveness of a Cybersecurity Operations Center

Rajesh Ganesan; Ankit Shah; Sushil Jajodia; Hasan Cam

Cybersecurity threats are on the rise with the ever-greater digitization of the information that many day-to-day systems depend upon. The demand for cybersecurity analysts outpaces supply, which calls for optimal management of the analyst resource. In this chapter, a new notion of cybersecurity risk is defined, which arises when alerts from intrusion detection systems remain unanalyzed at the end of a work shift. This risk poses a security threat to the organization, which in turn impacts the operational effectiveness of the cybersecurity operations center (CSOC). The chapter considers four primary analyst resource parameters that influence risk. For a given risk threshold, the parameters include (1) the number of analysts in a work shift, and in turn within the organization, (2) the expertise mix of analysts in a work shift to investigate a wide range of alerts, (3) the optimal sensor-to-analyst allocation, and (4) the optimal scheduling of analysts that guarantees both the number and the expertise mix of analysts in every work shift. The chapter presents a thorough treatment of risk and the role it plays in analyst resource management within a CSOC under varying alert generation rates from sensors. A simulation framework to measure risk under various model parameter settings is developed, which can also be used in conjunction with an optimization model to empirically validate the optimal settings of the above model parameters. The empirical results, sensitivity study, and validation study confirm the viability of the framework for determining the optimal management of the analyst resource that minimizes risk under the uncertainty of alert generation and model constraints.
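
A Monte Carlo sketch of such a simulation framework, with invented arrival and service rates; the chapter's framework also models expertise mix and sensor allocation, which are omitted here:

```python
# Minimal Monte Carlo sketch: estimate risk (fraction of alerts unanalyzed at
# the end of a shift) for a given staffing level. All rates are invented.
import math
import random

def poisson(lam):
    """Sample a Poisson variate via Knuth's method (avoids a numpy dependency)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1

def simulate_shift(n_analysts, hours=8, alerts_per_hour=30,
                   mins_per_alert=12, runs=2_000):
    capacity = n_analysts * hours * 60 / mins_per_alert
    total = 0.0
    for _ in range(runs):
        arrivals = sum(poisson(alerts_per_hour) for _ in range(hours))
        total += max(0, arrivals - capacity) / arrivals if arrivals else 0.0
    return total / runs

for n in (4, 5, 6):   # risk falls as staffing rises; pick the n meeting a threshold
    print(n, round(simulate_shift(n), 3))
```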


arXiv: Cryptography and Security | 2018

Two Can Play That Game: An Adversarial Evaluation of a Cyber-alert Inspection System

Ankit Shah; Arunesh Sinha; Rajesh Ganesan; Sushil Jajodia; Hasan Cam


IEEE Transactions on Information Forensics and Security | 2018

Understanding Trade-offs Between Throughput, Quality, and Cost of Alert Analysis in a CSOC

Ankit Shah; Rajesh Ganesan; Sushil Jajodia; Hasan Cam

Collaboration


Dive into Ankit Shah's collaboration.

Top Co-Authors

Arunesh Sinha

University of Southern California
