Kalyanaraman Vaidyanathan

Publication

Featured research published by Kalyanaraman Vaidyanathan.

IBM Journal of Research and Development | 2001

Proactive management of software aging

Vittorio Castelli; Richard E. Harper; Philip Heidelberger; Steven W. Hunter; Kishor S. Trivedi; Kalyanaraman Vaidyanathan; William P. Zeggert

Software failures are now known to be a dominant source of system outages. Several studies and much anecdotal evidence point to software aging as a common phenomenon, in which the state of a software system degrades with time. Exhaustion of system resources, data corruption, and numerical error accumulation are the primary symptoms of this degradation, which may eventually lead to performance degradation of the software, crash/hang failure, or other undesirable effects. Software rejuvenation is a proactive technique intended to reduce the probability of future unplanned outages due to aging. The basic idea is to pause or halt the running software, refresh its internal state, and resume or restart it. Software rejuvenation can be triggered by a variety of indicators of aging, or by the time elapsed since the last rejuvenation. In response to customers' strong desire for advance notice of unplanned outages, our group has developed techniques that detect the occurrence of software aging due to resource exhaustion, estimate the time remaining until the exhaustion reaches a critical level, and automatically perform proactive software rejuvenation of an application, process group, or entire operating system, depending on the pervasiveness of the resource exhaustion and our ability to pinpoint the source. This technology has been incorporated into the IBM Director for xSeries servers. To quantitatively evaluate the impact of different rejuvenation policies on the availability of cluster systems, we have developed analytical models based on stochastic reward nets (SRNs). For time-based rejuvenation policies, we determined the optimal rejuvenation interval based on system availability and cost. We also analyzed a rejuvenation policy based on prediction, and showed that it can further increase system availability and reduce downtime cost. These models are very general and can capture a multitude of cluster system characteristics, failure behavior, and performability measures, which we are just beginning to explore.
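
The SRN cluster models themselves are not given in this abstract. As a much simpler, hypothetical stand-in (a classical age-replacement, renewal-reward model, not the authors' SRNs), the Python sketch below shows how an optimal time-based rejuvenation interval can be found. It assumes a Weibull time to failure with an increasing hazard, an unplanned crash costing mttr_failure hours of downtime, and a planned rejuvenation every T hours costing mttr_rejuv hours; a grid search then picks the T that maximizes steady-state availability. All function names and parameter values are illustrative assumptions.

```python
import math


def weibull_reliability(t, shape, scale):
    """Survival function R(t) = exp(-(t/scale)**shape) of a Weibull time to failure."""
    return math.exp(-((t / scale) ** shape))


def availability(T, shape, scale, mttr_failure, mttr_rejuv, steps=2000):
    """Steady-state availability under time-based rejuvenation every T time units.

    Renewal-reward argument: each cycle ends either with a failure (before T) or with
    a planned rejuvenation (at T). A(T) = E[uptime per cycle] / E[cycle length].
    """
    # E[uptime per cycle] = integral of R(t) over [0, T], by the trapezoidal rule.
    h = T / steps
    inner = sum(weibull_reliability(i * h, shape, scale) for i in range(1, steps))
    ends = weibull_reliability(0.0, shape, scale) + weibull_reliability(T, shape, scale)
    uptime = h * (inner + 0.5 * ends)
    p_fail = 1.0 - weibull_reliability(T, shape, scale)  # cycle ends in an unplanned outage
    downtime = p_fail * mttr_failure + (1.0 - p_fail) * mttr_rejuv
    return uptime / (uptime + downtime)


def best_interval(candidates, **params):
    """Grid search: return (T, A(T)) with the highest availability among candidate intervals."""
    return max(((T, availability(T, **params)) for T in candidates), key=lambda pair: pair[1])


# Hypothetical parameters: Weibull aging with increasing hazard (shape > 1), 4 h to recover
# from a crash, 0.1 h (6 min) for a planned rejuvenation.
params = dict(shape=2.5, scale=1000.0, mttr_failure=4.0, mttr_rejuv=0.1)
T_opt, A_opt = best_interval(range(50, 3001, 50), **params)
print(f"best interval ~ {T_opt} h, availability {A_opt:.5f}")
```

With these made-up numbers the best interval lies in the interior of the grid, on the order of 200 hours: rejuvenating much more often wastes time on planned restarts, while rejuvenating much less often lets crashes dominate. With a constant hazard (shape=1) this particular model never favors rejuvenation, since availability then only grows with T.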

Dependable Systems and Networks | 2004

A method for modeling and quantifying the security attributes of intrusion tolerant systems

Bharat B. Madan; Katerina Goseva-Popstojanova; Kalyanaraman Vaidyanathan; Kishor S. Trivedi

Complex software and network-based information server systems may exhibit failures. Quite often, such failures may not be accidental. Instead, some failures may be caused by deliberate security intrusions, with intents ranging from simple mischief and theft of confidential information to the loss of crucial and possibly life-saving services. Not only is it important to prevent and/or tolerate security intrusions, it is equally important to treat security as a QoS attribute on par with other QoS attributes such as availability and performance. This paper deals with various issues related to quantifying the security attributes of an intrusion tolerant system, such as the SITAR system. A security intrusion and the response of an intrusion tolerant system to an attack are modeled as a random process. This facilitates the use of stochastic modeling techniques to capture the attacker behavior as well as the system's response to a security intrusion. This model is used to analyze and quantify the security attributes of the system. The security quantification analysis is first carried out for steady-state behavior, leading to measures like steady-state availability. By transforming this model into a model with absorbing states, we compute a security measure called the mean time (or effort) to security failure (MTTSF) and also compute probabilities of security failure due to violations of different security attributes.
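
The abstract does not reproduce the model. As a minimal sketch under stated assumptions, the Python code below treats the attacker/system interaction as a continuous-time Markov chain (the states, rates, and function names are hypothetical, not the SITAR model's actual parameters) and computes the MTTSF as a mean time to absorption: with Q_T the generator restricted to the transient states, the vector m of mean times to security failure solves (-Q_T) m = 1.

```python
def solve(a, b):
    """Solve the dense linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mean_time_to_absorption(rates, transient):
    """Mean time to security failure from each transient state of a CTMC.

    rates[(i, j)] is the transition rate i -> j; every state not listed in
    transient is treated as absorbing (a security-failure state). The vector of
    mean times to absorption solves (-Q_T) m = 1, where Q_T is the generator
    restricted to the transient states.
    """
    idx = {s: k for k, s in enumerate(transient)}
    n = len(transient)
    a = [[0.0] * n for _ in range(n)]
    for (i, j), q in rates.items():
        if i in idx:
            a[idx[i]][idx[i]] += q          # diagonal of -Q_T: total outflow rate of i
            if j in idx:
                a[idx[i]][idx[j]] -= q      # off-diagonal of -Q_T: minus the rate i -> j
    return dict(zip(transient, solve(a, [1.0] * n)))


# Hypothetical states and rates (per hour), for illustration only.
rates = {
    ("good", "vulnerable"): 1 / 100,
    ("vulnerable", "good"): 1 / 5,        # vulnerability patched
    ("vulnerable", "attacked"): 1 / 10,
    ("attacked", "recovering"): 1 / 2,    # attack detected
    ("attacked", "compromised"): 1 / 4,   # absorbing: undetected compromise
    ("recovering", "good"): 1.0,
    ("recovering", "halted"): 1 / 20,     # absorbing: service halted, availability lost
}
mttsf = mean_time_to_absorption(rates, ["good", "vulnerable", "attacked", "recovering"])
print(f"MTTSF from 'good': {mttsf['good']:.1f} h")
```

The probability of ending in a particular failure state can be obtained from the same -Q_T matrix by solving one more linear system per absorbing state, with the vector of rates into that state as the right-hand side.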

IEEE Transactions on Dependable and Secure Computing | 2005

A comprehensive model for software rejuvenation

Kalyanaraman Vaidyanathan; Kishor S. Trivedi

Recently, the phenomenon of software aging, one in which the state of the software system degrades with time, has been reported. This phenomenon, which may eventually lead to system performance degradation and/or crash/hang failure, is the result of exhaustion of operating system resources, data corruption, and numerical error accumulation. To counteract software aging, a technique called software rejuvenation has been proposed, which essentially involves occasionally terminating an application or a system, cleaning its internal state and/or its environment, and restarting it. Since rejuvenation incurs an overhead, an important research issue is to determine optimal times to initiate this action. In this paper, we first describe how to include faults attributed to software aging in the framework of Gray's software fault classification (deterministic and transient), and study the treatment and recovery strategies for each of the fault classes. We then construct a semi-Markov reward model based on workload and resource usage data collected from the UNIX operating system. We identify different workload states using statistical cluster analysis, and estimate transition probabilities and sojourn time distributions from the data. Corresponding to each resource, a reward function is then defined for the model based on the rate of resource depletion in each state. The model is then solved to obtain estimated times to exhaustion for each resource. The results from the semi-Markov reward model are then fed into a higher-level availability model that accounts for failure followed by reactive recovery, as well as proactive recovery. This comprehensive model is then used to derive optimal rejuvenation schedules that maximize availability or minimize downtime cost.
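
As a simplified sketch (an assumption: only the long-run average depletion rate is used, which need not match how the paper actually solves its model), the Python code below turns a semi-Markov reward model into a time-to-exhaustion estimate. The long-run reward rate of a semi-Markov process is sum_i pi_i h_i r_i / sum_i pi_i h_i, where pi is the stationary distribution of the embedded chain P, h_i the mean sojourn time in state i, and r_i the resource depletion rate (reward) in state i. The three-state example and all its numbers are hypothetical.

```python
def solve(a, b):
    """Solve the dense linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def stationary(P):
    """Stationary distribution pi of an irreducible DTMC: pi P = pi, sum(pi) = 1."""
    n = len(P)
    # Row i of (P^T - I): sum_j pi_j P[j][i] - pi_i = 0.
    a = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n  # replace one redundant balance equation by the normalization
    return solve(a, [0.0] * (n - 1) + [1.0])


def mean_depletion_rate(P, sojourn, reward):
    """Long-run reward rate of a semi-Markov process: time-weighted average of reward."""
    pi = stationary(P)
    time_weight = [p * h for p, h in zip(pi, sojourn)]
    return sum(w * r for w, r in zip(time_weight, reward)) / sum(time_weight)


def time_to_exhaustion(remaining, P, sojourn, reward):
    """Remaining amount of the resource divided by its long-run depletion rate."""
    rate = mean_depletion_rate(P, sojourn, reward)
    return remaining / rate if rate > 0 else float("inf")


# Hypothetical three-state workload model (e.g. light, medium, heavy load).
P = [[0.0, 0.7, 0.3],
     [0.5, 0.0, 0.5],
     [0.6, 0.4, 0.0]]
sojourn = [30.0, 12.0, 5.0]   # mean minutes per visit
reward = [0.5, 2.0, 8.0]      # KB of free memory consumed per minute in each state
print(time_to_exhaustion(50_000.0, P, sojourn, reward), "minutes")
```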

IEEE Transactions on Reliability | 2006

Analysis of Software Aging in a Web Server

Michael Grottke; Lei Li; Kalyanaraman Vaidyanathan; Kishor S. Trivedi

Several recent studies have reported and examined the phenomenon that long-running software systems show an increasing failure rate and/or a progressive degradation of their performance. Causes of this phenomenon, which has been referred to as software aging, are the accumulation of internal error conditions and the depletion of operating system resources. A proactive technique called software rejuvenation has been proposed as a way to counteract software aging. It involves occasionally terminating the software application, cleaning its internal state and/or its environment, and then restarting it. Due to the costs incurred by software rejuvenation, an important question is when to schedule this action. While periodic rejuvenation at constant time intervals is straightforward to implement, it may not yield the best results. The rate at which software ages is usually not constant, but depends on the time-varying system workload. Software rejuvenation should therefore be planned and initiated in light of the actual system behavior. This requires the measurement, analysis, and prediction of system resource usage. In this paper, we study the development of resource usage in a web server while subjecting it to an artificial workload. We first collect data on several system resource usage and activity parameters. Non-parametric statistical methods are then applied toward detecting and estimating trends in the data sets. Finally, we fit time series models to the data collected. Unlike the models used previously in the research on software aging, these time series models allow for seasonal patterns, and we show how the exploitation of the seasonal variation can help in adequately predicting the future resource usage. Based on the models employed here, proactive management techniques like software rejuvenation triggered by actual measurements can be built.
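
The abstract does not name the non-parametric methods. The Mann-Kendall trend test and Sen's slope estimator are standard choices for this task, and the Python sketch below implements both as an assumption about what such an analysis might look like (the p-value uses the usual normal approximation with tie correction). This plain version ignores the seasonal patterns the abstract emphasizes; the sample series is hypothetical.

```python
import math
from collections import Counter
from statistics import median


def mann_kendall(x):
    """Mann-Kendall trend test. Returns (S, Z, two-sided p-value), with tie correction."""
    n = len(x)
    # S counts concordant minus discordant pairs over all i < j.
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i in range(n) for j in range(i + 1, n))
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if s == 0 or var_s == 0:
        return s, 0.0, 1.0
    # Continuity-corrected normal score.
    z = (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p


def sens_slope(x, dt=1.0):
    """Sen's slope: median of all pairwise slopes (x[j] - x[i]) / ((j - i) * dt)."""
    n = len(x)
    return median((x[j] - x[i]) / ((j - i) * dt) for i in range(n) for j in range(i + 1, n))


# Hypothetical free-memory samples (MB), one every 5 minutes.
free_mb = [512, 510, 511, 507, 505, 506, 501, 499, 498, 495]
s, z, p = mann_kendall(free_mb)
print(s, round(z, 2), round(p, 4), sens_slope(free_mb, dt=5.0))
```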

International Symposium on Software Reliability Engineering | 1999

A measurement-based model for estimation of resource exhaustion in operational software systems

Kalyanaraman Vaidyanathan; Kishor S. Trivedi

Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of software aging, in which the state of the software system degrades with time, has been reported (S. Garg et al., 1998). The primary causes of this degradation are the exhaustion of operating system resources, data corruption, and numerical error accumulation. This may eventually lead to performance degradation of the software, crash/hang failure, or both. Earlier work in this area to detect aging and to estimate its effect on system resources did not take into account the system workload. In this paper, we propose a measurement-based model to estimate the rate of exhaustion of operating system resources as a function of both time and the system workload state. A semi-Markov reward model is constructed based on workload and resource usage data collected from the UNIX operating system. We first identify different workload states using statistical cluster analysis and build a state-space model. Corresponding to each resource, a reward function is then defined for the model based on the rate of resource exhaustion in the different states. The model is then solved to obtain trends, estimated exhaustion rates, and the time-to-exhaustion for the resources. With the help of this measure, proactive fault management techniques such as software rejuvenation (Y. Huang et al., 1995) may be employed to prevent unexpected outages.
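
The abstract does not detail how the state-space model is built from the clustered data. One straightforward way, sketched below in Python as an assumption (the function name and input format are hypothetical), is to take the per-sample workload-state labels produced by the clustering step, collapse them into visits, and estimate the embedded transition probabilities, mean sojourn times, and per-state resource depletion rates (the reward rates) from the sampled data.

```python
from collections import defaultdict


def build_smp(labels, resource, dt):
    """Estimate a semi-Markov reward model from a sampled workload-state sequence.

    labels[k]:   workload-state label of sample k (assumed to come from a prior
                 cluster-analysis step on workload parameters)
    resource[k]: free amount of one resource at sample k
    dt:          sampling interval

    Returns (P, sojourn, depletion):
      P[s][t]      embedded-chain probability of moving from state s to a different state t
      sojourn[s]   mean time spent in s per visit
      depletion[s] mean rate at which the resource is consumed while in s (the reward rate)
    """
    # 1. Collapse the sample sequence into maximal runs ("visits") of the same state.
    runs = []  # list of [state, number of samples in the run]
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])

    # 2. Embedded-chain transitions between consecutive visits (no self-loops by construction).
    counts = defaultdict(lambda: defaultdict(int))
    for (s, _), (t, _) in zip(runs, runs[1:]):
        counts[s][t] += 1
    P = {s: {t: c / sum(row.values()) for t, c in row.items()} for s, row in counts.items()}

    # 3. Mean sojourn time per visit. The first and last runs are censored,
    #    which a careful analysis would handle separately.
    visits = defaultdict(list)
    for s, length in runs:
        visits[s].append(length * dt)
    sojourn = {s: sum(v) / len(v) for s, v in visits.items()}

    # 4. Reward rate: resource consumed per unit time, attributing each sampling
    #    interval [k, k+1] to the state observed at sample k.
    used = defaultdict(float)
    time_in = defaultdict(float)
    for k in range(len(labels) - 1):
        used[labels[k]] += resource[k] - resource[k + 1]
        time_in[labels[k]] += dt
    depletion = {s: used[s] / time_in[s] for s in time_in}
    return P, sojourn, depletion
```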

Dependable Systems and Networks | 2002

Modeling and quantification of security attributes of software systems

Bharat B. Madan; K. Goseva-Popstojanova; Kalyanaraman Vaidyanathan; Kishor S. Trivedi

Quite often, failures in network-based services and server systems may not be accidental, but rather caused by deliberate security intrusions. Ideally, such systems should either completely preclude the possibility of a security intrusion or be robust enough to continue functioning despite security attacks. Not only is it important to prevent or tolerate security intrusions, it is equally important to treat security as a QoS attribute on par with, if not more important than, other QoS attributes such as availability and performability. This paper deals with various issues related to quantifying the security attribute of an intrusion tolerant system, such as the SITAR system. A security intrusion and the response of an intrusion tolerant system to the attack are modeled as a random process. This facilitates the use of stochastic modeling techniques to capture the attacker behavior as well as the system's response to a security intrusion. This model is used to analyze and quantify the security attributes of the system. The security quantification analysis is first carried out for steady-state behavior, leading to measures like steady-state availability. By transforming this model into a model with absorbing states, we compute a security measure called the mean time (or effort) to security failure and also compute probabilities of security failure due to violations of different security attributes.

International Symposium on Empirical Software Engineering | 2002

An approach for estimation of software aging in a Web server

Lei Li; Kalyanaraman Vaidyanathan; Kishor S. Trivedi

A number of recent studies have reported the phenomenon of software aging, characterized by progressive performance degradation or a sudden hang/crash of a software system due to exhaustion of operating system resources, fragmentation, and accumulation of errors. To counteract this phenomenon, a proactive technique called software rejuvenation has been proposed. This essentially involves stopping the running software, cleaning its internal state, and then restarting it. Software rejuvenation, being preventive in nature, raises the question of when to schedule it. Periodic rejuvenation, while straightforward to implement, may not yield the best results. A better approach is based on actual measurement of system resource usage and activity that detects and estimates resource exhaustion times. Estimating the resource exhaustion times makes it possible for software rejuvenation to be initiated or better planned so that the system availability is maximized in the face of time-varying workload and system behavior. We propose a methodology based on time series analysis to detect and estimate resource exhaustion times due to software aging in a Web server while subjecting it to an artificial workload. We first collect and log data on several system resource usage and activity parameters on a Web server. Time-series ARMA models are then constructed from the data to detect aging and estimate resource exhaustion times. The results are then compared with previous measurement-based models and found to be more efficient and computationally less intensive. These models can be used to develop proactive management techniques like software rejuvenation that are triggered by actual measurements.
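
The abstract does not state which ARMA model orders were used. As a minimal illustration (an assumption, not the paper's fitted models), the Python sketch below fits an ARIMA(1,1,0) model with drift, that is, an AR(1) model on the first differences of a usage series, by ordinary least squares, and then iterates the point forecast to the first step at which usage reaches a critical level. The function names are hypothetical; the series needs at least three samples. For a resource that decreases toward exhaustion (such as free memory), negate both the series and the threshold first.

```python
def fit_ar1(d):
    """Least-squares fit of d[t] = c + phi * d[t-1] + noise. Returns (c, phi)."""
    x, y = d[:-1], d[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0
    return my - phi * mx, phi


def steps_to_threshold(series, threshold, max_steps=100_000):
    """Forecast an (assumed increasing) resource-usage series with ARIMA(1,1,0) plus drift.

    Returns the number of future sampling steps until the point forecast first
    reaches threshold, or None if it does not within max_steps.
    """
    d = [b - a for a, b in zip(series, series[1:])]  # first differences remove the trend
    c, phi = fit_ar1(d)
    level, last_d = series[-1], d[-1]
    for step in range(1, max_steps + 1):
        last_d = c + phi * last_d                    # forecast the next difference
        level += last_d                              # integrate back to the usage level
        if level >= threshold:
            return step
    return None
```

Multiplying the returned step count by the sampling interval gives an estimated time to exhaustion; a fuller treatment would also carry the forecast variance to report an interval rather than a single crossing time.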

Annual Simulation Symposium | 2000

Modeling and analysis of software aging and rejuvenation

Kishor S. Trivedi; Kalyanaraman Vaidyanathan; Katerina Goseva-Popstojanova

Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of software aging, one in which the state of the software system degrades with time, has been reported. To counteract this phenomenon, a proactive approach of fault management, called software rejuvenation, has been proposed. This essentially involves gracefully terminating an application or a system and restarting it in a clean internal state. We discuss stochastic models to evaluate the effectiveness of proactive fault management in operational software systems and determine optimal times to perform rejuvenation, for different scenarios. The latter part of the paper deals with measurement-based methodologies to detect software aging and estimate its effect on various system resources. Models are constructed using workload and resource usage data collected from the UNIX operating system over a period of time. The measurement-based models are intended to help development of strategies for software rejuvenation triggered by actual measurements.

Measurement and Modeling of Computer Systems | 2001

Analysis and implementation of software rejuvenation in cluster systems

Kalyanaraman Vaidyanathan; Richard E. Harper; Steven W. Hunter; Kishor S. Trivedi

Several recent studies have reported the phenomenon of software aging, one in which the state of a software system degrades with time. This may eventually lead to performance degradation of the software, crash/hang failure, or both. Software rejuvenation is a proactive technique aimed at preventing unexpected or unplanned outages due to aging. The basic idea is to stop the running software, clean its internal state, and restart it. In this paper, we discuss software rejuvenation as applied to cluster systems. This is both an innovative and an efficient way to improve cluster system availability and productivity. Using Stochastic Reward Nets (SRNs), we model and analyze cluster systems that employ software rejuvenation. For our proposed time-based rejuvenation policy, we determine the optimal rejuvenation interval based on system availability and cost. We also introduce a new rejuvenation policy based on prediction and show that it can dramatically increase system availability and reduce downtime cost. These models are very general and can capture a multitude of cluster system characteristics, failure behavior, and performability measures, which we are just beginning to explore. We then briefly describe an implementation of a software rejuvenation system that performs periodic and predictive rejuvenation, and show some empirical data from systems that exhibit aging.
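
The abstract does not say how the implementation combines its periodic and predictive modes. A hypothetical decision rule, sketched in Python below with illustrative names and parameters, rejuvenates early when a prediction says some resource will run out within a chosen lead time, and otherwise falls back to the fixed period.

```python
from typing import Optional


def rejuvenation_trigger(elapsed: float, interval: float,
                         predicted_tte: Optional[float] = None,
                         lead_time: float = 0.0) -> Optional[str]:
    """Decide whether to rejuvenate now (hypothetical policy combination).

    elapsed:       time since the last rejuvenation (or restart)
    interval:      period of the time-based (periodic) policy
    predicted_tte: predicted time until some resource is exhausted, from any
                   trend/forecasting estimator, or None if no prediction exists
    lead_time:     how far ahead of predicted exhaustion to act
    Returns "predictive", "periodic", or None (keep running).
    """
    if predicted_tte is not None and predicted_tte <= lead_time:
        return "predictive"   # exhaustion is imminent: act before the periodic deadline
    if elapsed >= interval:
        return "periodic"     # fixed rejuvenation interval has elapsed
    return None
```

Checking the prediction first means an imminent exhaustion is handled even when the periodic deadline is far away; the lead time should cover both the duration of the rejuvenation itself and the expected forecast error.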

DARPA Information Survivability Conference and Exposition | 2001

Characterizing intrusion tolerant systems using a state transition model

Katerina Goseva-Popstojanova; Feiyi Wang; Rong Wang; Fengmin Gong; Kalyanaraman Vaidyanathan; Kishor S. Trivedi; B. Muthusamy

Intrusion detection and response research has so far mostly concentrated on known and well-defined attacks. We believe that this narrow focus on attacks accounts for both the successes and the limitations of commercial intrusion detection systems (IDS). Intrusion tolerance, on the other hand, is inherently tied to functions and services that require protection. This paper presents a state transition model to describe the dynamic behavior of intrusion-tolerant systems. This model provides a framework from which we can define the vulnerability and the threat set to be addressed. We also show how this model helps us to describe both known and unknown security exploits by focusing on impacts rather than specific attack procedures. By going through the exercise of mapping known vulnerabilities to this transition model, we identify a reasonably complete fault space that should be considered in a general intrusion-tolerant system.
