Javier Alonso | Researchain

Publication

Featured research published by Javier Alonso.

Performance Evaluation | 2013

A comparative experimental study of software rejuvenation overhead

Javier Alonso; Rivalino Matias; Elder Vicente; Ana Maria; Kishor S. Trivedi

In this paper we present a comparative experimental study of the main software rejuvenation techniques developed so far to mitigate software aging effects. We consider six rejuvenation techniques with different levels of granularity: (i) physical node reboot, (ii) virtual machine reboot, (iii) OS reboot, (iv) fast OS reboot, (v) standalone application restart, and (vi) application rejuvenation by a hot standby server. We conduct a set of experiments injecting memory leaks at the application level. We evaluate the performance overhead introduced by software rejuvenation in terms of throughput loss, failed requests, slow requests, and memory fragmentation overhead. We also analyze the efficiency of the selected rejuvenation techniques in mitigating the aging effects. Due to the growing adoption of virtualization technology, we also analyze the overhead of the rejuvenation techniques in virtualized environments. The results show that the performance overhead introduced by each rejuvenation technique is related to its granularity level. We also observe different levels of memory fragmentation overhead induced by virtualization, demonstrating some drawbacks of virtualized rejuvenation in comparison with non-virtualized approaches. Finally, based on these findings, we present comprehensive guidelines to support decision making during the design of rejuvenation scheduling algorithms, as well as in selecting the appropriate rejuvenation mechanism.
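The abstract names throughput loss and slow requests as overhead metrics but does not define how they are computed. The sketch below shows one plausible way to derive them from per-second throughput samples and response times collected during a rejuvenation window. The function name, the comparison against a pre-rejuvenation baseline, and the slow-request threshold are assumptions for illustration, not the paper's measurement procedure.

```python
from dataclasses import dataclass


@dataclass
class RejuvenationOverhead:
    lost_requests: float    # requests not served relative to the baseline rate
    throughput_loss: float  # lost_requests as a fraction of expected requests
    slow_requests: int      # responses slower than the (hypothetical) threshold


def measure_overhead(baseline_rps, observed_rps, latencies_s, slow_threshold_s):
    """Compare per-second throughput during a rejuvenation window with a baseline.

    baseline_rps: steady-state throughput measured before rejuvenation (assumed).
    observed_rps: per-second throughput samples covering the rejuvenation window.
    latencies_s: response times (seconds) of requests served in that window.
    """
    expected = baseline_rps * len(observed_rps)
    # Clamp at zero: a post-restart burst could briefly exceed the baseline.
    lost = max(0.0, expected - sum(observed_rps))
    slow = sum(1 for rt in latencies_s if rt > slow_threshold_s)
    return RejuvenationOverhead(lost, lost / expected if expected else 0.0, slow)


# Hypothetical 5-second window in which the service is down for 2 seconds.
result = measure_overhead(100.0, [100, 0, 0, 50, 100], [0.05, 0.2, 1.5, 3.0], 1.0)
print(result)
```

Comparing against a baseline measured on the same machine keeps the metric independent of absolute hardware speed, which matters when contrasting coarse-grained (node reboot) and fine-grained (application restart) techniques.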

Dependable Systems and Networks | 2013

An empirical investigation of fault repairs and mitigations in space mission system software

Javier Alonso; Michael Grottke; Kishor S. Trivedi

Faults in software systems can have different characteristics. In an earlier paper, the anomaly reports for a number of JPL/NASA missions were analyzed and the underlying faults were classified as Bohrbugs, non-aging-related Mandelbugs, and aging-related bugs. In another paper, the times to failure for each of these fault types were examined to identify trends within missions as well as across missions. The results of those papers are now starting to provide guidance for improving the dependability of space mission software. Just as there are different types of faults, there are different kinds of mitigations of faults and failures. This paper analyzes the mitigations associated with each fault studied in our previous papers. We identify trends in the proportions of mitigation types within missions as well as from mission to mission. We also look for relationships between fault types and mitigation types. The results will be used to increase the reliability of space mission software.
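Relating fault types to mitigation types boils down to a cross-tabulation: for each fault type, what share of faults received each kind of mitigation. A minimal sketch of that tally follows. The record format and the mitigation labels ("code fix", "workaround", "restart") are hypothetical; the abbreviations BOH and NAM stand for Bohrbug and non-aging-related Mandelbug.

```python
from collections import Counter, defaultdict


def mitigation_proportions(records):
    """Map each fault type to the share of each mitigation type applied to it.

    records: iterable of (fault_type, mitigation_type) pairs, one per fault.
    """
    counts = defaultdict(Counter)
    for fault_type, mitigation in records:
        counts[fault_type][mitigation] += 1
    return {
        fault_type: {m: c / sum(per_fault.values()) for m, c in per_fault.items()}
        for fault_type, per_fault in counts.items()
    }


# Hypothetical records: BOH = Bohrbug, NAM = non-aging-related Mandelbug.
records = [
    ("BOH", "code fix"),
    ("BOH", "code fix"),
    ("BOH", "workaround"),
    ("NAM", "restart"),
]
print(mitigation_proportions(records))
```

Running the same function on each mission's records separately gives the per-mission proportions needed to look for trends from mission to mission.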

International Symposium on Software Reliability Engineering | 2013

Towards fast OS rejuvenation: An experimental evaluation of fast OS reboot techniques

Antonio Bovenzi; Javier Alonso; Hiroshi Yamada; Stefano Russo; Kishor S. Trivedi

Continuous or high availability is a key requirement for many modern IT systems. Computer operating systems play an important role in IT system availability. Due to the complexity of their architecture, they are prone to failures caused by several types of software faults. Software aging causes a non-negligible fraction of these failures. It leads to an accumulation of errors over time, increasing the system failure rate. This phenomenon can be accompanied by performance degradation and eventually a system hang or even a crash. As a countermeasure, software rejuvenation entails stopping the system, cleaning its internal state, and resuming its operation. This process usually incurs downtime. For an operating system, the downtime affects every application running on top of it. Several solutions have been developed to speed up the boot time of operating systems in order to reduce the downtime overhead. We present a study of two fast OS reboot techniques for rejuvenation of Linux-based operating systems, namely Kexec and Phase-based reboot. The study measures the performance penalty they introduce and the resulting reduction in downtime overhead. The results reveal that Kexec and Phase-based reboot have no statistically significant impact in terms of performance penalty from the user perspective. However, they may require additional resource usage (e.g., CPU). Compared with normal Linux and VM reboots, Kexec and Phase-based reboot reduce the downtime overhead by 77% and 79%, respectively.
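A downtime reduction figure such as 77% is the relative saving (D_base - D_fast) / D_base, where D_base is the downtime of the reference reboot and D_fast that of the fast reboot. The helper below computes it; the timings in the example are hypothetical and only illustrate how a 77% figure arises, not the values measured in the study.

```python
def downtime_reduction(baseline_downtime_s: float, fast_downtime_s: float) -> float:
    """Fraction of downtime saved by a fast reboot relative to a baseline reboot."""
    if baseline_downtime_s <= 0:
        raise ValueError("baseline downtime must be positive")
    return 1.0 - fast_downtime_s / baseline_downtime_s


# Hypothetical timings: a 60 s baseline reboot versus a 13.8 s fast reboot.
print(f"{downtime_reduction(60.0, 13.8):.0%}")  # prints "77%"
```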

International Symposium on Software Reliability Engineering | 2012

The Nature of the Times to Flight Software Failure during Space Missions

Javier Alonso; Michael Grottke; Kishor S. Trivedi

The growing complexity of mission-critical space mission software makes it prone to failures during operations. The success of space missions depends on the ability of the systems to deal with software failures, or to avoid them in the first place. To develop more effective mitigation techniques, it is necessary to understand the nature of the failures and the underlying software faults. Based on their characteristics, software faults can be classified into Bohrbugs, non-aging-related Mandelbugs, and aging-related bugs. Each type of fault requires different kinds of mitigation techniques. While Bohrbugs are usually easy to fix during development or testing, this is not the case for non-aging-related Mandelbugs and aging-related bugs, due to their inherent complexity. Systems need mechanisms like software restart, software replication, or software rejuvenation to deal with failures caused by these faults during the operational phase. In a previous study, we classified space mission flight software faults into the three above-mentioned categories based on problems reported during operations. That study concentrated on the percentages of faults of each type and the variation of these percentages within and across different missions. This paper extends that work by exploring the nature of the times to software failure due to Bohrbugs and non-aging-related Mandelbugs for eight JPL/NASA missions. We start by applying trend tests to the times to failure to check whether there is any reliability growth (or decay) for each type of failure. For time-to-failure sequences with no trend, we fit distributions to the data sets and carry out goodness-of-fit tests. The results will be used to guide the development of improved operational failure mitigation techniques, thereby increasing the reliability of space mission software.
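The abstract does not say which trend tests were applied. The Laplace test is one common choice for checking reliability growth or decay in a sequence of failures, so the sketch below implements its failure-truncated form as an illustrative assumption, not necessarily the test used in the paper.

```python
import math
from itertools import accumulate


def laplace_trend_statistic(interfailure_times):
    """Laplace trend test statistic for a failure-truncated observation period.

    U < 0 suggests reliability growth (failures thin out over time),
    U > 0 suggests reliability decay; under "no trend", U is approximately
    standard normal.
    """
    n = len(interfailure_times)
    if n < 2:
        raise ValueError("need at least two failures")
    arrivals = list(accumulate(interfailure_times))  # cumulative failure times
    t_n = arrivals[-1]
    m = n - 1  # the last failure ends observation, so it is excluded from the mean
    mean_earlier = sum(arrivals[:-1]) / m
    return (mean_earlier - t_n / 2) / (t_n * math.sqrt(1 / (12 * m)))


growing = laplace_trend_statistic([1, 2, 3, 4, 5])   # gaps lengthen -> U < 0
decaying = laplace_trend_statistic([5, 4, 3, 2, 1])  # gaps shrink -> U > 0
steady = laplace_trend_statistic([2] * 10)           # constant gaps -> U = 0
print(growing, decaying, steady)
```

At the 5% significance level, |U| < 1.96 gives no evidence of a trend; only such sequences are candidates for fitting a single time-to-failure distribution, matching the workflow the abstract describes.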

Symposium on Reliable Distributed Systems | 2012

Availability Modeling and Analysis for Data Backup and Restore Operations

Xiaoyan Yin; Javier Alonso; Fumio Machida; Ermeson C. Andrade; Kishor S. Trivedi

Data backup is an essential part of IT system administration, protecting against data loss caused by storage failures, human errors, or disasters. Lost data can be recovered from the backed-up data, if it exists. Since backup and restore operations incur downtime overhead or performance degradation, they have to be designed to ensure data reliability while minimizing the performance and availability overhead. In this paper, we study the impact of different backup policies on availability measures such as storage availability, system availability, and user-perceived availability. Backup and restore operations are designed using SysML Activity diagrams, which are automatically translated into Stochastic Reward Nets (SRNs) to compute the availability measures. Our numerical results show the effectiveness of combining full and partial backups in terms of user-perceived data availability and data loss rate. Furthermore, the resulting sensitivity ranking can help improve the availability measures.
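To give a feel for how a stochastic model turns backup and restore behavior into an availability number, here is a deliberately tiny continuous-time Markov chain. It is not the paper's SysML-to-SRN pipeline: the three states, the exponential rates, and the choice to count backup time as downtime are all assumptions, and it captures only one of the availability notions the paper distinguishes.

```python
def steady_state_availability(backup_rate, backup_done_rate, failure_rate, restore_rate):
    """Availability of a toy 3-state CTMC: Up, Backup (planned), Restore (after failure).

    Up -> Backup at backup_rate, Backup -> Up at backup_done_rate,
    Up -> Restore at failure_rate, Restore -> Up at restore_rate.
    Balance equations give pi_B = pi_U * backup_rate / backup_done_rate and
    pi_R = pi_U * failure_rate / restore_rate; availability is pi_U.
    """
    return 1.0 / (1.0 + backup_rate / backup_done_rate + failure_rate / restore_rate)


# Hypothetical rates per hour: daily backups lasting 30 min on average,
# one storage failure per 1000 h, restores lasting 4 h on average.
print(steady_state_availability(1 / 24, 1 / 0.5, 1 / 1000, 1 / 4))
```

An SRN generalizes this idea: the net is unfolded into a (usually much larger) Markov chain, and reward functions over its states yield the different availability measures.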

IEEE Transactions on Reliability | 2016

Optimization of Two-Granularity Software Rejuvenation Policy Based on the Markov Regenerative Process

Gaorong Ning; Jing Zhao; Yunlong Lou; Javier Alonso; Rivalino Matias; Kishor S. Trivedi; Beibei Yin; Kai-Yuan Cai

Software rejuvenation is a proactive software control technique used to improve the performance of a computing system that suffers from software aging. In this paper, we propose a two-granularity, inspection-based software rejuvenation policy that works as a closed-loop control technique. This policy mitigates the negative impact of software aging at two levels: the user-level applications and the operating system. A Markov regenerative process model is constructed based on the system condition. We obtain the degradation rates of the application software and the operating system from fault injection experiments. When deriving the optimal rejuvenation strategies, we take into account the diagnostic accuracy of the monitoring and analysis system used to inspect the application software and the operating system. Finally, the availability and the overall loss probability, together with their corresponding optimal inspection time intervals, are obtained numerically based on the parameter values estimated from the experiments. Experimental results show that two-granularity software rejuvenation is much more effective than traditional single-level software rejuvenation. In our experimental study, two-granularity software rejuvenation reduced the unavailability and the overall loss probability of the system by 17.9% and 2.65%, respectively, compared with single-level rejuvenation.
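The two-granularity Markov regenerative process model is too involved for a short example, so the sketch below illustrates only the final step, choosing an interval numerically, on a much simpler single-level, time-based rejuvenation model. It assumes a Weibull time to aging-related failure and uses the standard renewal-reward availability expression for periodic rejuvenation; the Weibull choice, all parameter values, and the grid search are assumptions, not the paper's model.

```python
import math


def weibull_reliability(t, scale, shape):
    """R(t) = P(no aging-related failure by time t) for a Weibull lifetime."""
    return math.exp(-((t / scale) ** shape))


def availability(tau, scale, shape, rejuv_down, repair_down, steps=2000):
    """Renewal-reward availability when rejuvenating every tau hours.

    Expected uptime per cycle is the integral of R(t) over [0, tau]
    (trapezoid rule); expected downtime is repair_down * F(tau)
    plus rejuv_down * R(tau).
    """
    h = tau / steps
    inner = sum(weibull_reliability(i * h, scale, shape) for i in range(1, steps))
    r_tau = weibull_reliability(tau, scale, shape)
    uptime = h * (0.5 * (1.0 + r_tau) + inner)  # R(0) = 1
    downtime = repair_down * (1.0 - r_tau) + rejuv_down * r_tau
    return uptime / (uptime + downtime)


def best_interval(candidates, **params):
    """Grid search for the rejuvenation interval with the highest availability."""
    return max(candidates, key=lambda tau: availability(tau, **params))


# Hypothetical parameters (hours): increasing hazard (shape > 1),
# and a planned rejuvenation that is 10x cheaper than a failure repair.
params = dict(scale=100.0, shape=2.5, rejuv_down=0.1, repair_down=1.0)
tau_star = best_interval([float(t) for t in range(5, 305, 5)], **params)
print(tau_star, availability(tau_star, **params))
```

The trade-off is the same one an inspection interval faces: rejuvenating too often wastes time on planned downtime, while waiting too long lets costly aging-related failures occur.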

IEEE Transactions on Dependable and Secure Computing | 2014

Ensuring the Performance of Apache HTTP Server Affected by Aging

Jing Zhao; Kishor S. Trivedi; Michael Grottke; Javier Alonso; Yanbin Wang

Failures due to software aging are typically caused by resource exhaustion, which is often preceded by progressive software performance degradation. Response time, as a customer-affecting metric, can thus be used to detect the onset of software aging. In this paper, we propose the distribution-based rejuvenation algorithm (DBRA), which uses a validated M/E2/1/K queuing model of the Apache HTTP server to decide when to trigger rejuvenation. We compare the performance of the DBRA with that of the static rejuvenation algorithm with averaging (SRAA) presented by Avritzer et al. Simulation results show the effectiveness of the DBRA and its advantages over the SRAA in reducing the average response time. However, the DBRA generally triggers rejuvenation more frequently than the SRAA, which increases the request blocking probability.
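The SRAA baseline (Avritzer et al.) averages batches of response-time samples and uses a sequence of "buckets" to decide when degradation is persistent enough to warrant rejuvenation. A simplified sketch of that bucket idea follows. The function name, the per-bucket target of mean + bucket * sigma, and the exact overflow and underflow rules are simplifying assumptions and may differ from the published algorithm.

```python
def bucket_trigger(samples, mean, sigma, k, num_buckets, depth):
    """Return the sample count at which rejuvenation fires, or None.

    samples: observed response times, processed in batches of k.
    Bucket b uses the target mean + b * sigma; each batch average above the
    current target adds a ball, each one at or below removes a ball.
    """
    bucket, balls = 0, 0
    for start in range(0, len(samples) - k + 1, k):
        avg = sum(samples[start:start + k]) / k
        if avg > mean + bucket * sigma:
            balls += 1
            if balls > depth:  # bucket overflows: escalate to the next one
                bucket, balls = bucket + 1, 0
                if bucket >= num_buckets:
                    return start + k  # all buckets overflowed: rejuvenate
        else:
            balls -= 1
            if balls < 0:  # bucket underflows: fall back to the previous one
                if bucket > 0:
                    bucket, balls = bucket - 1, depth
                else:
                    balls = 0
    return None


# Hypothetical response times in seconds (baseline mean 1.0, sigma 1.0).
print(bucket_trigger([1.0] * 20, 1.0, 1.0, k=2, num_buckets=3, depth=2))    # None
print(bucket_trigger([100.0] * 20, 1.0, 1.0, k=2, num_buckets=3, depth=2))  # 18
```

Averaging over k samples and requiring several overflows filters out transient spikes, which is why a bucket scheme tends to trigger less often than a detector that reacts to the response-time distribution directly.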

International Symposium on Software Reliability Engineering | 2015

WAP: Models and metrics for the assessment of critical-infrastructure-targeted malware campaigns

Michael Grottke; Alberto Avritzer; Daniel Sadoc Menasché; Javier Alonso; Leandro Pfleger de Aguiar; Sara G. Alvarez

Ensuring system survivability in the wake of advanced persistent threats is a major challenge facing the security community in protecting critical infrastructure. In this paper, we define metrics and models for assessing coordinated, massive malware campaigns that target critical infrastructure sectors. First, we develop an analytical model that captures the effect of a node's neighborhood on different metrics (infection probability and contagion probability). Then, we assess the impact of putting operational but possibly infected nodes into quarantine. Finally, we study the implications of scanning nodes for early detection of malware (e.g., worms), accounting for false positives and false negatives. Evaluating our methodology on a small four-node topology, we find that malware infections can be effectively contained by using quarantine and appropriate rates of scanning for soft impacts.

International Symposium on Software Reliability Engineering | 2012

Software Rejuvenation: Do IT & Telco Industries Use It?

Javier Alonso; Antonio Bovenzi; Jinghui Li; Yakun Wang; Stefano Russo; Kishor S. Trivedi

Software rejuvenation has been addressed in hundreds of papers since it was proposed in 1995 by Huang et al. The growing number of research papers shows the importance of this topic. However, no paper has yet studied software rejuvenation in the real world. This paper investigates to what extent software rejuvenation techniques are integrated into IT and Telco solutions. For this purpose, we conducted an intensive search of different sources, such as company product websites, technical papers, white papers, US patents, and consultant surveys. The results show that IT and Telco companies develop software rejuvenation solutions to deal with software aging. The number of US patents addressing this issue confirms industry's interest in developing mechanisms to deal with software aging-related failures. We observed that deployed software rejuvenation solutions mainly use time-based or threshold-based policies, while the US patents focus on predictive approaches.
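The two policy families found in deployed solutions differ only in what triggers the restart: elapsed uptime or a monitored resource metric. A minimal sketch of both follows; the class names, the 24-hour interval, and the 90% memory threshold are hypothetical examples rather than values reported in the survey.

```python
from dataclasses import dataclass


@dataclass
class TimeBasedPolicy:
    """Rejuvenate once uptime reaches a fixed interval."""
    interval_s: float

    def should_rejuvenate(self, uptime_s: float, resource_usage: float) -> bool:
        return uptime_s >= self.interval_s


@dataclass
class ThresholdPolicy:
    """Rejuvenate once a monitored resource metric crosses a threshold."""
    threshold: float

    def should_rejuvenate(self, uptime_s: float, resource_usage: float) -> bool:
        return resource_usage >= self.threshold


# Hypothetical example: restart every 24 h, or when memory use reaches 90%.
policies = [TimeBasedPolicy(interval_s=24 * 3600), ThresholdPolicy(threshold=0.90)]
state = {"uptime_s": 10 * 3600, "resource_usage": 0.93}
print([type(p).__name__ for p in policies if p.should_rejuvenate(**state)])  # ['ThresholdPolicy']
```

A predictive approach, as favored in the patents, would replace the fixed threshold check with a forecast of when the resource will be exhausted.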

Journal of Systems Engineering and Electronics | 2015

Neural network based approach for time to crash prediction to cope with software aging

Moona Yakhchi; Javier Alonso; Mahdi Fazeli; Amir Akhavan Bitaraf; Ahmad Patooghy

Recent studies have shown that software is one of the main causes of computer system unavailability. A growing accumulation of software errors over time causes a phenomenon called software aging. This phenomenon can result in system performance degradation and eventually a system hang or crash. To cope with software aging, software rejuvenation has been proposed. Software rejuvenation is a proactive technique that removes the accumulated software errors by stopping the system, cleaning up its internal state, and resuming its normal operation. One of the main challenges of software rejuvenation is accurately predicting the time to crash due to aging factors such as memory leaks. In this paper, different machine learning techniques are compared in terms of how accurately they predict the software time to crash under different aging scenarios. The comparison shows that the multilayer perceptron neural network has the highest prediction accuracy among all the techniques studied.
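A multilayer perceptron is beyond a short self-contained example, so as a point of comparison the sketch below shows the simplest kind of time-to-crash predictor: a least-squares line through free-memory samples under a memory leak, extrapolated to a crash level. This is a swapped-in baseline, not the paper's multilayer perceptron and not necessarily one of the techniques it compares; the function name, the crash level, and the data are hypothetical.

```python
def predict_crash_time(times, free_memory, crash_level=0.0):
    """Extrapolate a least-squares line through free-memory samples.

    Returns the time at which free memory is predicted to reach crash_level,
    or None if the fitted trend is not decreasing.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_m = sum(free_memory) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(times, free_memory))
    var = sum((t - mean_t) ** 2 for t in times)
    if var == 0:
        raise ValueError("sample times must not all be equal")
    slope = cov / var
    if slope >= 0:
        return None  # no depletion trend, so no predicted crash
    intercept = mean_m - slope * mean_t
    return (crash_level - intercept) / slope


# Hypothetical leak: free memory (MB) drops by 2 MB per minute from 100 MB.
print(predict_crash_time([0, 1, 2, 3, 4], [100, 98, 96, 94, 92]))  # 50.0
```

A linear fit works only when the leak rate is constant; nonlinear models such as neural networks become attractive when aging depends on varying workload.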
