Axel W. Krings | Researchain

Publication

Featured research published by Axel W. Krings.

IEEE Aerospace Conference | 2008

Multivariate Survival Analysis (I): Shared Frailty Approaches to Reliability and Dependence Modeling

Zhanshan (Sam) Ma; Axel W. Krings

The latest advances in survival analysis have been centered on multivariate systems. Multivariate survival analysis has two major categories of models: multi-state modeling and shared frailty modeling. Multi-state models, although formulated differently in the two fields, have been extensively studied in reliability analysis in the context of Markov chain analysis. In contrast, shared frailty modeling seems little known in reliability analysis and computer science. In this article, we focus exclusively on shared frailty modeling. Shared frailty refers to the often-unobserved factors or risks responsible for common-risks dependence between multiple events. It is well recognized as the most effective modeling approach for addressing common-risks dependence and, more recently, event-related dependence. The only type of dependence that the frailty approach does not cover is the common-events type, which is best addressed by multi-state modeling. We argue that shared frailty modeling is not only perfectly applicable to engineering reliability but also has significant potential in other fields of computer science, such as networking and software reliability and survivability, machine learning, and prognostics and health management (PHM).
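
As a rough illustration of the shared frailty idea, the units in a group share one unobserved multiplier on their hazard. The form below is the standard textbook one, not necessarily the exact formulation used in the article, and the symbols $h_0$, $\beta$, $x_{ij}$, $z_i$, and $\theta$ are notation assumed here:

```latex
% Conditional hazard of unit j in group i, given the shared frailty z_i
h_{ij}(t \mid z_i) = z_i \, h_0(t) \, \exp\!\left(\beta^{\top} x_{ij}\right)

% Assuming z_i is gamma distributed with mean 1 and variance \theta,
% integrating out z_i gives the joint survival function of the n units in a group
S(t_1, \dots, t_n)
  = \left( 1 + \theta \sum_{j=1}^{n} H_0(t_j) \exp\!\left(\beta^{\top} x_{j}\right) \right)^{-1/\theta},
\qquad
H_0(t) = \int_0^t h_0(u)\, du
```

Under this gamma assumption, a larger $\theta$ means stronger dependence between the units of a group, and the units become independent as $\theta \to 0$.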

IEEE Transactions on Dependable and Secure Computing | 2009

Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing

Samir Jafar; Axel W. Krings; Thierry Gautier

Large applications executing on Grid or cluster architectures consisting of hundreds or thousands of computational nodes create problems with respect to reliability. The sources of these problems are node failures and the need for dynamic configuration over extensive run-time. This paper presents two fault-tolerance mechanisms called theft-induced checkpointing and systematic event logging. These are transparent protocols capable of overcoming problems associated with both benign faults, i.e., crash faults, and node or subnet volatility. Specifically, the protocols base the state of the execution on a dataflow graph, allowing for efficient recovery in dynamic heterogeneous systems as well as multi-threaded applications. By allowing recovery even under different numbers of processors, the approaches are especially suitable for applications with a need for adaptive or reactionary configuration control. The low-cost protocols offer the capability of controlling or bounding the overhead. A formal cost model is presented, followed by an experimental evaluation. It is shown that the overhead of the protocols is very small and that the maximum work lost by a crashed process is small and bounded.

Hawaii International Conference on System Sciences | 2003

A simple GSPN for modelling common mode failures in critical infrastructures

Axel W. Krings; Paul W. Oman

It is now apparent that our nation's infrastructures and essential utilities have been optimized for reliability in benign operating environments. As such, they are susceptible to cascading failures induced by relatively minor events such as weather phenomena, accidental damage to system components, and/or cyber attack. In contrast, survivable complex control structures should and could be designed to lose sizable portions of the system and still maintain essential control functions. This paper discusses the need for defining independent, survivable software control systems for automated regulation of critical infrastructures like electric power, telecommunications, and emergency communications systems. To exemplify the issue we describe an actual power blackout, and use that description to identify and analyze common mode faults leading to the cascading failure. We suspect that sources of common mode faults in real-time control systems are widespread and many, so we define modelling primitives that allow us to use generalized stochastic Petri nets (GSPN) for representing interdependency failures in very simple control systems. As such, this work provides the initial step toward creating a framework for modelling and analyzing reliability and survivability characteristics of critical infrastructures with both hardware and software controls.

Proceedings. The Second NASA/DoD Workshop on Evolvable Hardware | 2000

The test vector problem and limitations to evolving digital circuits

Kosuke Imamura; James A. Foster; Axel W. Krings

How do we know the correctness of an evolved circuit? While Evolutionary Hardware is exhibiting its effectiveness, we argue that it is very difficult to design a large-scale digital circuit by conventional evolutionary techniques alone, if we are using a subset of the entire truth table for fitness evaluation. The test vector generation problem for testing VLSI (Very Large Scale Integration) suggests that there is no efficient way to determine a training set which assures full correctness of an evolved circuit.
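
The argument can be made concrete with a toy case. The sketch below is not taken from the paper: the 3-input majority target, the `candidate` circuit, and the choice of training vectors are all hypothetical. It shows a circuit that scores perfectly on a partial truth table yet is wrong on the full one, and counts how many Boolean functions a partial training set cannot tell apart.

```python
from itertools import product

N_INPUTS = 3

def target(bits):
    """Reference behavior (assumed for illustration): 3-input majority."""
    a, b, c = bits
    return int(a + b + c >= 2)

def candidate(bits):
    """Hypothetical evolved circuit: simply passes the first input through."""
    return bits[0]

def fitness(circuit, vectors):
    """Fraction of the given test vectors on which the circuit matches the target."""
    return sum(circuit(v) == target(v) for v in vectors) / len(vectors)

def consistent_function_count(n_inputs, n_training):
    """Number of Boolean functions that agree with the target on n_training
    distinct vectors: each of the remaining truth-table rows is unconstrained."""
    return 2 ** (2 ** n_inputs - n_training)

# Full truth table (8 rows) and a training subset that omits two rows.
ALL_VECTORS = list(product((0, 1), repeat=N_INPUTS))
TRAINING = [v for v in ALL_VECTORS if v not in {(0, 1, 1), (1, 0, 0)}]

if __name__ == "__main__":
    print("fitness on training subset:", fitness(candidate, TRAINING))
    print("fitness on full truth table:", fitness(candidate, ALL_VECTORS))
    print("functions consistent with training set:",
          consistent_function_count(N_INPUTS, len(TRAINING)))
```

In this toy case the candidate is indistinguishable from the target on the six training vectors, and the number of indistinguishable functions doubles with every truth-table row left out of the training set.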

Modeling, Analysis and Simulation of Wireless and Mobile Systems | 2008

Dynamic hybrid fault models and the applications to wireless sensor networks (WSNs)

Zhanshan (Sam) Ma; Axel W. Krings

In this paper, we introduce a new concept termed dynamic hybrid fault models, together with the mathematical models and approaches for applying the new concept to reliability and fault tolerance analyses. It extends the traditional hybrid fault models and their relevant constraints in agreement algorithms with survival analysis and evolutionary game theory. The new dynamic hybrid fault models (i) transform hybrid fault models into time- and covariate-dependent models; (ii) make real-time prediction of reliability more realistic and allow for real-time prediction of fault tolerance; (iii) set the foundations for integrating hybrid fault models with reliability and survivability analyses by introducing evolutionary game modeling; and (iv) extend evolutionary game theory in its modeling of the survival of game players. The application domain is wireless sensor networks (WSNs), but a large part of the modeling architecture also applies to general engineering reliability and network survivability.

IEEE Aerospace Conference | 2008

Competing Risks Analysis of Reliability, Survivability, and Prognostics and Health Management (PHM)

Zhanshan (Sam) Ma; Axel W. Krings

Competing risks analysis is a field of applied statistics with research dating back to the eighteenth century. Starting in the 1980s, the interaction with survival analysis has led to significant advances in competing risks analysis, especially in dealing with the dependency and identifiability issues, which are often intermingled and have been the focus of the controversy surrounding classical competing risks analysis. The usefulness of competing risks analysis in engineering reliability has been recognized since the 1960s, and several important models in competing risks analysis were developed in the context of reliability modeling [e.g., Marshall-Olkin (1967) model]. However, the interaction between competing risks analysis and reliability gradually withered during the period when significant advances were made in competing risks analysis. Consequently, it seems that the application of competing risks analysis in engineering reliability has fallen behind the theory of competing risks analysis. In particular, the advances in dependence and identifiability research are highly significant for the reliability field. We hope that this review article will contribute to the reestablishment of the connections between competing risks analysis and engineering reliability. In perspective, we suggest that competing risks analysis has great potential in other fields of computer science and engineering, besides engineering reliability. In particular, network reliability and survivability, software reliability and test measurements, and prognostics and health management stand out as fields with very compelling reasons for further exploration.

European Conference on Parallel Processing | 2005

A checkpoint/recovery model for heterogeneous dataflow computations using work-stealing

Samir Jafar; Thierry Gautier; Axel W. Krings; Jean-Louis Roch

This paper presents a new checkpoint/recovery method for dataflow computations using work-stealing in heterogeneous environments as found in grid or cluster computing. Basing the state of the computation on a dynamic macro dataflow graph, it is shown that the mechanisms provide effective checkpointing for multithreaded applications in heterogeneous environments. Two methods, Systematic Event Logging and Theft-Induced Checkpointing, are presented that are efficient and extremely flexible under the system-state model, allowing for recovery on different platforms with different numbers of processors. A formal analysis of the overhead induced by both methods is presented, followed by an experimental evaluation in a large cluster. It is shown that both methods have very small overhead and that trade-offs between checkpointing and recovery cost can be controlled.

Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004. | 2004

Analyzing the security and survivability of real-time control systems

Paul W. Oman; Axel W. Krings; D. Conte de Leon; Jim Alves-Foss

Many problems found in complex real-time control systems can be transformed into graph and scheduling problems, thereby inheriting a wealth of potential solutions and prior knowledge. This paper describes a transformation from a real-time control system problem into a graph-theoretical formulation in order to leverage existing knowledge of graph theory back into the real-world network being analyzed. We use a five-step transformation that converts an example electric power SCADA system into a graph model that allows for solutions derived from graph algorithms. Physical and logical characteristics of the SCADA system are represented within the model in a manner that permits manipulation of the network data. System vulnerabilities are identified and compared via graph algorithms prior to transformation back into the real-time control system problem space. The SCADA system analysis serves as an example of exploiting graph representations and algorithms in order to encapsulate and simplify complex problems into manageable and quantifiable models.
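
A minimal sketch of the general idea, under stated assumptions: once a control network is expressed as a graph, a standard graph question such as "which single node disconnects the network if it fails?" becomes easy to answer. The topology and node names below are hypothetical, and the single-point-of-failure check is one common graph technique, not necessarily one of the algorithms used in the paper.

```python
from collections import deque

def is_connected(adj, removed=None):
    """Breadth-first search connectivity check, ignoring the node `removed`."""
    nodes = [n for n in adj if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)

def single_points_of_failure(adj):
    """Nodes whose removal disconnects the remaining graph (articulation points),
    found by brute force: remove each node in turn and re-check connectivity."""
    return sorted(n for n in adj if not is_connected(adj, removed=n))

# Hypothetical SCADA-like topology as an undirected adjacency map.
SCADA = {
    "control_center": {"gateway_a", "gateway_b"},
    "gateway_a": {"control_center", "substation_1"},
    "gateway_b": {"control_center", "substation_1"},
    "substation_1": {"gateway_a", "gateway_b", "rtu_1"},
    "rtu_1": {"substation_1", "breaker_1"},
    "breaker_1": {"rtu_1"},
}

if __name__ == "__main__":
    print(single_points_of_failure(SCADA))
```

In this made-up topology the redundant gateways protect the path to the substation, while the substation and the remote terminal unit remain single points of failure. The brute-force check is quadratic in the graph size, which is acceptable for small models; a depth-first articulation-point algorithm would scale better.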

European Journal of Operational Research | 2005

A graph based model for survivability applications

Axel W. Krings; Azad H. Azadmanesh

Many problems found in standard security and survivability applications can be transformed into graph and scheduling problems, thereby opening up the problems to a wealth of potential solutions or knowledge of limitations, infeasibility, scalability or intractability. This paper introduces a model to aid in the design, analysis, or operation of applications with security and survivability concerns. Specifically, a five-step model is presented that transforms such applications into a parameterized graph model that, together with model abstraction and representations, can be the basis for solutions derived from graph and scheduling algorithms. A reverse transformation translates the solutions back to the application domain. The model is demonstrated using migratory agent security and fault-tolerant agreement and their transformation into chain-constrained and group scheduling problems, respectively.

Hawaii International Conference on System Sciences | 1999

The Byzantine agreement problem: optimal early stopping

Axel W. Krings; Thomas Feyer

This paper addresses solutions to the problem of reaching agreement in the presence of faults. Whereas the need for agreement has surfaced mainly in fault-tolerant real-time applications, agreement can be a useful mechanism in network security to mask intrusions. However, due to the communication overhead involved and the fact that the system is expected to operate without problems most of the time, early stopping algorithms are of special interest. We introduce a non-authenticated early stopping algorithm that is optimal in terms of rounds and the number of processors in the system. The basic idea of the algorithm is closely related to the work of P. Berman et al. (1992). However, our algorithm is easier to implement because of its algorithmic definition. It is directly derived from the algorithm by L. Lamport et al. (1982) and is based on two simple functions only. We are convinced that the construction of the early stopping algorithm presented in this paper increases understanding and clarifies the underlying problems of early stopping.
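
To give a feel for why early stopping matters, the sketch below computes the bounds commonly cited in the agreement literature: more than 3t processors to tolerate t faults without authentication, and min(f + 2, t + 1) rounds when only f of the t tolerated faults actually occur. The abstract does not state these numbers; they are assumed here to be what "optimal" refers to, and the function names are hypothetical.

```python
def min_processors(t):
    """Smallest system size for non-authenticated Byzantine agreement
    tolerating up to t faulty processors, using the classic n > 3t bound."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 3 * t + 1

def early_stopping_rounds(t, f):
    """Rounds needed by a round-optimal early stopping algorithm when f of the
    at most t tolerated faults actually occur: min(f + 2, t + 1).
    This bound is an assumption taken from the general literature."""
    if not 0 <= f <= t:
        raise ValueError("require 0 <= f <= t")
    return min(f + 2, t + 1)

if __name__ == "__main__":
    t = 3
    print("processors needed:", min_processors(t))
    for f in range(t + 1):
        print(f"actual faults f={f}: rounds={early_stopping_rounds(t, f)}")
```

With t = 3, a fault-free run under these bounds finishes in 2 rounds instead of the worst-case 4, which is the practical appeal when the system operates without problems most of the time.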
