F. Di Giandomenico | Researchain

Publication

Featured research published by F. Di Giandomenico.

Journal of Systems Architecture | 2000

The meaning and role of value in scheduling flexible real-time systems

Alan Burns; Divya Prasad; Andrea Bondavalli; F. Di Giandomenico; Krithi Ramamritham; John A. Stankovic; Lorenzo Strigini

The real-time community is devoting considerable attention to flexible scheduling and adaptive systems. One popular means of increasing the flexibility, and hence effectiveness, of real-time systems is to use value-based scheduling. It is surprising, however, how little attention has been devoted in the scheduling field to the actual assignment of value. This paper deals with value assignment: it presents a framework for undertaking value-based scheduling and advises on the different methods that are available. A distinction is made between ordinal and cardinal value functions. Appropriate techniques from utility theory are reviewed. An approach based on constant value modes is introduced and evaluated via a case example.

IEEE Transactions on Computers | 2000

Threshold-based mechanisms to discriminate transient from intermittent faults

Andrea Bondavalli; Silvano Chiaradonna; F. Di Giandomenico; F. Grandoni

This paper presents a class of count-and-threshold mechanisms, collectively named α-count, which are able to discriminate between transient faults and intermittent faults in computing systems. For many years, commercial systems have been using transient fault discrimination via threshold-based techniques. We aim to contribute to the utility of count-and-threshold schemes, by exploring their effects on the system. We adopt a mathematically defined structure, which is simple enough to analyze by standard tools. α-count is equipped with internal parameters that can be tuned to suit environmental variables (such as transient fault rate, intermittent fault occurrence patterns). We carried out an extensive behavior analysis for two versions of the count-and-threshold scheme, assuming, first, exponentially distributed fault occurrences and, then, more realistic fault patterns.
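
The count-and-threshold idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the update rule, so the one used here (add 1 on an erroneous signal, multiply by a decay factor on a correct one) is an assumption, as are the function name `count_and_threshold` and the parameter values `decay` and `threshold`.

```python
def count_and_threshold(signals, decay=0.5, threshold=2.0):
    """Return the index at which a component is flagged, or None.

    signals   -- iterable of booleans, True meaning an erroneous result
    decay     -- hypothetical factor (0..1) applied after a correct result
    threshold -- hypothetical score at which the component is flagged
    """
    score = 0.0
    for step, erroneous in enumerate(signals):
        if erroneous:
            score += 1.0      # each error raises the score
        else:
            score *= decay    # correct results let old errors fade
        if score >= threshold:
            return step       # errors are recurring too often: flag it
    return None               # isolated errors never reached the threshold


# Sparse errors are treated as transient; clustered errors are flagged.
sparse = [True, False, False, False, True, False, False, False]
clustered = [True, False, True, True]
print(count_and_threshold(sparse), count_and_threshold(clustered))
```

Under these assumed parameters, a lower `decay` forgets errors faster and so tolerates more transients, while a lower `threshold` removes suspect components sooner at the cost of more false alarms.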

Symposium on Reliable Distributed Systems | 1990

Adjudicators for diverse-redundant components

F. Di Giandomenico; Lorenzo Strigini

The authors define the adjudication problem, summarize the existing literature on the topic, and investigate the use of probabilistic knowledge about error/faults in the subcomponents of a fault-tolerant component to obtain good adjudication functions. They prove the existence of an optimal adjudication function, which is useful both as an upper bound on the probability of correctly adjudged obtainable output and as a guide for design decisions.

Symposium on Reliable Distributed Systems | 2006

Hidden Markov Models as a Support for Diagnosis: Formalization of the Problem and Synthesis of the Solution

Alessandro Daidone; F. Di Giandomenico; Silvano Chiaradonna; Andrea Bondavalli

In modern information infrastructures, diagnosis must be able to assess the status or the extent of the damage of individual components. Traditional one-shot diagnosis is not adequate; instead, streams of data on component behavior need to be collected and filtered over time, as done by some existing heuristics. This paper proposes instead a general framework and a formalism to model such over-time diagnosis scenarios, and to find appropriate solutions. As such, it is very beneficial to system designers to support design choices. Taking advantage of the characteristics of the hidden Markov models formalism, widely used in pattern recognition, the paper proposes a formalization of the diagnosis process, addressing the complete chain constituted by monitored component, deviation detection and state diagnosis. Hidden Markov models are well suited to represent problems where the internal state of a certain entity is not known and can only be inferred from external observations of what this entity emits. Such over-time diagnosis is a first-class representative of this category of problems. The accuracy of diagnosis carried out through the proposed formalization is then discussed, as well as how to concretely use it to perform state diagnosis and allow direct comparison of alternative solutions.
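
As a rough illustration of inferring a hidden component state from a stream of observations, the sketch below applies the standard hidden Markov model forward (filtering) recursion. It is not the paper's formalization: the two states (healthy, faulty), the two observation symbols (no deviation, deviation), and every probability value are hypothetical.

```python
def forward_filter(observations, initial, transition, emission):
    """Return P(state | observations so far) using the HMM forward recursion.

    initial[i]       -- prior probability of state i
    transition[i][j] -- probability of moving from state i to state j
    emission[j][o]   -- probability of observing symbol o in state j
    Assumes every observation has non-zero probability in some state.
    """
    belief = list(initial)
    n = len(belief)
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Correct: weight each state by how well it explains the observation.
        weighted = [predicted[j] * emission[j][obs] for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
    return belief


# Hypothetical model: state 0 = healthy, state 1 = faulty;
# observation 0 = no deviation detected, 1 = deviation detected.
initial = [0.99, 0.01]
transition = [[0.99, 0.01], [0.0, 1.0]]
emission = [[0.95, 0.05], [0.2, 0.8]]

print(forward_filter([1, 1, 1], initial, transition, emission))
```

With these assumed numbers, a single detected deviation leaves the component most likely healthy, whereas three in a row push the faulty-state probability close to 1, which is the over-time filtering behavior the abstract contrasts with one-shot diagnosis.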

IEEE International Symposium on Fault-Tolerant Computing | 1997

Discriminating fault rate and persistency to improve fault treatment

Andrea Bondavalli; Silvano Chiaradonna; F. Di Giandomenico; F. Grandoni

This paper addresses the consolidated identification of faults, distinguished as transient or permanent/intermittent. Transient fault discrimination has long been performed in commercial systems: threshold-based techniques have been in practice for several years for this purpose. The present work aims to contribute to the usefulness of the count-and-threshold scheme, through the analysis of its behaviour and the exploration of its effects on the system. To this end, the scheme is mechanized as a device named α-count, endowed with a few controllable parameters. α-count tries to balance between two conflicting requirements: to keep in the system those components that have experienced just transient faults; and to remove quickly those affected by permanent or intermittent faults. Analytical models are derived, allowing detailed study of α-count's behaviour; the actual evaluation, in a range of configurations, is performed by standard tools, in terms of the delay in spotting faulty components and the probability of improperly blaming correct ones.

High Assurance Systems Engineering | 1998

Optimal discrimination between transient and permanent faults

M. Pizza; Lorenzo Strigini; Andrea Bondavalli; F. Di Giandomenico

An important practical problem in fault diagnosis is discriminating between permanent faults and transient faults. In many computer systems, the majority of errors are due to transient faults. Many heuristic methods have been used for discriminating between transient and permanent faults; however, we have found no previous work stating this decision problem in clear probabilistic terms. We present an optimal procedure for discriminating between transient and permanent faults, based on applying Bayesian inference to the observed events (correct and erroneous results). We describe how the assessed probability that a module is permanently faulty must vary with observed symptoms. We describe and demonstrate our proposed method on a simple application problem, building the appropriate equations and showing numerical examples. The method can be implemented as a run-time diagnosis algorithm at little computational cost; it can also be used to evaluate any heuristic diagnostic procedure by comparison.
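
The Bayesian reasoning described above can be sketched with a deliberately simplified model. This is not the paper's set of equations: it assumes a static two-hypothesis setting (the module either is or is not permanently faulty throughout), and the prior and the per-run error probabilities `p_err_perm` and `p_err_trans` are hypothetical values chosen only for illustration.

```python
def posterior_permanent(observations, prior=0.01,
                        p_err_perm=0.9, p_err_trans=0.05):
    """Track P(module is permanently faulty) after each observed result.

    observations -- iterable of booleans, True meaning an erroneous result
    prior        -- assumed initial probability of a permanent fault
    p_err_perm   -- assumed error probability per run if permanently faulty
    p_err_trans  -- assumed error probability per run from transients only
    """
    belief = prior
    history = []
    for erroneous in observations:
        # Likelihood of this observation under each hypothesis.
        like_perm = p_err_perm if erroneous else 1.0 - p_err_perm
        like_trans = p_err_trans if erroneous else 1.0 - p_err_trans
        # Bayes' rule, normalized over the two hypotheses.
        numerator = like_perm * belief
        belief = numerator / (numerator + like_trans * (1.0 - belief))
        history.append(belief)
    return history


# One error followed by correct results, then a run of errors.
for label, obs in [("isolated", [True, False, False]),
                   ("repeated", [True, True, True])]:
    print(label, [round(p, 4) for p in posterior_permanent(obs)])
```

In this toy setting each erroneous result raises the assessed probability and each correct result lowers it, which mirrors the abstract's point that the assessment must vary with the observed symptoms.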

IEEE Transactions on Mobile Computing | 2003

Service-level availability estimation of GPRS

Stefano Porcarelli; F. Di Giandomenico; Andrea Bondavalli; Massimo Barbera; Ivan Mura

The General Packet Radio Service (GPRS) extends the Global System for Mobile Communications (GSM) by introducing a packet-switched transmission service. This paper analyzes the GPRS behavior under critical conditions. In particular, we focus on outages, which significantly impact the GPRS dependability. In fact, during outage periods, the cumulative number of users trying to access the service grows proportionally over time. When the system resumes its operations, the overload caused by accumulated users determines a higher probability of collisions in resource assignment and, therefore, a degradation of the overall QoS. This paper adopts a stochastic activity network modeling approach for evaluating the dependability of a GPRS network under outage conditions. The major contribution of this study lies in the novel perspective in which the dependability study is framed. Starting from a quite classical availability analysis, the network dependability figures are incorporated into a very detailed service model that is used to analyze the overload effect GPRS has to face after outages, gaining deep insight into its impact on user-perceived QoS. The result of this modeling is an enhanced availability analysis, which takes into account not only the bare estimation of unavailability periods, but also the important congestion phenomenon following outages that contributes to service degradation for a certain period of time after operations resume.

International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing | 1998

State restoration in a COTS-based N-modular architecture

Andrea Bondavalli; F. Di Giandomenico; F. Grandoni; David Powell; Christophe Rabéjac

Mechanisms for restoring the state of a channel in an N-modular redundant architecture are necessary to prevent redundancy attrition due to transient faults and to allow failed channels to be brought back on line after repair. This paper considers software-implemented mechanisms for state restoration (SR) in a generic fault-tolerant architecture in which both the underlying hardware and operating system are commercial off-the-shelf (COTS) components. State restoration involves copying the values of state variables from the active channel(s) across to the joining channel. Concurrent updating of state variables by application tasks is considered. Two state restoration schemes are considered: Running SR and Recursive SR. In the former, each state variable is copied exactly once while concurrent updates are written through to the joining channel. In the latter, state variables are copied once and then recopied recursively until no concurrent updates are detected.
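
The recopy-until-quiet idea behind the second scheme can be sketched as a small single-threaded simulation. This is only an illustration of the loop structure described in the abstract, not the paper's mechanism: the function name `recursive_sr`, the dictionary representation of channel state, and the `updates_per_round` schedule standing in for concurrent application writes are all hypothetical.

```python
def recursive_sr(active, joining, updates_per_round):
    """Copy state from `active` to `joining`, recopying updated variables.

    active            -- dict of state variables on the active channel
    joining           -- dict that receives the restored state
    updates_per_round -- list of dicts; entry k holds the (simulated)
                         application writes that land during copy round k
    Returns the number of copy rounds performed.
    """
    pending = set(active)          # first round copies every variable
    schedule = iter(updates_per_round)
    rounds = 0
    while pending:
        for name in pending:
            joining[name] = active[name]
        # Simulated concurrent updates that arrived during this round.
        dirty = set()
        for name, value in next(schedule, {}).items():
            active[name] = value
            dirty.add(name)
        pending = dirty            # only updated variables are recopied
        rounds += 1
    return rounds


active = {"a": 1, "b": 2}
joining = {}
rounds = recursive_sr(active, joining, [{"a": 10}, {}])
print(rounds, joining)
```

In this simulation the loop ends as soon as a round completes with no updates, so termination depends on the update rate eventually dropping to zero within a round; the schedule here is finite, which guarantees that.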

Symposium on Reliable Distributed Systems | 1991

Flexible schemes for application-level fault tolerance

Lorenzo Strigini; F. Di Giandomenico

It is pointed out that the design of fault-tolerance provisions in the application level is normally necessary, but difficult and error-prone due to its ad-hoc nature. Structuring schemes have been proposed to reduce the difficulty of this task, but they appear too restrictive for the building of large, heterogeneous applications. The redundant structures that can be used in the individual components of a system depend on their requirements or inherent characteristics; it would be useful to combine components using different basic schemes. As an example, the authors propose a solution for interfacing components using conversations for backward recovery with components using atomic transactions. Constraints for the designers of the components to be interfaced and requirements on the virtual machine supporting their execution are defined. Ways of organizing a classification of components to allow the formulation of more general solutions are discussed.

Journal of Systems Architecture | 2002

An adaptive approach to achieving hardware and software fault tolerance in a distributed computing environment

Andrea Bondavalli; Silvano Chiaradonna; F. Di Giandomenico; Jie Xu

This paper focuses on the problem of providing tolerance to both hardware and software faults in independent applications running on a distributed computing environment. Several hybrid-fault-tolerant architectures are identified and proposed. Given the highly varying and dynamic characteristics of the operating environment, solutions are developed mainly exploiting the adaptation property. They are based on the adaptive execution of redundant programs so as to minimise hardware resource consumption and to shorten response time, as much as possible, for a required level of fault tolerance. A method is introduced for evaluating the proposed architectures with respect to reliability, resource utilisation and response time. Examples of quantitative evaluations are also given.
