Diamantino Costa
University of Coimbra
Publication
Featured research published by Diamantino Costa.
dependable systems and networks | 2000
Henrique Madeira; Diamantino Costa; Marco Vieira
This paper presents an experimental study on the emulation of software faults by fault injection. In a first experiment, a set of real software faults was compared with faults injected by a SWIFI tool (Xception) to evaluate the accuracy of the injected faults. Results revealed the limitations of Xception (and other SWIFI tools) in the emulation of different classes of software faults (about 44% of the software faults cannot be emulated). The use of field data about real faults is discussed, and software metrics are suggested as an alternative to guide the injection process when field data is not available. In a second experiment, a set of rules for the injection of errors meant to emulate classes of software faults was evaluated. The fault triggers used seem to be the cause of the observed strong impact of the faults on the target system and on the program results. The results also show that aspects such as code size, complexity of data structures, and recursive versus sequential execution influence fault emulation.
dependable systems and networks | 2002
Henrique Madeira; Raphael R. Some; Francisco Moreira; Diamantino Costa; David A. Rennels
This paper evaluates the impact of transient errors in the operating system of a COTS-based system (CETIA board with two PowerPC 750 processors running LynxOS) and quantifies their effects at both the OS and the application level. The study was conducted using a Software-Implemented Fault Injection tool (Xception), with both realistic programs and synthetic workloads (to focus on specific OS features). The results provide a comprehensive picture of the impact of faults on LynxOS key features (process scheduling and the most frequent system calls), data integrity, error propagation, application termination, and correctness of application results.
ieee international symposium on fault tolerant computing | 1996
João Gabriel Silva; Joao Carreira; Henrique Madeira; Diamantino Costa; P. Moreira
In the research reported in this paper, transient faults were injected in the nodes and in the communication subsystem (by using software fault injection) of a commercial parallel machine running several real applications. The results showed that a significant percentage of faults caused the system to produce wrong results while the application seemed to terminate normally, thus demonstrating that fault tolerance techniques are required in parallel systems, not only to ensure that long-running applications can terminate but also (and more importantly) that the results produced are correct. Of the techniques tested to reduce the percentage of undetected wrong results, only ABFT proved to be effective. For other simple error detection methods to be effective, they have to be designed in, and not added as an afterthought. Faults injected in the communication subsystem proved the effectiveness of end-to-end CRCs on the data movements between processors.
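As a rough illustration of the ABFT (algorithm-based fault tolerance) idea mentioned in this abstract, the sketch below applies a column checksum to a matrix product and shows the checksum catching a single corrupted element. It is a minimal example under stated assumptions, not the scheme evaluated in the paper: the function names, the tiny integer matrices, and the choice of matrix multiplication as the protected computation are all hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists with integer entries."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of `a`."""
    return a + [[sum(col) for col in zip(*a)]]


def checksum_ok(c_full):
    """True when the last row of `c_full` equals the column sums of the rows above it."""
    *data, check = c_full
    return [sum(col) for col in zip(*data)] == check


# Hypothetical input matrices, chosen only for illustration.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Multiply the checksum-extended A by B; the checksum row is carried through the product.
c_full = matmul(with_column_checksum(a), b)
clean_ok = checksum_ok(c_full)

# Emulate a transient fault: flip bit 3 of one element of the result.
c_full[0][1] ^= 1 << 3
faulty_ok = checksum_ok(c_full)
```

Here `clean_ok` holds for the fault-free product, while the flipped bit breaks the checksum relation, so a result that would otherwise look like a normal termination is flagged as wrong.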
dependable systems and networks | 2000
Diamantino Costa; Tiago Rilho; Henrique Madeira
Presents and discusses observed failure modes of a commercial off-the-shelf (COTS) database management system (DBMS) in the presence of transient operational faults induced by SWIFI (software-implemented fault injection). The Transaction Processing Performance Council (TPC) standard TPC-C benchmark and its associated environment are used, together with fault-injection technology, building a framework that discloses both dependability and performance figures. Over 1600 faults were injected in the database server of a client/server computing environment built on the Oracle 8.1.5 database engine and Windows NT running on COTS machines with Intel Pentium processors. A macroscopic view of the impact of faults revealed that: (1) a large majority of the faults caused no observable abnormal impact in the database server (in 96% of hardware faults and 80% of software faults, the database server behaved normally); (2) software faults are more prone to letting the database server hang or to causing abnormal terminations; (3) up to 51% of software faults lead to observable failures in the client processes.
engineering of computer based systems | 2001
Diamantino Costa; Tiago Rilho; Marco Vieira; Henrique Madeira
The paper presents and evaluates a methodology for the emulation of software faults in COTS components using software implemented fault injection (SWIFI) technology. ESFFI (Emulation of Software Faults by Fault Injection) leverages mature fault injection techniques, which have been used so far for the emulation of hardware faults, and adds new features that make possible the insertion of errors mimicking those caused by real software faults. The major advantage of ESFFI over other techniques that also emulate software faults (mutations, for instance) is making fault locations ubiquitous; every software module can be targeted, no matter if it is a device driver running in operating system kernel mode or a third party component whose source code is not available. Experimental results have shown that for specific fault classes, e.g. assignment and checking, the accuracy obtained by this technique is quite good.
dependable systems and networks | 2002
Ricardo Maia; Luis Henriques; Diamantino Costa; Henrique Madeira
Discusses Xception, an automated fault injection environment that enables accurate and flexible V&V (verification & validation) and evaluation of mission and business critical computer systems using fault injection. Xception is designed to accommodate a variety of fault injection techniques (according to a wide range of configurations of the tool) and emulate in this way different classes of faults, with particular emphasis on hardware and software faults.
pacific rim international symposium on dependable computing | 1999
Diamantino Costa; Henrique Madeira
This paper evaluates the behavior of a common off-the-shelf (COTS) database management system (DBMS) in the presence of transient faults. Database applications have traditionally been a field with fault-tolerance needs, concerning both data integrity and availability. While most of the commercially available DBMS provide support for data recovery and fault-tolerance, very limited knowledge was available regarding the impact of transient faults in a COTS database system. In this experimental study, a strict off-the-shelf target system is used (Oracle 7.3 server running on top of a Wintel platform), combined with a TPC-A based workload and a software implemented fault injection tool, XceptionNT. It was found that a non-negligible share of the induced faults, 13%, led to the database server hanging or terminating prematurely. However, the results also show that COTS DBMS products behave reasonably with respect to data integrity: none of the injected faults affected end user data.
emerging technologies and factory automation | 1997
Diamantino Costa; J. Carreira; J.G. Silva
Using off-the-shelf hardware and software in factory computer systems is one attractive way of increasing flexibility and reducing costs in development, maintenance and training. However, targeting factory critical applications for the low-end PC market and general-purpose operating systems such as Microsoft Windows 95/NT poses some problems. These applications are usually required to run 24 hours a day and demand high availability and data integrity. Using systems without any fault tolerance support is unreliable and dangerous. This article presents a software framework, named WinFT, which provides fault tolerance support for Windows applications. WinFT performs automatic detection and restart of failed processes, diagnosis and reboot of a malfunctioning or strangled operating system, checkpointing and recovery of critical volatile data, and preventive actions such as software rejuvenation.
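To make the combination of failure detection, restart, and checkpoint recovery described in this abstract more concrete, here is a minimal sketch. It is not WinFT's design or API: the supervisor runs the task in-process rather than monitoring a separate Windows process, and every name (`supervise`, `save_checkpoint`, `load_checkpoint`, `sum_to_ten`) and the JSON checkpoint format are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the last saved state, or `default` when no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def supervise(task, path, max_restarts=3):
    """Run `task` until it finishes, restarting it from the last checkpoint on failure."""
    restarts = 0
    while True:
        state = load_checkpoint(path, {"next": 0, "total": 0})
        try:
            return task(state, path), restarts
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise


crashed = []  # remembers that the simulated failure has already happened once


def sum_to_ten(state, path):
    """Hypothetical workload: add 0..9, checkpointing after each step; fail once at step 5."""
    for i in range(state["next"], 10):
        if i == 5 and not crashed:
            crashed.append(i)
            raise RuntimeError("simulated process failure")
        state["total"] += i
        state["next"] = i + 1
        save_checkpoint(path, state)
    return state["total"]


with tempfile.TemporaryDirectory() as d:
    total, restarts = supervise(sum_to_ten, os.path.join(d, "ckpt.json"))
```

After the simulated failure the task resumes from step 5 instead of step 0, so the final total matches a fault-free run after a single restart. Writing the checkpoint to a temporary file and renaming it keeps the previous checkpoint intact if a failure happens mid-write.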
Archive | 2003
Diamantino Costa; Henrique Madeira; Joao Carreira; João Gabriel Silva
This chapter addresses Xception, a software-implemented fault injection tool. Among its key features are the use of the advanced debugging and performance monitoring resources available in modern processors to emulate realistic faults by software, and to monitor in detail the activation of the faults and their impact on the target system behaviour. Xception has been used extensively in the field and is one of the very few fault injection tools commercially available and supported.
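The general shape of a software fault injection experiment (a trigger, a transient bit flip, and a comparison against a fault-free run) can be sketched as follows. This is only a toy emulation inside a single program and does not reflect how Xception works, which according to the abstract relies on processor debugging and performance monitoring resources; the workload, the `(step, bit)` fault description, and the outcome labels are all hypothetical.

```python
def flip_bit(value, bit, width=32):
    """Return `value` with one bit inverted, kept within `width` bits."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def checksum(data, fault=None):
    """Hypothetical workload: 32-bit additive checksum of a byte sequence.

    `fault` is an optional (step, bit) pair: when the loop reaches `step`,
    the accumulator has `bit` flipped once, emulating a transient register fault.
    """
    acc = 0
    for step, byte in enumerate(data):
        if fault is not None and step == fault[0]:
            acc = flip_bit(acc, fault[1])
        acc = (acc + byte) & 0xFFFFFFFF
    return acc


def classify(data, fault):
    """Compare a faulty run against the fault-free (golden) run."""
    golden = checksum(data)
    return "correct" if checksum(data, fault) == golden else "wrong result"


data = bytes(range(16))

# Inject at loop step 8, once in the lowest bit and once in the highest bit.
outcomes = {bit: classify(data, (8, bit)) for bit in (0, 31)}

# A trigger that is never reached leaves the run unaffected.
never_triggered = classify(data, (99, 0))
```

A real campaign would repeat this over many trigger points and fault locations and would also record hangs and abnormal terminations, which this sketch does not model.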
worst-case execution time analysis | 2003
Manuel Rodríguez; Nuno Silva; Joao Esteves; Luis Henriques; Diamantino Costa; Niklas Holsti; Kjeld Hjortnaes