Jacob A. Abraham
University of Texas at Austin
Publication
Featured research published by Jacob A. Abraham.
IEEE Transactions on Reliability | 1979
Jacob A. Abraham
Boolean algebra has been used to find the probability of communication between a pair of nodes in a network by starting with Boolean products corresponding to the simple paths between the nodes and making them disjoint (mutually exclusive). A theorem is given that enables the disjoint products to be found much faster than with existing methods. An algorithm and the results of its computer implementation are given. Comparisons with existing methods show the usefulness of the algorithm for large networks.
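To make the idea of disjoint products concrete, here is a minimal Python sketch of a generic sum-of-disjoint-products (SDP) computation using single-variable inversion. It is not necessarily the theorem or algorithm from this paper. It assumes the minimal paths are already known, edges fail independently, and `p` maps each (hypothetical) edge name to its probability of working; the names `disjoint_products` and `reliability` are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

# A term is (up, down): edges that must be working and edges that must have failed.
Term = Tuple[FrozenSet[str], FrozenSet[str]]


def disjoint_products(paths: List[FrozenSet[str]]) -> List[Term]:
    """Sum of disjoint products by single-variable inversion (generic method)."""
    result: List[Term] = []
    for i, path in enumerate(paths):
        terms: List[Term] = [(path, frozenset())]
        for prev in paths[:i]:
            next_terms: List[Term] = []
            for up, down in terms:
                if down & prev:
                    # Term already requires an edge of prev to be down: disjoint.
                    next_terms.append((up, down))
                    continue
                missing = sorted(prev - up)
                if not missing:
                    # Term implies prev, so its states are already counted.
                    continue
                # up AND NOT(x1 x2 ... xk) = x1' + x1 x2' + ... + x1...x(k-1) xk'
                for k, edge in enumerate(missing):
                    next_terms.append((up | frozenset(missing[:k]),
                                       down | frozenset([edge])))
            terms = next_terms
        result.extend(terms)
    return result


def reliability(paths: List[FrozenSet[str]], p: Dict[str, float]) -> float:
    """Probability that at least one path has all of its edges working."""
    total = 0.0
    for up, down in disjoint_products(paths):
        term = 1.0
        for e in up:
            term *= p[e]
        for e in down:
            term *= 1.0 - p[e]
        total += term
    return total


# Bridge network: s-1 (a), s-2 (b), 1-2 (c), 1-t (d), 2-t (e)
bridge = [frozenset("ad"), frozenset("be"), frozenset("ace"), frozenset("bcd")]
print(round(reliability(bridge, {e: 0.9 for e in "abcde"}), 5))  # 0.97848
```

Because every product term is disjoint from all earlier ones, their probabilities can simply be added; for identical edges the bridge network gives 2p² + 2p³ − 5p⁴ + 2p⁵.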
IEEE Transactions on Computers | 1995
Ghani A. Kanawati; Nasser A. Kanawati; Jacob A. Abraham
A major step toward the development of fault-tolerant computer systems is the validation of the dependability properties of these systems. Fault/error injection has been recognized as a powerful approach to validate the fault tolerance mechanisms of a system and to obtain statistics on parameters such as coverages and latencies. This paper describes the methodology and guidelines for the design of flexible software-based fault and error injection and presents a tool, FERRARI, that incorporates these techniques. The techniques used to emulate transient errors and permanent faults in software are described in detail. Experimental results are presented for several error detection techniques; they demonstrate the effectiveness of the software-based error injection tool in evaluating the dependability properties of complex systems.
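The following toy Python sketch illustrates what a software-implemented injection campaign measures; it does not reproduce FERRARI's mechanism. The workload (`target`), the detector (`range_check`), the 32-bit word width and all function names are hypothetical. Each run flips one random bit (a transient error) and classifies the outcome as masked, detected, or undetected; `stuck_at` shows how a permanent fault would instead be applied on every access.

```python
import random
from typing import List, Tuple

WIDTH = 32                   # hypothetical register width
MASK = (1 << WIDTH) - 1


def flip_bit(value: int, bit: int) -> int:
    """Transient error: invert one bit of a WIDTH-bit word."""
    return (value ^ (1 << bit)) & MASK


def stuck_at(value: int, bit: int, level: int) -> int:
    """Permanent fault: force one bit to `level` (applied on every access)."""
    return (value | (1 << bit)) if level else (value & ~(1 << bit))


def target(data: List[int]) -> int:
    """Toy workload: maximum of a list of 16-bit values."""
    return max(data)


def range_check(result: int) -> bool:
    """Toy error detector: a valid result must fit in 16 bits."""
    return 0 <= result < (1 << 16)


def campaign(trials: int, n: int = 8, seed: int = 0) -> Tuple[int, int, int]:
    """Inject one transient bit flip per run; return (masked, detected, undetected)."""
    rng = random.Random(seed)
    masked = detected = undetected = 0
    for _ in range(trials):
        data = [rng.randrange(1 << 16) for _ in range(n)]
        golden = target(data)
        idx, bit = rng.randrange(n), rng.randrange(WIDTH)
        data[idx] = flip_bit(data[idx], bit)
        result = target(data)
        if result == golden:
            masked += 1        # the error never reached the output
        elif not range_check(result):
            detected += 1      # caught by the check
        else:
            undetected += 1    # silent data corruption
    return masked, detected, undetected


masked, detected, undetected = campaign(10_000)
print(f"coverage ≈ {detected / (detected + undetected):.3f}")
```

Coverage here is detected / (detected + undetected), i.e. the fraction of errors that reached the output and were caught by the check.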
IEEE Transactions on Computers | 1988
Jing Yang Jou; Jacob A. Abraham
Two concurrent error detection (CED) schemes are proposed for N-point fast Fourier transform (FFT) networks that consist of log₂ N stages with N/2 two-point butterfly modules per stage. The method assumes that failures are confined to a single complex multiplier or adder, or to one set of input or output lines. Such a fault model covers a broad class of faults. It is shown that the first scheme requires only a small hardware overhead ratio, O(2/log₂ N), for the networks to obtain fault-secure results. A novel data retry technique is used to locate the faulty modules. Large roundoff errors can be detected and treated in the same manner as functional errors, and the retry technique can also distinguish roundoff errors from functional errors caused by physical failures. In the second scheme, a time-redundancy method is used to achieve both error detection and location. It is shown that only negligible hardware overhead is required; however, because of the nature of time-redundancy methods, the throughput is reduced to half that of the original system without error detection and location.
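As an illustration of concurrent checking on an FFT network, the sketch below pairs an iterative radix-2 FFT with an energy (Parseval) check, Σ|X[k]|² = N·Σ|x[n]|². This check is a stand-in chosen for simplicity and is not claimed to be either of the paper's schemes; the `fault` argument, which adds an error `delta` to one line after a given stage, is a hypothetical fault model. The relative tolerance absorbs normal roundoff, so a large roundoff error would be flagged just like a functional one.

```python
import cmath
from typing import List, Optional, Tuple

# Hypothetical fault model: (stage, line, delta) adds `delta` to one line after a stage.
Fault = Tuple[int, int, complex]


def fft(x: List[complex], fault: Optional[Fault] = None) -> List[complex]:
    """Iterative radix-2 decimation-in-time FFT with optional fault injection."""
    n = len(x)
    levels = n.bit_length() - 1
    if n < 2 or 1 << levels != n:
        raise ValueError("length must be a power of two >= 2")
    # Bit-reversed input ordering.
    a = [complex(x[int(format(i, f"0{levels}b")[::-1], 2)]) for i in range(n)]
    for stage in range(levels):
        half = 1 << stage
        size = half * 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u, t = a[start + k], w * a[start + k + half]
                a[start + k], a[start + k + half] = u + t, u - t  # butterfly
        if fault is not None and fault[0] == stage:
            a[fault[1]] += fault[2]  # corrupt one line between stages
    return a


def energy_check(x: List[complex], X: List[complex], rel_tol: float = 1e-9) -> bool:
    """Parseval check: sum|X[k]|^2 must equal N * sum|x[n]|^2 up to roundoff."""
    expected = len(x) * sum(abs(v) ** 2 for v in x)
    observed = sum(abs(v) ** 2 for v in X)
    return abs(observed - expected) <= rel_tol * max(expected, 1.0)


x = [complex(i) for i in range(8)]
print(energy_check(x, fft(x)))                     # True
print(energy_check(x, fft(x, fault=(2, 0, 1.0))))  # False
```

A plain output-sum check (ΣX[k] = N·x[0]) would be cheaper, but an error on the lower input of a final-stage butterfly appears with opposite signs on the two butterfly outputs and cancels in the sum; the energy check does not generally cancel in that way.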
IEEE International Symposium on Fault-Tolerant Computing | 1992
Ghani A. Kanawati; Nasser A. Kanawati; Jacob A. Abraham
The authors present FERRARI, a fault and error automatic real-time injector, which can evaluate complex systems by emulating most hardware faults in software. The current version of FERRARI runs on SPARC workstations in an X Window environment. The motivation, methodology, design, implementation, and evaluation of FERRARI are presented. The techniques used to emulate permanent faults and transient errors in software are described in detail. Experimental results are presented for several error detection techniques. They demonstrate the effectiveness of FERRARI in its role as a fault and error injector.
IEEE Transactions on Parallel and Distributed Systems | 1999
Zeyad Alkhalifa; V. S. S. Nair; Narayanan Krishnamurthy; Jacob A. Abraham
This paper evaluates the concurrent error detection capabilities of system-level checks using fault and error injection. The checks comprise application- and system-level mechanisms to detect control flow errors. We propose Enhanced Control-Flow Checking Using Assertions (ECCA). In ECCA, branch-free intervals (BFIs) in a given high- or intermediate-level program are identified, and the entry and exit points of the intervals are determined. BFIs are then grouped into blocks, whose size is determined through a performance/overhead analysis. The blocks are then fortified with preinserted assertions. For high-level ECCA, we describe an implementation through a preprocessor that automatically inserts the necessary assertions into the program. We then describe an intermediate-level implementation made possible by modifying gcc to make it ECCA-capable. The fault detection capabilities of the checks are evaluated both analytically and experimentally. Fault injection experiments are conducted using FERRARI to determine the fault coverage of the proposed techniques.
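The following Python sketch replays a trace of executed blocks against simplified, ECCA-inspired assertions; it is not a reproduction of ECCA's actual inserted code. As assumptions: each block gets a distinct prime ID, the running signature is the product of the legal successors' IDs, and entering a block asserts that its ID divides that product. The CFG, IDs and function names are hypothetical.

```python
from math import prod
from typing import Dict, List


class ControlFlowError(RuntimeError):
    """Raised when a block is entered from a block that may not precede it."""


def check_trace(cfg: Dict[int, List[int]], entry: int, trace: List[int]) -> None:
    """Replay a block trace against prime-ID assertions (simplified, ECCA-inspired).

    cfg maps each block's prime ID to the prime IDs of its legal successors.
    """
    legal_next = entry  # product of the IDs that may execute next
    for bid in trace:
        # Entry assertion: since IDs are distinct primes, bid divides the
        # product exactly when bid is one of the legal successors.
        if bid not in cfg or legal_next % bid != 0:
            raise ControlFlowError(f"illegal transfer into block {bid}")
        # At block exit: the signature becomes the product of its successors.
        legal_next = prod(cfg[bid])  # prod([]) == 1, so nothing may follow


# if/else: 2 = entry, 3 = then-branch, 5 = else-branch, 7 = join
cfg = {2: [3, 5], 3: [7], 5: [7], 7: []}
check_trace(cfg, 2, [2, 3, 7])  # legal path: passes silently
try:
    check_trace(cfg, 2, [2, 3, 5, 7])  # faulty jump from then- into else-branch
except ControlFlowError as err:
    print(err)  # illegal transfer into block 5
```

Because the IDs are distinct primes, a single integer encodes the whole set of legal successors, so each entry check costs one modulo operation.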
IEEE Transactions on Software Engineering | 1982
Timothy C. K. Chou; Jacob A. Abraham
In a distributed computing system made up of different types of processors, each processor may have different performance and reliability characteristics. To take advantage of this diversity of processing power, the modules of a modular distributed program should be assigned so that the applicable system performance index, such as execution time or cost, is optimized. This paper describes an algorithm for making an optimal module-to-processor assignment for a given performance criterion. We first propose a computational model to characterize distributed programs, consisting of tasks and an operational precedence relationship. This model allows us to describe probabilistic branching as well as concurrent execution in a distributed program. The computational model, along with a set of seven program descriptors, completely specifies a model for the dynamic execution of a program on a distributed system. The optimal task-to-processor assignment is found by an algorithm based on results in Markov decision theory. The algorithm given in this paper is completely general and applicable to N-processor systems.
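A much-simplified Python sketch of such a cost model follows. It does not implement the paper's Markov-decision algorithm or its concurrency model; it assumes purely sequential execution with probabilistic branching, a uniform cost per inter-processor control transfer, and a brute-force search that only scales to tiny instances. `P`, `exec_cost`, `comm_cost` and the function names are illustrative.

```python
from itertools import product
from typing import List, Optional, Tuple


def expected_visits(P: List[List[float]], start: int, eps: float = 1e-15) -> List[float]:
    """Expected number of executions of each module.

    P[i][j] is the probability that module j runs right after module i; a row
    may sum to less than 1, the remainder being the chance the program ends.
    """
    n = len(P)
    visits = [0.0] * n
    frontier = [1.0 if i == start else 0.0 for i in range(n)]
    while max(frontier) > eps:  # assumes the program terminates with prob. 1
        visits = [v + f for v, f in zip(visits, frontier)]
        frontier = [sum(frontier[i] * P[i][j] for i in range(n)) for j in range(n)]
    return visits


def best_assignment(P: List[List[float]], start: int, exec_cost: List[List[float]],
                    comm_cost: float) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustively search module-to-processor assignments for minimum expected cost.

    exec_cost[m][p]: cost of one execution of module m on processor p.
    comm_cost: cost of each control transfer between modules on different processors.
    """
    n, k = len(exec_cost), len(exec_cost[0])
    visits = expected_visits(P, start)
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for assign in product(range(k), repeat=n):
        cost = sum(visits[m] * exec_cost[m][assign[m]] for m in range(n))
        # Expected number of i -> j transfers is visits[i] * P[i][j].
        cost += sum(visits[i] * P[i][j] * comm_cost
                    for i in range(n) for j in range(n) if assign[i] != assign[j])
        if best is None or cost < best[0]:
            best = (cost, assign)
    assert best is not None
    return best


P = [[0.0, 1.0], [0.5, 0.0]]          # after module 1, module 0 reruns half the time
exec_cost = [[1.0, 5.0], [5.0, 1.0]]  # each module is fast on a different processor
print(best_assignment(P, 0, exec_cost, comm_cost=1.0)[1])  # (0, 1)
```

With a cheap transfer the modules are split across processors (expected cost about 7); with `comm_cost=10.0` the split costs about 34, so co-locating both modules (about 12) wins.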
International Conference on Computer-Aided Design | 1992
Daniel G. Saab; Youssef G. Saab; Jacob A. Abraham
An approach to cultivating a test for combinational and sequential VLSI circuits described hierarchically at the transistor, gate, and higher levels is discussed. The approach is based on continuous mutation of a given input sequence and on analyzing the mutated vectors for selecting the test set. The approach uses a hierarchical simulation technique in the analysis to drastically reduce the memory requirement, thus allowing the test generation for large VLSI circuits. The algorithms are at the switch level so that general MOS digital designs can be handled, and both stuck-at and transistor faults are handled accurately. The approach was implemented in a hierarchical test generation system, CRIS, that runs under UNIX on SPARC workstations. CRIS was used successfully to generate tests with high fault coverage for large combinational and sequential circuits.
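CRIS itself works hierarchically at the switch level on sequential circuits; the toy Python sketch below only illustrates the core loop of mutation-based test cultivation on a gate-level combinational circuit. The circuit `out = (a AND b) OR (NOT c)`, the single-stuck-at fault list, the one-bit mutation operator and all names are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Fault = Tuple[str, int]  # (net name, stuck-at value)
NETS = ["a", "b", "c", "n1", "n2", "out"]


def simulate(vec: Tuple[int, ...], fault: Optional[Fault] = None) -> int:
    """Toy circuit out = (a AND b) OR (NOT c), with an optional stuck-at fault."""
    v: Dict[str, int] = {}

    def drive(net: str, value: int) -> None:
        v[net] = fault[1] if fault is not None and fault[0] == net else value

    drive("a", vec[0])
    drive("b", vec[1])
    drive("c", vec[2])
    drive("n1", v["a"] & v["b"])
    drive("n2", 1 - v["c"])
    drive("out", v["n1"] | v["n2"])
    return v["out"]


def cultivate(seed: Tuple[int, ...], iterations: int = 500, rng_seed: int = 0):
    """Mutate a seed vector continuously, keeping each vector that detects a new fault."""
    rng = random.Random(rng_seed)
    faults: List[Fault] = [(net, sv) for net in NETS for sv in (0, 1)]
    undetected: Set[Fault] = set(faults)
    tests: List[Tuple[int, ...]] = []
    vec = list(seed)
    for _ in range(iterations):
        good = simulate(tuple(vec))
        # Fault simulation: which remaining faults change the output?
        newly = {f for f in undetected if simulate(tuple(vec), f) != good}
        if newly:
            tests.append(tuple(vec))
            undetected -= newly
        if not undetected:
            break
        vec[rng.randrange(len(vec))] ^= 1  # mutation: flip one input bit
    return tests, 1 - len(undetected) / len(faults)


tests, coverage = cultivate((0, 0, 0))
print(tests, coverage)
```

Only vectors that detect at least one previously undetected fault enter the test set, which keeps it compact while mutation explores the input space.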
Proceedings of the IEEE | 1986
Jacob A. Abraham; W.K. Fuchs
This paper describes a variety of fault and error models that are used as the basis for designing fault-tolerant Very Large Scale Integrated (VLSI) systems. The fault models describe physical defects and failures, together with the input patterns that expose them, and are suitable for testing; the error models describe the effects of defects on the functional outputs and are useful for on-line error detection. The models are described at various levels of abstraction. The differences between fault and error models for identical functional modules are also illustrated.
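The difference between the two kinds of model can be seen on a single gate-level full adder. In this hypothetical Python sketch, a stuck-at fault on an internal net is the fault model: the set of input patterns that expose it is what testing needs, while the XOR masks observed on the outputs (sum, carry) are what an on-line error model must describe. The netlist and net names are assumptions.

```python
from itertools import product
from typing import Optional, Set, Tuple

Fault = Tuple[str, int]  # (net name, stuck-at value)


def full_adder(a: int, b: int, cin: int, fault: Optional[Fault] = None) -> Tuple[int, int]:
    """Gate-level full adder returning (sum, carry); `fault` forces one net."""
    def net(name: str, value: int) -> int:
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, cin = net("a", a), net("b", b), net("cin", cin)
    x1 = net("x1", a ^ b)
    s = net("s", x1 ^ cin)
    g1 = net("g1", a & b)
    g2 = net("g2", x1 & cin)
    cout = net("cout", g1 | g2)
    return s, cout


def characterize(fault: Fault) -> Tuple[Set[Tuple[int, int, int]], Set[Tuple[int, int]]]:
    """Return (exposing inputs, output error masks) for one stuck-at fault."""
    tests: Set[Tuple[int, int, int]] = set()  # fault-model view: test patterns
    errors: Set[Tuple[int, int]] = set()      # error-model view: XOR masks on outputs
    for a, b, cin in product((0, 1), repeat=3):
        good = full_adder(a, b, cin)
        bad = full_adder(a, b, cin, fault)
        mask = (good[0] ^ bad[0], good[1] ^ bad[1])
        if mask != (0, 0):
            tests.add((a, b, cin))
            errors.add(mask)
    return tests, errors


print(characterize(("x1", 0)))
```

For `x1` stuck-at-0 the observed masks are (1, 0) and (1, 1): one physical fault can produce a two-bit output error, which a single parity bit over (sum, carry) would miss.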
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
Marc Snir; Robert W. Wisniewski; Jacob A. Abraham; Sarita V. Adve; Saurabh Bagchi; Pavan Balaji; Jim Belak; Pradip Bose; Franck Cappello; Bill Carlson; Andrew A. Chien; Paul W. Coteus; Nathan DeBardeleben; Pedro C. Diniz; Christian Engelmann; Mattan Erez; Saverio Fazzari; Al Geist; Rinku Gupta; Fred Johnson; Sriram Krishnamoorthy; Sven Leyffer; Dean A. Liberty; Subhasish Mitra; Todd S. Munson; Rob Schreiber; Jon Stearley; Eric Van Hensbergen
We present here a report produced by a workshop on ‘Addressing failures in exascale computing’ held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia, and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.
IEEE Transactions on Computers | 1990
Prithviraj Banerjee; Joseph T. Rahmeh; Craig B. Stunkel; V. S. S. Nair; Kaushik Roy; Vijay Balasubramanian; Jacob A. Abraham
The design of a fault-tolerant hypercube multiprocessor architecture is discussed. The authors propose detecting and locating faulty processors concurrently with the actual execution of parallel applications on the hypercube, using a novel scheme of algorithm-based error detection. System-level error detection mechanisms have been implemented for three parallel applications on a 16-processor Intel iPSC hypercube multiprocessor: matrix multiplication, Gaussian elimination, and fast Fourier transform. Schemes for other applications are under development. The error coverage of the system-level error detection schemes has been studied extensively in the presence of finite-precision arithmetic, which affects the system-level encodings. Two reconfiguration schemes are proposed that allow faulty processors to be isolated and replaced with spare processors.
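Algorithm-based error detection is often illustrated with checksum-encoded matrix multiplication; the Python sketch below shows that classic encoding, although the encodings the paper uses on the hypercube may differ. A column-checksum `A` times a row-checksum `B` yields a product whose extra row and column hold checksums, and a single bad element is located where an inconsistent row meets an inconsistent column. The tolerance scaled by the checksum magnitude stands in for the finite-precision effects the abstract mentions; function names are illustrative.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def column_checksum(A: Matrix) -> Matrix:
    """Append a row holding the sum of each column."""
    return A + [[sum(col) for col in zip(*A)]]


def row_checksum(B: Matrix) -> Matrix:
    """Append a column holding the sum of each row."""
    return [row + [sum(row)] for row in B]


def locate_error(Cf: Matrix, tol: float = 1e-9) -> Optional[Tuple[int, int]]:
    """Find a single corrupted data element of a full-checksum product."""
    r, c = len(Cf) - 1, len(Cf[0]) - 1  # data dimensions

    def off(total: float, ref: float) -> bool:
        # Tolerance scaled to the checksum's magnitude (finite precision).
        return abs(total - ref) > tol * (1.0 + abs(ref))

    bad_rows = [i for i in range(r) if off(sum(Cf[i][:c]), Cf[i][c])]
    bad_cols = [j for j in range(c) if off(sum(Cf[i][j] for i in range(r)), Cf[r][j])]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    # Multiple errors, or errors in the checksums themselves, are not handled here.
    raise ValueError("error pattern not correctable by this sketch")


def correct(Cf: Matrix, tol: float = 1e-9) -> Optional[Tuple[int, int]]:
    """Repair a single bad element in place using its row checksum."""
    loc = locate_error(Cf, tol)
    if loc is not None:
        i, j = loc
        c = len(Cf[0]) - 1
        Cf[i][j] = Cf[i][c] - sum(Cf[i][k] for k in range(c) if k != j)
    return loc


A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
Cf = matmul(column_checksum(A), row_checksum(B))
Cf[1][0] += 1                          # emulate a faulty processor's result
print(correct(Cf))                     # (1, 0)
print([row[:-1] for row in Cf[:-1]])   # [[19, 22], [43, 50]]
```

The checksums cost one extra row and column of work, which is why this style of encoding keeps detection overhead low relative to full duplication.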