Publication


Featured research published by Cristian Constantinescu.


Dependable Systems and Networks | 2002

Impact of deep submicron technology on dependability of VLSI circuits

Cristian Constantinescu

Advances in semiconductor technology have led to impressive performance gains of VLSI circuits, in general, and microprocessors, in particular. However, smaller transistor and interconnect dimensions, lower power voltages, and higher operating frequencies have contributed to increased rates of occurrence of transient and intermittent faults. We address the impact of deep submicron technology on permanent, transient and intermittent classes of faults, and discuss the main trends in circuit dependability. Two case studies exemplify this analysis. The first one deals with intermittent faults induced by manufacturing residuals. The second case study shows that transients generated by timing violations are capable of silently corrupting data. It is concluded that the semiconductor industry is approaching a new stage in the design and manufacturing of VLSI circuits. Fault-tolerance features, specific to custom designed computers, have to be integrated into commercial-off-the-shelf (COTS) VLSI systems in the future, in order to preserve data integrity and limit the impact of transient and intermittent faults.


IEEE Transactions on Reliability | 2003

Experimental evaluation of error-detection mechanisms

Cristian Constantinescu

Effective error-detection is paramount for building highly dependable computing systems. A new methodology, based on physical and simulated fault injection, has been developed for assessing the effectiveness of error-detection mechanisms. This approach has 2 steps: (1) transient faults are physically injected at the IC pin level of a prototype, in order to derive the error-detection coverage. Experiments are carried out in a 3-dimensional space of events. Fault location, time of occurrence, and duration of the injected fault are the dimensions of this space. (2) Simulated fault-injection is performed to assess the effectiveness of new error-detection mechanisms, designed to improve the detection coverage. Complex circuitry, based on checking for protocol violations, is considered. A temporal model of the protocol checker is used, and transient faults are injected in signal traces captured from the prototype system. These traces are used as inputs of the simulation engine. s-confidence intervals of the error-detection coverage are derived, both for the initial design and the new detection mechanism. Physical fault-injection, carried out on a prototype server, proved that several signals were sensitive to transient faults and error-detection coverage was unacceptably low. Simulated fault injection shows that an error-detection mechanism, based on checking for protocol violations, can appreciably increase the detection coverage, especially for transient faults longer than 200 nanoseconds. Additional research is required for improving the error-detection of shorter transients. Fault injection experiments also show that error-detection coverage is a function of fault duration: the shorter the transient fault, the lower the coverage. As a consequence, injecting faults that have a unique, predefined duration, as was frequently done in the past, does not provide accurate information on the effectiveness of the error-detection mechanisms. Injecting only permanent faults leads to unrealistically high estimates of the coverage. These experiments prove that combined physical and simulated fault injection, performed in a 3-dimensional space of events, is a superior approach, which allows the designers to accurately assess the efficacy of various candidate error-detection mechanisms without building expensive test circuits.
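
The coverage estimation step can be illustrated with a small Monte Carlo sketch: faults are sampled in the three-dimensional space of events (location, time of occurrence, duration), and the injection outcomes yield a point estimate and a confidence interval for the coverage. This is a minimal illustration, not the paper's tool chain; the signal names, sample counts, and the normal-approximation interval are assumptions.

```python
import math
import random

def sample_fault(signals, mission_time_ns, max_duration_ns):
    """Draw one fault uniformly at random in the (location, time, duration) space."""
    return {
        "signal": random.choice(signals),
        "start_ns": random.uniform(0.0, mission_time_ns),
        "duration_ns": random.uniform(1.0, max_duration_ns),
    }

def coverage_with_ci(detected, errors, z=1.96):
    """Point estimate and approximate 95% confidence interval for
    coverage = P(error detected | error occurred)."""
    c = detected / errors
    half = z * math.sqrt(c * (1.0 - c) / errors)
    return c, (max(0.0, c - half), min(1.0, c + half))

# Hypothetical signal list and campaign result: 1000 injected faults caused
# errors, 914 of which were flagged by the detection mechanism.
fault = sample_fault(["ADS#", "DRDY#", "HIT#"], mission_time_ns=1e6, max_duration_ns=500)
print(fault)
print(coverage_with_ci(detected=914, errors=1000))
```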


IEEE Transactions on Reliability | 2005

Dependability evaluation of a fault-tolerant processor by GSPN modeling

Cristian Constantinescu

High dependability has become a paramount requirement for computing systems, as they are increasingly used in business & life-critical applications. Advances in the design & manufacturing of semiconductor devices have increased the performance of computing systems at a dazzling pace. However, smaller transistor dimensions, lower power voltages, and higher operating frequencies have negatively impacted dependability by increasing the probability of occurrence of transient & intermittent faults. This paper discusses the main trends in dependability of semiconductor devices, and presents a candidate architecture for a fault-tolerant microprocessor. Dependability of the processor is analyzed, and the advantages provided by fault tolerance are underscored. The effect of the higher rates of occurrence of the transient & intermittent faults on a typical microprocessor is evaluated with the aid of GSPN modeling. Dependability analysis shows that a fivefold increase of the rate of occurrence of the transients leads to an approximately fivefold decrease of the MTBF, if no error recovery mechanisms are employed. Significantly lower processor availability is also observed. The fault-tolerant processor is devised to mitigate the impact of the higher transient & intermittent fault rates. The processor is based on core redundancy & state checkpointing, and supports three levels of error recovery. First, recovery from a saved state (SSRC) is attempted. The second level consists of a retry (SSRR), and is activated when the first level of recovery fails. Processor reset, followed by reintegration under the operating system control (RB), is the third level of recovery. Dependability analysis, based on GSPN, shows that fault-tolerance features of the processor preserve the MTBF, even if the rate of the transient faults nearly doubles. In terms of availability, a fourfold increase of the rate of occurrence of the transients is compensated. The effect of intermittent faults is also analyzed. A fivefold increase of the failure rate of the intermittent faults may lower MTBF by 31% to 33%. MTBF decreases even more, by 45% to 67%, if bursts of errors are considered. Intermittent faults have a negative impact on availability as well. Maintaining the dependability of complex integrated circuits at the level available today is becoming a challenge as semiconductor integration continues at a fast pace. Fault avoidance techniques, mainly based on process technology & circuit design, will not be able to fully mitigate the impact of higher rates of occurrence of transient & intermittent faults. As a result, fault-tolerant features, specific to custom-designed components today, ought to be employed by COTS circuits in the future. Enhanced concurrent error detection & correction, self-checking circuits, space & time redundancy, triplication, and voting all need to be integrated into semiconductor devices in general, and microprocessors in particular, in order to improve fault & error handling.
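
The MTBF argument can be sketched with a back-of-the-envelope calculation rather than the full GSPN model: if only unrecovered transients cause failures, MTBF scales inversely with the transient rate, and layered recovery (SSRC, then SSRR, then RB) multiplies that rate by the probability that every level fails. The rates and per-level success probabilities below are hypothetical illustration values, not figures from the paper.

```python
def mtbf_hours(transient_rate_per_hour, recovery_probs=()):
    """MTBF when only unrecovered transients cause failures.

    recovery_probs lists the success probability of each recovery level,
    tried in order; a transient becomes a failure only if every level fails.
    """
    uncovered = 1.0
    for p in recovery_probs:
        uncovered *= (1.0 - p)
    failure_rate = transient_rate_per_hour * uncovered
    return float("inf") if failure_rate == 0 else 1.0 / failure_rate

base_rate = 1e-4             # hypothetical transient faults per hour
levels = (0.90, 0.70, 0.50)  # hypothetical SSRC, SSRR, RB success probabilities

print(mtbf_hours(base_rate))              # no recovery
print(mtbf_hours(5 * base_rate))          # 5x rate, no recovery -> ~5x lower MTBF
print(mtbf_hours(2 * base_rate, levels))  # 2x rate, masked by layered recovery
```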


IEEE Transactions on Computers | 2000

Teraflops supercomputer: architecture and validation of the fault tolerance mechanisms

Cristian Constantinescu

Intel Corporation developed the Teraflops supercomputer for the US Department of Energy (DOE) as part of the Accelerated Strategic Computing Initiative (ASCI). This is the most powerful computing machine available today, performing over two trillion floating point operations per second with the aid of more than 9,000 Intel processors. The Teraflops machine employs complex hardware and software fault/error handling mechanisms for complying with DOE's reliability requirements. This paper gives a brief description of the system architecture and presents the validation of the fault tolerance mechanisms. Physical fault injection at the IC pin level was used for validation purposes. An original approach was developed for assessing signal sensitivity to transient faults and the effectiveness of the fault/error handling mechanisms. The dependence of fault/error detection coverage on fault duration was also determined. Fault injection experiments unveiled several malfunctions at the hardware, firmware, and software levels. The supercomputer performed according to the DOE requirements after corrective actions were implemented. The fault injection approach presented in this paper can be used for validation of any fault-tolerant or highly available computing system.


IEEE International Symposium on Fault-Tolerant Computing | 1998

Validation of the fault/error handling mechanisms of the Teraflops supercomputer

Cristian Constantinescu

The Teraflops system, the world's most powerful supercomputer, was developed by Intel Corporation for the US Department of Energy (DOE) as part of the Accelerated Strategic Computing Initiative (ASCI). The machine contains more than 9000 Intel Pentium Pro processors and performs over one trillion floating point operations per second. Complex hardware and software mechanisms were devised for complying with DOE's reliability requirements. This paper gives a brief description of the Teraflops system architecture and presents the validation of the fault/error handling mechanisms. The validation process was based on an enhanced version of physical fault injection at the IC pin level. An original approach was developed for assessing signal sensitivity to transient faults and the effectiveness of the fault tolerance mechanisms. Several malfunctions were unveiled by the fault injection experiments. After corrective actions had been undertaken, the supercomputer performed according to the specification.


Pacific Rim International Symposium on Dependable Computing | 1999

Using physical and simulated fault injection to evaluate error detection mechanisms

Cristian Constantinescu

Effective error detection is paramount for building highly dependable computing systems. A new methodology, based on physical and simulated fault injection, is developed for evaluating error detection mechanisms. Our approach consists of two steps. First, transient faults are physically injected at the IC pin level of a prototype server. Experiments are carried out in a three-dimensional space of events, the location, time of occurrence, and duration of the fault being randomly selected. Improved detection circuitry is devised for decreasing signal sensitivity to transients. Second, simulated fault injection is performed to assess the effectiveness of the new detection mechanisms, without using expensive silicon implementations. Physical fault injection experiments, carried out on the server, and simulated fault injection, performed on a protocol checker, are presented. Detection effectiveness is measured by the error detection coverage, defined as the conditional probability that an error is detected given that an error occurs. Fault injection reveals that coverage probability is a function of fault duration. The protocol checker significantly improves error detection. However, further research is required to increase detection coverage of the errors induced by short transient faults.
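
The duration dependence of the coverage can be made concrete with a small sketch that bins injection outcomes by fault duration and estimates P(detected | error) per bin; the outcome records below are hypothetical, not measurements from the paper.

```python
from collections import defaultdict

# Hypothetical injection outcomes: (fault duration in ns, error detected?).
outcomes = [(50, False), (120, False), (180, True), (250, True),
            (400, True), (90, False), (310, True), (220, True)]

bins = defaultdict(lambda: [0, 0])  # duration bin -> [detected, total]
for duration_ns, detected in outcomes:
    b = (duration_ns // 100) * 100
    bins[b][1] += 1
    if detected:
        bins[b][0] += 1

# Per-bin coverage estimate: shorter transients tend to show lower coverage.
for b in sorted(bins):
    d, n = bins[b]
    print(f"{b}-{b + 99} ns: coverage = {d}/{n} = {d / n:.2f}")
```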


European Dependable Computing Conference | 1999

Assessing Error Detection Coverage by Simulated Fault Injection

Cristian Constantinescu

Server dependability is of increasing importance as more critical applications rely on the client-server computing model. As a consequence, complex fault/error handling mechanisms are becoming common features of today's servers. This paper presents a new simulated fault injection method, which allows the assessment of the effectiveness of error detection mechanisms without using expensive test circuits. Fault injection was performed in two stages. First, physical fault injection was performed on a prototype server. Transient faults were injected in randomly selected signals. Traces of the signals sensitive to transients were captured. A complex protocol checker was devised for increasing error detection. The new detection circuitry was simulated in the second stage of the experiment. Signal traces, injected with transient faults, were used as inputs of the simulation. The error detection coverage and latency were derived. Fault injection also showed that coverage probability was a function of fault duration.
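
A rough sketch of this trace-driven second stage, under assumed data structures: a captured signal trace is represented as a list of per-cycle samples, a transient is injected by overwriting a span of cycles, and a simplified checker (here a hypothetical "every request must be acknowledged within three cycles" rule, standing in for the paper's protocol checker) reports the cycle at which it flags a violation, from which detection latency follows.

```python
def inject_transient(trace, start, length, forced_value=0):
    """Return a copy of the per-cycle trace with a transient forcing
    `forced_value` on cycles [start, start + length)."""
    faulty = list(trace)
    for i in range(start, min(start + length, len(faulty))):
        faulty[i] = forced_value
    return faulty

def first_violation(req_trace, ack_trace, window=3):
    """Cycle at which the checker flags 'request not acknowledged within
    `window` cycles' (the cycle the window closes), or None if never."""
    for i, req in enumerate(req_trace):
        if req and i + window < len(ack_trace) and not any(ack_trace[i:i + window + 1]):
            return i + window
    return None

# Hypothetical captured traces: requests at cycles 1 and 5,
# acknowledged at cycles 2 and 7 in the fault-free run.
req = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
ack = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

fault_cycle = 7
faulty_ack = inject_transient(ack, start=fault_cycle, length=1, forced_value=0)

assert first_violation(req, ack) is None  # fault-free trace passes the checker
hit = first_violation(req, faulty_ack)
print(f"violation flagged at cycle {hit}, latency {hit - fault_cycle} cycle(s)")
```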


Pacific Rim International Symposium on Dependable Computing | 2001

Dependability analysis of a fault-tolerant processor

Cristian Constantinescu

Advances in semiconductor technology have improved the performance of integrated circuits, in general, and microprocessors, in particular, at a dazzling pace. However, smaller transistor dimensions, lower power voltages and higher operating frequencies have significantly increased the circuit sensitivity to transient and intermittent faults. In this paper we present the architecture of a fault-tolerant processor and analyze its dependability with the aid of a generalized stochastic Petri net (GSPN) model. The effect of transient and intermittent faults is evaluated. It is concluded that fault tolerance mechanisms, usually employed by custom-designed systems, have to be integrated into commercial-off-the-shelf (COTS) devices, in order to mitigate the impact of higher rates of occurrence of the transient and intermittent faults.
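
A toy Markov model conveys the flavor of this kind of GSPN dependability analysis (the actual GSPN has a far richer structure): three states, operational / recovering from a transient / failed, with hypothetical fault, recovery, and repair rates; steady-state availability follows from solving pi Q = 0 with the probabilities summing to one.

```python
import numpy as np

# Hypothetical rates (per hour): transient fault, recovery, permanent fault, repair.
lam_t, rec, lam_p, rep = 1e-3, 10.0, 1e-5, 0.5

# States: 0 = operational, 1 = recovering from a transient, 2 = failed.
Q = np.array([[-(lam_t + lam_p), lam_t, lam_p],
              [rec,              -rec,  0.0  ],
              [rep,              0.0,   -rep ]])

# Solve pi Q = 0 subject to sum(pi) = 1 (least squares on the stacked system).
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print("steady-state availability (time in the operational state):", pi[0])
```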


Reliability and Maintainability Symposium | 1997

Numerical techniques for assessing reliability and performance of gracefully degrading computer systems

Cristian Constantinescu

Gracefully degrading systems represent a cost-effective alternative to massively redundant fault-tolerant computing systems. Assessing the effectiveness of these systems requires combined reliability and performance measures, such as computational availability, performability, and accumulated reward. This paper compares, for the first time, two numerical algorithms used for assessing the complementary distribution of the accumulated reward and the expected accumulated reward, respectively. Both methods are employed for analyzing a multiprocessor server. The first one, based on Laplace transforms, numerical evaluation of eigenvalues, and analytical and numerical inversion of the Laplace transforms, gives accurate results for low values of the accumulated reward. However, instability of the numerical inversion routine negatively affects the results when the accumulated reward approaches the maximum attainable performance of the system. The second method, which relies on randomization, proves to be insensitive to the performance level reached by the system. This approach is used to analyze the impact of the fault/error coverage probability, spare processing units, repair, and performance degradation on the expected accumulated reward of the server. We conclude that the randomization-based method is a more accurate approach for assessing the reliability and performance of gracefully degrading systems.
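
The randomization (uniformization) approach can be sketched for the expected accumulated reward of a continuous-time Markov reward model over [0, t]: the matrix exponential in the transient state probabilities is expanded over the uniformized discrete-time chain, and each term is weighted by an integral of a Poisson probability. The three-state, two-processor example below, together with its rates and reward values, is a hypothetical illustration, not the server model analyzed in the paper.

```python
import numpy as np
from scipy.stats import poisson

def expected_accumulated_reward(Q, r, pi0, t, tol=1e-10):
    """Expected accumulated reward over [0, t] for a CTMC with generator Q,
    reward-rate vector r, and initial distribution pi0, via uniformization:
    E[Y(t)] = sum_k w_k * (pi0 P^k) . r, with P = I + Q/Lambda and
    w_k = (1 - PoissonCDF(k; Lambda*t)) / Lambda."""
    Lam = 1.05 * max(-np.diag(Q))        # uniformization rate
    P = np.eye(len(r)) + Q / Lam         # DTMC of the uniformized chain
    v = np.array(pi0, dtype=float)
    total, k = 0.0, 0
    while True:
        w = (1.0 - poisson.cdf(k, Lam * t)) / Lam
        total += w * (v @ r)
        if k > Lam * t and w < tol:      # Poisson tail has become negligible
            break
        v = v @ P
        k += 1
    return total

# Hypothetical two-processor gracefully degrading server:
# state 0 = both processors up, 1 = one up, 2 = system down.
lam, mu = 1e-3, 0.1                      # failure and repair rates per hour
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [mu, -(mu + lam), lam],
              [0.0, mu, -mu]])
r = np.array([2.0, 1.0, 0.0])            # reward = number of working processors
print(expected_accumulated_reward(Q, r, pi0=[1.0, 0.0, 0.0], t=1000.0))
```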


Dependable Systems and Networks | 2005

Neutron SER characterization of microprocessors

Cristian Constantinescu
