Savio N. Chau
California Institute of Technology
Publication
Featured research published by Savio N. Chau.
Performance Evaluation | 1999
Ann T. Tai; Leon Alkalai; Savio N. Chau
With respect to the long-life missions associated with NASA’s X2000 Advanced Deep-Space System Development Program, reliability implies a system’s continuous operation for many years in an unsurveyed, radiation-intense environment. Further, the stringent constraints on the mass of a spacecraft and the power on board create unprecedented challenges for the means of achieving ultra-high mission reliability. In this paper, we present an approach to on-board preventive maintenance which rejuvenates a system by letting system components rotate between on-duty and off-duty shifts, slowing down a system’s aging process and thus enhancing mission reliability. By exploiting nondedicated system redundancy, hardware and software rejuvenation are realized simultaneously without significant performance penalty. Our design-oriented analysis confirms a potential for significant gains in mission reliability from on-board preventive maintenance and provides useful insights about the collective effect of age-dependent failure behavior, residual mission life, risk of unsuccessful maintenance, and maintenance frequency on mission reliability.
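As a rough illustration of how age-dependent failure behavior, maintenance frequency, and the risk of unsuccessful maintenance interact, the sketch below compares a component that is always on duty with one that rotates off duty half of the time. It is not the model from the paper: the Weibull failure law, the assumption that a component ages only while on duty, the fixed per-switch success probability, and all numeric values are assumptions chosen for illustration.

```python
import math

def weibull_reliability(age, scale, shape):
    """Probability that a component survives to the given operating age."""
    return math.exp(-((age / scale) ** shape))

def mission_reliability(mission_years, duty_fraction, switches, p_switch_ok,
                        scale=10.0, shape=2.0):
    """Reliability of one component that ages only while it is on duty.

    Every parameter here is an illustrative assumption, not a value
    taken from the paper. shape > 1 gives an increasing failure rate.
    """
    operating_age = mission_years * duty_fraction
    survive_aging = weibull_reliability(operating_age, scale, shape)
    # Each duty switch is assumed to succeed independently.
    survive_switching = p_switch_ok ** switches
    return survive_aging * survive_switching

always_on = mission_reliability(12, 1.0, switches=0, p_switch_ok=1.0)
rotated = mission_reliability(12, 0.5, switches=24, p_switch_ok=0.999)
over_maintained = mission_reliability(12, 0.5, switches=2400, p_switch_ok=0.999)
```

Under these assumptions, moderate rotation improves on the always-on case, while switching very frequently lets the accumulated risk of an unsuccessful switch outweigh the benefit of slower aging.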
IEEE Aerospace Conference | 2005
Richard J. Terrile; Christoph Adami; Hrand Aghazarian; Savio N. Chau; Van Dang; Michael I. Ferguson; Wolfgang Fink; Terry Huntsberger; Gerhard Klimeck; M.A. Kordon; Seungwon Lee; P. von Allmen; J. Xu
The Evolvable Computation Group, at NASA's Jet Propulsion Laboratory, is tasked with demonstrating the utility of computational engineering and computer-optimized design for complex space systems. The group comprises researchers from a broad range of disciplines, including biology, genetics, robotics, physics, computer science, and system design, and employs biologically inspired evolutionary computational techniques to design and optimize complex systems. Over the past two years we have developed tools using genetic algorithms, simulated annealing, and other optimizers to improve on human design of space systems. We have further demonstrated that the same tools used for computer-aided design and design evaluation can be used for automated innovation and design. These powerful techniques also serve to reduce redesign costs and schedules.
Proceedings. IEEE International Computer Performance and Dependability Symposium. IPDS'98 (Cat. No.98TB100248) | 1998
A.T. Tai; Leon Alkalai; Savio N. Chau
The long-life deep-space missions associated with NASA's X2000 Advanced Flight Systems Program create many unprecedented challenges. In particular, the stringent constraints on the mass of a spacecraft and the power on board preclude traditional fault tolerance approaches which rely on extensive component/subsystem replication, calling for novel approaches to mission reliability enhancement. In this paper, we present an approach to on-board preventive maintenance which rejuvenates a system via periodic duty switching between system components, slowing down a system's aging process and enhancing mission reliability. By exploiting nondedicated system redundancy, hardware and software rejuvenation are realized simultaneously without significant performance penalty. Our model-based evaluation confirms a potential for significant gains in mission reliability from on-board preventive maintenance and provides useful insights about the collective effect of age-dependent failure behavior, residual mission life, risk of unsuccessful maintenance, and maintenance frequency on mission reliability.
International Symposium on Software Reliability Engineering | 2009
Yansheng Zhang; I-Ling Yen; Farokh B. Bastani; Ann T. Tai; Savio N. Chau
Cyber-physical systems (CPS) are complex net-centric hardware/software systems that can be applied to transportation, healthcare, defense, and other real-time applications. To meet the high reliability and safety requirements for these systems, proactive system health monitoring and management (HMM) techniques can be used. However, to be effective, it is necessary to ensure that the operation of the underlying HMM system does not adversely impact the normal operation of the system being monitored. In particular, it must be ensured that the operation of the HMM system will not lead to resource contentions that may prevent the system being monitored from timely completion of critical tasks. This paper presents an adaptive HMM system model that defines the fault diagnosis quality metrics and supports diagnosis requirement specifications. Based on the model, the sensor activation decision problem (SADP) is defined along with a steepest descent based heuristic algorithm to make the HMM configuration decisions that best satisfy the diagnosis quality requirements. Evaluation results show that the technique reduces the overall system resource consumption without adversely impacting the diagnosis capability of the HMM.
Dependable Systems and Networks | 2005
Ann T. Tai; Kam S. Tso; William H. Sanders; Savio N. Chau
While inherent resource redundancies in distributed applications facilitate gracefully degradable services, methods to enhance their dependability may have subtle, yet significant, performance implications, especially when such applications are stateful in nature. In this paper, we present a performability-oriented framework that enables the realization of software rejuvenation in stateful distributed applications. The framework is constructed based on three building blocks, namely, a rejuvenation algorithm, a set of performability metrics, and a performability model. We demonstrate via model-based evaluation that this framework enables error-accumulation-prone distributed applications to deliver services at the best possible performance level, even in environments in which a system is highly vulnerable to failures.
International Conference on Distributed Computing Systems | 2000
Ann T. Tai; Kam S. Tso; Leon Alkalai; Savio N. Chau; William H. Sanders
To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). We focus on a low-cost approach to error containment and recovery for GSU. To ensure low development cost, we exploit inherent system resource redundancies as the fault tolerance means. In order to mitigate the effect of residual software faults at low performance cost, we take a crucial step in devising error containment and recovery methods by introducing the confidence-driven notion. This notion complements the message-driven (or communication-induced) approach employed by a number of existing checkpointing protocols for tolerating hardware faults. In particular, we discriminate between the individual software components with respect to our confidence in their reliability and keep track of changes of our confidence (due to knowledge about potential process state contamination) in particular processes. This, in turn, enables the individual processes in the spaceborne distributed system to make decisions locally at run-time, on whether to establish a checkpoint upon message passing and whether to roll back or roll forward during error recovery. The resulting message-driven confidence-driven approach enables cost-effective checkpointing and cascading-rollback free recovery.
International Test Conference | 1994
Savio N. Chau
In this paper, we propose a design technique called the Fault Injection Boundary Scan (FIBS) for fault injection that is much more efficient than the traditional hardwired pin-level fault injection. The FIBS augments the boundary scan design to facilitate the injection of faults to the input and output pins of a VLSI chip. In addition to the capabilities of a conventional boundary scan design, the FIBS can interpret the test vector contained in the boundary scan cells as markers for fault-injected pins during fault injection. The compatibility of the FIBS with the boundary scan also promises relatively small overhead.
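The idea of reading the scanned-in vector as fault markers rather than as test data can be sketched in software as follows. This is only a behavioral illustration, not the FIBS hardware design: the function name is hypothetical, and the stuck-at fault model is an assumption, since the abstract does not say which fault types are injected.

```python
def inject_faults(pin_values, fault_markers, stuck_value):
    """Return the pin values seen after fault injection.

    pin_values    -- fault-free logic values on the chip's pins
    fault_markers -- vector held in the boundary scan cells; a 1 marks
                     the corresponding pin as fault-injected
    stuck_value   -- value forced onto marked pins (assumed stuck-at model)
    """
    if len(pin_values) != len(fault_markers):
        raise ValueError("one marker is needed per pin")
    return [stuck_value if marked else value
            for value, marked in zip(pin_values, fault_markers)]

# Pins 1 and 2 are marked, so they are forced to 0; the rest pass through.
faulty = inject_faults([1, 0, 1, 1], [0, 1, 1, 0], stuck_value=0)
```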
Pacific Rim International Symposium on Dependable Computing | 2001
Savio N. Chau; J. Smith; T. Tai
The paper describes a COTS bus network architecture consisting of the IEEE 1394 and SpaceWire buses. This architecture is based on the multi-level fault tolerance design methodology proposed by S.N. Chau et al. (1999) but has much less overhead than the original IEEE 1394/I²C implementation. The simplifications are brought about by the topological flexibility and high performance of the SpaceWire. The SpaceWire can form a connected graph that embeds multiple spanning trees. This is a significant advantage because it allows the IEEE 1394 bus to select a different tree topology when a fault occurs. It also has sufficient performance to stand in for the IEEE 1394 bus during fault recovery, so that a backup IEEE 1394 bus is no longer required. These two buses are very compatible at the physical level and therefore can easily be combined. Analysis of the effectiveness of the IEEE 1394/SpaceWire architecture shows that it can achieve the same fault tolerance capability as the IEEE 1394/I²C architecture with less redundancy.
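The point about a connected graph embedding multiple spanning trees can be illustrated with a small sketch: when a link fails, a new tree is extracted from the remaining healthy links. The four-node ring, the node names, and the use of breadth-first search are assumptions made for illustration; the abstract does not describe how the architecture actually selects a tree.

```python
from collections import deque

def spanning_tree(links, root, failed=frozenset()):
    """Return a BFS spanning tree over the healthy links, or None.

    links  -- iterable of (node, node) pairs describing the physical graph
    failed -- set of frozenset({a, b}) entries for links known to be faulty
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, [])
        adjacency.setdefault(b, [])
        if frozenset((a, b)) not in failed:
            adjacency[a].append(b)
            adjacency[b].append(a)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    if len(parent) != len(adjacency):
        return None  # the healthy links no longer connect every node
    return {frozenset((child, par))
            for child, par in parent.items() if par is not None}

# Hypothetical four-node ring: removing any single link leaves a tree.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
normal_tree = spanning_tree(ring, "A")
recovery_tree = spanning_tree(ring, "A", failed={frozenset(("A", "B"))})
```

In this toy ring, a single link failure still leaves a valid tree, while two failures that split the ring make `spanning_tree` return `None`.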
IEEE Aerospace Conference | 2007
Alexandre Guillaume; Seungwon Lee; Yeou Fang Wang; Hua Zheng; Robert Hovden; Savio N. Chau; Yu Wen Tung; Richard J. Terrile
The Deep Space Network (DSN) is an international network of antennas that supports all of NASA's deep space missions. With the increasing demand for tracking time, the DSN is highly over-subscribed. Therefore, the allocation of DSN resources should be optimally scheduled to satisfy the requirements of as many missions as possible. Currently, the DSN schedules are manually and iteratively generated through several meetings to resolve conflicts. In an attempt to ease the burden of the DSN scheduling task, we have applied evolutionary computational techniques to the DSN scheduling problem. These methods provide a decision support system by automatically generating a population of optimized schedules under varying conflict conditions. These schedules are used to decide the simplest path to resolve conflicts as new scheduled items are added or changed along the scheduled 26 weeks. This paper presents the specific approach taken to formulate the problem in terms of gene encoding, fitness function, and genetic operations. The genome is encoded such that a subset of the scheduling constraints is automatically satisfied. Several fitness functions are formulated to emphasize different aspects of the scheduling problem. The optimal solutions of the different fitness functions demonstrate the trade-off of the scheduling problem and provide insight into a conflict resolution process.
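The three ingredients named above (gene encoding, fitness function, genetic operations) can be sketched on a toy problem. Everything below is hypothetical: the request list, the antenna names, the conflict-count fitness, and the choice of tournament selection, one-point crossover, and elitism are assumptions for illustration, not the formulation used in the paper. The encoding shows one way a constraint can be satisfied automatically: each gene indexes into a request's list of usable antennas, so no genome can assign an unusable antenna.

```python
import random

# Hypothetical tracking requests: (start_hour, end_hour, usable antennas).
REQUESTS = [
    (0, 4, ["A", "B"]),
    (2, 6, ["A", "B"]),
    (3, 5, ["B", "C"]),
    (5, 8, ["A"]),
    (1, 3, ["C"]),
]

def random_genome(rng):
    # Gene i indexes into request i's usable-antenna list, so the
    # "antenna can serve this request" constraint holds by construction.
    return [rng.randrange(len(usable)) for _, _, usable in REQUESTS]

def conflicts(genome):
    """Fitness to minimize: overlapping request pairs on the same antenna."""
    assigned = [(s, e, usable[g]) for (s, e, usable), g in zip(REQUESTS, genome)]
    count = 0
    for i in range(len(assigned)):
        for j in range(i + 1, len(assigned)):
            s1, e1, a1 = assigned[i]
            s2, e2, a2 = assigned[j]
            if a1 == a2 and s1 < e2 and s2 < e1:
                count += 1
    return count

def tournament(population, rng):
    # Pick the less conflicted of two randomly chosen genomes.
    return min(rng.sample(population, 2), key=conflicts)

def evolve(generations=50, population_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    for _ in range(generations):
        children = [min(population, key=conflicts)]  # elitism: keep the best
        while len(children) < population_size:
            mother = tournament(population, rng)
            father = tournament(population, rng)
            cut = rng.randrange(1, len(REQUESTS))  # one-point crossover
            child = mother[:cut] + father[cut:]
            for i, (_, _, usable) in enumerate(REQUESTS):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(usable))  # stays valid
            children.append(child)
        population = children
    return min(population, key=conflicts)

best = evolve()
```

Swapping `conflicts` for a different fitness function (for example, one that weights some missions more heavily) changes which schedules the search favors, which is the kind of trade-off the abstract describes.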
Systems, Man and Cybernetics | 2005
Will Hua Zheng; Neville I. Marzwell; Savio N. Chau
This paper presents a methodology for partially reconfiguring a field programmable gate array (FPGA) device using only limited onboard resources. It also seeks to provide a roadmap for developing the tools and technologies needed to design self-sufficient, partially run-time reconfigurable systems for spacecraft avionics. To provide a vision for the technology, this paper recommends a few possible applications in spacecraft avionic systems, in fault tolerance and space-saving hardware. In addition, some previous research on reconfigurable, modular avionics is presented at the end as an example application.