
Ann T. Tai

California Institute of Technology

Publication

Featured research published by Ann T. Tai.

dependable systems and networks | 2004

Cluster-based failure detection service for large-scale ad hoc wireless network applications

Ann T. Tai; Kam S. Tso; William H. Sanders

The growing interest in ad hoc wireless network applications composed of large, dense populations of lightweight system resources calls for scalable approaches to fault tolerance. Moreover, the nature of these systems creates significant challenges for the development of failure detection services (FDSs), because FDS quality often depends heavily on reliable communication. In particular, ad hoc wireless networks are notoriously vulnerable to message loss, which precludes deterministic guarantees for the completeness and accuracy properties of FDSs. To meet these challenges, we propose an FDS based on the notion of clustering. Specifically, we use a cluster-based communication architecture that permits the FDS to be implemented in a distributed manner via intra-cluster heartbeat diffusion and allows a failure report to be forwarded across clusters through the upper layer of the communication hierarchy. In doing so, we extensively exploit the message redundancy inherent in ad hoc wireless settings to mitigate the effects of message loss on the accuracy and completeness of failure detection. As our mathematical analysis shows, the resulting FDS provides satisfactory probabilistic guarantees for the desired properties.
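Message redundancy can lower the chance of a missed heartbeat well below the raw loss rate. The Python sketch below makes that concrete under simplifying assumptions that do not come from the paper. Each wireless transmission is lost independently with probability `p`. A cluster head hears a live member's heartbeat either directly or through `relays` neighbors that overheard and forwarded it, which takes two hops per relay. A member is suspected only after its heartbeat is missed in `rounds` consecutive rounds. All function names and parameters are hypothetical.

```python
def heartbeat_miss_probability(p: float, relays: int) -> float:
    """Probability that a cluster head misses one heartbeat from a live member.

    Assumes independent losses: the direct transmission fails with probability p,
    and each two-hop relay path fails with probability 1 - (1 - p)**2.
    """
    relay_path_failure = 1.0 - (1.0 - p) ** 2
    return p * relay_path_failure ** relays


def false_suspicion_probability(p: float, relays: int, rounds: int) -> float:
    """Probability that a live member is wrongly suspected, assuming suspicion
    requires `rounds` consecutive missed heartbeats in independent rounds."""
    return heartbeat_miss_probability(p, relays) ** rounds


def report_delivery_probability(p: float, copies: int) -> float:
    """Probability that at least one of `copies` independent transmissions of a
    failure report reaches the upper layer of the communication hierarchy."""
    return 1.0 - p ** copies


if __name__ == "__main__":
    for m in (0, 1, 2, 4):
        print(m, heartbeat_miss_probability(0.1, m), false_suspicion_probability(0.1, m, 3))
    print(report_delivery_probability(0.1, 3))
```

The independence assumption is optimistic for dense wireless neighborhoods, where losses are often correlated. The trend is still informative, though: each extra relay multiplies the per-round miss probability by the relay-path failure probability.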

Performance Evaluation | 1999

On-board preventive maintenance: a design-oriented analytic study for long-life applications

Ann T. Tai; Leon Alkalai; Savio N. Chau

With respect to the long-life missions associated with NASA’s X2000 Advanced Deep-Space System Development Program, reliability implies a system’s continuous operation for many years in an unsurveyed, radiation-intense environment. Further, the stringent constraints on spacecraft mass and on-board power create unprecedented challenges for achieving ultra-high mission reliability. In this paper, we present an approach to on-board preventive maintenance that rejuvenates a system by letting system components rotate between on-duty and off-duty shifts, slowing down the system’s aging process and thus enhancing mission reliability. By exploiting nondedicated system redundancy, hardware and software rejuvenation are realized simultaneously without a significant performance penalty. Our design-oriented analysis confirms the potential for significant gains in mission reliability from on-board preventive maintenance and provides useful insights into the collective effect of age-dependent failure behavior, residual mission life, risk of unsuccessful maintenance, and maintenance frequency on mission reliability.

workshop on object-oriented real-time dependable systems | 1997

On-board preventive maintenance: analysis of effectiveness and optimal duty period

Ann T. Tai; Savio N. Chau; Leon Alkalai; Herbert Hecht

To maximize the reliability of a spacecraft that performs a long-life (over 10-year) deep-space mission to an outer planet, a fault-tolerant environment incorporating on-board preventive maintenance is highly desirable. In this paper, we present an initial model-based study that identifies the key factors in the reliability gained from on-board preventive maintenance and demonstrates the capability of analytic modeling to determine the optimal interval between maintenance actions (the duty period).
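A toy analytic model can illustrate why an optimal duty period exists. The Python sketch below is not the model from the paper, and its assumptions are hypothetical:

- Components age according to a Weibull distribution with scale `eta` and shape `beta > 1`.
- Each off-duty period fully rejuvenates a component, so every duty period starts as good as new.
- Each hand-over between components succeeds independently with probability `c`.

Under these assumptions, splitting a mission of length `mission` into `n` duty periods of length `tau = mission / n` gives `R(n) = c**(n - 1) * exp(-n * (tau / eta)**beta)`. Shorter duty periods reduce the aging term but add maintenance risk.

```python
import math


def mission_reliability(n_shifts: int, mission: float, eta: float, beta: float, c: float) -> float:
    """R(n) = c**(n-1) * exp(-n * (tau/eta)**beta), where tau = mission / n_shifts.

    Assumes Weibull aging (scale eta, shape beta), as-good-as-new rejuvenation
    between duty periods, and independent hand-overs that each succeed with probability c.
    """
    tau = mission / n_shifts
    cumulative_hazard = n_shifts * (tau / eta) ** beta  # aging risk summed over all duty periods
    return c ** (n_shifts - 1) * math.exp(-cumulative_hazard)


def optimal_duty_period(mission: float, eta: float, beta: float, c: float, max_shifts: int = 100):
    """Return (n_shifts, duty_period, reliability) maximizing R(n) over 1..max_shifts."""
    best_n = max(range(1, max_shifts + 1),
                 key=lambda n: mission_reliability(n, mission, eta, beta, c))
    return best_n, mission / best_n, mission_reliability(best_n, mission, eta, beta, c)


if __name__ == "__main__":
    # Hypothetical parameters: 10-year mission, Weibull scale 20 years, shape 2.5,
    # 99% probability that each maintenance hand-over succeeds.
    n, tau, r = optimal_duty_period(10.0, 20.0, 2.5, 0.99)
    print(n, tau, round(r, 4))
    print(round(mission_reliability(1, 10.0, 20.0, 2.5, 0.99), 4))  # no maintenance
```

With these illustrative numbers the search selects four duty periods of 2.5 years. More frequent maintenance would shrink the aging term further, but the accumulated risk of unsuccessful maintenance outweighs that gain.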

international symposium on software reliability engineering | 2009

Optimal Adaptive System Health Monitoring and Diagnosis for Resource Constrained Cyber-Physical Systems

Yansheng Zhang; I-Ling Yen; Farokh B. Bastani; Ann T. Tai; Savio N. Chau

Cyber-physical systems (CPS) are complex net-centric hardware/software systems that can be applied to transportation, healthcare, defense, and other real-time applications. To meet the high reliability and safety requirements of these systems, proactive system health monitoring and management (HMM) techniques can be used. To be effective, however, the operation of the underlying HMM system must not adversely affect the normal operation of the system being monitored. In particular, the HMM system must not cause resource contention that could prevent the monitored system from completing critical tasks on time. This paper presents an adaptive HMM system model that defines fault diagnosis quality metrics and supports the specification of diagnosis requirements. Based on the model, the sensor activation decision problem (SADP) is defined, along with a steepest-descent-based heuristic algorithm for making the HMM configuration decisions that best satisfy the diagnosis quality requirements. Evaluation results show that the technique reduces overall system resource consumption without adversely affecting the diagnostic capability of the HMM.
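The paper's steepest-descent heuristic is not reproduced here. As a hedged illustration of the same kind of decision, the Python sketch below uses a simple greedy rule. It repeatedly activates the sensor that removes the most diagnosis-quality shortfall per unit of resource cost, and it stops once every requirement is met. Several elements are hypothetical: the quality metric (the probability that at least one active sensor detects a fault, assuming independent sensors), the sensor and fault names, and all numbers.

```python
from math import prod

# Hypothetical example data: resource cost per sensor and per-fault detection probabilities.
COST = {"temp": 1.0, "vib": 2.0, "current": 1.5}
DETECT = {
    "temp": {"overheat": 0.9, "bearing": 0.1},
    "vib": {"overheat": 0.0, "bearing": 0.95},
    "current": {"overheat": 0.5, "bearing": 0.6},
}
REQUIRED = {"overheat": 0.85, "bearing": 0.9}


def quality(active, detect, fault):
    """Probability that at least one active sensor detects `fault` (independence assumed)."""
    return 1.0 - prod(1.0 - detect[s][fault] for s in active)


def shortfall(active, detect, required):
    """Total amount by which the required diagnosis quality is not yet met."""
    return sum(max(0.0, q - quality(active, detect, f)) for f, q in required.items())


def activate_sensors(cost, detect, required, tol=1e-12):
    """Greedy gain-per-cost selection; returns the active set, or None if infeasible."""
    active = set()
    current = shortfall(active, detect, required)
    while current > tol:
        candidates = [s for s in cost if s not in active]
        if not candidates:
            return None  # every sensor is on and requirements are still unmet
        # Steepest step: largest shortfall reduction per unit of resource cost.
        best = max(candidates,
                   key=lambda s: (current - shortfall(active | {s}, detect, required)) / cost[s])
        improved = shortfall(active | {best}, detect, required)
        if improved >= current:
            return None  # no remaining sensor helps
        active.add(best)
        current = improved
    return active


if __name__ == "__main__":
    chosen = activate_sensors(COST, DETECT, REQUIRED)
    print(sorted(chosen), sum(COST[s] for s in chosen))
```

For these numbers the rule activates `temp` and then `vib`, for a total cost of 3.0. A greedy rule like this is fast, but unlike an exact search it is not guaranteed to find the cheapest feasible sensor set.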

dependable systems and networks | 2005

A performability-oriented software rejuvenation framework for distributed applications

Ann T. Tai; Kam S. Tso; William H. Sanders; Savio N. Chau

While inherent resource redundancies in distributed applications facilitate gracefully degradable services, methods to enhance their dependability may have subtle, yet significant, performance implications, especially when such applications are stateful. In this paper, we present a performability-oriented framework that enables software rejuvenation in stateful distributed applications. The framework is built from three building blocks: a rejuvenation algorithm, a set of performability metrics, and a performability model. We demonstrate via model-based evaluation that this framework enables error-accumulation-prone distributed applications to deliver services at the best possible performance level, even in environments in which a system is highly vulnerable to failures.
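Performability weights each system state by a reward rate that reflects the level of service delivered there. As a generic illustration, not the paper's framework, the Python sketch below builds a hypothetical four-state continuous-time Markov chain for one application: robust, error-accumulated, failed, and rejuvenating. It solves for the steady-state distribution and compares the expected reward rate with and without rejuvenation. All rates and reward values are invented for illustration.

```python
ROBUST, DEGRADED, FAILED, REJUVENATING = range(4)
# Hypothetical reward rates: fraction of full service delivered in each state.
REWARD = [1.0, 0.6, 0.0, 0.7]


def generator(aging, fail, repair, rejuv_trigger, rejuv_done):
    """Infinitesimal generator Q of the 4-state chain (each row sums to zero)."""
    rates = {
        (ROBUST, DEGRADED): aging,                # errors accumulate
        (DEGRADED, FAILED): fail,                 # accumulated errors cause a failure
        (FAILED, ROBUST): repair,                 # slow recovery after a failure
        (DEGRADED, REJUVENATING): rejuv_trigger,  # proactive rejuvenation starts
        (REJUVENATING, ROBUST): rejuv_done,       # rejuvenation completes
    }
    q = [[0.0] * 4 for _ in range(4)]
    for (i, j), rate in rates.items():
        q[i][j] += rate
        q[i][i] -= rate
    return q


def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gaussian elimination with partial pivoting."""
    n = len(q)
    a = [[q[j][i] for j in range(n)] for i in range(n)]  # transpose: Q^T pi = 0
    b = [0.0] * n
    a[-1] = [1.0] * n  # replace one redundant balance equation by the normalization
    b[-1] = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        pi[r] = (b[r] - sum(a[r][c] * pi[c] for c in range(r + 1, n))) / a[r][r]
    return pi


def expected_reward_rate(q, reward=REWARD):
    """Long-run expected reward rate: sum of steady-state probability times reward."""
    return sum(p * r for p, r in zip(steady_state(q), reward))


if __name__ == "__main__":
    without = expected_reward_rate(generator(1.0, 0.5, 0.1, 0.0, 10.0))
    with_rejuv = expected_reward_rate(generator(1.0, 0.5, 0.1, 2.0, 10.0))
    print(round(without, 3), round(with_rejuv, 3))
```

With these made-up rates, rejuvenating from the error-accumulated state raises the long-run reward rate from about 0.17 to about 0.37, because the slow failure-and-repair cycle is largely avoided. A single chain like this ignores the stateful, multi-node behavior that the abstract emphasizes.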

international conference on distributed computing systems | 2000

On low-cost error containment and recovery methods for guarded software upgrading

Ann T. Tai; Kam S. Tso; Leon Alkalai; Savio N. Chau; William H. Sanders

To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). Here we focus on a low-cost approach to error containment and recovery for GSU. To keep development cost low, we exploit inherent system resource redundancies as the means of fault tolerance. To mitigate the effect of residual software faults at low performance cost, we take a crucial step in devising error containment and recovery methods by introducing a confidence-driven notion. This notion complements the message-driven (or communication-induced) approach employed by a number of existing checkpointing protocols for tolerating hardware faults. In particular, we discriminate between individual software components according to our confidence in their reliability, and we keep track of changes in that confidence (due to knowledge about potential process-state contamination) for particular processes. This, in turn, enables the individual processes in the spaceborne distributed system to decide locally, at run time, whether to establish a checkpoint upon message passing and whether to roll back or roll forward during error recovery. The resulting message-driven confidence-driven approach enables cost-effective checkpointing and recovery free of cascading rollbacks.
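The core of the confidence-driven idea is to checkpoint only when a trusted process is about to absorb possibly contaminated state. The Python sketch below is a simplified, hypothetical rendering of that idea, not the published protocol:

- A single `contaminated` flag per process stands in for the confidence bookkeeping.
- A low-confidence (newly upgraded) process starts out contaminated.
- Error recovery restores the checkpoints of contaminated processes, while clean processes roll forward.
- Recovery of the low-confidence process itself is left out of the sketch.

The names `Process`, `deliver`, and `recover` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Process:
    name: str
    contaminated: bool = False  # state may depend on unvalidated, low-confidence output
    state: List[str] = field(default_factory=list)
    checkpoint: Optional[List[str]] = None
    checkpoints_taken: int = 0


def deliver(sender: Process, receiver: Process, msg: str) -> None:
    """Deliver a message, checkpointing the receiver only if it is about to become contaminated."""
    if sender.contaminated and not receiver.contaminated:
        receiver.checkpoint = list(receiver.state)  # save clean state before absorbing msg
        receiver.checkpoints_taken += 1
        receiver.contaminated = True
    receiver.state.append(msg)


def recover(processes: List[Process]) -> List[str]:
    """On error detection: contaminated processes roll back; clean ones roll forward (keep state)."""
    rolled_back = []
    for p in processes:
        if p.contaminated and p.checkpoint is not None:
            p.state = list(p.checkpoint)
            p.contaminated = False
            rolled_back.append(p.name)
    return rolled_back


if __name__ == "__main__":
    upgraded = Process("P1new", contaminated=True)  # low-confidence component
    p2, p3 = Process("P2"), Process("P3")
    deliver(p3, p2, "a")        # clean to clean: no checkpoint needed
    deliver(upgraded, p2, "b")  # P2 checkpoints ["a"] and becomes contaminated
    deliver(upgraded, p2, "c")  # P2 is already contaminated: no new checkpoint
    print(recover([upgraded, p2, p3]), p2.state, p3.state)
```

Because a process checkpoints at most once per transition from clean to contaminated, repeated messages from the upgraded component cost nothing extra. Clean processes never roll back.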

document analysis systems | 2003

A human factors testbed for command and control of unmanned air vehicles

Kam S. Tso; Gregory K. Tharp; Ann T. Tai; Mark H. Draper; Gloria L. Calhoun; Heath A. Ruff

This paper presents a testbed, built upon the Multi-Modal Immersive Intelligent Interface for Remote Operation (MIIIRO), that supports UAV control. The testbed implements a client/server architecture in which UAV operations are simulated on a server that maintains the states of the UAVs. The testbed supports both the route planning and execution of human factors experiments.

dependable systems and networks | 2001

Synergistic coordination between software and hardware fault tolerance techniques

Ann T. Tai; Kam S. Tso; Leon Alkalai; Savio N. Chau; William H. Sanders

This paper describes an approach for enabling synergistic coordination between two fault-tolerance protocols to simultaneously tolerate software and hardware faults in a distributed computing environment. Specifically, our approach is based on a message-driven confidence-driven (MDCD) protocol that we devised for tolerating software design faults, and a time-based (TB) checkpointing protocol that was developed by N. Neves and W.K. Fuchs (1996) for tolerating hardware faults. By carrying out algorithm modifications that are conducive to synergistic coordination between volatile-storage and stable-storage checkpoint establishment, we are able to circumvent the potential interference between the MDCD and TB protocols and allow them to complement each other effectively, extending a system's fault tolerance capability. Moreover, the protocol coordination approach preserves and enhances the features and advantages of the individual protocols that participate in the coordination while keeping the performance cost low.

international symposium on object/component/service-oriented real-time distributed computing | 2006

Deductive glue code synthesis for embedded software systems based on code patterns

Jian Liu; Jicheng Fu; Yansheng Zhang; Farokh B. Bastani; I-Ling Yen; Ann T. Tai; Savio N. Chau

Automated code synthesis is a constructive process that can be used to generate programs from specifications. It can thus greatly reduce software development cost and time. The use of a formal code synthesis approach for software generation further increases the dependability of the system. Although code synthesis has many potential benefits, current synthesis techniques remain limited. Meanwhile, components are widely used in embedded system development. Applying code synthesis to the component-based software development (CBSD) process can greatly enhance the capability of code synthesis while reducing component composition efforts. In this paper, we discuss the issues and techniques for applying deductive code synthesis to CBSD. For deductive synthesis in CBSD, a rule base is the key to inferring appropriate component compositions. We use code patterns, which have been proposed to capture the typical usages of components, to guide the development of rules. Several general composition operations have been identified to facilitate systematic composition. We present a technique for rule development and for automated generation of new patterns from existing code patterns. A case study of using this method to build a real-time control system is also presented.
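In deductive synthesis, each composition rule acts as an inference step, so a derivation of the goal doubles as the glue code. The Python sketch below illustrates that idea with a toy backward-chaining search over hypothetical code patterns. Each pattern names a component function, the data types it consumes, and the type it produces. The pattern format, the type names, and the component names (`low_pass`, `pid`, and so on) are invented for illustration and do not reflect the paper's rule base.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Pattern:
    func: str                # component function to call
    inputs: Tuple[str, ...]  # data types it consumes (also used as variable names)
    output: str              # data type it produces


def synthesize(goal: str, available: Iterable[str], patterns: List[Pattern]) -> Optional[List[str]]:
    """Backward-chain from `goal`; return glue-code lines, or None if no derivation exists."""
    code: List[str] = []
    known = set(available)

    def derive(t: str, in_progress: frozenset) -> bool:
        if t in known:
            return True
        if t in in_progress:  # avoid circular derivations
            return False
        for p in patterns:
            if p.output != t:
                continue
            mark, snapshot = len(code), set(known)
            if all(derive(i, in_progress | {t}) for i in p.inputs):
                code.append(f"{t} = {p.func}({', '.join(p.inputs)})")
                known.add(t)
                return True
            del code[mark:]  # backtrack: discard partial glue code from this branch
            known.clear()
            known.update(snapshot)
        return False

    return code if derive(goal, frozenset()) else None


PATTERNS = [
    Pattern("low_pass", ("reading",), "filtered"),
    Pattern("default_setpoint", (), "setpoint"),
    Pattern("pid", ("filtered", "setpoint"), "control"),
    Pattern("actuate", ("control",), "command"),
]

if __name__ == "__main__":
    print("\n".join(synthesize("command", {"reading"}, PATTERNS)))
```

For the goal `command`, the derivation emits calls to `low_pass`, `default_setpoint`, `pid`, and `actuate` in dependency order. The `del code[mark:]` step keeps a failed branch from leaking partial glue code into the result.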

dependable systems and networks | 2008

A recurrence-relation-based reward model for performability evaluation of embedded systems

Ann T. Tai; Kam S. Tso; William H. Sanders

Embedded systems for closed-loop applications often behave as discrete-time semi-Markov processes (DTSMPs). Performability measures most meaningful to iterative embedded systems, such as accumulated reward, are thus generally difficult to solve analytically. In this paper, we propose a recurrence-relation-based (RRB) reward model to evaluate such measures. A critical element in RRB reward models is the notion of state-entry probability. This notion enables us to utilize the embedded Markov chain in a DTSMP in a novel way. More specifically, we formulate state-entry probabilities, state-occupancy probabilities, and expressions concerning accumulated reward solely in terms of state-entry probability and its companion term, namely the expected accumulated reward at the point of state entry. As a result, recurrence relations abstract away all the intermediate points that lack the memoryless property, enabling a solvable model to be built directly upon the embedded Markov chain. To show the usefulness of RRB reward models, we evaluate an embedded system for which we leverage the proposed notion and methods to solve a variety of probabilistic measures analytically.
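A recurrence over entry probabilities can be shown for a generic discrete-time semi-Markov process. The Python sketch below uses textbook-style recurrences rather than the paper's exact formulation, and it computes accumulated reward through occupancy probabilities instead of the paper's companion term. Its notation is as follows:

- `P` is the embedded Markov chain.
- `h[i]` is a hypothetical sojourn-time distribution for state `i`, in steps of at least 1.
- `e[n][j]` is the state-entry probability that the process enters state `j` at step `n`.

The entry probabilities satisfy `e[n][j] = sum over i and k of e[n-k][i] * h[i][k] * P[i][j]`. Occupancy probabilities then follow from entry probabilities and sojourn survival functions alone, without tracking elapsed time in the state.

```python
def entry_probabilities(P, h, alpha, horizon):
    """e[n][j]: probability of entering state j at step n (alpha gives entries at step 0)."""
    m = len(P)
    e = [[0.0] * m for _ in range(horizon)]
    e[0] = list(alpha)
    for n in range(1, horizon):
        for j in range(m):
            # Entered i at step n-k, stayed exactly k steps, then jumped to j.
            e[n][j] = sum(e[n - k][i] * h[i].get(k, 0.0) * P[i][j]
                          for i in range(m) for k in range(1, n + 1))
    return e


def survival(h_i, d):
    """Probability that a sojourn lasts more than d steps."""
    return 1.0 - sum(p for k, p in h_i.items() if k <= d)


def occupancy(e, h):
    """o[n][j]: probability of being in state j at step n, built from entry probabilities."""
    horizon, m = len(e), len(e[0])
    return [[sum(e[t][j] * survival(h[j], n - t) for t in range(n + 1)) for j in range(m)]
            for n in range(horizon)]


def accumulated_reward(P, h, alpha, rewards, horizon):
    """Expected reward over steps 0..horizon-1, earning rewards[j] per step spent in state j."""
    occ = occupancy(entry_probabilities(P, h, alpha, horizon), h)
    return sum(r * o for row in occ for r, o in zip(rewards, row))


if __name__ == "__main__":
    # Hypothetical 2-state system: 0 = nominal (stays 1 or 2 steps), 1 = degraded (stays 1 step).
    P = [[0.0, 1.0], [1.0, 0.0]]
    h = [{1: 0.5, 2: 0.5}, {1: 1.0}]
    print(accumulated_reward(P, h, [1.0, 0.0], [1.0, 0.5], 3))
```

In this small example the occupancy probabilities sum to 1 at every step, which is a cheap sanity check on the recurrences. The expected reward over the first three steps is 2.5.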
