Timothy Tsai | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Timothy Tsai is active.

Explore More

Publication

Featured researches published by Timothy Tsai.

Journal of Systems and Software | 2004

NT-SwiFT: software implemented fault tolerance on Windows NT

Deron Liang; P. Emerald Chung; Yennun Huang; Chandra Mohan Rao Kintala; Woei-Jyh Lee; Timothy Tsai; Chung-Yih Wang

More and more high available applications are implemented on Windows NT. However, the current version of Windows NT (NT4) does not provide some facilities that are needed to implement these fault tolerant applications. In this paper, we describe a set of components collectively named NT-SwiFT (Software Implemented Fault Tolerance) which facilitates building fault-tolerant and highly available applications on Windows NT. NT-SwiFT provides components for automatic error detection and recovery, checkpointing, event logging and replay, communication error recovery, incremental data replications, IP packets re-routing, etc. SwiFT components were originally designed on UNIX. The UNIX version was first ported to NT to run on UWIN [Korn97]. Gradually a large portion of the software has been re-implemented to take advantage of native NT system services. This paper describes these components and compares the differences in the UNIX and NT implementations. We also describe some applications using these components and discuss how to leverage NT system services and cope with some missing features.

dependable systems and networks | 2002

Libsafe: transparent system-wide protection against buffer overflow attacks

Timothy Tsai; Navjot Singh

Libsafe is a practical solution that protects against the most common forms of buffer overflow attacks. Such attacks often result in granting the attacker full privileges on the target system. Libsafe is implemented as a shared library that intercepts calls to vulnerable standard library functions. Based on an inspection of the process stack and the function arguments, Libsafe ensures that no return addresses can be overwritten, thus preventing the most common form of buffer overflow attack.

dependable systems and networks | 2012

A study of soft error consequences in hard disk drives

Timothy Tsai; Nawanol Theera-Ampornpunt; Saurabh Bagchi

Hard disk drives have multiple layers of fault tolerance mechanisms that protect against data loss. However, a few failures occasionally breach the entire set of mechanisms. To prevent such scenarios, we rely on failure prediction mechanisms to raise alarms with sufficient warning to allow the at-risk data to be copied to a safe location. A common failure prediction technique monitors the occurrence of soft errors and triggers an alarm when the soft error rate exceeds a specified threshold. This study uses data collected from a population of over 50,000 customer deployed disk drives to examine the relationship between soft errors and failures, in particular failures manifested as hard errors. The data analysis shows that soft errors alone cannot be used as a reliable predictor of hard errors. However, in those cases where soft errors do accurately predict hard errors, sufficient warning time exists for preventive actions.

international symposium on performance analysis of systems and software | 2017

SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation

Siva Kumar Sastry Hari; Timothy Tsai; Mark Stephenson; Stephen W. Keckler; Joel S. Emer

As GPUs become more pervasive in both scalable high-performance computing systems and safety-critical embedded systems, evaluating and analyzing their resilience to soft errors caused by high-energy particle strikes will grow increasingly important. GPU designers must develop tools and techniques to understand the effect of these soft errors on applications. This paper presents an error injection-based methodology and tool called SASSIFI to study the soft error resilience of massively parallel applications running on state-of-the-art NVIDIA GPUs. Our approach uses a low-level assembly-language instrumentation tool called SASSI to profile and inject errors. SASSI provides efficiency by allowing instrumentation code to execute entirely on the GPU and provides the ability to inject into different architecture-visible state. For example, SASSIFI can inject errors in general-purpose registers, GPU memory, condition code registers, and predicate registers. SASSIFI can also inject errors into addresses and register indices. In this paper, we describe the SASSIFI tool, its capabilities, and present experiments to illustrate some of the analyses SASSIFI can be used to perform.

ieee international conference on high performance computing data and analytics | 2017

Understanding error propagation in deep learning neural network (DNN) accelerators and applications

Guanpeng Li; Siva Kumar Sastry Hari; Michael B. Sullivan; Timothy Tsai; Karthik Pattabiraman; Joel S. Emer; Stephen W. Keckler

Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high-performance and energy efficiency. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems. Traditional methods for building resilient systems, e.g., Triple Modular Redundancy (TMR), are agnostic of the DNN algorithm and the DNN accelerators architecture. Hence, these traditional resilience approaches incur high overheads, which makes them challenging to deploy. In this paper, we experimentally evaluate the resilience characteristics of DNN systems (i.e., DNN software running on specialized accelerators). We find that the error resilience of a DNN system depends on the data types, values, data reuses, and types of layers in the design. Based on our observations, we propose two efficient protection techniques for DNN systems.

Archive | 2008