Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Joseph Tucek is active.

Publication


Featured research published by Joseph Tucek.


Symposium on Operating Systems Principles | 2007

Triage: diagnosing production run failures at the user's site

Joseph Tucek; Shan Lu; Chengdu Huang; Spiros Xanthos; Yuanyuan Zhou

Diagnosing production run failures is a challenging yet important task. Most previous work focuses on offsite diagnosis, i.e., diagnosis at the development site with the programmers present. This is insufficient for production-run failures because: (1) it is difficult to reproduce failures offsite for diagnosis; (2) offsite diagnosis cannot provide timely guidance for recovery or security purposes; (3) it is infeasible to provide a programmer to diagnose every production run failure; and (4) privacy concerns limit the release of information (e.g., core dumps) to programmers. To address production-run failures, we propose a system, called Triage, that automatically performs onsite software failure diagnosis at the very moment of failure. It provides a detailed diagnosis report, including the failure nature, triggering conditions, related code and variables, the fault propagation chain, and potential fixes. Triage achieves this by leveraging lightweight re-execution support to efficiently capture the failure environment, repeatedly replay the moment of failure, and dynamically analyze the occurring failure using different diagnosis techniques. Triage employs a failure diagnosis protocol that mimics the steps a human takes in debugging. This extensible protocol provides a framework that enables the use of various existing and new diagnosis techniques. We also propose a new failure diagnosis technique, delta analysis, to identify failure-related conditions, code, and variables. We evaluate these ideas in real system experiments with 10 real software failures from 9 open source applications, including four servers. Triage accurately diagnoses the evaluated failures, providing likely root causes and even the fault propagation chain, while keeping normal-run overhead under 5%. Finally, our user study of the diagnosis and repair of real bugs shows that Triage saves time (99.99% confidence), reducing the total time to fix by almost half.
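
As a rough illustration of the delta-analysis idea (a toy sketch, not Triage's implementation; all variable names and values below are invented), one can compare a variable snapshot from a failing replay against snapshots from successful runs and flag the values that appear only in the failing run:

```python
# Toy sketch of a delta-analysis-style comparison (hypothetical, not Triage's code):
# flag variables whose failing-run values never occur in any successful run.

def delta_analysis(failing_snapshot, passing_snapshots):
    """Return variables whose value in the failing run differs from every passing run."""
    suspects = {}
    for var, bad_value in failing_snapshot.items():
        good_values = {snap.get(var) for snap in passing_snapshots}
        if bad_value not in good_values:
            suspects[var] = (bad_value, good_values)
    return suspects

if __name__ == "__main__":
    # Invented snapshots of program state captured during replay.
    failing = {"buf_len": -1, "state": "CLOSED", "retries": 3}
    passing = [
        {"buf_len": 512, "state": "CLOSED", "retries": 3},
        {"buf_len": 1024, "state": "OPEN", "retries": 2},
    ]
    for var, (bad, good) in delta_analysis(failing, passing).items():
        print(f"suspect: {var} = {bad!r}; passing runs saw {good}")
```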


Operating Systems Review | 2010

Efficiency matters

Eric Anderson; Joseph Tucek

Current data intensive scalable computing (DISC) systems, although scalable, achieve embarrassingly low rates of processing per node. We feel that current DISC systems have repeated a mistake of old high-performance systems: focusing on scalability without considering efficiency. This poor efficiency comes with issues in reliability, energy, and cost. As the gap between theoretical performance and what is actually achieved has become glaringly large, we feel there is a pressing need to rethink the design of future data intensive computing and carefully consider the direction of future research.
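
As a back-of-the-envelope illustration of what "efficiency" means in this argument (all numbers below are invented, not taken from the paper), one can compare a job's achieved throughput against the aggregate peak capacity of its nodes:

```python
# Back-of-the-envelope per-node efficiency, with invented numbers (not from the paper).
nodes = 100
per_node_peak_mb_s = 400.0        # assumed hardware capability per node (e.g. disk bandwidth)
achieved_total_mb_s = 2_000.0     # assumed rate the whole cluster actually sustains

aggregate_peak = nodes * per_node_peak_mb_s
print(f"per-node achieved rate: {achieved_total_mb_s / nodes:.1f} MB/s")
print(f"efficiency vs. aggregate peak: {achieved_total_mb_s / aggregate_peak:.1%}")
```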


Architectural Support for Programming Languages and Operating Systems | 2009

Efficient online validation with delta execution

Joseph Tucek; Weiwei Xiong; Yuanyuan Zhou

Software systems are constantly changing. Patches to fix bugs and patches to add features are all too common. Every change risks breaking a previously working system. Hence administrators loathe change, and are willing to delay even critical security patches until after fully validating their correctness. Compared to off-line validation, on-line validation has clear advantages since it tests against real-life workloads. Unfortunately, it imposes restrictive overheads, as it requires running the old and new versions side by side. Moreover, due to spurious differences (e.g., event timing, random number generation, and thread interleavings), it is difficult to compare the two for validation. To allow more effective on-line patch validation, we propose a new mechanism, called delta execution, that is based on the observation that most patches are small. Delta execution merges the two side-by-side executions most of the time and splits them only when necessary, such as when they access different data or execute different code. This allows us to perform on-line validation not only with lower overhead but also with greatly reduced spurious differences, allowing us to effectively validate changes. We first validate the feasibility of our idea by studying the characteristics of 240 patches from 4 server programs; our examination shows that 77% of the changes should not be expected to cause large execution differences and are therefore feasible for delta execution. We then implemented delta execution using dynamic instrumentation. Using real-world patches from 7 server applications and 3 other programs, we compared our implementation of delta execution against traditional side-by-side on-line validation. Delta execution outperformed traditional validation by up to 128%; further, for 3 of the changes, spurious differences caused the traditional validation to fail completely while delta execution succeeded. This demonstrates that delta execution can allow administrators to use on-line validation to confidently ensure the correctness of the changes they apply.
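
The merge/split intuition can be sketched with a toy example (hypothetical code, not the paper's dynamic-instrumentation system; the handle_request_v1/v2 functions and the requests are invented): shared work executes once, and the two versions diverge only around the patched code, so only genuine behavioral differences surface:

```python
# Toy sketch of the delta-execution idea (hypothetical, not the paper's system):
# run shared code once, and split into per-version execution only where the patch differs.

def handle_request_v1(size):               # original code path (invented example)
    return size * 2

def handle_request_v2(size):               # patched code path (invented example)
    return size * 2 if size >= 0 else 0    # the "patch" adds a bounds check

def process(requests):
    shared_log = []                        # work identical in both versions runs once
    divergences = []                       # state is duplicated only around the patched code
    for size in requests:
        shared_log.append(("received", size))                           # merged execution
        old, new = handle_request_v1(size), handle_request_v2(size)     # split execution
        if old != new:
            divergences.append((size, old, new))                        # a real behavioral difference
        shared_log.append(("replied", new))                             # merge back after the split
    return shared_log, divergences

if __name__ == "__main__":
    _, diffs = process([10, -3, 42])
    print("divergent requests:", diffs)    # only the patched behavior differs, no spurious noise
```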


Dependable Systems and Networks | 2010

Efficient eventual consistency in Pahoehoe, an erasure-coded key-blob archive

Eric Anderson; Xiaozhou Li; Arif Merchant; Mehul A. Shah; Kevin Smathers; Joseph Tucek; Mustafa Uysal; Jay J. Wylie

Cloud computing demands cheap, always-on, and reliable storage. We describe Pahoehoe, a key-value cloud storage system we designed to store large objects cost-effectively with high availability. Pahoehoe stores objects across multiple data centers and provides eventual consistency so as to remain available during network partitions. Pahoehoe uses erasure codes to store objects with high reliability at low cost. Its use of erasure codes distinguishes Pahoehoe from other cloud storage systems, and presents a challenge for efficiently providing eventual consistency. We describe Pahoehoe's put, get, and convergence protocols; convergence is the decentralized protocol that ensures eventual consistency. We use simulated executions of Pahoehoe to evaluate the efficiency of convergence, in terms of message count and message bytes sent, for failure-free and expected failure scenarios (e.g., partitions and server unavailability). We describe and evaluate optimizations to the naïve convergence protocol that reduce the cost of convergence in all scenarios.
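
To illustrate the erasure-coding side (a minimal sketch using a single XOR parity fragment, far simpler than the codes a real system like Pahoehoe would use; k and the sample blob are invented), the following toy splits a blob into k data fragments plus one parity fragment and recovers from the loss of any one fragment:

```python
# Toy erasure code: k data fragments plus one XOR parity fragment.
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blob: bytes, k: int):
    """Split blob into k equal data fragments plus one XOR parity fragment."""
    frag_len = -(-len(blob) // k)                  # ceiling division
    padded = blob.ljust(k * frag_len, b"\x00")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return frags + [reduce(xor, frags)]

def decode(frags, k: int, blob_len: int) -> bytes:
    """Reconstruct the blob; at most one fragment may be None (lost)."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if missing:
        i = missing[0]
        frags[i] = reduce(xor, (f for j, f in enumerate(frags) if j != i))
    return b"".join(frags[:k])[:blob_len]

if __name__ == "__main__":
    blob = b"cloud object stored across data centers"
    frags = encode(blob, k=4)
    frags[2] = None                                # simulate one unavailable fragment
    assert decode(frags, k=4, blob_len=len(blob)) == blob
    print("recovered after losing one fragment")
```

A production system would use a k-of-n code (e.g., Reed-Solomon) to tolerate multiple fragment losses; the single-parity toy above only shows the shape of the encode/decode path.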


Electronic Imaging | 2005

MediaBench II video: expediting the next generation of video systems research

Jason E. Fritts; Frederick W. Steiling; Joseph Tucek

The first step towards the design of video processors and video systems is to achieve an accurate understanding of the major video applications, including not only the fundamentals of the many video compression standards, but also the workload characteristics of those applications. Introduced in 1997, the MediaBench benchmark suite provided the first set of full application-level benchmarks for studying video processing characteristics, and has consequently enabled significant computer architecture and compiler research for multimedia systems. To expedite the next generation of systems research, the MediaBench Consortium is developing the MediaBench II benchmark suite, incorporating benchmarks from the latest multimedia technologies and providing both a single composite benchmark suite and separate benchmark suites for each area of multimedia. In the area of video, MediaBench II Video includes both the popular mainstream video compression standards, such as Motion-JPEG, H.263, and MPEG-2, and the more recent next-generation standards, including MPEG-4, Motion-JPEG2000, and H.264. This paper introduces MediaBench II Video and provides a comprehensive workload evaluation of its major processing characteristics.
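
As a loose illustration of workload characterization (a synthetic stand-in, not MediaBench II code; the "filter", frame sizes, and frame count are invented), a benchmark harness typically times a processing kernel over frames and reports throughput figures such as frames per second and bytes per second:

```python
# Synthetic sketch of a video-workload timing harness (not MediaBench II itself).
import time

def grayscale(frame: bytes) -> bytes:          # trivial stand-in for a real codec stage
    return bytes((r + g + b) // 3 for r, g, b in zip(frame[0::3], frame[1::3], frame[2::3]))

def run_benchmark(num_frames=30, width=320, height=240):
    frame = (bytes(range(256)) * (width * height * 3 // 256 + 1))[:width * height * 3]
    start = time.perf_counter()
    for _ in range(num_frames):
        grayscale(frame)
    elapsed = time.perf_counter() - start
    mb = num_frames * len(frame) / 1e6
    print(f"{num_frames} frames, {mb:.1f} MB processed, "
          f"{num_frames / elapsed:.1f} frames/s, {mb / elapsed:.1f} MB/s")

if __name__ == "__main__":
    run_benchmark()
```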


IEEE Conference on Mass Storage Systems and Technologies | 2005

Trade-offs in protecting storage: a meta-data comparison of cryptographic, backup/versioning, immutable/tamper-proof, and redundant storage solutions

Joseph Tucek; Paul Stanton; Elizabeth Haubert; Ragib Hasan; Larry Brumbaugh; William Yurcik

Modern storage systems are responsible for increasing amounts of data, and the value of that data continues to grow. Several primary storage system solutions have emerged for the protection of data: (1) secure storage through cryptography, (2) backup and versioning systems, (3) immutable and tamper-proof storage, and (4) redundant storage. Using results from published studies, we compare these four solutions against different requirements, highlighting trade-offs in performance, space, attack resistance, and cost. We also present a case study of applying these solutions based on design work at NCSA. Lastly, we conclude that while different storage protection solutions may be appropriate for different requirements, some general conclusions can be made about current state-of-the-art storage protection solutions as well as directions for future research.


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2009

Efficient tracing and performance analysis for large distributed systems

Eric Anderson; Christopher Hoover; Xiaozhou Li; Joseph Tucek

Distributed systems are notoriously difficult to implement and debug. One important tool for understanding the behavior of distributed systems is tracing. Unfortunately, effective tracing for modern distributed systems faces several challenges. First, many interesting behaviors in distributed systems occur only rarely, or only at full production scale. Hence we need tracing mechanisms which impose minimal overhead, in order to allow always-on tracing of production instances. Second, for high-speed systems, messages can be delivered in significantly less time than the error of traditional time synchronization techniques such as the network time protocol (NTP), necessitating time adjustment techniques with much higher precision. Third, distributed systems today may generate millions of events per second system-wide, resulting in traces consisting of billions of events. Such large traces can overwhelm existing trace analysis tools. These challenges make effective tracing difficult. We present techniques that address these three challenges. Our contributions include 1) a low-overhead tracing mechanism, which allows tracing of large systems without impacting their behavior or performance (0.14 μs/event), 2) a post hoc technique for producing highly accurate time synchronization across hosts (within 10 μs, compared to 100 μs to 2 ms for NTP), and 3) incremental data processing techniques which facilitate analyzing traces containing billions of trace points on desktop systems. We have successfully applied these techniques to two distributed systems, a cooperative caching system and a distributed storage system, and from our experience, we believe our techniques are applicable to other distributed systems.
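
The time-synchronization step can be illustrated with the classic request/response offset bound (a sketch under simplifying assumptions, not the paper's exact algorithm; all timestamps below are made up): given traced send and receive times for a message exchange between two hosts, the remote clock offset is bracketed by the observed delays.

```python
# Sketch of clock-offset estimation from traced message timestamps
# (the standard request/response bound; not the paper's post hoc technique).

def estimate_offset(t1, t2, t3, t4):
    """t1: client send, t2: server receive, t3: server send, t4: client receive.
    Returns (offset of the server clock relative to the client, round-trip delay)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

if __name__ == "__main__":
    # Timestamps in seconds, as recorded in traces on the two hosts (invented values).
    samples = [
        (10.000100, 10.001350, 10.001400, 10.000700),
        (11.000200, 11.001430, 11.001480, 11.000790),
    ]
    # Prefer the sample with the smallest round-trip delay; its offset bound is tightest.
    offset, delay = min((estimate_offset(*s) for s in samples), key=lambda p: p[1])
    print(f"estimated server-clock offset: {offset * 1e6:.1f} us (rtt {delay * 1e6:.1f} us)")
```

The sketch only shows the standard pairwise bound; reaching the accuracy the paper reports requires its post hoc analysis over many traced messages.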


Electronic Imaging | 2005

Tamper-resistant storage techniques for multimedia systems

Elizabeth Haubert; Joseph Tucek; Larry Brumbaugh; William Yurcik

Tamper-resistant storage techniques provide varying degrees of authenticity and integrity for data. This paper surveys five implemented tamper-resistant storage systems that use encryption, cryptographic hashes, digital signatures, and error-correction primitives to provide varying levels of data protection. Five key evaluation points for such systems are: (1) authenticity guarantees, (2) integrity guarantees, (3) confidentiality guarantees, (4) performance overhead attributed to security, and (5) scalability concerns. Immutable storage techniques can enhance tamper-resistant techniques. Digital watermarking is not appropriate when tamper resistance is implemented in the storage system rather than at the application level.
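
As a minimal sketch of one of the primitives the survey covers (keyed cryptographic hashing; the key handling and sample data are purely illustrative), a storage system can detect tampering by storing a digest alongside each object and re-verifying it on read:

```python
# Minimal sketch of keyed-hash tamper detection for stored objects (illustrative only).
import hashlib
import hmac

SECRET_KEY = b"example-key-kept-outside-the-storage-system"  # assumption for this sketch

def protect(data: bytes) -> bytes:
    """Return a keyed digest to be stored alongside the data."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).digest()

def verify(data: bytes, stored_digest: bytes) -> bool:
    """Detect tampering: recompute the digest and compare in constant time."""
    return hmac.compare_digest(protect(data), stored_digest)

if __name__ == "__main__":
    original = b"frame-000001 of the archived video"
    tag = protect(original)
    assert verify(original, tag)
    assert not verify(b"frame-000001 of the archived video (modified)", tag)
    print("tampering detected on the modified copy")
```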


ACM Transactions on Computer Systems | 2017

Reliability Analysis of SSDs Under Power Fault

Mai Zheng; Joseph Tucek; Feng Qin; Mark David Lillibridge; Bill W. Zhao; Elizabeth S. Yang

Modern storage technology (solid-state disks (SSDs), NoSQL databases, commoditized RAID hardware, etc.) brings new reliability challenges to the already-complicated storage stack. Among other things, the behavior of these new components during power faults—which happen relatively frequently in data centers—is an important yet mostly ignored issue in this dependability-critical area. Understanding how new storage components behave under power fault is the first step towards designing new robust storage systems. In this article, we propose a new methodology to expose reliability issues in block devices under power faults. Our framework includes specially designed hardware to inject power faults directly to devices, workloads to stress storage components, and techniques to detect various types of failures. Applying our testing framework, we test 17 commodity SSDs from six different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that 14 of the 17 tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.
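
A simplified sketch of the kind of self-checking record such a test framework might write (an assumed block format, not the paper's actual workload): each block carries a sequence number and a checksum so that bit corruption and shorn (partial) writes become detectable after a power cycle.

```python
# Sketch of a self-verifying block format for post-power-fault checking (illustrative).
import struct
import zlib

BLOCK = 4096
HEADER = struct.Struct("<QI")            # sequence number, CRC32 of the payload

def make_block(seq: int) -> bytes:
    payload = bytes([seq % 256]) * (BLOCK - HEADER.size)
    return HEADER.pack(seq, zlib.crc32(payload)) + payload

def check_block(raw: bytes):
    seq, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    if zlib.crc32(payload) != crc:
        # A shorn write leaves a mix of old and new data; bit corruption flips bits.
        # Both fail the CRC; distinguishing them needs knowledge of the expected pattern.
        return seq, "corrupt or shorn"
    return seq, "ok"

if __name__ == "__main__":
    good = make_block(7)
    shorn = make_block(8)[:2048] + make_block(3)[2048:]   # simulate a partial write
    for raw in (good, shorn):
        print(check_block(raw))
```

Real frameworks also check ordering across blocks (to catch unserializable writes) and re-read devices after power is restored; those steps are omitted here.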


Electronic Imaging | 2005

The techniques and challenges of immutable storage with applications in multimedia

Ragib Hasan; Joseph Tucek; Paul Stanton; William Yurcik; Larry Brumbaugh; Jeff Rosendale

Security of storage and archival systems has become increasingly important in recent years. Due to the increased vulnerability of existing systems and the need to comply with government regulations, different methods have been explored to create a secure storage system. One of the primary problems in ensuring the integrity of storage systems is making sure a file cannot be changed without proper authorization. Immutable storage is storage whose content cannot be changed once it has been written. For example, critical system files and other important documents should never be changed and thus should be stored as immutable. In multimedia systems, immutability provides proper archival of indices as well as content. In this paper, a survey of existing techniques for immutability in file systems is presented.
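
A minimal sketch of write-once semantics at the file level (illustrative only; the path and data are invented, and production systems enforce immutability inside the storage layer rather than trusting the client):

```python
# Minimal sketch of write-once ("immutable") file creation: never overwrite existing data.
import os
import tempfile

def write_immutable(path: str, data: bytes) -> None:
    """Create the file only if it does not already exist; any overwrite attempt fails."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)  # read-only mode bits
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index-0001.dat")
    write_immutable(path, b"archived index entry")
    try:
        write_immutable(path, b"attempted modification")
    except FileExistsError:
        print("second write rejected: the object is immutable")
```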

Collaboration


Dive into Joseph Tucek's collaborations.

Top Co-Authors

Yuanyuan Zhou
University of California

Feng Qin
Ohio State University

Shan Lu
University of Chicago

Elie Krevat
Carnegie Mellon University