Pin Zhou

University of Illinois at Urbana–Champaign

Publication

Featured research published by Pin Zhou.

Architectural Support for Programming Languages and Operating Systems | 2004

Dynamic tracking of page miss ratio curve for memory management

Pin Zhou; Vivek Pandey; Jagadeesan Sundaresan; Anand Raghuraman; Yuanyuan Zhou; Sanjeev Kumar

Memory can be utilized efficiently if the dynamic memory demands of applications can be determined and analyzed at run time. The page miss ratio curve (MRC), i.e., the page miss rate versus memory size curve, is a good performance-directed metric for this purpose. However, dynamically tracking the MRC at run time is challenging in systems with virtual memory because not every memory reference passes through the operating system (OS). This paper proposes two methods to dynamically track the MRC of applications at run time. The first method uses a hardware MRC monitor that can track the MRC at fine time granularity. Our simulation results show that this monitor has negligible performance and energy overheads. The second method is an OS-only implementation that can track the MRC at coarse time granularity. Our implementation results on Linux show that it adds only 7-10% overhead. We have also used the dynamic MRC to guide both memory allocation for multiprogramming systems and memory energy management. Our real-system experiments on Linux with applications including the Apache Web Server show that MRC-directed memory allocation can speed up the applications' execution/response time by up to a factor of 5.86 and reduce the number of page faults by up to 63.1%. Our execution-driven simulation results with SPEC2000 benchmarks show that MRC-directed memory energy management can improve the Energy*Delay metric by 27-58% over previously proposed static and dynamic schemes.
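An MRC of the kind described above can be computed offline from a page-reference trace with the classic Mattson LRU stack-distance technique: a reference found at depth d of the LRU stack hits in any LRU-managed memory of at least d pages. The following is a minimal sketch under that assumption (pure LRU replacement, a recorded trace, and a hypothetical `miss_ratio_curve` helper); it illustrates the metric itself, not the paper's hardware monitor or its Linux implementation.

```python
from typing import Dict, Hashable, List


def miss_ratio_curve(trace: List[Hashable], max_pages: int) -> Dict[int, float]:
    """Return {memory size in pages: miss ratio} for sizes 1..max_pages.

    Assumes LRU replacement; uses Mattson's stack-distance technique.
    """
    if not trace:
        return {size: 0.0 for size in range(1, max_pages + 1)}

    stack: List[Hashable] = []    # LRU stack, index 0 = most recently used page
    hist = [0] * (max_pages + 1)  # hist[d] = references found at stack depth d
    for page in trace:
        if page in stack:
            depth = stack.index(page) + 1  # 1-based stack distance
            stack.remove(page)
            if depth <= max_pages:
                hist[depth] += 1
        # First-touch references (and depths > max_pages) are misses at every size.
        stack.insert(0, page)

    total = len(trace)
    mrc: Dict[int, float] = {}
    hits = 0
    for size in range(1, max_pages + 1):
        hits += hist[size]  # `size` pages hit every reference with depth <= size
        mrc[size] = (total - hits) / total
    return mrc


if __name__ == "__main__":
    print(miss_ratio_curve([1, 2, 3, 1, 2, 3, 4, 1], max_pages=4))
```

For the trace `[1, 2, 3, 1, 2, 3, 4, 1]` this gives miss ratios of 1.0, 1.0, 0.625, and 0.5 for memories of 1 to 4 pages: the cyclic reuse of three pages only starts hitting once three frames are available. The linear `list.index` scan keeps the sketch short; it costs O(trace length × distinct pages) and would be replaced by a tree or sampling scheme for long traces.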

International Symposium on Microarchitecture | 2004

AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants

Pin Zhou; Wei Liu; Long Fei; Shan Lu; Feng Qin; Yuanyuan Zhou; Samuel P. Midkiff; Josep Torrellas

This paper makes two contributions to architectural support for software debugging. First, it proposes a novel statistics-based, on-the-fly bug detection method called PC-based invariant detection. The idea is based on the observation that, in most programs, a given memory location is typically accessed by only a few instructions. Therefore, by capturing the invariant of the set of PCs that normally access a given variable, we can detect accesses by outlier instructions, which are often caused by memory corruption, buffer overflow, stack smashing, or other memory-related bugs. Since this method is statistics-based, it can detect bugs that do not violate any programming rules and that are therefore likely to be missed by many existing tools. The second contribution is a novel architectural extension called the Check Look-aside Buffer (CLB). The CLB uses a Bloom filter to reduce monitoring overheads in the recently proposed iWatcher architectural framework for software debugging. The CLB significantly reduces the overhead of PC-based invariant debugging. We demonstrate a PC-based invariant detection tool called AccMon that leverages architectural, run-time system, and compiler support. Our experimental results with seven buggy applications and a total of ten bugs show that AccMon can detect all ten bugs with few false alarms (0 for five applications and 2-8 for two applications) and with low overhead (0.24-2.88 times). Several existing tools evaluated, including Purify, CCured, and value-based invariant detection tools, fail to detect some of the bugs. In addition, Purify's overhead is one order of magnitude higher than AccMon's. Finally, we show that the CLB is very effective at reducing overhead.
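The core of PC-based invariant detection can be sketched in software: during a training run, record which PCs access each address; afterwards, flag any access by a PC outside that learned set. This is a minimal sketch assuming a trace of `(pc, addr)` pairs; the class and method names are hypothetical, and it omits the statistical confidence handling, the iWatcher triggering, and the CLB that AccMon relies on.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Set, Tuple


class PCInvariantDetector:
    """Learn the set of PCs that access each address, then report outliers."""

    def __init__(self) -> None:
        self.invariants: DefaultDict[int, Set[int]] = defaultdict(set)
        self.training = True

    def stop_training(self) -> None:
        self.training = False

    def access(self, pc: int, addr: int) -> Optional[Tuple[int, int]]:
        """Return (pc, addr) if this access violates the learned invariant."""
        if self.training:
            self.invariants[addr].add(pc)
            return None
        # Only check addresses with a learned invariant; `in` avoids creating entries.
        if addr in self.invariants and pc not in self.invariants[addr]:
            return (pc, addr)  # outlier instruction: possible corruption or overflow
        return None
```

For example, after training on accesses to `0x1000` from PCs `0x400` and `0x404`, an access to `0x1000` from PC `0x9a0` is reported, while accesses from the two known PCs, and accesses to addresses never seen in training, are not.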

International Symposium on Computer Architecture | 2004

iWatcher: Efficient Architectural Support for Software Debugging

Pin Zhou; Feng Qin; Wei Liu; Yuanyuan Zhou; Josep Torrellas

Recent impressive performance improvements in computer architecture have not led to significant gains in ease of debugging. Software debugging often relies on inserting run-time software checks. In many cases, however, it is hard to find the root cause of a bug. Moreover, program execution typically slows down significantly, often by 10-100 times. To address this problem, this paper introduces the Intelligent Watcher (iWatcher), novel architectural support that monitors dynamic execution automatically, flexibly, and with minimal overhead. iWatcher associates program-specified monitoring functions with memory locations. When any such location is accessed, the monitoring function is automatically triggered with low overhead. To further reduce overhead and support rollback, iWatcher can leverage Thread-Level Speculation (TLS). To test iWatcher, we use applications with various bugs. Our results show that iWatcher detects many more software bugs than Valgrind, a well-known open-source bug detector. Moreover, iWatcher induces only a 4-80% execution overhead, which is orders of magnitude less than Valgrind's. Even with 20% of the dynamic loads monitored in a program, iWatcher adds only 66-174% overhead. Finally, TLS is effective at reducing overheads for programs with substantial monitoring.
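The association between memory locations and monitoring functions can be illustrated with a small software model: a table of watched address ranges, each with an access mode and a callback that fires when a matching load or store touches the range. This is only a sketch, assuming accesses are reported to `on_access` explicitly; `WatchTable`, `watch`, `unwatch`, and `on_access` are hypothetical names rather than iWatcher's actual interface, and in iWatcher the check happens in hardware on every load and store.

```python
from typing import Callable, List, Tuple

Monitor = Callable[[int, str], None]  # called with (address, access kind)


class WatchTable:
    """Software model of watched memory regions with monitoring functions."""

    def __init__(self) -> None:
        self.regions: List[Tuple[int, int, str, Monitor]] = []

    def watch(self, start: int, length: int, mode: str, func: Monitor) -> None:
        # mode is "r", "w", or "rw": the access kinds that trigger func
        self.regions.append((start, start + length, mode, func))

    def unwatch(self, start: int) -> None:
        self.regions = [r for r in self.regions if r[0] != start]

    def on_access(self, addr: int, kind: str) -> None:
        # kind is "r" (load) or "w" (store); fire every matching monitoring function
        for lo, hi, mode, func in self.regions:
            if lo <= addr < hi and kind in mode:
                func(addr, kind)


if __name__ == "__main__":
    # Watch 4 guard bytes just past a 16-byte buffer at 0x1000 to catch overflows.
    table = WatchTable()
    table.watch(0x1010, 4, "rw", lambda a, k: print(f"overflow: {k} at {a:#x}"))
    table.on_access(0x100C, "w")  # inside the buffer: no report
    table.on_access(0x1010, "w")  # first byte past the buffer: reported
```

Watching a small guard region after a buffer is one way such a table can catch buffer overflows; the same mechanism could watch freed objects or values with expected invariants.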

High-Performance Computer Architecture | 2007

HARD: Hardware-Assisted Lockset-based Race Detection

Pin Zhou; Radu Teodorescu; Yuanyuan Zhou

The emergence of multicore architectures will lead to increased use of multithreaded applications, which are prone to synchronization bugs such as data races. Software solutions for detecting data races generally incur large overheads. Hardware support for race detection can significantly reduce that overhead. However, all existing hardware proposals for race detection are based on the happens-before algorithm, which is sensitive to thread interleaving and cannot detect races that are not exposed during the monitored run. The lockset algorithm addresses this limitation. Unfortunately, due to challenging issues such as storing the lockset information and performing complex set operations, it has so far been implemented only in software, with a 10-30 times performance hit. This paper proposes the first hardware implementation (called HARD) of the lockset algorithm, exploiting the race detection capability of this algorithm with minimal overhead. HARD efficiently stores locksets in hardware Bloom filters and converts the expensive set operations into fast bitwise logic operations with negligible overhead. We evaluate HARD using six SPLASH-2 applications with 60 randomly injected bugs. Our results show that HARD can detect 54 out of 60 tested bugs, 20% more than happens-before, with only 0.1-2.6% execution overhead. We also show that our hardware design is cost-effective by comparing it with an ideal lockset implementation, which would require a large amount of hardware resources.
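The idea of turning lockset operations into bitwise logic can be sketched as follows. Each lockset is a fixed-width bit vector in which a lock address sets one hashed bit (a one-hash Bloom filter), so refining a variable's candidate set, C(v) := C(v) ∩ L(t) where L(t) is the set of locks thread t holds, becomes a bitwise AND, and the "empty set" test becomes a zero check. The vector width, hash function, and class names are assumptions for illustration, not HARD's actual hardware parameters. As with any Bloom filter, two locks that hash to the same bit can make an empty intersection look non-empty and hide a race, and the sketch omits the variable-state refinements a full lockset detector uses to suppress false alarms during initialization.

```python
from typing import Dict, Set

NBITS = 16                    # lockset vector width (assumed)
ALL_LOCKS = (1 << NBITS) - 1  # initial candidate set: "every lock"


def lock_bit(lock_addr: int) -> int:
    """Hash a lock address to a single bit of the lockset vector."""
    return 1 << ((lock_addr >> 2) % NBITS)


class LocksetChecker:
    def __init__(self) -> None:
        self.held: Dict[int, Set[int]] = {}  # thread id -> lock addresses held
        self.candidate: Dict[int, int] = {}  # variable address -> candidate vector C(v)

    def acquire(self, tid: int, lock: int) -> None:
        self.held.setdefault(tid, set()).add(lock)

    def release(self, tid: int, lock: int) -> None:
        self.held.get(tid, set()).discard(lock)

    def access(self, tid: int, var: int) -> bool:
        """Refine C(var) and return True if it became empty (potential data race)."""
        held_vec = 0
        for lock in self.held.get(tid, set()):
            held_vec |= lock_bit(lock)
        refined = self.candidate.get(var, ALL_LOCKS) & held_vec  # intersection as AND
        self.candidate[var] = refined
        return refined == 0
```

Held locks are kept as an exact set and hashed only when a variable is accessed; clearing a bit directly on release could drop another held lock that hashes to the same bit. If two threads protect a variable with the same lock, its candidate vector keeps that lock's bit; if they use different locks, the AND yields zero and the access is flagged.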

Architectural Support for Programming Languages and Operating Systems | 2004

Performance directed energy management for main memory and disks

Xiaodong Li; Zhenmin Li; Francis M. David; Pin Zhou; Yuanyuan Zhou; Sarita V. Adve; Sanjeev Kumar

Much research has been conducted on energy management for memory and disks. Most studies use control algorithms that dynamically transition devices to low-power modes after they have been idle for a certain threshold period of time. The control algorithms used in the past have two major limitations. First, they require painstaking, application-dependent manual tuning of their thresholds to achieve energy savings without significantly degrading performance. Second, they do not provide performance guarantees. In one case, they slowed down an application by 835%. This paper addresses these two limitations for both memory and disks, making memory/disk energy-saving schemes practical enough to use in real systems. Specifically, we make three contributions: (1) We propose a technique that provides a performance guarantee for control algorithms. We show that our method works well for all tested cases, even with previously proposed algorithms that are not performance-aware. (2) We propose a new control algorithm, Performance-directed Dynamic (PD), that periodically adjusts its thresholds based on available slack and recent workload characteristics. For memory, PD consumes the least energy when compared to previous hand-tuned algorithms combined with a performance guarantee. However, for disks, PD is too complex and its self-tuning is unable to beat previous hand-tuned algorithms. (3) To improve on PD, we propose a simple, optimization-based, threshold-free control algorithm, Performance-directed Static (PS). PS periodically assigns a static configuration by solving an optimization problem that incorporates information about the available slack and recent traffic variability to different chips/disks. We find that PS is the best or close to the best across all performance-guaranteed disk algorithms, including hand-tuned versions.
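The performance-guarantee idea can be illustrated with a slack monitor: the user allows a bounded slowdown relative to running at full power, the system accumulates the delay that low-power modes actually add, and power management is overridden (devices kept at full power) whenever that delay uses up the allowed budget. This is a minimal sketch assuming the full-power execution time and the added delay can be measured or estimated per interval; `SlackGuard` and its fields are hypothetical, and the paper's PD and PS algorithms are not shown.

```python
from dataclasses import dataclass


@dataclass
class SlackGuard:
    """Hypothetical wrapper that enforces a slowdown budget on any energy policy."""

    max_slowdown: float       # e.g. 0.10 means at most 10% slower than full power
    base_time: float = 0.0    # estimated execution time had devices stayed at full power
    extra_delay: float = 0.0  # delay actually added by low-power modes so far

    def record(self, full_power_time: float, added_delay: float) -> None:
        """Account for one monitoring interval."""
        self.base_time += full_power_time
        self.extra_delay += added_delay

    def slack(self) -> float:
        """Remaining delay budget; zero or negative means the budget is used up."""
        return self.max_slowdown * self.base_time - self.extra_delay

    def allow_low_power(self) -> bool:
        # When slack is used up, the underlying control algorithm is overridden
        # and all devices stay in full-power mode.
        return self.slack() > 0
```

With `max_slowdown=0.10`, 100 ms of full-power work that incurs 5 ms of wake-up delay leaves about 5 ms of slack, so the underlying policy may keep powering devices down. A further 10 ms interval that adds 7 ms of delay raises the budget to about 11 ms against 12 ms of accumulated delay, and `allow_low_power()` returns `False`. Because the guard only inspects accumulated delay, it can wrap any threshold-based policy without retuning it.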

International Symposium on Microarchitecture | 2006

PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection

Shan Lu; Pin Zhou; Wei Liu; Yuanyuan Zhou; Josep Torrellas

Dynamic software bug detection tools are commonly used because they leverage run-time information. However, they suffer from a fundamental limitation, the path coverage problem: they detect bugs only in taken paths, not in non-taken paths. In other words, they require bugs to be exposed in the monitored execution. This paper makes one of the first attempts to address this fundamental problem with a simple hardware extension. First, we propose PathExpander, a novel design that dynamically increases the code path coverage of dynamic bug detection tools with no programmer involvement. As a program executes, PathExpander selectively executes non-taken paths in a sandbox without side effects. This enables dynamic bug detection tools to find bugs that are present in these non-taken paths and would otherwise go undetected. Second, we propose a simple hardware extension that reduces the huge overhead of a pure software implementation to a moderate level. To further minimize overhead, PathExpander provides an optimization option to execute non-taken paths on idle cores in chip multiprocessor architectures that support speculative execution. To evaluate PathExpander, we use three dynamic bug detection methods, namely a dynamic software-only checker (CCured), a dynamic hardware-assisted checker (iWatcher), and assertions, and conduct a side-by-side comparison with PathExpander's counterpart software implementation. Our experiments with seven buggy programs using general inputs that do not expose the tested bugs show that PathExpander is able to help these tools detect 21 (out of 38) tested bugs that are otherwise missed. This is because PathExpander increases the code coverage of each test case from 40% to 65% on average, based on the branch coverage metric. When applications are tested with multiple inputs, the cumulative coverage also improves significantly, by 19%. We also show that PathExpander introduces modest false positives (4 on average) and overhead (less than 9.9%). The 3-4 orders of magnitude lower overhead compared with a pure-software implementation further justifies the hardware design in PathExpander.
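The sandboxing idea can be shown in miniature: at a branch, the program follows the real outcome, but the non-taken path is also executed on a private copy of the state so that a checker can observe it, and its side effects are then discarded. In this sketch the Python runtime's own bounds check stands in for a dynamic bug detector such as CCured; `run_branch`, `write_slot`, and `write_past_end` are hypothetical names, and the sketch ignores practical issues a real design has to address, such as bounding how long a non-taken path runs and buffering its memory updates without an explicit copy.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Path = Callable[[State], None]


def run_branch(state: State, cond: Callable[[State], bool],
               then_path: Path, else_path: Path, reports: List[str]) -> None:
    """Execute the taken path for real and the non-taken path in a sandbox."""
    taken, not_taken = (then_path, else_path) if cond(state) else (else_path, then_path)
    sandbox = copy.deepcopy(state)  # the non-taken path only ever sees this copy
    try:
        not_taken(sandbox)
    except Exception as exc:  # a bug on a path this input never exercises
        reports.append(f"non-taken path: {exc!r}")
    taken(state)  # real execution continues; sandbox changes are discarded


def write_slot(s: State) -> None:
    s["buf"][s["n"]] = 1


def write_past_end(s: State) -> None:
    s["buf"][s["n"] + 4] = 1  # latent bug: index past the end of buf


if __name__ == "__main__":
    state: State = {"buf": [0, 0, 0, 0], "n": 2}
    reports: List[str] = []
    run_branch(state, lambda s: s["n"] < 4, write_slot, write_past_end, reports)
    print(state["buf"], reports)
```

Here the input takes the `n < 4` branch, so the real write lands in `buf[2]`, while the out-of-bounds write on the other path is reported without modifying `state`. Because the sandboxed path runs with values that did not actually lead to it, such reports can include false positives, which matches the trade-off the abstract quantifies.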

ACM Transactions on Architecture and Code Optimization | 2005

Efficient and flexible architectural support for dynamic monitoring

Yuanyuan Zhou; Pin Zhou; Feng Qin; Wei Liu; Josep Torrellas

Recent impressive performance improvements in computer architecture have not led to significant gains in the ease of debugging. Software debugging often relies on inserting run-time software checks. In many cases, however, it is hard to find the root cause of a bug. Moreover, program execution typically slows down significantly, often by 10-100 times. To address this problem, this paper introduces the intelligent watcher (iWatcher), a novel architectural scheme to monitor dynamic execution automatically, flexibly, and with minimal overhead. iWatcher associates program-specified monitoring functions with memory locations. When any such location is accessed, the monitoring function is automatically triggered with low overhead. To further reduce overhead and support rollback, iWatcher can optionally leverage thread-level speculation (TLS). The iWatcher architecture can be used to detect various bugs, including buffer overflow, accessing freed locations, memory leaks, stack smashing, and value-invariant violations. To evaluate iWatcher, we use seven applications with various real and injected bugs. Our results show that iWatcher detects many more software bugs than Valgrind, a well-known open-source bug detector. Moreover, iWatcher induces only a 0.1-179% execution overhead, which is orders of magnitude less than Valgrind's. Our sensitivity study shows that even with 20% of the dynamic loads monitored in a program, iWatcher adds only 72-182% overhead. Finally, TLS is effective at reducing overheads for programs with substantial monitoring.

IEEE Micro | 2004

iWatcher: simple, general architectural support for software debugging

Pin Zhou; Feng Qin; Wei Liu; Yuanyuan Zhou; Josep Torrellas

We propose Intelligent Watcher (iWatcher), a combination of hardware and software support that can detect a wide variety of software bugs with only modest hardware changes to current processor implementations. iWatcher lets programmers associate specified functions with watched memory locations or objects. Access to any such location automatically triggers the monitoring function in hardware. Relative to other approaches, iWatcher detects many real bugs at a fraction of the execution-time overhead.

IEEE Micro | 2004

Performance-directed energy management for storage systems

Xiaodong Li; Zhenmin Li; Pin Zhou; Yuanyuan Zhou; Sarita V. Adve; Sanjeev Kumar

Energy consumption has become an important issue in the design of battery-operated mobile devices and sophisticated data centers. The storage hierarchy, which includes memory and disks, is a major energy consumer in such systems, especially in high-end servers at data centers. Much work has focused on energy control algorithms for storage systems that transition a device into a low-power mode when a certain usage function exceeds a specified threshold. These algorithms are difficult to use in real systems, however, because designers must painstakingly and manually tune threshold values, and even then a performance guarantee is difficult to provide. To address these limitations, we develop three algorithms: 1) a performance guarantee technique that designers can use with any underlying energy-control algorithm; 2) a performance-directed control algorithm that periodically assigns a static configuration to different devices by solving an optimization problem; and 3) another performance-directed control algorithm that dynamically self-tunes according to an optimal set of thresholds.

Archive | 2005