Publication
Featured research published by Windsor Wee Sun Hsu.
ACM Transactions on Computer Systems | 2005
Windsor Wee Sun Hsu; Alan Jay Smith; Honesty C. Young
Disk I/O is increasingly the performance bottleneck in computer systems despite rapidly increasing disk data transfer rates. In this article, we propose Automatic Locality-Improving Storage (ALIS), an introspective storage system that automatically reorganizes selected disk blocks based on the dynamic reference stream to increase effective storage performance. ALIS is based on the observations that sequential data fetch is far more efficient than random access, that improving seek distances produces only marginal performance improvements, and that the increasingly powerful processors and large memories in storage systems have ample capacity to reorganize the data layout and redirect the accesses so as to take advantage of rapid sequential data transfer. Using trace-driven simulation with a large set of real workloads, we demonstrate that ALIS considerably outperforms prior techniques, improving the average read performance by up to 50% for server workloads and by about 15% for personal computer workloads. We also show that the performance improvement persists as disk technology evolves. Since disk performance in practice is increasing by only about 8% per year, the benefit of ALIS may correspond to as much as several years of technological progress.
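A minimal sketch of the underlying idea, not the actual ALIS heuristics: if access counts collected from the reference stream identify a set of hot blocks, laying those blocks out contiguously turns many future random reads into sequential transfers. The function and block numbers below are purely illustrative.

```python
from collections import Counter

def plan_reorganization(reference_stream, region_size):
    """Toy block-reorganization planner, loosely inspired by the idea behind
    ALIS (not its actual algorithms): place the most frequently referenced
    blocks contiguously so future accesses to them become sequential."""
    heat = Counter(reference_stream)            # block number -> access count
    hot_blocks = [b for b, _ in heat.most_common(region_size)]
    # new_location[block] = offset within the reorganized region
    return {block: offset for offset, block in enumerate(hot_blocks)}

# Example: a scattered reference stream over a small "disk"
stream = [17, 3, 17, 42, 3, 17, 8, 3, 42, 17]
print(plan_reorganization(stream, region_size=3))
# e.g. {17: 0, 3: 1, 42: 2} -- the hot blocks are now adjacent on disk
```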
ACM Transactions on Database Systems | 2001
Windsor Wee Sun Hsu; Alan Jay Smith; Honesty C. Young
As improvements in processor performance continue to far outpace improvements in storage performance, I/O is increasingly the bottleneck in computer systems, especially in large database systems that manage huge amounts of data. The key to achieving good I/O performance is to thoroughly understand its characteristics. In this article we present a comprehensive analysis of the logical I/O reference behavior of the peak production database workloads from ten of the world's largest corporations. In particular, we focus on how these workloads respond to different techniques for caching, prefetching, and write buffering. Our findings include several broadly applicable rules of thumb that describe how effective the various I/O optimization techniques are for the production workloads. For instance, our results indicate that the buffer pool miss ratio tends to be related to the ratio of buffer pool size to data size by an inverse square root rule. A similar fourth root rule relates the write miss ratio and the ratio of buffer pool size to data size. In addition, we characterize the reference characteristics of workloads similar to the Transaction Processing Performance Council (TPC) benchmarks C (TPC-C) and D (TPC-D), which are de facto standard performance measures for online transaction processing (OLTP) systems and decision support systems (DSS), respectively. Since benchmarks such as TPC-C and TPC-D can only be used effectively if their strengths and limitations are understood, a major focus of our analysis is to identify aspects of the benchmarks that stress the system differently than the production workloads. We discover that, for the most part, the reference behavior of TPC-C and TPC-D falls within the range of behavior exhibited by the production workloads. However, there are some noteworthy exceptions that affect well-known I/O optimization techniques such as caching (LRU is further from optimal for TPC-C, while there is little sharing of pages between transactions for TPC-D), prefetching (TPC-C exhibits no significant sequentiality), and write buffering (write buffering is less effective for the TPC benchmarks). While the two TPC benchmarks generally complement one another in reflecting the characteristics of the production workloads, there remain aspects of the real workloads that are not represented by either of the benchmarks.
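The two rules of thumb are power laws, so their implications are easy to work out numerically. A hedged sketch follows; the baseline numbers are made up purely to show the scaling behavior, not taken from the article.

```python
def predicted_miss_ratio(base_miss, base_ratio, new_ratio, exponent):
    """Scale a measured miss ratio using a power-law rule of thumb:
    miss ratio ~ (buffer_pool_size / data_size) ** -exponent."""
    return base_miss * (new_ratio / base_ratio) ** -exponent

# Read misses follow roughly an inverse square root rule (exponent 1/2),
# write misses roughly an inverse fourth root rule (exponent 1/4).
base = dict(base_miss=0.10, base_ratio=0.01)   # hypothetical: 10% misses when 1% of data is cached
print(predicted_miss_ratio(**base, new_ratio=0.04, exponent=0.5))   # ~0.05: 4x the buffer halves read misses
print(predicted_miss_ratio(**base, new_ratio=0.16, exponent=0.25))  # ~0.05: 16x the buffer halves write misses
```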
Architectural Support for Programming Languages and Operating Systems | 1998
Jih-Kwon Peir; Yongjoon Lee; Windsor Wee Sun Hsu
Memory references exhibit locality and are therefore not uniformly distributed across the sets of a cache. This skew reduces the effectiveness of a cache because it results in the caching of a considerable number of less-recently-used lines which are less likely to be re-referenced before they are replaced. In this paper, we describe a technique that dynamically identifies these less-recently-used lines and effectively utilizes the cache frames they occupy to more accurately approximate the global least-recently-used replacement policy while maintaining the fast access time of a direct-mapped cache. We also explore the idea of using these underutilized cache frames to reduce cache misses through data prefetching. In the proposed design, the possible locations that a line can reside in are not predetermined. Instead, the cache is dynamically partitioned into groups of cache lines. Because both the total number of groups and the individual group associativity adapt to the dynamic reference pattern, we call this design the adaptive group-associative cache. Performance evaluation using trace-driven simulations of the TPC-C benchmark and selected programs from the SPEC95 benchmark suite shows that the group-associative cache is able to achieve a hit ratio that is consistently better than that of a 4-way set-associative cache. For some of the workloads, the hit ratio approaches that of a fully-associative cache.
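The motivating observation, that references skew across cache sets and leave some frames underutilized, can be reproduced with a tiny simulation. This is an illustrative toy, not the paper's simulator or its workloads.

```python
import random
from collections import Counter

random.seed(0)
NUM_SETS = 64

# Skewed reference stream: a small hot working set plus a uniform tail,
# loosely mimicking the locality that real programs exhibit.
hot = [random.randrange(1 << 20) for _ in range(32)]
stream = [random.choice(hot) if random.random() < 0.8 else random.randrange(1 << 20)
          for _ in range(100_000)]

per_set = Counter(addr % NUM_SETS for addr in stream)
counts = sorted(per_set.values(), reverse=True)
print("busiest 8 sets receive", sum(counts[:8]) / len(stream), "of all references")
print("quietest 8 sets receive", sum(counts[-8:]) / len(stream), "of all references")
# The busiest sets see far more traffic, while lines in the quiet sets sit
# idle -- these are the underutilized frames a group-associative cache reclaims.
```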
IBM Journal of Research and Development | 2004
Windsor Wee Sun Hsu; Alan Jay Smith
In this paper, we use real server and personal computer workloads to systematically analyze the true performance impact of various I/O optimization techniques, including read caching, sequential prefetching, opportunistic prefetching, write buffering, request scheduling, striping, and short-stroking. We also break down disk technology improvement into four basic effects (faster seeks, higher RPM, linear density improvement, and increased track density) and analyze each separately to determine its actual benefit. In addition, we examine the historical rates of improvement and use the trends to project the effect of disk technology scaling. As part of this study, we develop a methodology for replaying real workloads that more accurately models I/O arrivals and that allows the I/O rate to be more realistically scaled than previously. We find that optimization techniques that reduce the number of physical I/Os are generally more effective than those that improve the efficiency in performing the I/Os. Sequential prefetching and write buffering are particularly effective, reducing the average read and write response time by about 50% and 90%, respectively. Our results suggest that a reliable method for improving performance is to use larger caches up to and even beyond 1% of the storage used. For a given workload, our analysis shows that disk technology improvement at the historical rate increases performance by about 8% per year if the disk occupancy rate is kept constant, and by about 15% per year if the same number of disks is used. We discover that the actual average seek time and rotational latency are, respectively, only about 35% and 60% of the specified values. We also observe that the disk head positioning time far dominates the data transfer time, suggesting that to effectively utilize the available disk bandwidth, data should be reorganized such that accesses become more sequential.
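The observation that head positioning time dominates data transfer time follows from simple arithmetic. A back-of-the-envelope sketch with hypothetical drive parameters, not the measured values from the paper:

```python
def request_components_ms(request_kb, seek_ms, rpm, media_rate_mb_s):
    """Return the (seek, rotational latency, transfer) components of a random
    disk I/O: seek + half a revolution on average + data transfer."""
    rotational_ms = 0.5 * 60_000 / rpm                     # half a revolution, in ms
    transfer_ms = request_kb / 1024 / media_rate_mb_s * 1000
    return seek_ms, rotational_ms, transfer_ms

# Illustrative numbers for a 10K RPM drive servicing a 4 KB random request.
seek, rot, xfer = request_components_ms(request_kb=4, seek_ms=5.0, rpm=10_000, media_rate_mb_s=50)
print(f"seek {seek:.2f} ms, rotation {rot:.2f} ms, transfer {xfer:.3f} ms")
# Positioning (seek + rotation, ~8 ms) dwarfs the ~0.08 ms transfer, which is
# why making accesses more sequential pays off so much.
```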
IBM Systems Journal | 2001
Windsor Wee Sun Hsu; Alan Jay Smith; Honesty C. Young
There has been very little empirical analysis of any real production database workloads. Although the Transaction Processing Performance Council benchmarks C (TPC-C) and D (TPC-D) have become the standard benchmarks for on-line transaction processing and decision support systems, respectively, there has not been any major effort to systematically analyze their workload characteristics, especially in relation to those of real production database workloads. In this paper, we examine the characteristics of the production database workloads of ten of the world's largest corporations, and we also compare them to TPC-C and TPC-D. We find that the production workloads exhibit a wide range of behavior. In general, the two TPC benchmarks complement one another in reflecting the characteristics of the production workloads, but some aspects of real workloads are still not represented by either of the benchmarks. Specifically, our analysis suggests that the TPC benchmarks tend to exercise the following aspects of the system differently than the production workloads: concurrency control mechanism, workload-adaptive techniques, scheduling and resource allocation policies, and I/O optimizations for temporary and index files. We also reexamine Amdahl's rule of thumb for a typical data processing system and discover that both the TPC benchmarks and the production workloads generate on the order of 0.5 to 1.0 bit of logical I/O per instruction, surprisingly close to the much earlier figure.
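The rule of thumb translates directly into a logical I/O bandwidth requirement. A small illustrative calculation for a hypothetical machine, not a configuration from the paper:

```python
def implied_io_bandwidth_mb_s(instr_per_sec, bits_per_instr):
    """Logical I/O bandwidth implied by the Amdahl-style rule of thumb:
    roughly `bits_per_instr` bits of logical I/O per executed instruction."""
    return instr_per_sec * bits_per_instr / 8 / 1e6

# Hypothetical machine sustaining one billion instructions per second:
for bits in (0.5, 1.0):
    print(bits, "bit/instr ->", implied_io_bandwidth_mb_s(1e9, bits), "MB/s of logical I/O")
# -> roughly 62.5 to 125 MB/s, showing how quickly logical I/O demand grows
#    with processor speed.
```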
International Conference on Management of Data | 2005
Qingbo Zhu; Windsor Wee Sun Hsu
As critical records are increasingly stored in electronic form, which tends to make for easy destruction and clandestine modification, it is imperative that they be properly managed to preserve their trustworthiness, i.e., their ability to provide irrefutable proof and accurate details of events that have occurred. The need for proper record keeping is further underscored by the recent corporate misconduct and ensuing attempts to destroy incriminating records. Currently, the industry practice and regulatory requirements (e.g., SEC Rule 17a-4) rely on storing records in WORM storage to immutably preserve the records. In this paper, we contend that simply storing records in WORM storage is increasingly inadequate to ensure that they are trustworthy. Specifically, with the large volume of records that is typical today, meeting ever more stringent query response time requirements calls for direct access mechanisms such as indexes. Relying on indexes for accessing records could, however, provide a means for effectively altering or deleting records, even those stored in WORM storage. In this paper, we establish the key requirements for a fossilized index that protects the records from such logical modification. We also analyze current indexing methods to determine how they fall short of these requirements. Based on our insights, we propose the Generalized Hash Tree (GHT). Using both theoretical analysis and simulations with real system data, we demonstrate that the GHT can satisfy the requirements of a fossilized index with performance and cost that are comparable to regular indexing techniques such as the B-tree. We further note that as records are indexed on multiple fields to facilitate search and retrieval, the records can be reconstructed from the corresponding index entries even after the records expire and are disposed of. Therefore, we also present a novel method to eliminate this disclosure risk by allowing an index entry to be effectively disposed of when its record expires.
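One way to see why hash-determined placement resists logical modification: if an entry's location is a deterministic function of its key and each location is write-once, later inserts can never displace or hide existing entries. The sketch below is a deliberately simplified illustration of that property, not the paper's actual Generalized Hash Tree.

```python
import hashlib

class WriteOnceHashIndex:
    """Simplified illustration of a fossilized-index idea (not the exact GHT):
    an entry's location is a deterministic function of its key, and locations
    are write-once, so existing entries cannot be moved, overwritten, or
    silently omitted by subsequent inserts."""

    def __init__(self, base_size=8, levels=8):
        # Level i is a write-once array; sizes grow so collisions thin out.
        self.levels = [[None] * (base_size << i) for i in range(levels)]

    def _slot(self, key, level):
        digest = hashlib.sha256(f"{level}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.levels[level])

    def insert(self, key, value):
        for level in range(len(self.levels)):
            slot = self._slot(key, level)
            if self.levels[level][slot] is None:       # write-once: only fill empty slots
                self.levels[level][slot] = (key, value)
                return
        raise RuntimeError("index full; a real design would grow further")

    def lookup(self, key):
        for level in range(len(self.levels)):
            entry = self.levels[level][self._slot(key, level)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

idx = WriteOnceHashIndex()
idx.insert("record-42", "offset 0x1f00")
print(idx.lookup("record-42"))
```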
International Conference on Parallel and Distributed Systems | 2000
Windsor Wee Sun Hsu; Alan Jay Smith; Honesty C. Young
Recent developments in both hardware and software have made it worthwhile to consider embedding intelligence in storage to handle general-purpose processing that can be off-loaded from the hosts. In particular, low-cost processing power is now widely available and software can be made robust, secure and mobile. In this paper, we propose a general smart storage (SmartSTOR) architecture in which a processing unit that is coupled to one or more disks can be used to perform such off-loaded processing. A major part of the paper is devoted to understanding the performance potential of the SmartSTOR architecture for decision support workloads. Our analysis suggests that there is a definite performance advantage in using fewer but more powerful processors, a result that bolsters the case for sharing a powerful processor among multiple disks. As for software architecture, we find that the off-loading of database operations that involve only a single relation is not very promising. In order to achieve significant speed-up, we have to consider the off-loading of multiple-relation operations. In general, if embedding intelligence in storage is an inevitable architectural trend, we have to focus on developing parallel software systems that can effectively take advantage of the large number of processing units that will be in the system.
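Off-loading a single-relation operation essentially means pushing a scan-and-filter down to the storage-side processing unit so that only qualifying rows cross the interconnect to the host. A toy illustration follows; the schema and node layout are hypothetical and this is not the SmartSTOR prototype.

```python
# Toy illustration of off-loading a single-relation operation to storage-side
# processing units: each unit filters its local rows, and the host only merges.

def storage_side_scan(local_rows, predicate, projection):
    """Runs on the processing unit attached to the disks."""
    return [tuple(row[c] for c in projection) for row in local_rows if predicate(row)]

def host_query(storage_nodes, predicate, projection):
    """The host merely merges the pre-filtered partial results."""
    results = []
    for node_rows in storage_nodes:
        results.extend(storage_side_scan(node_rows, predicate, projection))
    return results

# Hypothetical table fragments spread across two storage nodes.
nodes = [
    [{"orderkey": 1, "qty": 17, "price": 100.0}, {"orderkey": 2, "qty": 3, "price": 20.0}],
    [{"orderkey": 3, "qty": 25, "price": 310.0}],
]
print(host_query(nodes, lambda r: r["qty"] > 10, ("orderkey", "price")))
```

Multiple-relation operations such as joins require coordination across fragments held by different units, which is why they are harder to off-load but offer the larger speed-up.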
European Conference on Parallel Processing | 2000
Ying Chen; Windsor Wee Sun Hsu; Honesty C. Young
Parity-based disk arrays provide high reliability and high performance for read and large write accesses at low storage cost. However, small writes are notoriously slow due to the well-known read-modify-write problem. This paper presents logging RAID, a disk array architecture that adopts data logging techniques to overcome the small-write problem in parity-based disk arrays. Logging RAID achieves high performance for a wide variety of I/O access patterns with very small disk space overhead. We show this through trace-driven simulations.
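The small-write problem stems from keeping parity consistent: updating one data block requires reading the old data and the old parity before writing both back, i.e. four disk I/Os for a single logical write. A minimal sketch of that read-modify-write update follows; this is generic RAID-5 parity arithmetic, not the paper's logging design.

```python
# RAID-5 small-write penalty in miniature: P_new = P_old XOR D_old XOR D_new,
# so one small write costs 2 reads + 2 writes.

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data, old_parity, new_data):
    """Read-modify-write parity update for a single data block."""
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    return new_data, new_parity

d_old, p_old = bytes(4), bytes(4)                    # pretend on-disk contents
d_new, p_new = small_write(d_old, p_old, b"\x01\x02\x03\x04")
print(p_new)                                         # parity now reflects the new data

# A logging approach instead appends new data (and parity for full log stripes)
# sequentially, turning many scattered small writes into a few large ones.
```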
European Symposium on Research in Computer Security | 2008
Hong Chen; Xiaonan Ma; Windsor Wee Sun Hsu; Ninghui Li; Qihua Wang
Outsourced data publishing is a promising approach to achieve higher distribution efficiency, greater data survivability, and lower management cost. In outsourced data publishing (sometimes referred to as third-party publishing), a data owner gives the content of databases to multiple publishers, which answer queries sent by clients. In many cases, the trustworthiness of the publishers cannot be guaranteed; therefore, it is important for a client to be able to verify the correctness of the query results. Meanwhile, due to privacy concerns, it is also required that such verification does not expose information that is outside a client's access control area. Current approaches for verifying the correctness of query results in third-party publishing either do not consider the privacy preserving requirement, or are limited to one-dimensional queries. In this paper, we introduce a new scheme for verifying the correctness of query results while preserving data privacy. Our approach handles multi-dimensional range queries. We present both theoretical analysis and experimental results to demonstrate that our approach is time and space efficient.
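A common building block for query-result verification in this setting is a Merkle hash tree over the owner's data: the owner signs the root, and an untrusted publisher returns inclusion proofs with each answer. The sketch below shows only that generic building block; it is not the authors' scheme, which additionally preserves privacy and supports multi-dimensional range queries.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash over the records the owner authenticates and hands to publishers."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_inclusion(leaf, proof, root):
    """Client-side check that a returned record belongs to the owner's data.
    `proof` is a list of (sibling_hash, sibling_is_right) pairs from the publisher."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

leaves = [b"rec1", b"rec2", b"rec3", b"rec4"]
root = merkle_root(leaves)
proof_for_rec1 = [(h(b"rec2"), True), (h(h(b"rec3") + h(b"rec4")), True)]
print(verify_inclusion(b"rec1", proof_for_rec1, root))   # True
```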
Architectural Support for Programming Languages and Operating Systems | 1996
Jih-Kwon Peir; Windsor Wee Sun Hsu; Honesty C. Young; Shauchi Ong
There are two concurrent paths in a typical cache access --- one through the data array and the other through the tag array. The path through the data array drives the selected set out of the array. The path through the tag array determines cache hit/miss and, for set-associative caches, selects the appropriate line from within the selected set. In both direct-mapped and set-associative caches, the path through the tag array is significantly longer than that through the data array. In this paper, we propose a path balancing technique to help match the delays of the tag and data paths. The basic idea behind this technique is to employ a separate subset of the tag array to decouple the one-to-one relationship between address tags and cache lines so as to achieve a design that provides higher performance. Performance evaluation using both TPC-C and SPEC92 benchmarks shows that this path balancing technique offers impressive improvements in overall system performance over conventional cache designs. For TPC-C, improvements in the range of 6% to 28% are possible.
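Why balancing matters can be seen from a back-of-the-envelope timing model: the cache access time is set by the slower of the two concurrent paths, so a fast data array is wasted while the tag path lags. The delays below are hypothetical, not figures from the paper.

```python
# Toy timing model of the two concurrent cache-access paths (hypothetical
# delays in nanoseconds). Access time is governed by the slower path.

def access_time(decode, data_array, tag_array, compare, way_select):
    data_path = decode + data_array
    tag_path = decode + tag_array + compare + way_select
    return max(data_path, tag_path), data_path, tag_path

total, data_path, tag_path = access_time(decode=0.3, data_array=0.9,
                                          tag_array=0.7, compare=0.4, way_select=0.3)
print(f"data path {data_path} ns, tag path {tag_path} ns, access time {total} ns")
# The tag path (1.7 ns here) dominates the data path (1.2 ns); a balancing
# technique aims to shrink that gap so the data-array speed is actually usable.
```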