Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Jiwu Shu is active.

Publication


Featured research published by Jiwu Shu.


International Conference on Computer Design | 2014

Loose-Ordering Consistency for persistent memory

Youyou Lu; Jiwu Shu; Long Sun; Onur Mutlu

Emerging non-volatile memory (NVM) technologies enable data persistence at the main memory level at access speeds close to DRAM. In such persistent memories, memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to a strict order for writes to persistent memory significantly degrades system performance as it requires flushing dirty data blocks from CPU caches and waiting for their completion at the main memory in the order specified by the program. This paper introduces a new mechanism, called Loose-Ordering Consistency (LOC), that satisfies the ordering requirements of persistent memory writes at significantly lower performance degradation than state-of-the-art mechanisms. LOC consists of two key techniques. First, Eager Commit reduces the commit overhead for writes within a transaction by eliminating the need to perform a persistent commit record write at the end of a transaction. We do so by ensuring that we can determine the status of all committed transactions during recovery by storing necessary metadata information statically with blocks of data written to memory. Second, Speculative Persistence relaxes the ordering of writes between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism requires the tracking of committed transaction ID and support for multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of strict write ordering from 66.9% to 34.9% on a variety of workloads.
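
The Eager Commit idea can be made concrete with a toy recovery routine. The sketch below is a minimal illustration, not the paper's hardware design: each persisted block carries (transaction ID, total block count) metadata, so recovery decides commit status by counting a transaction's blocks instead of looking for a separate commit record. The log layout and all names are invented for illustration.

```python
from collections import defaultdict

persistent_memory = []   # stand-in log: (tx_id, total_blocks, data) records

def persist_block(tx_id, total_blocks, data):
    """Every block carries its commit metadata; no separate commit record."""
    persistent_memory.append((tx_id, total_blocks, data))

def recover():
    """A transaction committed iff all of its blocks reached memory."""
    seen, totals = defaultdict(int), {}
    for tx_id, total_blocks, _ in persistent_memory:
        seen[tx_id] += 1
        totals[tx_id] = total_blocks
    return {tx for tx in seen if seen[tx] == totals[tx]}

persist_block(1, 2, b"a")
persist_block(1, 2, b"b")
persist_block(2, 2, b"c")      # tx 2 crashed after one of its two blocks
print(recover())               # {1}: tx 2 is treated as uncommitted
```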


IEEE Transactions on Power Systems | 2005

A parallel transient stability simulation for power systems

Jiwu Shu; Wei Xue; Weimin Zheng

As power systems continue to develop, online dynamic security analysis and real-time simulation using parallel computing are becoming increasingly important. This paper presents a novel multilevel partition scheme for parallel computing based on power network regional characteristics and describes the design and implementation of a hierarchical block bordered diagonal form (BBDF) algorithm for power network computation. Some optimization schemes are further proposed to reduce the computation and communication time and to improve the scalability of the program. The simulation results show that, for a large network with 2115 nodes, 2614 branches, 248 generators, and 544 loads, the proposed algorithms and schemes run ten times faster on a cluster system with eight CPUs than on a single CPU. Thus, they satisfy the real-time simulation requirement for large-scale power grids.
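
The block-bordered-diagonal-form structure can be illustrated with a small numerical sketch. Assuming the partitioned network yields independent regional blocks A_i coupled only through a small border block D, the solve splits into per-region work (parallelizable across CPUs) plus one small Schur-complement solve for the border. The matrices below are random stand-ins, not power system data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_region, n_border, regions = 4, 2, 3

# Regional blocks A_i, couplings B_i, C_i to the border, and border block D.
A = [rng.random((n_region, n_region)) + 4 * np.eye(n_region) for _ in range(regions)]
B = [rng.random((n_region, n_border)) for _ in range(regions)]
C = [rng.random((n_border, n_region)) for _ in range(regions)]
D = rng.random((n_border, n_border)) + 4 * np.eye(n_border)
b = [rng.random(n_region) for _ in range(regions)]
d = rng.random(n_border)

# Per-region work (independent, hence parallelizable across CPUs).
S = D - sum(C[i] @ np.linalg.solve(A[i], B[i]) for i in range(regions))
rhs = d - sum(C[i] @ np.linalg.solve(A[i], b[i]) for i in range(regions))

x_border = np.linalg.solve(S, rhs)                       # small global solve
x_region = [np.linalg.solve(A[i], b[i] - B[i] @ x_border) for i in range(regions)]

# The border equation holds: sum_i C_i x_i + D x_border = d.
assert np.allclose(sum(C[i] @ x_region[i] for i in range(regions)) + D @ x_border, d)
```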


ACM Transactions on Storage | 2009

GRID codes: Strip-based erasure codes with high fault tolerance for storage systems

Mingqiang Li; Jiwu Shu; Weimin Zheng

As storage systems grow in size and complexity, they are increasingly confronted with concurrent disk failures together with multiple unrecoverable sector errors. To ensure high data reliability and availability, erasure codes with high fault tolerance are required. In this article, we present a new family of erasure codes with high fault tolerance, named GRID codes, so called because they are strip-based codes whose strips are arranged into multi-dimensional grids. In the construction of GRID codes, we first introduce the concept of matched codes and then discuss how to use matched codes to construct GRID codes. In addition, we propose an iterative reconstruction algorithm for GRID codes and discuss some of their important features. Finally, we compare GRID codes with several categories of existing codes. Our comparisons show that for large-scale storage systems, GRID codes have attractive advantages over many existing erasure codes: (a) they are completely XOR-based and have very regular structures, ensuring easy implementation; (b) they can tolerate up to 15 failures, or even more; and (c) their storage efficiency can reach 80% or higher. These advantages make GRID codes well suited to large-scale storage systems.
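
A toy example conveys the grid flavor: place data strips in a 2-D grid and protect each row and each column with an XOR parity strip, so a lost strip can be rebuilt along either dimension. This is just a simple row/column product code with invented parameters, not the paper's matched-code construction.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.integers(0, 256, size=(3, 4), dtype=np.uint8)  # 3x4 grid of strips
row_parity = np.bitwise_xor.reduce(data, axis=1)          # one parity per row
col_parity = np.bitwise_xor.reduce(data, axis=0)          # one parity per column

lost_r, lost_c = 1, 2
original = data[lost_r, lost_c]                           # pretend this strip is lost

# Rebuild along the row, or equally along the column.
via_row = np.bitwise_xor.reduce(np.delete(data[lost_r], lost_c)) ^ row_parity[lost_r]
via_col = np.bitwise_xor.reduce(np.delete(data[:, lost_c], lost_r)) ^ col_parity[lost_c]
assert via_row == original == via_col
print("recovered strip value:", int(via_row))
```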


Design Automation Conference | 2015

Ambient energy harvesting nonvolatile processors: from circuit to system

Yongpan Liu; Zewei Li; Hehe Li; Yiqun Wang; Xueqing Li; Kaisheng Ma; Shuangchen Li; Meng-Fan Chang; Jack Sampson; Yuan Xie; Jiwu Shu; Huazhong Yang

Energy harvesting is attracting growing attention because it enables ultra-long operation without maintenance. However, frequent, unpredictable power failures from energy harvesters pose performance and reliability challenges to traditional processors. Nonvolatile processors are a promising solution to this problem thanks to their zero leakage and efficient backup and restore operations. To optimize nonvolatile processor design, this paper is the first to propose nonvolatile processor metrics that account for energy-harvesting factors. Furthermore, we explore nonvolatile processor design from the circuit to the system level. A prototype energy-harvesting nonvolatile processor is built, and experimental results show that the proposed performance metric matches measured results with an average error below 6.27%. Finally, the energy consumption of the nonvolatile processor is analyzed under different benchmarks.
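
The value of backup and restore under intermittent power can be seen in a deliberately simplified simulation. The sketch below assumes a fixed task length, a made-up backup cost, and a made-up trace of power-on intervals; it only illustrates why progress accumulates on a nonvolatile core but is lost on a volatile one.

```python
TASK, BACKUP_COST = 1000, 5          # cycles of work; cycles to back up state
spans = [120, 60, 200, 90, 150, 80, 170, 110, 140, 95]  # power-on cycles (invented)

def finishes(nonvolatile):
    done = 0
    for span in spans:
        if not nonvolatile:
            done = 0                  # volatile state was lost in the outage
        budget = span - (BACKUP_COST if nonvolatile else 0)
        done += max(budget, 0)
        if done >= TASK:
            return True
    return False

print("nonvolatile finishes:", finishes(True))   # True: progress accumulates
print("volatile finishes:   ", finishes(False))  # False: restarts every outage
```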


ACM Transactions on Storage | 2007

SLAS: An efficient approach to scaling round-robin striped volumes

Guangyan Zhang; Jiwu Shu; Wei Xue; Weimin Zheng

Round-robin striping, due to its uniform distribution and low-complexity computation, is widely used by applications that demand high bandwidth and massive storage. Because many systems cannot be stopped when their storage capacity and I/O bandwidth need to grow, an efficient online mechanism for adding disks to striped volumes is very important. In this article, we show that during the data redistribution caused by scaling a round-robin striped volume, there is always a reordering window within which data consistency can be maintained while changing the order of data movements. By exploiting this reordering-window characteristic, we propose SLAS, an approach to scaling round-robin striped volumes that effectively reduces the cost of data redistribution. First, SLAS applies a new mapping management solution based on a sliding window to support data redistribution without loss of scalability; second, it uses lazy updates of mapping metadata to decrease the number of metadata writes required by data redistribution; third, it changes the order of data chunk movements to aggregate reads/writes of data chunks. Our results from detailed simulations using real-system workloads show that, compared with the traditional approach, SLAS can reduce redistribution duration by up to 40.79% while keeping a similar maximum response time for foreground I/Os. Finally, our discussion indicates that the SLAS approach works for both disk addition to and disk removal from striped volumes.
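
The addressing problem SLAS tackles is easy to state in code: under round-robin striping, chunk i resides on disk i mod n, so adding a disk remaps most chunks. The sketch below only enumerates the required moves under toy parameters; SLAS's actual contributions (the reordering window, the sliding-window mapping, and lazy metadata updates) are not reproduced here.

```python
def placement(chunk, disks):
    """Round-robin: chunk i lives on disk i mod n."""
    return chunk % disks

old_disks, new_disks, chunks = 4, 5, 20
moves = [(c, placement(c, old_disks), placement(c, new_disks))
         for c in range(chunks)
         if placement(c, old_disks) != placement(c, new_disks)]
for chunk, src, dst in moves:
    print(f"chunk {chunk:2d}: disk {src} -> disk {dst}")
print(f"{len(moves)} of {chunks} chunks must move")
```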


International Symposium on Microarchitecture | 2013

Aegis: partitioning data block for efficient recovery of stuck-at-faults in phase change memory

Jie Fan; Song Jiang; Jiwu Shu; Youhui Zhang; Weimin Zheng

While Phase Change Memory (PCM) holds great promise as a complement to or even a replacement for DRAM-based memory and flash-based storage, it must effectively overcome its limited write endurance to be a reliable device for an extended period of intensive use. The limited write endurance can lead to permanent stuck-at faults after a certain number of writes, which leave some memory cells permanently stuck at either ‘0’ or ‘1’. State-of-the-art solutions partition a data block into bit groups and apply a bit inversion technique to selected groups. The effectiveness of this approach hinges on how the block is partitioned into bit groups. While all existing solutions can separate faults into different groups for error correction, they fall short on three fundamental capabilities desired of any partition scheme. First, a scheme should maximize the probability of successfully re-partitioning a block so that two faults currently in the same group are placed into two new groups. Second, it should partition a block into a small number of groups for space efficiency. Third, it should spread faults across the groups as uniformly as possible, so that more faults can be accommodated within the same number of groups. A recovery solution with these capabilities can provide strong fault tolerance with minimal overhead. We propose Aegis, a recovery solution with a systematic partition scheme that uses fewer groups to accommodate more faults than state-of-the-art schemes. The uniqueness of Aegis's partition scheme lies in its guarantee that any two bits in the same group will not be in the same group after a re-partition. Empowered by this partition scheme, Aegis can recover significantly more faults with reduced space overhead relative to state-of-the-art solutions.
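
The group-separation guarantee can be illustrated with a line-based partition over a p x p grid (p prime): group bits along lines of one slope, then re-partition along lines of another slope; two distinct lines of different slopes share at most one point, so no two former group-mates stay together. This is a sketch in the spirit of the paper's guarantee with illustrative parameters, not necessarily Aegis's exact construction.

```python
p = 5  # grid side, prime; a block of p*p bits

def partition(slope):
    """Group bit indices 0..p*p-1 along lines y = slope*x + c (mod p)."""
    groups = [[] for _ in range(p)]
    for bit in range(p * p):
        x, y = bit % p, bit // p
        groups[(y - slope * x) % p].append(bit)
    return groups

before, after = partition(0), partition(1)   # rows, then slope-1 diagonals
for g1 in before:
    for g2 in after:
        assert len(set(g1) & set(g2)) <= 1   # former group-mates are split up
print(after[0])  # one slope-1 group: bits on the line y = x
```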


IEEE Transactions on Computers | 2010

ALV: A New Data Redistribution Approach to RAID-5 Scaling

Guangyan Zhang; Weimin Zheng; Jiwu Shu

When a RAID-5 volume is scaled up with added disks, data have to be redistributed from the original disks to all disks, both original and new. Existing online scaling techniques suffer from long redistribution times as well as negative impacts on application performance. By leveraging our insight into a reordering window, this paper presents ALV, a new data redistribution approach to RAID-5 scaling. The reordering window results from the natural space hole that opens up as data are redistributed, and it grows in size over time. The data inside the reordering window can migrate in any order without overwriting other in-use data chunks. The ALV approach exploits three novel techniques. First, ALV changes the movement order of data chunks to access multiple successive chunks via a single I/O. Second, ALV updates mapping metadata lazily to minimize the number of metadata writes while ensuring data consistency. Third, ALV uses an on/off logical valve to adaptively adjust the redistribution rate depending on the application workload. We implemented ALV in Linux kernel 2.6.18 and evaluated its performance by replaying three real-system traces: TPC-C, Cello-99, and SPC-Web. The results demonstrate that ALV consistently outperforms the conventional approach, by 53.31-73.91 percent in user response time and by 24.07-29.27 percent in redistribution time.
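
ALV's first technique, reordering moves so that successive chunks are fetched in one I/O, can be sketched as a simple run-coalescing pass. The chunk addresses and window contents below are invented; the real reordering-window bounds come from the paper's analysis.

```python
def coalesce(addresses):
    """Group sorted chunk addresses into maximal contiguous runs."""
    runs, run = [], [addresses[0]]
    for a in addresses[1:]:
        if a == run[-1] + 1:
            run.append(a)
        else:
            runs.append(run)
            run = [a]
    runs.append(run)
    return runs

window = [7, 3, 4, 8, 5, 12, 9]   # chunks eligible inside one reordering window
for run in coalesce(sorted(window)):
    print(f"one read of {len(run)} chunk(s) starting at chunk {run[0]}")
```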


European Conference on Computer Systems | 2016

A high performance file system for non-volatile main memory

Jiaxin Ou; Jiwu Shu; Youyou Lu

Emerging non-volatile main memories (NVMMs) provide data persistence at the main memory level. To avoid double-copy overheads among the user buffer, the OS page cache, and the storage layer, state-of-the-art NVMM-aware file systems bypass the OS page cache and copy data directly between the user buffer and NVMM storage. However, one major drawback of existing NVMM technologies is their slow writes, so direct access for all file operations can lead to suboptimal system performance. In this paper, we propose HiNFS, a high performance file system for non-volatile main memory. Specifically, HiNFS uses an NVMM-aware write buffer policy to buffer lazy-persistent file writes in DRAM and persists them to NVMM lazily, hiding NVMM's long write latency. HiNFS performs direct access to NVMM for eager-persistent file writes, and it reads file data directly from both DRAM and NVMM, as they have similar read performance, in order to eliminate double-copy overheads from the critical path. To ensure read consistency, HiNFS combines a DRAM Block Index with a Cacheline Bitmap to track the latest data between DRAM and NVMM. Finally, HiNFS employs a Buffer Benefit Model to identify eager-persistent file writes before issuing the write operations. Using software NVMM emulators, we evaluate HiNFS's performance on various workloads. Compared with state-of-the-art NVMM-aware file systems, PMFS and EXT4-DAX, our results show that HiNFS improves system throughput by up to 184% for filebench microbenchmarks and reduces execution time by up to 64% for data-intensive traces and macro-benchmarks, demonstrating the benefit of hiding NVMM's long write latency.
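
The read-consistency mechanism can be sketched with toy data structures. The sketch below assumes tiny block and cacheline sizes and invents its structure names: a per-block DRAM buffer (standing in for the DRAM Block Index) plus a per-cacheline dirty bitmap let a read stitch together the latest bytes from DRAM and NVMM.

```python
CACHELINE, LINES = 4, 4                      # toy sizes: 4-byte lines, 4 per block

nvmm = {0: bytes(b"N" * CACHELINE * LINES)}  # persistent copy of block 0
dram_buffer = {}                             # block -> DRAM copy (Block Index stand-in)
dirty = {}                                   # block -> per-line dirty bits (Bitmap stand-in)

def buffered_write(block, line, data):
    """Lazy-persistent write: stage one cacheline in DRAM, mark it dirty."""
    buf = dram_buffer.setdefault(block, bytearray(CACHELINE * LINES))
    bits = dirty.setdefault(block, [False] * LINES)
    buf[line * CACHELINE:(line + 1) * CACHELINE] = data
    bits[line] = True

def read(block):
    """Serve each cacheline from DRAM if dirty there, else from NVMM."""
    out = bytearray(nvmm[block])
    for line, is_dirty in enumerate(dirty.get(block, [])):
        if is_dirty:
            s = slice(line * CACHELINE, (line + 1) * CACHELINE)
            out[s] = dram_buffer[block][s]
    return bytes(out)

buffered_write(0, 2, b"DDDD")
print(read(0).decode())  # NNNNNNNNDDDDNNNN: line 2 from DRAM, rest from NVMM
```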


IEEE Conference on Mass Storage Systems and Technologies | 2015

Blurred persistence in transactional persistent memory

Youyou Lu; Jiwu Shu; Long Sun

Persistent memory provides data persistence at main memory level and enables memory-level storage systems. To ensure consistency of the storage systems, memory writes need to be transactional and are carefully moved across the boundary between the volatile CPU cache and the persistent memory. Unfortunately, the CPU cache is hardware-controlled, and it incurs high overhead for programs to track and move data blocks from being volatile to persistent. In this paper, we propose a software-based mechanism, Blurred Persistence, to blur the volatility-persistence boundary, so as to reduce the overhead in transaction support. Blurred Persistence consists of two techniques. First, Execution in Log executes a transaction in the log to eliminate duplicated data copies for execution. It allows the persistence of volatile uncommitted data, which can be detected by reorganizing the log structure. Second, Volatile Checkpoint with Bulk Persistence allows the committed data to aggressively stay volatile by leveraging the data durability in the log, as long as the commit order across threads is kept. By doing so, it reduces the frequency of forced persistence and improves cache efficiency. Evaluations show that our mechanism improves system performance by 56.3% to 143.7% for a variety of workloads.
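
The execution-in-log idea can be miniaturized as follows. This sketch invents its own log format and commit marker: a transaction's updates are written directly into the (simulated) persistent log, commit adds a single marker rather than a second data copy, and recovery applies only transactions whose marker is present.

```python
log = []     # stand-in for a log placed in persistent memory
home = {}    # stand-in for home locations in persistent memory

def run_tx(tx_id, updates):
    """Execute the transaction inside the log: no separate working copy."""
    for addr, val in updates.items():
        log.append(("data", tx_id, addr, val))
    log.append(("commit", tx_id))            # one marker, no second data copy

def recover():
    """Apply only transactions whose commit marker reached the log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "data" and rec[1] in committed:
            home[rec[2]] = rec[3]

run_tx(1, {"x": 10, "y": 20})
log.append(("data", 2, "x", 99))   # tx 2 crashed before its commit marker
recover()
print(home)                        # {'x': 10, 'y': 20}: tx 2's write is ignored
```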


International Conference on Computer Design | 2013

LightTx: A lightweight transactional design in flash-based SSDs to support flexible transactions

Youyou Lu; Jiwu Shu; Jia Guo; Shuai Li; Onur Mutlu

Flash memory has accelerated the architectural evolution of storage systems with its unique characteristics compared to magnetic disks. The no-overwrite property of flash memory has been leveraged to efficiently support transactions, a commonly used mechanism in systems to provide consistency. However, existing transaction designs embedded in flash-based Solid State Drives (SSDs) have limited support for transaction flexibility, i.e., support for different isolation levels between transactions, which is essential to enable different systems to make tradeoffs between performance and consistency. Since they provide support for only strict isolation between transactions, existing designs lead to a reduced number of on-the-fly requests and therefore cannot exploit the abundant internal parallelism of an SSD. There are two design challenges that need to be overcome to support flexible transactions: (1) enabling a transaction commit protocol that supports parallel execution of transactions; and (2) efficiently tracking the state of transactions that have pages scattered over different locations due to parallel allocation of pages. In this paper, we propose LightTx to address these two challenges. LightTx supports transaction flexibility using a lightweight embedded transaction design. The design of LightTx is based on two key techniques. First, LightTx uses a commit protocol that determines the transaction state solely inside each transaction (as opposed to having dependencies between transactions that complicate state tracking) in order to support parallel transaction execution. Second, LightTx periodically retires the dead transactions to reduce transaction state tracking cost. Experiments show that LightTx provides up to 20.6% performance improvement due to transaction flexibility. LightTx also achieves nearly the lowest overhead in garbage collection and mapping persistence compared to existing embedded transaction designs.
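
The self-contained commit protocol can be sketched by tagging every flash page with per-transaction metadata. Below, each page's (simulated) out-of-band area records (transaction ID, sequence number, total page count), so commit status is decided from a transaction's own pages with no cross-transaction dependency; the field names and the retirement set are simplifications invented for illustration.

```python
from collections import defaultdict

flash_pages = []   # (tx_id, seq, total, payload), written in any order
retired = set()    # dead transactions pruned from state tracking

def write_page(tx_id, seq, total, payload):
    """Each page's OOB metadata fully describes its place in its transaction."""
    flash_pages.append((tx_id, seq, total, payload))

def committed():
    """A transaction commits iff all `total` of its pages are on flash."""
    seqs, totals = defaultdict(set), {}
    for tx_id, seq, total, _ in flash_pages:
        if tx_id not in retired:
            seqs[tx_id].add(seq)
            totals[tx_id] = total
    return {tx for tx in seqs if len(seqs[tx]) == totals[tx]}

# Two transactions interleaved by the SSD's internal parallelism.
write_page(1, 0, 2, b"a")
write_page(2, 0, 1, b"x")
write_page(1, 1, 2, b"b")
print(committed())      # {1, 2}: state decided per transaction, in any order
retired.add(1)          # periodically retire dead transactions to cut tracking cost
```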

Collaboration


Dive into Jiwu Shu's collaborations.

Top Co-Authors
Patrick P. C. Lee

The Chinese University of Hong Kong
