Zhiguang Chen
National University of Defense Technology
Publication
Featured research published by Zhiguang Chen.
Science in China Series F: Information Sciences | 2011
Nong Xiao; Zhiguang Chen; Fang Liu; MingChe Lai; Longfei An
Driven by data-intensive applications, flash-based solid state drives (SSDs) have become increasingly popular in enterprise-scale storage systems. Flash memory exhibits inherent parallelism, but existing solid state drives have not fully exploited it. We propose P3Stor, a parallel solid state storage architecture that makes full use of flash memory by utilizing module- and bus-level parallelism to increase average bandwidth and employing chip-level interleaving to hide I/O latency. To overcome the bandwidth limitations of traditional interface protocols (e.g., SATA), P3Stor adopts a PCI-E interface to support concurrent transactions. Based on the proposed parallel architecture, we design a lazy flash translation layer (LazyFTL) to manage the address space. LazyFTL adopts a flexible super-page-level mapping scheme to support multi-level parallelism. It is able to distinguish hot data from cold data, and this identification enables LazyFTL to direct hot and cold data to separate physical blocks, which reduces page migrations when reclaiming blocks. As the garbage collector migrates fewer valid pages, write amplification is significantly reduced, which in turn helps extend the drive's life span. Moreover, LazyFTL rarely triggers the wear-leveling process, and its lazy wear-leveling mechanism prevents user requests from being disrupted by background operations. Guided by hot data identification, an intelligent write buffer further reduces program operations to the flash chips, which also extends P3Stor's life span. Performance evaluation using trace-driven simulations and theoretical analysis shows that P3Stor achieves high performance and more than doubles the drive's life span.
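The hot/cold separation that LazyFTL performs on writes can be sketched as follows. This is a minimal illustration of the idea only: the write-count classifier, the threshold, and the block objects are hypothetical stand-ins, not the paper's actual data structures.

```python
from collections import defaultdict

class HotColdAllocator:
    """Toy sketch: steer frequently rewritten (hot) pages to one block and
    rarely rewritten (cold) pages to another, so stale copies of hot pages
    cluster together and blocks are cheap to reclaim."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold
        self.write_counts = defaultdict(int)   # logical page -> write count
        self.hot_block, self.cold_block = [], []

    def write(self, lpn, data):
        self.write_counts[lpn] += 1
        # pages rewritten at least `hot_threshold` times are treated as hot
        hot = self.write_counts[lpn] >= self.hot_threshold
        (self.hot_block if hot else self.cold_block).append((lpn, data))
        return "hot" if hot else "cold"
```

With this separation, a reclaimed hot block contains mostly invalidated pages, so the garbage collector migrates few valid pages, which is the mechanism the abstract credits for the reduced write amplification.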
design, automation, and test in europe | 2016
Hang Zhang; Nong Xiao; Fang Liu; Zhiguang Chen
Emerging resistive memory (ReRAM) technology is a promising candidate to replace DRAM due to its low leakage power, good scalability, and high density. By employing crossbar structures, the density of ReRAM can be further improved for capacity benefits. However, such a structure also suffers from an IR drop issue caused by wire resistance and sneak currents, which leads to an access latency discrepancy within ReRAM memory banks. Existing designs conservatively use the worst-case latency of ReRAM arrays and thus fail to exploit ReRAM's potential for fast access, resulting in suboptimal performance. In this work, we present an asymmetric ReRAM memory design, which separates a crossbar array into multiple logical regions according to their access latency and further groups logical regions across different crossbars into virtual regions. Based on the observation of access hotspots inside memory banks, we design a table structure that remaps memory requests to virtual regions with non-uniform access latency, matching access hotspots with the underlying asymmetric bank design. We then introduce both static and dynamic mapping schemes to prioritize memory requests from critical applications to the fast regions for better performance. Experimental results show that our design improves 4-core system performance by 13.3% and reduces memory latency by 21.6% on average for a ReRAM-based memory system across memory-intensive applications.
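The remapping-table idea can be illustrated with a small sketch: rows that turn out to be access hotspots are promoted, via a table lookup, into a limited-capacity fast virtual region. The latencies, the hotness threshold, and the promotion rule here are invented for illustration and do not come from the paper.

```python
class RegionRemapper:
    """Toy sketch: redirect hot rows to a fast region of an asymmetric bank."""
    FAST_LATENCY, SLOW_LATENCY = 10, 25   # arbitrary illustrative cycle counts

    def __init__(self, fast_slots, hot_threshold=4):
        self.fast_slots = fast_slots       # capacity of the fast virtual region
        self.hot_threshold = hot_threshold # accesses before a row counts as hot
        self.counts = {}
        self.remap = {}                    # row -> True if mapped to fast region

    def access(self, row):
        self.counts[row] = self.counts.get(row, 0) + 1
        if (row not in self.remap
                and self.counts[row] >= self.hot_threshold
                and len(self.remap) < self.fast_slots):
            self.remap[row] = True         # promote hotspot to the fast region
        return self.FAST_LATENCY if row in self.remap else self.SLOW_LATENCY
```

A static scheme would fill `remap` once up front; the dynamic scheme the abstract mentions would additionally demote rows as hotspots shift.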
acm international conference on systems and storage | 2012
Zhiguang Chen; Nong Xiao; Fang Liu
Solid-state drives (SSDs) are widely used in storage systems. However, algorithms adopted by existing operating systems generally treat the underlying devices as hard disks and are thus rarely optimized for SSDs. In this paper, we focus on a classical research issue, the cache replacement policy, and design a new policy that takes the parallelism of SSDs into account. A typical SSD contains several parallel channels. Some channels contain more hot data and are thus busy with read requests, while others may contain only cold data, so workloads among channels may be unbalanced. Requests issued to busy channels may take a while to serve, whereas requests issued to idle channels are served rapidly. We design a new cache replacement policy for read data that considers the unbalanced workloads among channels. The policy gives higher priority to evicting pages from idle channels, because they can be retrieved again quickly, while pages from busy channels are protected. In this manner, the average latency for obtaining pages is reduced. However, SSDs are black boxes, and operating systems are blind to the channel a page comes from. Therefore, we propose a simple scheme to determine whether a page is from a busy or an idle channel. The scheme monitors the page requests issued to the underlying storage device: when a page request is returned while many requests issued earlier remain outstanding, the page is assumed to come from an idle channel. We compare our cache replacement policy with others via trace-driven simulations, measuring performance in terms of average response time. The comparison candidates include LRU, ARC, and the policy adopted by Linux. The experimental results show that our policy outperforms the others on most traces when workloads among channels are unbalanced.
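The idle-channel inference heuristic described above can be sketched compactly: a request that completes while many earlier-issued requests are still outstanding must have bypassed the queues, so its page is assumed to come from an idle channel. The class, the threshold value, and the names are all illustrative assumptions, not the paper's implementation.

```python
class ChannelInference:
    """Guess whether a returned page came from a busy or an idle channel,
    using only request ordering visible above the block device."""

    def __init__(self, threshold=4):
        self.threshold = threshold   # older outstanding requests implying "idle"
        self.next_seq = 0
        self.outstanding = set()     # sequence numbers of in-flight requests

    def issue(self):
        """Record a new page request; return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding.add(seq)
        return seq

    def complete(self, seq):
        """Mark a request done and classify its source channel."""
        self.outstanding.discard(seq)
        older_pending = sum(1 for s in self.outstanding if s < seq)
        return "idle" if older_pending >= self.threshold else "busy"
```

The cache would then prefer to evict pages classified as "idle" and protect those classified as "busy".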
Knowledge and Information Systems | 2014
Zhiguang Chen; Yutong Lu; Nong Xiao; Fang Liu
Big Data requires a shift in traditional computing architecture. In-memory computing is a new paradigm for Big Data analytics, but DRAM-based main memory is neither cost-effective nor energy-efficient. This work combines flash-based solid state drives (SSDs) and DRAM to build a hybrid memory that meets both requirements. As the latency of SSDs is much higher than that of DRAM, the hybrid architecture must ensure that most requests are served by DRAM rather than by the SSD. Accordingly, we take two measures to improve the DRAM hit ratio. First, the hybrid memory employs an adaptive prefetching mechanism to ensure that data are already staged in DRAM before they are demanded. Second, the DRAM employs a novel replacement policy that preferentially evicts data that are easy to prefetch, because such data can be served by prefetching if they are demanded again; data that are hard to prefetch are instead protected in DRAM. Both the prefetching mechanism and the replacement policy rely on file access patterns, so we propose a novel pattern recognition method that adapts the LZ data compression algorithm to detect access patterns. We evaluate our proposals via a prototype and trace-driven simulations. Experimental results demonstrate that the hybrid memory can more than double the effective DRAM capacity.
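The prefetch-aware eviction preference can be sketched as an LRU variant: among resident pages, the least-recently-used *prefetchable* page is evicted first, falling back to plain LRU when none qualifies. The boolean `prefetchable` flag here stands in for the paper's LZ-based pattern detector; all names are illustrative.

```python
from collections import OrderedDict

class PrefetchAwareCache:
    """Toy sketch: evict easy-to-prefetch pages first, protect the rest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page id -> prefetchable flag, LRU order

    def access(self, page, prefetchable):
        if page in self.pages:
            self.pages.move_to_end(page)       # refresh recency
        self.pages[page] = prefetchable
        if len(self.pages) > self.capacity:
            # least-recent prefetchable victim; fall back to plain LRU
            victim = next((p for p, pf in self.pages.items() if pf),
                          next(iter(self.pages)))
            del self.pages[victim]
            return victim
        return None
```

A miss on an evicted prefetchable page is cheap because the adaptive prefetcher can re-stage it from the SSD before it is demanded, which is why sacrificing its recency is a good trade.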
Science in China Series F: Information Sciences | 2012
Nong Xiao; Yingjie Zhao; Fang Liu; Zhiguang Chen
Caching is one of the most effective and commonly used mechanisms for improving the performance of storage servers. Replacement policies play a critical role in cache design due to limited cache capacity. Recent work has focused on achieving high hit ratios but rarely pays attention to reducing the miss penalty when designing a replacement policy. To address this issue, this paper presents a novel algorithm, a dual-queue cache replacement algorithm based on sequentiality detection, which prefers to drop sequential blocks and protect random blocks. The buffer cache can then serve more subsequent random read requests, so the cache miss penalty is decreased significantly. Moreover, the algorithm uses two queues that separately maintain new blocks and old blocks to avoid degrading the hit ratio. Our trace-driven simulation results show that it performs better than LRU and ARC for a wide range of cache sizes and workloads.
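The eviction preference can be sketched in a few lines: sequential blocks are sacrificed first because re-reading a sequential run from disk is cheap, while a random miss carries the full seek penalty. The naive consecutive-address detector below is an illustrative assumption; the paper's detector and dual-queue bookkeeping are more elaborate.

```python
def pick_victim(cache_blocks):
    """cache_blocks: list of (block_addr, is_sequential) in LRU order.
    Prefer the least-recent sequential block; otherwise plain LRU."""
    for addr, seq in cache_blocks:
        if seq:                       # drop sequential blocks first
            return addr
    return cache_blocks[0][0]         # no sequential block: evict LRU head

def looks_sequential(addr, prev_addr):
    """Naive detector: a block directly following its predecessor
    is treated as part of a sequential run."""
    return prev_addr is not None and addr == prev_addr + 1
```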
trust security and privacy in computing and communications | 2013
Fangyong Hou; Nong Xiao; Hongjun He; Fang Liu; Zhiguang Chen
Data encryption is the most important way to provide security in hostile environments. Nearly all existing data encryption techniques require substantial arithmetic and logical computation, which makes them very difficult to deploy in embedded devices. To realize robust data encryption without heavy computation, this paper proposes a novel scheme of physically-embedded data encryption. The scheme extracts the unique and unclonable values possessed intrinsically by the physical device and derives the secret from these values to carry out data encryption. Because it does not execute arithmetic and logical operations, it is well suited to embedded devices with restricted computing resources and abilities. At the same time, it provides high assurance of data protection due to the distinct properties of physical effects. One specific design of physically-embedded data encryption is given in this paper, and a real physical instantiation of the design is tested. The experimental results show its validity and feasibility. Hence, the proposed physically-embedded data encryption is a promising substitute for existing encryption techniques on embedded devices.
great lakes symposium on vlsi | 2016
Hang Zhang; Xuhao Chen; Nong Xiao; Fang Liu; Zhiguang Chen
To address the high energy consumption of SRAM on GPUs, the emerging Spin-Transfer Torque RAM (STT-RAM) memory technology has been intensively studied for building GPU register files with better energy efficiency, thanks to its low leakage power, high density, and good scalability. However, STT-RAM suffers from a reliability issue, read disturbance, which stems from the fact that the margin between read current and write current shrinks as technology scales. Read disturbance leads to high error rates for read operations, which cannot be effectively protected by SECDED ECC on the large-capacity register files of GPUs. Prior schemes to mitigate read disturbance (e.g., read-restore) usually incur either non-trivial performance loss or excessive energy overhead, making them unsuitable for GPU register file designs that aim for both high performance and energy efficiency. To combat read disturbance on GPU register files, we propose a novel software-hardware co-designed solution, Red-Shield, which consists of three optimizations that overcome the limitations of existing solutions. First, we identify dead reads at compile time and augment instructions to avoid unnecessary restores. Second, we employ a small read buffer to accommodate register reads with high access locality, further reducing restores. Third, we propose an adaptive restore mechanism that selects the suitable restore scheme according to the busy status of the corresponding register banks. Experimental results show that our proposed design effectively mitigates the performance loss and energy overhead caused by restore operations while maintaining read reliability.
modeling, analysis, and simulation on computer and telecommunication systems | 2011
Zhiguang Chen; Nong Xiao; Fang Liu; Yimo Du
NAND flash has inherent peculiarities that seriously increase access delay. We propose the Page-to-Block mapping Flash Translation Layer (PBFTL); solid state drives (SSDs) adopting PBFTL achieve lower response times. To achieve low response times for read requests, PBFTL adopts a hybrid-level mapping scheme. However, hybrid-level FTLs perform poorly for writes due to the high overhead of garbage collection. PBFTL therefore takes two measures to optimize garbage collection. The first is to direct hot and cold data to separate blocks, which mitigates write amplification significantly. The second is to reduce the latency of reclaiming a block, which enables PBFTL to spend less time on garbage collection, so user requests are unlikely to be congested for long. Trace-driven simulations show that PBFTL achieves low response times for both read- and write-intensive workloads.
international symposium on computers and communications | 2015
Zhengguo Chen; Zhiguang Chen; Nong Xiao; Fang Liu
NAND flash-based solid state drives (SSDs) have been widely deployed in cloud computing data centers due to their high performance compared with hard disks, but the limited lifespan of flash memory makes SSDs ill-suited to write-intensive applications. Deduplication is an effective method for reducing the write traffic of applications and can thus be used to extend the lifespan of SSDs. However, traditional deduplication schemes rely on a time-consuming fingerprint computation to find duplicate data, which may impair the write performance of SSDs. Pre-hashing was proposed to reduce the frequency of fingerprint computation and thus improve the performance of SSDs with deduplication, but at the cost of a degraded deduplication rate. In this paper, we propose NF-Dedupe, a new deduplication scheme for flash-based SSDs that needs no fingerprint computation. NF-Dedupe determines whether a write page is a duplicate by comparing it byte by byte with its potential duplicate page read from the underlying flash chips, rather than by comparing fingerprints. As flash memory is known for its high parallelism and low read latency, reading a page from a flash chip and comparing two pages byte by byte introduces lower overhead than fingerprint computation does. We evaluate NF-Dedupe via trace-driven simulations. Experimental results show that NF-Dedupe outperforms the other approaches, achieving deduplication rates ranging from 5.3% to 29.9% and improving write latency by up to 21%, with an average of 12%.
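The fingerprint-free duplicate check can be sketched as follows. The flash read is mocked by a lookup callback; the function name and interface are illustrative assumptions, not NF-Dedupe's actual implementation.

```python
def is_duplicate(write_page: bytes, candidate_addr: int, read_flash_page) -> bool:
    """Return True if the incoming write page matches the candidate page
    already stored on flash, using a direct byte-by-byte comparison
    instead of a fingerprint (hash) computation."""
    # cheap on flash: high internal parallelism and low read latency
    stored = read_flash_page(candidate_addr)
    if stored is None or len(stored) != len(write_page):
        return False
    # bytes equality compares the pages byte by byte
    return stored == write_page
```

If the pages match, the FTL can simply map the logical page to the existing physical page and skip the program operation, saving both latency and wear.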
international conference on big data and cloud computing | 2014
Xiaoquan Wu; Nong Xiao; Fang Liu; Zhiguang Chen; Yimo Du; Yuxuan Xing
Flash memory-based SSD RAID offers excellent I/O performance with high stability, which has attracted increasing attention from companies and manufacturers, especially in I/O-intensive environments. However, frequent parity updates also impose high garbage collection overhead on the SSD. To this end, we propose RAID-Aware SSD (RA-SSD), which distinguishes user data from parity by detecting the different access patterns from the upper RAID layer and stores them separately in different flash blocks. RA-SSD can effectively reduce the overhead of garbage collection. Simulation results show that, when deployed in a RAID-5 system, RA-SSD reduces the number of pages copied during garbage collection by up to 10%. As garbage collection overhead decreases, write performance and lifespan improve. The extra space consumed by RA-SSD is very small: only about 1/10,000 of the device's capacity. Moreover, the processing logic of RA-SSD is so simple that it has very little impact on read performance.
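One way such an access-pattern detector could work is frequency-based: in RAID-5, the parity page of a stripe is rewritten on every small write to that stripe, so pages updated far more often than their neighbors are likely parity. The counter-based classifier and the threshold below are hypothetical illustrations, not RA-SSD's actual detector.

```python
from collections import Counter

def classify_parity_pages(write_log, ratio=2.0):
    """write_log: sequence of page addresses, one entry per write.
    Returns the set of addresses written more than `ratio` times the
    average rate, which this sketch treats as likely parity pages."""
    counts = Counter(write_log)
    avg = sum(counts.values()) / len(counts)
    return {page for page, c in counts.items() if c > ratio * avg}
```

Pages flagged this way would be steered to dedicated flash blocks, so that parity churn does not scatter invalid pages across blocks holding long-lived user data.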