Zhiwei Qin

Hong Kong Polytechnic University

Publication

Featured research published by Zhiwei Qin.

Design Automation Conference | 2011

MNFTL: an efficient flash translation layer for MLC NAND flash memory storage systems

Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao; Yong Guan

The new write constraints of multi-level cell (MLC) NAND flash memory make most of the existing flash translation layer (FTL) schemes inefficient or inapplicable. In this paper, we solve several fundamental problems in the design of an MLC flash translation layer. The objective is to reduce the garbage collection overhead so as to reduce the average system response time. We make the key observation that the valid page copy is the essential garbage collection overhead. Based on this observation, we propose two approaches, namely, concentrated mapping and postponed reclamation, to effectively reduce valid page copies. We conduct experiments on a set of benchmarks drawn from both real-world and synthetic traces. The experimental results show that our scheme can achieve a significant reduction in the average system response time compared with the previous work.
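
The observation that valid page copies dominate garbage collection cost can be illustrated with a rough cost model. The sketch below is not the MNFTL algorithm, and it does not implement concentrated mapping or postponed reclamation. The function names and the latency constants (read, program, and erase times in microseconds) are hypothetical placeholders chosen only for illustration.

```python
def gc_cost_us(valid_pages, read_us=50, program_us=800, erase_us=1500):
    """Estimated time to reclaim one block.

    Every valid page must be read and re-programmed elsewhere before the
    block can be erased. All latency values are assumed, not measured.
    """
    return valid_pages * (read_us + program_us) + erase_us


def pick_victim(valid_counts):
    """Return the block id with the fewest valid pages (a simple greedy policy).

    valid_counts maps a block id to its current number of valid pages.
    """
    return min(valid_counts, key=valid_counts.get)


blocks = {"a": 5, "b": 1, "c": 3}
victim = pick_victim(blocks)
print(victim, gc_cost_us(blocks[victim]))
```

Under these assumed latencies, a block with no valid pages costs only the erase time, while each additional valid page adds a read plus a program operation, which is why reducing valid page copies shortens the response time.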

ACM Transactions on Design Automation of Electronic Systems | 2011

Overhead-aware energy optimization for real-time streaming applications on multiprocessor System-on-Chip

Yi Wang; Hui Liu; Duo Liu; Zhiwei Qin; Zili Shao; Edwin Hsing-Mean Sha

In this article, we focus on solving the energy optimization problem for real-time streaming applications on multiprocessor System-on-Chip by combining task-level coarse-grained software pipelining with DVS (Dynamic Voltage Scaling) and DPM (Dynamic Power Management) considering transition overhead, inter-core communication and discrete voltage levels. We propose a two-phase approach to solve the problem. In the first phase, we propose a coarse-grained task parallelization algorithm called RDAG to transform a periodic dependent task graph into a set of independent tasks by exploiting the periodic feature of streaming applications. In the second phase, we propose a scheduling algorithm, GeneS, to optimize energy consumption. GeneS is a genetic algorithm that can search and find the best schedule within the solution space generated by gene evolution. We conduct experiments with a set of benchmarks from E3S and TGFF. The experimental results show that our approach can achieve a 24.4% reduction in energy consumption on average compared with the previous work.

Real-Time Systems Symposium | 2011

PCM-FTL: A Write-Activity-Aware NAND Flash Memory Management Scheme for PCM-Based Embedded Systems

Duo Liu; Tianzheng Wang; Yi Wang; Zhiwei Qin; Zili Shao

Due to its properties of high density, in-place update, and low standby power, phase change memory (PCM) has become a promising main memory alternative in embedded systems. On the other hand, NAND flash memory is widely used as secondary storage and has been integrated into PCM-based embedded systems. Since both NAND flash memory and PCM have limited lifetimes, it is very important to manage NAND flash memory effectively in PCM-based embedded systems while taking endurance into account. In this paper, we present for the first time a write-activity-aware NAND flash memory management scheme, called PCM-FTL, to effectively manage NAND flash memory and enhance the endurance of PCM-based embedded systems. The basic idea is to preserve each bit in the flash mapping table, which is stored in PCM, from being inverted frequently, i.e., we focus on minimizing the number of bit flips in a PCM cell when updating the flash mapping table. PCM-FTL employs a two-level mapping mechanism, which not only focuses on minimizing the write activities of PCM but also considers the access behavior of I/O requests. We evaluate PCM-FTL using a variety of realistic I/O traces. Experimental results show that the proposed technique can achieve an average reduction of 93.10% and a maximum reduction of 98.98% in the maximum number of bit flips for a PCM-based embedded system with 1GB NAND flash memory. We hope this work can serve as a first step towards the design of write-activity-aware FTLs for PCM-based embedded systems via simple and feasible modifications.
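
The bit-flip metric described above can be made concrete with a small sketch. It counts how many bits change when a mapping-table word is overwritten, and, as a purely hypothetical policy, picks the candidate physical address whose encoding differs least from the current entry. This is not the PCM-FTL two-level mapping mechanism. The function names are assumptions introduced for illustration.

```python
def bit_flips(old, new):
    """Number of bit positions that differ between two mapping-table words."""
    return bin(old ^ new).count("1")


def total_bit_flips(old_table, new_table):
    """Total bit flips needed to turn old_table into new_table, entry by entry."""
    return sum(bit_flips(o, n) for o, n in zip(old_table, new_table))


def pick_least_flips(current, candidates):
    """Hypothetical policy: choose the candidate address that flips the fewest bits."""
    return min(candidates, key=lambda c: bit_flips(current, c))


print(bit_flips(0b1010, 0b0110))
print(total_bit_flips([0x00, 0xFF], [0x01, 0x0F]))
print(bin(pick_least_flips(0b1010, [0b0101, 0b1011, 0b0000])))
```

The XOR of the old and new words has a 1 exactly where a PCM cell would have to change, so counting its set bits gives the write activity for that update.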

Real Time Technology and Applications Symposium | 2011

A Two-Level Caching Mechanism for Demand-Based Page-Level Address Mapping in NAND Flash Memory Storage Systems

Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao

The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping in the Flash Translation Layer (FTL) design. The demand-based approach can reduce the RAM footprint, but it also introduces extra address translation overhead, which may degrade the system performance. This paper proposes a two-level caching mechanism to selectively cache the on-demand page-level address mappings by jointly exploiting the temporal locality and the spatial locality of workloads. The objective is to improve the cache hit ratio so as to shorten the system response time and reduce the block erase counts for NAND flash memory storage systems. By exploring the optimized temporal-spatial cache configurations, our technique can capture the reference locality in workloads well, so that the hit ratio can be improved. Experimental results show that our technique can achieve a 31.51% improvement in hit ratio, which leads to a 31.11% reduction in average system response time and a 50.83% reduction in block erase counts compared with the previous work.
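
A minimal sketch of how a temporal cache and a spatial cache might cooperate in demand-based address translation is given below. It is an illustration under stated assumptions, not the scheme from the paper: the class names, the LRU replacement policy, the `ENTRIES_PER_PAGE` value, and the use of a Python dict to stand in for translation pages stored in flash are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


class DemandMapper:
    ENTRIES_PER_PAGE = 4  # assumed number of mappings per translation page

    def __init__(self, flash_table, temporal_size, spatial_size):
        self.flash_table = flash_table            # stands in for mappings kept in flash
        self.temporal = LRUCache(temporal_size)   # caches single entries
        self.spatial = LRUCache(spatial_size)     # caches whole translation pages
        self.hits = 0
        self.misses = 0

    def translate(self, lpn):
        """Map a logical page number to a physical page number."""
        ppn = self.temporal.get(lpn)
        if ppn is not None:
            self.hits += 1
            return ppn
        page_no = lpn // self.ENTRIES_PER_PAGE
        page = self.spatial.get(page_no)
        if page is None:
            self.misses += 1  # would cost a flash read in a real system
            start = page_no * self.ENTRIES_PER_PAGE
            page = {l: self.flash_table[l]
                    for l in range(start, start + self.ENTRIES_PER_PAGE)
                    if l in self.flash_table}
            self.spatial.put(page_no, page)
        else:
            self.hits += 1
        ppn = page[lpn]
        self.temporal.put(lpn, ppn)
        return ppn


table = {i: 100 + i for i in range(16)}
mapper = DemandMapper(table, temporal_size=2, spatial_size=1)
for lpn in (0, 1, 0, 5):
    mapper.translate(lpn)
print(mapper.hits, mapper.misses)
```

In this toy setup, a repeated access is served by the temporal cache, while an access to a neighbouring logical page is served by the cached translation page, so only accesses to a new translation page count as misses.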

IEEE Transactions on Very Large Scale Integration Systems | 2012

A Space Reuse Strategy for Flash Translation Layers in SLC NAND Flash Memory Storage Systems

Duo Liu; Yi Wang; Zhiwei Qin; Zili Shao; Yong Guan

This paper presents a space reuse strategy for flash translation layers in SLC NAND flash storage systems. The basic idea is to prevent a block with many free pages from being erased in a merge operation. The preserved blocks are further reused as replacement blocks. In this way, the space utilization and the erase counts of each block in NAND flash are improved. By employing the reuse strategy, we propose a reuse-aware flash translation layer (FTL) called reuse-aware NFTL (RNFTL) to improve the endurance and space utilization of single level cell (SLC) NAND flash. We provide the performance analysis of RNFTL for frequent update operations and sequential write operations, and theoretically compare RNFTL with representative FTL schemes. We also discuss the opportunity to apply the reuse strategy in log-block-based FTL schemes. To the best of our knowledge, this is the first work to employ a space reuse strategy in FTLs to improve the space utilization and endurance of NAND flash. The experiments have been conducted on a set of traces collected from real daily-life workloads. The results show that the space reuse strategy can effectively improve space utilization, block lifetime and wear-leveling compared with the previous work.

International Conference on Hardware/Software Codesign and System Synthesis | 2010

Demand-based block-level address mapping in large-scale NAND flash storage systems

Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao

The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping in the Flash Translation Layer (FTL) design. This paper proposes a novel Demand-based block-level Address mapping scheme with two-level Caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce the RAM footprint without sacrificing too much system response time. In our technique, the block-level address mapping table is stored in fixed pages (called translation pages) in the flash memory. Considering the temporal locality that workloads exhibit, we maintain one cache in RAM to store the on-demand block-level address mapping information. Meanwhile, by exploring both spatial locality and access frequency of workloads with another two caches, the second-level cache is designed to cache selected translation pages into RAM. In this way, address mapping information for both sequential accesses and most-frequently-accessed translation pages can be found in the cache, and therefore, the system response time can be improved. We conduct experiments on a mixture of real-world and synthetic traces. The experimental results show that our technique can significantly reduce the RAM footprint while the average response time is kept well under control. Moreover, our technique shows a large improvement in wear-leveling compared with the previous work.

Languages, Compilers, and Tools for Embedded Systems | 2010

RNFTL: a reuse-aware NAND flash translation layer for flash memory

Yi Wang; Duo Liu; Meng Wang; Zhiwei Qin; Zili Shao; Yong Guan

In this paper, we propose a hybrid-level flash translation layer (FTL) called RNFTL (Reuse-Aware NFTL) to improve the endurance and space utilization of NAND flash memory. Our basic idea is to prevent a primary block with many free pages from being erased in a merge operation. The preserved primary blocks are further reused as replacement blocks. In this way, the space utilization and the erase counts of each block in NAND flash can be improved. To the best of our knowledge, this is the first work to employ a reuse-aware strategy in FTL for improving the space utilization and endurance of NAND flash. We conduct experiments on a set of traces that were collected from real daily-life workloads. The experimental results show that our technique significantly improves space utilization, block lifetime and wear-leveling compared with the previous work.

IEEE Transactions on Computers | 2015

On-Demand Block-Level Address Mapping in Large-Scale NAND Flash Storage Systems

Renhai Chen; Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao; Yong Guan

The density of flash memory chips has doubled every two years in the past decade and the trend is expected to continue. The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping management. This paper proposes a novel Demand-based block-level Address mapping scheme with a two-level Caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce the RAM footprint without excessively compromising system response time. In our technique, the block-level address mapping table is stored in fixed pages (called the translation pages) in the flash memory. Considering the temporal locality that workloads exhibit, we maintain one cache in RAM to store the on-demand address mapping entries. Meanwhile, by exploring both spatial locality and access frequency of workloads with another two caches, the second-level cache is designed to cache selected translation pages. In this way, both the most-frequently-accessed and sequentially accessed address mapping entries can be stored in the cache, so the cache hit ratio can be increased and the system response time can be improved. To the best of our knowledge, this is the first work to reduce the RAM cost by employing the demand-based approach on block-level address mapping schemes. The experiments have been conducted on a real embedded platform. The experimental results show that our technique can effectively reduce the RAM footprint while maintaining similar average system response time compared with previous work.

Real Time Technology and Applications Symposium | 2012

Real-Time Flash Translation Layer for NAND Flash Memory Storage Systems

Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao

NAND flash memory is widely used in both hard real-time and soft real-time systems because of its unique properties, such as non-volatility, low power consumption, and fast access time. However, due to the variable garbage collection latency, the response time becomes unpredictable when multiple I/O requests are issued from the file system to flash media. In NAND flash memory storage systems, the flash translation layer (FTL) is a typical software module to handle the I/O requests and manage NAND flash memory. Most existing FTL schemes focus on improving average response time, but worst-case response time remains an open problem. This paper proposes a real-time flash translation layer (RFTL) scheme to evenly distribute the garbage collection time cost, so as to guarantee a near-optimal worst-case response time. This is achieved by using a new hybrid-level address mapping approach, which can provide guaranteed physical space to serve requests in any time period. Moreover, we propose a distributed garbage collection policy that enables RFTL to reclaim the space and serve the requests simultaneously. We conduct a set of experiments on a real hardware platform. Both the proposed scheme and other representative FTL schemes have been implemented on the hardware evaluation board. Experimental results show that our scheme improves worst-case response time by 41.51 percent and average response time by 88.85 percent.

Design, Automation, and Test in Europe | 2012

A block-level flash memory management scheme for reducing write activities in PCM-based embedded systems

Duo Liu; Tianzheng Wang; Yi Wang; Zhiwei Qin; Zili Shao

This paper targets an embedded system with phase change memory (PCM) and NAND flash memory. Although PCM is a promising main memory alternative and has recently been introduced to embedded system designs, its endurance keeps drifting down, which greatly limits the lifetime of the whole system. Therefore, this paper presents a block-level flash memory management scheme, WAB-FTL, to effectively manage NAND flash memory while reducing write activities of PCM-based embedded systems. The basic idea is to preserve each bit in the flash mapping table hosted by PCM from being inverted frequently during mapping table updates. To achieve this, a new merge strategy is adopted in WAB-FTL to delay the mapping table update, and a tiny mapping buffer is used for caching frequently updated mapping records. Experimental results based on Android traces show that WAB-FTL can effectively reduce write activities when compared with the baseline scheme.