Renhai Chen
Hong Kong Polytechnic University
Publication
Featured research published by Renhai Chen.
IEEE Transactions on Computers | 2015
Renhai Chen; Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao; Yong Guan
The density of flash memory chips has doubled every two years in the past decade and the trend is expected to continue. The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping management. This paper proposes a novel Demand-based block-level Address mapping scheme with a two-level Caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce the RAM footprint without excessively compromising system response time. In our technique, the block-level address mapping table is stored in fixed pages (called translation pages) in the flash memory. Considering the temporal locality that workloads exhibit, we maintain one cache in RAM to store the on-demand address mapping entries. Meanwhile, by exploring both spatial locality and access frequency of workloads with another two caches, the second-level cache is designed to cache selected translation pages. In this way, both the most frequently accessed and the sequentially accessed address mapping entries can be stored in the cache, so the cache hit ratio can be increased and the system response time can be improved. To the best of our knowledge, this is the first work to reduce the RAM cost by employing the demand-based approach on block-level address mapping schemes. The experiments have been conducted on a real embedded platform. The experimental results show that our technique can effectively reduce the RAM footprint while maintaining a similar average system response time compared with previous work.
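The demand-based lookup path described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name, the parameters, and the plain LRU replacement in both levels are assumptions made for illustration, and the paper's second level (which separates spatial locality from access frequency using two caches) is collapsed here into a single LRU cache of translation pages.

```python
from collections import OrderedDict


class DemandMappingCache:
    """Toy two-level cache over a block-level mapping table kept in 'flash'.

    Hypothetical simplification: both levels use plain LRU replacement.
    """

    def __init__(self, flash_table, entries_per_page, l1_capacity, l2_capacity):
        self.flash_table = flash_table          # logical block -> physical block
        self.entries_per_page = entries_per_page
        self.l1 = OrderedDict()                 # cached single mapping entries
        self.l2 = OrderedDict()                 # cached whole translation pages
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity
        self.flash_reads = 0                    # translation-page reads from flash

    def _read_translation_page(self, page_no):
        # Simulates fetching one translation page from flash.
        self.flash_reads += 1
        start = page_no * self.entries_per_page
        return self.flash_table[start:start + self.entries_per_page]

    def lookup(self, lbn):
        # Level 1: individual entries loaded on demand (temporal locality).
        if lbn in self.l1:
            self.l1.move_to_end(lbn)
            return self.l1[lbn]
        # Level 2: whole translation pages (neighbouring entries come along).
        page_no, offset = divmod(lbn, self.entries_per_page)
        if page_no in self.l2:
            self.l2.move_to_end(page_no)
            page = self.l2[page_no]
        else:
            page = self._read_translation_page(page_no)
            self.l2[page_no] = page
            if len(self.l2) > self.l2_capacity:
                self.l2.popitem(last=False)     # evict least recently used page
        pbn = page[offset]
        self.l1[lbn] = pbn
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)         # evict least recently used entry
        return pbn
```

In this sketch only the two caches live in RAM, so the RAM cost is bounded by `l1_capacity` entries plus `l2_capacity` translation pages rather than by the size of the full mapping table.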
international conference on hardware/software codesign and system synthesis | 2013
Renhai Chen; Yi Wang; Zili Shao
Short lifetimes are becoming a critical issue in NAND flash memory with the advent of multi-level cell and triple-level cell flash memory. Researchers at Macronix have recently discovered that heating can cause worn-out NAND flash cells to become reusable and greatly prolong the lifetime of flash memory cells. However, the heating process consumes a substantial amount of power. This means that some fundamental changes are required if existing NAND flash management techniques are to be applied in self-healing NAND flash memory. In particular, all existing wear-leveling techniques are based on the principle of evenly distributing writes and erases. This causes NAND flash cells to wear out within the same short time period. Moreover, healing these cells in a concentrated manner may cause power outages in mobile devices. In this paper, we propose for the first time a new wear-leveling scheme called DHeating (Dispersed Heating) to solve the concentrated heating problem in self-healing flash memory. In DHeating, rather than evenly distributing writes and erases over a time period, write and erase operations are concentrated on a small portion of flash memory cells, so that these cells can be worn out and healed by heating first. In this way, we can disperse healing to avoid the problem of concentrated power usage caused by heating. Furthermore, with the very long lifetime that results from self-healing, we can sacrifice lifetime for reliability. Therefore, we propose an early heating strategy to solve the reliability problem caused by concentrated heating. The idea is to start the healing process earlier by heating NAND flash cells before their expected endurance. We evaluate our scheme based on a real embedded platform. The experimental results show that our scheme can effectively solve the concentrated heating problem.
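The contrast between even wear-leveling and dispersed heating can be seen in a small simulation. This is a toy model under stated assumptions, not the DHeating algorithm itself: every function name is hypothetical, time is counted in erase operations, a block is healed instantly when it reaches its endurance, and the concentrated policy wears exactly one block at a time.

```python
def heating_times_even(num_blocks, endurance, total_erases):
    """Round-robin erases: every block ages at the same rate."""
    counts = [0] * num_blocks
    times = []
    for t in range(total_erases):
        b = t % num_blocks
        counts[b] += 1
        if counts[b] == endurance:   # block worn out: heat it now
            times.append(t)
            counts[b] = 0
    return times


def heating_times_concentrated(num_blocks, endurance, total_erases):
    """Wear one block until it needs heating, then move to the next."""
    counts = [0] * num_blocks
    times = []
    active = 0
    for t in range(total_erases):
        counts[active] += 1
        if counts[active] == endurance:
            times.append(t)
            counts[active] = 0
            active = (active + 1) % num_blocks
    return times


def min_gap(times):
    """Smallest interval between two consecutive heating events."""
    return min(b - a for a, b in zip(times, times[1:]))
```

With 4 blocks, an endurance of 3, and 12 erases, the even policy heats all blocks back to back (a gap of 1 erase between heating events), while the concentrated policy spaces the same four heating events 3 erases apart.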
design automation conference | 2014
Chi Zhang; Yi Wang; Tianzheng Wang; Renhai Chen; Duo Liu; Zili Shao
NAND flash memory has long been the dominant storage medium in mobile devices. However, power failure may occur at any time and result in the loss of important data. Crash recovery therefore becomes vitally important in NAND flash memory storage systems. As the flash translation layer (FTL) directly manages flash memory using various metadata, the problem of FTL crash recovery in NAND flash is how to efficiently and effectively maintain and recover the consistency of FTL metadata after a system crash. In this paper, we present DCR, a deterministic approach to crash recovery for NAND flash based storage systems. The basic idea is to exploit the determinism of the FTL and reproduce events that happened between the last checkpoint and the crash point during crash recovery. Unlike existing approaches, which have to scan the whole flash memory chip, we show that DCR can recover the system more efficiently by only checking a limited number of blocks based on deterministic FTL operations. We have implemented DCR for a block-level FTL and compared it with a popular version-based scheme using an ARM11-based embedded evaluation board. Experimental results show that DCR can greatly reduce recovery time and guarantee the consistency of FTL metadata after recovery.
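The idea of replaying events from the last checkpoint, instead of scanning the whole chip, can be sketched as follows. This is an illustrative toy, not DCR as implemented by the authors: the page-level mapping, the strictly increasing allocation order, and the `flash_oob` layout (one logical page number stored with each written physical page) are all assumptions chosen to keep the example short.

```python
def recover(checkpoint_map, checkpoint_next, flash_oob):
    """Rebuild the logical-to-physical map after a crash.

    checkpoint_map  : mapping saved at the last checkpoint
    checkpoint_next : next physical page the allocator would have used
    flash_oob[p]    : logical page number stored with physical page p,
                      or None if p is still erased

    Assumption: the toy FTL allocates physical pages in strictly
    increasing order, so every write after the checkpoint must sit at
    checkpoint_next, checkpoint_next + 1, and so on.
    """
    mapping = dict(checkpoint_map)
    p = checkpoint_next
    replayed = 0
    # Follow the allocator's deterministic order until the first
    # unwritten page; nothing beyond it can have been written.
    while p < len(flash_oob) and flash_oob[p] is not None:
        mapping[flash_oob[p]] = p    # a newer copy supersedes the older one
        p += 1
        replayed += 1
    return mapping, p, replayed
```

Because the allocation rule is deterministic, recovery in this sketch reads only the pages written since the checkpoint plus one erased page, independent of the total chip size.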
languages, compilers, and tools for embedded systems | 2013
Yong Guan; Guohui Wang; Yi Wang; Renhai Chen; Zili Shao
Log-block-based FTL (Flash Translation Layer) schemes have been widely used to manage NAND flash memory storage systems in industry. In log-block-based FTLs, a few physical blocks called log blocks are used to hold all page updates from a large number of data blocks. Frequent page updates in log blocks introduce large overhead, so log blocks become the system bottleneck. To address this problem, this paper presents a block-level log-block management scheme called BLog (Block-level Log-Block Management). In BLog, with block-level management, the update pages of a data block can be collected together and put into the same log block as much as possible; therefore, we can effectively reduce the associativity of log blocks so as to reduce the garbage collection overhead. We also propose a novel partial merge operation called reduced-order merge, by which we can effectively postpone the garbage collection of log blocks so as to maximally utilize valid pages and reduce unnecessary erase operations in log blocks. Based on BLog, we design an FTL called BLogFTL for MLC NAND flash. We conduct experiments on a mixture of real-world and synthetic traces. The experimental results show that our scheme outperforms the previous log-block-based FTLs for MLC NAND flash.
Future Generation Computer Systems | 2016
Hongxing Wei; Zhenzhou Shao; Zhen Huang; Renhai Chen; Yong Guan; Jindong Tan; Zili Shao
ROS, an open-source robot operating system, is widely used and rapidly developed in the robotics community. However, running on Linux, ROS does not provide real-time guarantees, while real-time tasks are required in many robot applications such as robot motion control. This paper presents, for the first time, a real-time ROS architecture called RT-ROS on multi-core processors. RT-ROS provides an integrated real-time/non-real-time task execution environment, so real-time and non-real-time ROS nodes can be run separately on a real-time OS and Linux, respectively, on different processor cores. In this way, real-time tasks can be supported by real-time ROS nodes on a real-time OS, while non-real-time ROS nodes on Linux can provide other functions of ROS. Furthermore, high performance is achieved by executing real-time ROS nodes and non-real-time ROS nodes on different processor cores. We have implemented RT-ROS on a dual-core processor and conducted various experiments with real robot applications. The experimental results show that RT-ROS can effectively provide real-time support for the ROS platform with high performance by exploiting the multi-core architecture.
asia and south pacific design automation conference | 2015
Renhai Chen; Yi Wang; Jingtong Hu; Duo Liu; Zili Shao; Yong Guan
I/O is becoming one of the major performance bottlenecks in NAND-flash-based smartphones. Novel NVMs (nonvolatile memories), such as PCM (Phase Change Memory) and STT-RAM (Spin-Transfer Torque Random Access Memory), can provide fast read/write operations. In this paper, we propose a unified NVM/flash architecture to improve the I/O performance. A transparent scheme, vFlash (Virtualized Flash), is also proposed to manage the unified architecture. Within vFlash, an inter-app technique is proposed to optimize the application performance by exploiting the historic locality of applications. Since vFlash is at the bottom of the I/O stack, the application features will be lost. Therefore, we also propose a cross-layer technique to transfer the application information from the application layer to the vFlash layer. The proposed scheme is evaluated based on a real Android platform, and the experimental results show that the read and write performance for the proposed scheme is 2.45 times and 3.37 times better than that of the stock Android 4.2 system, respectively.
embedded and real-time computing systems and applications | 2013
Yi Wang; Renhai Chen; Zili Shao; Tao Li
In this paper, we target direct-coupled solar-energy-powered multicore architectures that provide direct power supply between photovoltaic (PV) generation and the load without the adoption of a battery. We present SolarTune, a real-time scheduling technique with load tuning for sporadic tasks on solar-energy-powered multicore systems. The objective is to fully utilize the available solar energy while meeting the deadlines of tasks. To solve the problem, we first perform analysis and formulate the scheduling problem in each duration as an integer linear programming (ILP) model to obtain an optimal schedule. Then we present a heuristic algorithm to dynamically refine the task scheduling based on the predictions of the availability of solar energy. We conduct experiments using real-world meteorological data across different geographic sites. The experimental results show that SolarTune can significantly improve the solar energy utilization ratio and reduce the number of deadline misses compared to the conventional task scheduler.
ieee international conference on high performance computing data and analytics | 2012
Zhongbo Wang; Zhiping Jia; Lei Ju; Renhai Chen
The RSA public-key cryptosystem is widely used to provide security protocols and services in network communication. However, designing and implementing the RSA cryptosystem to meet the real-time requirements of embedded applications is challenging, due to the computation-intensive nature of the RSA arithmetic operations and the limited resources in embedded systems. Various implementation and optimization methods have been proposed for the RSA algorithm. However, software execution of RSA on general-purpose processors usually suffers from slow execution speed, while application-specific integrated circuit (ASIC) based approaches lack flexibility. In this work, we present a systematic design approach of an application-specific instruction-set processor (ASIP) for the RSA cryptographic algorithm. We identify and optimize the custom instructions in the RSA algorithm, and extend the instruction set architecture (ISA) of a standard 32-bit RISC processor to accommodate them. We employ the Electronic System Level (ESL) methodology in the development of the proposed ASIP on the Xilinx Virtex5 LX110T FPGA platform. Compared to the original RISC ISA, our extended ASIP achieves an approximately 2.69 times performance improvement with only 25.6% more resources required.
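The computation-intensive core of RSA is modular exponentiation. The sketch below shows the standard right-to-left square-and-multiply method as a reminder of what such a processor has to accelerate. It is a generic textbook routine, not the custom instructions of the paper (which the abstract does not specify), and the function name `mod_exp` is chosen here for illustration.

```python
def mod_exp(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by square-and-multiply.

    Each loop iteration performs one modular squaring and, for every set
    bit of the exponent, one modular multiplication. These two operations
    dominate RSA run time for realistic key sizes.
    """
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # current exponent bit is 1
            result = (result * base) % modulus
        base = (base * base) % modulus      # square for the next bit
        exponent >>= 1
    return result
```

As a toy check with the small key n = 3233 (61 × 53), e = 17, d = 2753, encrypting a message with `mod_exp(m, 17, 3233)` and decrypting the result with `mod_exp(c, 2753, 3233)` returns the original message; real deployments use moduli of 2048 bits or more, where the cost of each modular multiplication is what hardware extensions aim to reduce.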
IEEE Transactions on Computers | 2017
Renhai Chen; Yi Wang; Duo Liu; Zili Shao; Song Jiang
Substantially reduced lifetimes are becoming a critical issue in NAND flash memory with the advent of multi-level cell and triple-level cell flash memory. Researchers discovered that heating can cause worn-out NAND flash cells to become reusable and greatly extend the lifetime of flash memory cells. However, the heating process consumes a substantial amount of power, and some fundamental changes are required for existing NAND flash management techniques. In particular, all existing wear-leveling techniques are based on the principle of evenly distributing writes and erases. For self-healing NAND flash, this may cause NAND flash cells to be worn out in a short period of time. Moreover, frequently healing these cells may drain the energy quickly in battery-driven mobile devices, which is defined as the concentrated heating problem. In this paper, we propose a novel wear-leveling scheme called DHeating (Dispersed Heating) to address the problem. In DHeating, rather than evenly distributing writes and erases over a time period, write and erase operations are scheduled on a small number of flash memory cells at a time, so that these cells can be worn out and healed much earlier than other cells. In this way, we can avoid quick energy depletion caused by concentrated heating. In addition, the heating process takes several seconds and has become the new performance bottleneck. To address this issue, we propose a lazy heating repair scheme, which eases the long delays caused by heating by postponing the heating operation and using system idle time for repair. Furthermore, the flash memory's reliability becomes worse as the flash memory cells approach the expected worn-out time. We propose an early heating strategy to solve the reliability problem. With the extended lifetime provided by self-healing, we can trade some lifetime for reliability. The idea is to start the healing process earlier than the expected worn-out time. We evaluate our scheme based on an embedded platform. The experimental results show that the proposed scheme can effectively prolong the consecutive heating time interval, alleviate the long time delays caused by the heating, and enhance the reliability of self-healing flash memory.
high performance computing and communications | 2015
Renhai Chen; Yi Wang; Jingtong Hu; Duo Liu; Zili Shao; Yong Guan
Mobile virtualization introduces extra layers in software stacks, which leads to performance degradation. In particular, each I/O operation has to pass through several software layers to reach the NAND-flash-based storage systems. This paper targets optimizing I/O for mobile virtualization, since I/O has become one of the major performance bottlenecks that seriously affect the performance of mobile devices. Among all the I/O operations, a large percentage update metadata. Frequently updating metadata not only degrades overall I/O performance but also severely reduces flash memory lifetime. In this paper, we propose a novel I/O optimization technique to identify the metadata of a guest file system which is stored in a VM (Virtual Machine) image file and frequently updated. Then, these metadata are stored in a small additional NVM (Non-Volatile Memory), which is faster and more durable, to greatly improve flash memory's performance and lifetime. To the best of our knowledge, this is the first work to identify the file system metadata from regular data in a guest OS VM image file under mobile virtualization. The proposed scheme is evaluated on a real hardware embedded platform. The experimental results show that the proposed techniques can improve write performance to 45.21% in mobile devices with virtualization.