Zili Shao

Hong Kong Polytechnic University

Publication

Featured research published by Zili Shao.

Design Automation Conference | 2011

MNFTL: an efficient flash translation layer for MLC NAND flash memory storage systems

Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao; Yong Guan

The new write constraints of multi-level cell (MLC) NAND flash memory make most existing flash translation layer (FTL) schemes inefficient or inapplicable. In this paper, we solve several fundamental problems in the design of an MLC flash translation layer. The objective is to reduce garbage collection overhead and thereby reduce the average system response time. We make the key observation that valid page copying is the essential garbage collection overhead. Based on this observation, we propose two approaches, namely concentrated mapping and postponed reclamation, to effectively reduce valid page copies. We conduct experiments on a set of benchmarks drawn from both real-world and synthetic traces. The experimental results show that our scheme can achieve a significant reduction in average system response time compared with previous work.
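To make the "valid page copy" observation concrete, here is a minimal sketch of the cost of reclaiming a flash block under a conventional greedy garbage collector. It is not MNFTL (concentrated mapping and postponed reclamation are not modeled); the latency constants and the names `Block`, `reclaim_cost_us`, and `pick_victim` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-operation latencies in microseconds; real values are chip-specific.
PAGE_READ_US = 50
PAGE_WRITE_US = 600
BLOCK_ERASE_US = 3000


@dataclass
class Block:
    block_id: int
    valid_pages: int  # live pages that must be copied out before the erase


def reclaim_cost_us(block: Block) -> int:
    """Cost of reclaiming one block: copy each valid page, then erase once."""
    copy_cost = block.valid_pages * (PAGE_READ_US + PAGE_WRITE_US)
    return copy_cost + BLOCK_ERASE_US


def pick_victim(blocks: list[Block]) -> Block:
    """Conventional greedy victim selection: the block with the fewest valid pages."""
    return min(blocks, key=lambda b: b.valid_pages)


blocks = [Block(0, 10), Block(1, 2), Block(2, 30)]
victim = pick_victim(blocks)
print(victim.block_id, reclaim_cost_us(victim))  # 1 4300
```

With these made-up latencies, reclaiming block 2 would cost 22,500 microseconds, of which 19,500 are page copies; the single erase is a small share, which is why reducing valid page copies is the lever the abstract targets.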

IEEE Transactions on Parallel and Distributed Systems | 2005

Efficient assignment and scheduling for heterogeneous DSP systems

Zili Shao; Qingfeng Zhuge; Chun Xue; Edwin Hsing-Mean Sha

This paper addresses high-level synthesis for real-time digital signal processing (DSP) architectures that use heterogeneous functional units (FUs). For such special-purpose architecture synthesis, an important problem is how to assign a proper FU type to each operation of a DSP application and generate a schedule such that all requirements are met and the total cost is minimized. We propose a two-phase approach to solve this problem. In the first phase, we solve the heterogeneous assignment problem, i.e., how to assign proper FU types to the operations of an application so that the total cost is minimized while the timing constraint is satisfied. In the second phase, based on the assignment obtained in the first phase, we propose a minimum resource scheduling algorithm that generates a schedule and a feasible configuration using as few resources as possible. We prove that the heterogeneous assignment problem is NP-complete. Efficient algorithms are proposed to find an optimal solution when the given data-flow graph (DFG) is a simple path or a tree. Three other algorithms are proposed to solve the general problem. The experiments show that our algorithms can effectively reduce the total cost compared with previous work.
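For the simple-path case mentioned above, one way to see why an optimal assignment is tractable is a dynamic program over accumulated execution time. The sketch below assumes integer execution times and that operations on a path run one after another, so the path's length is the sum of the chosen FU times. It is a generic pseudo-polynomial DP, not necessarily the paper's algorithm, and `assign_path` is a hypothetical name.

```python
import math


def assign_path(ops, deadline):
    """Optimal FU-type assignment for operations along a simple path.

    ops: one list per operation of (time, cost) options, one option per FU type.
    Returns (min_cost, choices), where choices[i] indexes the FU type for op i,
    or (math.inf, None) if no assignment meets the deadline.
    """
    # best[t] = (cost, choices) of the cheapest prefix assignment taking exactly t
    best = {0: (0, [])}
    for options in ops:
        nxt = {}
        for t, (cost, choices) in best.items():
            for k, (dt, dc) in enumerate(options):
                nt = t + dt
                if nt > deadline:
                    continue  # this choice already violates the timing constraint
                cand = (cost + dc, choices + [k])
                if nt not in nxt or cand[0] < nxt[nt][0]:
                    nxt[nt] = cand
        best = nxt
    if not best:
        return math.inf, None
    return min(best.values(), key=lambda v: v[0])


# Each operation offers a fast-but-expensive and a slow-but-cheap FU type.
ops = [[(1, 10), (3, 2)], [(1, 8), (2, 3)], [(2, 5), (4, 1)]]
print(assign_path(ops, deadline=6))  # (15, [1, 0, 0])
```

The table has at most `deadline + 1` entries per step, so the running time grows with the deadline value; that is consistent with the general (arbitrary DFG) problem remaining NP-complete.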

ACM Transactions on Design Automation of Electronic Systems | 2011

Overhead-aware energy optimization for real-time streaming applications on multiprocessor System-on-Chip

Yi Wang; Hui Liu; Duo Liu; Zhiwei Qin; Zili Shao; Edwin Hsing-Mean Sha

In this article, we focus on solving the energy optimization problem for real-time streaming applications on a multiprocessor System-on-Chip by combining task-level coarse-grained software pipelining with DVS (Dynamic Voltage Scaling) and DPM (Dynamic Power Management) while accounting for transition overhead, inter-core communication, and discrete voltage levels. We propose a two-phase approach. In the first phase, we propose a coarse-grained task parallelization algorithm, called RDAG, that transforms a periodic dependent task graph into a set of independent tasks by exploiting the periodic nature of streaming applications. In the second phase, we propose a scheduling algorithm, GeneS, to optimize energy consumption. GeneS is a genetic algorithm that searches for the best schedule within the solution space generated by gene evolution. We conduct experiments with a set of benchmarks from E3S and TGFF. The experimental results show that our approach achieves an average reduction of 24.4% in energy consumption compared with previous work.
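The abstract does not spell out how RDAG works. A standard way to turn a periodic dependent task graph into independent tasks is retiming for coarse-grained software pipelining: delay each task by its depth in the graph so that, in steady state, every task in a period consumes data produced in an earlier period. The sketch below shows that generic idea only; `retime` is a hypothetical name and the example graph is made up.

```python
from graphlib import TopologicalSorter


def retime(preds: dict[str, set[str]]) -> dict[str, int]:
    """Return r(v) for every task: the length of the longest predecessor chain
    ending at v. In steady-state period p, task v processes iteration p - r(v),
    so each producer of v's input ran in an earlier period."""
    r: dict[str, int] = {}
    for v in TopologicalSorter(preds).static_order():
        r[v] = max((r[u] + 1 for u in preds.get(v, ())), default=0)
    return r


# Task graph of one iteration: B and C depend on A; D depends on B and C.
preds = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
r = retime(preds)
print(sorted(r.items()))  # [('A', 0), ('B', 1), ('C', 1), ('D', 2)]
```

Because every edge u -> v ends up with r(v) > r(u), the tasks scheduled within one period have no dependencies among themselves and can be placed on cores freely, at the cost of a pipeline prologue of max r periods.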

IEEE Transactions on Very Large Scale Integration Systems | 2010

Dynamic and Leakage Energy Minimization With Soft Real-Time Loop Scheduling and Voltage Assignment

Meikang Qiu; Laurence T. Yang; Zili Shao; Edwin Hsing-Mean Sha

As technology feature sizes shrink, leakage accounts for a growing share of the total power consumption of digital systems. Traditional dynamic voltage scaling (DVS) fails to accurately capture the impact of scaling on system power consumption as leakage power increases exponentially. Combining DVS with adaptive body biasing (ABB) is an effective technique for jointly optimizing dynamic and leakage energy dissipation. In this paper, we propose an optimal soft real-time loop scheduling and voltage assignment algorithm, loop scheduling and voltage assignment to minimize energy, which minimizes both dynamic and leakage energy via DVS and ABB. Voltage transition overhead is considered in our approach. We conduct simulations on a set of digital signal processor benchmarks based on the power model of 70 nm technology. The simulation results show that our approach achieves significant energy savings compared with the integer linear programming approach.

Real-Time Systems Symposium | 2011

PCM-FTL: A Write-Activity-Aware NAND Flash Memory Management Scheme for PCM-Based Embedded Systems

Duo Liu; Tianzheng Wang; Yi Wang; Zhiwei Qin; Zili Shao

Because of its high density, in-place update, and low standby power, phase change memory (PCM) has become a promising main memory alternative for embedded systems. Meanwhile, NAND flash memory is widely used as secondary storage and has been integrated into PCM-based embedded systems. Since both NAND flash memory and PCM have limited lifetimes, effectively managing NAND flash memory in PCM-based embedded systems while accounting for endurance is very important. In this paper, we present, for the first time, a write-activity-aware NAND flash memory management scheme, called PCM-FTL, to effectively manage NAND flash memory and enhance the endurance of PCM-based embedded systems. The basic idea is to prevent each bit in the flash mapping table, which is stored in PCM, from being inverted frequently; that is, we minimize the number of bit flips in PCM cells when updating the flash mapping table. PCM-FTL employs a two-level mapping mechanism that not only minimizes PCM write activity but also considers the access behavior of I/O requests. We evaluate PCM-FTL using a variety of realistic I/O traces. Experimental results show that the proposed technique achieves an average reduction of 93.10% and a maximum reduction of 98.98% in the maximum number of bit flips for a PCM-based embedded system with 1GB of NAND flash memory. We hope this work can serve as a first step toward the design of write-activity-aware FTLs for PCM-based embedded systems via simple and feasible modifications.
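The quantity PCM-FTL minimizes, bit flips per mapping-table update, is the Hamming distance between the old and new entry. The sketch below counts it and applies one hypothetical write-aware policy: among candidate free physical pages, choose the one whose address flips the fewest bits of the entry it overwrites. This illustrates the metric only; it is not PCM-FTL's two-level mapping, and `bit_flips` and `choose_free_page` are hypothetical names.

```python
def bit_flips(old: int, new: int) -> int:
    """PCM cells that change state when `old` is overwritten with `new`."""
    return bin(old ^ new).count("1")


def choose_free_page(old_entry: int, free_pages: list[int]) -> int:
    """Hypothetical policy: pick the free physical page whose address flips the
    fewest bits of the mapping-table entry it replaces."""
    if not free_pages:
        raise ValueError("no free pages")
    return min(free_pages, key=lambda page: bit_flips(old_entry, page))


old = 0b1010
candidates = [0b0101, 0b1011, 0b1111]
best = choose_free_page(old, candidates)
print(bin(best), bit_flips(old, best))  # 0b1011 1
```

Writing `0b0101` instead would invert all four stored bits, so the choice of new value directly sets how much wear the PCM-resident table absorbs.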

Asia and South Pacific Design Automation Conference | 2013

Curling-PCM: Application-specific wear leveling for phase change memory based embedded systems

Duo Liu; Tianzheng Wang; Yi Wang; Zili Shao; Qingfeng Zhuge; Edwin Hsing-Mean Sha

Phase change memory (PCM) has been used as a NOR flash replacement in embedded systems because of its attractive features. However, PCM endurance keeps drifting down, which greatly limits its adoption in embedded systems. As most embedded systems are application-oriented, we can make better use of PCM by exploiting application-specific features, such as fixed access patterns and update frequencies, to prolong its lifetime. In this paper, we propose an application-specific wear leveling technique, called Curling-PCM, that evenly distributes write activity across the PCM chip to improve PCM endurance. The basic idea is to exploit application-specific features in embedded systems and periodically move the hot region across the whole PCM chip. To further reduce the overhead of moving the hot region and improve the performance of PCM-based embedded systems, Curling-PCM uses a fine-grained partial wear leveling policy in which only part of the hot region is moved during each request handling period. The experimental results show that Curling-PCM distributes write traffic across PCM chips more evenly than previous work. We expect this work to serve as a first step toward fully exploring application-specific features in PCM-based embedded systems.
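The core idea of periodically moving the hot region can be sketched as a rotating address offset: logical lines of the hot region map to physical lines at a moving offset that advances every fixed number of writes. The class name, its parameters, and the rotation rule are hypothetical; data migration and Curling-PCM's partial (fine-grained) moves are not modeled.

```python
class RotatingRegion:
    """Hypothetical sketch: a hot region of `region_size` lines whose physical
    placement shifts by `step` lines every `period` writes, so the wear spreads
    over a chip of `total_lines` lines."""

    def __init__(self, total_lines: int, region_size: int, step: int, period: int):
        assert 0 < region_size <= total_lines and step > 0 and period > 0
        self.total = total_lines
        self.region = region_size
        self.step = step
        self.period = period
        self.offset = 0
        self.writes = 0
        self.wear = [0] * total_lines  # writes absorbed by each physical line

    def physical(self, logical: int) -> int:
        """Map a logical line in the hot region to its current physical line."""
        assert 0 <= logical < self.region
        return (self.offset + logical) % self.total

    def write(self, logical: int) -> None:
        self.wear[self.physical(logical)] += 1
        self.writes += 1
        if self.writes % self.period == 0:
            # Move the hot region; copying its contents is not modeled here.
            self.offset = (self.offset + self.step) % self.total


chip = RotatingRegion(total_lines=8, region_size=2, step=2, period=4)
for i in range(32):
    chip.write(i % 2)  # the application keeps hammering the same two lines
print(chip.wear)  # [4, 4, 4, 4, 4, 4, 4, 4]
```

Without rotation, the same 32 writes would land as 16 on each of two physical lines and none elsewhere.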

Annual Computer Security Applications Conference | 2003

Defending embedded systems against buffer overflow via hardware/software

Zili Shao; Qingfeng Zhuge; Yi He; Edwin Hsing-Mean Sha

Buffer overflow attacks have caused serious security problems for decades. As more embedded systems become networked, defending them against buffer overflow attacks has become an important research problem. We propose the hardware/software address protection (HSAP) technique to solve this problem. We first classify buffer overflow attacks into two categories (stack smashing attacks and function pointer attacks) and then provide a corresponding defense strategy for each. In our technique, a hardware boundary check method and a function pointer XOR method protect a system against stack smashing attacks and function pointer attacks, respectively. Although HSAP focuses on embedded systems because of the availability of hardware support, we show that it can be applied to any type of processor to defend against buffer overflow attacks. We use four classes of processors to illustrate that the applicability of our technique is independent of architecture. We evaluate HSAP in the ARM Evaluator-7T simulation development environment. The results show that HSAP defends a system against more types of buffer overflow attacks with little overhead.
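The function pointer XOR method can be illustrated by encoding each stored pointer with a secret key and decoding it just before an indirect call; an attacker who overwrites the slot with a raw address, without knowing the key, ends up redirecting control to a scrambled address rather than the chosen one. This is a minimal model of the general XOR-encoding idea; the key handling, the 32-bit width, and the names `KEY`, `encode_ptr`, and `decode_ptr` are assumptions, and the hardware boundary check is not modeled.

```python
import secrets

# Hypothetical per-process secret; HSAP's actual key management may differ.
KEY = secrets.randbits(32)


def encode_ptr(addr: int, key: int = KEY) -> int:
    """Value actually kept in memory for a function pointer."""
    return addr ^ key


def decode_ptr(stored: int, key: int = KEY) -> int:
    """Recover the target address just before the indirect call."""
    return stored ^ key


handler = 0x0800_1234
slot = encode_ptr(handler)           # what sits in the vulnerable memory slot
assert decode_ptr(slot) == handler   # legitimate call reaches the real handler

# An overflow writes a raw address into the slot without knowing KEY:
hijacked = decode_ptr(0xDEADBEEF)
print(hex(hijacked))
```

Since XOR is its own inverse, encoding and decoding are a single cheap operation each, which fits the low-overhead goal stated in the abstract.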

IEEE Transactions on Very Large Scale Integration Systems | 2012

A Space Reuse Strategy for Flash Translation Layers in SLC NAND Flash Memory Storage Systems

Duo Liu; Yi Wang; Zhiwei Qin; Zili Shao; Yong Guan

This paper presents a space reuse strategy for flash translation layers in SLC NAND flash storage systems. The basic idea is to prevent a block with many free pages from being erased in a merge operation. The preserved blocks are then reused as replacement blocks. In this way, space utilization is improved and the erase count of each block in NAND flash is reduced. By employing the reuse strategy, we propose a reuse-aware flash translation layer (FTL), called reuse-aware NFTL (RNFTL), to improve the endurance and space utilization of single-level cell (SLC) NAND flash. We provide a performance analysis of RNFTL for frequent update operations and sequential write operations, and theoretically compare RNFTL with representative FTL schemes. We also discuss the opportunity to apply the reuse strategy to log-block-based FTL schemes. To the best of our knowledge, this is the first work to employ a space reuse strategy in FTLs to improve the space utilization and endurance of NAND flash. Experiments were conducted on a set of traces collected from real-world daily workloads. The results show that the space reuse strategy effectively improves space utilization, block lifetime, and wear-leveling compared with previous work.
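The reuse idea can be sketched as a decision made when a merge retires a replacement block: if the block still has enough free pages, keep it in a reuse pool instead of erasing it, and hand it out as the next replacement block. The geometry, the threshold, and the names `FlashBlock`, `ReuseAwarePool`, `retire_after_merge`, and `next_replacement_block` are hypothetical; RNFTL's mapping details are not modeled.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 64   # hypothetical block geometry
REUSE_THRESHOLD = 16   # hypothetical: minimum free pages worth preserving


@dataclass
class FlashBlock:
    block_id: int
    used_pages: int = 0
    erase_count: int = 0

    @property
    def free_pages(self) -> int:
        return PAGES_PER_BLOCK - self.used_pages


@dataclass
class ReuseAwarePool:
    erased: list = field(default_factory=list)  # freshly erased blocks
    reuse: list = field(default_factory=list)   # preserved, partly written blocks

    def retire_after_merge(self, block: FlashBlock) -> None:
        """Handle a replacement block once a merge has copied its valid data out."""
        if block.free_pages >= REUSE_THRESHOLD:
            self.reuse.append(block)   # skip the erase and keep its free pages
        else:
            block.erase_count += 1     # conventional path: erase the block
            block.used_pages = 0
            self.erased.append(block)

    def next_replacement_block(self) -> FlashBlock:
        """Prefer a preserved block; fall back to an erased one (IndexError if none)."""
        if self.reuse:
            return self.reuse.pop()
        return self.erased.pop()


pool = ReuseAwarePool()
pool.retire_after_merge(FlashBlock(1, used_pages=10))  # 54 free pages: kept
pool.retire_after_merge(FlashBlock(2, used_pages=60))  # 4 free pages: erased
first = pool.next_replacement_block()
print(first.block_id, first.erase_count)  # 1 0
```

Every block routed to the reuse pool is one erase avoided, which is where the endurance gain comes from in this simplified model.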

International Conference on Hardware/Software Codesign and System Synthesis | 2010

Demand-based block-level address mapping in large-scale NAND flash storage systems

Zhiwei Qin; Yi Wang; Duo Liu; Zili Shao

The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping in Flash Translation Layer (FTL) design. This paper proposes a novel Demand-based block-level Address mapping scheme with a two-level Caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce the RAM footprint without sacrificing too much system response time. In our technique, the block-level address mapping table is stored in fixed pages (called translation pages) in flash memory. To exploit the temporal locality that workloads exhibit, we maintain one cache in RAM that stores on-demand block-level address mapping information. Meanwhile, the second-level cache, which uses two further caches to exploit both the spatial locality and the access frequency of workloads, caches selected translation pages in RAM. In this way, address mapping information for both sequential accesses and the most frequently accessed translation pages can be found in the cache, and the system response time is therefore improved. We conduct experiments on a mixture of real-world and synthetic traces. The experimental results show that our technique significantly reduces the RAM footprint while keeping the average response time well under control. Moreover, our technique substantially improves wear-leveling compared with previous work.
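A minimal sketch of the two-level lookup path: a first-level cache of individual mapping entries (temporal locality) and a second-level cache of whole translation pages, so that neighbouring, sequential lookups hit in RAM. For simplicity both levels use plain LRU; DAC's frequency-based selection of translation pages is not modeled. The page size, capacities, and the names `DemandMapCache` and `lookup` are hypothetical, and flash is simulated by a dict with a read counter.

```python
from collections import OrderedDict

ENTRIES_PER_TPAGE = 4  # hypothetical: mapping entries per translation page


class DemandMapCache:
    """Hypothetical sketch of demand-based block-level address mapping."""

    def __init__(self, flash_table: dict[int, int], entry_cap: int, page_cap: int):
        if entry_cap < 1 or page_cap < 1:
            raise ValueError("cache capacities must be positive")
        self.flash = flash_table     # full table, stands in for translation pages
        self.entries = OrderedDict()  # level 1: lbn -> pbn
        self.pages = OrderedDict()    # level 2: tpage -> {lbn: pbn}
        self.entry_cap = entry_cap
        self.page_cap = page_cap
        self.flash_reads = 0

    def _read_tpage(self, tpage: int) -> dict[int, int]:
        """Simulate one flash read of a translation page."""
        self.flash_reads += 1
        start = tpage * ENTRIES_PER_TPAGE
        return {lbn: self.flash[lbn]
                for lbn in range(start, start + ENTRIES_PER_TPAGE)
                if lbn in self.flash}

    def lookup(self, lbn: int) -> int:
        """Translate a logical block number to a physical block number."""
        if lbn in self.entries:                 # level 1 hit
            self.entries.move_to_end(lbn)
            return self.entries[lbn]
        tpage = lbn // ENTRIES_PER_TPAGE
        if tpage in self.pages:                 # level 2 hit
            self.pages.move_to_end(tpage)
            page = self.pages[tpage]
        else:                                   # miss: fetch the translation page
            page = self._read_tpage(tpage)
            self.pages[tpage] = page
            if len(self.pages) > self.page_cap:
                self.pages.popitem(last=False)  # evict least recently used page
        pbn = page[lbn]                         # KeyError if lbn is unmapped
        self.entries[lbn] = pbn
        if len(self.entries) > self.entry_cap:
            self.entries.popitem(last=False)    # evict least recently used entry
        return pbn


flash = {lbn: 100 + lbn for lbn in range(16)}   # made-up logical -> physical table
cache = DemandMapCache(flash, entry_cap=2, page_cap=1)
for lbn in range(4):                            # sequential run within one page
    cache.lookup(lbn)
print(cache.flash_reads)  # 1
```

RAM holds only `entry_cap` entries plus `page_cap` pages rather than the whole table, which is the footprint trade-off the abstract describes.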

Embedded Software | 2014

Building high-performance smartphones via non-volatile memory: the swap approach

Kan Zhong; Tianzheng Wang; Xiao Zhu; Linbo Long; Duo Liu; Weichen Liu; Zili Shao; Edwin Hsing-Mean Sha

Smartphones are becoming increasingly high-performance with advances in mobile processors and larger main memories that support feature-rich applications. However, the storage subsystem has always been a limiting factor that slows progress toward even higher performance while maintaining a good user experience. Although today's smartphones are equipped with larger-than-ever main memories, they consume more energy and still run out of memory. Yet slow NAND-flash-based storage rules out swapping, an important technique for extending main memory, and leaves a system that constantly terminates user applications under memory pressure. In this paper, we revisit swapping for smartphones with fast, byte-addressable, non-volatile memory (NVM) technologies. Instead of flash, we build the swap area with NVM to allow high performance without sacrificing user experience. Based on NVM's high performance and byte-addressability, we show that a copy-on-write swap-in scheme can achieve even better performance by avoiding unnecessary memory copy operations. To avoid rapid wear-out of certain NVMs, we also propose Heap-Wear, a wear leveling algorithm that distributes writes more evenly in NVM. Evaluation results based on the Google Nexus 5 smartphone show that our solution effectively enhances smartphone performance and provides better wear-leveling of NVM.
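The abstract does not describe how Heap-Wear works internally. As an assumption suggested only by the name, the sketch below keeps free NVM swap slots in a min-heap keyed by write count, so each swap-out goes to the least-worn free slot. `LeastWornAllocator`, `swap_out`, and `release` are hypothetical names, and the copy-on-write swap-in path is not modeled.

```python
import heapq


class LeastWornAllocator:
    """Hypothetical sketch: free NVM swap slots in a min-heap keyed by wear."""

    def __init__(self, num_slots: int):
        self.wear = [0] * num_slots                  # writes absorbed per slot
        self.free = [(0, slot) for slot in range(num_slots)]
        heapq.heapify(self.free)

    def swap_out(self) -> int:
        """Write a page to the least-worn free slot and record the write."""
        _, slot = heapq.heappop(self.free)
        self.wear[slot] += 1
        return slot

    def release(self, slot: int) -> None:
        """Return a slot to the free heap (e.g. after its page is swapped back in)."""
        heapq.heappush(self.free, (self.wear[slot], slot))


nvm = LeastWornAllocator(4)
for _ in range(8):
    slot = nvm.swap_out()
    nvm.release(slot)
print(nvm.wear)  # [2, 2, 2, 2]
```

A naive allocator that always reused the first free slot would instead leave the wear at `[8, 0, 0, 0]` for the same workload; the heap keeps allocation at O(log n) per swap.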
