Mengying Zhao | Researchain

Publication

Featured research published by Mengying Zhao.

design automation conference | 2012

Quality-retaining OLED dynamic voltage scaling for video streaming applications on mobile devices

Xiang Chen; Jian Zheng; Yiran Chen; Mengying Zhao; Chun Jason Xue

This paper develops a dynamic voltage scaling (DVS) technique for power management of the OLED display on mobile devices in video streaming applications. An optimal voltage control scheme is proposed under input constraints. A fine-grained DVS technique is applied to maximize the power saving by leveraging the locality of the display content. The display quality is retained by monitoring the structural similarity index (SSIM) during the optimization, subject to hardware constraints such as voltage regulator response time. Simulation results on four typical video test benchmarks show that the proposed technique saves 19.05%~49.05% OLED power on average while maintaining a high display quality (SSIM > 0.98) all the time. The power saving efficiency of the proposed technique varies with display resolution, refresh rate, and display content.

design automation conference | 2014

SLC-enabled Wear Leveling for MLC PCM Considering Process Variation

Mengying Zhao; Lei Jiang; Youtao Zhang; Chun Jason Xue

Phase change memory (PCM) is becoming one of the most promising candidates to replace DRAM as main memory in deep silicon regime. Multi-level cell (MLC) PCM outperforms single-level cell (SLC) PCM in terms of capacity but suffers from weaker cell endurance. Wear leveling strategies have been proposed to enhance endurance but encounter more challenges with aggravating process variation. Due to endurance variations, balanced write traffic cannot fully exploit the PCM endurance, since the weak parts wear out sooner than the others. In this work, considering process variation, we propose an SLC-enabled wear leveling scheme through dynamic and adaptive mode transformation from MLC to SLC. Instead of redistributing write operations, the proposed scheme dynamically transforms weak and write-dense parts into SLC mode for endurance benefits. The experimental results show that the proposed scheme can improve the endurance by 215% with 4% storage overhead while maintaining the capacity advantage of MLC, compared with the most related work.
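
The reasoning in this abstract can be illustrated with a toy model, given here as a minimal sketch rather than the authors' scheme. It assumes memory lifetime ends when the first region wears out, and that switching a region to SLC mode multiplies its endurance by a fixed factor. The function names, the endurance figures, and the `slc_gain` factor are all hypothetical.

```python
def lifetime(endurance, writes_per_interval):
    """Number of intervals until the first region wears out."""
    return min(e / w for e, w in zip(endurance, writes_per_interval))


def morph_to_slc(endurance, weak_regions, slc_gain):
    """Return per-region endurance after switching `weak_regions` to SLC mode.

    `slc_gain` is an assumed endurance multiplier, not a measured value.
    """
    return [e * slc_gain if i in weak_regions else e
            for i, e in enumerate(endurance)]


# Hypothetical per-region endurance under process variation (writes).
endurance = [100_000, 80_000, 20_000]
balanced = [1, 1, 1]  # perfectly balanced write traffic

baseline = lifetime(endurance, balanced)
morphed = lifetime(morph_to_slc(endurance, {2}, slc_gain=10), balanced)
print(baseline, morphed)
```

With balanced traffic the weakest region alone sets the lifetime. Morphing only that region moves the limit to the next-weakest one, at the cost of the capacity the SLC region gives up.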

design automation conference | 2015

Fixing the broken time machine: consistency-aware checkpointing for energy harvesting powered non-volatile processor

Mimi Xie; Mengying Zhao; Chen Pan; Jingtong Hu; Yongpan Liu; Chun Jason Xue

Energy harvesting has become a favorable alternative to batteries for wearable embedded systems since it is more environmentally friendly and user-friendly. However, harvested energy is intrinsically unstable, which can frequently interrupt a processor's execution. To tackle this problem, non-volatile processors have been proposed to periodically checkpoint the whole volatile processor state into attached non-volatile memories. When power resumes, the processor can copy the checkpointed state back to volatile memories and continue execution. However, without careful consideration, the process of checkpointing and resuming can cause inconsistency among different memory addresses and lead to irreversible errors. In this paper, we present a consistency-aware checkpointing scheme that ensures correctness for all checkpoints. The proposed technique efficiently identifies all possible inconsistency positions in programs and inserts auxiliary code to ensure correctness. Evaluation results show that the proposed checkpointing technique can successfully eliminate inconsistency errors and greatly reduce the checkpointing overhead.
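
A small simulation can show the kind of inconsistency described here. This is a sketch under stated assumptions, not the paper's technique: the program is a single `x = x + 1` on a variable held in non-volatile memory, the checkpoint saves only a program counter, and the `protect` option stands in for hypothetical auxiliary code that also saves the value before it is overwritten.

```python
def run_increment(fail_once=False, protect=False):
    """Simulate `x = x + 1` where x lives in non-volatile memory (NVM)."""
    nvm = {"x": 5}        # survives power loss
    saved = {"pc": 0}     # checkpoint taken before the update
    if protect:
        saved["x"] = nvm["x"]  # hypothetical auxiliary code: save the old value
    pc = saved["pc"]
    while pc < 1:
        nvm["x"] = nvm["x"] + 1    # read-then-write to the same NVM address
        if fail_once:
            fail_once = False
            pc = saved["pc"]       # power fails; resume from the checkpoint
            if protect:
                nvm["x"] = saved["x"]  # restore NVM to match the checkpoint
            continue
        pc += 1
    return nvm["x"]


print(run_increment())                              # no power failure
print(run_increment(fail_once=True))                # update applied twice
print(run_increment(fail_once=True, protect=True))  # consistent again
```

Without protection, the rollback restores the program counter but not the already-updated NVM value, so re-execution increments `x` a second time.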

IEEE Transactions on Very Large Scale Integration Systems | 2014

Compiler-Assisted STT-RAM-Based Hybrid Cache for Energy Efficient Embedded Systems

Qingan Li; Jianhua Li; Liang Shi; Mengying Zhao; Chun Jason Xue; Yanxiang He

Hybrid caches consisting of static RAM (SRAM) and spin-torque transfer (STT)-RAM have been proposed recently for energy efficiency. To exploit the advantages of hybrid caches, most management strategies employ migration-based techniques to dynamically move write-intensive data from STT-RAM to SRAM. These techniques involve additional access operations, and thus lead to extra overheads. In this paper, we propose two compilation-based approaches to improve the energy efficiency and performance of STT-RAM-based hybrid cache by reducing the migration overheads. The first approach, migration-aware data layout, reduces migrations by rearranging the data layout. The second approach, migration-aware cache locking, reduces migrations by locking migration-intensive memory blocks into the SRAM part of the hybrid cache. Furthermore, experiments show that these two methods can be combined to reduce migrations further. The reduction of migration overheads can improve the energy efficiency and performance of STT-RAM-based hybrid cache. Experimental results show that, combining these two methods, on average, the number of write operations on STT-RAM is reduced by 17.6%, the number of migrations is reduced by 38.9%, the total dynamic energy is reduced by 15.6%, and the total access latency is reduced by 13.8%.

ieee conference on mass storage systems and technologies | 2014

Exploiting parallelism in I/O scheduling for access conflict minimization in flash-based solid state drives

Congming Gao; Liang Shi; Mengying Zhao; Chun Jason Xue; Kaijie Wu; Edwin Hsing-Mean Sha

Solid state drives (SSDs) have been widely deployed in personal computers, data centers, and cloud storage. To improve performance, SSDs are usually constructed with a number of channels, with each channel connecting to a number of NAND flash chips. Despite the rich parallelism offered by multiple channels and multiple chips per channel, recent studies show that the utilization of flash chips (i.e., the number of flash chips being accessed simultaneously) is seriously low. Our study shows that the low chip utilization is caused by access conflicts among I/O requests. In this work, we propose Parallel Issue Queuing (PIQ), a novel I/O scheduler at the host system, to minimize the access conflicts between I/O requests. The proposed PIQ schedules I/O requests without conflicts into the same batch and I/O requests with conflicts into different batches. Hence, the multiple I/O requests in one batch can be fulfilled simultaneously by exploiting the rich parallelism of the SSD. Because PIQ is implemented at the host side, it can take advantage of the rich resources of the host system, such as main memory and CPU, which makes the overhead negligible. Extensive experimental results show that PIQ delivers significant performance improvement to applications that have heavy access conflicts.
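
The batching idea can be sketched as follows. This is an illustration only: the abstract states that conflict-free requests share a batch, but the first-fit policy, the `batch_requests` name, and the assumption that the host knows which chips each request touches are hypothetical.

```python
def batch_requests(requests):
    """Group I/O requests so that no two requests in a batch share a chip.

    `requests` is a list of (request_id, set_of_chip_ids) pairs.
    First-fit policy: each request joins the earliest batch it does not
    conflict with, otherwise it opens a new batch.
    """
    batches = []  # list of (request ids, chips already used by the batch)
    for req_id, chips in requests:
        for ids, busy in batches:
            if busy.isdisjoint(chips):
                ids.append(req_id)
                busy.update(chips)
                break
        else:
            batches.append(([req_id], set(chips)))
    return [ids for ids, _ in batches]


# Hypothetical requests and the flash chips they would access.
requests = [("r1", {0, 1}), ("r2", {1}), ("r3", {2}), ("r4", {0})]
print(batch_requests(requests))
```

Here `r2` and `r4` conflict with `r1` on chips 1 and 0, so they land in a second batch, while `r3` can be issued together with `r1`.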

design, automation, and test in europe | 2015

Software assisted non-volatile register reduction for energy harvesting based cyber-physical system

Mengying Zhao; Qingan Li; Mimi Xie; Yongpan Liu; Jingtong Hu; Chun Jason Xue

Wearable devices are important components as information collectors in many cyber-physical systems. Energy harvesting is a better power source than batteries for these wearable devices due to its many advantages. However, harvested energy is naturally unstable, and program execution will be interrupted frequently. Non-volatile processors demonstrate promising advantages in backing up volatile state before the system energy is depleted. However, they also introduce non-negligible energy and area overhead. Since chip size is a vital factor for wearable devices, in this work we target non-volatile register reduction for application-specific systems. We propose to analyze the application program and determine efficient backup positions, by which the necessary non-volatile register file size can be significantly reduced. The evaluation results show an average 62.9% reduction in non-volatile register file size for stack backup, with negligible storage overheads.

languages, compilers, and tools for embedded systems | 2012

Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache

Qingan Li; Mengying Zhao; Chun Jason Xue; Yanxiang He

As technology scales down, energy consumption is becoming a big problem for traditional SRAM-based cache hierarchies. The emerging Spin-Torque Transfer RAM (STT-RAM) is a promising replacement for large on-chip cache due to its ultra low leakage power and high storage density. However, write operations on STT-RAM suffer from considerably higher energy consumption and longer latency than SRAM. Hybrid cache consisting of both SRAM and STT-RAM has been proposed recently for both performance and energy efficiency. Most management strategies for hybrid caches employ migration-based techniques to dynamically move write-intensive data from STT-RAM to SRAM. These techniques lead to extra overheads. In this paper, we propose a compiler-assisted approach, preferred caching, to significantly reduce the migration overhead by giving migration-intensive memory blocks the preference for the SRAM part of the hybrid cache. Furthermore, a data assignment technique is proposed to improve the efficiency of preferred caching. The reduction of migration overhead can in turn improve the performance and energy efficiency of STT-RAM based hybrid cache. The experimental results show that, with the proposed techniques, on average, the number of migrations is reduced by 21.3%, the total latency is reduced by 8.0% and the total dynamic energy is reduced by 10.8%.

asia and south pacific design automation conference | 2015

Minimizing MLC PCM write energy for free through profiling-based state remapping

Mengying Zhao; Yuan Xue; Chengmo Yang; Chun Jason Xue

Phase change memory (PCM) is becoming one of the most promising candidates to replace DRAM as main memory in the deep sub-micron regime. Multi-level cell (MLC) PCM outperforms single-level cell (SLC) PCM in terms of storage capacity but requires an iterative programming-and-verifying scheme to program cells to different resistance levels. The energy consumed in programming different MLC states varies significantly, thus motivating a state remapping technique to minimize the overall write energy. In this paper, we first compare dynamic and static state remapping strategies in terms of their efficacy in reducing energy, and then propose an effective and low-cost static state remapping algorithm. The experimental studies show a 10.6% average (up to 16.9%) reduction in MLC PCM write energy, achieved with negligible hardware and performance overhead. Compared with the most related work, the proposed scheme saves more write energy on average, with near-zero performance, area and energy overhead.
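
The intuition behind static state remapping can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a 2-bit MLC cell, a profile that counts how often each 2-bit value is written, and a per-state programming energy. The function name, the counts, and the energy values are hypothetical, and the exhaustive search over all 24 mappings is used only because it is easy to verify.

```python
from itertools import permutations


def best_static_remap(value_counts, state_energy):
    """Find the value-to-state mapping with the lowest total write energy.

    value_counts: profiled number of writes per logical 2-bit value.
    state_energy: energy to program each physical state (arbitrary units).
    Returns (total_energy, {value: state_index}).
    """
    values = sorted(value_counts)
    best = None
    for perm in permutations(range(len(state_energy))):
        cost = sum(value_counts[v] * state_energy[s]
                   for v, s in zip(values, perm))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(values, perm)))
    return best


# Hypothetical profile and per-state programming energies.
value_counts = {"00": 10, "01": 50, "10": 30, "11": 10}
state_energy = [1.0, 4.0, 5.0, 2.0]

identity_cost = sum(value_counts[v] * state_energy[i]
                    for i, v in enumerate(sorted(value_counts)))
cost, mapping = best_static_remap(value_counts, state_energy)
print(identity_cost, cost, mapping)
```

In this example the two most frequently written values end up in the two cheapest states, which lowers the total energy relative to the identity mapping.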

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2015

Wear Relief for High-Density Phase Change Memory Through Cell Morphing Considering Process Variation

Mengying Zhao; Lei Jiang; Liang Shi; Youtao Zhang; Chun Jason Xue

Because of its limited scalability and large leakage power, dynamic random-access memory (DRAM) faces many challenges in scaling. As an alternative, phase change memory (PCM) has demonstrated promising potential to serve as the main memory in the deep submicrometer regime. The broad resistance range of PCM cells enables several cell modes with various densities, namely multi-level cell (MLC), triple-state cell (TSC), and single-level cell (SLC). High-density modes outperform low-density ones in terms of capacity and cost-per-bit, but suffer from weaker cell endurance. Wear leveling strategies have been proposed to enhance memory endurance but encounter more challenges with aggravating process variation. Due to endurance variations, physical domains are fabricated with irregular tenacity. As a result, balanced write traffic, which is the objective of traditional wear leveling, cannot fully exploit the PCM endurance, since the weak parts wear out sooner than the others. In this paper, considering process variation, we propose a cell-morphing-based wear leveling scheme. Cell morphing refers to the cell mode transformation between high density (e.g., MLC) and low density (e.g., TSC and SLC). Instead of redistributing write operations, the proposed wear leveling scheme dynamically transforms weak and frequently written portions into low-density mode for endurance benefits. Multitier cell morphing schemes are proposed to support mode transformation among multiple density levels. The experimental results show 236% endurance improvement for single-tier cell morphing and 209% for two-tier cell morphing with a 2% low-density page percentage, when compared with the most related work.

international conference on computer design | 2014

Leveling to the last mile: Near-zero-cost bit level wear leveling for PCM-based main memory

Mengying Zhao; Liang Shi; Chengmo Yang; Chun Jason Xue

Phase change memory (PCM) has demonstrated great potential as an alternative to DRAM for main memory due to its favorable characteristics of non-volatility, scalability and near-zero leakage power. However, the comparatively poor endurance of PCM largely limits its adoption. Wear leveling strategies that aim to even out write distributions have been proposed at different granularities and at various levels of the memory hierarchy for PCM endurance enhancement. Write operations are distributed across the memory by migrating data from heavily written locations to less burdened ones, usually guided by counters recording the number of writes. However, evenly distributing writes at a coarse granularity cannot deliver the best endurance results, as write distributions are highly imbalanced even at the bit level. In this work, we propose a near-zero-cost bit-level wear leveling strategy to improve PCM endurance. The proposed technique can be combined with various coarse-grained wear leveling strategies. Experimental results show 102% endurance enhancement on average, which is 34% higher than the most related work, with significantly lower storage, performance and energy overheads.
