Qingan Li
City University of Hong Kong
Publication
Featured research published by Qingan Li.
international symposium on low power electronics and design | 2012
Qingan Li; Jianhua Li; Liang Shi; Chun Jason Xue; Yanxiang He
Hybrid caches consisting of both STT-RAM and SRAM have been proposed recently for energy efficiency. To exploit the advantages of hybrid caches, most existing work employs migration-based strategies that dynamically move write-intensive data from STT-RAM to SRAM. Migrations require additional read and write operations for data movement and may lead to significant overheads. To address this issue, this paper proposes a Migration-Aware Compilation (MAC) approach to improve the energy efficiency and performance of STT-RAM based hybrid caches. By re-arranging the data layout, the data access pattern within memory blocks is changed such that the number of migrations is reduced without any hardware modification. The reduction in migration overhead in turn improves energy efficiency and performance. The experimental results show that, with the proposed approach, on average the number of write operations on STT-RAM is reduced by 13.4%, the number of migrations is reduced by 16.1%, the total dynamic energy is reduced by 8.5%, and the total latency is reduced by 12.1%.
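A minimal sketch of the kind of layout decision a migration-aware compiler pass makes, assuming profiled per-variable read/write counts; the threshold, packing heuristic, and names below are illustrative assumptions, not the MAC algorithm itself.

```python
# Illustrative sketch only: place write-intensive and read-mostly variables in
# separate cache-block-sized chunks so the two classes never share a memory block,
# which tends to reduce SRAM<->STT-RAM migrations in a hybrid cache.

BLOCK_SIZE = 64  # bytes per cache block (assumed)

def layout(variables, write_ratio_threshold=0.5):
    """variables: list of (name, size_bytes, reads, writes) tuples from profiling."""
    write_hot = [v for v in variables if v[3] / max(1, v[2] + v[3]) >= write_ratio_threshold]
    read_mostly = [v for v in variables if v not in write_hot]

    address, placement = 0, {}
    for group in (write_hot, read_mostly):
        # Start each group at a fresh block boundary.
        if address % BLOCK_SIZE:
            address += BLOCK_SIZE - address % BLOCK_SIZE
        for name, size, _, _ in group:
            placement[name] = address
            address += size
    return placement

if __name__ == "__main__":
    vars_profile = [("a", 16, 100, 900), ("b", 32, 800, 10), ("c", 16, 50, 600)]
    print(layout(vars_profile))  # write-hot a and c packed apart from read-mostly b
```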
IEEE Transactions on Very Large Scale Integration Systems | 2014
Qingan Li; Jianhua Li; Liang Shi; Mengying Zhao; Chun Jason Xue; Yanxiang He
Hybrid caches consisting of static RAM (SRAM) and spin-torque transfer (STT)-RAM have been proposed recently for energy efficiency. To exploit the advantages of hybrid caches, most management strategies employ migration-based techniques that dynamically move write-intensive data from STT-RAM to SRAM. These techniques involve additional access operations and thus incur extra overheads. In this paper, we propose two compilation-based approaches to improve the energy efficiency and performance of STT-RAM-based hybrid caches by reducing the migration overheads. The first approach, migration-aware data layout, reduces migrations by rearranging the data layout. The second approach, migration-aware cache locking, reduces migrations by locking migration-intensive memory blocks into the SRAM part of the hybrid cache. Furthermore, experiments show that the two methods can be combined to further reduce migrations. The reduction in migration overhead improves the energy efficiency and performance of the STT-RAM-based hybrid cache. Experimental results show that, combining the two methods, on average the number of write operations on STT-RAM is reduced by 17.6%, the number of migrations is reduced by 38.9%, the total dynamic energy is reduced by 15.6%, and the total access latency is reduced by 13.8%.
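A rough sketch of a cache-locking decision in the spirit of migration-aware cache locking; the greedy ranking and the lockable-capacity parameter are assumptions for illustration, not the paper's algorithm.

```python
# Illustrative sketch: given profiled per-block migration counts, greedily lock the
# most migration-intensive memory blocks into the SRAM ways until an assumed
# lockable capacity is exhausted.

def select_locked_blocks(block_migrations, sram_lockable_blocks):
    """block_migrations: dict mapping block id -> profiled migration count."""
    ranked = sorted(block_migrations.items(), key=lambda kv: kv[1], reverse=True)
    return {blk for blk, _ in ranked[:sram_lockable_blocks]}

locked = select_locked_blocks({0x10: 42, 0x20: 7, 0x30: 91, 0x40: 3},
                              sram_lockable_blocks=2)
print(locked)  # the two blocks with the highest migration counts
```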
design, automation, and test in europe | 2015
Mengying Zhao; Qingan Li; Mimi Xie; Yongpan Liu; Jingtong Hu; Chun Jason Xue
Wearable devices are important components serving as information collectors in many cyber-physical systems. Energy harvesting is a more attractive power source than batteries for these wearable devices because of its many advantages. However, harvested energy is inherently unstable, so program execution is interrupted frequently. Non-volatile processors are a promising way to back up volatile state before the system energy is depleted, but they introduce non-negligible energy and area overhead. Since chip size is a vital factor for wearable devices, in this work we target non-volatile register reduction for application-specific systems. We propose to analyze the application program and determine efficient backup positions, through which the required non-volatile register file size can be significantly reduced. The evaluation results show an average 62.9% reduction in non-volatile register file size for stack backup, with negligible storage overhead.
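A schematic sketch of the idea behind choosing efficient backup positions, not the paper's actual analysis: only values live at the backup point need non-volatile storage, so a point with a small live set needs fewer non-volatile registers. The program points and live sets below are assumed compiler outputs.

```python
# Illustrative only: pick the candidate backup point with the smallest live-register
# set, since only live values must be backed up to non-volatile registers.

def choose_backup_point(candidates):
    """candidates: list of (program_point, live_register_set) pairs."""
    point, live = min(candidates, key=lambda c: len(c[1]))
    return point, live

point, live = choose_backup_point([
    ("loop_header", {"r0", "r1", "r2", "r3"}),
    ("after_store", {"r0", "r1"}),
    ("call_site",   {"r0", "r1", "r2"}),
])
print(point, len(live))  # back up here with only len(live) non-volatile registers
```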
IEEE Transactions on Very Large Scale Integration Systems | 2013
Wanyong Tian; Yingchao Zhao; Liang Shi; Qingan Li; Jianhua Li; Chun Jason Xue; Minming Li; Enhong Chen
In this paper, we consider the task allocation problem on a hybrid main memory composed of nonvolatile memory (NVM) and dynamic random access memory (DRAM). Compared to the conventional DRAM technology, emerging NVM offers excellent energy behavior since it consumes orders of magnitude less leakage power. On the other hand, most types of NVM suffer from much shorter write endurance and longer write latency than DRAM. By leveraging the energy efficiency of NVM and the long write endurance of DRAM, this paper explores task allocation techniques on hybrid memory for multiple objectives, such as minimizing energy consumption, extending lifetime, and minimizing memory size. The contributions of this paper are twofold. First, we design integer linear programming (ILP) formulations that solve the different objectives optimally. Then, we propose two sets of heuristic algorithms, including three polynomial-time offline heuristics and three online heuristics. Experiments show that, compared with the optimal solutions generated by the ILP formulations, the offline heuristics produce near-optimal results.
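An illustrative ILP in the spirit of such formulations, not the paper's actual model: a binary variable decides whether each task's data goes to NVM or DRAM, minimizing energy subject to an NVM capacity limit and a write budget standing in for endurance. All numbers and parameter names are made up for the example; it uses the PuLP solver.

```python
import pulp

tasks = range(4)
e_nvm = [3, 5, 2, 4]          # assumed energy if placed in NVM
e_dram = [9, 8, 7, 10]        # assumed energy if placed in DRAM
writes = [50, 900, 30, 400]   # assumed write counts per task
size = [2, 4, 1, 3]           # assumed memory footprint (KB)
NVM_CAP, WRITE_BUDGET = 6, 600

prob = pulp.LpProblem("hybrid_memory_allocation", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", tasks, cat="Binary")  # x[i] = 1 -> NVM, 0 -> DRAM

# Objective: total energy over both memory types.
prob += pulp.lpSum(e_nvm[i] * x[i] + e_dram[i] * (1 - x[i]) for i in tasks)
# Endurance proxy: bound total writes routed to NVM.
prob += pulp.lpSum(writes[i] * x[i] for i in tasks) <= WRITE_BUDGET
# NVM capacity constraint.
prob += pulp.lpSum(size[i] * x[i] for i in tasks) <= NVM_CAP

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({i: int(x[i].value()) for i in tasks})
```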
languages, compilers, and tools for embedded systems | 2012
Qingan Li; Mengying Zhao; Chun Jason Xue; Yanxiang He
As technology scales down, energy consumption is becoming a big problem for traditional SRAM-based cache hierarchies. The emerging Spin-Torque Transfer RAM (STT-RAM) is a promising replacement for large on-chip cache due to its ultra low leakage power and high storage density. However, write operations on STT-RAM suffer from considerably higher energy consumption and longer latency than SRAM. Hybrid cache consisting of both SRAM and STT-RAM has been proposed recently for both performance and energy efficiency. Most management strategies for hybrid caches employ migration-based techniques to dynamically move write-intensive data from STT-RAM to SRAM. These techniques lead to extra overheads. In this paper, we propose a compiler-assisted approach, preferred caching, to significantly reduce the migration overhead by giving migration-intensive memory blocks the preference for the SRAM part of the hybrid cache. Furthermore, a data assignment technique is proposed to improve the efficiency of preferred caching. The reduction of migration overhead can in turn improve the performance and energy efficiency of STT-RAM based hybrid cache. The experimental results show that, with the proposed techniques, on average, the number of migrations is reduced by 21.3%, the total latency is reduced by 8.0% and the total dynamic energy is reduced by 10.8%.
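A rough sketch of how migration-intensive memory blocks might be identified from a profiled access trace; the read/write-alternation heuristic and threshold are assumptions for illustration, not the paper's preferred-caching analysis.

```python
from collections import defaultdict

def migration_intensive_blocks(trace, threshold=3):
    """trace: list of (block_id, 'R' or 'W').  An R->W or W->R switch on the same
    block approximates a potential SRAM<->STT-RAM migration in a hybrid cache."""
    last_op = {}
    switches = defaultdict(int)
    for block, op in trace:
        if block in last_op and last_op[block] != op:
            switches[block] += 1
        last_op[block] = op
    return {b for b, n in switches.items() if n >= threshold}

trace = [(1, "R"), (1, "W"), (1, "R"), (1, "W"), (2, "R"), (2, "R"), (1, "R")]
print(migration_intensive_blocks(trace))  # block 1 would get the SRAM preference
```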
design, automation, and test in europe | 2013
Jianhua Li; Liang Shi; Qingan Li; Chun Jason Xue; Yiran Chen; Yinlong Xu
Spin-Transfer Torque RAM (STT-RAM) has been extensively studied in recent years. Recent work proposed to improve the write performance of STT-RAM by relaxing the retention time of the STT-RAM cell, the magnetic tunnel junction (MTJ). Unfortunately, the frequent refresh operations of volatile STT-RAM can dissipate significant extra energy. In addition, refresh operations can severely conflict with normal read/write operations and result in degraded cache performance. This paper proposes Cache Coherence Enabled Adaptive Refresh (CCear) to minimize refresh operations for volatile STT-RAM. Through novel modifications to the cache coherence protocol, CCear effectively minimizes the number of refresh operations on volatile STT-RAM. Full-system simulation results show that CCear approaches the performance of an ideal refresh policy with negligible overhead.
IEEE Transactions on Computers | 2015
Qingan Li; Yanxiang He; Jianhua Li; Liang Shi; Yiran Chen; Chun Jason Xue
Spin-transfer torque RAM (STT-RAM) has been proposed for building on-chip caches because of its attractive features, such as high storage density and ultra-low leakage power. However, long write latency and high write energy are two challenges for STT-RAM. Recently, researchers have proposed improving the write performance of STT-RAM by relaxing its non-volatility. To avoid the data losses resulting from volatility, refresh schemes have been proposed; however, refresh operations incur additional overhead. In this paper, we propose to significantly reduce the number of refresh operations by re-arranging the program data layout at compilation time. An N-refresh scheme is also proposed to further reduce the number of refreshes. Experimental results show that, on average, the proposed methods can reduce the number of refresh operations by 84.2 percent and reduce the dynamic energy consumption by 38.0 percent for volatile STT-RAM caches, while incurring only 4.1 percent performance degradation.
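A schematic model of one plausible reading of an N-refresh counter, sketched here only to make the idea concrete; the exact semantics in the paper may differ, and the limit value is an assumption.

```python
# Illustrative only: a volatile STT-RAM block is refreshed at most N times; once the
# budget is spent, the block is written back to the next level instead of being
# refreshed indefinitely.

class VolatileBlock:
    def __init__(self, n_refresh_limit):
        self.refreshes = 0
        self.limit = n_refresh_limit
        self.valid = True

    def on_retention_expiry(self, writeback):
        if not self.valid:
            return
        if self.refreshes < self.limit:
            self.refreshes += 1          # pay one more refresh
        else:
            writeback()                  # evict instead of refreshing again
            self.valid = False

blk = VolatileBlock(n_refresh_limit=2)
for _ in range(4):
    blk.on_retention_expiry(lambda: print("written back to lower level"))
```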
ACM Transactions on Design Automation of Electronic Systems | 2013
Jianhua Li; Liang Shi; Qingan Li; Chun Jason Xue; Yiran Chen; Yinlong Xu; Wei Wang
Spin-Torque Transfer RAM (STT-RAM) is a promising candidate for SRAM replacement because of its excellent features, such as fast read access, high density, low leakage power, and CMOS technology compatibility. However, wide adoption of STT-RAM as cache memories is impeded by its long write latency and high write power. Recent work proposed improving the write performance through relaxing the retention time of STT-RAM cells. The resultant volatile STT-RAM needs to be periodically refreshed to prevent data loss. When volatile STT-RAM is applied as the last-level cache (LLC) in chip multiprocessor (CMP) systems, frequent refresh operations could dissipate significant extra energy. In addition, refresh operations could severely conflict with normal read/write operations to degrade overall system performance. Therefore, minimizing the performance impact caused by refresh operations is crucial for the adoption of volatile STT-RAM. In this article, we propose Cache-Coherence-Enabled Adaptive Refresh (CCear) to minimize the number of refresh operations for volatile STT-RAM, adopted as the LLC for CMP systems. Specifically, CCear interacts with cache coherence protocol and cache management policy to minimize the number of refresh operations on volatile STT-RAM caches. Full-system simulation results show that CCear performs close to an ideal refresh policy with low overhead. Compared with state-of-the-art refresh policies, CCear simultaneously improves the system performance and reduces the energy consumption. Moreover, the performance of CCear could be further enhanced using small filter caches to accommodate the not-refreshed private STT-RAM blocks.
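The following is not the CCear policy itself, only a schematic of the general idea it builds on: let coherence metadata decide whether a volatile STT-RAM line is worth refreshing. The MESI-style states and the decision rule are assumptions chosen purely for illustration.

```python
def should_refresh(line_state, recently_accessed):
    """line_state: one of 'I', 'S', 'E', 'M'; recently_accessed: bool reuse hint."""
    if line_state == "I":
        return False                 # invalid lines never need refresh
    if line_state == "M":
        return True                  # dirty data must not be lost
    return recently_accessed         # clean lines: refresh only if likely reused

for state, hot in [("I", True), ("S", False), ("M", False), ("E", True)]:
    print(state, hot, "->", should_refresh(state, hot))
```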
languages, compilers, and tools for embedded systems | 2013
Qingan Li; Lei Jiang; Youtao Zhang; Yanxiang He; Chun Jason Xue
Micro-Controller Units (MCUs) are widely adopted ubiquitous computing devices. Due to tight cost and energy constraints, MCUs often integrate very limited internal RAM on top of Flash storage, which exposes the Flash to heavy write traffic and results in short system lifetime. Adopting emerging Phase Change Memory (PCM) is a promising approach for MCUs due to its fast read speed and long write endurance. However, PCM, especially multi-level cell (MLC) PCM, has long write latency and requires large write energy, which diminishes the benefit of replacing traditional Flash. By studying MLC PCM write operations, we observe that writing MLC PCM can take advantage of two write modes: fast writes leave cells in a volatile state, while slow writes leave cells in a non-volatile state. In this paper, we propose a compiler directed dual-write (CDDW) scheme that selects the best write mode for each write operation to maximize the overall performance and energy efficiency. Our experimental results show that CDDW reduces dynamic energy by 32.4% (33.8%) and improves performance by 6.3% (35.9%) compared with an all-fast (all-slow) write approach.
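An illustrative selection rule in the spirit of compiler-directed dual-write, not the CDDW cost model itself: use the fast volatile write when the compiler estimates the value will be overwritten or dead before the retention window expires, otherwise use the slow non-volatile write. The retention value is a made-up parameter.

```python
RETENTION_MS = 10.0   # assumed retention of a fast-written MLC PCM cell

def choose_write_mode(estimated_lifetime_ms):
    """Return 'fast' if the value is expected to die before retention expires."""
    return "fast" if estimated_lifetime_ms < RETENTION_MS else "slow"

for lifetime in (0.5, 8.0, 50.0):
    print(lifetime, "ms ->", choose_write_mode(lifetime))
```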
international conference on human computer interaction | 2012
Qingan Li; Yingchao Zhao; Jingtong Hu; Chun Jason Xue; Edwin Hsing-Mean Sha; Yanxiang He
Scratchpad Memory (SPM), a software-controlled on-chip memory, has been widely used as an alternative to caches in modern embedded systems due to its energy efficiency. To further reduce energy consumption, non-volatile memory (NVM) based hybrid SPM has been proposed recently. This paper targets the problem of allocating program variables to hybrid SPM based systems. Both an ILP formulation and a graph-coloring based algorithm are proposed. The experiments show that the proposed graph-coloring framework achieves both lower memory access latency and lower energy cost compared with previous work.
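A minimal greedy sketch of graph-coloring-style SPM allocation, assuming a variable interference graph and three placement classes (SRAM SPM, NVM SPM, main memory); the ordering, tie-breaking, and fallback are illustrative assumptions, not the paper's framework.

```python
def color_allocate(interference, regions=("SRAM", "NVM", "DRAM")):
    """interference: dict mapping each variable to the set of variables live at the
    same time.  Interfering variables must not share an SPM region slot."""
    placement = {}
    # Allocate high-degree variables first (a common coloring heuristic).
    for var in sorted(interference, key=lambda v: len(interference[v]), reverse=True):
        used = {placement[n] for n in interference[var] if n in placement}
        # Fall back to main memory if every region is already taken by a neighbor.
        placement[var] = next((r for r in regions if r not in used), regions[-1])
    return placement

graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(color_allocate(graph))
```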