
Publication


Featured research published by Keni Qiu.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2014

Error Model Guided Joint Performance and Endurance Optimization for Flash Memory

Liang Shi; Keni Qiu; Mengying Zhao; Chun Jason Xue

Because flash memory offers better performance than hard disks, it has been widely adopted as a storage component in embedded systems, personal computers, and data centers. However, endurance and write performance are two key challenges in the deployment of flash memory. In this paper, with awareness of the errors induced by write operations, endurance, and retention time, a stage-based optimization approach is proposed to improve write performance and endurance at different usage stages of flash memory. A series of trace-driven simulations shows that the proposed approach outperforms a set of state-of-the-art approaches in terms of write performance and lifetime.
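The staged idea can be pictured as a policy that picks a write mode from the current wear level. The sketch below is a deliberately simplified illustration: the thresholds, the three modes, and the `write_mode` function are hypothetical, not the paper's error model.

```python
# Hypothetical sketch of a stage-based flash write policy: young cells
# accumulate few errors, so faster (less conservative) writes are acceptable;
# worn cells switch to slower, more reliable writes. Thresholds are made up.

def write_mode(pe_cycles, lifetime_pe=3000):
    """Pick a write mode from the block's program/erase (P/E) cycle count."""
    wear = pe_cycles / lifetime_pe
    if wear < 0.3:
        return "fast"       # early stage: trade error margin for speed
    if wear < 0.7:
        return "normal"     # middle stage: default timing
    return "reliable"       # late stage: trade speed for endurance/retention

print([write_mode(c) for c in (100, 1500, 2900)])
```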


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2014

Migration-Aware Loop Retiming for STT-RAM-Based Hybrid Cache in Embedded Systems

Keni Qiu; Mengying Zhao; Qingan Li; Chenchen Fu; Chun Jason Xue

Recently, a hybrid cache architecture consisting of both spin-transfer torque RAM (STT-RAM) and SRAM has been proposed for energy efficiency. In hybrid caches, migration-based techniques dynamically move write-intensive and read-intensive data between STT-RAM and SRAM to exploit the advantages of each memory type. However, migrations also introduce extra reads and writes during data movement. For stencil loops with read and write data dependencies, we observe that migration overhead is significant and that migrations closely correlate with the interleaved read and write access pattern within a memory block. This paper proposes a compile-time loop retiming framework that reduces migration overhead by changing this interleaved access pattern. With the proposed loop retiming technique, interleaved memory accesses are significantly reduced, mitigating migration overhead and substantially improving the energy efficiency of the hybrid cache. Experimental results show that, on average, the proposed methods reduce the number of migrations by up to 27.1% and cache dynamic energy by up to 14.0%.
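As a toy illustration of why reordering helps, consider an access trace on a single memory block: separating the read phase from the write phase (a fission-style restructuring in the spirit of retiming) removes most read/write switches. The trace model and transition count below are hypothetical proxies, not the paper's cost model.

```python
# Count read<->write switches on a block: each switch is a potential
# migration trigger in a hybrid STT-RAM/SRAM cache (a rough, made-up proxy).

def rw_transitions(trace):
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)

N = 16

# Original stencil-style loop: every iteration reads, then writes, the same
# block, producing a tightly interleaved R/W pattern.
original = []
for _ in range(1, N - 1):
    original += ["R", "W"]

# Restructured loop: all reads first (into temporaries), then all writes,
# so the block sees one long read phase followed by one long write phase.
restructured = ["R"] * (N - 2) + ["W"] * (N - 2)

print(rw_transitions(original), rw_transitions(restructured))
```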


ACM Transactions on Embedded Computing Systems | 2014

Branch Prediction-Directed Dynamic Instruction Cache Locking for Embedded Systems

Keni Qiu; Mengying Zhao; Chun Jason Xue; Alex Orailoglu

Cache locking is a cache management technique that precludes the replacement of locked cache contents. It is often used to improve cache access predictability in Worst-Case Execution Time (WCET) analysis, and static cache locking methods have recently been proposed to improve average system performance. This paper presents Branch Prediction directed Dynamic Cache Locking (BPDCL), an approach that improves average system performance by effectively reducing cache conflict misses in different execution regions. The control flow graph of a program is partitioned into regions, and the memory blocks worth locking in each region are determined at compile time. At runtime, directed by branch predictions, locking routines are prefetched into a high-speed buffer, and the predetermined cache contents are loaded and locked at specific execution points. Experimental results show that BPDCL achieves an average cache miss rate reduction of 21.8% and 10.3% compared with no cache locking and static locking, respectively.
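The compile-time step can be pictured as a per-region selection of the hottest memory blocks. The greedy stand-in below is deliberately naive and every name in it is hypothetical; the paper's actual selection weighs conflict misses rather than raw access counts.

```python
# Hypothetical sketch of the compile-time side of region-based cache locking:
# for each execution region, lock the most frequently accessed memory blocks,
# up to a fixed lock budget per region.

def select_lock_set(access_counts, budget):
    """Greedily pick the blocks with the highest access counts."""
    ranked = sorted(access_counts.items(), key=lambda kv: -kv[1])
    return [blk for blk, _ in ranked[:budget]]

regions = {
    "loop1": {"b0": 120, "b1": 5, "b2": 90},
    "loop2": {"b3": 60, "b0": 10},
}
# At runtime, a branch-prediction-directed mechanism would prefetch and apply
# the lock set of the region that is about to execute.
lock_plan = {r: select_lock_set(counts, budget=2) for r, counts in regions.items()}
print(lock_plan)
```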


Application-Specific Systems, Architectures and Processors | 2013

Migration-aware loop retiming for STT-RAM based hybrid cache for embedded systems

Keni Qiu; Mengying Zhao; Chenchen Fu; Liang Shi; Chun Jason Xue

In a hybrid cache architecture consisting of both STT-RAM and SRAM, migration-based techniques have been proposed that dynamically move write-intensive and read-intensive data between STT-RAM and SRAM to exploit the advantages of the hybrid cache. However, migrations induce extra read and write overhead during data movement. For loops with intensive data-array operations, we observe that migration overhead is significant and that migrations closely correlate with the interleaved read and write access pattern within a memory block. This paper proposes a loop retiming framework that reduces migration overhead by changing the interleaved memory access pattern. Experimental results show that the proposed method significantly reduces migrations without any hardware modification, improving both the energy efficiency and the performance of the hybrid cache.


International Symposium on Quality Electronic Design | 2016

UM-BUS: An online fault-tolerant bus for embedded systems

Jiqin Zhou; Weigong Zhang; Keni Qiu; Xiaoyan Zhu

Miniaturization, integrated design, and comprehensive information utilization are increasingly popular in embedded system design. In this paper, we present UM-BUS, an online fault-tolerant bus suited to embedded systems. UM-BUS is a high-speed, dynamically reconfigurable serial bus with N (N ≤ 32) concurrent lanes in a mutually redundant structure. It provides remote memory access over a communication distance of up to 40 meters. By accepting a performance reduction of up to 50%, faults in up to N/2 lanes can be tolerated by reconfiguring the lanes online. Based on the UM-BUS topology, we introduce a new plug-and-play architecture for embedded systems that breaks the traditional chassis bounds by distributing in-chassis modules to remote locations. Through an experimental validation system, we demonstrate the feasibility of UM-BUS in real-world applications.
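A toy model of the lane-level redundancy might look like the following, where data words are striped across whichever lanes are still healthy. The striping scheme and fault flags are illustrative only; the real UM-BUS reconfiguration is a hardware protocol.

```python
# Toy model of lane-level fault tolerance on a multi-lane serial bus:
# words are striped round-robin across the active lanes, and a faulty lane
# is simply dropped from the stripe set after online reconfiguration.

def stripe(words, lanes):
    """Map each word to an active lane id, round-robin."""
    active = [lane["id"] for lane in lanes if lane["ok"]]
    # Tolerating faults in up to half the lanes costs up to half the bandwidth.
    assert len(active) >= len(lanes) // 2, "too many faulty lanes"
    return [(w, active[i % len(active)]) for i, w in enumerate(words)]

lanes = [{"id": i, "ok": True} for i in range(4)]
lanes[1]["ok"] = False  # lane 1 fails; the bus reconfigures online
plan = stripe(list("ABCD"), lanes)
print(plan)
```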


Computing Frontiers | 2015

Near threshold cloud processors for dark silicon mitigation: the impact on emerging scale-out workloads

Jing Wang; Junwei Zhang; Weigong Zhang; Keni Qiu; Tao Li; Minhua Wu

The breakdown of Dennard scaling has made computing energy-limited, restricting performance and giving rise to dark silicon. To effectively leverage the increased number of transistors and alleviate the dark silicon problem, designers consider a set of design paradigms in processor manufacturing; among these, Near-Threshold Voltage Computing (NTC) is a promising candidate. However, prior efforts largely focus on specific design options based on legacy desktop applications and lack a comprehensive analysis of emerging scale-out applications across multiple design options. In this paper, we characterize NTC cloud processors from several perspectives, including performance and energy efficiency, by running emerging scale-out workloads. We find that NTC can improve performance by 1.6x and energy efficiency by 50%. We also show that a tiled out-of-order (OoO) architecture improves the performance of scale-out workloads by up to 3.7x and energy efficiency by up to 6x over alternative chip organizations, making it a preferable design paradigm for scale-out workloads. We believe these observations will provide insights for the design of cloud processors in the era of dark silicon.


IFIP/IEEE International Conference on Very Large Scale Integration | 2016

Redesigning software and systems for non-volatile processors on self-powered devices

Mengying Zhao; Keni Qiu; Yuan Xie; Jingtong Hu; Chun Jason Xue

Wearable devices are gaining popularity because they can collect important information for healthcare and well-being purposes. Compared with batteries, energy harvesting is a better power source for these devices due to its many advantages. However, harvested energy is inherently unstable, and program execution is interrupted frequently. The nonvolatile processor (NVP) shows promise by backing up volatile state before the system's energy is depleted. Due to the backup and resumption procedures resulting from frequent power failures, nonvolatile processors exhibit characteristics different from those of traditional processors, necessitating a set of adapted design and optimization strategies. Recently, both hardware and software research has aimed to develop correct and efficient nonvolatile processors. In this paper, we summarize software-level techniques for NVPs, covering error-correctness schemes, backup timing determination, backup content optimization, adaptive software modifications, and NVP simulators and tools, to provide an overview of state-of-the-art NVP research at the software and system level.


Journal of Systems Architecture | 2018

Efficient Energy Management by Exploiting Retention State for Self-powered Nonvolatile Processors

Keni Qiu; Zhiyao Gong; Dongqin Zhou; Weiwen Chen; Yuanchao Xu; Xin Shi; Yongpan Liu

Energy harvesting, rather than a battery, is a better power source for wearable devices due to advantages such as long operation time without maintenance and greater user comfort. However, harvested energy is inherently unstable, and program execution is interrupted frequently. To solve this problem, the nonvolatile processor (NVP) has been proposed, which can back up volatile state before the system's energy is depleted. This backup process, however, introduces non-negligible energy and area overhead. To improve NVP performance, a retention state has recently been proposed that enables a system to retain volatile data while waiting for power resumption instead of saving data immediately. The goal of this paper is to advance program execution as much as possible by exploiting the retention state. Specifically, two objectives are achieved: first, to minimize the system's power failures when there is a high probability of power resumption during the retention state; and second, to achieve maximum computation efficiency when a power failure is unlikely to be avoided. Compared with an instant-backup scheme, evaluation results show that the proposed retention-state-aware energy management strategy reduces power failures by 81.6% and increases computation efficiency by 2.5x.
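The two objectives reduce to a simple decision at each power drop. In the sketch below, the 0.5 threshold and the resumption-probability input are hypothetical placeholders, not the paper's energy management model.

```python
# Sketch of a retention-aware backup decision: when supply power drops, either
# hold volatile state in a low-power retention mode and wait for resumption,
# or back state up to nonvolatile memory immediately.

def on_power_drop(resume_prob, retain_threshold=0.5):
    if resume_prob >= retain_threshold:
        return "retain"  # likely to resume: avoid the energy cost of a backup
    return "backup"      # unlikely to resume: save state while energy remains

print(on_power_drop(0.8), on_power_drop(0.2))
```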


International Conference on Parallel Processing | 2016

Exploring Variation-Aware Fault-Tolerant Cache under Near-Threshold Computing

Jing Wang; Yanjun Liu; Weigong Zhang; Kezhong Lu; Keni Qiu; Xin Fu; Tao Li

Near-threshold voltage computing enables transistor voltage scaling to continue along Moore's Law projections and dramatically improves power and energy efficiency. However, reducing the supply voltage to near-threshold levels significantly increases the susceptibility of on-chip caches to process variations, leading to high error rates. Most existing fault-tolerant schemes significantly sacrifice cache capacity and performance. In this paper, we propose a novel fault-tolerant cache architecture for near-threshold computing that is suitable for memories with high error rates. We first propose a variation-aware skewed-associative cache and then redirect faulty blocks to error-free blocks on top of it. Unlike previous cache reconfiguration schemes for fault tolerance, our design does not need to sacrifice or disable any fault-free blocks to form a completely functional set: all error-free blocks are used, minimizing wasted cache capacity. Moreover, since aging can also cause cell failures, our skewed cache takes the aggregated impact of process variation and aging into consideration. Finally, our design avoids complex remapping from faulty blocks to error-free blocks and minimizes hardware overhead. Evaluation results show that our variation-aware fault-tolerant cache tolerates high error rates, and its effectiveness in reducing the cache miss rate and improving performance becomes even more pronounced as the supply voltage scales down to the near-threshold region.
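The core placement idea can be sketched in a few lines: because each way of a skewed-associative cache indexes with a different hash, a block whose frame in one way is faulty can often land in a fault-free frame of another way. The hash functions, sizes, and fault map below are illustrative assumptions, not the paper's design.

```python
# Hypothetical sketch of skewed-associative placement with fault redirection.

NSETS = 8
NWAYS = 2

def way_index(addr, way):
    # A different (toy) hash per way: XOR-fold with a way-specific shift.
    return (addr ^ (addr >> (way + 1))) % NSETS

def place(addr, faulty_frames):
    """Return the first (way, set) candidate whose frame is not faulty."""
    for way in range(NWAYS):
        idx = way_index(addr, way)
        if (way, idx) not in faulty_frames:
            return way, idx
    return None  # all candidates faulty; a real design would pick a victim

faulty = {(0, way_index(0x2A, 0))}  # the way-0 frame for this address is bad
print(place(0x2A, faulty))          # the block is redirected to another way
```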


Embedded and Real-Time Computing Systems and Applications | 2013

Branch Prediction Directed Dynamic Instruction Cache Locking for Embedded Systems

Keni Qiu; Mengying Zhao; Chun Jason Xue; Alex Orailoglu

Cache locking is a cache management technique that precludes the replacement of locked cache contents. It is often used to improve cache access predictability in Worst-Case Execution Time (WCET) analysis, and static cache locking methods have recently been proposed to improve average system performance. This paper presents Branch Prediction directed Dynamic Cache Locking (BPDCL), an approach that improves average system performance by effectively reducing cache conflict misses in different execution regions. The control flow graph of a program is partitioned into regions, and the memory blocks worth locking in each region are determined at compile time. At runtime, directed by branch predictions, locking routines are prefetched into a high-speed buffer, and the predetermined cache contents are loaded and locked at specific execution points. Experimental results show that BPDCL achieves an average cache miss rate reduction of 21.8% and 10.3% compared with no cache locking and static locking, respectively.

Collaboration

Dive into Keni Qiu's collaborations.

Top Co-Authors

Yuanchao Xu (Chinese Academy of Sciences)
Chun Jason Xue (City University of Hong Kong)
Weigong Zhang (Capital Normal University)
Weiwen Chen (Capital Normal University)
Jing Wang (Capital Normal University)
Yuanhui Ni (Capital Normal University)
Zhiyao Gong (Capital Normal University)