Weiwen Chen | Researchain

Publication

Featured researches published by Weiwen Chen.

Journal of Systems Architecture | 2018

Efficient Energy Management by Exploiting Retention State for Self-powered Nonvolatile Processors

Keni Qiu; Zhiyao Gong; Dongqin Zhou; Weiwen Chen; Yuanchao Xu; Xin Shi; Yongpan Liu

Energy harvesting is a better power source than batteries for wearable devices, offering advantages such as long maintenance-free operation and user comfort. However, harvested energy is inherently unstable, so program execution is interrupted frequently. To address this, the nonvolatile processor (NVP) has been proposed because it can back up volatile state before the system energy is depleted. However, this backup process also introduces non-negligible energy and area overhead. To improve NVP performance, a retention state has recently been proposed that lets a system retain volatile data while waiting for power to resume instead of saving the data immediately. The goal of this paper is to advance program execution as far as possible by exploiting the retention state. Specifically, two objectives are achieved. The first is to minimize power failures when power is likely to resume during the retention state. The second is to maximize computation efficiency when a power failure is unlikely to be avoided. Compared to the instant backup scheme, evaluation results show that the proposed retention state-aware energy management strategy reduces power failures by 81.6% and increases computation efficiency by 2.5x.

ACM Symposium on Applied Computing | 2018

Low power driven loop tiling for RRAM crossbar-based CNN

Yuanhui Ni; Keni Qiu; Weiwen Chen; Lixue Xia; Yu Wang

Convolutional neural networks (CNNs) have been widely adopted to make predictions on large amounts of data in modern embedded systems. Multiply-and-accumulate (MAC) operations are the most computationally expensive part of a CNN. Compared with executing MAC operations on GPUs and FPGAs, implementing CNNs in an RRAM crossbar-based computing system (RCS) offers high performance and low power. However, the current design incurs very high overhead in peripheral circuits and memory accesses, limiting the gains of RCS. To address this problem, a Multi-CLP (Convolutional Layer Processor) structure has recently been proposed, in which the FPGA control resources can be shared by multiple computation units. Building on this idea, the Peripheral Circuit Unit (PeriCU)-Reuse scheme has been proposed; its underlying idea is to focus on the expensive AD/DAs and arrange for multiple convolution layers to be served sequentially by the same PeriCU. This paper adopts both structures. It further observes that memory accesses can be bypassed if two adjacent layers are assigned to different CLPs. A loop tiling technique is proposed to enable memory access bypassing and further reduce the energy consumption of RCS. To guarantee correct data dependency between layers, the safe starting time for a layer is discussed for the case where its previous layer is tiled in a different CLP. Experiments on two convolutional applications show that the loop tiling technique, integrated with the Multi-CLP structure, can efficiently meet power budgets and further reduce energy consumption by 61.7%.
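For readers unfamiliar with the term, loop tiling in general splits a loop's iteration space into fixed-size blocks, so that each block's result is available before the whole loop finishes. The sketch below shows only that generic idea; the function names and the summation workload are hypothetical, and this is not the paper's tiling scheme or its CLP assignment.

```python
def tile_ranges(n, tile):
    """Split range(n) into consecutive half-open (start, stop) tiles."""
    if n < 0 or tile <= 0:
        raise ValueError("n must be >= 0 and tile must be > 0")
    return [(start, min(start + tile, n)) for start in range(0, n, tile)]


def tiled_sum(values, tile):
    """Process `values` one tile at a time and return one partial sum per tile.

    Each partial result exists as soon as its tile is done, which is what
    would let a consumer start early instead of waiting for the full loop.
    """
    partials = []
    for start, stop in tile_ranges(len(values), tile):
        partials.append(sum(values[start:stop]))
    return partials
```

The last tile may be shorter than the others when the tile size does not divide the iteration count evenly.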

Trust, Security and Privacy in Computing and Communications | 2017

Queuing Theory-Guided Performance Evaluation for a Reconfigurable High-Speed Device Interconnected Bus

Weiwen Chen; Keni Qiu; Jiqin Zhou; Yuanhui Ni; Yuanchao Xu

UM-BUS, a reconfigurable high-speed device-interconnected bus that also features dynamic fault tolerance and remote access, has been proposed to enable lightweight sensor system design in IoTs. Performance prediction is a key step to build an idea of the worst or best cases before real-world deployment of UM-BUS-based systems. This paper proposes a queuing theory-guided analytic model which allows us to obtain an approximation for the average packet delay as well as exact upper and lower bounds. A set of experiments based on MATLAB simulation is conducted for performance evaluation. Finally, design insights are given for pragmatic implementation.
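As a rough illustration of the kind of quantity such an analytic model estimates, the sketch below computes the mean packet delay of a textbook M/M/1 queue, W = 1 / (mu - lambda). The M/M/1 assumption, the function names, and the rates are illustrative only; the abstract does not state which queuing model the paper uses for UM-BUS.

```python
def mm1_utilization(arrival_rate, service_rate):
    """Fraction of time the server is busy (rho = lambda / mu)."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    return arrival_rate / service_rate


def mm1_mean_delay(arrival_rate, service_rate):
    """Mean time a packet spends in an M/M/1 system (waiting plus service).

    Assumes Poisson arrivals and exponential service times, which is a
    textbook simplification and not necessarily the paper's model.
    """
    if mm1_utilization(arrival_rate, service_rate) >= 1.0:
        raise ValueError("queue is unstable: arrival_rate must be < service_rate")
    return 1.0 / (service_rate - arrival_rate)
```

For example, with 8 packets arriving per unit time and a service rate of 10, the utilization is 0.8 and the mean delay is 0.5 time units; the delay grows without bound as the arrival rate approaches the service rate.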

Trust, Security and Privacy in Computing and Communications | 2017

Pipeline Optimizations of Architecting STT-RAM as Registers in Rad-Hard Environment

Zhiyao Gong; Keni Qiu; Weiwen Chen; Yuanhui Ni; Yuanchao Xu; Jianlei Yang

Electromagnetic radiation effects can cause several types of errors in traditional SRAM-based registers, such as single event upset (SEU) and single event functional interrupt (SEFI). Especially in aerospace, where radiation is quite intense, the stability and correctness of systems are greatly affected. With its high radiation resistance and non-volatility, spin-transfer torque RAM (STT-RAM), an emerging nonvolatile memory (NVM), is a promising candidate for registers that avoid radiation-induced errors. However, substituting SRAM with STT-RAM in registers affects system performance because STT-RAM suffers from long write latency. The early write termination (EWT) method has been accepted as an effective technique to mitigate write problems by terminating redundant writes. Against this background, this paper proposes building registers from STT-RAM for embedded systems in rad-hard environments. Targeting the pipeline at the microarchitecture level, the impact of architecting STT-RAM-based registers is discussed with respect to data hazards caused by data dependencies. Furthermore, integrated with the EWT technique, a Read Merging method is proposed to eliminate redundant normal reads or sensing reads that are conducted along with a write. As a result, energy and performance can be improved greatly. The results show that the proposed Read Merging method improves performance by 68% and energy by 75% compared with naively using STT-RAM as registers, and by 32% and 39%, respectively, compared with using STT-RAM with EWT.
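To make "terminating redundant writes" concrete, the sketch below counts how many bits of a register actually need to change on a write, assuming that a redundant write means a bit whose stored value already equals the incoming value. The function name and the 32-bit width are hypothetical, and the sketch models only the bit-comparison idea, not the circuit-level EWT mechanism or the paper's Read Merging method.

```python
def bits_to_write(old_value, new_value, width=32):
    """Number of bit positions where the stored and incoming values differ.

    Bits that already hold the right value are the redundant writes
    that could be skipped under the assumption stated above.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    mask = (1 << width) - 1
    return bin((old_value ^ new_value) & mask).count("1")


def redundant_bits(old_value, new_value, width=32):
    """Number of bit positions whose write could be skipped."""
    return width - bits_to_write(old_value, new_value, width)
```

Writing an unchanged value touches no bits at all, while overwriting `0b1010` with `0b0110` changes only two of the 32 positions.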

International Conference on Hardware/Software Codesign and System Synthesis | 2017

Retention state-aware energy management for efficient nonvolatile processors: work-in-progress

Keni Qiu; Zhiyao Gong; Dongqin Zhou; Weiwen Chen; Yongpan Liu

Harvested energy is intrinsically unstable and program execution will be interrupted frequently. To solve this problem, nonvolatile processor (NVP) is proposed because it can back up volatile state before the system energy is depleted. However, the backup and the recovery processes also consume non-negligible energy and delay program progress. To improve the performance of NVP, retention state has been proposed recently which can enable a system to retain the volatile data to wait for power resumption instead of saving data immediately. The objective of this paper is to forward program execution progress as much as possible by exploiting the retention state. Compared to the instant backup scheme, preliminary evaluation results report that power failures can be reduced by 81.6% and computation efficiency can be increased by 105%.

Embedded and Real-Time Computing Systems and Applications | 2017

Retention state-enabled and progress-driven energy management for self-powered nonvolatile processors

Zhiyao Gong; Keni Qiu; Dongqin Zhou; Weiwen Chen; Yuanchao Xu; Xin Shi; Yongpan Liu

Energy harvesting is a better power source than batteries for wearable devices, offering advantages such as long maintenance-free operation and user comfort. However, harvested energy is inherently unstable, so program execution is interrupted frequently. To address this, the nonvolatile processor (NVP) has been proposed because it can back up volatile state before the system energy is depleted. However, this backup process also introduces non-negligible energy and area overhead. To improve NVP performance, a retention state has recently been proposed that lets a system retain volatile data while waiting for power to resume instead of saving the data immediately. The goal of this paper is to advance program execution as far as possible by exploiting the retention state. Specifically, two objectives are achieved. The first is to minimize power failures when power is likely to resume during the retention state. The second is to maximize computation efficiency when a power failure is unlikely to be avoided. Compared to the instant backup scheme, evaluation results show that the proposed retention state-aware energy management strategy reduces power failures by 81.6% and increases computation efficiency by 2.5x.

VLSI Design | 2017

State-Transition-Aware Spilling Heuristic for MLC STT-RAM-Based Registers

Yuanhui Ni; Zhiyao Gong; Weiwen Chen; Chengmo Yang; Keni Qiu

Multilevel Cell Spin-Transfer Torque Random Access Memory (MLC STT-RAM) is a promising nonvolatile memory technology to build registers for its natural immunity to electromagnetic radiation in rad-hard space environment. Unlike traditional SRAM-based registers, MLC STT-RAM exhibits unbalanced write state transitions due to the fact that the magnetization directions of hard and soft domains cannot be flipped independently. This feature leads to nonuniform costs of write states in terms of latency and energy. However, current SRAM-targeting register allocations do not have a clear understanding of the impact of the different write state-transition costs. As a result, those approaches heuristically select variables to be spilled without considering the spilling priority imposed by MLC STT-RAM. Aiming to address this limitation, this paper proposes a state-transition-aware spilling cost minimization (SSCM) policy, to save power when MLC STT-RAM is employed in register design. Specifically, the spilling cost model is first constructed according to the linear combination of different state-transition frequencies. Directed by the proposed cost model, the compiler picks up spilling candidates to achieve lower power and higher performance. Experimental results show that the proposed SSCM technique can save energy by 19.4% and improve the lifetime by 23.2% of MLC STT-RAM-based register design.
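The cost model described above, a linear combination of state-transition frequencies, can be sketched as a weighted sum. Everything concrete here is an assumption for illustration: the transition labels, the weights, the example variables, and the choice to spill the lowest-cost candidate (the conventional direction for spill costs, which the abstract does not spell out).

```python
# Hypothetical per-transition write costs for a 2-bit cell, keyed by
# (old_state, new_state). Real values would come from device measurements.
TRANSITION_WEIGHTS = {
    ("00", "01"): 1.0,
    ("00", "11"): 3.0,
    ("01", "10"): 2.0,
}

# Hypothetical profile: how often each variable causes each transition.
CANDIDATES = {
    "a": {("00", "11"): 2},
    "b": {("00", "01"): 4},
    "c": {("01", "10"): 1, ("00", "01"): 1},
}


def spill_cost(transition_freq, weights):
    """Linear combination: sum of weight * frequency over all transitions."""
    return sum(weights[t] * count for t, count in transition_freq.items())


def pick_spill_candidate(candidates, weights):
    """Return the variable name with the lowest modeled cost."""
    return min(candidates, key=lambda name: spill_cost(candidates[name], weights))
```

With these made-up numbers the costs are 6.0 for `a`, 4.0 for `b`, and 3.0 for `c`, so `c` is selected.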

Design, Automation, and Test in Europe | 2018