Publication


Featured research published by Yuanqing Cheng.


IEEE Transactions on Very Large Scale Integration Systems | 2013

Thermal-Constrained Task Allocation for Interconnect Energy Reduction in 3-D Homogeneous MPSoCs

Yuanqing Cheng; Lei Zhang; Yinhe Han; Xiaowei Li

3-D technology that stacks silicon dies with through-silicon vias (TSVs) is a promising solution to overcome the interconnect scaling problem in giga-scale integrated circuits (ICs). Thermal dissipation is a major challenge for 3-D integration, and prior thermal-balanced task scheduling methods for 3-D multiprocessor systems-on-chip (MPSoCs) typically balance the power gradient across vertical stacks, based on the assumption of strong thermal correlation among processing cores within a stack. On the other hand, 3-D MPSoCs typically employ a network-on-chip (NoC) as the communication infrastructure, which consumes a large portion of the energy budget. Because TSVs consume much less energy than horizontal links in 3-D MPSoCs when transmitting the same amount of data, owing to the reduced interconnect distance between vertically adjacent cores, it is attractive to allocate heavily communicating tasks within the same vertical stack as much as possible, so that traffic is confined to the third dimension and interconnect energy is reduced. However, aggregating active tasks within the same stack is likely to exacerbate power density and result in hot spots. In this paper, we explore the tradeoff between temperature and interconnect energy when allocating tasks in 3-D homogeneous MPSoCs, and propose an efficient heuristic. Experimental results show that the proposed technique can reduce interconnect energy by more than 25% on average with almost the same peak temperature when compared with prior thermal-balanced solutions.
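
The interconnect-energy side of this tradeoff can be illustrated with a minimal sketch. It is not the paper's heuristic and it ignores the thermal constraint entirely: the per-hop energy values, the Manhattan hop-count model, and all names (`E_HORIZONTAL`, `E_TSV`, `comm_energy`) are assumptions chosen only to show why placing heavily communicating tasks in the same vertical stack lowers interconnect energy.

```python
# Illustrative only: hypothetical energy cost per hop per unit of data.
E_HORIZONTAL = 1.0   # assumed cost of one horizontal NoC link hop
E_TSV = 0.2          # assumed (lower) cost of one vertical TSV hop


def comm_energy(mapping, traffic):
    """Sum traffic-weighted hop energy for a task-to-core mapping.

    mapping: task -> (x, y, z) core coordinate in a 3-D mesh
    traffic: (src_task, dst_task) -> data volume
    """
    total = 0.0
    for (src, dst), volume in traffic.items():
        (x1, y1, z1), (x2, y2, z2) = mapping[src], mapping[dst]
        horizontal_hops = abs(x1 - x2) + abs(y1 - y2)
        vertical_hops = abs(z1 - z2)
        total += volume * (horizontal_hops * E_HORIZONTAL + vertical_hops * E_TSV)
    return total


# Tasks "a" and "b" exchange most of the data.
traffic = {("a", "b"): 10, ("b", "c"): 1}
same_stack = {"a": (0, 0, 0), "b": (0, 0, 1), "c": (1, 0, 0)}  # a and b stacked vertically
same_layer = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (0, 0, 1)}  # a and b side by side

print(comm_energy(same_stack, traffic), comm_energy(same_layer, traffic))
```

Under these assumed costs the stacked placement is cheaper. A thermally aware allocator would additionally have to reject or penalize placements that concentrate too much power in one stack.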


Journal of Physics D | 2014

A radiation hardened hybrid spintronic/CMOS nonvolatile unit using magnetic tunnel junctions

Wang Kang; Weisheng Zhao; Erya Deng; Jacques-Olivier Klein; Yuanqing Cheng; D. Ravelosona; Youguang Zhang; C. Chappert

Conventional complementary metal-oxide semiconductor (CMOS)-based devices are approaching their physical limits, making it difficult to sustain Moore's law. Spintronic devices that exploit the intrinsic spin freedom of the electron in addition to its fundamental electrical charge show great potential in nanoscale technology nodes. In particular, hybrid spintronic/CMOS technology based on magnetic tunnel junctions (MTJs) has been considered a very promising approach thanks to the high speed, low power, good scalability and full compatibility of MTJs with CMOS technology. It is also considered a potential technology for high-reliability electronics due to the intrinsic radiation hardness of MTJs. However, hybrid spintronic/CMOS circuits are still vulnerable to radiation effects due to their CMOS peripheral circuits (e.g. write/read circuits) during access operations. In this paper, we propose a radiation hardened hybrid spintronic/CMOS nonvolatile unit to address this issue effectively. By using a physics-based pMTJ compact model and a 40 nm CMOS design kit, hybrid simulations are performed to demonstrate the performance of the proposed unit. The simulation results show that the proposed unit is quite robust against radiation effects, with less overhead in terms of hardware area and power consumption compared with previous work.


IEEE Transactions on Electron Devices | 2015

Compact Model of Subvolume MTJ and Its Design Application at Nanoscale Technology Nodes

Yue Zhang; Bonan Yan; Wang Kang; Yuanqing Cheng; Jacques-Olivier Klein; Youguang Zhang; Yiran Chen; Weisheng Zhao

The current-induced perpendicular magnetic anisotropy magnetic tunnel junctions (p-MTJs) offer a number of advantages, such as high density and high speed. As p-MTJs downscale to ~40 nm, further performance enhancements can be realized thanks to high spin-torque efficiency, i.e., lower critical current density and higher thermal stability. In this paper, we investigate the origin of high spin-torque efficiency and give a phenomenological theory to describe the critical current reduction due to the subvolume activation. Based on various physical theories and structural parameters, a compact model of nanoscale MTJ is developed and demonstrates a satisfactory agreement with experimental results. Dynamic, static, and stochastic switching behaviors have been addressed and validated. Then, we perform mixed simulations for hybrid MTJ/CMOS read/write circuits, magnetic random access memory, and magnetic flip-flop to evaluate their performance. Analyses of energy consumption are given to show the prospect of MTJ technology node miniaturization.


IEEE Computer Society Annual Symposium on VLSI | 2014

HARS: A High-Performance Reliable Routing Scheme for 3D NoCs

Jun Zhou; Huawei Li; Yuntan Fang; Tiancheng Wang; Yuanqing Cheng; Xiaowei Li

The poor yield of currently available processes for through-silicon via (TSV) fabrication seriously affects the robustness of vertical communication in 3D NoCs. Fault-tolerant routing has been regarded as an effective mechanism to ensure the performance of 2D NoCs. In this paper, we propose a high-performance reliable routing scheme, HARS, which is deadlock-free by following a mid-node-searching method proposed for 3D mesh NoCs, without requiring any virtual channels (VCs). In HARS, we adopt DyADM routing, which extends the classical 2D routing algorithm DyAD to the 3D scenario in the presence of permanent faults on the vertical links. HARS supports both one-fault and multi-fault models. The experimental results show that HARS has better performance, improved reliability, and lower overhead compared to state-of-the-art reliable routing schemes.


Asian Test Symposium | 2011

Wrapper Chain Design for Testing TSVs Minimization in Circuit-Partitioned 3D SoC

Yuanqing Cheng; Lei Zhang; Yinhe Han; Jun Liu; Xiaowei Li

Three-dimensional (3D) systems-on-chip (SoCs), which typically employ through-silicon vias (TSVs) as vertical interconnects, have emerged as a promising solution to continue Moore's law. However, they also bring challenging problems, one of which is test wrapper chain design and optimization, especially for circuit-partitioned 3D SoCs in which scan chains can cross layers. Minimizing test time is the primary goal of wrapper chain design, for both 2D and 3D SoCs. The 3D SoC wrapper chain design problem can be converted into the well-studied 2D one by projecting the wrapper chain components of all layers onto one virtual layer. Thereafter, we can leverage 2D optimization algorithms to determine the composition of wrapper chains and thus guarantee minimal testing time for 3D SoCs. One issue specific to circuit-partitioned 3D SoCs is that TSVs are needed to connect cross-layer wrapper structures to form the wrapper chains. As TSVs occupy planar chip area and aggravate routing congestion, it is necessary to reduce the number of TSVs used for test purposes as much as possible. In this work, we observe that by varying the connection order of wrapper chain components, e.g., scan chains and I/O cells, the number of TSVs consumed varies significantly. Based on this observation, we formulate the problem and propose a novel heuristic to tackle it. Experimental results show that the proposed solution can save 33.2% of TSVs on average when compared to a prior intuitive method.
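
The observation that connection order changes TSV usage can be illustrated with a small sketch. It assumes a simplified model in which each wrapper chain component sits on a single layer and linking two consecutive components on layers i and j costs abs(i - j) TSVs. This cost model and the names `tsv_count`, `interleaved`, and `grouped` are hypothetical and are not taken from the paper, whose heuristic is not reproduced here.

```python
def tsv_count(chain_layers):
    """Count TSVs for one wrapper chain under the assumed model:
    linking consecutive components costs one TSV per layer crossed."""
    return sum(abs(a - b) for a, b in zip(chain_layers, chain_layers[1:]))


# Layer index of each component (scan chain or I/O cell) in connection order.
interleaved = [0, 2, 0, 1, 2, 1]
grouped = sorted(interleaved)  # same components, connected layer by layer

print(tsv_count(interleaved), tsv_count(grouped))  # prints "7 2"
```

Both orders contain the same components, so the chain composition that determines test time is unchanged; only the number of layer crossings differs.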


Asia and South Pacific Design Automation Conference | 2016

Architecture design with STT-RAM: Opportunities and challenges

Ping Chi; Shuangchen Li; Yuanqing Cheng; Yu Lu; Seung H. Kang; Yuan Xie

The emerging spin-transfer torque magnetic random-access memory (STT-RAM) has attracted a lot of interest from both academia and industry in recent years. It has been considered a promising replacement for SRAM and DRAM in cache and memory system design thanks to many advantages, including non-volatility, low leakage power, SRAM-comparable read performance and read energy consumption, higher density than SRAM, better scalability than conventional CMOS technologies, and good CMOS compatibility. However, the disadvantages of STT-RAM, such as higher write energy and longer write latency than SRAM, also bring design challenges. This paper introduces state-of-the-art architectural approaches to adopting STT-RAM in cache and memory system design by taking advantage of the opportunities brought by STT-RAM as well as overcoming the challenges.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2016

Radiation-Induced Soft Error Analysis of STT-MRAM: A Device to Circuit Approach

Jianlei Yang; Peiyuan Wang; Yaojun Zhang; Yuanqing Cheng; Weisheng Zhao; Yiran Chen; Hai Helen Li

Spin-transfer torque magnetic random access memory (STT-MRAM) is a promising emerging memory technology due to its various advantageous features such as scalability, nonvolatility, density, endurance, and fast speed. However, the reliability of STT-MRAM is severely impacted by environmental disturbances because a radiation strike on the access transistor could introduce potential write and read failures for 1T1MTJ cells. In this paper, a comprehensive approach is proposed to evaluate radiation-induced soft errors, spanning from device modeling to circuit-level analysis. A simulation based on 3-D metal-oxide-semiconductor transistor modeling is first performed to capture the radiation-induced transient current pulse. Then a compact switching model of the magnetic tunneling junction (MTJ) is developed to analyze the various mechanisms of STT-MRAM write failures. The probability of failure of a 1T1MTJ cell is characterized and stored in look-up tables. This approach enables designers to consider the effect of different factors such as radiation strength, write current magnitude and duration time on the soft error rate of STT-MRAM memory arrays. Meanwhile, comprehensive write and sense circuits are evaluated for bit error rate analysis under random radiation effects and transistor process variation, which is critical for performance optimization of practical STT-MRAM read and sense circuits.


Asia and South Pacific Design Automation Conference | 2017

Building energy-efficient multi-level cell STT-RAM caches with data compression

Liu Liu; Ping Chi; Shuangchen Li; Yuanqing Cheng; Yuan Xie

Spin-transfer torque magnetic random access memory (STT-RAM) technology has emerged as a potential replacement for SRAM in cache design, especially for building large-scale and energy-efficient last level caches. Compared with single-level cell (SLC), multi-level cell (MLC) STT-RAM is expected to double cache capacity and increase system performance. However, the two-step read/write access schemes incur considerable energy consumption and performance degradation. In this paper, we propose two techniques using data compression to optimize MLC STT-RAM cache design. The first technique tries to compress a cache line and fit it into only the soft-bit region of the cells, so that reading or writing this cache line takes only one step, which is fast and energy-efficient. We introduce a second technique to increase the cache capacity by enabling the remaining hard-bit region to store another compressed cache line, which can improve the system performance for memory-intensive workloads. The experimental results show that, compared with a conventional MLC STT-RAM last level cache design, our overhead-minimized technique reduces the dynamic energy consumption by 38.2% on average with the same system performance, and our capacity-augmented technique boosts the system performance by 6.1% with 19.2% dynamic energy saving on average, across the evaluated multi-programmed benchmarks.
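
The two compression-based ideas can be sketched as a simple placement rule. This is only an illustration under stated assumptions: a 64-byte cache line, soft-bit and hard-bit regions that each hold half of that capacity, and compressed sizes supplied by some unspecified compressor. The constants and the function names `access_steps` and `can_share_cells` are hypothetical and do not come from the paper.

```python
LINE_BYTES = 64                        # assumed cache line size
SOFT_REGION_BYTES = LINE_BYTES // 2    # assumed capacity of the soft-bit region
HARD_REGION_BYTES = LINE_BYTES - SOFT_REGION_BYTES  # assumed hard-bit capacity


def access_steps(compressed_size):
    """Steps needed to read or write a line under the assumed model:
    one step if it fits entirely in the soft-bit region, otherwise two."""
    return 1 if compressed_size <= SOFT_REGION_BYTES else 2


def can_share_cells(size_a, size_b):
    """True if line A fits in the soft-bit region and line B fits in the
    hard-bit region of the same cells (the capacity-oriented idea)."""
    return size_a <= SOFT_REGION_BYTES and size_b <= HARD_REGION_BYTES


print(access_steps(20), access_steps(48))                    # prints "1 2"
print(can_share_cells(20, 30), can_share_cells(20, 40))      # prints "True False"
```

A line that does not compress below the soft-bit capacity falls back to the conventional two-step access in this sketch.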


IEEE Transactions on Very Large Scale Integration Systems | 2017

STT-RAM Buffer Design for Precision-Tunable General-Purpose Neural Network Accelerator

Lili Song; Ying Wang; Yinhe Han; Huawei Li; Yuanqing Cheng; Xiaowei Li

Multilevel spin-transfer torque RAM (STT-RAM) is a suitable storage device for energy-efficient neural network accelerators (NNAs), which rely on large-capacity on-chip memory to support brain-inspired large-scale learning models, from conventional artificial neural networks to currently popular deep convolutional neural networks. In this paper, we investigate the application of multilevel STT-RAM to general-purpose NNAs. First, the error-resilience feature of neural networks is leveraged to tolerate the read/write reliability issue in multilevel cell STT-RAM using approximate computing. The read/write failures induced at the expense of higher storage density can be effectively masked by a wide spectrum of NN applications with intrinsic forgiveness. Second, we present a precision-tunable STT-RAM buffer for the popular general-purpose NNA. The targeted STT-RAM memory design is able to switch between multiple working modes and is adaptable to the varying quality constraints of approximate applications. Lastly, the reconfigurable STT-RAM buffer not only enables precision scaling in NNAs but also provides adaptiveness to the demands of different learning models with distinct working-set sizes. In particular, we demonstrate the concept of capacity/precision-tunable STT-RAM memory with the emerging reconfigurable deep NNA and elaborate on the data mapping and storage mode switching policy in STT-RAM memory to achieve the best energy efficiency of approximate computing.


International Symposium on Nanoscale Architectures | 2015

An architecture-level cache simulation framework supporting advanced PMA STT-MRAM

Bi Wu; Yuanqing Cheng; Ying Wang; Aida Todri-Sanial; Guangyu Sun; Lionel Torres; Weisheng Zhao

With on-chip integration density rising rapidly, leakage power dominates the whole power budget of contemporary CMOS technology based memory, especially for SRAM based on-chip cache. To overcome the aggravating “power wall” issue, some emerging memory technologies such as STT-MRAM (spin-transfer torque magnetic RAM), PCRAM (phase change RAM), and ReRAM (resistive RAM) are proposed as promising candidates for next generation cache design. Although there are several existing simulation tools available for cache design, such as NVSim and CACTI, they either cannot support the most advanced PMA (perpendicular magnetic anisotropy) STT-MRAM model or lack the ability to simulate multi-banked large capacity caches. In this paper, we propose an architecture level design framework for cache design from device level up to array structure level, which can support the most advanced PMA STT-MRAM technology. The simulation results are analyzed and compared with those produced by NVSim, confirming the correctness of our framework. The potential benefits of PMA STT-MRAM used as multi-banked L2 and L3 cache are also investigated in the paper. We believe that our framework will be helpful for computer architecture researchers to adopt PMA STT-MRAM in on-chip cache design.

Collaboration


Dive into Yuanqing Cheng's collaboration.

Top Co-Authors

Xiaowei Li

Chinese Academy of Sciences

Ying Wang

Chinese Academy of Sciences

Yinhe Han

Chinese Academy of Sciences

Bi Wu

Beihang University
