Xianwei Zhang | Researchain

Publication

Featured researches published by Xianwei Zhang.

Design, Automation, and Test in Europe | 2015

Exploiting DRAM restore time variations in deep sub-micron scaling

Xianwei Zhang; Youtao Zhang; Bruce R. Childers; Jun Yang

Recent studies reveal that one of the major challenges in scaling DRAM in the deep sub-micron regime is the significant variation in cell restore time, which affects timing constraints such as write recovery time (tWR). Adopting traditional approaches results in either low yield rate or large performance degradation. In this paper, we propose schemes to expose the variations to the architectural level. By constructing memory chunks with different access speeds and, in particular, exploiting the performance benefits of fast chunks, a variation-aware memory controller can effectively compensate for the performance loss due to relaxed timing constraints. Our experimental results show that, compared to traditional designs such as row sparing and ECC, the proposed schemes improve system performance by up to 10.3% and 12.9%, respectively, for 20nm and 14nm tech nodes on a 4-core multiprocessor system.

International Symposium on Low Power Electronics and Design | 2013

WoM-SET: low power proactive-SET-based PCM write using WoM code

Xianwei Zhang; Lei Jiang; Youtao Zhang; Chuanjun Zhang; Jun Yang

The emerging Phase Change Memory (PCM), while having many advantages, suffers from slow write operations. This is mainly due to its asymmetric write characteristic: of the two types of PCM write operations, SET is much slower than RESET. A recent study has shown that proactively setting dirty memory lines to all '1's can enable RESET-only writes when these lines are written back from the cache, which helps to reduce the effective write latency. Unfortunately, this results in higher write power demand. In this paper, we propose WoM-SET, a low power proactive-SET-based write strategy. By exploiting the WoM (write-once memory) code, we greatly reduce the number of RESETs per write and hence the write power demand. By applying our design only to write-intensive pages, we restrict the extra space requirement in WoM-SET. Our experiments show that WoM-SET achieves 40% RESET bit reduction, 40% write power reduction, and 12% energy-delay-product improvement over the PreSET scheme.
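
The write-once property that the abstract relies on can be illustrated with a minimal sketch. It uses the classic two-write WoM code that stores 2 bits in 3 cells, with the polarity inverted so that cells start in the all-'1' (SET) state and every write only clears bits (RESET). This is an assumption chosen for illustration; the abstract does not say which WoM code WoM-SET uses, and the names FIRST, SECOND, decode, and write are hypothetical.

```python
# Illustrative only: an inverted 2-bit/3-cell, two-write WoM code.
# Assumption: this is NOT necessarily the code used by WoM-SET.

# First-generation codewords: at most one cell is RESET (1 -> 0).
FIRST = {0b00: (1, 1, 1), 0b01: (0, 1, 1), 0b10: (1, 0, 1), 0b11: (1, 1, 0)}
# Second-generation codewords are the bitwise complements of the first.
SECOND = {v: tuple(1 - c for c in cw) for v, cw in FIRST.items()}


def decode(cells):
    """Recover the 2-bit value; the same rule works for both generations."""
    x, y, z = cells
    return ((y ^ z) << 1) | (x ^ z)


def write(cells, value):
    """Store `value` using RESET-only transitions.

    Returns (new_cells, number_of_resets). Raises ValueError when the
    update would need a 0 -> 1 (SET) transition, i.e. the cells must be
    proactively re-SET before they can be written again.
    """
    if decode(cells) == value:
        return cells, 0  # value already stored, nothing to program
    target = FIRST[value] if cells == (1, 1, 1) else SECOND[value]
    if any(old == 0 and new == 1 for old, new in zip(cells, target)):
        raise ValueError("update needs a SET; re-SET the cells first")
    resets = sum(old == 1 and new == 0 for old, new in zip(cells, target))
    return target, resets


cells = (1, 1, 1)                   # proactively SET line
cells, n1 = write(cells, 0b01)      # first write
cells, n2 = write(cells, 0b10)      # second write, still RESET-only
```

In this sketch the cost of two RESET-only writes is one extra cell per two data bits, which is the kind of space overhead the abstract limits by applying the scheme only to write-intensive pages.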

High-Performance Computer Architecture | 2016

Restore truncation for performance improvement in future DRAM systems

Xianwei Zhang; Youtao Zhang; Bruce R. Childers; Jun Yang

Scaling DRAM below 20nm has become a major challenge due to intrinsic limitations in the structure of a bit cell. Future DRAM chips are likely to suffer from significant variations and degraded timings, such as taking much more time to restore cell data after read and write accesses. In this paper, we propose restore truncation (RT), a low-cost restore strategy to improve performance of DRAM modules that adopt relaxed restore timing. After an access, RT restores a bit cell's voltage only to the level required to persist data to the next scheduled refresh rather than to the default full voltage. Because restore time is shortened, the performance of the cell is improved under process variations. We devise two schemes to balance performance, energy consumption, and hardware overhead. We simulate our proposed RT schemes and compare them with the state of the art. Experimental results show that, on average, RT improves performance by 19.5% and reduces energy consumption by 17%.

International Conference on Computer Design | 2015

TriState-SET: Proactive SET for improved performance of MLC phase change memories

Xianwei Zhang; Youtao Zhang; Jun Yang

The emerging Phase Change Memory (PCM) has many advantages such as good scalability and low leakage. MLC (Multi-Level Cell) PCM further extends the benefits by storing two or more bits per cell and thus reducing the per-bit cost. However, adopting MLC PCM in main memory often leads to long write latency, high energy consumption, and degraded performance. In this paper, we propose TriState-SET, a proactive-SET-based write strategy for improving MLC PCM write performance. TriState-SET proactively places device cells of a dirty memory line in full SET state. By utilizing only three states of 2-bit MLC PCM, TriState-SET involves only fast state transitions when writing such a line at write-back time. Our experimental results show that TriState-SET increases performance by 11% and saves system energy by 6.7% (up to 12.2%), while achieving up to 25% (average 14.1%) energy-delay-product improvement.

ACM Transactions on Design Automation of Electronic Systems | 2017

On the Restore Time Variations of Future DRAM Memory

Xianwei Zhang; Youtao Zhang; Bruce R. Childers; Jun Yang

As the de facto main memory standard, DRAM (Dynamic Random Access Memory) has achieved dramatic density improvement in the past four decades, along with the advancements in process technology. Recent studies reveal that one of the major challenges in scaling DRAM into the deep sub-micron regime is the significant variation in cell restore time, which affects timing constraints such as write recovery time. Adopting traditional approaches results in either low yield rate or large performance degradation. In this article, we propose schemes to expose the variations to the architectural level. By constructing memory chunks with different access speeds and, in particular, exploiting the performance benefits of fast chunks, a variation-aware memory controller can effectively mitigate the performance loss due to relaxed timing constraints. We then propose restore-time-aware rank construction and page allocation schemes to make better use of fast chunks. Our experimental results show that, compared to traditional designs such as row sparing and Error Correcting Codes, the proposed schemes improve system performance by about 16% and 20%, respectively, for 20nm and 14nm technology nodes on a four-core multiprocessor system.

International Conference on Computer Design | 2015

DLB: Dynamic lane borrowing for improving bandwidth and performance in Hybrid Memory Cube

Xianwei Zhang; Youtao Zhang; Jun Yang

The Hybrid Memory Cube (HMC) is an innovative DRAM architecture that adopts 3D-stacking to improve bandwidth and save energy. An HMC module adopts separate receive and transmit lanes and thus may achieve the maximal memory bandwidth only if data can be driven at full speed in both directions. However, due to the natural read and write imbalance in modern applications, the effective memory bandwidth utilization is often low, leading to suboptimal system performance. In this paper, we propose DLB (dynamic lane borrowing), which dynamically tracks link utilization and partitions the lanes in one link between receive and transmit directions. DLB allocates more lanes to transmit if servicing read-intensive applications. With more lanes allocated to either direction, DLB reduces the lane contention along that direction and thus the average memory access latency. Our experimental results show that DLB improves the bandwidth utilization by 10.4% on average, reduces the average utilization gap between the two directions from 35.6% to 12.8%, and saves execution time by as much as 22.3%.
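
The idea of tracking utilization and repartitioning lanes can be sketched as a simple policy function. This is a hedged illustration, not the DLB mechanism itself: the function name rebalance, the gap threshold, the one-lane step, the minimum lane count, and the 16-lanes-per-direction starting point are all assumptions that do not come from the abstract.

```python
def rebalance(tx_lanes, rx_lanes, tx_util, rx_util,
              gap_threshold=0.2, step=1, min_lanes=4):
    """Return a new (tx_lanes, rx_lanes) split for one link.

    tx_util and rx_util are measured utilizations in [0, 1] for the
    last monitoring window. All thresholds are hypothetical values.
    """
    gap = tx_util - rx_util
    # Transmit is the busier direction: borrow lanes from receive.
    if gap > gap_threshold and rx_lanes - step >= min_lanes:
        return tx_lanes + step, rx_lanes - step
    # Receive is the busier direction: borrow lanes from transmit.
    if -gap > gap_threshold and tx_lanes - step >= min_lanes:
        return tx_lanes - step, rx_lanes + step
    # Utilization is roughly balanced, or no lane can be spared.
    return tx_lanes, rx_lanes


# One monitoring window in which transmit is far busier than receive.
new_split = rebalance(16, 16, tx_util=0.9, rx_util=0.3)
```

Keeping a minimum lane count per direction is a design choice of this sketch: it prevents the lightly used direction from being starved when the traffic mix changes.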

International Conference on Computer Design | 2015

Exploit common source-line to construct energy efficient domain wall memory based caches

Xianwei Zhang; Lei Zhao; Youtao Zhang; Jun Yang

Domain wall memory (DWM) is an emerging memory technology that utilizes magnetic domains along a nanowire to achieve high density, short latency and low power. Recent studies showed that DWM is a promising replacement for SRAM and STT-MRAM in on-chip caches. However, accessing DWM requires frequent shift operations, which leads to large energy consumption for DWM caches. In this paper, we propose DWM-SSL, an architectural innovation to achieve energy efficiency for multiple-head-based DWM caches. DWM-SSL adopts a common source line design to re-organize DWM cell arrays such that accessing an N-bit cache line from an M-head DWM-based cache activates N/M tracks instead of N tracks in the baseline. Our experimental results show that, on average, DWM-SSL reduces track shifts by around 5.1x and cache energy consumption by up to 63% for a 4-head DWM cache design.
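
The N/M relation stated above can be checked with a small worked example. The 512-bit line size and the helper name tracks_activated are assumptions for illustration; only the N versus N/M relation and the 4-head configuration come from the abstract.

```python
def tracks_activated(line_bits, heads, common_source_line=True):
    """Tracks activated when reading one cache line.

    Baseline: one track per bit (N tracks).
    Common source line design, as described in the abstract: N/M tracks.
    """
    if line_bits % heads != 0:
        raise ValueError("line size must be a multiple of the head count")
    return line_bits // heads if common_source_line else line_bits


baseline = tracks_activated(512, 4, common_source_line=False)  # N tracks
shared = tracks_activated(512, 4)                               # N/M tracks
```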

Proceedings of the Second International Symposium on Memory Systems | 2016

AWARD: Approximation-aWAre Restore in Further Scaling DRAM

Xianwei Zhang; Youtao Zhang; Bruce R. Childers; Jun Yang

Further scaling of DRAM is becoming more and more challenging, making the restore operation a serious issue in the near future. Fortunately, a wide range of modern applications are able to tolerate error or inexactness, providing a new dimension to mitigate the slow-restore issue. Thus, we can trade off acceptable QoS loss in those applications to accelerate restore operations and, in turn, achieve performance and energy improvements. In this extended research abstract, we briefly explore DRAM restore-based approximate computing, and present a preliminary evaluation of the impact on quality-of-service (QoS) degradation and performance speedup. We show that restore-based approximate computing is challenging, and that dedicated error correction/tolerance techniques are needed to balance QoS and performance.

International Conference on Parallel Architectures and Compilation Techniques | 2017

DrMP: Mixed Precision-Aware DRAM for High Performance Approximate and Precise Computing

Xianwei Zhang; Youtao Zhang; Bruce R. Childers; Jun Yang

Recent studies showed that DRAM restore time degrades as technology scales, which imposes large performance and energy overheads. This problem, prolonged restore time (PRT), has been identified by the DRAM industry as one of three major scaling challenges. This paper proposes DrMP, a novel fine-grained precision-aware DRAM restore scheduling approach, to mitigate PRT. The approach exploits process variations (PVs) within and across DRAM rows to save data with mixed precision. The paper describes three variants of the approach: DrMP-A, DrMP-P, and DrMP-U. DrMP-A supports approximate computing by mapping important data bits to fast row segments to reduce restore time for improved performance at a low application error rate. DrMP-P pairs memory rows together to reduce the average restore time for precise computing. DrMP-U combines DrMP-A and DrMP-P to better trade off performance, energy consumption, and computation precision. Our experimental results show that, on average, DrMP achieves 20% performance improvement and 15% energy reduction over a precision-oblivious baseline. Further, DrMP achieves an error rate of less than 1% at the application level for a suite of benchmarks, including applications that exhibit unacceptable error rates under simple approximation that does not differentiate the importance of different bits.

international conference on information science and digital content technology | 2012