Danghui Wang

Northwestern Polytechnical University

Publication

Featured research published by Danghui Wang.

international conference on computer aided design | 2013

Unleashing the potential of MLC STT-RAM caches

Xiuyuan Bi; Mengjie Mao; Danghui Wang; Hai Li

In this paper, we study the use of multi-level cell (MLC) spin-transfer torque RAM (STT-RAM) in cache design of embedded systems and microprocessors. Compared to the single-level cell (SLC) design, an MLC STT-RAM cache is expected to offer higher density and faster system performance. However, cell design constraints, such as the switching current requirement and asymmetry in write operations, severely limit the density benefit of the conventional MLC STT-RAM. The two-step read/write accesses and inflexible data mapping strategy in the existing MLC STT-RAM cache architecture may even result in system performance degradation. To unleash the real potential of MLC STT-RAM cache, we propose a cross-layer solution. First, we introduce the reverse magnetic tunnel junction (MTJ) into MLC cell design, which offers a more balanced device and design tradeoff and enables 2x the storage density of SLC. At the architectural level, we propose a cell split mapping method to divide cache lines into fast and slow regions and data migration policies to allocate the frequently used data to fast regions. Furthermore, an application-aware speed enhancement mode is utilized to adaptively trade off cache capacity and speed, satisfying different requirements of various applications. Simulation results show that the proposed techniques can improve the system performance by 10.3% and reduce the energy consumption on cache by 26.0% compared with conventional MLC STT-RAM.

international conference on computer aided design | 2013

CD-ECC: content-dependent error correction codes for combating asymmetric nonvolatile memory operation errors

Wujie Wen; Mengjie Mao; Xiaochun Zhu; Seung H. Kang; Danghui Wang; Yiran Chen

The write operation asymmetry of many memory technologies causes different write failure rates at 0 → 1 and 1 → 0 bit-flippings. Conventional error correction codes (ECCs) spend the same effort on both bit-flipping directions, leading to very unbalanced write reliability enhancement over different bit-flipping distributions of codewords (i.e., the number of 0 → 1 or 1 → 0 bit-flippings). In this work, we developed an analytic asymmetric write channel (AWC) model to analyze the asymmetric write errors in spin-transfer torque random access memory (STT-RAM) designs. A new ECC design concept, namely, content-dependent ECC (CD-ECC), is proposed to achieve balanced error correction at both bit-flipping directions. Two CD-ECC schemes, typical-corner-ECC (TCE) and worst-corner-ECC (WCE), are designed for the codewords with different bit-flipping distributions. Our simulation results show that compared to the common ECC schemes utilized in embedded applications like Hamming code, CD-ECCs can improve the STT-RAM write reliability by 10-30x with low hardware overhead and very marginal impact on system performance.
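
As a rough illustration of the "bit-flipping distribution" mentioned in the abstract above, the following sketch counts how many 0 → 1 and 1 → 0 flips a write would cause when a new word overwrites an old one. The function name `count_bit_flips` and the fixed word width are assumptions made for this example; the sketch is not the paper's AWC model or its CD-ECC encoder.

```python
def count_bit_flips(old_word: int, new_word: int, width: int = 8) -> tuple[int, int]:
    """Return (zero_to_one, one_to_zero) flip counts for an overwrite.

    Hypothetical helper for illustration only: it treats both arguments
    as unsigned words of `width` bits.
    """
    mask = (1 << width) - 1
    old_word &= mask
    new_word &= mask
    # Bits that were 0 and become 1.
    zero_to_one = bin(~old_word & new_word & mask).count("1")
    # Bits that were 1 and become 0.
    one_to_zero = bin(old_word & ~new_word & mask).count("1")
    return zero_to_one, one_to_zero


if __name__ == "__main__":
    print(count_bit_flips(0b00001111, 0b00111100))
```

A content-dependent scheme could, in principle, use such counts to decide how much protection each flip direction needs, but how TCE and WCE actually do so is not described in the abstract.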

asia and south pacific design automation conference | 2014

DPA: A data pattern aware error prevention technique for NAND flash lifetime extension

Jie Guo; Zhijie Chen; Danghui Wang; Zili Shao; Yiran Chen

Recent research reveals that the bit error rate of a NAND flash cell is highly dependent on the stored data patterns. In this work, we propose the Data Pattern Aware (DPA) error protection technique to extend the lifespan of NAND flash based storage systems (NFSS). DPA manipulates the ratio of 1s and 0s in the stored data to minimize the occurrence of data patterns that are susceptible to bit error noise. Consequently, the NAND flash cell bit error rate is reduced, leading to system endurance extension. Our simulation results show that, with marginal hardware and power overhead, the DPA scheme can increase the NFSS lifetime by up to 4×, offering a complementary solution to other lifetime enhancement techniques like wear-leveling.

asia and south pacific design automation conference | 2016

Improving read performance of STT-MRAM based main memories through Smash Read and Flexible Read

Lei Jiang; Wujie Wen; Danghui Wang; Lide Duan

Spin Transfer Torque Magnetoresistive RAM (STT-MRAM) has recently been deemed a promising main memory alternative for high-end mobile processors. With process technology scaling, the amplitude of write current approaches that of read current in deep sub-micrometer STT-MRAM arrays. As a result, read disturbance errors (RDEs) emerge. Both high current restore required (HCRR) reads and low current long latency (LCLL) reads can guarantee read reliability and completely remove RDEs. However, both of them degrade system performance, because of extra restores or a longer read latency, and neither consistently achieves the better performance across a wide variety of applications. In this paper, we present two architectural techniques to boost read performance for STT-MRAM based main memories in the presence of RDEs. We first propose Smash Read (S-RD) to shorten the latency of HCRR reads by injecting a larger read current. We further introduce Flexible Read (F-RD) to dynamically adopt different types of read schemes, S-RD and LCLL, to maximize main memory system performance. On average, our techniques improve system performance by 9~13% and reduce total energy by 4~8% over all existing read schemes including HCRR and LCLL.

design automation conference | 2015

FlexLevel: a novel NAND flash storage system design for LDPC latency reduction

Jie Guo; Wujie Wen; Jingtong Hu; Danghui Wang; Hai Li; Yiran Chen

LDPC code is introduced in NAND flash memory to handle the high BER (bit error rate) incurred by technology scaling. Despite its strong error correction capability, LDPC decoding induces long NAND flash read latency. In this work, we propose FlexLevel, a robust NAND flash storage system design to improve data reliability and read efficiency affected by the LDPC operations. FlexLevel first reduces BER by enlarging noise margins via Vth (threshold voltage) level reduction. This reduces the sensing levels of LDPC but also causes a loss of storage capacity. To compensate for this capacity loss with minimum impact on read performance, FlexLevel identifies the data with high LDPC overhead and applies the Vth level reduction technique only to those data. Experimental results show that, compared with the state of the art, FlexLevel can achieve up to 33% read speedup with very moderate capacity loss.

Applied Physics Letters | 2011

Temperature dependence of Raman scattering from 4H-SiC with hexagonal defects

R. Han; B. Han; Danghui Wang; Chang Ming Li

Noncontact temperature measurements based on Raman scattering were performed on 4H-SiC with hexagonal defects. These measurements show that the four-phonon process makes a greater contribution to the E2(TO) mode than to the E1(TO) mode. The longer lifetimes of E2(TO) and E1(TO) phonons in hexagonal defects demonstrate that there are fewer possible decay channels than in the defect-free zone. The absence of electronic Raman peaks in the hexagonal defects suggests that hexagonal defects seriously limit the uniformity of the nitrogen distribution. The intensity of electronic Raman spectra is related to the density of neutral nitrogen atoms.

international symposium on low power electronics and design | 2017

XNOR-POP: A processing-in-memory architecture for binary Convolutional Neural Networks in Wide-IO2 DRAMs

Lei Jiang; Minje Kim; Wujie Wen; Danghui Wang

It is challenging to adopt computing-intensive and parameter-rich Convolutional Neural Networks (CNNs) in mobile devices due to limited hardware resources and low power budgets. To support multiple concurrently running applications, one mobile device needs to perform multiple CNN tests simultaneously in real time. Previous solutions cannot guarantee a high enough frame rate when serving multiple applications with reasonable hardware and power cost. In this paper, we present a novel processing-in-memory architecture to process emerging binary CNN tests in Wide-IO2 DRAMs. Compared to state-of-the-art accelerators, our design improves CNN test performance by 4× ∼ 11× with small hardware and power overhead.

2014 International Conference on Computing, Networking and Communications (ICNC) | 2014

Reduction of data prevention cost and improvement of reliability in MLC NAND flash storage system

Danghui Wang; Jie Guo; Kai Bu; Yiran Chen

In recent years, multi-level-cell (MLC) NAND flash technologies have been widely employed in both enterprise and consumer storage systems due to their advantages in power consumption and fabrication cost. However, the short endurance and long write access time of NAND flash pose challenges for system designers. The high bit error prevention cost and unreliable backup power for power failure protection are two acute issues in NAND flash based storage systems (NFSS). This paper presents two contributions: DA-RAID-5 is proposed to reduce the cost of protecting data from bit errors, and low cost power failure protection is adopted to improve system robustness against power failure. The experimental results show that, compared to the best prior work, DA-RAID-5 can reduce the NFSS response time by 9.7% on average. The low cost power failure protection scheme can substantially improve the backup power reliability with very marginal performance overheads.

international conference on embedded software and systems | 2004

An efficient verification method for microprocessors based on the virtual machine

Jianfeng An; Xiaoya Fan; Shengbing Zhang; Danghui Wang

This paper presents an efficient verification method for microprocessors based on a virtual machine. Using the memory and I/O device models provided by the virtual machine, our simulation tool can simulate not only test programs but also operating systems. This simulation environment is close to the real environment of microprocessors, so it is sufficient for functional verification of microprocessors before tape-out. At the same time, our simulation tool can automatically compare the simulation results using the virtual machine as a reference model and find the error positions. This method takes full advantage of the virtual machine and greatly improves the speed and efficiency of the verification procedure. This method has been successfully applied in the verification of an embedded microprocessor, Amex86, designed in our laboratory for six months by five people.

design automation conference | 2016

TEMP: thread batch enabled memory partitioning for GPU

Mengjie Mao; Wujie Wen; Xiaoxiao Liu; Jingtong Hu; Danghui Wang; Yiran Chen; Hai Li

As massive multi-threading in GPUs imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP) to improve GPU performance through better memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks sharing the same set of pages into a thread batch and dispatches the entire thread batch to a streaming multiprocessor. TEMP separates the memory access streams of different thread batches by OS memory management, preserving the intrinsic locality of thread batches and increasing the memory access parallelism. Experimental results show that TEMP can obtain up to 10.3% performance improvement and 14.6% DRAM energy reduction compared to a state-of-the-art scheduler without any memory-side optimizations.
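
The clustering step described in the abstract above (grouping thread blocks that share the same set of pages into a thread batch) can be pictured with a small sketch. The function name `form_thread_batches` and the input format are hypothetical, and the sketch assumes exact page-set equality as the grouping criterion; the actual TEMP mechanism may use a different criterion and also involves dispatch and OS-level memory partitioning, which are not modeled here.

```python
from collections import defaultdict


def form_thread_batches(block_pages):
    """Group thread blocks by the exact set of pages they access.

    block_pages: dict mapping a thread block id to an iterable of page
    numbers (an assumed input format for this illustration).
    Returns a list of batches, each a list of thread block ids, in order
    of first appearance.
    """
    batches = defaultdict(list)
    for block_id, pages in block_pages.items():
        # frozenset makes the page set hashable and order-independent.
        batches[frozenset(pages)].append(block_id)
    return list(batches.values())


if __name__ == "__main__":
    accesses = {0: [1, 2], 1: [2, 1], 2: [3], 3: [3]}
    print(form_thread_batches(accesses))
```

Each resulting batch would then be a candidate for dispatch to a single multiprocessor, so that blocks touching the same pages run together.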
