Wei Yang
Huawei
Publication
Featured research published by Wei Yang.
IEEE Transactions on Nanotechnology | 2015
Yuhao Wang; Hao Yu; Leibin Ni; Guang-Bin Huang; Mei Yan; Chuliang Weng; Wei Yang; Junfeng Zhao
Data-oriented applications place increasing demands on memory capacity and bandwidth, which raises the need to rethink the architecture of current computing platforms. The logic-in-memory architecture is a highly promising logic-memory integration paradigm for high-throughput data-driven applications. On the memory technology side, the recently introduced nonvolatile domain-wall nanowire (or racetrack) device shows potential not only as a future power-efficient memory but also as a computing element, owing to its unique spintronic physics. This paper explores a novel distributed in-memory computing architecture in which most logic functions are executed within the memory, which significantly alleviates bandwidth congestion and improves energy efficiency. The proposed architecture is built purely from domain-wall nanowires, i.e., both memory and logic are implemented by domain-wall nanowire devices. As a case study, a neural network-based image resolution enhancement algorithm, called DW-NN, is examined within the proposed architecture. We show that all operations involved in machine learning on a neural network can be mapped to a logic-in-memory architecture using nonvolatile domain-wall nanowires. Domain-wall nanowire-based logic is customized for machine learning within image data storage. As such, both neural network training and processing can be performed locally within the memory. The experimental results show that the domain-wall memory can reduce leakage power by 92% and dynamic power by 16% compared to main memory implemented in DRAM, and that domain-wall logic can reduce dynamic power by 31% and leakage power by 65% at similar performance compared to CMOS transistor-based logic. System throughput in DW-NN is improved by 11.6x and energy efficiency by 56x compared to a conventional image processing system.
Asia and South Pacific Design Automation Conference | 2016
Leibin Ni; Yuhao Wang; Hao Yu; Wei Yang; Chuliang Weng; Junfeng Zhao
Emerging resistive random-access memory (RRAM) can provide not only non-volatile storage but also intrinsic logic for matrix-vector multiplication, which is ideal for a low-power, high-throughput data analytics accelerator performed in memory. However, existing RRAM-based computing devices mainly assume multi-level analog computing, whose results are sensitive to process non-uniformity and which incurs additional AD-conversion and I/O overhead. This paper explores a data analytics accelerator on a binary RRAM crossbar. Accordingly, a distributed in-memory computing architecture is proposed, together with the design of the corresponding components and control protocol. Both the memory array and the logic accelerator can be implemented by RRAM crossbars purely in binary, where logic-memory pairs can be distributed using a control-bus protocol. Based on numerical results for fingerprint matching mapped onto the proposed RRAM crossbar, the proposed architecture shows 2.86x faster speed, 154x better energy efficiency, and 100x smaller area compared to the same design implemented as a CMOS-based ASIC.
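To illustrate the idea of matrix-vector multiplication on a binary crossbar, the following is a minimal idealized sketch, not the circuit or protocol from the paper. It assumes each cell is either on (1) or off (0), each row input is a binary voltage, each column accumulates one unit of current per coinciding on-cell and active row, and a hypothetical sense threshold turns the column sum back into a bit. The function names and the threshold value are illustrative assumptions.

```python
def crossbar_column_sums(cells, inputs):
    """Idealized binary crossbar: cells[i][j] is 1 if the cell at
    row i, column j is in the low-resistance (on) state, else 0.
    inputs[i] is 1 if row i is driven, else 0.
    Returns, per column, the count of on-cells in driven rows."""
    n_rows = len(cells)
    n_cols = len(cells[0])
    return [
        sum(inputs[i] & cells[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


def sense(column_sums, level):
    """Hypothetical sense stage: output 1 where the column sum
    reaches the given level, else 0 (an assumption for illustration)."""
    return [1 if s >= level else 0 for s in column_sums]


# A 3x3 binary matrix stored in the crossbar and a binary input vector.
cells = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
inputs = [1, 1, 0]

sums = crossbar_column_sums(cells, inputs)
bits = sense(sums, level=2)
print(sums, bits)
```

Because every quantity in this model is binary or a small integer count, no multi-level analog readout is involved, which is the property the abstract contrasts with analog RRAM computing.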
International Symposium on Low Power Electronics and Design | 2015
Yuan Liang; Hao Yu; Junfeng Zhao; Wei Yang; Yuangang Wang
Free-space EM-wave based GHz interconnects suffer from significant loss and crosstalk, and therefore cannot be deployed as low-power, dense I/Os for future network-on-chip (NoC) integration of many-core processors and memory. This paper proposes an energy-efficient, low-crosstalk sub-THz (0.1-1 THz) I/O using a surface-wave based modulator and interconnects in CMOS. By introducing a sub-wavelength periodic corrugation structure onto the transmission line, a surface wave is established that propagates the signal strongly localized on the surface of the top-layer metal wire, which results in low coupling into the lossy substrate and neighboring metal wires. As such, significant power saving and crosstalk reduction can be achieved with high communication bandwidth. In addition, a high on/off-ratio surface-wave modulator is proposed to support on-chip THz communication. As designed in 65nm CMOS, the results show that the proposed surface-wave I/O interface achieves a 25Gbps data rate and 0.016pJ/bit/mm energy efficiency at a 140GHz carrier frequency over 20mm surface-wave channels. The channels can be placed with 2.4μm spacing and a -20dB crosstalk ratio. The surface-wave modulator also achieves a significant reduction of radiation loss with a 23dB extinction ratio.
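As a rough reading of the reported figures, the short calculation below converts the per-millimetre energy efficiency into energy per bit over the full channel and into channel power at the stated data rate. It assumes the energy cost scales linearly with channel length, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the abstract.
energy_pj_per_bit_per_mm = 0.016   # pJ/bit/mm
channel_length_mm = 20             # mm
data_rate_gbps = 25                # Gbit/s

# Assumption: energy per bit grows linearly with channel length.
energy_pj_per_bit = energy_pj_per_bit_per_mm * channel_length_mm

# pJ/bit * Gbit/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W, i.e. milliwatts.
channel_power_mw = energy_pj_per_bit * data_rate_gbps

print(f"{energy_pj_per_bit:.2f} pJ/bit, {channel_power_mw:.1f} mW")
```

Under that linear-scaling assumption, the numbers correspond to roughly 0.32 pJ per bit across the 20mm channel, or about 8 mW per channel at 25Gbps.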
International Symposium on Low Power Electronics and Design | 2015
Yuhao Wang; Xin Li; Hao Yu; Leibin Ni; Wei Yang; Chuliang Weng; Junfeng Zhao
The emerging resistive random-access memory (RRAM) crossbar provides an intrinsic fabric for matrix-vector multiplication, which can be leveraged as power-efficient linear embedding hardware for data analytics such as compressive sensing. Because the matrix elements are represented by the resistance of RRAM cells, the limited RRAM programming resolution imposes constraints on the embedding matrix. A random Boolean embedding can be efficiently mapped to the RRAM crossbar but suffers from poor performance. Learning-based embedding matrices can deliver optimized performance but are continuous-valued, which prevents them from being mapped to the RRAM crossbar structure directly. In this paper, we propose an algorithm that finds an optimal Boolean embedding matrix for a given learned real-valued embedding matrix, so that it can be effectively mapped to the RRAM crossbar structure while high performance is preserved. Numerical experiments demonstrate that the proposed optimized Boolean embedding reduces the embedding distortion by 2.7x and the image recovery error by 2.5x compared to the random Boolean embedding, both mapped on the RRAM crossbar. In addition, the optimized Boolean embedding on the RRAM crossbar exhibits 10x faster speed, 17x better energy efficiency, and three orders of magnitude smaller area, with a slight accuracy penalty, compared to the optimized real-valued embedding on a CMOS ASIC platform.
IEEE MTT-S International Microwave Workshop Series on Advanced Materials and Processes for RF and THz Applications | 2015
Yuan Liang; Hao Yu; Chang Yang; Nan Li; Xiuping Li; Xiong Liu; Junfeng Zhao; Wei Yang; Yuangang Wang
Two novel metamaterial devices including Split Ring Resonator (SRR) modulator and Surface Plasmon Polariton (SPP) interconnect (including SPP T-line and coupler) are proposed with CMOS on-chip integration operated at 140GHz. By introducing sub-wavelength periodical corrugation structure onto T-line, SPP is established to propagate signals with strongly localized surface wave, which results in low crosstalk between two back-to-back placed SPP T-lines. Moreover, by stacking two SRR unit-cells with opposite placement, the SRR based modulator manifests itself as a magnetic metamaterial achieving significant reduction of radiation loss with 23dB extinction ratio at sub-THz. As explored in 65nm CMOS, the proposed surface-wave interconnects and SRR modulator have shown great potential for future sub-THz wireline communication in CMOS.
Design, Automation, and Test in Europe | 2015
Yuhao Wang; Hantao Huang; Leibin Ni; Hao Yu; Mei Yan; Chuliang Weng; Wei Yang; Junfeng Zhao
Data analytics such as face recognition involves a large volume of image data, and hence poses a grand challenge for mobile platform design under strict power requirements. Emerging non-volatile STT-MRAM has minimal leakage power and speed comparable to SRAM, and hence is considered a promising candidate for data-oriented mobile computing. However, STT-MRAM has significantly higher write energy than SRAM. Based on the use of STT-MRAM, this paper introduces an energy-efficient non-volatile in-memory accelerator for a sparse-representation based face recognition algorithm. We find that by projecting high-dimensional image data to a much lower dimension, current scaling for the STT-MRAM write operation can be applied aggressively, which leads to significant power reduction while maintaining quality of service for face recognition. Specifically, compared to a baseline with SRAM, leakage power and dynamic power are reduced by 91.4% and 79% respectively, with only a slight compromise in recognition rate.
International Microwave Symposium | 2015
Yuan Liang; Nan Li; Fei Wei; Hao Yu; Xiuping Li; Junfeng Zhao; Wei Yang; Yuangang Wang
A digitally-assisted CMOS 60GHz PA is reported with high output power and improved power efficiency during power back-off. To combine a large number of CMOS power transistors within a compact area, a 2D distributed in-phase power combiner is utilized. Moreover, digitally-assisted self-tuning biasing is introduced to improve power back-off efficiency, whereby DC power is reduced along with output power. A digitally-assisted 4-way power-combined PA prototype was implemented in a 65nm CMOS process with a measured output power of 17.2dBm, a PAE of 11.3%, and up to 170~190% efficiency improvement during power back-off for the entire 7GHz band at 60GHz.
Archive | 2016
Yinyin Lin; Yarong Fu; Kai Yang; Wei Yang; Yuangang Wang; Junfeng Zhao
Archive | 2017
Kai Yang; Junfeng Zhao; Yuangang Wang; Wei Yang; Yinyin Lin; Yarong Fu
Archive | 2017
Yarong Fu; Junfeng Zhao; Yuangang Wang; Wei Yang; Yinyin Lin; Kai Yang