
Publication


Featured research published by Zihan Xu.


Design, Automation and Test in Europe | 2015

Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip

Pai-Yu Chen; Deepak Kadetotad; Zihan Xu; Abinash Mohanty; Binbin Lin; Jieping Ye; Sarma B. K. Vrudhula; Jae-sun Seo; Yu Cao; Shimeng Yu

Technology-design co-optimization methodologies for the resistive cross-point array are proposed for implementing machine learning algorithms on a chip. A novel read and write scheme is designed to accelerate the training process, realizing fully parallel operations of the weighted sum and the weight update. Furthermore, technology and design parameters of the resistive cross-point array are co-optimized for learning accuracy, latency, and energy consumption. In contrast to conventional memory design, a set of reverse scaling rules is proposed for the resistive cross-point array to achieve high learning accuracy. These include 1) a larger wire width to reduce the IR drop on interconnects, thereby increasing the learning accuracy, and 2) the use of multiple cells for each weight element to alleviate the impact of device variations, at an affordable expense of area, energy, and latency. The optimized resistive cross-point array with peripheral circuitry is implemented at the 65 nm node. Its performance is benchmarked for handwritten digit recognition on the MNIST database using gradient-based sparse coding. Compared to a state-of-the-art software approach running on a CPU, it achieves a >10^3 speed-up and a >10^6 improvement in energy efficiency, enabling real-time image feature extraction and learning.
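
For intuition, the weighted-sum ("read") operation on a crosspoint array can be sketched behaviorally: input voltages drive the rows, each cell contributes a current proportional to its conductance, and each column sums the products in parallel. The sketch below is a simplified model of that operation and of the multiple-cells-per-weight rule from the abstract, not the paper's circuit; the array size, conductance range, variation statistics, and use of NumPy are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def crossbar_weighted_sum(G, v_in):
    # Ideal crosspoint read: column current j = sum_i v_in[i] * G[i, j].
    return v_in @ G

# Hypothetical 64x16 array with log-normal device-to-device variation.
n_rows, n_cols, cells_per_weight = 64, 16, 4
target_G = rng.uniform(1e-6, 1e-4, size=(n_rows, n_cols))        # conductances in siemens
variation = rng.lognormal(mean=0.0, sigma=0.3,
                          size=(cells_per_weight, n_rows, n_cols))
v = rng.uniform(0.0, 0.5, size=n_rows)                           # read voltages

# Averaging the currents of several cells per weight suppresses device
# variation, at extra area/energy cost (the "reverse scaling" trade-off).
ideal = crossbar_weighted_sum(target_G, v)
one_cell = crossbar_weighted_sum(target_G * variation[0], v)
multi_cell = np.mean([crossbar_weighted_sum(target_G * var, v) for var in variation], axis=0)

print("relative error, 1 cell/weight :", np.linalg.norm(one_cell - ideal) / np.linalg.norm(ideal))
print("relative error, 4 cells/weight:", np.linalg.norm(multi_cell - ideal) / np.linalg.norm(ideal))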


IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2015

Parallel Architecture With Resistive Crosspoint Array for Dictionary Learning Acceleration

Deepak Kadetotad; Zihan Xu; Abinash Mohanty; Pai-Yu Chen; Binbin Lin; Jieping Ye; Sarma B. K. Vrudhula; Shimeng Yu; Yu Cao; Jae-sun Seo

This paper proposes a parallel architecture with a resistive crosspoint array. The design of its two essential operations, read and write, is inspired by the biophysical behavior of a neural system, such as integrate-and-fire and local synaptic weight update. The proposed hardware consists of an array of resistive random access memory (RRAM) and CMOS peripheral circuits, which perform matrix-vector multiplication and dictionary update in a fully parallel fashion, at a speed that is independent of the matrix dimension. The read and write circuits are implemented in 65 nm CMOS technology and verified together with an RRAM array model built from experimental data. The overall system exploits array-level parallelism and is demonstrated on accelerated dictionary learning tasks. Compared to a software implementation running on an 8-core CPU, the proposed hardware achieves more than 3000× speedup, enabling high-speed feature extraction on a single chip.
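
To make the division of labor concrete, the sketch below runs a plain ISTA sparse-coding loop: the D.T @ r products are the matrix-vector "reads" that the crosspoint array performs in parallel, and the outer-product dictionary update stands in for the parallel, local "write". It is a generic dictionary-learning step under assumed sizes and learning rate, not the exact algorithm or circuit from the paper.

import numpy as np

rng = np.random.default_rng(1)
n_features, n_atoms = 64, 32
D = rng.standard_normal((n_features, n_atoms))   # dictionary, i.e. the crossbar conductance matrix
D /= np.linalg.norm(D, axis=0)

def sparse_code(D, x, lam=0.1, steps=50):
    # Plain ISTA; each D @ z and D.T @ r is a crossbar matrix-vector product.
    z = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(steps):
        r = x - D @ z                                              # residual (read)
        z = z + step * (D.T @ r)                                   # gradient step (read)
        z = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)   # soft threshold
    return z

x = rng.standard_normal(n_features)
z = sparse_code(D, x)

# Dictionary "write": a local outer-product update of the conductances,
# analogous to the parallel weight update described in the abstract.
eta = 0.01
D += eta * np.outer(x - D @ z, z)
D /= np.linalg.norm(D, axis=0)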


IEEE Transactions on Nanotechnology | 2015

On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices

Jae-sun Seo; Binbin Lin; Minkyu Kim; Pai-Yu Chen; Deepak Kadetotad; Zihan Xu; Abinash Mohanty; Sarma B. K. Vrudhula; Shimeng Yu; Jieping Ye; Yu Cao

Many recent advances in sparse coding have led to its wide adoption in signal processing, pattern classification, and object recognition applications. Even with improved performance in state-of-the-art algorithms and the hardware platform of CPUs/GPUs, solving a sparse coding problem still requires expensive computations, making real-time large-scale learning a very challenging problem. In this paper, we co-optimize the algorithm, architecture, circuit, and device for real-time, energy-efficient on-chip hardware acceleration of sparse coding. The principle of hardware acceleration is to exploit the properties of the learning algorithms, which involve many parallel operations of data fetch and matrix/vector multiplication/addition. Today's von Neumann architecture, however, is not suitable for such parallelization, because the separation of memory and the computing unit makes sequential operations inevitable. This principle drives both the selection of algorithms and the design evolution from CPU to a CMOS application-specific integrated circuit (ASIC) to the parallel architecture with resistive crosspoint array (PARCA) that we propose. The CMOS ASIC scheme implements sparse coding with SRAM dictionaries and all-digital circuits, while PARCA employs resistive-RAM dictionaries with special read and write circuits. We show that 65 nm implementations of the CMOS ASIC and PARCA schemes accelerate sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU. Simulated power for both hardware schemes lies in the milliwatt range, making them viable for portable single-chip learning applications.
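
For reference, the sparse coding problem being accelerated can be written in its usual form (the paper's exact formulation and sparsity penalty may differ); the D z and D^T r products inside any solver of this objective are the matrix/vector operations that both the CMOS ASIC and PARCA parallelize:

\min_{\mathbf{D},\,\mathbf{z}} \; \tfrac{1}{2}\,\lVert \mathbf{x} - \mathbf{D}\mathbf{z} \rVert_2^2 \;+\; \lambda\,\lVert \mathbf{z} \rVert_1, \qquad \lVert \mathbf{d}_j \rVert_2 \le 1 \;\; \text{for each dictionary atom } \mathbf{d}_j.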


Biomedical Circuits and Systems Conference | 2014

Neurophysics-inspired parallel architecture with resistive crosspoint array for dictionary learning

Deepak Kadetotad; Zihan Xu; Abinash Mohanty; Pai-Yu Chen; Binbin Lin; Jieping Ye; Sarma B. K. Vrudhula; Shimeng Yu; Yu Cao; Jae-sun Seo

This paper proposes a parallel architecture with a resistive crosspoint array. The design of its two essential operations, read and write, is inspired by the biophysical behavior of a neural system, such as integrate-and-fire and time-dependent synaptic plasticity. The proposed hardware consists of an array of resistive random access memory (RRAM) and CMOS peripheral circuits, which perform matrix products and dictionary updates in a fully parallel fashion, at a speed that is independent of the matrix dimension. The entire system is implemented in 65 nm CMOS technology with RRAM to realize high-speed unsupervised dictionary learning. Compared to a state-of-the-art software approach, it achieves more than 3000× speedup, enabling real-time feature extraction on a single chip.
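
The "time-dependent synaptic plasticity" mentioned above is commonly written as a pair-based STDP window; a textbook form (not necessarily the rule used in this paper) is

\Delta w(\Delta t) =
\begin{cases}
A_{+}\, e^{-\Delta t/\tau_{+}}, & \Delta t > 0,\\
-\,A_{-}\, e^{\Delta t/\tau_{-}}, & \Delta t < 0,
\end{cases}
\qquad \Delta t = t_{\mathrm{post}} - t_{\mathrm{pre}},

where the amplitudes A± and time constants τ± are device- and design-dependent.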


International Conference on Computer Design | 2012

Hierarchical modeling of Phase Change Memory for reliable design

Zihan Xu; Ketul B. Sutaria; Chengen Yang; Chaitali Chakrabarti; Yu Cao

As CMOS-based memory devices approach their scaling limits, emerging memory technologies such as Phase Change Random Access Memory (PRAM) have become viable alternatives. This work develops a hierarchical modeling framework that connects the unique device physics of PRAM with its circuit and state-transition properties. Such an approach enables design exploration at various levels in order to optimize performance and yield. By providing a complete set of compact models, it supports SPICE simulation of PRAM in the presence of process variations and temporal degradation. Furthermore, this work proposes a new metric, the State Transition Curve (STC), which supports the assessment of other performance metrics (e.g., power, speed, and yield), helping to gain valuable insight into PRAM reliability.
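
The abstract does not give the compact-model equations, but the role such a model plays under process variations can be illustrated with a generic Monte Carlo yield estimate. Everything below, the resistance relation, the parameter distributions, and the sensing threshold, is hypothetical and only shows how a compact model plugs into variation-aware design exploration.

import numpy as np

rng = np.random.default_rng(2)

def reset_resistance(thickness_nm, amorphous_fraction):
    # Hypothetical compact relation: cell resistance grows with the amorphous
    # fraction of the phase-change layer; the numbers are illustrative only.
    r_crystalline, r_amorphous = 5e3, 1e6   # ohms
    return (r_crystalline * (1.0 - amorphous_fraction)
            + r_amorphous * amorphous_fraction * thickness_nm / 50.0)

# Sample process variation and count cells whose RESET resistance clears an
# (illustrative) sensing threshold -- a stand-in for read-margin yield.
n_cells = 100_000
thickness = rng.normal(50.0, 2.5, n_cells)                # nm
frac = np.clip(rng.normal(0.9, 0.05, n_cells), 0.0, 1.0)
r_reset = reset_resistance(thickness, frac)
print("estimated RESET-state yield:", np.mean(r_reset > 8.5e5))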


European Solid-State Device Research Conference | 2013

Compact modeling of STT-MTJ for SPICE simulation

Zihan Xu; Ketul B. Sutaria; Chengen Yang; Chaitali Chakrabarti; Yu Cao

The STT-MTJ is a promising device for future high-density and low-power integrated systems. To enable design exploration of STT-MTJ, this paper presents a fully compact model for efficient SPICE simulation. Derived from the fundamental Landau-Lifshitz-Gilbert (LLG) equation, the new model consists of RC elements that are closed-form functions of device geometry and material properties. They support transient SPICE simulation, providing necessary details beyond the macromodel. The model's accuracy is validated against numerical results and published data.
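
For reference, the Landau-Lifshitz-Gilbert equation with a Slonczewski spin-transfer-torque term, from which such compact models are typically derived, reads (textbook form; sign conventions and prefactors vary between references, and the paper's parameterization may differ):

\frac{d\mathbf{m}}{dt} = -\gamma\,\mathbf{m}\times\mathbf{H}_{\mathrm{eff}}
+ \alpha\,\mathbf{m}\times\frac{d\mathbf{m}}{dt}
- \frac{\gamma\,\hbar\,J\,P}{2\,e\,M_s\,t_F}\,\mathbf{m}\times(\mathbf{m}\times\mathbf{m}_p),

where m is the free-layer magnetization unit vector, H_eff the effective field, α the damping constant, J the current density, P the spin polarization, M_s the saturation magnetization, t_F the free-layer thickness, and m_p the pinned-layer magnetization direction.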


Fundamenta Informaticae | 2014

The Stochastic Loss of Spikes in Spiking Neural P Systems: Design and Implementation of Reliable Arithmetic Circuits

Zihan Xu; Matteo Cavaliere; Pei An; Sarma B. K. Vrudhula; Yu Cao

Spiking neural P systems (in short, SN P systems) have been introduced as computing devices inspired by the structure and functioning of neural cells. Unreliable components in SN P systems can be considered from many different angles. In this paper we focus on two types of unreliability: stochastic delays of the spiking rules and stochastic loss of spikes. We propose an implementation of elementary SN P systems with DRAM-based CMOS circuits that cope with these two forms of unreliability in an efficient way. The constructed bio-inspired circuits can be used to encode basic arithmetic modules.
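
As a toy illustration of the second failure mode, the sketch below drops each emitted spike independently with a fixed probability before it reaches the target neuron, and shows how sending redundant spikes restores the firing probability. The threshold rule and all numbers are simplifications for illustration, not the constructions from the paper.

import random

random.seed(3)

def deliver(n_spikes_sent, p_loss):
    # Stochastic loss of spikes: each spike is lost independently with probability p_loss.
    return sum(1 for _ in range(n_spikes_sent) if random.random() >= p_loss)

def neuron_fires(received, threshold=3):
    # Simplified firing condition: the spiking rule applies once enough spikes arrive.
    return received >= threshold

trials = 100_000
for sent in (3, 5, 8):
    prob = sum(neuron_fires(deliver(sent, p_loss=0.2)) for _ in range(trials)) / trials
    print(f"{sent} spikes sent -> fires with probability ~{prob:.3f}")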


IEEE International Nanoelectronics Conference | 2016

Hardware-efficient learning with feedforward inhibition

Zihan Xu; Pai-Yu Chen; Jae-sun Seo; Shimeng Yu; Yu Cao

On-chip learning and classification have a broad impact on many applications. Yet their hardware implementation is still limited by the scale of computation, as well as by practical issues of device fabrication, variability, and reliability. Inspired by micro neural circuits in the cortical system, this work develops a novel solution that efficiently reduces the network size and improves the learning accuracy. The building block is the motif of feedforward inhibition, which effectively separates the main features from the residual in sparse feature extraction. The learning rules follow spike-rate-dependent plasticity (SRDP). As demonstrated on handwriting recognition, such a bio-plausible solution achieves >95% accuracy, comparable to sparse coding algorithms; in addition, SRDP, instead of gradient-based backpropagation, reduces computation time by >50×. Using the inhibition motif reduces the network size by >3× at the same accuracy, illustrating its potential for hardware efficiency.
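
A minimal way to picture the motif: an inhibitory signal pooled from the excitatory drive is fed forward and subtracted, so only the strongest feature responses survive while the rest are pushed into the residual. The sketch below is one such generic model; the layer sizes, pooling rule, and inhibition strength are assumptions, not the SRDP network from the paper.

import numpy as np

rng = np.random.default_rng(4)
n_inputs, n_features = 64, 16
W = rng.standard_normal((n_features, n_inputs)) / np.sqrt(n_inputs)  # excitatory feature weights

def feedforward_inhibition(x, k=0.8):
    # Excitatory drive minus a shared inhibitory signal pooled from that same drive.
    drive = W @ x
    inhibition = k * np.mean(np.maximum(drive, 0.0))
    return np.maximum(drive - inhibition, 0.0)   # surviving "main" features

x = rng.standard_normal(n_inputs)
responses = feedforward_inhibition(x)
print("active features:", int(np.count_nonzero(responses)), "of", n_features)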


Signal Processing Systems | 2014

Low cost ECC schemes for improving the reliability of DRAM+PRAM main memory systems

Manqing Mao; Chengen Yang; Zihan Xu; Yu Cao; Chaitali Chakrabarti

Hybrid memory, where DRAM acts as a buffer to the PRAM, is a promising configuration for main memory systems. It has the advantages of fast access time, high storage density, and very low standby power. However, it also has reliability issues that need to be addressed. This paper focuses on low-cost Error Control Coding (ECC)-based schemes for improving the reliability of hybrid memory. We propose three candidate systems that all guarantee a block failure rate of 10^-8 but differ in whether the DRAM and/or PRAM data are coded and in the strength of the corresponding ECC code. The candidate systems are evaluated with respect to lifetime, Instructions Per Cycle (IPC), and energy. We show that (1) at lower Data Storage Time (DST), the proposed system that uses different ECC schemes for DRAM and PRAM has the longest lifetime and one of the highest IPCs; and (2) at higher DST, stronger ECC codes are necessary for all the systems, and a longer lifetime can be achieved at the cost of a decrease in IPC.
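
The 10^-8 block-failure target can be checked for a candidate ECC strength with the usual binomial calculation: a block of n bits protected by a t-error-correcting code fails when more than t bits are in error. The block size and raw bit-error rate below are placeholders, not values from the paper.

from math import comb

def block_failure_rate(n_bits, t_correctable, p_bit):
    # P(block fails) = P(more than t bit errors among n independent bits).
    p_ok = sum(comb(n_bits, k) * p_bit ** k * (1.0 - p_bit) ** (n_bits - k)
               for k in range(t_correctable + 1))
    return 1.0 - p_ok

# Example sweep: 512-bit block, raw bit-error rate 1e-4.
for t in range(5):
    print(f"t = {t}: block failure rate ~ {block_failure_rate(512, t, 1e-4):.2e}")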


Solid-State Electronics | 2014

Compact modeling of STT-MTJ devices

Zihan Xu; Chengen Yang; Manqing Mao; Ketul B. Sutaria; Chaitali Chakrabarti; Yu Cao

Collaboration


Dive into Zihan Xu's collaborations.

Top Co-Authors

Yu Cao (Arizona State University)
Jae-sun Seo (Arizona State University)
Shimeng Yu (Arizona State University)
Binbin Lin (Arizona State University)
Chengen Yang (Arizona State University)
Jieping Ye (Arizona State University)