Yiran Chen
Duke University
Publication

Featured research published by Yiran Chen.
high-performance computer architecture | 2009
Guangyu Sun; Xiangyu Dong; Yuan Xie; Jian Li; Yiran Chen
Magnetic random access memory (MRAM) is a promising memory technology that offers fast read access, high density, and non-volatility. Using 3D heterogeneous integration, it becomes feasible and cost-efficient to stack MRAM atop conventional chip multiprocessors (CMPs). However, MRAM suffers from long write latency and high write energy. In this paper, we first stack MRAM-based L2 caches directly atop CMPs and compare them against SRAM counterparts in terms of performance and energy. We observe that direct MRAM stacking might harm chip performance due to the aforementioned long write latency and high write energy. To solve this problem, we then propose two architectural techniques: a read-preemptive write buffer and an SRAM-MRAM hybrid L2 cache. The simulation results show that our optimized MRAM L2 cache improves performance by 4.91% and reduces power by 73.5% compared to a conventional SRAM L2 cache of similar area.
design automation conference | 2008
Xiangyu Dong; Xiaoxia Wu; Guangyu Sun; Yuan Xie; Hai Helen Li; Yiran Chen
Magnetic random access memory (MRAM) has been considered a promising memory technology due to its many attractive properties. Integrating MRAM with CMOS logic may incur extra manufacturing cost due to its hybrid magnetic-CMOS fabrication process. Stacking MRAM on top of CMOS logic using 3D integration is a way to minimize this cost overhead. In this paper, we discuss the circuit design issues for MRAM and present an MRAM cache model. Based on the model, we compare MRAM against SRAM and DRAM in terms of area, performance, and energy. Finally, we conduct an architectural evaluation of 3D microprocessor stacking with MRAM. The experimental results show that MRAM stacking offers competitive IPC performance with a large reduction in power consumption compared to SRAM and DRAM counterparts.
IEEE Electron Device Letters | 2009
Xiaobin Wang; Yiran Chen; Haiwen Xi; Hai Li; Dimitar V. Dimitrov
The existence of spintronic memristors at the nanoscale is demonstrated based upon spin-torque-induced magnetization switching and magnetic-domain-wall motion. Our examples show that memristive effects are quite universal for spin-torque spintronic devices at the time scale that explicitly involves the interactions between magnetization dynamics and electronic charge transport. We also prove that a spintronic device can be designed to explore and memorize the continuum state of current and voltage based on interactions of electron and spin transport.
international conference on hardware/software codesign and system synthesis | 2011
Chun Jason Xue; Guangyu Sun; Youtao Zhang; Jianhua Yang; Yiran Chen; Hai Li
In recent years, non-volatile memory (NVM) technologies have emerged as candidates for future universal memory. NVMs generally have advantages such as low leakage power, high density, and fast read speed. At the same time, NVMs also have disadvantages. For example, NVMs often have asymmetric read and write speed and energy cost, which poses new challenges when applying NVMs. This paper contains a collection of four contributions, presenting a basic introduction to three emerging NVM technologies, their unique characteristics, potential challenges, and new opportunities that they may bring forward in memory systems.
Neural Networks | 2013
Shiping Wen; Gang Bao; Zhigang Zeng; Yiran Chen; Tingwen Huang
This paper deals with the problem of global exponential synchronization of a class of memristor-based recurrent neural networks with time-varying delays, based on fuzzy theory and the Lyapunov method. First, a memristor-based recurrent neural network is designed. Then, considering the state-dependent properties of the memristor, a new fuzzy model employing parallel distributed compensation (PDC) gives a new way to analyze complicated memristor-based neural networks with only two subsystems. Comparisons between the results in this paper and previous ones have been made. They show that the results in this paper improve and generalize the results derived in the previous literature. An example is also given to illustrate the effectiveness of the results.
Neural Networks | 2015
Shiping Wen; Tingwen Huang; Zhigang Zeng; Yiran Chen; Peng Li
This paper addresses the problem of circuit design and global exponential stabilization of memristive neural networks with time-varying delays and general activation functions. Based on the Lyapunov-Krasovskii functional method and the free weighting matrix technique, delay-dependent criteria for the global exponential stability and stabilization of memristive neural networks are derived in the form of linear matrix inequalities (LMIs). Two numerical examples are elaborated to illustrate the characteristics of the results. It is noteworthy that the traditional assumptions on the boundedness of the derivative of the time-varying delays are removed.
IEEE Transactions on Very Large Scale Integration Systems | 2011
Wei Xu; Hongbin Sun; Xiaobin Wang; Yiran Chen; Tong Zhang
Because of its high storage density with superior scalability, low integration cost, and reasonably high access speed, spin-torque transfer random access memory (STT RAM) appears to have promising potential to replace SRAM as last-level on-chip cache (e.g., L2 or L3 cache) for microprocessors. Due to the unique operational characteristics of its storage device, the magnetic tunneling junction (MTJ), STT RAM is inherently subject to a write latency versus read latency tradeoff that is determined by the memory cell size. This paper first quantitatively studies how different memory cell sizing may impact the overall computing system performance, and shows that different computing workloads may have conflicting expectations on memory cell sizing. Leveraging MTJ device switching characteristics, we further propose an STT RAM architecture design method that can make an STT RAM cache with relatively small memory cell size perform well over a wide spectrum of computing benchmarks. This has been demonstrated using CACTI-based memory modeling and computing system performance simulations using SimpleScalar. Moreover, we show that this design method can also reduce STT RAM cache energy consumption by up to 30% over a variety of benchmarks.
high-performance computer architecture | 2010
Guangyu Sun; Yongsoo Joo; Yibo Chen; Dimin Niu; Yuan Xie; Yiran Chen; Hai Li
In recent years, many systems have employed NAND flash memory as storage devices because of its advantages of higher performance (compared to the traditional hard disk drive), high density, random access, increasing capacity, and falling cost. On the other hand, the performance of NAND flash memory is limited by its “erase-before-write” requirement. Log-based structures have been used to alleviate this problem by writing updated data to the clean space. Prior log-based methods, however, cannot avoid excessive erase operations when there are frequent updates, which quickly consume free pages, especially when some data are updated repeatedly. In this paper, we propose a hybrid architecture for NAND flash memory storage, in which the log region is implemented using phase change random access memory (PRAM). Compared to traditional log-based architectures, it has the following advantages: (1) the PRAM log region allows in-place updating so that it significantly improves the usage efficiency of log pages by eliminating out-of-date log records; (2) it greatly reduces the traffic of reading from the NAND flash memory storage since the size of logs loaded for the read operation is decreased; (3) the energy consumption of the storage system is reduced as the overhead of writing and reading log data is decreased with the PRAM log region; (4) the lifetime of NAND flash memory is increased because the number of erase operations is reduced. To facilitate the PRAM log region, we propose several management policies. The simulation results show that our proposed methods can substantially improve the performance, energy consumption, and lifetime of the NAND flash memory storage.
high-performance computer architecture | 2003
Hai Li; Swarup Bhunia; Yiran Chen; T. N. Vijaykumar; Kaushik Roy
With the scaling of technology and the need for higher performance and more functionality, power dissipation is becoming a major bottleneck for microprocessor designs. Pipeline balancing (PLB), a previous technique, is essentially a methodology to clock-gate unused components whenever a program's instruction-level parallelism is predicted to be low. However, no nonpredictive methodologies are available in the literature for efficient clock gating. This paper introduces deterministic clock gating (DCG) based on the key observation that for many of the stages in a modern pipeline, a circuit block's usage in a specific cycle in the near future is deterministically known a few cycles ahead of time. Our experiments show an average of 19.9% reduction in processor power with virtually no performance loss for an 8-issue, out-of-order superscalar processor by applying DCG to execution units, pipeline latches, D-Cache wordline decoders, and result bus drivers. In contrast, PLB achieves 9.9% average power savings at 2.9% performance loss.
IEEE Transactions on Neural Networks | 2014
Miao Hu; Hai Li; Yiran Chen; Qing Wu; Garrett S. Rose; Richard W. Linderman
By mimicking highly parallel biological systems, neuromorphic hardware provides the capability of information processing within a compact and energy-efficient platform. However, the traditional von Neumann architecture and limited signal connections have severely constrained the scalability and performance of such hardware implementations. Recently, many research efforts have investigated the use of recently discovered memristors in neuromorphic systems due to the similarity of memristors to biological synapses. In this paper, we explore the potential of a memristor crossbar array that functions as an autoassociative memory and apply it to brain-state-in-a-box (BSB) neural networks. In particular, the recall and training functions of a multianswer character recognition process based on the BSB model are studied. The robustness of the BSB circuit is analyzed and evaluated based on extensive Monte Carlo simulations, considering input defects, process variations, and electrical fluctuations. The results show that the hardware-based training scheme proposed in the paper can alleviate and even cancel out the majority of the noise issues.
