Publication


Featured research published by Hai Helen Li.


design automation conference | 2008

Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement

Xiangyu Dong; Xiaoxia Wu; Guangyu Sun; Yuan Xie; Hai Helen Li; Yiran Chen

Magnetic random access memory (MRAM) is considered a promising memory technology because of its many attractive properties. Integrating MRAM with CMOS logic may incur extra manufacturing cost because of its hybrid magnetic-CMOS fabrication process. Stacking MRAM on top of CMOS logic using 3D integration is a way to minimize this cost overhead. In this paper, we discuss the circuit design issues for MRAM and present the MRAM cache model. Based on the model, we compare MRAM against SRAM and DRAM in terms of area, performance, and energy. Finally, we conduct an architectural evaluation of 3D microprocessor stacking with MRAM. The experimental results show that MRAM stacking offers competitive IPC performance with a large reduction in power consumption compared to its SRAM and DRAM counterparts.


international symposium on microarchitecture | 2011

Multi retention level STT-RAM cache designs with a dynamic refresh scheme

Zhenyu Sun; Xiuyuan Bi; Hai Helen Li; Weng-Fai Wong; Zhong-Liang Ong; Xiaochun Zhu; Wenqing Wu

Spin-transfer torque random access memory (STT-RAM) has received increasing attention because of its attractive features: good scalability, zero standby power, non-volatility, and radiation hardness. The use of STT-RAM technology in last-level on-chip caches has been proposed because it minimizes cache leakage power as technology scales down. Furthermore, the cell area of STT-RAM is only 1/9 to 1/3 that of SRAM. This allows for a much larger cache within the same die footprint, improving overall system performance by reducing cache misses. However, deploying STT-RAM technology in L1 caches is challenging because of its long and power-consuming write operations. In this paper, we propose both L1 and lower-level cache designs that use STT-RAM. In particular, our designs use STT-RAM cells with various data retention times and write performance, made possible by different magnetic tunneling junction (MTJ) designs. For the fast STT-RAM bits with reduced data retention time, a counter-controlled dynamic refresh scheme is proposed to maintain data validity. Our dynamic scheme saves more than 80% of the refresh energy compared to the simple refresh scheme proposed in previous work. An L1 cache built with ultra-low-retention STT-RAM and coupled with our proposed dynamic refresh scheme can achieve a 9.2% performance improvement and save up to 30% of the total energy compared to one that uses traditional SRAM. For lower-level caches with relatively large capacity, we propose a data migration scheme that moves data between portions of the cache with different retention characteristics so as to maximize the performance and power benefits. Our experiments show that, on average, our proposed multi-retention-level STT-RAM cache reduces total energy by 30% to 70% compared to previous work, while improving IPC performance for both 2-level and 3-level cache hierarchies.
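The counter-controlled refresh idea can be illustrated with a toy model. The sketch below is not the authors' design: the class `RefreshedCache`, the tick-based clock, and the per-line `age` counter are all assumptions made for illustration. Each valid line counts ticks since its last write and is refreshed just before a hypothetical retention limit, so lines that hold no data are never refreshed.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    age: int = 0  # ticks since the last write or refresh


class RefreshedCache:
    """Toy cache whose lines lose data after `retention_ticks` ticks."""

    def __init__(self, num_lines: int, retention_ticks: int):
        self.lines = [Line() for _ in range(num_lines)]
        # Refresh one tick before the (hypothetical) retention limit.
        self.limit = retention_ticks - 1
        self.refreshes = 0

    def write(self, index: int) -> None:
        # A write restores the cell, so the counter restarts.
        line = self.lines[index]
        line.valid = True
        line.age = 0

    def tick(self) -> None:
        # Advance time; only valid lines are tracked and refreshed.
        for line in self.lines:
            if not line.valid:
                continue
            line.age += 1
            if line.age >= self.limit:
                line.age = 0
                self.refreshes += 1


cache = RefreshedCache(num_lines=4, retention_ticks=4)
cache.write(0)
for _ in range(3):
    cache.tick()
print(cache.refreshes)  # 1
```

In this run only the single valid line is refreshed. A scheme that refreshed all four lines every period would issue four refreshes over the same interval.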


design automation conference | 2013

Cross-layer racetrack memory design for ultra high density and low power consumption

Zhenyu Sun; Wenqing Wu; Hai Helen Li

Racetrack memory technology utilizes magnetic domains along a nanoscopic wire to obtain ultra-high data storage density. The recent success of the planar racetrack nanowire points to fabrication feasibility and future scalability, bringing more design challenges and opportunities. In this paper, we initiate the optimization of racetrack memory, embracing design considerations across multiple layers, including cell design, array structure, architecture organization, and data management. Our evaluation shows that a racetrack-memory-based cache can achieve a 6.4× area reduction, a 25% performance enhancement, and a 62% energy saving compared to an STT-RAM cache design. The benefit over SRAM technology is even more significant.


design automation conference | 2014

Exploration of GPGPU Register File Architecture Using Domain-wall-shift-write based Racetrack Memory

Mengjie Mao; Wujie Wen; Yaojun Zhang; Yiran Chen; Hai Helen Li

The SRAM-based register file (RF) is one of the major factors limiting the scaling of GPGPUs. In this work, we propose to use the emerging nonvolatile domain-wall-shift-write based racetrack memory (DWSW-RM) to implement a power-efficient GPGPU RF whose power consumption is substantially reduced. A holistic technology set is developed to minimize the high access cost of DWSW-RM caused by its sequential access mechanism. Experimental results show that our proposed techniques can improve GPGPU performance by 4.6% compared to the baseline with an SRAM-based RF. The RF energy efficiency is also significantly improved, by 2.45×.
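The access cost of a sequential-access racetrack stripe can be sketched with a simple shift counter. This is a minimal model under stated assumptions, not the DWSW-RM design from the paper: the class `RacetrackStripe`, the single access port, and the one-step-per-domain shift cost are all hypothetical.

```python
class RacetrackStripe:
    """Toy racetrack stripe with a single access port."""

    def __init__(self, num_domains: int):
        self.bits = [0] * num_domains
        self.port = 0      # index of the domain currently under the port
        self.shifts = 0    # total shift steps performed so far

    def _align(self, index: int) -> None:
        # Shift the stripe until the requested domain sits under the port.
        if not 0 <= index < len(self.bits):
            raise IndexError("domain index out of range")
        self.shifts += abs(index - self.port)
        self.port = index

    def read(self, index: int) -> int:
        self._align(index)
        return self.bits[index]

    def write(self, index: int, bit: int) -> None:
        self._align(index)
        self.bits[index] = bit


stripe = RacetrackStripe(num_domains=8)
stripe.write(5, 1)    # 5 shift steps from domain 0
stripe.read(2)        # 3 more shift steps
stripe.read(2)        # already aligned, no shift
print(stripe.shifts)  # 8
```

Under this model, access latency depends on the distance between consecutive accesses, which is why data placement and scheduling matter for such memories.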


design automation conference | 2012

Statistical memristor modeling and case study in neuromorphic computing

Robinson E. Pino; Hai Helen Li; Yiran Chen; Miao Hu; Beiye Liu

The memristor, the fourth passive circuit element, has attracted increased attention since it was rediscovered by HP Labs in 2008. Its distinctive ability to record the historic profile of the voltage/current creates great potential for future neuromorphic computing system design. However, at the nanoscale, controlling process variation in the manufacturing of memristor devices is very difficult. The impact of process variations on a memristive system that relies on the continuous (analog) states of the memristors could be significant. We use a TiO2-based memristor as an example to analyze the impact of geometry variations on the electrical properties. A simple algorithm is proposed to generate a large volume of geometry-variation-aware three-dimensional device structures for Monte-Carlo simulations. A neuromorphic computing system based on a memristor-based bidirectional synapse design is proposed as a case study. We analyze and evaluate the robustness of the proposed system in pattern recognition based on massive Monte-Carlo simulations, after considering input defects and process variations.
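The general shape of a Monte-Carlo study of geometry variation can be sketched as follows. This is a deliberately simplified stand-in, not the paper's algorithm: it assumes resistance is proportional to film thickness divided by cross-sectional area, and the function name `sample_resistances`, the nominal value, and the 5% Gaussian spread are hypothetical.

```python
import random
import statistics


def sample_resistances(n, r_nominal=100.0, sigma=0.05, seed=0):
    """Draw n resistances under Gaussian thickness and area variation.

    Assumes R is proportional to thickness / area, a simplified
    stand-in for a real device model.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        thickness = rng.gauss(1.0, sigma)  # relative to nominal
        area = rng.gauss(1.0, sigma)       # relative to nominal
        samples.append(r_nominal * thickness / area)
    return samples


resistances = sample_resistances(10_000)
mean = statistics.mean(resistances)
spread = statistics.stdev(resistances)
print(f"mean = {mean:.2f} ohm, std = {spread:.2f} ohm")
```

The resulting distribution could then be fed into a system-level simulation to check how much device spread the application tolerates.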


asia and south pacific design automation conference | 2014

A coherent hybrid SRAM and STT-RAM L1 cache architecture for shared memory multicores

Jianxing Wang; Yenni Tim; Weng-Fai Wong; Zhong-Liang Ong; Zhenyu Sun; Hai Helen Li

STT-RAM is an emerging NVRAM technology that promises high density, low energy, and an access speed comparable to conventional SRAM. This paper proposes a hybrid L1 cache architecture that incorporates both SRAM and STT-RAM. The key novelty of the proposal is the exploitation of the MESI cache coherence protocol to perform dynamic block reallocation between different cache partitions. Compared to the pure SRAM-based design, our hybrid scheme achieves a 38% energy saving with a mere 0.8% IPC degradation, while also extending the lifespan of the STT-RAM partition.


great lakes symposium on vlsi | 2013

Coordinating prefetching and STT-RAM based last-level cache management for multicore systems

Mengjie Mao; Hai Helen Li; Yiran Chen

Data prefetching is a common mechanism to mitigate the bottleneck of off-chip memory bandwidth in modern computing systems. Unfortunately, the side effects of prefetching are an additional burden on off-chip communication and increased cache write operations. With spin-transfer torque random access memory (STT-RAM) proposed for last-level caches (LLCs) because of its high density and low power consumption, the increased write pressure from prefetching, coupled with write accesses that are characteristically longer than those of traditional SRAM caches, exacerbates the performance cost of prefetching schemes. In this work, we propose two orthogonal techniques to reduce the negative performance impact induced by aggressive prefetching on multicore systems employing an STT-RAM based LLC. First, basic priority assignment prioritizes the different types of LLC access requests by their criticality and responds to them based on priority. Second, priority boosting differentiates requests by application and prioritizes the relatively few requests from applications with non-intensive accesses to the LLC, which usually creates the most severe performance degradation in multicore systems. Combining these two prioritization policies can alleviate the negative effect induced by aggressive prefetching. Our results show that these techniques can achieve an 8.3 average application speedup compared to a baseline, prefetch-only design without prioritization.
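The two prioritization policies can be illustrated with a small priority queue. The sketch below is an assumption-laden toy, not the paper's mechanism: the request kinds, their relative ordering in `BASE_PRIORITY`, the `light_apps` set, and the choice to apply the application boost as a tie-break within each request type are all hypothetical.

```python
import heapq
from itertools import count

# Assumed criticality ordering (lower value is served first).
BASE_PRIORITY = {"demand_read": 0, "writeback": 1, "prefetch": 2}


class LLCQueue:
    """Toy LLC request queue with type priority and per-app boosting."""

    def __init__(self, light_apps=()):
        self._heap = []
        self._seq = count()  # preserves arrival order among equal keys
        self.light_apps = set(light_apps)

    def push(self, app: str, kind: str) -> None:
        # Requests from non-intensive ("light") apps win ties within a type.
        boost = 0 if app in self.light_apps else 1
        key = (BASE_PRIORITY[kind], boost, next(self._seq))
        heapq.heappush(self._heap, (key, app, kind))

    def pop(self):
        _, app, kind = heapq.heappop(self._heap)
        return app, kind


queue = LLCQueue(light_apps={"B"})
queue.push("A", "prefetch")
queue.push("A", "demand_read")
queue.push("B", "demand_read")
queue.push("A", "writeback")
print([queue.pop() for _ in range(4)])
```

Here the demand read from the light application B is served first, followed by A's demand read, A's writeback, and finally A's prefetch.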


IEEE Transactions on Computers | 2016

Array Organization and Data Management Exploration in Racetrack Memory

Zhenyu Sun; Xiuyuan Bi; Wenqing Wu; Sungjoo Yoo; Hai Helen Li

As a descendant of spin-transfer torque random access memory (STT-RAM), racetrack memory technology stores data in magnetic domains along nanoscopic wires. This unique structure can achieve unprecedentedly high storage density while inheriting the promising features of STT-RAM, such as fast access speed, non-volatility, zero standby power, hardness to soft errors, and compatibility with CMOS technology. Moreover, the recent success of the planar racetrack nanowire points to fabrication feasibility and continued scalability. In this paper, we investigate the design and optimization of racetrack memory as a last-level cache by embracing design considerations across multiple abstraction layers, including the cell design, the array structure, the architecture organization, and the data management. The cross-layer optimization enables a racetrack-memory-based last-level cache to achieve a 6.4× reduction in area, a 25 percent enhancement in system performance, and a 62 percent saving in energy consumption compared to an STT-RAM cache design. Its benefit over SRAM technology is even more significant.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2017

A Compact Memristor-Based Dynamic Synapse for Spiking Neural Networks

Miao Hu; Yiran Chen; Jianhua Yang; Yu Wang; Hai Helen Li

Recent advances in memristor technology make large-scale neuromorphic systems feasible by leveraging the similarity between memristor devices and synapses. For instance, memristor cross-point arrays can realize a dense synapse network among hundreds of neuron circuits, which is not affordable for traditional implementations. However, little progress has been made in synapse designs that support both static and dynamic synaptic properties. In addition, many neuron circuits require signals with specific pulse shapes, limiting the scale of system implementation. Last but not least, a bottom-up study starting from realistic memristor devices is still missing in current research on memristor-based neuromorphic systems. Here, we propose a memristor-based dynamic (MD) synapse design with experiment-calibrated memristor models. The structure obtains both static and dynamic synaptic properties by using one memristor for weight storage and the other as a selector. We overcame the device nonlinearities and demonstrated spike-timing-based recall, weight tunability, and spike-timing-based learning functions on the MD synapse. Furthermore, a temporal pattern learning application was investigated to evaluate the use of MD synapses in spiking neural networks, under both spike-timing-dependent plasticity and remote supervised method learning rules.
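For readers unfamiliar with spike-timing-dependent plasticity (STDP), the sketch below shows the textbook pair-based rule in software. It does not model the MD synapse circuit or the paper's calibrated memristor models; the function names and all parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`, the weight bounds) are hypothetical.

```python
import math


def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre."""
    if dt_ms > 0:   # pre-synaptic spike precedes post: potentiate
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:   # post-synaptic spike precedes pre: depress
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def update_weight(w, dt_ms, w_min=0.0, w_max=1.0):
    """Apply the STDP change and clamp the weight to [w_min, w_max]."""
    return min(w_max, max(w_min, w + stdp_dw(dt_ms)))


w = 0.5
for dt in (5.0, 15.0, -10.0):
    w = update_weight(w, dt)
print(round(w, 4))
```

Spike pairs that are closer in time produce larger weight changes, and the sign of the change depends on which neuron fired first.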


custom integrated circuits conference | 2011

A 1.0V 45nm nonvolatile magnetic latch design and its robustness analysis

Peiyuan Wang; Xiang Chen; Yiran Chen; Hai Helen Li; Seung H. Kang; Xiaochun Zhu; Wenqing Wu

A new nonvolatile latch design based on magnetic tunneling junction (MTJ) devices is proposed. In standby mode, the latched data can be retained in the MTJs without consuming any power. Two types of operation errors, namely persistent and non-persistent errors, are quantitatively analyzed by including process variations and thermal fluctuations during the read and write operations. A design at the 45nm technology node is used as an example to discuss the design tradeoffs.

Collaboration

Top Co-Authors

Xiuyuan Bi, University of Pittsburgh

Chenchen Liu, University of Pittsburgh

Mengjie Mao, University of Pittsburgh

Sicheng Li, University of Pittsburgh