Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jishen Zhao is active.

Publication


Featured research published by Jishen Zhao.


International Symposium on Computer Architecture | 2016

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

Ping Chi; Shuangchen Li; Cong Xu; Tao Zhang; Jishen Zhao; Yongpan Liu; Yu Wang; Yuan Xie

Processing-in-memory (PIM) is a promising solution to address the “memory wall” challenges for future computer systems. Prior proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used as main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory. In PRIME, a portion of the ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves performance by ~2360x and reduces energy consumption by ~895x across the evaluated machine learning benchmarks.
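
The crossbar matrix-vector multiplication that PRIME builds on can be pictured in a few lines: input voltages drive the wordlines, cell conductances encode the weight matrix, and each bitline current is the dot product of the voltage vector with one column of conductances. The plain-Python sketch below only models this relationship numerically; the function and variable names are ours, not the paper's.

# Illustrative sketch of analog matrix-vector multiplication in a ReRAM crossbar.
# Wordline voltages encode the input vector; cell conductances encode the weights.
# Each bitline current is the sum of V_i * G_ij over all rows (Kirchhoff's current law).

def crossbar_mvm(voltages, conductances):
    """voltages: list of M wordline voltages (volts)
    conductances: M x N matrix of cell conductances (siemens)
    returns: list of N bitline currents (amperes)."""
    num_rows = len(conductances)
    num_cols = len(conductances[0])
    currents = [0.0] * num_cols
    for i in range(num_rows):
        for j in range(num_cols):
            currents[j] += voltages[i] * conductances[i][j]
    return currents

# Example: a 2x3 weight matrix applied to a 2-element input vector.
print(crossbar_mvm([0.2, 0.5], [[1e-6, 2e-6, 0.5e-6],
                                [3e-6, 1e-6, 2e-6]]))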


International Symposium on Microarchitecture | 2013

Kiln: closing the performance gap between systems with and without persistence support

Jishen Zhao; Sheng Li; Doe Hyun Yoon; Yuan Xie; Norman P. Jouppi

Persistent memory is an emerging technology that allows in-memory persistent data objects to be updated at much higher throughput than when using disks as persistent storage. Previous persistent memory designs use logging or copy-on-write mechanisms to update persistent data, which unfortunately reduces system performance to roughly half that of a native system with no persistence support. One of the great challenges in this application class is therefore how to efficiently enable atomic, consistent, and durable updates to ensure data persistence that survives application and/or system failures. Our goal is to design a persistent memory system with performance very close to that of a native system. We propose Kiln, a persistent memory design that adopts a nonvolatile cache and a nonvolatile main memory to enable atomic in-place updates without logging or copy-on-write. Our evaluation shows that Kiln can achieve a 2× performance improvement compared with NVRAM-based persistent memory using write-ahead logging. In addition, our design has numerous practical advantages: a simple and intuitive abstract interface, microarchitecture-level optimizations, fast recovery from failures, and elimination of redundant writes to nonvolatile storage media.
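
A rough way to see why dropping the logging step matters: write-ahead logging persists every update twice (once to the log, once in place), whereas an in-place update backed by a nonvolatile cache persists it once. The sketch below only counts nonvolatile writes under that simplified assumption; the class names and commit interface are our illustration, not Kiln's actual design.

# Rough write-count comparison between write-ahead logging and an in-place-update
# design backed by a nonvolatile cache. Names and structure are illustrative only.

class WALPersistentMemory:
    def __init__(self):
        self.nvm_writes = 0
    def commit(self, updates):
        # Write-ahead logging: persist a log entry, then write the data in place.
        for _addr, _value in updates:
            self.nvm_writes += 1   # log entry
        for _addr, _value in updates:
            self.nvm_writes += 1   # in-place data write

class InPlacePersistentMemory:
    def __init__(self):
        self.nvm_writes = 0
    def commit(self, updates):
        # With a nonvolatile last-level cache, committed cache lines are already
        # durable; each update is persisted exactly once, in place.
        for _addr, _value in updates:
            self.nvm_writes += 1

updates = [(addr, addr * 7) for addr in range(1000)]
wal, kiln_like = WALPersistentMemory(), InPlacePersistentMemory()
wal.commit(updates)
kiln_like.commit(updates)
print(wal.nvm_writes, kiln_like.nvm_writes)   # 2000 vs 1000 nonvolatile writes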


International Symposium on Microarchitecture | 2014

FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems

Jishen Zhao; Onur Mutlu; Yuan Xie

Byte-addressable nonvolatile memories promise a new technology, persistent memory, which incorporates desirable attributes from both traditional main memory (byte-addressability and fast interface) and traditional storage (data persistence). To support data persistence, a persistent memory system requires sophisticated data duplication and ordering control for write requests. As a result, applications that manipulate persistent memory (persistent applications) have very different memory access characteristics than traditional (non-persistent) applications, as shown in this paper. Persistent applications introduce heavy write traffic to contiguous memory regions at a memory channel, which cannot concurrently service read and write requests, leading to memory bandwidth underutilization due to low bank-level parallelism, frequent write queue drains, and frequent bus turnarounds between reads and writes. These characteristics undermine the high-performance and fairness offered by conventional memory scheduling schemes designed for non-persistent applications. Our goal in this paper is to design a fair and high-performance memory control scheme for a persistent memory based system that runs both persistent and non-persistent applications. Our proposal, FIRM, consists of three key ideas. First, FIRM categorizes request sources as non-intensive, streaming, random and persistent, and forms batches of requests for each source. Second, FIRM strides persistent memory updates across multiple banks, thereby improving bank-level parallelism and hence memory bandwidth utilization of persistent memory accesses. Third, FIRM schedules read and write request batches from different sources in a manner that minimizes bus turnarounds and write queue drains. Our detailed evaluations show that, compared to five previous memory scheduler designs, FIRM provides significantly higher system performance and fairness.
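
The bank-striding idea can be illustrated with a toy address-to-bank mapping: a contiguous burst of persistent writes that would otherwise queue up behind a single bank is spread round-robin over all banks, restoring bank-level parallelism. The mapping functions, stripe size, and bank count below are our own illustration, not FIRM's actual scheduler.

# Toy illustration of striding a burst of persistent writes across banks.
# Addresses, bank count, and mapping policy are illustrative, not FIRM's design.

NUM_BANKS = 8

def naive_bank(addr, row_size=4096):
    # Coarse interleaving: a whole row-buffer-sized region maps to one bank,
    # so a streaming persistent update keeps hitting the same bank.
    return (addr // (row_size * NUM_BANKS)) % NUM_BANKS

def strided_bank(addr, stripe=256):
    # Striding interleaves consecutive stripe-sized chunks over all banks.
    return (addr // stripe) % NUM_BANKS

writes = [base * 256 for base in range(32)]          # a contiguous 8 KiB update
print({naive_bank(a) for a in writes})               # a single bank
print({strided_bank(a) for a in writes})             # all 8 banks in parallel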


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2010

Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3-D ICs

Xiangyu Dong; Jishen Zhao; Yuan Xie

3-D integration technology is emerging as an attractive alternative to increase the transistor count for future chips. The majority of existing 3-D integrated circuit (IC) research is focused on the performance, power, density, and heterogeneous integration benefits offered by 3-D integration. All such advantages, however, ultimately have to be weighed against cost when a design strategy is decided. Consequently, system-level cost analysis at early design stages is imperative to decide whether 3-D integration should be adopted. This paper presents a cost estimation method for 3-D ICs at early design stages and proposes a set of cost models that include wafer cost, 3-D bonding cost, package cost, and cooling cost. The proposed 3-D IC cost estimation method can help designers analyze the cost implications of 3-D ICs during early design space exploration, and it enables a cost-driven 3-D IC design flow that can guide the design choice toward a cost-effective direction. Based on the proposed cost estimation method, this paper demonstrates two case studies that explore the cost benefits of 3-D integration for application-specific integrated circuit designs and many-core microprocessor designs, respectively. Finally, this paper suggests the optimum partitioning strategy for future 3-D IC designs.
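
One building block of any such cost model is the textbook relation between die area, wafer cost, and yield: splitting a large die into smaller stacked dies raises per-die yield, which is one reason 3-D partitioning can lower silicon cost before bonding, packaging, and cooling costs are added back in. The sketch below uses the standard negative-binomial yield model with illustrative constants; it is not the paper's cost model.

# Textbook 2-D die-cost relation that system-level 3-D cost models build on.
# The constants below are illustrative, not the values used in the paper.
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    r = wafer_diameter_mm / 2.0
    return math.pi * r * r / die_area_mm2 \
         - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)

def die_yield(die_area_mm2, defect_density_per_mm2=0.002, alpha=3.0):
    # Negative-binomial defect model: yield drops as die area grows.
    return (1.0 + defect_density_per_mm2 * die_area_mm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diameter_mm, die_area_mm2):
    return wafer_cost / (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                         * die_yield(die_area_mm2))

# Splitting one 400 mm^2 die into two stacked 200 mm^2 dies improves per-die yield,
# which can reduce total silicon cost (3-D bonding cost not included here).
print(die_cost(5000.0, 300, 400.0), 2 * die_cost(5000.0, 300, 200.0))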


International Symposium on Microarchitecture | 2015

ThyNVM: enabling software-transparent crash consistency in persistent memory systems

Jinglei Ren; Jishen Zhao; Samira Manabi Khan; Jongmoo Choi; Yongwei Wu; Onur Mutlu

Emerging byte-addressable nonvolatile memories (NVMs) promise persistent memory, which allows processors to directly access persistent data in main memory. Yet, persistent memory systems need to guarantee a consistent memory state in the event of power loss or a system crash (i.e., crash consistency). To guarantee crash consistency, most prior works rely on programmers to (1) partition persistent and transient memory data and (2) use specialized software interfaces when updating persistent memory data. As a result, taking advantage of persistent memory requires significant programmer effort, e.g., to implement new programs as well as modify legacy programs. The use cases and adoption of persistent memory are therefore largely limited. In this paper, we propose a hardware-assisted DRAM+NVM hybrid persistent memory design, Transparent Hybrid NVM (ThyNVM), which supports software-transparent crash consistency of memory data in a hybrid memory system. To efficiently enforce crash consistency, we design a new dual-scheme checkpointing mechanism, which efficiently overlaps checkpointing time with application execution time. The key novelty is to enable checkpointing of data at multiple granularities, cache block or page granularity, in a coordinated manner. This design is based on our insight that there is a tradeoff between the application stall time due to checkpointing and the hardware storage overhead of the metadata for checkpointing, both of which are dictated by the granularity of checkpointed data. To get the best of this tradeoff, our technique adapts the checkpointing granularity to the write locality characteristics of the data and coordinates the management of multiple-granularity updates. Our evaluation across a variety of applications shows that ThyNVM performs within 4.9% of an idealized DRAM-only system that can provide crash consistency at no cost.
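
The granularity tradeoff ThyNVM navigates can be caricatured with a simple cost comparison: checkpointing at page granularity needs one metadata entry per page but copies the whole page even when few blocks are dirty, while block granularity copies only dirty blocks at the price of one metadata entry per block. The threshold policy and constants below are our illustration, not the paper's mechanism.

# Caricature of the block-vs-page checkpointing tradeoff; the policy and
# numbers here are our illustration, not ThyNVM's actual mechanism.

BLOCK_SIZE = 64          # bytes
PAGE_SIZE = 4096         # bytes
METADATA_PER_ENTRY = 16  # bytes of checkpoint metadata per remapped unit

def checkpoint_cost(dirty_blocks_in_page):
    """Return (data bytes written, metadata bytes) for the two granularities."""
    block_cost = (dirty_blocks_in_page * BLOCK_SIZE,
                  dirty_blocks_in_page * METADATA_PER_ENTRY)
    page_cost = (PAGE_SIZE, METADATA_PER_ENTRY)
    return block_cost, page_cost

def choose_granularity(dirty_blocks_in_page):
    # High write locality (most of the page is dirty): page granularity wins
    # because one metadata entry covers the whole page. Sparse writes: block
    # granularity avoids copying clean data.
    (blk_bytes, blk_meta), (pg_bytes, pg_meta) = checkpoint_cost(dirty_blocks_in_page)
    return "page" if pg_bytes + pg_meta <= blk_bytes + blk_meta else "block"

for dirty in (2, 16, 60):
    print(dirty, choose_granularity(dirty))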


Design Automation Conference | 2016

Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories

Shuangchen Li; Cong Xu; Qiaosha Zou; Jishen Zhao; Yu Lu; Yuan Xie

Processing-in-memory (PIM) provides high bandwidth, massive parallelism, and high energy efficiency by implementing computations in main memory, thereby eliminating the overhead of data movement between the CPU and memory. While most recent work has focused on PIM in DRAM with 3D die-stacking technology, we propose to leverage the unique features of emerging non-volatile memory (NVM), such as resistance-based storage and current sensing, to enable efficient PIM designs in NVM. We propose Pinatubo, a Processing In Non-volatile memory ArchiTecture for bUlk Bitwise Operations. Instead of integrating complex logic inside the cost-sensitive memory, Pinatubo redesigns the read circuitry so that it can compute the bitwise logic of two or more memory rows very efficiently and support one-step multi-row operations. Experimental results on data-intensive graph processing and database applications show that Pinatubo achieves a ~500x speedup and ~28,000x energy saving on bitwise operations, and a 1.12x overall speedup and 1.11x overall energy saving over a conventional processor.
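
Functionally, a one-step multi-row operation returns the bitwise OR (or AND) of several memory rows in a single activation. The sketch below models only that outcome in software; it says nothing about the modified sense amplifiers and reference circuits that the actual hardware relies on, and the function names are ours.

# Functional model of a one-step multi-row bitwise operation: the result of
# ORing (or ANDing) several memory rows at once. This models only the outcome,
# not the modified read circuitry.
from functools import reduce

def bulk_bitwise(rows, op="or"):
    """rows: list of equally sized bytearrays; returns the element-wise result."""
    combine = (lambda a, b: a | b) if op == "or" else (lambda a, b: a & b)
    return bytearray(reduce(combine, column) for column in zip(*rows))

row_a = bytearray(b"\xf0\x0f\xaa")
row_b = bytearray(b"\x0f\x0f\x55")
row_c = bytearray(b"\x01\x80\x00")
print(bulk_bitwise([row_a, row_b, row_c], "or").hex())   # ff8fff
print(bulk_bitwise([row_a, row_b], "and").hex())         # 000f00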


International Symposium on Low Power Electronics and Design | 2010

3D-nonFAR: three-dimensional non-volatile FPGA architecture using phase change memory

Yibo Chen; Jishen Zhao; Yuan Xie

Memories play a key role in FPGAs in the forms of both programming bits and embedded memory blocks. FPGAs built with non-volatile memories have attracted attention for their zero boot-up delay, real-time reconfigurability, and superior energy efficiency. This paper presents a novel three-dimensional (3D) non-volatile FPGA architecture (3D-nonFAR) using phase change memory (PCM) and 3D die-stacking techniques. Basic structures in a conventional FPGA architecture are renovated with PCM, and components are repartitioned and reorganized in 3D-nonFAR to allow an efficient 3D integration of PCM elements. 3D-nonFAR not only preserves the advantages of existing non-volatile FPGAs, but also provides high integration density, high performance, and bit-level programmability, which enable PCM as a universal memory replacement in FPGAs. Evaluation results show that 3D-nonFAR has a smaller footprint, higher performance, and lower power consumption compared with other FPGA counterparts.


International Conference on Computer-Aided Design | 2012

Optimizing bandwidth and power of graphics memory with hybrid memory technologies and adaptive data migration

Jishen Zhao; Yuan Xie

While GPUs are designed to hide memory latency with massive multi-threading, the tremendous demands on memory bandwidth and power consumption constrain system performance scaling. In this paper, we propose a hybrid graphics memory architecture with different memory technologies (DRAM, STT-RAM, and RRAM) to improve memory bandwidth and reduce power consumption. In addition, we present an adaptive data migration mechanism that exploits the various memory access patterns of GPGPU applications for further memory power reduction. We evaluate our design with a set of multi-threaded GPU workloads. Compared to traditional GDDR5 memory, our design reduces GPU system power by 16%, and improves system throughput and energy efficiency by 12% and 33%, respectively.
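
An access-pattern-driven migration policy can be pictured as a periodic re-placement decision per page: write-intensive, bandwidth-hungry pages stay in DRAM, while read-mostly or cold pages migrate to the denser, lower-leakage non-volatile tier. The thresholds, counters, and technology assignment below are our own illustration, not the mechanism evaluated in the paper.

# Purely illustrative page-migration policy for a DRAM + NVM graphics memory;
# thresholds and the technology assignment are ours, not the paper's mechanism.

def place_page(reads, writes, read_mostly_threshold=0.9):
    total = reads + writes
    if total == 0:
        return "NVM"                      # cold pages go to the dense, low-leakage tier
    if reads / total >= read_mostly_threshold:
        return "NVM"                      # read-mostly data tolerates slow NVM writes
    return "DRAM"                         # write-intensive data stays in DRAM

# Periodically re-evaluate placement from per-page access counters.
pages = {"framebuffer": (200, 800), "texture": (5000, 10), "vertex": (300, 300)}
print({name: place_page(r, w) for name, (r, w) in pages.items()})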


Design Automation Conference | 2010

Cost-aware three-dimensional (3D) many-core multiprocessor design

Jishen Zhao; Xiangyu Dong; Yuan Xie

The emerging three-dimensional integrated circuit (3D IC) is beneficial for various applications from both area and performance perspectives. While the general trend in processor design has been shifting from multi-core to many-core, questions such as whether 3D integration should be adopted, and how to choose among various design options, must be addressed at the early design stage. In order to guide the final design toward a cost-effective direction, system-level cost evaluation is one of the most critical issues to be considered. In this paper, we propose a 3D many-core multiprocessor cost model, which includes wafer, bonding, package, and cooling cost analysis. Using the proposed cost model, we evaluate the optimal partitioning strategies for 16-, 32-, and 64-core multiprocessors from the cost point of view.


Design Automation Conference | 2015

DimNoC: a dim silicon approach towards power-efficient on-chip network

Jia Zhan; Jin Ouyang; Fen Ge; Jishen Zhao; Yuan Xie

The diminishing momentum of Dennard scaling leads to ever-increasing power density in integrated circuits, and a decreasing portion of transistors on a chip that can be switched on simultaneously, a problem known as dark silicon. There has been innovative work addressing the dark silicon problem in the areas of power-efficient cores and cache systems. However, dark silicon challenges in Network-on-Chip (NoC) design are largely unexplored. To address this issue, we propose DimNoC, a “dim silicon” approach that leverages drowsy SRAM and STT-RAM technologies to replace pure SRAM-based NoC buffers. Specifically, we propose two novel hybrid buffer architectures: 1) a Hierarchical Buffer (HB) architecture, which divides the input buffers into a hierarchy of levels with different memory technologies operating at various power states; 2) a Banked Buffer (BB) architecture, which organizes drowsy SRAM and STT-RAM into separate banks in order to hide the long write latency of STT-RAM. Our experiments show that the proposed DimNoC can achieve 30.9% network energy saving, 20.3% energy-delay product (EDP) reduction, and a 7.6% router area decrease compared with the baseline SRAM-based NoC design.
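
The Banked Buffer idea can be caricatured as a placement decision per arriving flit: flits expected to sit in the buffer long enough to amortize STT-RAM's slow write go to the STT-RAM bank, while short-lived flits stay in the drowsy SRAM bank. The latency constant, slot counts, and policy below are our illustration, not the paper's actual buffer controller.

# Caricature of a hybrid SRAM/STT-RAM input buffer. Policy and constants are
# illustrative assumptions, not DimNoC's actual design.

STT_WRITE_LATENCY = 10   # cycles (assumed, for illustration)

class BankedBuffer:
    def __init__(self, sram_slots=4, stt_slots=12):
        self.sram, self.stt = [], []
        self.sram_slots, self.stt_slots = sram_slots, stt_slots

    def enqueue(self, flit, expected_wait):
        # Long-waiting flits can absorb the STT-RAM write latency off the
        # critical path; short-lived flits stay in SRAM.
        if expected_wait > STT_WRITE_LATENCY and len(self.stt) < self.stt_slots:
            self.stt.append(flit)
        elif len(self.sram) < self.sram_slots:
            self.sram.append(flit)
        else:
            return False                  # buffer full: exert backpressure
        return True

buf = BankedBuffer()
for i in range(6):
    buf.enqueue(f"flit{i}", expected_wait=3 if i % 2 else 30)
print(len(buf.sram), len(buf.stt))        # 3 short-lived in SRAM, 3 long-lived in STT-RAM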

Collaboration


Dive into Jishen Zhao's collaborations.

Top Co-Authors

Yuan Xie
University of California

Cong Xu
Pennsylvania State University

Jia Zhan
University of California