Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Hongzhong Zheng is active.

Publication


Featured research published by Hongzhong Zheng.


IEEE Computer Architecture Letters | 2017

LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory

Amirali Boroumand; Saugata Ghose; Minesh Patel; Hasan Hassan; Brandon Lucia; Kevin Hsieh; Krishna T. Malladi; Hongzhong Zheng; Onur Mutlu

Processing-in-memory (PIM) architectures cannot use traditional approaches to cache coherence due to the high off-chip traffic consumed by coherence messages. We propose LazyPIM, a new hardware cache coherence mechanism designed specifically for PIM. LazyPIM uses a combination of speculative cache coherence and compressed coherence signatures to greatly reduce the overhead of keeping PIM coherent with the processor. We find that LazyPIM improves average performance across a range of PIM applications by 49.1 percent over the best prior approach, coming within 5.5 percent of an ideal PIM mechanism.
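The "compressed coherence signatures" above can be thought of as Bloom-filter-like summaries of the cache-line addresses a PIM kernel has touched: a membership test may return false positives but never false negatives, so conflict detection stays conservative and therefore correct. A minimal illustrative sketch, assuming a Bloom-filter encoding (the class name, sizes, and hashing below are placeholders for illustration, not LazyPIM's actual hardware design):

```python
import hashlib

class CoherenceSignature:
    """Bloom-filter-style compressed set of cache-line addresses.

    Illustrative sketch of how a compressed signature can summarize
    the lines a PIM kernel accessed; not LazyPIM's hardware encoding.
    """

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.bitmap = 0  # the whole signature is one bit vector

    def _positions(self, addr):
        # Derive `hashes` bit positions from the line address.
        for i in range(self.hashes):
            h = hashlib.blake2b(f"{addr}:{i}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "little") % self.bits

    def add(self, addr):
        for p in self._positions(addr):
            self.bitmap |= 1 << p

    def may_contain(self, addr):
        # False positives possible; false negatives impossible.
        return all((self.bitmap >> p) & 1 for p in self._positions(addr))

    def conflicts_with(self, other):
        # Conservative check: any shared bit may indicate a shared line.
        return bool(self.bitmap & other.bitmap)
```

Because the check is conservative, speculation only ever rolls back unnecessarily on a false positive; it never misses a real conflict.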


International Symposium on Microarchitecture (MICRO) | 2017

DRISA: a DRAM-based Reconfigurable In-Situ Accelerator

Shuangchen Li; Dimin Niu; Krishna T. Malladi; Hongzhong Zheng; Bob Brennan; Yuan Xie

Data movement between the processing units and the memory in the traditional von Neumann architecture is creating the “memory wall” problem. To bridge the gap, two approaches have been studied: the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory). However, the first has strong computing capability but limited memory capacity/bandwidth, whereas the second is exactly the opposite. To address this challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth. DRISA is primarily composed of DRAM memory arrays, in which every memory bitline can perform bitwise Boolean logic operations (such as NOR). DRISA can be reconfigured to compute various functions by combining the functionally complete Boolean logic operations with the proposed hierarchical internal data movement designs. We further optimize DRISA for high performance by simultaneously activating multiple rows and subarrays to provide massive parallelism, unblocking the internal data movement bottlenecks, and optimizing activation latency and energy. We explore four design options and present a comprehensive case study to demonstrate significant acceleration of convolutional neural networks. The experimental results show that DRISA achieves 8.8× speedup and 1.2× better energy efficiency compared with ASICs, and 7.7× speedup and 15× better energy efficiency over GPUs with integer operations.
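The abstract leans on NOR being functionally complete: every other Boolean gate can be composed from NOR alone, which is why a bitline that only computes NOR suffices for arbitrary logic. A short sketch of the standard compositions (plain software illustrating the logic identities, not DRISA's circuit implementation):

```python
def nor(a, b):
    """Bitwise NOR on 1-bit values: the single primitive assumed available."""
    return 1 - (a | b)

# Every other gate built from NOR alone:
def not_(a):
    return nor(a, a)              # NOT x  = NOR(x, x)

def or_(a, b):
    return not_(nor(a, b))        # x OR y = NOT(NOR(x, y))

def and_(a, b):
    return nor(not_(a), not_(b))  # De Morgan: x AND y = NOR(NOT x, NOT y)

def xor_(a, b):
    return and_(or_(a, b), not_(and_(a, b)))
```

Each extra level of composition costs additional NOR steps, which is why the paper pairs the complete gate set with fast internal data movement.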


International Symposium on Computer Architecture (ISCA) | 2016

DRAF: a low-power DRAM-based reconfigurable acceleration fabric

Mingyu Gao; Christina Delimitrou; Dimin Niu; Krishna T. Malladi; Hongzhong Zheng; Bob Brennan; Christos Kozyrakis

FPGAs are a popular target for application-specific accelerators because they lead to a good balance between flexibility and energy efficiency. However, FPGA lookup tables introduce significant area and power overheads, making it difficult to use FPGA devices in environments with tight cost and power constraints. This is the case for datacenter servers, where a modestly-sized FPGA cannot accommodate the large number of diverse accelerators that datacenter applications need. This paper introduces DRAF, an architecture for bit-level reconfigurable logic that uses DRAM subarrays to implement dense lookup tables. DRAF overlaps DRAM operations like bitline precharge and charge restoration with routing within the reconfigurable routing fabric to minimize the impact of DRAM latency. It also supports multiple configuration contexts that can be used to quickly switch between different accelerators with minimal latency. Overall, DRAF trades off some of the performance of FPGAs for significant gains in area and power. DRAF improves area density by 10x over FPGAs and power consumption by more than 3x, enabling DRAF to satisfy demanding applications within strict power and cost constraints. While accelerators mapped to DRAF are 2-3x slower than those in FPGAs, they still deliver a 13x speedup and an 11x reduction in power consumption over a Xeon core for a wide range of datacenter tasks, including analytics and interactive services like speech recognition.
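The core mechanism of implementing logic as lookup tables can be sketched in a few lines: the gate's truth table is stored as a dense array, the input bits form a row address, and "evaluating" the gate is just a read at that address. The helper names below are hypothetical and purely illustrative; in DRAF a DRAM subarray holds the table in hardware rather than a Python list:

```python
def make_lut(fn, k):
    """Build a 2**k-entry truth table for a k-input Boolean function,
    the way a reconfigurable fabric stores logic as a lookup table."""
    return [fn(*(((i >> b) & 1) for b in range(k))) for i in range(2 ** k)]

def eval_lut(lut, *inputs):
    """'Reading' the LUT: the input bits form the row address."""
    addr = sum(bit << b for b, bit in enumerate(inputs))
    return lut[addr]

# A 3-input majority gate, "configured" into a LUT:
majority = make_lut(lambda a, b, c: int(a + b + c >= 2), 3)
```

Reconfiguration is then just rewriting the table contents, and multiple configuration contexts amount to keeping several tables resident and switching which one is addressed.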


ACM International Conference on Systems and Storage (SYSTOR) | 2016

Software-Defined Emulation Infrastructure for High Speed Storage

Krishna T. Malladi; Manu Awasthi; Hongzhong Zheng

NVMe, being a new I/O communication protocol, suffers from a lack of tools to evaluate storage solutions built on the standard. In this paper, we provide the design and analysis of a comprehensive, fully customizable emulation infrastructure that builds on the NVMe protocol. It provides a number of knobs that allow system architects to quickly evaluate performance implications of a wide variety of storage solutions while natively executing workloads.


Proceedings of the Second International Symposium on Memory Systems | 2016

DRAMScale: Mechanisms to Increase DRAM Capacity

Krishna T. Malladi; Uk-Song Kang; Manu Awasthi; Hongzhong Zheng

New resistive memory technologies promise scalability and non-volatility but suffer from longer, asymmetric read-write latencies and lower endurance, placing the burden of system design on architects. In order to avoid such pitfalls and still provision for exascale data requirements using a much faster DRAM technology, we introduce DRAMScale. It features three novel mechanisms to increase DRAM density while complementing technology scaling and creating a new capacity-optimized DRAM system. Such optimizations enable us to build a two-tier memory system that meets memory latency and capacity requirements.


Proceedings of the Second International Symposium on Memory Systems | 2016

DRAMPersist: Making DRAM Systems Persistent

Krishna T. Malladi; Manu Awasthi; Hongzhong Zheng

Modern applications exercise main memory systems in different ways. Many scale-out, in-memory applications exploit desirable properties of DRAM such as high capacity, low latency, and high bandwidth. Although DRAM technology continues to scale aggressively, new resistive memory technologies are on the horizon, promising scalability, density, and non-volatility. However, they still suffer from longer, asymmetric read-write latencies and lower endurance compared to DRAM. Given these factors, scale-out, distributed applications will benefit greatly from main memory architectures that provide the non-volatility of new memory technologies but retain DRAM-like latencies. To that end, we introduce DRAMPersist -- a novel mechanism to make main memory persistent and complement existing high-speed storage, specifically geared for scale-out systems.


International Conference on Networking, Architecture, and Storage (NAS) | 2017

FlashStorageSim: Performance Modeling for SSD Architectures

Krishna T. Malladi; Mu-Tien Chang; Dimin Niu; Hongzhong Zheng

We present FlashStorageSim, an SSD architecture performance model for data center servers, validated with an enterprise SSD. In addition to the SSD controller, SSD organization, and flash devices, FlashStorageSim models the host interface (e.g., SATA, PCIe, DDR). This allows users to explore non-traditional SSD use cases. We also implement mechanisms to improve simulation speed, which is shown to reduce simulation time by more than 7X. We show how FlashStorageSim can help researchers understand SSD design decisions.
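A performance model of this kind ultimately decomposes request latency into controller, flash, and host-interface components. The toy first-order estimate below illustrates only that decomposition; every parameter value is a made-up placeholder, and none of it reflects FlashStorageSim's actual model:

```python
def read_latency_us(size_bytes,
                    host_link_gbps=8.0,   # hypothetical PCIe-like link rate
                    flash_read_us=60.0,   # hypothetical per-page sense time
                    page_bytes=16 * 1024, # hypothetical flash page size
                    controller_us=5.0):   # hypothetical controller overhead
    """First-order SSD read-latency estimate: controller overhead +
    flash page reads + host-interface transfer. Illustrative only."""
    pages = -(-size_bytes // page_bytes)  # ceiling division
    transfer_us = size_bytes * 8 / (host_link_gbps * 1000)
    return controller_us + pages * flash_read_us + transfer_us
```

Even this crude split shows why modeling the host interface matters: for small reads the flash dominates, while for large transfers the link rate increasingly sets the floor.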


IEEE Micro | 2017

DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric

Mingyu Gao; Christina Delimitrou; Dimin Niu; Krishna T. Malladi; Hongzhong Zheng; Bob Brennan; Christos Kozyrakis

The DRAM-Based Reconfigurable Acceleration Fabric (DRAF) uses commodity DRAM technology to implement a bit-level, reconfigurable fabric that improves area density by 10 times and power consumption by more than 3 times over conventional field-programmable gate arrays. Latency overlapping and multicontext support allow DRAF to meet the performance and density requirements of demanding applications in datacenter and mobile environments.


Archive | 2017

Transaction-Based Hybrid Memory Module

Mu-Tien Chang; Hongzhong Zheng; Dimin Niu


International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) | 2015

FAME: A Fast and Accurate Memory Emulator for New Memory System Architecture Exploration

Krishna T. Malladi; Mu-Tien Chang; John Ping; Hongzhong Zheng

Collaboration


Dive into Hongzhong Zheng's collaborations.

Top Co-Authors

Amirali Boroumand

Carnegie Mellon University


Brandon Lucia

Carnegie Mellon University
