Dimin Niu
Samsung
Publication
Featured research published by Dimin Niu.
international symposium on microarchitecture | 2017
Shuangchen Li; Dimin Niu; Krishna T. Malladi; Hongzhong Zheng; Bob Brennan; Yuan Xie
Data movement between the processing units and the memory in traditional von Neumann architecture is creating the “memory wall” problem. To bridge the gap, two approaches, the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory), have been studied. However, the first one has strong computing capability but limited memory capacity/bandwidth, whereas the second one is the exact opposite. To address the challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth. DRISA is primarily composed of DRAM memory arrays, in which every memory bitline can perform bitwise Boolean logic operations (such as NOR). DRISA can be reconfigured to compute various functions with the combination of the functionally complete Boolean logic operations and the proposed hierarchical internal data movement designs. We further optimize DRISA to achieve high performance by simultaneously activating multiple rows and subarrays to provide massive parallelism, unblocking the internal data movement bottlenecks, and optimizing activation latency and energy. We explore four design options and present a comprehensive case study to demonstrate significant acceleration of convolutional neural networks. The experimental results show that DRISA can achieve 8.8× speedup and 1.2× better energy efficiency compared with ASICs, and 7.7× speedup and 15× better energy efficiency over GPUs with integer operations.
CCS Concepts: Hardware → Dynamic memory; Computer systems organization → Reconfigurable computing; Neural networks.
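The abstract's point that a functionally complete operation such as NOR suffices to compute other Boolean functions can be illustrated with a small Python sketch. This is not DRISA's implementation: the 8-bit row width (`MASK`) and the helper names are assumptions chosen for illustration, and Python integers merely stand in for rows of bitlines operated on in parallel.

```python
# Illustrative only: Python ints stand in for DRAM rows; MASK is an assumed row width.
MASK = 0xFF  # hypothetical 8-bit row


def nor(a: int, b: int) -> int:
    """Bitwise NOR across all bit positions of the (assumed) row."""
    return ~(a | b) & MASK


def not_(a: int) -> int:
    # NOT a = NOR(a, a)
    return nor(a, a)


def or_(a: int, b: int) -> int:
    # a OR b = NOT(NOR(a, b))
    return not_(nor(a, b))


def and_(a: int, b: int) -> int:
    # a AND b = NOR(NOT a, NOT b)  (De Morgan)
    return nor(not_(a), not_(b))


def xor_(a: int, b: int) -> int:
    # a XOR b = (a OR b) AND NOT(a AND b) = NOR(NOR(a, b), AND(a, b))
    return nor(nor(a, b), and_(a, b))
```

Every helper above is built only from `nor`, which is what functional completeness means in practice. Arithmetic would additionally need data movement between bit positions (for example, shifting carries), which this sketch does not model.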
international symposium on computer architecture | 2016
Mingyu Gao; Christina Delimitrou; Dimin Niu; Krishna T. Malladi; Hongzhong Zheng; Bob Brennan; Christos Kozyrakis
FPGAs are a popular target for application-specific accelerators because they lead to a good balance between flexibility and energy efficiency. However, FPGA lookup tables introduce significant area and power overheads, making it difficult to use FPGA devices in environments with tight cost and power constraints. This is the case for datacenter servers, where a modestly-sized FPGA cannot accommodate the large number of diverse accelerators that datacenter applications need. This paper introduces DRAF, an architecture for bit-level reconfigurable logic that uses DRAM subarrays to implement dense lookup tables. DRAF overlaps DRAM operations like bitline precharge and charge restoration with routing within the reconfigurable routing fabric to minimize the impact of DRAM latency. It also supports multiple configuration contexts that can be used to quickly switch between different accelerators with minimal latency. Overall, DRAF trades off some of the performance of FPGAs for significant gains in area and power. DRAF improves area density by 10x over FPGAs and power consumption by more than 3x, enabling DRAF to satisfy demanding applications within strict power and cost constraints. While accelerators mapped to DRAF are 2-3x slower than those in FPGAs, they still deliver a 13x speedup and an 11x reduction in power consumption over a Xeon core for a wide range of datacenter tasks, including analytics and interactive services like speech recognition.
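To make the idea of a memory-backed lookup table concrete, the following Python sketch treats a k-input logic function as a 2^k-entry truth table addressed by its inputs. It is a generic illustration of LUT-based logic, not DRAF's design: the function names are hypothetical, and nothing here models DRAM subarray organization, timing, routing, or configuration contexts.

```python
# Illustrative only: a list stands in for the memory that stores the truth table.
from typing import Callable, List


def make_lut(fn: Callable[..., int], k: int) -> List[int]:
    """Precompute the 2**k-entry truth table of a k-input Boolean function."""
    table = []
    for addr in range(2 ** k):
        # Input i is bit i of the address (least significant bit first).
        bits = [(addr >> i) & 1 for i in range(k)]
        table.append(fn(*bits) & 1)
    return table


def lut_eval(table: List[int], *inputs: int) -> int:
    """Evaluate the function by forming an address from the inputs and reading memory."""
    addr = 0
    for i, bit in enumerate(inputs):
        addr |= (bit & 1) << i
    return table[addr]


# Example: a 3-input majority function stored as an 8-entry table.
majority = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)
```

Evaluating the function is then a single read, for example `lut_eval(majority, 1, 1, 0)`. Reconfiguring the logic amounts to rewriting the table contents, which is why denser memory translates into denser reconfigurable logic.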
networking architecture and storages | 2017
Krishna T. Malladi; Mu-Tien Chang; Dimin Niu; Hongzhong Zheng
We present FlashStorageSim, an SSD architecture performance model for data center servers, validated with an enterprise SSD. In addition to the SSD controller, SSD organization, and flash devices, FlashStorageSim models the host interface (e.g., SATA, PCIe, DDR). This allows users to explore non-traditional SSD use cases. We also implement mechanisms to improve simulation speed, which is shown to reduce simulation time by more than 7X. We show how FlashStorageSim can help researchers understand SSD design decisions.
IEEE Micro | 2017
Mingyu Gao; Christina Delimitrou; Dimin Niu; Krishna T. Malladi; Hongzhong Zheng; Bob Brennan; Christos Kozyrakis
The DRAM-Based Reconfigurable Acceleration Fabric (DRAF) uses commodity DRAM technology to implement a bit-level, reconfigurable fabric that improves area density by 10 times and power consumption by more than 3 times over conventional field-programmable gate arrays. Latency overlapping and multicontext support allow DRAF to meet the performance and density requirements of demanding applications in datacenter and mobile environments.
Archive | 2017
Mu-Tien Chang; Hongzhong Zheng; Dimin Niu
Archive | 2016
Dimin Niu; Mu-Tien Chang; Hongzhong Zheng
Archive | 2018
Dimin Niu; Shuangchen Li; Bob Brennan; Krishna T. Malladi; Hongzhong Zheng
Archive | 2017
Frederic Sala; Chaohong Hu; Hongzhong Zheng; Dimin Niu; Mu-Tien Chang
Archive | 2017
Mu-Tien Chang; Prasun Gera; Dimin Niu; Hongzhong Zheng
Archive | 2017
Dimin Niu; Mu-Tien Chang; Hongzhong Zheng; Kyung-Chang Ryoo