Mingu Kang
University of Illinois at Urbana–Champaign
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Mingu Kang.
international conference on acoustics, speech, and signal processing | 2014
Mingu Kang; Min-Sun Keel; Naresh R. Shanbhag; Sean Eilert; Ken Curewitz
In this paper, we propose the concept of compute memory, where computation is deeply embedded into the memory (SRAM). This deep embedding enables multi-row read access and analog signal processing. Compute memory exploits the relaxed precision and linearity requirements of pattern recognition applications. System-level simulations incorporating various deterministic errors from analog signal chain demonstrates the limited accuracy of analog processing does not significantly degrade the system performance, which means the probability of pattern detection is minimally impacted. The estimated energy saving is 63 % as compared to the conventional system with standard embedded memory and parallel processing architecture, for 256×256 target image.
international conference on acoustics, speech, and signal processing | 2015
Mingu Kang; Sujan K. Gonugondla; Min Sun Keel; Naresh R. Shanbhag
In this paper, an energy efficient, memory-intensive, and high throughput VLSI architecture is proposed for convolutional networks (C-Net) by employing compute memory (CM) [1], where computation is deeply embedded into the memory (SRAM). Behavioral models incorporating CMs circuit non-idealities and energy models in 45nm SOI CMOS are presented. System-level simulations using these models demonstrate that the probability of handwritten digit recognition Pr > 0.99 can be achieved using the MNIST database [2], along with a 24.5× reduced energy delay product, a 5.0× reduced energy, and a 4.9× higher throughput as compared to the conventional system.
international symposium on circuits and systems | 2015
Mingu Kang; Eric P. Kim; Min Sun Keel; Naresh R. Shanbhag
This paper presents an energy-efficient VLSI implementation of Sparse Distributed Memory (SDM). High throughput and energy-efficient Hamming distance-based address decoder (CM-DEC) is proposed by employing compute memory [1], where computation is deeply embedded into a memory (SRAM). Hierarchical binary decision (HBD) is also proposed to enhance area- and energy-efficiency of read operation by minimizing data transfer. The SDM is employed as an auto-associative memory with four read iterations and 16×16 binary noisy input image with input error rates of 15%, 25%, and 30%. The proposed SDM achieves 39× smaller energy delay product with 14.5× and 2.7× reduced delay and energy, respectively as compared to conventional digital implementation of SDM in 45 nm SOI CMOS process with output error rate degradation less than 0.4%.
IEEE Transactions on Biomedical Circuits and Systems | 2016
Mingu Kang; Naresh R. Shanbhag
This paper presents an energy-efficient and high-throughput architecture for Sparse Distributed Memory (SDM)-a computational model of the human brain [1]. The proposed SDM architecture is based on the recently proposed in-memory computing kernel for machine learning applications called Compute Memory (CM) [2], [3]. CM achieves energy and throughput efficiencies by deeply embedding computation into the memory array. SDM-specific techniques such as hierarchical binary decision (HBD) are employed to reduce the delay and energy further. The CM-based SDM (CM-SDM) is a mixed-signal circuit, and hence circuit-aware behavioral, energy, and delay models in a 65 nm CMOS process are developed in order to predict system performance of SDM architectures in the autoand hetero-associative modes. The delay and energy models indicate that CM-SDM, in general, can achieve up to 25 × and 12 × delay and energy reduction, respectively, over conventional SDM. When classifying 16 ×16 binary images with high noise levels (input bad pixel ratios: 15%-25%) into nine classes, all SDM architectures are able to generate output bad pixel ratios (Bo) ≤ 2%. The CM-SDM exhibits negligible loss in accuracy, i.e., its Bo degradation is within 0.4% as compared to that of the conventional SDM.
international symposium on computer architecture | 2018
Prakalp Srivastava; Mingu Kang; Sujan K. Gonugondla; Sungmin Lim; Jungwook Choi; Vikram S. Adve; Nam Sung Kim; Naresh R. Shanbhag
Analog/mixed-signal machine learning (ML) accelerators exploit the unique computing capability of analog/mixed-signal circuits and inherent error tolerance of ML algorithms to obtain higher energy efficiencies than digital ML accelerators. Unfortunately, these analog/mixed-signal ML accelerators lack programmability, and even instruction set interfaces, to support diverse ML algorithms or to enable essential software control over the energy-vs-accuracy tradeoffs. We propose PROMISE, the first end-to-end design of a PROgrammable MIxed-Signal accElerator from Instruction Set Architecture (ISA) to high-level language compiler for acceleration of diverse ML algorithms. We first identify prevalent operations in widely-used ML algorithms and key constraints in supporting these operations for a programmable mixed-signal accelerator. Second, based on that analysis, we propose an ISA with a PROMISE architecture built with silicon-validated components for mixed-signal operations. Third, we develop a compiler that can take a ML algorithm described in a high-level programming language (Julia) and generate PROMISE code, with an IR design that is both language-neutral and abstracts away unnecessary hardware details. Fourth, we show how the compiler can map an application-level error tolerance specification for neural network applications down to low-level hardware parameters (swing voltages for each application Task) to minimize energy consumption. Our experiments show that PROMISE can accelerate diverse ML algorithms with energy efficiency competitive even with fixed-function digital ASICs for specific ML algorithms, and the compiler optimization achieves significant additional energy savings even for only 1% extra errors.
european solid state circuits conference | 2017
Mingu Kang; Sujan K. Gonugondla; Naresh R. Shanbhag
This paper presents IC realization of a random forest (RF) machine learning classifier. Algorithm-architecture-circuit is co-optimized to minimize the energy-delay product (EDP). Deterministic subsampling (DSS) and balanced decision trees result in reduced interconnect complexity and avoid irregular memory accesses. Low-swing analog in-memory computations embedded in a standard 6T SRAM enable massively parallel processing thereby minimizing the memory fetches and reducing the EDP further. The 65nm CMOS prototype achieves a 6.8× lower EDP compared to a conventional design at the same accuracy (94%) for an 8-class traffic sign recognition problem.
international solid-state circuits conference | 2018
Sujan K. Gonugondla; Mingu Kang; Naresh R. Shanbhag
arXiv: Hardware Architecture | 2016
Sai Zhang; Mingu Kang; Charbel Sakr; Naresh R. Shanbhag
arXiv: Hardware Architecture | 2016
Mingu Kang; Sujan K. Gonugondla; Ameya Patil; Naresh R. Shanbhag
international symposium on information theory | 2018
Yongjune Kim; Mingu Kang; Lav R. Varshney; Naresh R. Shanbhag