Publication


Featured research published by Jaeha Kung.


International Symposium on Computer Architecture | 2016

Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory

Duckhwan Kim; Jaeha Kung; Sek M. Chai; Sudhakar Yalamanchili; Saibal Mukhopadhyay

This paper presents a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing. The proposed architecture consists of clusters of processing engines (PEs), connected by a 2D mesh network as a processing tier, which is integrated in 3D with multiple tiers of DRAM. The PE clusters access multiple memory channels (vaults) in parallel. The operating principle, referred to as memory-centric computing, embeds specialized state machines within the vault controllers of the HMC to drive data into the PE clusters. The paper presents the basic architecture of the Neurocube and an analysis of the logic tier synthesized in 28nm and 15nm process technologies. The performance of the Neurocube is evaluated by mapping a convolutional neural network onto it and estimating the resulting power and performance for both training and inference.
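
To make the memory-centric mapping concrete, the following is a minimal sketch (Python/NumPy, not the Neurocube implementation) of how a convolutional layer's output rows could be partitioned across parallel vaults so that each vault's PE cluster computes only its own slice; the vault count, layer shape, and partitioning rule are assumptions for illustration.

# Hedged sketch, not the Neurocube design itself: split the output rows of a
# convolutional layer across HMC-style vaults so that each vault's PE cluster
# computes only its own slice. The vault count and layer shape are assumed.
import numpy as np

NUM_VAULTS = 16          # assumed number of vaults / PE clusters

def partition_rows(out_rows, num_vaults=NUM_VAULTS):
    """Split output rows as evenly as possible across the vaults."""
    base, extra = divmod(out_rows, num_vaults)
    slices, start = [], 0
    for v in range(num_vaults):
        rows = base + (1 if v < extra else 0)
        slices.append((start, start + rows))
        start += rows
    return slices

def conv2d_slice(inp, kernel, row_range):
    """Each PE cluster computes only its assigned output rows (valid convolution)."""
    kh, kw = kernel.shape
    r0, r1 = row_range
    out = np.empty((r1 - r0, inp.shape[1] - kw + 1))
    for i, r in enumerate(range(r0, r1)):
        for c in range(out.shape[1]):
            out[i, c] = np.sum(inp[r:r + kh, c:c + kw] * kernel)
    return out

# Example: a 64x64 feature map and a 3x3 kernel, computed slice by slice.
fmap, kern = np.random.rand(64, 64), np.random.rand(3, 3)
out_rows = fmap.shape[0] - kern.shape[0] + 1
parts = [conv2d_slice(fmap, kern, rr) for rr in partition_rows(out_rows)]
full = np.vstack(parts)   # stitching the per-vault slices yields the full output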


International Symposium on Low Power Electronics and Design | 2015

A power-aware digital feedforward neural network platform with backpropagation driven approximate synapses

Jaeha Kung; Duckhwan Kim; Saibal Mukhopadhyay

This paper proposes a power-aware digital feedforward neural network platform that uses the backpropagation algorithm during training to enable an energy-quality trade-off. Given a quality constraint, the proposed approach identifies a set of synaptic weights in the neural network to approximate, selecting synapses whose impact on the output error, as estimated by the backpropagation algorithm, is small. The approximations are realized through a coupled software (reduced bit-width) and hardware (approximate multiplication in the processing engine) design approach. A full-chip design in 130nm CMOS shows that, compared to a baseline accurate design, the proposed approach reduces system power by ~38% with only 0.4% lower recognition accuracy in a classification problem.
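
As an illustration of the selection idea described above, the sketch below ranks synaptic weights by the magnitude of their backpropagated gradients and quantizes the least sensitive fraction to a reduced bit-width; the fraction, bit-width, and quantization scheme are assumptions, not the paper's exact algorithm.

# Hedged sketch of the selection step: rank synaptic weights by backpropagated
# error sensitivity (|gradient|) and quantize the least sensitive fraction to a
# reduced bit-width. Fraction, bit-width, and quantization scheme are assumed.
import numpy as np

def quantize(w, bits):
    """Uniform fixed-point quantization to the given bit-width (assumed scheme)."""
    scale = np.max(np.abs(w)) or 1.0
    levels = 2 ** (bits - 1) - 1
    return np.round(w / scale * levels) / levels * scale

def approximate_low_impact(weights, grads, frac=0.6, low_bits=4):
    """Quantize the `frac` of weights with the smallest |gradient| to `low_bits` bits."""
    w = weights.copy()
    impact = np.abs(grads).ravel()
    k = int(frac * impact.size)
    idx = np.argsort(impact)[:k]        # synapses with least impact on output error
    flat = w.ravel()                    # view into the copy
    flat[idx] = quantize(flat[idx], low_bits)
    return w

# Example with hypothetical weights and gradients for one layer.
W = np.random.randn(256, 128)
G = np.random.randn(256, 128) * 0.01    # stand-in for backpropagated gradients
W_approx = approximate_low_impact(W, G)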


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2015

On the Impact of Energy-Accuracy Tradeoff in a Digital Cellular Neural Network for Image Processing

Jaeha Kung; Duckhwan Kim; Saibal Mukhopadhyay

This paper studies the opportunities for energy-accuracy tradeoff in the cellular neural network (CNN). The algorithmic characteristics of the CNN are coupled with the hardware-induced error distribution of a digital CNN cell to evaluate the energy-accuracy tradeoff for simple image processing tasks as well as a complex application. The analysis shows that errors modulate the cell dynamics and propagate through the network, degrading the output quality and increasing the convergence time. The error propagation is determined by the task being performed by the CNN, specifically, the strength of the feedback template. Controlling precision is observed to be a more effective approach to the energy-accuracy tradeoff in the CNN than voltage overscaling.
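
For readers unfamiliar with cellular neural network dynamics, the sketch below integrates the standard cell equation x' = -x + A*y + B*u + z with additive noise whose magnitude scales with an assumed bit precision, illustrating how the feedback template A lets precision errors recirculate through the network; the templates, noise model, and parameters are assumptions, not the paper's setup.

# Hedged sketch of standard discrete-time cellular-NN dynamics with precision
# noise injected into each cell update. Not the paper's error model.
import numpy as np

def saturate(x):                       # standard cellular-NN output nonlinearity
    return 0.5 * (np.abs(x + 1) - np.abs(x - 1))

def conv2_same(img, tmpl):
    """3x3 neighborhood weighted sum with zero padding."""
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    for dr in range(3):
        for dc in range(3):
            out += tmpl[dr, dc] * padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out

def run_cnn(u, A, B, z, bits, steps=50, dt=0.1):
    """Euler-integrate the cell dynamics; quantization noise scales as 2**-(bits-1)."""
    x = np.zeros_like(u)
    noise_lsb = 2.0 ** (-(bits - 1))
    for _ in range(steps):
        y = saturate(x)
        dx = -x + conv2_same(y, A) + conv2_same(u, B) + z
        dx += np.random.uniform(-noise_lsb, noise_lsb, size=x.shape)  # precision error
        x = x + dt * dx
    return saturate(x)

# Example: edge-detection-like templates (assumed) evaluated at two precisions.
u = np.sign(np.random.randn(32, 32))
A = np.array([[0, 0, 0], [0, 2.0, 0], [0, 0, 0]])
B = np.array([[-1, -1, -1], [-1, 8.0, -1], [-1, -1, -1]])
y_hi, y_lo = run_cnn(u, A, B, -0.5, bits=12), run_cnn(u, A, B, -0.5, bits=4)
print("output difference:", np.mean(np.abs(y_hi - y_lo)))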


International Symposium on Low Power Electronics and Design | 2016

Dynamic Approximation with Feedback Control for Energy-Efficient Recurrent Neural Network Hardware

Jaeha Kung; Duckhwan Kim; Saibal Mukhopadhyay

This paper presents a methodology of feedback-controlled dynamic approximation to enable an energy-accuracy trade-off in digital recurrent neural networks (RNNs). A low-power digital RNN engine is presented that employs the proposed dynamic approximation. The on-chip feedback controller is realized with either a hysteretic or a proportional controller, and dynamic adaptation of bit precision during the RNN computation is selected as the approximation mechanism. Across various applications, the digital RNN engine designed in 28nm CMOS shows ~36% average energy saving compared to the baseline case, with only ~4% accuracy degradation on average.
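
A minimal sketch of the feedback-control idea, assuming a hysteretic controller with hypothetical thresholds: precision is raised when an observed error proxy exceeds an upper bound and lowered when it stays below a lower bound. A proportional controller would instead scale the precision change with the proxy; neither sketch reflects the paper's on-chip implementation.

# Hedged sketch of feedback-controlled precision adaptation (not the paper's RTL):
# a hysteretic controller raises bit precision when the error proxy exceeds an
# upper threshold and lowers it when the proxy stays below a lower threshold.
# Thresholds, precision bounds, and the error proxy itself are assumptions.
class HystereticPrecisionController:
    def __init__(self, bits=8, min_bits=4, max_bits=16, lo=0.01, hi=0.05):
        self.bits, self.min_bits, self.max_bits = bits, min_bits, max_bits
        self.lo, self.hi = lo, hi                 # hysteresis band on the error proxy

    def update(self, error_proxy):
        """Return the bit precision to use for the next RNN time step."""
        if error_proxy > self.hi and self.bits < self.max_bits:
            self.bits += 2                        # quality too low: add precision
        elif error_proxy < self.lo and self.bits > self.min_bits:
            self.bits -= 2                        # quality margin: save energy
        return self.bits

# Example: feed a synthetic error trace through the controller.
ctrl = HystereticPrecisionController()
trace = [0.002, 0.004, 0.02, 0.08, 0.09, 0.03, 0.004, 0.001]
print([ctrl.update(e) for e in trace])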


International Joint Conference on Neural Networks | 2016

ReRAM Crossbar based Recurrent Neural Network for human activity detection

Yun Long; Eui Min Jung; Jaeha Kung; Saibal Mukhopadhyay

We present a programmable, highly efficient recurrent neural network (RNN) design with synapses implemented using resistive random access memory (ReRAM). The presented ReRAM-RNN employs crossbar ReRAM arrays as synapses. A fast synapse programming methodology is realized by CMOS-based neurons with built-in programming circuitry. The simulations are performed using an experimentally verified physical resistive switching model, instead of only functional models, providing a better estimate of system speed and power efficiency. Simulation results show that the ReRAM-RNN can provide higher computational efficiency and/or a more compact design than a software realization of the RNN and than dedicated CMOS-based digital and analog RNNs. We show that the efficiency improvement of the ReRAM-based neural network design is more significant in feedback networks than in feedforward networks.
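
The crossbar principle behind the design can be sketched in a few lines: row voltages encode the input, cell conductances encode the weights, and each column current is an analog dot product. The differential (G+ minus G-) encoding of signed weights and the conductance range below are assumptions, and the model ignores wire resistance and device nonidealities.

# Hedged, idealized sketch of a ReRAM crossbar matrix-vector multiply.
import numpy as np

def weights_to_conductances(W, g_min=1e-6, g_max=1e-4):
    """Map signed weights onto two positive conductance arrays plus a readout scale."""
    w_max = np.max(np.abs(W)) or 1.0
    scale = (g_max - g_min) / w_max
    g_pos = g_min + scale * np.clip(W, 0, None)
    g_neg = g_min + scale * np.clip(-W, 0, None)
    return g_pos, g_neg, scale

def crossbar_matvec(v_in, g_pos, g_neg, scale):
    """Column currents I = V @ (G+ - G-); dividing by `scale` models the readout gain."""
    return (v_in @ (g_pos - g_neg)) / scale

# Example: one recurrent step h_t = tanh(x_t @ Wx + h_prev @ Wh) using two crossbars.
Wx, Wh = np.random.randn(64, 32), np.random.randn(32, 32)
x_t, h_prev = np.random.rand(64), np.zeros(32)
h_t = np.tanh(crossbar_matvec(x_t, *weights_to_conductances(Wx)) +
              crossbar_matvec(h_prev, *weights_to_conductances(Wh)))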


Signal Processing Systems | 2018

Efficient Object Detection Using Embedded Binarized Neural Networks

Jaeha Kung; David C. Zhang; Gooitzen S. van der Wal; Sek M. Chai; Saibal Mukhopadhyay

Memory performance is a key bottleneck for deep learning systems. Binarizing both activations and weights is a promising approach for scaling to the most energy-efficient systems by using the lowest possible precision. In this paper, we utilize and analyze binarized neural networks for human detection on infrared images. Our results show algorithmic performance comparable to 32-bit floating-point networks, with the added benefits of greatly simplified computation and reduced memory overhead. In addition, we present a system architecture designed specifically for computation using binary representations that achieves at least a 4× speedup and a three-orders-of-magnitude energy improvement over a GPU.
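
The computational simplification comes from the fact that, with both operands constrained to {-1, +1}, a dot product reduces to an XNOR followed by a popcount on packed bit words. The sketch below verifies this identity against a floating-point dot product; the packing scheme is illustrative, not the paper's hardware format.

# Hedged sketch of the XNOR-popcount dot product enabled by binarization.
import numpy as np

def pack_bits(x):
    """Encode a {-1,+1} vector as packed uint8 words (1 bit per element)."""
    return np.packbits((x > 0).astype(np.uint8))

def binary_dot(packed_a, packed_b, length):
    """matches = popcount(~(a ^ b)); dot product = 2*matches - length."""
    xnor = np.unpackbits(~(packed_a ^ packed_b))[:length]
    matches = int(xnor.sum())
    return 2 * matches - length

# Check against the ordinary floating-point dot product.
n = 1024
a = np.sign(np.random.randn(n)).astype(np.int8)
b = np.sign(np.random.randn(n)).astype(np.int8)
assert binary_dot(pack_bits(a), pack_bits(b), n) == int(np.dot(a, b))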


Design, Automation, and Test in Europe | 2017

Adaptive weight compression for memory-efficient neural networks

Jong Hwan Ko; Duckhwan Kim; Taesik Na; Jaeha Kung; Saibal Mukhopadhyay

Neural networks generally require significant memory capacity and bandwidth to store and access a large number of synaptic weights. This paper presents an application of JPEG image encoding to compress the weights by exploiting the spatial locality and smoothness of the weight matrix. To minimize the loss of accuracy due to JPEG encoding, we propose to adaptively control the quantization factor of the JPEG algorithm depending on the error sensitivity (gradient) of each weight. With this adaptive compression technique, weight blocks with higher sensitivity are compressed less for higher accuracy. The adaptive compression reduces the memory requirement, which in turn results in higher performance and lower energy for neural network hardware. Simulation of inference hardware for a multilayer perceptron on the MNIST dataset shows up to 42X compression with less than 1% loss of recognition accuracy, resulting in 3X higher effective memory bandwidth and ∼19X lower system energy.
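
A minimal sketch of the gradient-adaptive idea, assuming 8x8 blocks and a simple step-scaling rule (neither taken from the paper): each weight block is transformed with a 2D DCT and quantized with a step that shrinks as the block's average gradient magnitude grows, so sensitive blocks are compressed less.

# Hedged sketch of gradient-adaptive block quantization in the DCT domain
# (JPEG-like, not the paper's exact codec).
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, q_step):
    """DCT, uniform quantization, inverse DCT; returns the lossy reconstruction."""
    coeffs = dctn(block, norm="ortho")
    quantized = np.round(coeffs / q_step) * q_step
    return idctn(quantized, norm="ortho")

def adaptive_compress(W, G, block=8, base_step=0.05):
    """Quantize each block with a step inversely scaled by its gradient sensitivity."""
    out = np.empty_like(W)
    sens_norm = np.mean(np.abs(G)) or 1.0
    for r in range(0, W.shape[0], block):
        for c in range(0, W.shape[1], block):
            blk = W[r:r + block, c:c + block]
            sens = np.mean(np.abs(G[r:r + block, c:c + block])) / sens_norm
            q_step = base_step / (1.0 + sens)      # sensitive blocks: finer step
            out[r:r + block, c:c + block] = compress_block(blk, q_step)
    return out

# Example with hypothetical weights and gradients (shapes divisible by the block size).
W = np.random.randn(128, 64)
G = np.abs(np.random.randn(128, 64)) * 0.01
W_hat = adaptive_compress(W, G)
print("mean reconstruction error:", np.mean(np.abs(W - W_hat)))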


IEEE Transactions on Emerging Topics in Computing | 2017

A Power-Aware Digital Multilayer Perceptron Accelerator with On-Chip Training Based on Approximate Computing

Duckhwan Kim; Jaeha Kung; Saibal Mukhopadhyay

This paper shows that approximation, by reducing bit precision and using an inexact multiplier, can reduce the power consumption of a digital multilayer perceptron accelerator during MNIST classification (inference) with negligible accuracy degradation. Based on the error sensitivity precomputed during training, synaptic weights with lower sensitivity are approximated. Given a set of bit-precision modes, the proposed algorithm determines the bit precision for every synapse to minimize power consumption under a target accuracy. Across the network, earlier layers can be approximated more aggressively since they have lower error sensitivity. The proposed algorithm saves 57.4 percent of power while degrading accuracy by about 1.7 percent. After approximation, retraining with a few iterations can improve the accuracy while maintaining the power consumption. The impact of different training conditions on the approximation is also studied: training with small quantization error allows more power saving during inference, and a sufficient number of training iterations is important for approximation in inference. Networks with more layers are more sensitive to the approximation.
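
The precision assignment (simplified here from per-synapse to per-layer) can be sketched as a greedy search: repeatedly lower the precision of the least costly layer while an estimated accuracy loss stays within budget. The sensitivity values, loss model, and power model below are assumptions used only to make the loop concrete; they are not the paper's algorithm.

# Hedged sketch of layer-wise precision assignment as a greedy heuristic.
def assign_precisions(sensitivities, modes=(16, 12, 8, 6, 4), max_loss=1.7):
    """Return a per-layer bit-precision list chosen from the allowed `modes`."""
    bits = [modes[0]] * len(sensitivities)

    def est_loss(b):
        # Assumed model: accuracy loss grows with sensitivity and with dropped bits.
        return sum(s * (modes[0] - bi) for s, bi in zip(sensitivities, b))

    def est_power(b):
        # Assumed model: multiplier power roughly quadratic in bit width.
        return sum(bi ** 2 for bi in b)

    improved = True
    while improved:
        improved = False
        # Try lowering each layer by one mode; keep the cheapest move within budget.
        candidates = []
        for i, bi in enumerate(bits):
            if bi > modes[-1]:
                trial = bits.copy()
                trial[i] = modes[modes.index(bi) + 1]
                if est_loss(trial) <= max_loss:
                    candidates.append((est_power(trial), trial))
        if candidates:
            bits = min(candidates)[1]
            improved = True
    return bits

# Example: earlier layers assumed less sensitive, so they end up more approximated.
print(assign_precisions([0.05, 0.15, 0.40]))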


IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference | 2016

NeuroSensor: A 3D image sensor with integrated neural accelerator

Mohammad Faisal Amir; Duckhwan Kim; Jaeha Kung; D. Lie; Sudhakar Yalamanchili; Saibal Mukhopadhyay

3D integration provides opportunities to design high-bandwidth and low-power CMOS image sensors (CIS) [1–4]. The 3D stacking of a pixel tier, a peripheral tier, memory tier(s), and compute tier(s) enables a high degree of parallel processing, and each tier can be designed in a different technology node (heterogeneous integration) to further improve power efficiency. This paper presents a case study of a smart 3D image sensor with integrated neuro-inspired computing for intelligent vision processing. Hardware acceleration of neuro-inspired computing has received much attention in recent years for recognition and classification [5]. We present the physical design of NeuroSensor, a 3D CIS with an integrated convolutional neural network (CNN) accelerator. The rationale for our approach is that 3D integration of sensing, memory, and computing can effectively harness the inherent parallelism of neural algorithms. We design the NeuroSensor considering different complexities of the CNN platform, ranging from feature extraction only to complete classification, and study the trade-offs between complexity, performance, and power.
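
To give a feel for the complexity range studied, the sketch below counts multiply-accumulate operations for a hypothetical small CNN in the two on-sensor configurations, feature extraction only versus full classification; the layer shapes are assumptions and do not come from the paper.

# Hedged sketch of the complexity trade-off (hypothetical layer shapes).
def conv_macs(h, w, c_in, c_out, k):
    """MACs for one convolutional layer with stride 1 and 'same' output size."""
    return h * w * c_in * c_out * k * k

def fc_macs(n_in, n_out):
    return n_in * n_out

# Assumed small CNN operating on a 128x128 image from the pixel tier.
feature_macs = (conv_macs(128, 128, 1, 8, 5) +
                conv_macs(64, 64, 8, 16, 5))          # after 2x2 pooling
classifier_macs = fc_macs(32 * 32 * 16, 128) + fc_macs(128, 10)

print("feature extraction only :", feature_macs, "MACs/frame")
print("full classification     :", feature_macs + classifier_macs, "MACs/frame")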


IEEE Transactions on Components, Packaging and Manufacturing Technology | 2015

Post-Silicon Estimation of Spatiotemporal Temperature Variations Using MIMO Thermal Filters

Jaeha Kung; Wen Yueh; Sudhakar Yalamanchili; Saibal Mukhopadhyay

This paper experimentally demonstrates a methodology for proactive estimation of spatiotemporal variations in the junction temperature of a silicon chip using multi-input multi-output (MIMO) thermal filters. The presented approach performs on-chip measurements to estimate the relations between power and temperature variations in the frequency domain and thereby construct a MIMO thermal filter. The extracted filter is then used to predict spatiotemporal temperature variations from a known power pattern, even at locations without temperature sensors. The accuracy of the proposed approach is verified on a thermal emulator designed in 130-nm CMOS technology with on-chip, digitally controllable power (heat) generators and temperature sensors. Using the proposed MIMO thermal filter, spatiotemporal temperature variations are accurately estimated within a small error bound, even at locations with no temperature sensors.
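
The filter-identification and prediction steps can be sketched as follows, assuming one heater is excited per calibration run and ideal measurements (both assumptions for illustration): transfer functions are estimated in the frequency domain as the ratio of temperature to power spectra, and a new power pattern is filtered through them to predict temperatures at every observation point, including those without sensors.

# Hedged sketch of MIMO thermal-filter identification and prediction.
import numpy as np

def identify_filter(power_traces, temp_traces, eps=1e-9):
    """H[i, j, f] ~= T_i(f) / P_j(f), estimated with one heater active per run."""
    n_heat, n_loc, n = power_traces.shape[0], temp_traces.shape[1], power_traces.shape[-1]
    H = np.zeros((n_loc, n_heat, n // 2 + 1), dtype=complex)
    for j in range(n_heat):                       # calibration run j: only heater j active
        P = np.fft.rfft(power_traces[j, j])
        for i in range(n_loc):
            T = np.fft.rfft(temp_traces[j, i])
            H[i, j] = T / (P + eps)
    return H

def predict_temperature(H, power_pattern):
    """Apply the MIMO filter: T_i(f) = sum_j H[i, j, f] * P_j(f)."""
    P = np.fft.rfft(power_pattern, axis=-1)       # shape (n_heaters, n_freq)
    T = np.einsum("ijf,jf->if", H, P)
    return np.fft.irfft(T, n=power_pattern.shape[-1], axis=-1)

# Example with synthetic calibration data: 4 heaters, 6 observation points, 256 samples.
rng = np.random.default_rng(0)
n_heat, n_loc, n = 4, 6, 256
power_cal = np.zeros((n_heat, n_heat, n))
for j in range(n_heat):
    power_cal[j, j] = rng.random(n)               # only heater j toggles in run j
true_H = rng.random((n_loc, n_heat)) * 0.5        # assumed static thermal coupling
temp_cal = np.einsum("ij,kjn->kin", true_H, power_cal)
H = identify_filter(power_cal, temp_cal)
T_pred = predict_temperature(H, rng.random((n_heat, n)))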

Collaboration


Dive into Jaeha Kung's collaborations.

Top Co-Authors

Saibal Mukhopadhyay (Georgia Institute of Technology)
Duckhwan Kim (Georgia Institute of Technology)
Sudhakar Yalamanchili (Georgia Institute of Technology)
Jong Hwan Ko (Georgia Institute of Technology)
Taesik Na (Georgia Institute of Technology)
Yun Long (Georgia Institute of Technology)
Mohammad Faisal Amir (Georgia Institute of Technology)
Amit Ranjan Trivedi (University of Illinois at Chicago)