Taesik Na
Georgia Institute of Technology
Publication
Featured research published by Taesik Na.
international solid-state circuits conference | 2012
Kyo-Min Sohn; Taesik Na; In-Dal Song; Yong Shim; Won-Il Bae; Sanghee Kang; Dongsu Lee; Hangyun Jung; Hanki Jeoung; Ki-Won Lee; Junsuk Park; Jongeun Lee; Byung-Hyun Lee; Inwoo Jun; Ju-Seop Park; Junghwan Park; Hundai Choi; Sang Hee Kim; Haeyoung Chung; Young Choi; Dae-Hee Jung; Jang Seok Choi; Byung-sick Moon; Jung-Hwan Choi; Byung-Chul Kim; Seong-Jin Jang; Joo Sun Choi; Kyung Seok Oh
Higher-performance DRAM is required by the market due to increasing network bandwidth and the rise of high-capacity multimedia content. DDR4 SDRAM is the next-generation memory that meets these demands in computing and server systems. Compared with current DDR3 memory, the major changes are a supply voltage reduction to 1.2V, a pseudo-open-drain I/O interface, and a data-rate increase from 1.6 to 3.2Gb/s. To achieve high performance at low supply voltage and reduce power consumption, this work introduces new functions and describes their implementation. Data bus inversion (DBI) is employed for high-speed transactions to reduce I/O power consumption and simultaneous switching noise (SSN). Dual error detection, which adopts a cyclic redundancy check (CRC) for DQ and command/address (CA) parity, is designed to guarantee reliable transmission. GDDR5 memory also has DBI and CRC functions [1], but in this work, these schemes are implemented in a way that reduces area overhead and timing penalty. Besides these error-check functions, an enhanced gain buffer and a PVT-tolerant fetch scheme improve basic receiving ability. To meet the output jitter requirements of DDR4 SDRAM, the type of delay line for the DLL is selected at the initialization stage according to the data rate.
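As a rough illustration of the DC-balance flavor of data bus inversion over a pseudo-open-drain interface, the Python sketch below inverts any byte that would drive more than four zeros and flags it on a DBI bit; the byte-wise framing and helper name are illustrative, not the chip's implementation.

```python
def dbi_dc_encode(byte_lanes):
    """DC-balance data bus inversion: with pseudo-open-drain termination a
    driven '0' burns DC current, so any byte with more than four zeros is
    inverted and flagged on the DBI bit."""
    encoded = []
    for byte in byte_lanes:
        zeros = sum(1 for bit in range(8) if not (byte >> bit) & 1)
        if zeros > 4:
            encoded.append(((~byte) & 0xFF, 1))   # send inverted, DBI = 1
        else:
            encoded.append((byte, 0))             # send as-is, DBI = 0
    return encoded

# 0x01 has seven zeros, so it goes out as 0xFE with DBI asserted.
print(dbi_dc_encode([0x01, 0xFF, 0x0F]))
```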
design automation conference | 2017
Jong Hwan Ko; Burhan Ahmad Mudassar; Taesik Na; Saibal Mukhopadhyay
Convolutional neural networks (CNNs) impose high computation and memory demands during training. This paper presents the design of a frequency-domain accelerator for energy-efficient CNN training. With Fourier representations of parameters, we replace convolutions with simpler pointwise multiplications. To eliminate the Fourier transforms at every layer, we train the network entirely in the frequency domain using approximate frequency-domain nonlinear operations. We further reduce computation and memory requirements using sinc interpolation and Hermitian symmetry. The accelerator is designed and synthesized in 28nm CMOS, as well as prototyped in an FPGA. The simulation results show that the proposed accelerator significantly reduces training time and energy for a target recognition accuracy.
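The core idea of replacing convolution with pointwise multiplication follows from the convolution theorem; a minimal NumPy sketch is shown below (circular convolution only, without the paper's approximate nonlinearities, sinc interpolation, or Hermitian-symmetry savings).

```python
import numpy as np

def conv2d_via_fft(image, kernel):
    """Circular 2-D convolution as an element-wise product in the frequency
    domain (convolution theorem): FFT both operands, multiply pointwise,
    inverse-FFT the result."""
    H, W = image.shape
    K = np.fft.fft2(kernel, s=(H, W))       # zero-pad the kernel to image size
    I = np.fft.fft2(image)
    return np.real(np.fft.ifft2(I * K))

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
ker = rng.standard_normal((3, 3))

# Reference: direct circular convolution
ref = np.zeros_like(img)
for i in range(8):
    for j in range(8):
        for di in range(3):
            for dj in range(3):
                ref[i, j] += ker[di, dj] * img[(i - di) % 8, (j - dj) % 8]

print(np.allclose(conv2d_via_fft(img, ker), ref))   # True
```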
international symposium on low power electronics and design | 2016
Taesik Na; Saibal Mukhopadhyay
Training a convolutional neural network is a major bottleneck when developing a new network topology. This paper presents a dynamic precision scaling (DPS) algorithm and a flexible multiplier-accumulator (MAC) to speed up convolutional neural network training. The DPS algorithm utilizes dynamic fixed point and finds sufficient numerical precision for the target network during training. The precision information from DPS is used to configure our proposed MAC. The proposed MAC performs fixed-point computation in a variable-precision mode, with computation time that scales with precision, so lower-precision operations run faster and speed up training. Simulation results show that our work achieves a 5.7x speed-up while consuming 31% of the baseline energy for a modified AlexNet on the Flickr image style recognition task.
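A minimal sketch of the dynamic fixed-point format and an overflow-driven precision adjustment in the spirit of DPS is given below; the word length, overflow threshold, and adjustment policy are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def to_dynamic_fixed_point(x, word_bits, frac_bits):
    """Quantize a tensor to dynamic fixed point: word_bits total bits and a
    shared fractional length frac_bits, saturating on overflow. Also report
    the fraction of values that hit the saturation limit."""
    scale = 2.0 ** frac_bits
    qmax = 2 ** (word_bits - 1) - 1
    scaled = np.round(x * scale)
    overflow_rate = np.mean(np.abs(scaled) > qmax)
    return np.clip(scaled, -qmax - 1, qmax) / scale, overflow_rate

def adjust_frac_bits(frac_bits, overflow_rate, threshold=0.01):
    """Shrink the fractional length (more integer range) when too many values
    saturate; grow it (more resolution) when nothing saturates."""
    if overflow_rate > threshold:
        return frac_bits - 1
    if overflow_rate == 0.0:
        return frac_bits + 1
    return frac_bits

x = np.random.default_rng(1).standard_normal(1024) * 4.0
fb = 6
xq, ovf = to_dynamic_fixed_point(x, word_bits=8, frac_bits=fb)
fb = adjust_frac_bits(fb, ovf)
print(ovf, fb)   # high overflow at frac_bits=6, so the format shifts to 5
```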
design, automation, and test in europe | 2016
Taesik Na; Saibal Mukhopadhyay
Timing error due to power supply noise (PSN) is a key challenge in the design of digital systems. This paper presents an accurate time-domain behavioral model of timing slack variation due to PSN while accounting for clock-data compensation (CDC). The accuracy of the model is verified against SPICE for complex designs, including an AES engine and a LEON3 processor. As a case study, the model is used for time-domain co-simulation of a power distribution network (PDN) and a LEON3 processor with circuit-based noise tolerance techniques. The analysis shows that the model helps reduce pessimism in estimated timing slack by considering the effects of PSN and CDC.
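A toy, first-order version of such a time-domain slack model is sketched below; the linear delay-sensitivity coefficients and the damped-sinusoid droop are illustrative assumptions, not the verified model from the paper.

```python
import numpy as np

def slack_trace(T0, d_data0, d_clk0, s_data, s_clk, droop):
    """Cycle-by-cycle timing slack under a supply-droop trace droop[k] (volts).
    The data path launched at cycle k slows with droop[k]; the capture clock
    edge is pushed out by the change in clock-path delay between consecutive
    cycles, which is the clock-data compensation (CDC) credit."""
    slack = []
    for k in range(len(droop) - 1):
        d_data = d_data0 * (1.0 + s_data * droop[k])
        cdc_credit = d_clk0 * s_clk * (droop[k + 1] - droop[k])
        slack.append(T0 + cdc_credit - d_data)
    return np.array(slack)

# Damped-sinusoid droop, e.g. a load step exciting the PDN resonance
k = np.arange(60)
droop = 0.08 * np.exp(-k / 25.0) * np.maximum(0.0, np.sin(2 * np.pi * k / 12.0))
print(slack_trace(1.0, 0.8, 0.4, s_data=1.0, s_clk=1.0, droop=droop).min())
```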
design, automation, and test in europe | 2017
Jong Hwan Ko; Duckhwan Kim; Taesik Na; Jaeha Kung; Saibal Mukhopadhyay
Neural networks generally require significant memory capacity/bandwidth to store/access a large number of synaptic weights. This paper presents an application of JPEG image encoding to compress the weights by exploiting the spatial locality and smoothness of the weight matrix. To minimize the loss of accuracy due to JPEG encoding, we propose to adaptively control the quantization factor of the JPEG algorithm depending on the error sensitivity (gradient) of each weight. With the adaptive compression technique, weight blocks with higher sensitivity are compressed less for higher accuracy. The adaptive compression reduces the memory requirement, which in turn results in higher performance and lower energy of neural network hardware. Simulation of inference hardware for a multilayer perceptron on the MNIST dataset shows up to 42X compression with less than 1% loss of recognition accuracy, resulting in 3X higher effective memory bandwidth and ∼19X lower system energy.
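The adaptive control step can be pictured as mapping each weight block's gradient magnitude to a JPEG quality factor; the sketch below shows only that mapping (the block size and quality range are illustrative), with the actual JPEG coding of each block omitted.

```python
import numpy as np

def per_block_quality(weight_grad, block=8, q_min=20, q_max=95):
    """Map each weight block's error sensitivity (mean |gradient|) to a JPEG
    quality factor: sensitive blocks get high quality (compressed less),
    insensitive blocks are compressed aggressively."""
    H, W = weight_grad.shape
    rows, cols = H // block, W // block
    sens = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            g = weight_grad[r*block:(r+1)*block, c*block:(c+1)*block]
            sens[r, c] = np.mean(np.abs(g))
    # Normalize sensitivities to [0, 1] and map linearly onto [q_min, q_max].
    norm = (sens - sens.min()) / (sens.max() - sens.min() + 1e-12)
    return np.round(q_min + norm * (q_max - q_min)).astype(int)

grad = np.random.default_rng(2).standard_normal((32, 32))
print(per_block_quality(grad))  # one quality factor per 8x8 weight block
```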
international symposium on neural networks | 2017
Taesik Na; Jong Hwan Ko; Jaeha Kung; Saibal Mukhopadhyay
Training of neural networks can be accelerated with limited numerical precision and specialized low-precision hardware. This paper studies how low precision impacts end-to-end training of recurrent neural networks (RNNs). We emulate low-precision training for the recently proposed gated recurrent unit (GRU) and use dynamic fixed point as the target numeric format. We first show that batch normalization on input sequences can help speed up training at low precision as well as at high precision. We also show that the overflow rate should be carefully controlled for dynamic fixed point. We study low-precision training with various rounding options, including bit truncation, round to nearest, and stochastic rounding. Stochastic rounding shows superior results to the other options. The effect of fully low-precision training is also analyzed by comparing it with partially low-precision training. We show that the piecewise linear activation function with stochastic rounding can achieve training results comparable to floating-point precision. A low-precision multiplier-accumulator (MAC) with a linear-feedback shift register (LFSR) is implemented with a 28nm Synopsys PDK for energy and performance analysis. Implementation results show the low-precision hardware is 4.7x faster, and its energy per task is up to 4.55x lower than that of floating-point hardware.
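A minimal sketch of stochastic rounding onto a dynamic fixed-point grid is shown below; it illustrates why stochastic rounding is unbiased in expectation, and the bit widths are illustrative.

```python
import numpy as np

def stochastic_round_fixed(x, frac_bits, word_bits, rng):
    """Round onto a fixed-point grid with frac_bits fractional bits, rounding
    up with probability equal to the fractional remainder so the result is
    unbiased in expectation, then saturate to word_bits."""
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    q = floor + (rng.random(x.shape) < (scaled - floor))
    qmax = 2 ** (word_bits - 1) - 1
    return np.clip(q, -qmax - 1, qmax) / scale

rng = np.random.default_rng(0)
x = np.full(100000, 0.30)
# Round-to-nearest with 2 fractional bits always yields 0.25, but stochastic
# rounding averages back to roughly 0.30 over many samples.
print(stochastic_round_fixed(x, frac_bits=2, word_bits=8, rng=rng).mean())
```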
advanced video and signal based surveillance | 2016
Jong Hwan Ko; Taesik Na; Saibal Mukhopadhyay
This paper presents a lightweight video sensor node for moving-object surveillance using region-of-interest (ROI) based coding and an on-line multi-parameter rate controller. The proposed ROI-based coding scheme determines ROI blocks, pre-processes non-ROI blocks using bit truncation, and encodes all blocks using Motion JPEG. The on-line rate controller modulates the parameters of the ROI-based coding scheme to match the encoded data rate to the transmission data rate under variations in channel bandwidth and input video content. The low-complexity hardware of the ROI-based coding scheme reduces computation energy, and the on-line rate controller minimizes the buffer requirement. The sensor node is designed in 130nm CMOS and prototyped in a Virtex-V FPGA. Simulations show that, under the same ROI quality, the proposed approach reduces system energy by 61% compared to H.264/AVC.
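A minimal sketch of the bit-truncation pre-processing for non-ROI blocks is given below; the block size and number of truncated bits are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def truncate_non_roi(frame, roi_mask, drop_bits=4, block=8):
    """Pre-process a frame block by block: ROI blocks keep full 8-bit depth,
    non-ROI blocks have their drop_bits least-significant bits cleared so the
    downstream (Motion-)JPEG stage spends fewer bits on the background."""
    out = frame.copy()
    keep_mask = ~(2 ** drop_bits - 1) & 0xFF
    H, W = frame.shape
    for r in range(0, H, block):
        for c in range(0, W, block):
            if not roi_mask[r // block, c // block]:
                out[r:r + block, c:c + block] &= keep_mask
    return out

frame = np.random.default_rng(3).integers(0, 256, (16, 16), dtype=np.uint8)
roi = np.zeros((2, 2), dtype=bool)
roi[0, 0] = True                          # only the top-left block is "moving"
truncated = truncate_non_roi(frame, roi, drop_bits=4)
print(int(truncated[8:, 8:].min()) % 16)  # 0: background low bits are cleared
```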
international midwest symposium on circuits and systems | 2017
Jong Hwan Ko; Yun Long; Mohammad Faisal Amir; Duckhwan Kim; Jaeha Kung; Taesik Na; Amit Ranjan Trivedi; Saibal Mukhopadhyay
Enhancing energy/resource efficiency of neural networks is critical to support on-chip neural image processing at Internet-of-Things edge devices. This paper presents recent technology advancements towards energy-efficient neural image processing. 3D integration of image sensor and neural network improves power-efficiency with programmability and scalability. Computation energy of feedforward and recurrent neural networks is reduced by dynamic control of approximation, and storage demand is reduced by image-based adaptive weight compression. Emerging devices such as tunnel FET and Resistive Random Access Memory are utilized to achieve higher computation efficiency than CMOS-based designs.
design, automation, and test in europe | 2017
Taesik Na; Jong Hwan Ko; Saibal Mukhopadhyay
Adaptive clock generation that tracks critical-path delay enables lowering the supply voltage with improved timing slack under supply noise. This paper presents how to synthesize the clock tree under adaptive clocking to fully exploit the clock-data compensation (CDC) effect in digital circuits. The paper first provides an analytical proof of the ideal CDC effect for ring-oscillator-based clock generation. Second, it analyzes the non-ideal CDC effect for a gate-dominated critical path and a wire-dominated clock tree. By analyzing timing slack under power supply noise (PSN), it shows that the delay-sensitivity mismatch between the clock tree and the critical path can degrade the CDC effect. Finally, it proposes a simple but efficient clock tree synthesis (CTS) technique to maximize timing slack under PSN in digital circuits with adaptive clock generation.
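The sensitivity-matching argument can be illustrated with a toy slack model, sketched below: when the clock path's delay sensitivity matches the critical path's, the stretched cycle tracks the slowed logic, while a less sensitive (wire-dominated) clock tree leaves the slack exposed. The numbers are illustrative.

```python
def slack_adaptive_clock(T0, d_data0, s_data, s_clk, dv):
    """Toy slack model for adaptive (ring-oscillator based) clocking: the
    cycle time stretches with the clock-path sensitivity s_clk while the
    data path slows with s_data. Matched sensitivities preserve slack under
    a droop dv; a less sensitive (wire-dominated) clock tree does not."""
    T = T0 * (1.0 + s_clk * dv)
    d_data = d_data0 * (1.0 + s_data * dv)
    return T - d_data

T0, d0 = 1.0, 0.9                     # 1 ns cycle, 0.9 ns critical path
print(slack_adaptive_clock(T0, d0, s_data=1.5, s_clk=1.5, dv=0.1))  # matched: ~0.115 ns
print(slack_adaptive_clock(T0, d0, s_data=1.5, s_clk=0.5, dv=0.1))  # mismatched: ~0.015 ns
```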
IEEE Transactions on Circuits and Systems | 2017
Taesik Na; Jong Hwan Ko; Saibal Mukhopadhyay
Timing errors due to power supply noise (PSN) in digital circuits can be tolerated by adding voltage margins. Conservative voltage margins, however, waste power and reduce battery life in Internet of Things (IoT) devices. This paper aims to provide guidelines to avoid over-design due to PSN, especially for low-cost IoT devices. To this end, we first present an accurate time-domain behavioral model of timing slack variation due to PSN, accounting for clock-data compensation. The accuracy of the model is verified against SPICE for complex designs, including an AES engine and a LEON3 processor. To prove the effectiveness of the model for reducing voltage margin, we apply it in a standard VLSI design flow to several examples: timing slack versus noise frequency analysis, determining the optimal value of an on-die capacitor, analyzing the effects of time borrowing, and PVT variation simulations. The analysis shows that the model helps reduce pessimism in estimated timing slack.
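As a textbook-level starting point for the on-die capacitor question, the sketch below uses the first-order LC resonance set by package inductance and on-die decoupling capacitance; the component values are illustrative, and the paper's analysis relies on the full behavioral model rather than this formula.

```python
import math

def pdn_resonance_hz(l_henry, c_farad):
    """First-order resonance frequency of the power distribution network set
    by package inductance and on-die decoupling capacitance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def cap_for_target_resonance(l_henry, f_target_hz):
    """On-die capacitance needed to push the LC resonance down to f_target."""
    return 1.0 / (l_henry * (2.0 * math.pi * f_target_hz) ** 2)

L = 100e-12                                 # 100 pH package inductance (illustrative)
print(pdn_resonance_hz(L, 50e-9))           # about 71 MHz with 50 nF of on-die cap
print(cap_for_target_resonance(L, 50e6))    # about 101 nF for a 50 MHz resonance
```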