
Publication


Featured research published by Jinsu Lee.


International Solid-State Circuits Conference | 2017

14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks

Dongjoo Shin; Jinmook Lee; Jinsu Lee; Hoi-Jun Yoo

Recently, deep learning with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) has become ubiquitous across a wide range of applications. CNNs support vision recognition and processing, while RNNs recognize time-varying entities and support generative models. Combining CNNs and RNNs makes it possible to recognize time-varying visual entities, such as actions and gestures, and to support image captioning [1]. However, the computational requirements of CNNs are quite different from those of RNNs. Fig. 14.2.1 shows a computation and weight-size analysis of convolution layers (CLs), fully-connected layers (FCLs), and RNN-LSTM layers (RLs). While CLs require a massive amount of computation with a relatively small number of filter weights, FCLs and RLs require a relatively small amount of computation with a huge number of filter weights. Therefore, when FCLs and RLs are accelerated with SoCs specialized for CLs, they suffer from high memory transaction costs, low PE utilization, and a mismatch of computational patterns. Conversely, when CLs are accelerated with FCL- and RL-dedicated SoCs, they cannot exploit data reusability or achieve the required throughput. Prior work has accelerated either CLs [2–4] or FCLs and RLs [5], but no combined CNN-RNN processor has been reported. A highly reconfigurable CNN-RNN processor with high energy efficiency is therefore desirable to support general-purpose deep neural networks (DNNs).
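The CL/FCL asymmetry described above is easy to check with a back-of-the-envelope count. The following Python sketch uses hypothetical layer shapes (not taken from the paper) to contrast MAC operations against weight storage:

# Back-of-the-envelope comparison of compute vs. weight size for a
# convolution layer and a fully-connected layer. Layer shapes are
# hypothetical, chosen only to illustrate the asymmetry in the abstract.

def conv_layer_stats(h, w, c_in, c_out, k):
    """MACs and weight count for a k x k convolution over an h x w x c_in input."""
    weights = k * k * c_in * c_out
    macs = h * w * weights          # each output pixel reuses the same filter
    return macs, weights

def fc_layer_stats(n_in, n_out):
    """MACs and weight count for a fully-connected layer."""
    weights = n_in * n_out
    macs = weights                  # each weight is used exactly once per input
    return macs, weights

conv_macs, conv_w = conv_layer_stats(56, 56, 64, 128, 3)
fc_macs, fc_w = fc_layer_stats(4096, 4096)

print(f"conv: {conv_macs/1e6:.0f} M MACs, {conv_w/1e3:.0f} K weights")
print(f"fc:   {fc_macs/1e6:.0f} M MACs, {fc_w/1e3:.0f} K weights")
# conv: ~231 M MACs with only ~74 K weights (compute-bound, high reuse);
# fc:   ~17 M MACs but ~16.8 M weights (memory-bound, no reuse)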


International Solid-State Circuits Conference | 2016

14.3 A 0.55V 1.1mW artificial-intelligence processor with PVT compensation for micro robots

Youchang Kim; Dongjoo Shin; Jinsu Lee; Yongsu Lee; Hoi-Jun Yoo

Micro robots with artificial intelligence (AI) are being investigated for many applications, such as unmanned delivery services. The robots have enhanced controllers that realize AI functions, such as perception (information extraction) and cognition (decision making). Historically, controllers have been based on general-purpose CPUs, and only recently have a few perception SoCs been reported. SoCs with cognition capability have not been reported thus far, even though cognition is a key AI function for decision making in micro robots, especially autonomous drones. Path planning and obstacle avoidance require more than 10,000 searches within 50 ms for a fast response, but a software implementation running on a Cortex-M3 takes ~5 s to make a decision. Micro robots require 10× lower power and 100× faster decision making than conventional robots because of their fast movement, small form factor, and limited battery capacity. Therefore, an ultra-low-power, high-performance artificial-intelligence processor (AIP) is necessary for micro robots to make fast and smart maneuvers in dynamic environments filled with obstacles.
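A quick sanity check of the numbers quoted above, using only figures from the abstract:

# Path planning needs >10,000 searches within 50 ms, while a Cortex-M3
# software implementation takes ~5 s per decision (figures from the abstract).

searches_needed = 10_000
deadline_s = 0.050
required_rate = searches_needed / deadline_s
print(f"required search rate: {required_rate:,.0f} searches/s")   # 200,000/s

cortex_m3_decision_s = 5.0
speedup_needed = cortex_m3_decision_s / deadline_s
print(f"needed speedup over Cortex-M3 software: ~{speedup_needed:.0f}x")  # ~100x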


International Symposium on Circuits and Systems | 2016

A 17.5 fJ/bit energy-efficient analog SRAM for mixed-signal processing

Jinsu Lee; Dongjoo Shin; Youchang Kim; Hoi-Jun Yoo

An energy-efficient analog SRAM (A-SRAM) is proposed to eliminate redundant analog-to-digital (A/D) and digital-to-analog (D/A) conversions in mixed-signal processing, such as biomedical and neural network applications. D/A conversion is integrated into the SRAM readout through charge sharing on the proposed split bit-line (BL), and A/D conversion is integrated into the SRAM write through the successive approximation method. In addition, a new data structure is proposed to allocate each bit of the input data to the binary-weighted bit-cell array. The proposed A-SRAM is implemented in 65 nm CMOS technology. It achieves 17.5 fJ/bit read energy efficiency and 21 Gbit/s read throughput, which are 54% lower and 1.3× higher, respectively, than those of a conventional SRAM. The area is also reduced by 31% compared to a conventional SRAM with an ADC and a DAC.
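As a rough illustration of the split bit-line readout idea, the following behavioral Python model treats each stored bit as a binary-weighted capacitor and computes the voltage after charge sharing; the capacitor weighting and supply voltage are assumptions, not values from the paper:

# Behavioral sketch of charge-sharing D/A readout: each bit drives a
# binary-weighted bit-line capacitance, and shorting the split bit-lines
# yields an analog voltage proportional to the stored code. This is an
# idealized model, not the circuit itself.

VDD = 1.2  # supply voltage in volts (assumption)

def charge_share_readout(bits):
    """bits[0] is the MSB. Returns the shared analog voltage."""
    n = len(bits)
    caps = [2 ** (n - 1 - i) for i in range(n)]   # MSB gets the largest cap
    total_c = sum(caps)
    # Each bit precharges its capacitor to VDD (1) or GND (0); shorting
    # all capacitors together averages the charge.
    charge = sum(c * VDD * b for c, b in zip(caps, bits))
    return charge / total_c

print(f"{charge_share_readout([1, 0, 1, 1]):.3f}")  # code 1011 -> (11/15)*VDD = 0.880 V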


IEEE Micro | 2017

BRAIN: A Low-Power Deep Search Engine for Autonomous Robots

Youchang Kim; Dongjoo Shin; Jinsu Lee; Hoi-Jun Yoo

Autonomous robots are actively studied for many unmanned applications; however, heavy computational costs and limited battery capacity make it difficult to implement intelligent decision making in robots. In this article, the authors propose a low-power deep search engine (code-named “BRAIN”) for real-time path planning of intelligent autonomous robots. To achieve low power consumption while maintaining high performance, BRAIN adopts a multithreaded core architecture with a transposition table cache to detect and avoid duplicated searches between the processors at the deeper levels of the search tree. In addition, dynamic voltage and frequency scaling is adopted to minimize power consumption without any loss of performance, because the workload gradually decreases as the robot approaches the target position. BRAIN achieves a fast search speed (470,000 searches per second) and low energy consumption (79 nJ per search), and it has been successfully applied to robots for autonomous navigation without any collision in dynamic environments.
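The duplicate-search avoidance that the transposition table cache provides can be sketched in a few lines. The grid world, costs, and depth limit below are hypothetical placeholders; this is a minimal single-threaded illustration of the caching idea, not BRAIN's actual search:

# Depth-limited search with a transposition table: before expanding a node,
# check whether an equal-or-deeper result for the same state already exists.

GOAL = (4, 4)
transposition_table = {}  # state -> (depth searched, best cost found)

def successors(state):
    x, y = state
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

def search(state, depth):
    """Cheapest path cost from state to GOAL on a 5x5 grid, depth-limited."""
    hit = transposition_table.get(state)
    if hit is not None and hit[0] >= depth:
        return hit[1]                        # duplicate search avoided
    if state == GOAL:
        cost = 0
    elif depth == 0:
        cost = abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])  # heuristic leaf
    else:
        cost = 1 + min(search(s, depth - 1) for s in successors(state))
    transposition_table[state] = (depth, cost)
    return cost

print(search((0, 0), 8))  # 8 steps across the 5x5 grid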


IEEE Transactions on Circuits and Systems I: Regular Papers | 2018

A 0.55 V 1.1 mW Artificial Intelligence Processor With On-Chip PVT Compensation for Autonomous Mobile Robots

Youchang Kim; Dongjoo Shin; Jinsu Lee; Yongsu Lee; Hoi-Jun Yoo

Autonomous mobile robots are receiving a lot of attention for many applications, such as package delivery and smart surveillance; however, the heavy computational cost of intelligent decision making strains their limited battery capacity. In this paper, an ultra-low-power artificial intelligence processor (AIP) is proposed for real-time decision making in autonomous mobile robots. To achieve low power consumption while maintaining high performance, it adopts four key features: 1) an 8-thread tree search processor for real-time path planning; 2) a reinforcement learning accelerator for the avoidance of unexpected obstacles; 3) a 3-level transposition table cache for the reduction of duplicated computations; and 4) a PVT compensation circuit for stable operation at near-threshold voltage. The proposed 16 mm² AIP is fabricated using 65-nm triple-well CMOS technology. It consumes only 1.1 mW at a 0.55 V supply voltage and 7 MHz operating frequency, and 151 mW at a 1.2 V supply voltage and 245 MHz operating frequency. The AIP achieves a fast search speed (470,000 searches/s) and low energy consumption (79 nJ/search), and it is successfully applied to a battery-powered robot system for autonomous navigation without any collision in dynamic indoor environments.
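The abstract does not specify the reinforcement learning formulation, so the following is a hedged sketch of a standard tabular Q-learning update, with hypothetical states, actions, and rewards, to illustrate the kind of computation an RL accelerator for obstacle avoidance performs:

# Minimal tabular Q-learning step. States, actions, rewards, and
# hyperparameters are placeholders, not the chip's actual formulation.

import random
from collections import defaultdict

ACTIONS = ["forward", "left", "right", "stop"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount, exploration

Q = defaultdict(float)                    # (state, action) -> value

def choose_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)     # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit

def q_update(state, action, reward, next_state):
    """One Q-learning step: Q <- Q + alpha * (TD target - Q)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: penalize a forward move that brought an obstacle close.
q_update(state="obstacle_near", action="forward", reward=-1.0,
         next_state="collision_risk")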


Symposium on VLSI Circuits | 2017

A 31.2 pJ/disparity·pixel stereo matching processor with stereo SRAM for mobile UI application

Jinsu Lee; Dongjoo Shin; Kyuho Jason Lee; Hoi-Jun Yoo

An energy-efficient and high-speed stereo matching processor is proposed for smart mobile devices, featuring a proposed stereo SRAM (S-SRAM) and independent regional integral cost (IRIC). The cost generation unit (CGU) with the proposed S-SRAM reduces CGU power consumption by 63.2%. The proposed IRIC, together with a pipelined integral cost generator (PICG), makes the cost aggregation unit (CAU) 6.4× faster and reduces its power by 12.3%. The proposed stereo matching processor, implemented in a 65 nm CMOS process, achieves 82 fps and 31.2 pJ/disparity·pixel energy efficiency at 30 fps. Its energy efficiency is improved by 77.6% compared to the state-of-the-art.
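The integral-cost idea can be illustrated with a summed-area table: once built, any rectangular aggregation window needs only four lookups. The cost volume and window sizes below are placeholders, not the processor's parameters:

# Integral-image cost aggregation over a per-disparity cost volume.

import numpy as np

H, W, D = 120, 160, 32                 # image size and disparity range (assumed)
cost = np.random.rand(D, H, W)         # placeholder per-pixel matching costs

# Integral image per disparity plane, padded so index 0 means "empty sum".
integral = np.zeros((D, H + 1, W + 1))
integral[:, 1:, 1:] = cost.cumsum(axis=1).cumsum(axis=2)

def window_cost(d, y0, x0, y1, x1):
    """Aggregated cost of window [y0,y1) x [x0,x1) at disparity d: 4 lookups."""
    s = integral[d]
    return s[y1, x1] - s[y0, x1] - s[y1, x0] + s[y0, x0]

# Winner-take-all disparity for one 9x9 window centred at (60, 80):
costs = [window_cost(d, 56, 76, 65, 85) for d in range(D)]
print("best disparity:", int(np.argmin(costs)))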


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2017

A 17.5-fJ/bit Energy-Efficient Analog SRAM for Mixed-Signal Processing

Jinsu Lee; Dongjoo Shin; Youchang Kim; Hoi-Jun Yoo

An energy-efficient analog SRAM (A-SRAM) is proposed to eliminate redundant analog-to-digital (A/D) and digital-to-analog (D/A) conversions in mixed-signal systems, such as neuromorphic chips and neural networks. D/A conversion is integrated into the SRAM readout by charge sharing on the proposed split bitline (BL). A/D conversion is integrated into the SRAM write operation with the successive approximation method in the proposed input–output block. In addition, a configurable SRAM bitcell array is proposed to allocate the converted digital data without unfilled bitcells. The multirow access decoder selects multiple bitcells in a single column and configures the bitcell array by controlling the BL switches that split the BLs. The proposed A-SRAM is implemented using 65-nm CMOS technology. It achieves 17.5-fJ/bit energy efficiency and 21-Gbit/s throughput for the analog readout, which are 64% and 1.3 times better than those of a conventional SRAM followed by a digital-to-analog converter (DAC). Also, the area is reduced by 91% compared with a conventional SRAM with an analog-to-digital converter (ADC) and a DAC.
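The successive approximation method named above is the standard SAR algorithm; a behavioral Python sketch (with an assumed full-scale voltage) shows how the write path could resolve an analog input one bit per cycle, MSB first:

# Behavioral SAR conversion: each cycle compares the input against a
# midpoint reference and resolves one bit. Idealized model, not the circuit.

VFS = 1.2  # full-scale voltage in volts (assumption)

def sar_write(v_in, n_bits=8):
    """Resolve v_in into n_bits via successive approximation, MSB first."""
    bits = []
    lo, hi = 0.0, VFS
    for _ in range(n_bits):
        mid = (lo + hi) / 2
        if v_in >= mid:
            bits.append(1)   # keep the trial bit, search upper half
            lo = mid
        else:
            bits.append(0)   # drop the trial bit, search lower half
            hi = mid
    return bits

print(sar_write(0.9))  # 0.9 V is 0.75 of full scale -> [1, 1, 0, 0, 0, 0, 0, 0]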


IEEE Symposium on Low-Power and High-Speed Chips (COOL CHIPS) | 2017

An energy-efficient deep learning processor with heterogeneous multi-core architecture for convolutional neural networks and recurrent neural networks

Dongjoo Shin; Jinmook Lee; Jinsu Lee; Juhyoung Lee; Hoi-Jun Yoo

An energy-efficient deep learning processor is proposed for convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in mobile platforms. The 16 mm² chip is fabricated using 65 nm technology with three key features: 1) a reconfigurable heterogeneous architecture to support both CNNs and RNNs; 2) a LUT-based reconfigurable multiplier optimized for dynamic fixed-point with on-line adaptation; and 3) quantization-table-based matrix multiplication to reduce off-chip memory access and remove duplicated multiplications. As a result, this work shows 20× and 4.5× higher energy efficiency than [2] and [3], respectively. Also, DNPU shows 6.5× higher energy efficiency compared to [5].
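Feature 3 can be illustrated directly: with weights quantized to a small codebook, each input activation has only a handful of possible products, so a per-input lookup table replaces repeated multiplications. The sizes and 16-entry codebook below are assumptions for illustration:

# Quantization-table-based matrix multiplication sketch.

import numpy as np

rng = np.random.default_rng(0)
codebook = np.linspace(-1.0, 1.0, 16)            # 16 quantized weight levels
w_idx = rng.integers(0, 16, size=(64, 128))      # weights stored as 4-bit indices
x = rng.standard_normal(128)

# Build the lookup table: product of every input with every codebook entry
# (16*128 = 2,048 multiplies instead of the naive 64*128 = 8,192).
lut = np.outer(codebook, x)                      # shape (16, 128)

# Matrix-vector product via table lookups and adds only.
y = np.array([lut[w_idx[i], np.arange(128)].sum() for i in range(64)])

# Reference result with explicit multiplications.
y_ref = codebook[w_idx] @ x
print(np.allclose(y, y_ref))  # True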


IEEE Symposium on Low-Power and High-Speed Chips (COOL CHIPS XIX) | 2016

A 1.1 mW 32-thread artificial intelligence processor with 3-level transposition table and on-chip PVT compensation for autonomous mobile robots

Youchang Kim; Dongjoo Shin; Jinsu Lee; Hoi-Jun Yoo



International Symposium on Circuits and Systems | 2018

A 141.4 mW Low-Power Online Deep Neural Network Training Processor for Real-time Object Tracking in Mobile Devices

Donghyeon Han; Jinsu Lee; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo


