
Publication


Featured research published by Dongjoo Shin.


International Solid-State Circuits Conference | 2015

4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications

Seong-Wook Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D image, and motion recognition, use DL due to its best-in-class recognition accuracy. There are two types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With unsupervised DL, most of the learning time is spent in massively iterative weight updates for a restricted Boltzmann machine [2]. For a ~100MB training dataset, >100 TOP computational capability and ~40GB/s IO and SRAM data bandwidth are required. So, a 3.4GHz CPU needs >10 hours of learning time with a ~100K input-vector dataset and takes ~1 second for recognition, which is far from real-time processing. Thus, DL is typically done using cloud servers or high-performance GPU environments with learning-on-server capability. However, the wide use of smart portable devices, such as smartphones and tablets, results in many applications that need big-data processing with machine learning, such as tagging private photos on personal devices. A high-performance and energy-efficient DL/DI (deep inference) processor is required to realize user-centric pattern recognition on portable devices.
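
The "massively iterative weight updates" refer to contrastive-divergence training of the restricted Boltzmann machine. A minimal NumPy sketch of one such update is shown below; the layer sizes and mini-batch are hypothetical, biases are omitted, and this is only an illustration of the technique, not the processor's implementation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(W, v0, lr=0.01):
        # One contrastive-divergence (CD-1) step: sample hidden units from the
        # data, reconstruct the visibles, and nudge W toward the data statistics.
        h0 = sigmoid(v0 @ W)                              # hidden probs from data
        h0_s = (np.random.rand(*h0.shape) < h0)           # stochastic hidden sample
        v1 = sigmoid(h0_s @ W.T)                          # reconstruction of visibles
        h1 = sigmoid(v1 @ W)                              # hidden probs from recon
        grad = (v0.T @ h0 - v1.T @ h1) / v0.shape[0]      # positive - negative phase
        return W + lr * grad

    W = np.random.normal(0.0, 0.01, size=(784, 500))      # hypothetical 784x500 RBM
    batch = np.random.rand(64, 784)                       # stand-in mini-batch
    for _ in range(10):                                   # real training sweeps ~100K vectors
        W = cd1_update(W, batch)

Every iteration streams the full weight matrix and a batch of inputs through the update, which is where the quoted IO/SRAM bandwidth demand comes from.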


International Solid-State Circuits Conference | 2014

10.4 A 1.22TOPS and 1.52mW/MHz augmented reality multi-core processor with neural network NoC for HMD applications

Gyeonghoon Kim; Youchang Kim; Kyuho Jason Lee; Seong-Wook Park; Injoon Hong; Kyeongryeol Bong; Dongjoo Shin; Sungpill Choi; Jinwook Oh; Hoi-Jun Yoo

Augmented reality (AR) is being investigated in advanced displays for the augmentation of images in a real-world environment. Wearable systems, such as head-mounted display (HMD) systems, have attempted to support real-time AR as a next-generation UI/UX [1-2], but have failed due to their limited computing power. In a prior work, a chip with limited AR functionality was reported that could perform AR with the help of markers placed in the environment (usually 1D or 2D bar codes) [3]. However, for a seamless visual experience, 3D objects should be rendered directly on the natural video image without any markers. Unlike marker-based AR, markerless AR requires natural feature extraction, general object recognition, 3D reconstruction, and camera-pose estimation to be performed in parallel. For instance, markerless AR on a VGA input test video consumes ~1.3W at 0.2fps throughput on TI's OMAP4430, which exceeds the power limits of wearable devices. Consequently, there is a need for a high-performance, energy-efficient markerless AR processor to realize a real-time AR system, especially for HMD applications.
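
For context, the markerless pipeline stages named above (natural-feature extraction, recognition by matching, pose estimation) can be sketched in software with OpenCV. The intrinsic matrix K and the offline-built model points and descriptors below are hypothetical placeholders, and this generic recipe is not the paper's processor pipeline:

    import cv2
    import numpy as np

    # Hypothetical camera intrinsics for a VGA sensor
    K = np.float64([[800, 0, 320], [0, 800, 240], [0, 0, 1]])

    def estimate_pose(frame, model_pts, model_desc):
        # model_pts (Nx3) and model_desc are a known object's 3D points and ORB
        # descriptors, built offline; both are assumed inputs for this sketch.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        orb = cv2.ORB_create(nfeatures=1000)               # natural-feature extraction
        kps, desc = orb.detectAndCompute(gray, None)
        if desc is None:
            return None
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(model_desc, desc)          # recognition by matching
        if len(matches) < 6:
            return None                                    # too few correspondences
        obj = np.float32([model_pts[m.queryIdx] for m in matches])
        img = np.float32([kps[m.trainIdx].pt for m in matches])
        # Camera-pose estimation from the 2D-3D correspondences
        ok, rvec, tvec, _ = cv2.solvePnPRansac(obj, img, K, None)
        return (rvec, tvec) if ok else None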


IEEE Transactions on Biomedical Circuits and Systems | 2014

A Wearable Neuro-Feedback System With EEG-Based Mental Status Monitoring and Transcranial Electrical Stimulation

Taehwan Roh; Kiseok Song; Hyunwoo Cho; Dongjoo Shin; Hoi-Jun Yoo

A wearable neuro-feedback system is proposed with a low-power neuro-feedback SoC (NFS), which supports mental status monitoring with electroencephalography (EEG) and transcranial electrical stimulation (tES) for neuro-modulation. Self-configured independent component analysis (ICA) is implemented to accelerate source separation at low power. Moreover, an embedded support vector machine (SVM) enables online source classification, configuring the ICA accelerator adaptively depending on the types of the decomposed components. Owing to the hardwired accelerating functions, the NFS dissipates only 4.45 mW to yield 16 independent components. For non-invasive neuro-modulation, tES of up to 2 mA is implemented on the SoC. The NFS is fabricated in 130-nm CMOS technology.
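
A software analogue of the on-chip flow (ICA source separation followed by SVM classification of the components) can be sketched with scikit-learn. The data, feature, and labels below are toy placeholders, not the SoC's hardwired algorithms:

    import numpy as np
    from sklearn.decomposition import FastICA
    from sklearn.svm import SVC

    X = np.random.randn(1000, 16)             # stand-in EEG block: 16 channels, 1000 samples

    # Source separation into 16 independent components, mirroring the 16
    # components yielded by the NFS's ICA accelerator.
    ica = FastICA(n_components=16, random_state=0)
    sources = ica.fit_transform(X)            # (1000, 16) component activations

    # Online source classification: an SVM labels each component (e.g., neural
    # vs. artifact); the feature and labels here are placeholders.
    feats = np.abs(sources).mean(axis=0).reshape(-1, 1)
    labels = np.tile([0, 1], 8)               # placeholder component labels
    clf = SVC(kernel="linear").fit(feats, labels)
    component_types = clf.predict(feats)      # drives adaptive ICA configuration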


International Solid-State Circuits Conference | 2017

14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks

Dongjoo Shin; Jinmook Lee; Jinsu Lee; Hoi-Jun Yoo

Recently, deep learning with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) has become universal across a wide range of applications. CNNs are used to support vision recognition and processing, and RNNs are able to recognize time-varying entities and to support generative models. Combining both, CNN-RNN networks can recognize time-varying visual entities, such as actions and gestures, and support image captioning [1]. However, the computational requirements of CNNs are quite different from those of RNNs. Fig. 14.2.1 shows a computation and weight-size analysis of convolution layers (CLs), fully-connected layers (FCLs), and RNN-LSTM layers (RLs). While CLs require a massive amount of computation with a relatively small number of filter weights, FCLs and RLs require a relatively small amount of computation with a huge number of filter weights. Therefore, when FCLs and RLs are accelerated with SoCs specialized for CLs, they suffer from high memory transaction costs, low PE utilization, and a mismatch of computational patterns. Conversely, when CLs are accelerated with FCL- and RL-dedicated SoCs, they cannot exploit weight reusability or achieve the required throughput. So far, prior works have considered acceleration of either CLs [2-4] or FCLs and RLs [5]; there has been no work on a combined CNN-RNN processor. A highly reconfigurable CNN-RNN processor with high energy efficiency is therefore desirable to support general-purpose deep neural networks (DNNs).
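
The compute-versus-weight asymmetry described above can be reproduced with back-of-envelope arithmetic; the layer shapes below are hypothetical examples, not the paper's benchmarks:

    def conv_layer(h, w, cin, cout, k):
        weights = cout * cin * k * k
        macs = h * w * weights        # every weight is reused at each output position
        return macs, weights

    def fc_layer(nin, nout):
        weights = nin * nout
        macs = weights                # every weight is used exactly once per input
        return macs, weights

    cl_macs, cl_w = conv_layer(56, 56, 64, 128, 3)   # ~231M MACs, ~74K weights
    fc_macs, fc_w = fc_layer(4096, 4096)             # ~17M MACs, ~17M weights
    print(f"CL : {cl_macs/1e6:.0f}M MACs, {cl_w/1e3:.0f}K weights")
    print(f"FCL: {fc_macs/1e6:.0f}M MACs, {fc_w/1e6:.1f}M weights")

The convolution layer performs over 200M MACs with well under 100K weights, while the fully-connected layer needs a fresh weight from memory for every single MAC, which is exactly why one PE array and memory hierarchy cannot serve both efficiently.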


International Solid-State Circuits Conference | 2015

18.1 A 2.71nJ/pixel 3D-stacked gaze-activated object-recognition system for low-power mobile HMD applications

Injoon Hong; Kyeongryeol Bong; Dongjoo Shin; Seong-Wook Park; Kyuho Jason Lee; Youchang Kim; Hoi-Jun Yoo

Smart eyeglasses or head-mounted displays (HMDs) have been gaining traction as next-generation mainstream wearable devices. However, previous HMD systems [1] have had limited application, primarily due to their lack of a smart user interface (UI) and user experience (UX). Since HMD systems are small, compact wearable platforms, their UI requires new modalities rather than a computer mouse or a 2D touch panel. Recent speech-recognition-based UIs require voice input, which reveals the user's intention not only to the HMD but also to bystanders, raising privacy concerns in public spaces. In addition, prior works [2-3] attempted to support object recognition (OR) or augmented reality (AR) in smart eyeglasses, but consumed considerable power, >381mW, resulting in <6 hours of operation with a 2100mWh battery.
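
For reference, the quoted operation time follows directly from capacity over draw: 2100 mWh / 381 mW ≈ 5.5 hours, hence under 6 hours per charge.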


IEEE Transactions on Biomedical Circuits and Systems | 2015

An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications

Seong-Wook Park; Jun-Young Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Deep learning algorithms are widely used for various pattern recognition applications, such as text recognition, object recognition, and action recognition, because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. However, the long learning time caused by their complex structure has so far limited their usage to high-cost servers or many-core GPU platforms. On the other hand, demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Different from conventional works that adopt massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the most popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
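
As a check on the headline figure, energy efficiency here is peak throughput divided by peak power: 411.3 GOPS / 213.1 mW ≈ 1.93 TOPS/W.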


International Solid-State Circuits Conference | 2016

14.3 A 0.55V 1.1mW artificial-intelligence processor with PVT compensation for micro robots

Youchang Kim; Dongjoo Shin; Jinsu Lee; Yongsu Lee; Hoi-Jun Yoo

Micro robots with artificial intelligence (AI) are being investigated for many applications, such as unmanned delivery services. The robots have enhanced controllers that realize AI functions, such as perception (information extraction) and cognition (decision making). Historically, controllers have been based on general-purpose CPUs, and only recently have a few perception SoCs been reported. SoCs with cognition capability have not been reported thus far, even though cognition is a key AI function for decision making in micro robots, especially autonomous drones. Path planning and obstacle avoidance require more than 10,000 searches within 50ms for a fast response, but a software implementation running on a Cortex-M3 takes ~5s to make decisions. Micro robots require 10× lower power and 100× faster decision making than conventional robots because of their fast movement in the environment, small form factor, and limited battery capacity. Therefore, an ultra-low-power, high-performance artificial-intelligence processor (AIP) is necessary for micro robots to make fast and smart maneuvers in dynamic environments filled with obstacles.
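
To give a feel for the search workload quantified above, here is a minimal A* path planner over an occupancy grid, where each heap pop is one search expansion. This is a generic textbook sketch, not the AIP's cognition algorithm:

    import heapq
    import itertools

    def astar(grid, start, goal):
        # A* over a 4-connected occupancy grid (0 = free, 1 = obstacle).
        # Per the abstract, planning may need >10,000 expansions within 50ms.
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
        tie = itertools.count()                   # tie-breaker for equal priorities
        frontier = [(h(start), next(tie), start, None)]
        parent, best = {}, {start: 0}
        while frontier:
            _, _, node, prev = heapq.heappop(frontier)
            if node in parent:
                continue                          # already expanded at lower cost
            parent[node] = prev
            if node == goal:                      # walk the parent chain back
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (node[0] + dr, node[1] + dc)
                if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                    g = best[node] + 1
                    if g < best.get(nxt, float("inf")):
                        best[nxt] = g
                        heapq.heappush(frontier, (g + h(nxt), next(tie), nxt, node))
        return None                               # no collision-free path

    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(astar(grid, (0, 0), (2, 0)))            # shortest route around the obstacles

Each expansion touches the frontier heap and the cost table, so on realistic maps the per-decision work quickly reaches the scale the abstract cites.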


IEEE Journal of Solid-state Circuits | 2016

A 2.71 nJ/Pixel Gaze-Activated Object Recognition System for Low-Power Mobile Smart Glasses

Injoon Hong; Kyeongryeol Bong; Dongjoo Shin; Seong-Wook Park; Kyuho Jason Lee; Youchang Kim; Hoi-Jun Yoo

A low-power object recognition (OR) system with an intuitive gaze user interface (UI) is proposed for battery-powered smart glasses. For the low-power gaze UI, we propose a low-power single-chip gaze estimation sensor, called the gaze image sensor (GIS). In the GIS, a novel column-parallel pupil edge detection circuit (PEDC) with a new pupil edge detection algorithm, XY pupil detection (XY-PD), is proposed, which results in 2.9× power reduction with 16× larger resolution compared to previous work. Also, a logarithmic SIMD processor is proposed for robust pupil center estimation (<1 pixel error) with a low-power floating-point implementation. For OR, a low-power multicore OR processor (ORP) is implemented. In the ORP, a task-level pipeline with keypoint-level scoring is proposed to reduce the number of cores as well as the operating frequency of the keypoint-matching processor (KMP) for low power consumption. Also, a dual-mode convolutional neural network processor (CNNP) is designed for fast tile selection without external memory accesses. In addition, a pipelined descriptor generation processor (DGP) with LUT-based nonlinear operation is newly proposed for low-power OR. Lastly, dynamic voltage and frequency scaling (DVFS) is applied for dynamic power reduction in the ORP. Combining the GIS and ORP, both fabricated in a 65 nm CMOS logic process, only 75 mW average power consumption is achieved with real-time OR performance, which is 1.2× and 4.4× lower power than previously published work.


International Solid-State Circuits Conference | 2014

18.5 A 2.14mW EEG neuro-feedback processor with transcranial electrical stimulation for mental-health management

Taehwan Roh; Kiseok Song; Hyunwoo Cho; Dongjoo Shin; Unsoo Ha; Kwonjoon Lee; Hoi-Jun Yoo

Recently, mental diseases have been successfully treated by neuro-feedback therapy based on online measurements of quantitative EEG (QEEG) and event-related potential (ERP) data. The U.S. Food and Drug Administration (FDA) approved the first EEG test for diagnosing attention deficit hyperactivity disorder (ADHD) in 2013 [1]. The EEG signals are measured by an EEG cap and analyzed by a high-performance computer to extract not only the EEG power at predetermined frequency and site combinations, but also the degree of coherence between all sites. Based on these results, brain stimulation is performed to modulate brain rhythms (EEG) toward normal values for the therapy.
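
The two quantities named above, band power at predetermined frequencies and inter-site coherence, can be computed in a few lines with SciPy; the sampling rate and signals below are stand-ins, not the paper's data:

    import numpy as np
    from scipy.signal import welch, coherence

    fs = 256                                  # hypothetical EEG sampling rate (Hz)
    x = np.random.randn(10 * fs)              # stand-in for one recording site
    y = np.random.randn(10 * fs)              # stand-in for a second site

    # EEG power in a predetermined band (theta, 4-8 Hz here) via the Welch PSD
    f, psd = welch(x, fs=fs, nperseg=fs)
    band = (f >= 4) & (f <= 8)
    theta_power = np.trapz(psd[band], f[band])

    # Degree of coherence between the two sites across frequencies
    fc, coh = coherence(x, y, fs=fs, nperseg=fs)
    print(theta_power, coh[(fc >= 4) & (fc <= 8)].mean())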


International Symposium on Circuits and Systems | 2016

A 17.5 fJ/bit energy-efficient analog SRAM for mixed-signal processing

Jinsu Lee; Dongjoo Shin; Youchang Kim; Hoi-Jun Yoo

An energy-efficient analog SRAM (A-SRAM) is proposed to eliminate redundant analog-to-digital (A/D) and digital-to-analog (D/A) conversions in mixed-signal processing, such as biomedical and neural network applications. The D/A conversion is integrated into the SRAM readout through charge sharing on the proposed split bit-line (BL), and the A/D conversion into the SRAM write through the successive-approximation method. A data structure is also newly proposed to allocate each bit of the input data to the binary-weighted bit-cell array. The proposed A-SRAM is implemented in 65 nm CMOS technology. As a result, it achieves 17.5 fJ/bit read energy efficiency and 21 Gbit/s read throughput, which are 54% lower and 1.3× higher, respectively, than a conventional SRAM. The area is also reduced by 31% compared to a conventional SRAM with an ADC and DAC.
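
The successive-approximation method folded into the write path, and the binary-weighted D/A folded into the readout, can be modeled behaviorally. This sketch illustrates only the conversion logic, not the A-SRAM circuit:

    def sar_adc(vin, vref=1.0, bits=8):
        # Successive approximation: trial each bit from MSB to LSB against a
        # DAC level, keeping the bit whenever the input exceeds the level.
        code = 0
        for b in range(bits - 1, -1, -1):
            trial = code | (1 << b)
            if vin >= vref * trial / (1 << bits):   # comparator vs. trial level
                code = trial
        return code

    def binary_weighted_dac(code, vref=1.0, bits=8):
        # Binary-weighted D/A: each set bit contributes a binary-weighted share,
        # analogous to charge sharing across binary-weighted bit-cells.
        return vref * code / (1 << bits)

    print(sar_adc(0.5))                 # 128: mid-scale input -> MSB code
    print(binary_weighted_dac(128))     # 0.5: and back again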

