Publication


Featured research published by Kyeongryeol Bong.


International Solid-State Circuits Conference | 2015

4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications

Seong-Wook Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D-image, and motion recognition, use DL due to its best-in-class recognition accuracy. There are two types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With unsupervised DL, most of the learning time is spent on massively iterative weight updates for a restricted Boltzmann machine [2]. For a ~100MB training dataset, >100 TOP of computational capability and ~40GB/s of IO and SRAM data bandwidth are required. As a result, a 3.4GHz CPU needs >10 hours of learning time with a ~100K input-vector dataset and takes ~1 second for recognition, which is far from real-time processing. Thus, DL is typically done on cloud servers or high-performance GPU environments with learning-on-server capability. However, the wide use of smart portable devices, such as smartphones and tablets, has created many applications that need big-data processing with machine learning, such as tagging private photos on personal devices. A high-performance, energy-efficient DL/DI (deep inference) processor is required to realize user-centric pattern recognition in portable devices.
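
The iterative weight update the abstract refers to is, for a restricted Boltzmann machine, typically the contrastive-divergence (CD-1) rule. A minimal NumPy sketch of one such update (biases omitted, dimensions hypothetical) illustrates the workload:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=0.01, rng=None):
    """One contrastive-divergence (CD-1) weight update for an RBM.

    W  : (n_visible, n_hidden) weight matrix, updated in place
    v0 : (batch, n_visible) batch of training vectors
    """
    rng = rng or np.random.default_rng(0)
    # Positive phase: hidden activations driven by the data
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step back to visible, then to hidden
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # Update: data correlations minus model correlations
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    return W
```

Each update is dominated by dense matrix products over the whole batch, repeated over many epochs, which is where the >100 TOP workload and the ~40GB/s bandwidth demand come from.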


International Solid-State Circuits Conference | 2013

A 646GOPS/W multi-classifier many-core processor with cortex-like architecture for super-resolution recognition

Jun-Young Park; Injoon Hong; Gyeonghoon Kim; Youchang Kim; Kyuho Jason Lee; Seong-Wook Park; Kyeongryeol Bong; Hoi-Jun Yoo

Object recognition processors have been reported for applications such as autonomous vehicle navigation, smart surveillance, and unmanned air vehicles (UAVs) [1-3]. Most of these processors adopt a single classifier rather than multiple classifiers, even though multi-classifier systems (MCSs) offer more accurate recognition with higher robustness [4]. In addition, MCSs can incorporate the human vision system (HVS) recognition architecture to reduce computational requirements and enhance recognition accuracy. For example, HMAX models the exact hierarchical architecture of the HVS for improved recognition accuracy [5]. Compared with SIFT, known to have the best recognition accuracy based on local features extracted from the object [6], HMAX can recognize an object based on global features by template matching and a maximum-pooling operation, without feature segmentation. In this paper we present a multi-classifier many-core processor combining the HMAX and SIFT approaches on a single chip. Through the combined approach, the system can: 1) attend to the target object directly with global context consideration, including complicated backgrounds or camouflaged obstacles, 2) utilize a super-resolution algorithm to recognize highly blurred or small objects, and 3) recognize more than 200 objects in real time by context-aware feature matching.
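
As a rough illustration of the HMAX-style global-feature path the abstract contrasts with SIFT, an S-layer correlates the image with stored templates and a C-layer max-pools the responses for position tolerance. A toy NumPy sketch (template shapes and pool size are hypothetical, not the chip's configuration):

```python
import numpy as np

def hmax_s_c(image, templates, pool=8):
    """Toy HMAX-like stage: template matching (S) + max pooling (C).

    image     : 2D float array
    templates : list of small 2D float arrays (stored prototypes)
    Returns one pooled response map per template.
    """
    responses = []
    for t in templates:
        th, tw = t.shape
        H, W = image.shape[0] - th + 1, image.shape[1] - tw + 1
        s = np.empty((H, W))
        for i in range(H):              # dense template matching (S-layer)
            for j in range(W):
                s[i, j] = float((image[i:i+th, j:j+tw] * t).sum())
        # C-layer: max over local pools gives tolerance to position shifts
        ph, pw = s.shape[0] // pool, s.shape[1] // pool
        c = s[:ph*pool, :pw*pool].reshape(ph, pool, pw, pool).max(axis=(1, 3))
        responses.append(c)
    return responses
```

The max-pooling step is what makes the features "global": no segmentation of the object from the background is needed before classification.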


International Solid-State Circuits Conference | 2014

10.4 A 1.22TOPS and 1.52mW/MHz augmented reality multi-core processor with neural network NoC for HMD applications

Gyeonghoon Kim; Youchang Kim; Kyuho Jason Lee; Seong-Wook Park; Injoon Hong; Kyeongryeol Bong; Dongjoo Shin; Sungpill Choi; Jinwook Oh; Hoi-Jun Yoo

Augmented reality (AR) is being investigated in advanced displays for the augmentation of images in a real-world environment. Wearable systems, such as head-mounted display (HMD) systems, have attempted to support real-time AR as a next-generation UI/UX [1-2], but have failed due to their limited computing power. In a prior work, a chip with limited AR functionality was reported that could perform AR with the help of markers placed in the environment (usually 1D or 2D bar codes) [3]. However, for a seamless visual experience, 3D objects should be rendered directly on the natural video image without any markers. Unlike marker-based AR, markerless AR requires natural feature extraction, general object recognition, 3D reconstruction, and camera-pose estimation to be performed in parallel. For instance, markerless AR on a VGA input test video consumes ~1.3W at 0.2fps throughput on TI's OMAP4430, which exceeds the power limits of wearable devices. Consequently, a high-performance, energy-efficient markerless AR processor is needed to realize a real-time AR system, especially for HMD applications.
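
To make the parallelism requirement concrete, a schematic sketch of the four markerless-AR stages running concurrently on one frame. The stage functions are hypothetical placeholders, not the paper's implementation; on the SoC each stage maps to dedicated hardware rather than threads:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage stubs; in a real system each is a heavy vision kernel.
def extract_features(frame):   return {"keypoints": []}
def recognize_objects(frame):  return {"objects": []}
def reconstruct_3d(frame):     return {"depth": None}
def estimate_pose(frame):      return {"pose": None}

def process_frame(frame):
    """Run the four markerless-AR stages on one frame in parallel."""
    stages = (extract_features, recognize_objects,
              reconstruct_3d, estimate_pose)
    with ThreadPoolExecutor(max_workers=len(stages)) as pool:
        futures = [pool.submit(stage, frame) for stage in stages]
        return [f.result() for f in futures]
```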


International Solid-State Circuits Conference | 2016

14.2 A 502GOPS and 0.984mW/MHz dual-mode ADAS SoC with RNN-FIS engine for intention prediction in automotive black-box system

Kyuho Jason Lee; Kyeongryeol Bong; Chang-Hyeon Kim; Jaeeun Jang; Hyunki Kim; Jihee Lee; Kyoung-Rog Lee; Gyeonghoon Kim; Hoi-Jun Yoo

Advanced driver-assistance systems (ADAS) are being adopted in automobiles for forward-collision warning, advanced emergency braking, adaptive cruise control, and lane-keeping assistance. Recently, automotive black boxes have also been installed in cars for tracking accidents or theft. In this paper, a dual-mode ADAS SoC is proposed to support both high-performance ADAS functionality in driving mode (d-mode) and an ultra-low-power black box in parking mode (p-mode). In p-mode, surveillance recording is triggered intelligently by the intention-prediction engine (IPE) instead of running always-on, which extends battery life and prevents battery discharge.
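
A hedged sketch of the p-mode triggering idea. The interfaces here (sensor, IPE, recorder objects and their methods) are invented for illustration and are not the SoC's actual API:

```python
def parking_mode_loop(sensor, ipe, recorder, threshold=0.8):
    """Ultra-low-power parking-mode loop: record only when the
    intention-prediction engine flags a likely event (hypothetical API)."""
    while True:
        frame = sensor.capture_low_power()       # duty-cycled capture
        risk = ipe.predict_intention(frame)      # RNN-FIS inference score
        if risk > threshold:
            recorder.record_clip(seconds=30)     # wake the full pipeline
        else:
            sensor.sleep(interval_ms=500)        # remain in low-power state
```

The design point is that the always-on path runs only the cheap prediction, and the power-hungry recording pipeline wakes only on predicted events.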


International Conference of the IEEE Engineering in Medicine and Biology Society | 2012

Wearable mental-health monitoring platform with independent component analysis and nonlinear chaotic analysis

Taehwan Roh; Kyeongryeol Bong; Sunjoo Hong; Hyunwoo Cho; Hoi-Jun Yoo

A wearable mental-health monitoring platform is proposed for mobile mental-healthcare systems. The platform is a 50g headband and consumes 1.1mW. For mental-health monitoring, two specific functions, independent component analysis (ICA) and nonlinear chaotic analysis (NCA), are implemented in CMOS integrated circuits. ICA extracts heart rate variability (HRV) from EEG, and NCA then extracts the largest Lyapunov exponent (LLE) as a physiological marker of mental stress and state. The extracted HRV differs by only 1.84% from the HRV obtained by a simple ECG measurement system. With the help of EEG signals, the proposed headband monitoring system achieves a 90% confidence level in a stress test, better than results using HRV alone.
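
A minimal software sketch of the ICA step using scikit-learn's FastICA, assuming multichannel EEG arranged as (samples, channels); the channel count and data here are hypothetical stand-ins for a real recording:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical 4-channel EEG recording: (n_samples, n_channels)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((5000, 4))

ica = FastICA(n_components=4, random_state=0)
sources = ica.fit_transform(eeg)   # independent components, one per column

# In the platform, the component carrying the cardiac artifact would be
# selected, its beat intervals used to compute HRV, and NCA would then
# estimate the largest Lyapunov exponent from that time series.
```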


International Solid-State Circuits Conference | 2015

18.1 A 2.71nJ/pixel 3D-stacked gaze-activated object-recognition system for low-power mobile HMD applications

Injoon Hong; Kyeongryeol Bong; Dongjoo Shin; Seong-Wook Park; Kyuho Jason Lee; Youchang Kim; Hoi-Jun Yoo

Smart eyeglasses or head-mounted displays (HMDs) have been gaining traction as next-generation mainstream wearable devices. However, previous HMD systems [1] have had limited application, primarily due to their lack of a smart user interface (UI) and user experience (UX). Since HMD systems are small, compact wearable platforms, their UI requires new modalities rather than a computer mouse or a 2D touch panel. Recent speech-recognition-based UIs require voice input, which reveals the user's intention not only to the HMD but also to others nearby, raising privacy concerns in public spaces. In addition, prior works [2-3] attempted to support object recognition (OR) or augmented reality (AR) in smart eyeglasses, but consumed considerable power, >381mW, resulting in <6 hours of operation with a 2100mWh battery.
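
The quoted operating time follows directly from the battery energy and power figures:

```latex
t = \frac{E_{\text{battery}}}{P_{\text{system}}}
  = \frac{2100\,\text{mWh}}{381\,\text{mW}} \approx 5.5\,\text{h} \;(<6\,\text{h})
```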


IEEE Transactions on Biomedical Circuits and Systems | 2015

An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications

Seong-Wook Park; Jun-Young Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Deep learning is widely used for various pattern-recognition applications, such as text recognition, object recognition, and action recognition, because of its best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. Its long learning time, caused by its complex structure, has so far limited its use to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition on personal devices will grow as more deep-learning applications are developed. This paper presents an SoC implementation that enables deep-learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works, which adopt massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. We implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated in 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep-inference and deep-learning operation. It consumes 185 mW average power and 213.1 mW peak power at a 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, 2.07× higher than the state-of-the-art.
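
As a consistency check, the headline efficiency follows from the quoted peak performance and peak power:

```latex
\eta = \frac{411.3\,\text{GOPS}}{213.1\,\text{mW}} \approx 1.93\,\text{TOPS/W}
```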


International Solid-State Circuits Conference | 2017

14.6 A 0.62mW ultra-low-power convolutional-neural-network face-recognition processor and a CIS integrated with always-on Haar-like face detector

Kyeongryeol Bong; Sungpill Choi; Chang-Hyeon Kim; Sang Hoon Kang; Youchang Kim; Hoi-Jun Yoo

Recently, face recognition (FR) based on an always-on CIS has been investigated for the next-generation UI/UX of wearable devices. An FR system, shown in Fig. 14.6.1, was developed as a life-cycle analyzer or a personal black box, constantly recording the people we meet along with time and place information. In addition, FR with always-on capability can be used for user authentication, providing secure access to the user's smartphone and other personal systems. Since wearable devices have limited battery capacity due to their small form factor, extremely low power consumption is required while maintaining high recognition accuracy. Previously, a 23mW FR accelerator [1] was proposed, but its accuracy was low due to its hand-crafted feature-based algorithm. Deep learning using a convolutional neural network (CNN) is essential to achieve high accuracy and to enhance device intelligence. However, previous CNN processors (CNNPs) [2-3] consume too much power, resulting in <10 hours of operation with a 190mAh coin battery.
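
For scale, assuming a 3 V nominal cell voltage (an assumption; the abstract does not state it), the 190mAh coin battery holds about 570mWh. The quoted <10 hours then implies prior CNNPs draw more than ~57mW, while a 0.62mW budget corresponds to roughly 900 hours:

```latex
E \approx 190\,\text{mAh} \times 3\,\text{V} = 570\,\text{mWh}, \qquad
t_{\text{prior}} < 10\,\text{h} \;\Rightarrow\; P_{\text{prior}} > 57\,\text{mW}, \qquad
t_{\text{0.62mW}} \approx \frac{570\,\text{mWh}}{0.62\,\text{mW}} \approx 919\,\text{h}
```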


IEEE Journal of Solid-state Circuits | 2016

A 2.71 nJ/Pixel Gaze-Activated Object Recognition System for Low-Power Mobile Smart Glasses

Injoon Hong; Kyeongryeol Bong; Dongjoo Shin; Seong-Wook Park; Kyuho Jason Lee; Youchang Kim; Hoi-Jun Yoo

A low-power object recognition (OR) system with an intuitive gaze user interface (UI) is proposed for battery-powered smart glasses. For the low-power gaze UI, we propose a low-power single-chip gaze estimation sensor, called the gaze image sensor (GIS). In the GIS, a novel column-parallel pupil edge detection circuit (PEDC) with a new pupil edge detection algorithm, XY pupil detection (XY-PD), is proposed, achieving a 2.9× power reduction with 16× larger resolution compared to previous work. A logarithmic SIMD processor is also proposed for robust pupil center estimation (<1 pixel error) with a low-power floating-point implementation. For OR, a low-power multicore OR processor (ORP) is implemented. In the ORP, a task-level pipeline with keypoint-level scoring is proposed to reduce both the number of cores and the operating frequency of the keypoint-matching processor (KMP) for low power consumption. A dual-mode convolutional neural network processor (CNNP) is designed for fast tile selection without external memory accesses. In addition, a pipelined descriptor generation processor (DGP) with LUT-based nonlinear operations is newly proposed for low-power OR. Lastly, dynamic voltage and frequency scaling (DVFS) is applied for dynamic power reduction in the ORP. Combining the GIS and ORP, both fabricated in a 65 nm CMOS logic process, only 75 mW average power consumption is achieved with real-time OR performance, which is 1.2× and 4.4× lower power than previously published work.
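
A hedged sketch of the LUT-based nonlinear-operation idea used in the DGP: precompute a nonlinear function over a fixed input range into a table and replace the runtime evaluation with an indexed lookup. The choice of arctan (common in gradient-orientation computations) and the table size are illustrative assumptions, not the chip's actual parameters:

```python
import numpy as np

# Precompute arctan over a fixed input range into a 1024-entry table.
N, X_MAX = 1024, 8.0
LUT = np.arctan(np.linspace(-X_MAX, X_MAX, N))

def atan_lut(x):
    """Approximate arctan(x) via table lookup, clamped to the LUT range."""
    idx = np.clip(((x + X_MAX) * (N - 1) / (2 * X_MAX)).astype(int), 0, N - 1)
    return LUT[idx]

x = np.linspace(-4.0, 4.0, 5)
print(atan_lut(x))   # close to np.arctan(x), within LUT quantization error
```

Trading a small ROM for the arithmetic pipeline is a standard way to cut both latency and power for fixed nonlinearities.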


International Symposium on Circuits and Systems | 2014

A 1.61mW mixed-signal column processor for BRISK feature extraction in CMOS image sensor

Kyeongryeol Bong; Gyeonghoon Kim; Injoon Hong; Hoi-Jun Yoo

In mobile object recognition (OR) applications, the power consumption of the image sensor and of the data communication between the image sensor and the digital OR processor becomes crucial, as the digital OR processor itself consumes less power in deep sub-micron processes. To reduce the amount of data transferred from the image sensor to the digital OR processor, digital/analog mixed-signal focal-plane processing of Binary Robust Invariant Scalable Keypoints (BRISK) feature extraction in the CMOS image sensor (CIS) is proposed. The proposed CIS processor sends BRISK feature vectors instead of the whole image, resulting in a 79% reduction in data communication. In this work, mixed-signal processing of corner detection and successive approximation register (SAR)-based scoring is implemented for BRISK feature point detection. To achieve scale invariance in object recognition, a scale-space is generated and stored in analog line memory. In addition, a noise-reduction scheme is integrated into the column processing chain to remove salt-and-pepper noise, which degrades recognition accuracy. In post-layout simulation, the proposed system achieves 0.70pW/pixel·frame·feature at 30fps in a 130nm CMOS technology, 13.6% lower than the state-of-the-art.
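
A minimal software analogue of the bandwidth argument, using OpenCV's BRISK implementation (the file path is hypothetical, and the exact reduction depends on scene content and keypoint count):

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
brisk = cv2.BRISK_create()
keypoints, descriptors = brisk.detectAndCompute(img, None)

if descriptors is not None:
    raw_bytes = img.size              # one byte per grayscale pixel
    feat_bytes = descriptors.nbytes   # 64 bytes per BRISK descriptor
    print(f"raw: {raw_bytes} B, features: {feat_bytes} B, "
          f"reduction: {1 - feat_bytes / raw_bytes:.0%}")
```

Transmitting only descriptors rather than pixels is the same trade the chip makes at the focal plane, before the data ever leaves the sensor.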
