Publication


Featured research published by Sungpill Choi.


International Solid-State Circuits Conference | 2015

4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications

Seong-Wook Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D image, and motion recognition, use DL due to its best-in-class recognition accuracy. There are two types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With unsupervised DL, most of the learning time is spent on massively iterative weight updates for a restricted Boltzmann machine [2]. For a ~100MB training dataset, >100 TOP of computational capability and ~40GB/s of IO and SRAM data bandwidth are required. As a result, a 3.4GHz CPU needs >10 hours of learning time with a ~100K input-vector dataset and takes ~1 second for recognition, which is far from real-time processing. Thus, DL is typically done using cloud servers or high-performance GPU environments with learning-on-server capability. However, the wide use of smart portable devices, such as smartphones and tablets, results in many applications that need big-data processing with machine learning, such as tagging private photos on personal devices. A high-performance and energy-efficient DL/DI (deep inference) processor is required to realize user-centric pattern recognition on portable devices.


International Solid-State Circuits Conference | 2014

10.4 A 1.22TOPS and 1.52mW/MHz augmented reality multi-core processor with neural network NoC for HMD applications

Gyeonghoon Kim; Youchang Kim; Kyuho Jason Lee; Seong-Wook Park; Injoon Hong; Kyeongryeol Bong; Dongjoo Shin; Sungpill Choi; Jinwook Oh; Hoi-Jun Yoo

Augmented reality (AR) is being investigated in advanced displays for the augmentation of images in a real-world environment. Wearable systems, such as head-mounted display (HMD) systems, have attempted to support real-time AR as a next-generation UI/UX [1-2], but have failed due to their limited computing power. In a prior work, a chip with limited AR functionality was reported that could perform AR with the help of markers placed in the environment (usually 1D or 2D bar codes) [3]. However, for a seamless visual experience, 3D objects should be rendered directly on the natural video image without any markers. Unlike marker-based AR, markerless AR requires natural feature extraction, general object recognition, 3D reconstruction, and camera-pose estimation to be performed in parallel. For instance, markerless AR on a VGA input-test video consumes ~1.3W at 0.2fps throughput on TI's OMAP4430, which exceeds the power limits of wearable devices. Consequently, there is a need for a high-performance, energy-efficient markerless AR processor to realize a real-time AR system, especially for HMD applications.


International Solid-State Circuits Conference | 2016

14.1 A 126.1mW real-time natural UI/UX processor with embedded deep-learning core for low-power smart glasses

Seong-Wook Park; Sungpill Choi; Jinmook Lee; Minseo Kim; Jun-Young Park; Hoi-Jun Yoo

This paper presents a low-power natural UI/UX processor with an embedded deep-learning core (NINEX) to provide wearable AR for HMD users without calibration. The low-power, real-time natural UI/UX processor is fabricated in 65nm 8-metal CMOS technology, integrating 4.8M equivalent gates and 390KB of SRAM for wearable AR. It consumes 126.1mW at 200 MHz and 1.2V. The NINEX handles the overall HMD UI/UX functionality (from pre-processing to graphics) and achieves 56.5% higher power efficiency than the latest HMD processor. It achieves 68.1% higher power efficiency and ~2% higher gesture and speech recognition accuracy over a best-in-class pattern recognition processor.


IEEE Transactions on Biomedical Circuits and Systems | 2015

An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications

Seong-Wook Park; Jun-Young Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Deep learning algorithms are widely used for various pattern-recognition applications, such as text recognition, object recognition, and action recognition, because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. The long learning time caused by their complex structure, however, has so far limited their usage to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works, which have adopted massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the most popular deep learning/inference algorithms. We implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated in 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at a 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state of the art.
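The headline efficiency figure follows directly from the reported peak performance and peak power; a quick back-of-envelope check:

```python
# Back-of-envelope check of the reported energy efficiency:
# peak performance divided by peak power should give ~1.93 TOPS/W.
peak_performance_gops = 411.3   # GOPS, as reported
peak_power_mw = 213.1           # mW at 200 MHz, 1.2 V

# GOPS / mW == TOPS / W (numerator and denominator both scale by 1000)
efficiency_tops_per_w = peak_performance_gops / peak_power_mw
print(f"{efficiency_tops_per_w:.2f} TOPS/W")  # → 1.93 TOPS/W
```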


International Solid-State Circuits Conference | 2017

14.6 A 0.62mW ultra-low-power convolutional-neural-network face-recognition processor and a CIS integrated with always-on haar-like face detector

Kyeongryeol Bong; Sungpill Choi; Chang-Hyeon Kim; Sang Hoon Kang; Youchang Kim; Hoi-Jun Yoo

Recently, face recognition (FR) based on an always-on CIS has been investigated for the next-generation UI/UX of wearable devices. An FR system, shown in Fig. 14.6.1, was developed as a life-cycle analyzer or a personal black box, constantly recording the people we meet along with time and place information. In addition, FR with always-on capability can be used for user authentication, enabling secure access to a smartphone or other personal systems. Since wearable devices have a limited battery capacity due to their small form factor, extremely low power consumption is required while maintaining high recognition accuracy. Previously, a 23mW FR accelerator [1] was proposed, but its accuracy was low due to its hand-crafted feature-based algorithm. Deep learning using a convolutional neural network (CNN) is essential to achieve high accuracy and to enhance device intelligence. However, previous CNN processors (CNNPs) [2-3] consume too much power, resulting in <10 hours of operation with a 190mAh coin battery.


IEEE Transactions on Circuits and Systems | 2016

A CMOS Image Sensor-Based Stereo Matching Accelerator With Focal-Plane Sparse Rectification and Analog Census Transform

Chang-Hyeon Kim; Kyeongryeol Bong; Sungpill Choi; Kyuho Jason Lee; Hoi-Jun Yoo

A low-latency and low-power stereo matching accelerator is monolithically integrated with a CMOS image sensor (CIS) for mobile applications. To reduce the overall latency, focal-plane processing is adopted by using the proposed analog census transform circuit (ACTC), and the image readout is pipelined with the following stereo matching process. In addition, a novel focal-plane rectification pixel array (FRPA) merges the rectification with the image readout without any additional processing latency. For area-efficient pixel design, sparse rectification is proposed, and the image rectification is implemented with only two additional switches in each pixel. A stereo matching digital processor (SMDP) is integrated with the CIS for cost aggregation. We present the full design, including the layout, in a 65 nm CMOS process, and the FRPA, the ACTC, and the SMDP achieve 11.0 ms latency with complete stereo matching stages, which is suitable for a smooth user interface. As a result, the 2-chip stereo matching system dissipates 573.9 μJ/frame and achieves a 17% energy reduction compared to a previous stereo matching SoC.
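The census transform referenced above is a standard local descriptor for stereo matching: each pixel is encoded as a bit string of brightness comparisons against its neighbors, and matching cost is the Hamming distance between codes. The paper implements this in analog circuitry; the NumPy sketch below is only a software illustration of the same idea.

```python
import numpy as np

def census_transform(img, window=3):
    """Census transform: encode each pixel as a bit string of
    comparisons against its neighborhood (bit = 1 if neighbor < center)."""
    r = window // 2
    out = np.zeros(img.shape, dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            neighbor = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out = (out << np.uint64(1)) | (neighbor < img).astype(np.uint64)
    return out

def hamming_cost(census_l, census_r, disparity):
    """Matching cost at a given disparity: Hamming distance between codes."""
    xor = census_l ^ np.roll(census_r, disparity, axis=1)
    return np.array([[bin(int(v)).count("1") for v in row] for row in xor])
```

Because the census code depends only on brightness orderings, it is robust to the per-pixel gain and offset variations typical of two separate image sensors, which is why it suits a focal-plane analog implementation with simple comparators.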


International Conference on Industrial Technology | 2015

K-glass: Real-time markerless augmented reality smart glasses platform

Gyeonghoon Kim; Sungpill Choi; Hoi-Jun Yoo

Recently, augmented reality (AR) has been actively studied with the proliferation of advanced displays such as head-mounted displays (HMDs). Nevertheless, it has been almost impossible for conventional HMDs to process complex AR algorithms in real time, because AR requires highly accurate registration and positioning of virtual objects in the real environment. In this paper, we present the world's first markerless AR HMD system equipped with a high-throughput AR processor. Thanks to the dedicated AR processor, it can recognize general objects without the help of markers and give 3D information to the user in real time (30fps). The processor also adopts dynamic voltage and frequency scaling as a power-management scheme for low power consumption. As a result, it sustains day-long real-time AR operation, consuming 381mW on average while performing the full chain of AR processing, including object recognition, camera-pose estimation, and 3D graphics rendering, on an HD (720p) input video stream.


European Solid-State Circuits Conference | 2017

An ultra-low-power and mixed-mode event-driven face detection SoC for always-on mobile applications

Chang-Hyeon Kim; Kyeongryeol Bong; Injoon Hong; Kyuho Jason Lee; Sungpill Choi; Hoi-Jun Yoo

A new face detection SoC integrating a CIS array with a low-power face detector on a single chip in analog-digital mixed mode is proposed for ultra-low-power mobile applications such as always-on user authentication. The proposed event-driven mixed-mode face detection SoC performs Viola-Jones face detection with both analog face detection circuits and a digital vision processor. The analog face detection circuits enable 85% of the workload to be skipped before A/D conversion and digital face detection processing, resulting in a 39% power reduction. Implemented in 65nm CMOS technology, the 11.09 mm² chip, operating at 2.5V for the CIS and 0.8V for the digital vision processor, consumes 24μW and 96μW at 1fps with non-face and face images, respectively.
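The large workload skip relies on the cascade structure of Viola-Jones detection: most windows are rejected by the first, cheapest stages, so later stages (and here, even A/D conversion) never run for them. A schematic sketch of that early-exit behavior, with hypothetical stage functions and thresholds:

```python
def cascade_classify(window, stages):
    """Viola-Jones-style cascade: each stage is a (score_fn, threshold) pair.
    A window is rejected as soon as any stage's score falls below its
    threshold, so most non-face windows exit after only a few cheap stages."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # early rejection: remaining stages are skipped
    return True  # survived all stages: candidate face window

# Hypothetical two-stage cascade over a flat list of pixel intensities:
# a cheap mean-brightness test, then a slightly costlier contrast test.
stages = [
    (lambda w: sum(w) / len(w), 0.2),
    (lambda w: max(w) - min(w), 0.3),
]
print(cascade_classify([0.05, 0.10, 0.08], stages))  # rejected at stage 1 → False
```

The power saving comes from this ordering: the cheapest discriminative test runs first, and the expensive digital pipeline only sees the small fraction of windows that pass it.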


IEEE Micro | 2017

Low-Power Convolutional Neural Network Processor for a Face-Recognition System

Kyeongryeol Bong; Sungpill Choi; Chang-Hyeon Kim; Hoi-Jun Yoo

The authors propose a low-power convolutional neural network (CNN)-based face recognition system for user authentication in smart devices. The system comprises two chips: an always-on functional CMOS image sensor (CIS) for imaging and face detection (FD) and a low-power CNN processor (CNNP) for face verification (FV). A functional CIS integrated with an FD accelerator enables event-driven chip-to-chip communication of face images only when a face is present. To achieve low power consumption in FD while keeping the memory required for FD processing within the on-chip memory size, the authors present two-stage FD using an analog FD unit and a digital FD unit. For event-driven FV, the CNNP adopts dynamic voltage and frequency scaling to minimize power consumption when the number of faces in input images changes dynamically. In addition, tensor decomposition is used to reduce the CNN's workload, and the CNNP architecture based on transpose-read SRAM (T-SRAM) allows low power consumption by reducing local memory accesses. Implemented in 65-nm CMOS technology, the 3.30 × 3.36 mm² functional CIS and the 4 × 4 mm² CNNP consume 0.62 mW to evaluate one face at 1 frame per second and achieve 97 percent accuracy on the LFW dataset.
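Tensor decomposition reduces a CNN layer's workload by approximating its filter bank with low-rank factors, replacing one large convolution with two cheaper ones. The paper's exact decomposition is not detailed here; the toy sketch below (with hypothetical layer dimensions) only illustrates the workload arithmetic using an SVD of a flattened filter matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, k = 64, 64, 3                      # hypothetical layer dimensions
W = rng.standard_normal((c_out, c_in * k * k))  # flattened filter bank

# Rank-r approximation: W ≈ (U·s) @ Vt, splitting one layer into two
r = 8
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_low = (U[:, :r] * s[:r]) @ Vt[:r, :]

full_macs = c_out * c_in * k * k        # MACs per output pixel, original layer
low_macs = r * c_in * k * k + c_out * r  # first factor + second factor
print(f"MAC reduction: {full_macs / low_macs:.1f}x")
```

In hardware, fewer MACs per output pixel translate directly into fewer local memory accesses and lower switching power, which is the point of pairing the decomposition with the T-SRAM-based datapath.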


International Symposium on Circuits and Systems | 2016

A 43.7 mW 94 fps CMOS image sensor-based stereo matching accelerator with focal-plane rectification and analog census transformation

Chang-Hyeon Kim; Kyeongryeol Bong; Sungpill Choi; Hoi-Jun Yoo

Depth information is actively used in many applications, such as mobile gesture user interfaces (UIs). However, previous stereo vision systems are unsuitable for mobile gesture UIs due to their long latency and the high power consumption of the external image sensor in embedded environments. In this paper, we propose a CMOS image sensor-based real-time stereo matching accelerator with low power consumption. For real-time operation, focal-plane rectification is proposed to perform image readout, rectification, and matching-cost generation at the same time. Also, a low-power analog census transformation is implemented with simple comparator circuits. The proposed stereo matching CIS, implemented in 65nm CMOS technology, consumes 43.7 mW at a 94.1 fps frame rate. It achieves 5.30×10³ MDE/J energy efficiency.
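The reported power and frame rate directly imply the per-frame energy:

```python
power_mw = 43.7   # mW, as reported
fps = 94.1        # frames per second, as reported

# mW divided by fps gives mJ per frame; ×1000 converts to μJ per frame
energy_per_frame_uj = power_mw / fps * 1000
print(f"{energy_per_frame_uj:.0f} uJ/frame")  # ≈ 464 uJ/frame
```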
