Publication


Featured research published by Jinmook Lee.


International Solid-State Circuits Conference | 2015

4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications

Seong-Wook Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D-image, and motion recognition, use DL because of its best-in-class recognition accuracy. There are two types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With unsupervised DL, most of the learning time is spent in massively iterative weight updates for a restricted Boltzmann machine [2]. For a ~100MB training dataset, >100 TOP of computational capability and ~40GB/s of IO and SRAM data bandwidth are required. As a result, a 3.4GHz CPU needs >10 hours of learning time for a ~100K input-vector dataset and takes ~1 second for recognition, which is far from real-time processing. Thus, DL is typically done on cloud servers or in high-performance GPU environments with learning-on-server capability. However, the wide use of smart portable devices, such as smartphones and tablets, has produced many applications that need big-data processing with machine learning, such as tagging private photos on personal devices. A high-performance and energy-efficient DL/DI (deep inference) processor is therefore required to realize user-centric pattern recognition in portable devices.
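
The "massively iterative weight updates" mentioned above are, at their core, repeated large matrix multiplications. The sketch below shows one contrastive-divergence (CD-1) weight update for a restricted Boltzmann machine in NumPy, only to illustrate where the compute and bandwidth demands come from; the layer sizes, batch size, and learning rate are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: one contrastive-divergence (CD-1) weight update for an RBM.
# Dimensions and learning rate are illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, batch = 784, 500, 64      # assumed sizes
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W):
    """One CD-1 step: three large matrix multiplies per mini-batch."""
    h0_prob = sigmoid(v0 @ W)                              # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1_prob = sigmoid(h0 @ W.T)                            # reconstruction
    h1_prob = sigmoid(v1_prob @ W)                         # negative phase
    grad = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
    return W + lr * grad

v0 = rng.random((batch, n_visible))    # stand-in for one mini-batch of data
W = cd1_update(v0, W)                  # repeated over the whole dataset for many epochs
```

Each update touches the full weight matrix several times, which is why a ~100K-vector dataset iterated over many epochs translates into the >10-hour CPU learning time quoted above.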


International Solid-State Circuits Conference | 2017

14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks

Dongjoo Shin; Jinmook Lee; Jinsu Lee; Hoi-Jun Yoo

Recently, deep learning with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) has become pervasive across a wide range of applications. CNNs are used for vision recognition and processing, while RNNs can recognize time-varying entities and support generative models. Combining CNNs and RNNs makes it possible to recognize time-varying visual entities, such as actions and gestures, and to support image captioning [1]. However, the computational requirements of CNNs are quite different from those of RNNs. Fig. 14.2.1 shows a computation and weight-size analysis of convolution layers (CLs), fully-connected layers (FCLs), and RNN-LSTM layers (RLs). While CLs require a massive amount of computation with a relatively small number of filter weights, FCLs and RLs require a relatively small amount of computation with a huge number of filter weights. Therefore, when FCLs and RLs are accelerated with SoCs specialized for CLs, they suffer from high memory-transaction costs, low PE utilization, and a mismatch of computational patterns. Conversely, when CLs are accelerated with FCL- and RL-dedicated SoCs, they cannot exploit data reusability or achieve the required throughput. So far, prior works have considered acceleration of either CLs [2–4] or FCLs and RLs [5], but there has been no work on a combined CNN-RNN processor. In addition, a highly reconfigurable CNN-RNN processor with high energy efficiency is desirable to support general-purpose deep neural networks (DNNs).
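
As a rough back-of-the-envelope illustration of the CL-versus-FCL imbalance described above, the sketch below counts multiply-accumulate operations (MACs) and weights for one hypothetical convolution layer and one hypothetical fully-connected layer; the layer shapes are assumptions chosen only for illustration.

```python
# Rough MAC/weight counts for a hypothetical conv layer vs. fully-connected layer.
def conv_layer_cost(h, w, c_in, c_out, k):
    weights = k * k * c_in * c_out
    macs = h * w * weights            # each output position reuses the same filter weights
    return macs, weights

def fc_layer_cost(n_in, n_out):
    weights = n_in * n_out
    macs = weights                    # every weight is used exactly once per input
    return macs, weights

cl_macs, cl_w = conv_layer_cost(56, 56, 64, 128, 3)   # assumed shapes
fc_macs, fc_w = fc_layer_cost(4096, 4096)

print(f"CL : {cl_macs/1e6:8.1f} M MACs, {cl_w/1e6:5.2f} M weights")
print(f"FCL: {fc_macs/1e6:8.1f} M MACs, {fc_w/1e6:5.2f} M weights")
```

The conv layer performs thousands of MACs per weight (high reuse, compute-bound), while the fully-connected layer performs exactly one MAC per weight (memory-bound), which is why a single fixed datapath serves both poorly.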


International Solid-State Circuits Conference | 2016

14.1 A 126.1mW real-time natural UI/UX processor with embedded deep-learning core for low-power smart glasses

Seong-Wook Park; Sungpill Choi; Jinmook Lee; Minseo Kim; Jun-Young Park; Hoi-Jun Yoo

This paper presents a low-power natural UI/UX processor with an embedded deep-learning core (NINEX) to provide wearable AR for HMD users without calibration. The low-power, real-time natural UI/UX processor is fabricated in a 65nm 8-metal CMOS technology, integrating 4.8M equivalent gates and 390KB of SRAM for wearable AR. It consumes 126.1mW at 200MHz and 1.2V. NINEX handles the overall HMD UI/UX pipeline (from pre-processing to graphics) and achieves 56.5% higher power efficiency than the latest HMD processor, as well as 68.1% higher power efficiency and ~2% higher gesture- and speech-recognition accuracy than a best-in-class pattern-recognition processor.


IEEE Transactions on Biomedical Circuits and Systems | 2015

An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications

Seong-Wook Park; Jun-Young Park; Kyeongryeol Bong; Dongjoo Shin; Jinmook Lee; Sungpill Choi; Hoi-Jun Yoo

Deep learning is widely used for pattern-recognition applications such as text, object, and action recognition because of its best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. However, the long learning time caused by its complex structure has so far limited its use to high-cost servers or many-core GPU platforms. At the same time, the demand for customized pattern recognition within personal devices will grow as more deep-learning applications are developed. This paper presents an SoC implementation that enables deep-learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works that adopt a massively parallel architecture, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. We implement the most energy-efficient deep learning and inference processor for wearable systems. The 2.5 mm × 4.0 mm deep learning/inference processor is fabricated in a 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at a 200 MHz operating frequency and a 1.2 V supply voltage, and achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state of the art.
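
The headline 1.93 TOPS/W figure follows directly from the peak performance and peak power quoted in the abstract; a one-line check:

```python
# Energy efficiency = peak performance / peak power (figures quoted in the abstract).
peak_gops = 411.3          # GOPS
peak_power_mw = 213.1      # mW
tops_per_watt = (peak_gops / 1e3) / (peak_power_mw / 1e3)
print(f"{tops_per_watt:.2f} TOPS/W")   # ~1.93 TOPS/W
```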


International Symposium on Circuits and Systems | 2017

A 0.53mW ultra-low-power 3D face frontalization processor for face recognition with human-level accuracy in wearable devices

Sang Hoon Kang; Jinmook Lee; Kyeongryeol Bong; Chang-Hyeon Kim; Hoi-Jun Yoo

An ultra-low-power face frontalization processor (FFP) is proposed for accurate face recognition in wearable devices. 3D face frontalization is essential in face recognition to guarantee human-level accuracy even with rotated or tilted faces. To reduce external memory access (EMA), which causes large power consumption, regression-weight quantization with K-means clustering is proposed, yielding an 81.25% EMA reduction. In addition, pipelined memory-level zero-skipping regression reduces the EMA by an additional 98.43% without latency overhead. Moreover, for low-power acceleration of the heterogeneous workload, an energy-efficient shared PE-array architecture is proposed: computation-intensive stages are accelerated by allocating a large number of PEs to exploit data-level parallelism, while unused PEs are clock-gated to prevent needless power consumption during computationally light stages. The proposed workload adaptation with clock gating shows a 37.14% power reduction. The proposed FFP, implemented in a 65nm CMOS process, consumes 0.53mW at 4.73fps throughput, both of which satisfy the conditions for always-on face recognition in wearable devices.
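
The regression-weight quantization idea can be sketched roughly as follows: cluster the weights into K centroids with K-means, store only small per-weight indices plus a short codebook, and reconstruct weights on the fly, so far less data crosses the external memory interface. The cluster count, weight count, and storage arithmetic below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: K-means weight quantization to shrink external memory traffic.
# K, the weight shape, and the storage math are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000).astype(np.float32)   # stand-in regression weights
K = 16                                                      # 4-bit indices

# Plain Lloyd iterations (kept dependency-free for a sketch).
centroids = np.quantile(weights, np.linspace(0, 1, K)).astype(np.float32)
for _ in range(20):
    idx = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(K):
        if np.any(idx == k):
            centroids[k] = weights[idx == k].mean()

dequantized = centroids[idx]                 # weights reconstructed on the fly at run time
original_bits = weights.size * 32
quantized_bits = weights.size * 4 + K * 32   # 4-bit indices + small codebook
print(f"storage reduced to {100 * quantized_bits / original_bits:.1f}% of original")
print(f"mean abs quantization error: {np.abs(weights - dequantized).mean():.4f}")
```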


IEEE Transactions on Circuits and Systems II: Express Briefs | 2017

An Energy-Efficient Speech-Extraction Processor for Robust User Speech Recognition in Mobile Head-Mounted Display Systems

Jinmook Lee; Seong-Wook Park; Injoon Hong; Hoi-Jun Yoo

An energy-efficient speech-extraction (SE) processor is proposed for robust user speech recognition (SR) in head-mounted display (HMD) systems. User SE is essential for robust user SR in a noisy environment. For low-latency SE, the FastSE algorithm is proposed to overcome the time-consuming constrained-independent-component-analysis-based user-speech selection process, resulting in <2ms SE latency. Moreover, a reinforced-FastSE scheme is proposed to achieve 97.2% accuracy with only 33kB of FastSE on-chip memory for low-power HMD applications. Also, a reconfigurable matrix-operation accelerator is implemented for energy-efficient acceleration of the dominant matrix operations in SE. As a result, the proposed SE processor achieves 1.3× higher speed with 4.24× smaller memory than the state-of-the-art work, making SR in a noisy environment possible for mobile HMD applications.
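
To give a feel for why matrix operations dominate in ICA-style speech extraction, the sketch below runs a plain FastICA-like unmixing iteration over a block of multi-channel samples; it is not the FastSE algorithm from the paper, and the channel count, block length, and iteration count are assumptions.

```python
# Minimal sketch of ICA-style blind source separation (plain FastICA-like iteration),
# only to show the matrix-multiply-dominated workload such an accelerator targets.
# This is NOT the FastSE algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_samples = 4, 16_000                  # assumed microphone count / block length
X = rng.standard_normal((n_mics, n_samples))   # stand-in for whitened microphone signals

W = np.linalg.qr(rng.standard_normal((n_mics, n_mics)))[0]   # random orthogonal init
for _ in range(50):                                          # iterative, matrix-heavy loop
    Y = W @ X                                                # unmixing: (d x d) @ (d x n)
    g, g_prime = np.tanh(Y), 1.0 - np.tanh(Y) ** 2
    W = (g @ X.T) / n_samples - np.diag(g_prime.mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W)                              # symmetric decorrelation
    W = U @ Vt

separated = W @ X   # one row per estimated source; a selection step picks the user's speech
```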


IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) | 2017

An energy-efficient deep learning processor with heterogeneous multi-core architecture for convolutional neural networks and recurrent neural networks

Dongjoo Shin; Jinmook Lee; Jinsu Lee; Juhyoung Lee; Hoi-Jun Yoo

An energy-efficient deep learning processor is proposed for convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in mobile platforms. The 16mm² chip is fabricated in a 65nm technology with three key features: 1) a reconfigurable heterogeneous architecture that supports both CNNs and RNNs; 2) a LUT-based reconfigurable multiplier optimized for dynamic fixed-point with on-line adaptation; and 3) quantization-table-based matrix multiplication to reduce off-chip memory access and remove duplicated multiplications. As a result, this work shows 20× and 4.5× higher energy efficiency than [2] and [3], respectively, and DNPU shows 6.5× higher energy efficiency than [5].
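
The quantization-table idea can be sketched roughly as follows: weights are quantized to a small codebook, so for each input activation the products with all codebook entries are computed once and then reused by index, turning most multiplications into table look-ups. The codebook size and layer shape below are assumptions for illustration, not the DNPU's actual configuration.

```python
# Sketch: quantization-table-based matrix multiplication.
# Weights are replaced by 4-bit indices into a 16-entry codebook, so each input value
# needs only 16 real multiplications; the remaining "multiplies" become table look-ups.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, K = 256, 128, 16                   # assumed layer size and codebook size
codebook = np.linspace(-1.0, 1.0, K)            # assumed quantized weight levels
w_idx = rng.integers(0, K, size=(n_in, n_out))  # 4-bit weight indices (stand-in)
x = rng.standard_normal(n_in)

# Precompute products of every input element with every codebook entry: n_in x K multiplies.
table = np.outer(x, codebook)                   # shape (n_in, K)

# Accumulate each output by table look-up instead of multiplication.
y = np.zeros(n_out)
for j in range(n_out):
    y[j] = table[np.arange(n_in), w_idx[:, j]].sum()

# Reference result with ordinary multiply-accumulate (n_in x n_out multiplies).
y_ref = x @ codebook[w_idx]
assert np.allclose(y, y_ref)
```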


European Solid-State Circuits Conference | 2016

An 8.3mW 1.6Msamples/s multi-modal event-driven speech enhancement processor for robust speech recognition in smart glasses

Jinmook Lee; Seong-Wook Park; Injoon Hong; Hoi-Jun Yoo

A low-power, high-speed speech-enhancement processor for noisy inputs is proposed to realize robust speech recognition in smart glasses. It has three key schemes: multi-modal speech selection, look-up-table-based non-linear approximation circuits, and speech-detection-controlled dynamic clock gating. The multi-modal speech-selection scheme uses three parameters to raise the limited accuracy of previous uni-modal user-speech selection to 98.1%. The non-linear function-approximation circuit increases the throughput of the speech enhancement by 10.7×. The speech-detection-controlled clock gating reduces redundant power consumption by 51% when there is no user voice. The proposed speech-enhancement processor, fabricated in a 65nm CMOS process, achieves 1.6Msamples/s throughput and 8.3mW average power consumption with a 98.1% true-positive rate of speech selection.
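
The look-up-table-based non-linear approximation can be illustrated as follows: a costly non-linear function is tabulated once over a bounded input range and evaluated at run time by a table read plus linear interpolation. The table size, input range, and the choice of tanh below are assumptions chosen only for illustration.

```python
# Sketch: look-up-table-based approximation of a non-linear function (here tanh),
# the general technique behind replacing costly non-linear math with table reads.
# Table size, input range, and the choice of tanh are illustrative assumptions.
import numpy as np

TABLE_SIZE, X_MIN, X_MAX = 256, -4.0, 4.0
grid = np.linspace(X_MIN, X_MAX, TABLE_SIZE)
lut = np.tanh(grid)                              # filled once, offline

def tanh_lut(x):
    """Approximate tanh(x) with the LUT plus linear interpolation between entries."""
    x = np.clip(x, X_MIN, X_MAX)
    pos = (x - X_MIN) / (X_MAX - X_MIN) * (TABLE_SIZE - 1)
    i = np.minimum(pos.astype(int), TABLE_SIZE - 2)
    frac = pos - i
    return lut[i] * (1.0 - frac) + lut[i + 1] * frac

x = np.linspace(-6, 6, 1000)
print("max abs error:", np.abs(tanh_lut(x) - np.tanh(x)).max())   # small for 256 entries
```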


International Symposium on Circuits and Systems | 2015

A 3.13nJ/sample energy-efficient speech extraction processor for robust speech recognition in mobile head-mounted display systems

Jinmook Lee; Seong-Wook Park; Injoon Hong; Hoi-Jun Yoo

An energy-efficient speech-extraction (SE) processor is proposed for robust speech recognition in head-mounted display (HMD) systems. Speech extraction is essential for robust speech recognition in a noisy environment. For low-latency speech extraction, FastSE is proposed to overcome the 50× more complex cICA-based selection process, resulting in <2ms SE latency. Moreover, a reinforced-FastSE (RFSE) scheme is proposed to achieve 97.2% accuracy with a small on-chip memory of only 33kB for low-power HMD applications. Also, a reconfigurable matrix-operation accelerator (RMAT) is implemented for energy-efficient acceleration of the dominant matrix operations in SE. As a result, the proposed SE processor achieves 1.3× lower latency with 4.24× smaller memory than the state-of-the-art work, so that speech recognition in a noisy environment becomes possible for mobile HMD applications.

