Sae Kyu Lee | Researchain

Publication

Featured research published by Sae Kyu Lee.

International Symposium on Computer Architecture | 2016

Minerva: enabling low-power, highly-accurate deep neural network accelerators

Brandon Reagen; Paul N. Whatmough; Robert Adolf; Saketh Rama; Hyunkwang Lee; Sae Kyu Lee; José Miguel Hernández-Lobato; Gu-Yeon Wei; David M. Brooks

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily give an order of magnitude improvement over general-purpose hardware, few look beyond an initial implementation. This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared to an established fixed-point accelerator baseline, we show that fine-grained, heterogeneous datatype optimization reduces power by 1.5×; aggressive, inline predication and pruning of small activity values further reduces power by 2.0×; and active hardware fault detection coupled with domain-aware error mitigation eliminates an additional 2.7× through lowering SRAM voltages. Across five datasets, these optimizations provide a collective average of 8.1× power reduction over an accelerator baseline without compromising DNN model accuracy. Minerva enables highly accurate, ultra-low power DNN accelerators (in the range of tens of milliwatts), making it feasible to deploy DNNs in power-constrained IoT and mobile devices.
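The three per-stage savings quoted in the abstract are consistent with the reported 8.1× total if the stages are assumed to compose multiplicatively. The short check below works under that assumption; the 100 mW baseline is a hypothetical figure chosen only for illustration and does not come from the paper.

```python
from math import prod

# Per-stage power reduction factors quoted in the abstract.
stage_factors = {
    "datatype optimization": 1.5,
    "predication and pruning": 2.0,
    "SRAM voltage scaling with fault mitigation": 2.7,
}

# Assumption: the stage savings compose multiplicatively.
total_reduction = prod(stage_factors.values())

# Hypothetical baseline power, for illustration only (not from the paper).
baseline_mw = 100.0
optimized_mw = baseline_mw / total_reduction

print(f"total reduction: {total_reduction:.1f}x")
print(f"{baseline_mw:.0f} mW baseline -> {optimized_mw:.1f} mW")
```

With a baseline in the low hundreds of milliwatts, a reduction of this size lands in the tens of milliwatts, which matches the range the abstract mentions.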

International Symposium on Low Power Electronics and Design | 2013

Characterizing and evaluating voltage noise in multi-core near-threshold processors

Xuan Zhang; Tao Tong; Svilen Kanev; Sae Kyu Lee; Gu-Yeon Wei; David M. Brooks

Lowering the supply voltage to improve energy efficiency leads to higher load current and elevated supply sensitivity. In this paper, we provide the first quantitative analysis of voltage noise in multi-core near-threshold processors in a future 10nm technology across SPEC CPU2006 benchmarks. Our results reveal a larger guardband requirement and significant energy efficiency loss due to power delivery nonidealities at near-threshold voltages, and highlight the importance of accurate voltage noise characterization for design exploration of energy-centric computing systems using near-threshold cores.

Symposium on VLSI Circuits | 2015

A 16-core voltage-stacked system with an integrated switched-capacitor DC-DC converter

Sae Kyu Lee; Tao Tong; Xuan Zhang; David M. Brooks; Gu-Yeon Wei

A 16-core voltage-stacked IC integrated with a switched-capacitor DC-DC converter demonstrates efficient power delivery. To overcome inter-layer voltage noise issues, the test chip implements and evaluates the benefits of self-timed clocking and clock-phase interleaving. The integrated converter offers minimum voltage guarantees and further reduces voltage noise.

International Solid-State Circuits Conference | 2017

14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications

Paul N. Whatmough; Sae Kyu Lee; Hyunkwang Lee; Saketh Rama; David M. Brooks; Gu-Yeon Wei

Machine Learning (ML) techniques empower Internet of Things (IoT) devices with the capability to interpret the complex, noisy real-world data arising from sensor-rich systems. Achieving sufficient energy efficiency to execute ML workloads on an edge device necessitates specialized hardware with efficient digital circuits. Razor systems allow excessive worst-case VDD guardbands to be minimized down to the point where timing violations start to occur. By tracking the non-zero timing violation rate, process/voltage/temperature/aging (PVTA) variations are dynamically compensated as they change over time. Resilience to timing violations is achieved using either explicit correction (e.g., replay [1]) or algorithmic tolerance [2]. ML algorithms offer remarkable inherent error tolerance and are a natural fit for Razor timing violation detection without the burden of explicit and guaranteed error correction. Prior ML accelerators have focused either on computer vision CNNs with high power (e.g., [3] consumes 278mW) or spiking neural networks with low accuracy (e.g., 84% on MNIST [4]). Programmable fully-connected (FC) deep-neural-network (DNN) accelerators offer flexible support for a range of general classification tasks with high accuracy [5]. However, because there is no parameter reuse in FC layers, both compute and memory resources must be optimized.

Symposium on VLSI Circuits | 2015

A multi-chip system optimized for insect-scale flapping-wing robots

Xuan Zhang; Mario Lok; Tao Tong; Simon Chaput; Sae Kyu Lee; Brandon Reagen; Hyunkwang Lee; David M. Brooks; Gu-Yeon Wei

We demonstrate a battery-powered multi-chip system optimized for insect-scale flapping wing robots that meets the tight weight limit and real-time performance demands of autonomous flight. Measured results show open-loop wing flapping driven by a power electronics unit and energy efficiency improvements via hardware acceleration.

IEEE Journal of Solid-state Circuits | 2016

A Fully Integrated Reconfigurable Switched-Capacitor DC-DC Converter With Four Stacked Output Channels for Voltage Stacking Applications

Tao Tong; Sae Kyu Lee; Xuan Zhang; David M. Brooks; Gu-Yeon Wei

This work presents a fully integrated 4-to-1 DC-DC symmetric ladder switched-capacitor converter (SLSCC) for voltage stacking applications. The SLSCC absorbs inter-layer load power mismatch to provide minimum voltage guarantees for the internal rails of a multicore system that implements four-way voltage stacking. A new hybrid feedback control scheme reduces the voltage ripple across stacked voltage layers for high levels of current mismatch, a condition that exacerbates voltage noise in conventional SC converters. Furthermore, the proposed SLSCC dynamically allocates valuable flying capacitor resources according to different load conditions, which improves conversion efficiency and supports more power mismatch between the layers. Implemented in TSMC's 40G process, the SLSCC converts a 3.6 V input voltage down to four stacked output voltage layers, each nominally at 900 mV.
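The numbers in this abstract can be checked with a small idealized model: four series-stacked layers split the 3.6 V input evenly, and any difference between the currents drawn by adjacent layers has to be supplied or absorbed at the rail between them. The sketch below applies only Kirchhoff's current law; the function names and the per-layer currents are hypothetical, and it does not model the SLSCC's control scheme or losses.

```python
def nominal_layer_voltage(v_in, n_layers):
    """Even split of the input voltage across series-stacked layers."""
    return v_in / n_layers


def rail_mismatch_currents(layer_currents_ma):
    """Current (mA) needed at each internal rail, listed top to bottom.

    By KCL, the rail between an upper and a lower layer must supply
    (positive) or absorb (negative) the difference lower - upper.
    """
    return [
        lower - upper
        for upper, lower in zip(layer_currents_ma, layer_currents_ma[1:])
    ]


v_layer = nominal_layer_voltage(3.6, 4)

# Hypothetical per-layer load currents in mA, top layer first.
currents = [100, 80, 120, 100]
mismatch = rail_mismatch_currents(currents)

print(f"nominal layer voltage: {v_layer * 1000:.0f} mV")
print(f"internal rail mismatch currents (mA): {mismatch}")
```

When all four layers draw the same current the mismatch list is all zeros, which is the case where an ideal stack would need no help from the converter.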

IEEE Transactions on Very Large Scale Integration Systems | 2017

A 16-Core Voltage-Stacked System With Adaptive Clocking and an Integrated Switched-Capacitor DC–DC Converter

Sae Kyu Lee; Tao Tong; Xuan Zhang; David M. Brooks; Gu-Yeon Wei

This paper presents a 16-core voltage-stacked system with adaptive frequency clocking (AFClk) and a fully integrated voltage regulator that demonstrates efficient on-chip power delivery for multicore systems. Voltage stacking alleviates power delivery inefficiencies due to off-chip parasitics but adds complexity to combat internal voltage noise. To address the corresponding issue of internal voltage noise, the system utilizes an AFClk scheme with an efficient switched-capacitor dc–dc converter to mitigate noise on the stack layers and to improve system performance and efficiency. Experimental results demonstrate robust voltage noise mitigation as well as the potential of voltage stacking as a highly efficient power delivery scheme. This paper also illustrates that augmenting the hardware techniques with intelligent workload allocation that exploits the inherent properties of voltage stacking can preemptively reduce the interlayer activity mismatch and improve system efficiency.

Design Automation Conference | 2018

Ares: a framework for quantifying the resilience of deep neural networks

Brandon Reagen; Udit Gupta; Lillian Pentecost; Paul N. Whatmough; Sae Kyu Lee; Niamh Mulholland; David M. Brooks; Gu-Yeon Wei

As the use of deep neural networks continues to grow, so does the fraction of compute cycles devoted to their execution. This has led the CAD and architecture communities to devote considerable attention to building DNN hardware. Despite these efforts, the fault tolerance of DNNs has generally been overlooked. This paper is the first to conduct a large-scale, empirical study of DNN resilience. Motivated by the inherent algorithmic resilience of DNNs, we are interested in understanding the relationship between fault rate and model accuracy. To do so, we present Ares: a light-weight, DNN-specific fault injection framework validated within 12% of real hardware. We find that DNN fault tolerance varies by orders of magnitude with respect to model, layer type, and structure.
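To make the idea of fault injection concrete, here is a minimal sketch that flips bits in 8-bit weights at a chosen fault rate. It is not the Ares framework or its API; the function names, the toy weight list, and the independent per-bit fault model are all assumptions made for illustration, and it reports how many weights were corrupted, not model accuracy.

```python
import random


def inject_bit_flips(weights, fault_rate, bits=8, seed=0):
    """Flip each bit of each unsigned `bits`-wide weight independently
    with probability `fault_rate` (hypothetical fault model)."""
    rng = random.Random(seed)
    faulty = []
    for w in weights:
        for b in range(bits):
            if rng.random() < fault_rate:
                w ^= 1 << b  # toggle bit b
        faulty.append(w)
    return faulty


def corrupted_fraction(clean, faulty):
    """Fraction of weights whose value changed after injection."""
    return sum(c != f for c, f in zip(clean, faulty)) / len(clean)


# Toy 8-bit weights, for illustration only.
weights = [17, 200, 3, 96, 255, 0, 42, 128]

for rate in (0.0, 0.01, 0.5):
    faulty = inject_bit_flips(weights, rate)
    print(f"fault rate {rate}: {corrupted_fraction(weights, faulty):.2f} corrupted")
```

A resilience study in the spirit of the abstract would replace `corrupted_fraction` with an accuracy measurement on a real model and sweep the fault rate per layer.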

IEEE Journal of Solid-state Circuits | 2017

A Fully Integrated Battery-Powered System-on-Chip in 40-nm CMOS for Closed-Loop Control of Insect-Scale Pico-Aerial Vehicle

Xuan Zhang; Mario Lok; Tao Tong; Sae Kyu Lee; Brandon Reagen; Simon Chaput; Pierre-Emile J. Duhamel; Robert J. Wood; David M. Brooks; Gu-Yeon Wei

We demonstrate a fully integrated system-on-chip (SoC) optimized for insect-scale flapping-wing pico-aerial vehicles. The SoC is able to meet the stringent weight, power, and real-time performance demands of autonomous flight for a bee-sized robot. The entire integrated system with embedded voltage regulation, data conversion, clock generation, as well as both general-purpose and accelerated computing units, weighs less than 3 mg after die thinning. It is self-contained and can be powered directly off of a lithium battery. Measured results show open-loop wing flapping controlled by the SoC and improved energy efficiency through the use of hardware acceleration and supply resilience through the use of adaptive clocking.

International Symposium on Low Power Electronics and Design | 2012