Sangwon Seo | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sangwon Seo is active.

Explore More

Publication

Featured researches published by Sangwon Seo.

international symposium on microarchitecture | 2008

From SODA to scotch: The evolution of a wireless baseband processor

Mark Woh; Yuan Lin; Sangwon Seo; Scott A. Mahlke; Trevor N. Mudge; Chaitali Chakrabarti; Richard Edward Bruce; Danny Kershaw; Alastair Reid; Mladen Wilder; Krisztian Flautner

With the multitude of existing and upcoming wireless standards, it is becoming increasingly difficult for hardware-only baseband processing solutions to adapt to the rapidly changing wireless communication landscape. Software defined radio (SDR) promises to deliver a cost effective and flexible solution by implementing a wide variety of wireless protocols in software. In previous work, a fully programmable multicore architecture, SODA, was proposed that was able to meet the real-time requirements of 3G wireless protocols. SODA consists of one ARM control processor and four wide single instruction multiple data (SIMD) processing elements. Each processing element consists of a scalar and a wide 512-bit 32-lane SIMD datapath. A commercial prototype based on the SODA architecture, Ardbeg (named after a brand of Scotch whisky), has been developed. In this paper, we present the architectural evolution of going from a research design to a commercial prototype, including the goals, tradeoffs, and final design choices. Ardbegpsilas redesign process can be grouped into the following three major areas: optimizing the wide SIMD datapath, providing long instruction word (LIW) support for SIMD operations, and adding application-specific hardware accelerators. Because SODA was originally designed with 180 nm technology, the wide SIMD datapath is re-optimized in Ardbeg for 90 nm technology. This includes re-evaluating the most efficient SIMD width, designing a wider SIMD shuffle network, and implementing faster SIMD arithmetic units. Ardbeg also provides modest LIW support by allowing two SIMD operations to issue in the same cycle. This LIW execution supports SDR algorithmspsila most common parallel SIMD execution patterns with minimal hardware overhead. A viable commercial SDR solution must be competitive with existing ASIC solutions. Therefore, algorithm-specific hardware is added for performance bottleneck algorithms while still maintaining enough flexibility to support multiple wireless protocols. The combination of these architectural improvements allows Ardbeg to achieve 1.5-7x speedup over SODA across multiple wireless algorithms while consuming less power.

international solid-state circuits conference | 2012

Centip3De: A 3930DMIPS/W configurable near-threshold 3D stacked system with 64 ARM Cortex-M3 cores

David Fick; Ronald G. Dreslinski; Bharan Giridhar; Gyouho Kim; Sangwon Seo; Matthew Fojtik; Sudhir Satpathy; Yoonmyung Lee; Daeyeon Kim; Nurrachman Liu; Michael Wieckowski; Gregory K. Chen; Trevor N. Mudge; Dennis Sylvester; David T. Blaauw

Recent high performance IC design has been dominated by power density constraints. 3D integration increases device density even further, and these devices will not be usable without viable strategies to reduce power consumption. This paper proposes the use of near-threshold computing (NTC) to address this issue in a stacked 3D system. In NTC, cores are operated near the threshold voltage (~200mV above Vth) to optimally balance power and performance [1]. In Centip3De, we operate cores at 650mV, as opposed to the wear-out limited supply voltage of 1.5V. This improves measured energy efficiency by 5.1×. The dramatically lower power consumption of NTC makes it an attractive match for 3D design, which has limited power dissipation capabilities, but also has improved innate power and performance compared to 2D design.

international conference on embedded computer systems architectures modeling and simulation | 2007

The next generation challenge for software defined radio

Mark Woh; Sangwon Seo; Hyunseok Lee; Yuan Lin; Scott A. Mahlke; Trevor N. Mudge; Chaitali Chakrabarti; Krisztian Flautner

Wireless communication for mobile terminals has been a high performance computing challenge. It requires almost super computer performance while consuming very little power. This requirement is being made even more challenging with the move to Fourth Generation (4G) wireless communication. It is projected that by 2010, 4G will be available with data rates from 100Mbps to 1Gbps. These data rates are orders of magnitude greater than current 3G technology and, consequently, will require orders of magnitude more computation power. Leading forerunners for this technology are protocols like 802.16e (mobile WiMAX) and 3GPP LTE. This paper presents an analysis of the major algorithms that comprise these 4G technologies and describes their computational characteristics. We identify the major bottlenecks that need to be overcome in order to meet the requirements of this new technology. In particular, we show that technology scaling alone of current Software Defined Radio architectures will not be able to meet these requirements. Finally, we will discuss techniques that may make it possible to meet the power/performance requirements without giving up programmability.

design automation conference | 2012

Process variation in near-threshold wide SIMD architectures

Sangwon Seo; Ronald G. Dreslinski; Mark Woh; Yongjun Park; Chaitali Charkrabari; Scott A. Mahlke; David T. Blaauw; Trevor N. Mudge

Near-threshold operation has emerged as a competitive approach for energy-efficient architecture design. In particular, a combination of near-threshold circuit techniques and parallel SIMD computations achieves excellent energy efficiency for easy-to-parallelize applications. However, near-threshold operations suffer from delay variations due to increased process variability. This is exacerbated in wide SIMD architectures where the number of critical paths are multiplied by the SIMD width. This paper provides a systematic in-depth study of delay variations in near-threshold operations and shows that simple techniques such as structural duplication and supply voltage/frequency margining are sufficient to mitigate the timing variation problems in wide SIMD architectures at the cost of marginal area and power overhead.

IEEE Journal of Solid-state Circuits | 2013

Centip3De: A Cluster-Based NTC Architecture With 64 ARM Cortex-M3 Cores in 3D Stacked 130 nm CMOS

We present Centip3De, a large-scale 3D CMP with a cluster-based near-threshold computing (NTC) architecture. Centip3De uses a 3D stacking technology in conjunction with 130 nm CMOS. Measured results for a two-layer, 64-core system are discussed, with the system achieving 3930 DMIPS/W energy efficiency, which is >; 3x improvement over traditional operation at full supply voltage. This project demonstrates the feasibility of large-scale 3D design, a synergy between 3D and NTC architectures, a unique cluster-based NTC cache design, and how to maximize performance in a thermally-constrained design.

signal processing systems | 2007

Design and Analysis of LDPC Decoders for Software Defined Radio

Sangwon Seo; Trevor N. Mudge; Yuming Zhu; Chaitali Chakrabarti

Low Density Parity Check (LDPC) codes are one of the most promising error correction codes that are being adopted by many wireless standards. This paper presents a case study for a scalable LDPC decoder supporting multiple code rates and multiple block sizes on a software defined radio (SDR) platform. Since technology scaling alone is not sufficient for current SDR architectures to meet the requirements of the next generation wireless standards, this paper presents three techniques to improve the throughput performance. The techniques are use of data path accelerators, addition of memory units and addition of a few assembly instructions. The proposed LDPC decoder implementation achieved 30.4 Mbps decoding throughput for the n=2304 and R=5/6 LDPC code outlined in the IEEE 802.16e standard.

IEEE Micro | 2013

Centip3De: A 64-Core, 3D Stacked Near-Threshold System

Ronald G. Dreslinski; David Fick; Bharan Giridhar; Gyouho Kim; Sangwon Seo; Matthew Fojtik; Sudhir Satpathy; Yoonmyung Lee; Daeyeon Kim; Nurrachman Liu; Michael Wieckowski; Gregory K. Chen; Dennis Sylvester; David T. Blaauw; Trevor N. Mudge

Centip3De uses the synergy between 3D integration and near-threshold computing to create a reconfigurable system that provides both energy-efficient operation and techniques to address single-thread performance bottlenecks. The original Centip3De design is a seven-layer 3D stacked design with 128 cores and 256 Mbytes of DRAM. Silicon results show a two-layer, 64-core system in 130-nm technology, which achieved an energy efficiency of 3,930 DMIPS/W.

international symposium on low power electronics and design | 2010

Diet SODA: a power-efficient processor for digital cameras

Sangwon Seo; Ronald G. Dreslinski; Mark Woh; Chaitali Chakrabarti; Scott A. Mahlke; Trevor N. Mudge

Power has become the most critical design constraint for embedded handheld devices. This paper proposes a power-efficient SIMD architecture, referred to as Diet SODA, for DSP applications. The key design idea is to apply near-threshold operation on a single instruction and multiple data (SIMD) architecture to significantly lower the power consumption. The major features of Diet SODA are very wide SIMD width, scatter/gather data prefetcher, and dual mode operation. A case study was performed on digital still camera (DSC) applications; the results show that Diet SODA achieves ∼130x better performance and ∼340x better energy efficiency than a DSP solution.

architectural support for programming languages and operating systems | 2012

SIMD defragmenter: efficient ILP realization on data-parallel architectures

Yongjun Park; Sangwon Seo; Hyunchul Park; Hyoun Kyu Cho; Scott A. Mahlke

Single-instruction multiple-data (SIMD) accelerators provide an energy-efficient platform to scale the performance of mobile systems while still retaining post-programmability. The central challenge is translating the parallel resources of the SIMD hardware into real application performance. In scientific applications, automatic vectorization techniques have proven quite effective at extracting large levels of data-level parallelism (DLP). However, vectorization is often much less effective for media applications due to low trip count loops, complex control flow, and non-uniform execution behavior. As a result, SIMD lanes remain idle due to insufficient DLP. To attack this problem, this paper proposes a new vectorization pass called SIMD Defragmenter to uncover hidden DLP that lurks below the surface in the form of instruction-level parallelism (ILP). The difficulty is managing the data packing/unpacking overhead that can easily exceed the benefits gained through SIMD execution. The SIMD degragmenter overcomes this problem by identifying groups of compatible instructions (subgraphs) that can be executed in parallel across the SIMD lanes. By SIMDizing in bulk at the subgraph level, packing/unpacking overhead is minimized. On a 16-lane SIMD processor, experimental results show that SIMD defragmentation achieves a mean 1.6x speedup over traditional loop vectorization and a 31% gain over prior research approaches for converting ILP to DLP.

compilers, architecture, and synthesis for embedded systems | 2010

Mighty-morphing power-SIMD

Ganesh S. Dasika; Mark Woh; Sangwon Seo; Nathan Clark; Trevor N. Mudge; Scott A. Mahlke

In modern wireless devices, two broad classes of compute-intensive applications are common: those with high amounts of data-level parallelism, such as signal processing used in wireless baseband applications, and those that have little data-level parallelism, such as encryption. Wide single-instruction multiple-data (SIMD) processors have become popular for providing high performance, yet power efficient data engines for applications with abundant data parallelism. However, the non-data-parallel applications are relegated to a low-performance scalar datapath on these data engines while the SIMD resources are left idle. To accelerate both types of applications, we propose the design of a more flexible SIMD datapath called SIMD-Morph. In SIMD-Morph, code with data-level parallelism can be executed across the lanes in the traditional manner, but the lanes can be morphed into a feed-forward subgraph accelerator to execute scalar applications more efficiently. The morphed SIMD lanes form an accelerator that exploits both instruction-level parallelism as well as operation chaining to improve the performance of scalar code by exploiting the available resources in the SIMD lanes. Experimental results show that the performance impact is a 2.6X improvement for purely non-SIMD applications and a 1.4X improvement for the non-SIMD-ized portions of applications with data parallelism.

Explore More