Is this you? Create Your Porfile

Seyong Lee

Oak Ridge National Laboratory

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Seyong Lee is active.

Explore More

Publication

Featured researches published by Seyong Lee.

ieee international conference on high performance computing data and analytics | 2012

Early evaluation of directive-based GPU programming models for productive exascale computing

Seyong Lee; Jeffrey S. Vetter

Graphics Processing Unit (GPU)-based parallel computer architectures have shown increased popularity as a building block for high performance computing, and possibly for future Exascale computing. However, their programming complexity remains as a major hurdle for their widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models. These directive-based models provide different levels of abstraction, and required different levels of programming effort to port and optimize applications. Understanding these differences among these new models provides valuable insights on their applicability and performance potential. In this paper, we evaluate existing directive-based models by porting thirteen application kernels from various scientific domains to use CUDA GPUs, which, in turn, allows us to identify important issues in the functionality, scalability, tunability, and debuggability of the existing models. Our evaluation shows that directive-based models can achieve reasonable performance, compared to hand-written GPU codes.

high performance distributed computing | 2014

OpenARC: open accelerator research compiler for directive-based, efficient heterogeneous computing

Seyong Lee; Jeffrey S. Vetter

This paper presents Open Accelerator Research Compiler (OpenARC): an open-source framework that supports the full feature set of OpenACC V1.0 and performs source-to-source transformations, targeting heterogeneous devices, such as NVIDIA GPUs. Combined with its high-level, extensible Intermediate Representation (IR) and rich semantic annotations, OpenARC serves as a powerful research vehicle for prototyping optimization, source-to-source transformations, and instrumentation for debugging, performance analysis, and autotuning. In fact, OpenARC is equipped with various capabilities for advanced analyses and transformations, as well as built-in performance and debugging tools. We explain the overall design and implementation of OpenARC, and we present key analysis techniques necessary to efficiently port OpenACC applications. Porting various OpenACC applications to CUDA GPUs using OpenARC demonstrates that OpenARC performs similarly to a commercial compiler, while serving as a general research framework.

computing frontiers | 2012

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

Kyle Spafford; Jeremy S. Meredith; Seyong Lee; Dong Li; Philip C. Roth; Jeffrey S. Vetter

With the rise of general purpose computing on graphics processing units (GPGPU), the influence from consumer markets can now be seen across the spectrum of computer architectures. In fact, many of the high-ranking Top500 HPC systems now include these accelerators. Traditionally, GPUs have connected to the CPU via the PCIe bus, which has proved to be a significant bottleneck for scalable scientific applications. Now, a trend toward tighter integration between CPU and GPU has removed this bottleneck and unified the memory hierarchy for both CPU and GPU cores. We examine the impact of this trend for high performance scientific computing by investigating AMDs new Fusion Accelerated Processing Unit (APU) as a testbed. In particular, we evaluate the tradeoffs in performance, power consumption, and programmability when comparing this unified memory hierarchy with similar, but discrete GPUs.

international conference on supercomputing | 2015

COMPASS: A Framework for Automated Performance Modeling and Prediction

Seyong Lee; Jeremy S. Meredith; Jeffrey S. Vetter

Flexible, accurate performance predictions offer numerous benefits such as gaining insight into and optimizing applications and architectures. However, the development and evaluation of such performance predictions has been a major research challenge, due to the architectural complexities. To address this challenge, we have designed and implemented a prototype system, named COMPASS, for automated performance model generation and prediction. COMPASS generates a structured performance model from the target applications source code using automated static analysis, and then, it evaluates this model using various performance prediction techniques. As we demonstrate on several applications, the results of these predictions can be used for a variety of purposes, such as design space exploration, identifying performance tradeoffs for applications, and understanding sensitivities of important parameters. COMPASS can generate these predictions across several types of applications from traditional, sequential CPU applications to GPU-based, heterogeneous, parallel applications. Our empirical evaluation demonstrates a maximum overhead of 4%, flexibility to generate models for 9 applications, speed, ease of creation, and very low relative errors across a diverse set of architectures.

Proceedings of the First Workshop on Accelerator Programming using Directives | 2014

OpenARC: extensible OpenACC compiler framework for directive-based accelerator programming study

Seyong Lee; Jeffrey S. Vetter

Directive-based, accelerator programming models such as OpenACC have arisen as an alternative solution to program emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity in the SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-sourced OpenACC compiler, called OpenARC, which serves as an extensible research framework to address those issues in the directive-based accelerator programming. This paper explains important design strategies and key compiler transformation techniques needed to implement the reference OpenACC compiler. Moreover, this paper demonstrates the efficacy of OpenARC as a research framework for directive-based programming study, by proposing and implementing OpenACC extensions in the OpenARC framework to 1) support hybrid programming of the unified memory and separate memory and 2) exploit architecture-specific features in an abstract manner. Porting thirteen standard OpenACC programs and three extended OpenACC programs to CUDA GPUs shows that OpenARC performs similarly to a commercial OpenACC compiler, while it serves as a high-level research framework.

languages and compilers for parallel computing | 2014

Evaluating Performance Portability of OpenACC

Amit Sabne; Putt Sakdhnagool; Seyong Lee; Jeffrey S. Vetter

Accelerator-based heterogeneous computing is gaining momentum in High Performance Computing arena. However, the increased complexity of the accelerator architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle the problem. While the abstraction endowed by OpenACC offers productivity, it raises questions on its portability. This paper evaluates the performance portability obtained by OpenACC on twelve OpenACC programs on NVIDIA CUDA, AMD GCN, and Intel MIC architectures. We study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.

International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2013

Quantifying Architectural Requirements of Contemporary Extreme-Scale Scientific Applications

Jeffrey S. Vetter; Seyong Lee; Dong Li; Gabriel Marin; Collin McCurdy; Jeremy S. Meredith; Philip C. Roth; Kyle Spafford

As detailed in recent reports, HPC architectures will continue to change over the next decade in an effort to improve energy efficiency, reliability, and performance. At this time of significant disruption, it is critically important to understand specific application requirements, so that these architectural changes can include features that satisfy the requirements of contemporary extreme-scale scientific applications. To address this need, we have developed a methodology supported by a toolkit that allows us to investigate detailed computation, memory, and communication behaviors of applications at varying levels of resolution. Using this methodology, we performed a broad-based, detailed characterization of 12 contemporary scalable scientific applications and benchmarks. Our analysis reveals numerous behaviors that sometimes contradict conventional wisdom about scientific applications. For example, the results reveal that only one of our applications executes more floating-point instructions than other types of instructions. In another example, we found that communication topologies are very regular, even for applications that, at first glance, should be highly irregular. These observations emphasize the necessity of measurement-driven analysis of real applications, and help prioritize features that should be included in future architectures.

international parallel and distributed processing symposium | 2016

OpenACC to FPGA: A Framework for Directive-Based High-Performance Reconfigurable Computing

Seyong Lee; Jungwon Kim; Jeffrey S. Vetter

This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler, it performs source-to-source translation and optimization of the input OpenACC program into an OpenCL code, which is further compiled into a FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high-level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates portability of OpenARC. In fact, this design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrate the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.

high performance distributed computing | 2016

NVL-C: Static Analysis Techniques for Efficient, Correct Programming of Non-Volatile Main Memory Systems

Joel E. Denny; Seyong Lee; Jeffrey S. Vetter

Computer architecture experts expect that non-volatile memory (NVM) hierarchies will play a more significant role in future systems including mobile, enterprise, and HPC architectures. With this expectation in mind, we present NVL-C: a novel programming system that facilitates the efficient and correct programming of NVM main memory systems. The NVL-C programming abstraction extends C with a small set of intuitive language features that target NVM main memory, and can be combined directly with traditional C memory model features for DRAM. We have designed these new features to enable compiler analyses and run-time checks that can improve performance and guard against a number of subtle programming errors, which, when left uncorrected, can corrupt NVM-stored data. Moreover, to enable recovery of data across application or system failures, these NVL-C features include a flexible directive for specifying NVM transactions. So that our implementation might be extended to other compiler front ends and languages, the majority of our compiler analyses are implemented in an extended version of LLVMs intermediate representation (LLVM IR). We evaluate NVL-C on a number of applications to show its flexibility, performance, and correctness.

international parallel and distributed processing symposium | 2014

Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing

Seyong Lee; Dong Li; Jeffrey S. Vetter

Directive-based GPU programming models are gaining momentum, since they transparently relieve programmers from dealing with complexity of low-level GPU programming, which often reflects the underlying architecture. However, too much abstraction in directive models puts a significant burden on programmers for debugging applications and tuning performance. In this paper, we propose a directive-based, interactive program debugging and optimization system. This system enables intuitive and synergistic interaction among programmers, compilers, and runtimes for more productive and efficient GPU computing. We have designed and implemented a series of prototype tools within our new open source compiler framework, called Open Accelerator Research Compiler (Open ARC), Open ARC supports the full feature set of Opencast V1.0. Our evaluation on twelve Open ACC benchmarks demonstrates that our prototype debugging and optimization system can detect a variety of translation errors. Additionally, the optimization provided by our prototype minimizes memory transfers, when compared to a fully manual memory management scheme.

Explore More