Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Jae-Seung Yeom is active.

Publication


Featured research published by Jae-Seung Yeom.


acm sigplan symposium on principles and practice of parallel programming | 2009

A comparison of programming models for multiprocessors with explicitly managed memory hierarchies

Scott Schneider; Jae-Seung Yeom; Benjamin Rose; John C. Linford; Adrian Sandu; Dimitrios S. Nikolopoulos

On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle this complexity for us, we must identify the abstractions that are general enough to allow us to write applications with reasonable effort, yet specific enough to exploit the vast on-chip memory bandwidth of EMM multi-processors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The first programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework which provides OpenMP-like semantics and the abstraction of a shared address space divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.
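
A minimal sketch of the contrast the paper studies, assuming a toy "local store" capacity and a simple array-scaling kernel rather than any real Cell, Sequoia, or Cellgen API: in a shared-address-space model the loop alone expresses the computation, while an explicitly managed hierarchy forces the code to stage tiles through a bounded local buffer.

```python
# Toy illustration (not the Cell SDK, Sequoia, or Cellgen APIs): contrasting an
# implicitly managed shared-memory loop with explicit staging through a small
# "local store", which is what EMM processors require software to orchestrate.
import numpy as np

LOCAL_STORE_WORDS = 256          # hypothetical local-store capacity, in elements

def scale_implicit(a, factor):
    """Shared-address-space view: just express the computation."""
    return a * factor

def scale_explicit(a, factor):
    """Explicitly managed view: stage tiles in and out of a bounded local buffer."""
    out = np.empty_like(a)
    local = np.empty(LOCAL_STORE_WORDS, dtype=a.dtype)   # stand-in for on-chip memory
    for start in range(0, a.size, LOCAL_STORE_WORDS):
        n = min(LOCAL_STORE_WORDS, a.size - start)
        local[:n] = a[start:start + n]                    # "DMA in"
        local[:n] *= factor                               # compute on local data
        out[start:start + n] = local[:n]                  # "DMA out"
    return out

if __name__ == "__main__":
    data = np.arange(10_000, dtype=np.float64)
    assert np.allclose(scale_implicit(data, 2.0), scale_explicit(data, 2.0))
```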


IEEE Computer | 2009

Programming Multiprocessors with Explicitly Managed Memory Hierarchies

Scott Schneider; Jae-Seung Yeom; Dimitrios S. Nikolopoulos

A study of two applications programmed using three models of varying complexity reveals that implicit management of locality can produce code with performance comparable to code generated from explicit management of locality.


international parallel and distributed processing symposium | 2014

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters

Jae-Seung Yeom; Abhinav Bhatele; Keith R. Bisset; Eric J. Bohm; Abhishek Gupta; Laxmikant V. Kalé; Madhav V. Marathe; Dimitrios S. Nikolopoulos; Martin Schulz; Lukasz Wesolowski

Modeling dynamical systems represents an important application class covering a wide range of disciplines including but not limited to biology, chemistry, finance, national security, and health care. Such applications typically involve large-scale, irregular graph processing, which makes them difficult to scale due to the evolutionary nature of their workload, irregular communication and load imbalance. EpiSimdemics is such an application simulating epidemic diffusion in extremely large and realistic social contact networks. It implements a graph-based system that captures dynamics among co-evolving entities. This paper presents an implementation of EpiSimdemics in Charm++ that enables future research by social, biological and computational scientists at unprecedented data and system scales. We present new methods for application-specific processing of graph data and demonstrate the effectiveness of these methods on a Cray XE6, specifically NCSA's Blue Waters system.
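
A minimal sketch of interaction-based contagion over a contact network, with an invented random graph and transmission/recovery rates; it illustrates the kind of irregular graph processing the paper scales, not EpiSimdemics itself.

```python
# Toy individual-based contagion model over a contact graph; the graph, rates,
# and three-state (S/I/R) dynamics below are illustrative assumptions.
import random

def simulate(contacts, initially_infected, p_transmit=0.05, p_recover=0.1, days=100, seed=0):
    """contacts: dict person -> list of neighbors; returns per-day infected counts."""
    rng = random.Random(seed)
    state = {p: "S" for p in contacts}
    for p in initially_infected:
        state[p] = "I"
    history = []
    for _ in range(days):
        new_state = dict(state)
        for person, st in state.items():
            if st == "I":
                for neighbor in contacts[person]:          # interactions along graph edges
                    if state[neighbor] == "S" and rng.random() < p_transmit:
                        new_state[neighbor] = "I"
                if rng.random() < p_recover:
                    new_state[person] = "R"
        state = new_state
        history.append(sum(1 for s in state.values() if s == "I"))
    return history

if __name__ == "__main__":
    people = range(1000)
    graph = {p: random.Random(p).sample(range(1000), 8) for p in people}  # random contacts
    print(simulate(graph, initially_infected=[0, 1, 2])[:10])
```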


international parallel and distributed processing symposium | 2012

Simulating the Spread of Infectious Disease over Large Realistic Social Networks Using Charm++

Keith R. Bisset; Ashwin M. Aji; Eric J. Bohm; Laxmikant V. Kalé; Tariq Kamal; Madhav V. Marathe; Jae-Seung Yeom

Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. EpiSimdemics is an implementation of a scalable parallel algorithm to simulate the spread of contagion, including disease, fear and information, in large (10^8 individuals), realistic social contact networks using individual-based models. It also has a rich language for describing public policy and agent behavior. We describe CharmSimdemics and evaluate its performance on national scale populations. Charm++ is a machine independent parallel programming system, providing high-level mechanisms and strategies to facilitate the task of developing highly complex parallel applications. Our design includes mapping of application entities to tasks, leveraging the efficient and scalable communication, synchronization and load balancing strategies of Charm++. Our experimental results on a 768 core system show that the Charm++ version achieves up to a 4-fold increase in performance when compared to the MPI version.


international parallel and distributed processing symposium | 2012

High-Performance Interaction-Based Simulation of Gut Immunopathologies with ENteric Immunity Simulator (ENISI)

Keith R. Bisset; Md. Maksudul Alam; Josep Bassaganya-Riera; Adria Carbo; Stephen Eubank; Raquel Hontecillas; Stefan Hoops; Yongguo Mei; Katherine Wendelsdorf; Dawen Xie; Jae-Seung Yeom; Madhav V. Marathe

Here we present the ENteric Immunity Simulator (ENISI), a modeling system for the inflammatory and regulatory immune pathways triggered by microbe-immune cell interactions in the gut. With ENISI, immunologists and infectious disease experts can test and generate hypotheses for enteric disease pathology and propose interventions through experimental infection of an in silico gut. ENISI is an agent-based simulator, in which individual cells move through the simulated tissues, and engage in context-dependent interactions with the other cells with which they are in contact. The scale of ENISI is unprecedented in this domain, with the ability to simulate 10^7 cells for 250 simulated days on 576 cores in one and a half hours, with the potential to scale to even larger hardware and problem sizes. In this paper we describe the ENISI simulator for modeling mucosal immune responses to gastrointestinal pathogens. We then demonstrate the utility of ENISI by recreating an experimental infection of a mouse with Helicobacter pylori 26695. The results identify specific processes by which bacterial virulence factors do and do not contribute to pathogenesis associated with H. pylori strain 26695. These modeling results inform general intervention strategies by indicating that immunomodulatory mechanisms, such as those used in inflammatory bowel disease, may be more appropriate therapeutically than directly targeting specific microbial populations through vaccination or by using antimicrobials.
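
A toy sketch of the agent-based pattern described above, with made-up cell types, a small lattice "tissue", and a single invented interaction rule; it only shows cells moving and interacting with co-located cells, not ENISI's actual immune pathways.

```python
# Toy agent-based step: cells random-walk through a discretized tissue and
# interact only with cells sharing their site. Types and the rule are invented.
import random
from collections import defaultdict

def step(cells, width, height, rng):
    """cells: list of dicts with 'type' and 'pos'. One movement + interaction round."""
    for c in cells:                                   # random walk through the tissue
        x, y = c["pos"]
        c["pos"] = ((x + rng.choice([-1, 0, 1])) % width,
                    (y + rng.choice([-1, 0, 1])) % height)
    by_site = defaultdict(list)
    for c in cells:
        by_site[c["pos"]].append(c)
    for site_cells in by_site.values():               # context-dependent interactions
        types = {c["type"] for c in site_cells}
        if "bacterium" in types and "macrophage" in types:
            for c in site_cells:
                if c["type"] == "macrophage":
                    c["type"] = "activated_macrophage"

if __name__ == "__main__":
    rng = random.Random(0)
    cells = ([{"type": "macrophage", "pos": (rng.randrange(20), rng.randrange(20))} for _ in range(50)]
             + [{"type": "bacterium", "pos": (rng.randrange(20), rng.randrange(20))} for _ in range(50)])
    for _ in range(100):
        step(cells, 20, 20, rng)
    print(sum(c["type"] == "activated_macrophage" for c in cells), "macrophages activated")
```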


cluster computing and the grid | 2008

Scheduling Asymmetric Parallelism on a PlayStation3 Cluster

Filip Blagojevic; Matthew Curtis-Maury; Jae-Seung Yeom; Scott Schneider; Dimitrios S. Nikolopoulos

Understanding the potential and implications of asymmetric multi-core processors for cluster computing is necessary, as these processors are rapidly becoming mainstream components in HPC environments. In this paper we evaluate a Linux cluster of Sony PlayStation3 consoles, using microbenchmarks and bioinformatics applications. We proceed to develop a model and scheduling techniques for effective execution of parallel applications on this low-cost, yet unconventional HPC platform based on the Cell/BE processor. We present an analytical formulation of layered parallelism for clusters of asymmetric multi-core multiprocessors and propose new co-scheduling heuristics for effectively executing MPI code with nested task and data parallelism on these systems. Our model has low execution time prediction error and is reliable in predicting optimal mappings of nested parallelism in MPI programs on the PS3 cluster. The presented co-scheduling heuristics reduce slack time on the accelerator cores of the PS3 and improve the performance of MPI applications by 1.7-2.7x, when compared against the native OS scheduler.
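
A hedged toy version of the mapping question such an analytical model answers: given host-side and offloadable work per node, enumerate ways to split the PS3's six usable SPEs among concurrent tasks and pick the cheapest predicted mapping. The cost terms and constants below are invented for illustration; this is not the paper's model.

```python
# Illustrative layered-parallelism cost model for one PS3 node (all constants assumed).
NUM_SPES = 6          # SPEs available to Linux on a PS3

def predict_time(work_offloaded, work_host, tasks, spes_per_task,
                 offload_overhead=0.002, per_task_overhead=0.001):
    """Very rough per-node estimate (seconds) for one nested-parallelism mapping."""
    accel = work_offloaded / (tasks * spes_per_task)   # data-parallel part across all SPEs in use
    host = work_host / min(tasks, 2)                   # host part limited by the PPE's 2 threads
    return accel + host + tasks * per_task_overhead + offload_overhead

def best_mapping(work_offloaded, work_host):
    candidates = [(t, s) for t in range(1, NUM_SPES + 1)
                  for s in range(1, NUM_SPES + 1) if t * s <= NUM_SPES]
    return min(candidates, key=lambda ts: predict_time(work_offloaded, work_host, *ts))

if __name__ == "__main__":
    tasks, spes = best_mapping(work_offloaded=10.0, work_host=1.0)
    print(f"predicted best mapping: {tasks} task(s) x {spes} SPE(s) each")
```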


international conference on parallel processing | 2014

TRAM: Optimizing Fine-Grained Communication with Topological Routing and Aggregation of Messages

Lukasz Wesolowski; Ramprasad Venkataraman; Abhishek Gupta; Jae-Seung Yeom; Keith R. Bisset; Yanhua Sun; Pritish Jetley; Thomas R. Quinn; Laxmikant V. Kalé

Fine-grained communication in supercomputing applications often limits performance through high communication overhead and poor utilization of network bandwidth. This paper presents Topological Routing and Aggregation Module (TRAM), a library that optimizes fine-grained communication performance by routing and dynamically combining short messages. TRAM collects units of fine-grained communication from the application and combines them into aggregated messages with a common intermediate destination. It routes these messages along a virtual mesh topology mapped onto the physical topology of the network. TRAM improves network bandwidth utilization and reduces communication overhead. It is particularly effective in optimizing patterns with global communication and large message counts, such as all-to-all and many-to-many, as well as sparse, irregular, dynamic or data dependent patterns. We demonstrate how TRAM improves performance through theoretical analysis and experimental verification using benchmarks and scientific applications. We present speedups on petascale systems of 6x for communication benchmarks and up to 4x for applications.
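
A minimal sketch of the aggregation half of TRAM's approach, not its Charm++ API: fine-grained items are buffered per next hop on a virtual 2-D mesh and sent as one combined message when a buffer fills. The PE numbering, buffer size, and row-then-column routing are illustrative assumptions.

```python
# Toy message aggregator: buffer items per intermediate destination, flush when full.
from collections import defaultdict

class Aggregator:
    def __init__(self, my_pe, pes_per_row, buffer_size, send):
        self.my_pe, self.pes_per_row = my_pe, pes_per_row
        self.buffer_size, self.send = buffer_size, send
        self.buffers = defaultdict(list)          # next hop -> buffered (dest, item) pairs

    def next_hop(self, dest):
        # Route to the PE in my row that shares dest's column, then down that column.
        if dest // self.pes_per_row != self.my_pe // self.pes_per_row:
            return (self.my_pe // self.pes_per_row) * self.pes_per_row + dest % self.pes_per_row
        return dest

    def submit(self, dest, item):
        hop = self.next_hop(dest)
        self.buffers[hop].append((dest, item))
        if len(self.buffers[hop]) >= self.buffer_size:
            self.flush(hop)

    def flush(self, hop):
        if self.buffers[hop]:
            self.send(hop, self.buffers[hop])     # one combined message instead of many
            self.buffers[hop] = []

if __name__ == "__main__":
    agg = Aggregator(my_pe=0, pes_per_row=4, buffer_size=3,
                     send=lambda hop, batch: print(f"to PE {hop}: {batch}"))
    for dest in [5, 9, 13, 1, 2]:
        agg.submit(dest, f"item->{dest}")
    for hop in list(agg.buffers):                 # drain remaining partial buffers
        agg.flush(hop)
```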


ieee international conference on high performance computing data and analytics | 2017

Performance modeling under resource constraints using deep transfer learning

Aniruddha Marathe; Rushil Anirudh; Nikhil Jain; Abhinav Bhatele; Jayaraman J. Thiagarajan; Bhavya Kailkhura; Jae-Seung Yeom; Barry Rountree; Todd Gamblin

Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations is still quite large. We propose a deep learning based approach that can combine information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach is able to accurately predict performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best performing configurations even when trained using as few as 1% of observations at the target scale.
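
A hedged sketch of the transfer pattern only, not the paper's network or features: pre-train a regressor on plentiful small-scale observations, then continue training on the few samples available at the target scale. The synthetic runtime function, scales, and the scikit-learn model are stand-ins.

```python
# Transfer-style fit: pre-train on exhaustive small-scale data, adapt on sparse target-scale data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def synthetic_runtime(x, scale):
    # Invented surrogate for "runtime as a function of configuration and scale".
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.2 * scale * x[:, 0]

X_small = rng.uniform(0, 1, size=(2000, 2))             # exhaustive sweep at small scale
y_small = synthetic_runtime(X_small, scale=1.0)
X_large = rng.uniform(0, 1, size=(40, 2))                # few observations at target scale
y_large = synthetic_runtime(X_large, scale=8.0)

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                     warm_start=True, random_state=0)
model.fit(X_small, y_small)                              # learn the general shape cheaply
model.max_iter = 200
model.fit(X_large, y_large)                              # continue training at the target scale

X_test = rng.uniform(0, 1, size=(500, 2))
err = np.mean(np.abs(model.predict(X_test) - synthetic_runtime(X_test, scale=8.0)))
print(f"mean abs error at target scale: {err:.3f}")
```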


ieee international conference on high performance computing data and analytics | 2010

Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories

Jae-Seung Yeom; Dimitrios S. Nikolopoulos

Multi-core processors with explicitly-managed local memories provide advanced capabilities to optimize data caching and prefetching in software. Unfortunately, these capabilities are neither easily accessible to programmers, nor exploited to their maximum potential by current language, compiler, or runtime frameworks. We present Strider, a runtime framework for optimizing compilers on multi-core processors with software-managed memories. Strider transparently optimizes grouping, decomposition, and scheduling of explicit software-managed accesses to multi-dimensional arrays in nested loops, given a high-level specification of loops and their data access patterns. In particular, Strider contributes new methods to improve temporal locality, optimize the critical path of scheduling data transfers for multi-stride accesses in regular nested parallel loops, and distribute accesses between cores. The prototype of Strider on the IBM Cell processor performs competitively to hand-optimized code and better than contemporary language frameworks, in both non-trivial parallel applications and important application kernels.
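
An illustrative sketch of the kind of planning Strider automates, not its API: decompose a strided 2-D sub-array access into groups of contiguous runs, each group sized to an assumed local-buffer capacity so it could be issued as one batched transfer.

```python
# Toy transfer planner for a row-major 2-D array; function name and buffer size are assumptions.
def plan_transfers(row_length, first_row, num_rows, col_start, col_count, buffer_elems):
    """Yield groups of contiguous (offset, length) runs; each group fits the local buffer
    and would be issued as one batched transfer (e.g. a DMA list)."""
    rows_per_group = max(1, buffer_elems // col_count)       # group whole rows when possible
    for r in range(first_row, first_row + num_rows, rows_per_group):
        rows = min(rows_per_group, first_row + num_rows - r)
        if col_count == row_length:
            yield [(r * row_length, rows * row_length)]       # fully contiguous block
        else:
            yield [(rr * row_length + col_start, col_count)   # one contiguous run per row
                   for rr in range(r, r + rows)]

if __name__ == "__main__":
    # 1024-wide array, rows 0..7, columns 256..511, 2048-element local buffer.
    for group in plan_transfers(1024, 0, 8, 256, 256, 2048):
        print(f"one batched transfer of {len(group)} run(s): {group}")
```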


ieee international conference on high performance computing data and analytics | 2016

Data-driven performance modeling of linear solvers for sparse matrices

Jae-Seung Yeom; Jayaraman J. Thiagarajan; Abhinav Bhatele; Greg Bronevetsky; Tzanio V. Kolev

With the increase in code and architectural complexity, the performance of scientific codes is increasingly dependent on the input problem, its data representation, and the underlying hardware. This makes the task of identifying the fastest algorithm for solving a problem more challenging. In this paper, we focus on modeling the performance of numerical libraries used to solve a sparse linear system. We use machine learning to develop data-driven models of performance of linear solver implementations. These models can be used by a novice user to identify the fastest preconditioner and solver for a given input matrix. We use a variety of features that represent the matrix structure, numerical properties of the matrix and the underlying mesh or input problem as input to the model. We model the performance of nine linear solvers and thirteen preconditioners available in Trilinos using 1240 sparse matrices obtained from two different sources. Our prediction models perform significantly better than a blind classifier and black-box SVM and k-NN classifiers.
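
A hedged sketch of the modeling approach with stand-in features, labels, and classifier (the paper uses a much richer feature set and solvers/preconditioners from Trilinos): compute simple structural and numerical features of a sparse matrix and train a classifier that predicts the fastest solver; real labels would come from timing solver/preconditioner runs.

```python
# Toy solver-selection model: matrix features -> predicted fastest solver (labels are invented).
import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import RandomForestClassifier

def features(A):
    A = A.tocsr()
    n = A.shape[0]
    diag = np.abs(A.diagonal())
    offdiag_rowsum = abs(A).sum(axis=1).A1 - diag
    return [n,
            A.nnz / n,                                         # average nonzeros per row
            float((abs(A - A.T) > 1e-12).nnz == 0),            # symmetry indicator
            float(np.mean(diag >= offdiag_rowsum))]            # fraction of diag-dominant rows

rng = np.random.default_rng(0)
X, y = [], []
for i in range(200):                                           # synthetic "matrix collection"
    n = int(rng.integers(50, 200))
    A = sp.random(n, n, density=0.05, random_state=i) + sp.eye(n) * float(rng.uniform(0.5, 5.0))
    if rng.random() < 0.5:
        A = A + A.T                                            # make roughly half symmetric
        label = "cg+jacobi"                                    # invented "fastest solver" labels
    else:
        label = "gmres+ilu"
    X.append(features(A))
    y.append(label)

clf = RandomForestClassifier(random_state=0).fit(X, y)
test = sp.random(100, 100, density=0.05, random_state=999) + sp.eye(100)
print("predicted fastest solver:", clf.predict([features(test)])[0])
```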

Collaboration


Dive into Jae-Seung Yeom's collaboration.

Top Co-Authors


Abhinav Bhatele

Lawrence Livermore National Laboratory


Nikhil Jain

Lawrence Livermore National Laboratory


Aniruddha Marathe

Lawrence Livermore National Laboratory
