
Publication


Featured research published by Kyungjoo Kim.


arXiv: Mathematical Software | 2016

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kyungjoo Kim; Sivasankaran Rajamanickam; George Stelle; H. Carter Edwards; Stephen L. Olivier

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that uses a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the algorithms-by-blocks approach, which induces a task graph for the factorization; the tasks are related to each other through their data dependencies in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms, Kokkos. A performance evaluation on both Intel Sandy Bridge and Intel Xeon Phi platforms, using matrices from the University of Florida sparse matrix collection, illustrates the merits of the proposed task-based factorization. Experimental results demonstrate that, using 56 threads on the Intel Xeon Phi processor, our task-parallel implementation delivers about a 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and a 19.2x speedup over a serial incomplete Cholesky implementation that carries no tasking overhead, for sparse matrices arising from various application problems.
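The algorithms-by-blocks idea the abstract describes can be illustrated on the much simpler dense case. The sketch below is our own NumPy illustration, not the paper's sparse incomplete implementation: a right-looking blocked Cholesky in which each kernel call on a block (POTRF, TRSM, GEMM/SYRK) would become one node of the induced task graph, with edges given by which blocks each kernel reads and writes.

```python
import numpy as np

def cholesky_by_blocks(A, b):
    """Right-looking blocked Cholesky of an n-by-n SPD matrix A with
    block size b (n divisible by b). Each kernel call below would be
    one task in an algorithms-by-blocks task graph."""
    n = A.shape[0]
    L = np.tril(A.copy())
    nb = n // b
    for k in range(nb):
        K = slice(k * b, (k + 1) * b)
        # POTRF task: factor the diagonal block.
        L[K, K] = np.linalg.cholesky(L[K, K])
        for i in range(k + 1, nb):
            I = slice(i * b, (i + 1) * b)
            # TRSM task: L_ik = A_ik * inv(L_kk)^T; depends on the POTRF above.
            L[I, K] = np.linalg.solve(L[K, K], L[I, K].T).T
        for i in range(k + 1, nb):
            I = slice(i * b, (i + 1) * b)
            for j in range(k + 1, i + 1):
                J = slice(j * b, (j + 1) * b)
                # GEMM/SYRK task: trailing update; depends on the two TRSMs
                # that produced L_ik and L_jk.
                L[I, J] -= L[I, K] @ L[J, K].T
    return np.tril(L)
```

In the paper's setting the blocks are sparse and the factorization is incomplete, so the task graph is irregular; the scheduling of exactly such inter-dependent block kernels is what the portable tasking API handles.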


IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

Designing vector-friendly compact BLAS and LAPACK kernels

Kyungjoo Kim; Timothy B. Costa; Mehmet Deveci; Andrew M. Bradley; Simon D. Hammond; Murat Efe Guney; Sarah Knepper; Shane Story; Sivasankaran Rajamanickam

Many applications, such as PDE-based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization can provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup over OpenMP loops around optimized dgemm, dtrsm, and dgetrf kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and the Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present a detailed performance analysis of our kernels.
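The interleaved layout can be sketched in NumPy. This is our own illustration under assumed parameters (VLEN is a placeholder vector length, and the function names are ours, not the library API): VLEN consecutive small matrices are packed so that corresponding entries sit contiguously, and a scalar triple-loop GEMM then touches VLEN matrices per innermost operation, which is what cross-matrix vectorization means.

```python
import numpy as np

VLEN = 8  # assumed SIMD vector length (e.g., AVX-512 doubles)

def pack_compact(batch):
    """Interleave a (N, m, n) batch into (N // VLEN, m, n, VLEN):
    entry (i, j) of VLEN consecutive matrices becomes one contiguous
    vector of VLEN values."""
    N, m, n = batch.shape
    return batch.reshape(N // VLEN, VLEN, m, n).transpose(0, 2, 3, 1).copy()

def unpack_compact(packed):
    """Inverse of pack_compact."""
    nblk, m, n, v = packed.shape
    return packed.transpose(0, 3, 1, 2).reshape(nblk * v, m, n)

def compact_gemm(Ap, Bp):
    """C = A @ B for every matrix in the batch, written as a scalar
    triple loop over (i, j, l); each innermost multiply-add acts on a
    whole lane of VLEN matrices at once (one SIMD fma in a real kernel)."""
    nblk, m, kk, v = Ap.shape
    n = Bp.shape[2]
    Cp = np.zeros((nblk, m, n, v))
    for b in range(nblk):
        for i in range(m):
            for j in range(n):
                for l in range(kk):
                    Cp[b, i, j, :] += Ap[b, i, l, :] * Bp[b, l, j, :]
    return Cp
```

The point of the layout is that the loop structure of an ordinary scalar GEMM is preserved; only the innermost data access changes, so one vectorized kernel serves the whole batch.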


Journal of Computational Physics | 2017

Modeling electrokinetic flows by consistent implicit incompressible smoothed particle hydrodynamics

Wenxiao Pan; Kyungjoo Kim; Mauro Perego; Alexandre M. Tartakovsky; Michael L. Parks

We present a consistent implicit incompressible smoothed particle hydrodynamics (I²SPH) discretization of the Navier–Stokes, Poisson–Boltzmann, and advection–diffusion equations subject to Dirichlet or Robin boundary conditions. It is applied to model various two- and three-dimensional electrokinetic flows in simple or complex geometries. The accuracy and convergence of the consistent I²SPH scheme are examined via comparison with analytical solutions, grid-based numerical solutions, and empirical models. The new method provides a framework to explore broader applications of SPH in microfluidics and complex fluids with charged objects, such as colloids and biomolecules, in arbitrarily complex geometries.
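The paper's consistent implicit scheme is beyond a short sketch, but the basic SPH building block it rests on, kernel-weighted summation over particles, fits in a few lines. The following 1D density-summation example uses the standard cubic-spline kernel with textbook constants; it is a generic SPH illustration, not code from the paper.

```python
import numpy as np

def cubic_spline_1d(r, h):
    """Standard 1D cubic-spline smoothing kernel with support radius 2h."""
    q = np.abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization constant
    w = np.where(q < 1.0,
                 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w

def summation_density(x, mass, h):
    """rho_i = sum_j m_j W(x_i - x_j, h): the basic SPH field estimate
    that the differential operators in an SPH discretization build on."""
    return np.array([np.sum(mass * cubic_spline_1d(x - xi, h)) for xi in x])
```

For uniformly spaced interior particles this estimate recovers the true density to within a fraction of a percent; the "consistency" in the paper's method refers to corrections that restore such accuracy for derivative operators near boundaries and on irregular particle distributions.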


International Parallel and Distributed Processing Symposium | 2016

A Comparison of High-Level Programming Choices for Incomplete Sparse Factorization Across Different Architectures

Joshua Dennis Booth; Kyungjoo Kim; Sivasankaran Rajamanickam

All many-core systems require fine-grained shared-memory parallelism; however, the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data structures, and use of shared cache. While programming models provide high-level abstractions, such as data and task parallelism, algorithmic choices remain open on how best to implement irregular algorithms, such as sparse factorizations, while taking the above trade-offs into account. In this paper, we compare these performance trade-offs for task and data parallelism on different hardware architectures: Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of a new task-parallel incomplete sparse Cholesky factorization called Tacho and a new data-parallel incomplete sparse LU factorization called Basker. Both solvers use the Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes, we demonstrate how high-level programming choices affect performance and overhead costs on multiple multi-/many-core systems. We find that Kokkos is able to provide comparable performance with both parallel_for and task/futures on traditional x86 multicores. However, the choice of which high-level abstraction to use on many-core systems depends on both the architecture and the input matrices.
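The two abstractions being compared can be caricatured outside Kokkos. The sketch below is a hypothetical Python analogy (not the Tacho or Basker code, and not the Kokkos API): a flat parallel_for-style loop over independent work items versus a task/futures style in which each task blocks on its predecessors, as in an elimination-tree dependency structure.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy dependency DAG: node k must wait for all of its parents, a
# stand-in for the column dependencies in a sparse factorization.
parents = {0: [], 1: [], 2: [0, 1], 3: [2], 4: [2]}

def data_parallel(work_items, fn, nthreads=4):
    """parallel_for style: one flat loop, no inter-iteration dependencies."""
    with ThreadPoolExecutor(nthreads) as pool:
        return list(pool.map(fn, work_items))

def task_parallel(fn, nthreads=4):
    """task/futures style: each node becomes a task that blocks on the
    futures of its parents before computing its own result."""
    futures = {}
    def run(k):
        deps = [futures[p].result() for p in parents[k]]  # wait on parents
        return fn(k, deps)
    with ThreadPoolExecutor(nthreads) as pool:
        # Submitting in topological order keeps the FIFO pool deadlock-free:
        # a task only ever waits on tasks submitted (hence started) earlier.
        for k in sorted(parents):
            futures[k] = pool.submit(run, k)
        return {k: f.result() for k, f in futures.items()}
```

The trade-off the paper measures lives in exactly this gap: the data-parallel form has minimal scheduling overhead but needs independent work, while the task form expresses irregular dependencies directly at the cost of tasking overhead.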


Advances in Water Resources | 2016

Intercomparison of 3D pore-scale flow and solute transport simulation methods

Xiaofan Yang; Yashar Mehmani; William A. Perkins; Andrea Pasquali; Martin Schönherr; Kyungjoo Kim; Mauro Perego; Michael L. Parks; Nathaniel Trask; Matthew T. Balhoff; Marshall C. Richmond; Martin Geier; Manfred Krafczyk; Li-Shi Luo; Alexandre M. Tartakovsky; Timothy D. Scheibe


Computer Methods in Applied Mechanics and Engineering | 2015

A scalable consistent second-order SPH solver for unsteady low Reynolds number flows

Nathaniel Trask; Martin R. Maxey; Kyungjoo Kim; Mauro Perego; Michael L. Parks; Kai Yang; Jinchao Xu


International Parallel and Distributed Processing Symposium | 2018

Tacho: Memory-Scalable Task Parallel Sparse Cholesky Factorization

Kyungjoo Kim; H. Carter Edwards; Sivasankaran Rajamanickam


Archive | 2016

A Massively Parallel Scalable Implicit SPH Solver.

Nathaniel Trask; Martin R. Maxey; Kyungjoo Kim; Mauro Perego; Michael L. Parks; Kai Yang; Jinchao Xu; Wenxiao Pan; Alex Tartakovsky


Archive | 2015

A Highly-Scalable Implicit SPH Code for Simulating Single- and Multi-phase Flows in Geometrically Complex Bounded Domains.

Nathaniel Trask; Kyungjoo Kim; Alexandre Tartakovsky; Mauro Perego; Michael L. Parks


Bulletin of the American Physical Society | 2015

Modeling electrokinetic flow by Lagrangian particle-based method

Wenxiao Pan; Kyungjoo Kim; Mauro Perego; Alexandre M. Tartakovsky; Mike Parks

Collaboration


Dive into Kyungjoo Kim's collaborations.

Top Co-Authors

Mauro Perego
Sandia National Laboratories

Michael L. Parks
Sandia National Laboratories

Alexandre M. Tartakovsky
Pacific Northwest National Laboratory

Jinchao Xu
Pennsylvania State University

Kai Yang
Pennsylvania State University

Wenxiao Pan
Pacific Northwest National Laboratory

H. Carter Edwards
Sandia National Laboratories