Kyungjoo Kim
Sandia National Laboratories
Publications
Featured research published by Kyungjoo Kim.
arXiv: Mathematical Software | 2016
Kyungjoo Kim; Sivasankaran Rajamanickam; George Stelle; H. Carter Edwards; Stephen L. Olivier
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that uses a 2D sparse partitioned-block layout of the matrix. Building on this block layout, our factorization follows the algorithms-by-blocks approach, which induces a task graph for the factorization. The tasks are related to one another through the data dependences of the factorization algorithm. To process the tasks portably on various manycore architectures, we also present a portable tasking API that incorporates different tasking backends and device-specific features, built on Kokkos, an open-source framework for manycore platforms. A performance evaluation on Intel Sandy Bridge and Xeon Phi platforms, using matrices from the University of Florida sparse matrix collection, illustrates the merits of the proposed task-based factorization. Experimental results show that, using 56 threads on the Intel Xeon Phi processor, our task-parallel implementation delivers a geometric-mean speedup of about 26.6x over single-threaded incomplete Cholesky-by-blocks and 19.2x over a serial Cholesky implementation, which carries no tasking overhead, for sparse matrices arising from various application problems.
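A minimal Python sketch of how an algorithm-by-blocks turns Cholesky into a task graph, assuming the lower-triangular block pattern is given as a set of (row, col) block indices with every diagonal block present. Each block operation (CHOL, TRSM, SYRK, GEMM) becomes a task, and its dependencies are the last tasks that wrote the blocks it reads or overwrites. Updates that would land on a block outside the pattern are simply dropped, a simplified no-fill-in stand-in for the incomplete factorization. The function name and task labels are hypothetical; this is not the Kokkos tasking API described in the paper.

```python
def cholesky_by_blocks_tasks(nonzero_blocks):
    """Build a task graph for a right-looking (incomplete) Cholesky-by-blocks.

    nonzero_blocks: set of (i, j) block indices with i >= j (lower triangle),
    assumed to contain every diagonal block (k, k).
    Returns a list of (task_name, block_written, dependency_task_indices).
    """
    tasks = []
    last_writer = {}  # block -> index of the task that last wrote it

    def add(name, out, inputs):
        # Read-after-write deps on the inputs, write-after-write dep on the output.
        # Write-after-read hazards cannot occur here: once a block column is
        # factored, its blocks are only read by later tasks, never rewritten.
        deps = {last_writer[blk] for blk in inputs + [out] if blk in last_writer}
        tasks.append((name, out, sorted(deps)))
        last_writer[out] = len(tasks) - 1

    nb = 1 + max(i for i, _ in nonzero_blocks)
    for k in range(nb):
        add("CHOL", (k, k), [])  # factor the diagonal block
        below = [i for i in range(k + 1, nb) if (i, k) in nonzero_blocks]
        for i in below:
            add("TRSM", (i, k), [(k, k)])  # triangular solve with the factored diagonal
        for idx, i in enumerate(below):
            for j in below[: idx + 1]:  # j <= i keeps the target in the lower triangle
                if (i, j) not in nonzero_blocks:
                    continue  # would create fill-in; dropped in this incomplete sketch
                kind = "SYRK" if i == j else "GEMM"
                add(kind, (i, j), [(i, k), (j, k)])
    return tasks


if __name__ == "__main__":
    dense3 = {(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)}
    for t in cholesky_by_blocks_tasks(dense3):
        print(t)
```

Tasks that do not depend on each other, such as the two TRSM tasks of the first block column, can be dispatched concurrently by a tasking runtime; the sparsity of the block pattern removes tasks outright rather than executing them on zero blocks.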
IEEE International Conference on High Performance Computing Data and Analytics | 2017
Kyungjoo Kim; Timothy B. Costa; Mehmet Deveci; Andrew M. Bradley; Simon D. Hammond; Murat Efe Guney; Sarah Knepper; Shane Story; Sivasankaran Rajamanickam
Many applications, such as PDE-based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout that enables cross-matrix vectorization may provide significant further speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides speedups of up to 14x, 45x, and 27x over OpenMP loops around optimized dgemm, dtrsm, and dgetrf kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present a detailed performance analysis of our kernels.
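A minimal pure-Python sketch of the idea of interleaving a batch of small matrices by SIMD vector length, assuming a lane width `vlen` chosen by the caller and zero padding for a final partial pack. Element (r, c) of matrix b is stored next to element (r, c) of the other matrices in its pack, so the innermost loop of the batched GEMM runs across matrices rather than within one. The helper names `pack_compact`, `unpack_compact`, and `compact_gemm` are hypothetical and are not the KokkosKernels or MKL interfaces.

```python
def pack_compact(mats, vlen):
    """Interleave equally sized matrices into one flat buffer.

    Element (r, c) of matrix b lands at ((p * m + r) * n + c) * vlen + lane,
    with p, lane = divmod(b, vlen): the vlen matrices of a pack store each
    element as one contiguous vlen-wide vector. The last pack is zero-padded.
    """
    m, n = len(mats[0]), len(mats[0][0])
    npacks = -(-len(mats) // vlen)  # ceiling division
    buf = [0.0] * (npacks * m * n * vlen)
    for b, mat in enumerate(mats):
        p, lane = divmod(b, vlen)
        for r in range(m):
            for c in range(n):
                buf[((p * m + r) * n + c) * vlen + lane] = mat[r][c]
    return buf, npacks, m, n


def unpack_compact(buf, count, m, n, vlen):
    """Recover the first `count` m x n matrices from a compact buffer."""
    mats = []
    for b in range(count):
        p, lane = divmod(b, vlen)
        mats.append([[buf[((p * m + r) * n + c) * vlen + lane] for c in range(n)]
                     for r in range(m)])
    return mats


def compact_gemm(a, b, npacks, m, k, n, vlen):
    """C = A * B for every matrix in the batch (A is m x k, B is k x n).

    The innermost loop runs over SIMD lanes, i.e. across different matrices,
    which is the loop a vectorizing compiler could map onto vector FMAs.
    """
    c = [0.0] * (npacks * m * n * vlen)
    for p in range(npacks):
        for i in range(m):
            for j in range(n):
                cbase = ((p * m + i) * n + j) * vlen
                for l in range(k):
                    abase = ((p * m + i) * k + l) * vlen
                    bbase = ((p * k + l) * n + j) * vlen
                    for lane in range(vlen):
                        c[cbase + lane] += a[abase + lane] * b[bbase + lane]
    return c


if __name__ == "__main__":
    A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, 2]]]
    B = [[[1, 0], [0, 1]], [[5, 6], [7, 8]], [[1, 1], [1, 1]]]
    abuf, npacks, m, k = pack_compact(A, 2)
    bbuf, _, _, n = pack_compact(B, 2)
    print(unpack_compact(compact_gemm(abuf, bbuf, npacks, m, k, n, 2), 3, m, n, 2))
```

In a compiled implementation `vlen` would presumably match the hardware SIMD width (an AVX-512 register holds 8 doubles); the value 2 here is only for illustration.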
Journal of Computational Physics | 2017
Wenxiao Pan; Kyungjoo Kim; Mauro Perego; Alexandre M. Tartakovsky; Michael L. Parks
We present a consistent implicit incompressible smoothed particle hydrodynamics (I2SPH) discretization of Navier–Stokes, Poisson–Boltzmann, and advection–diffusion equations subject to Dirichlet or Robin boundary conditions. It is applied to model various two- and three-dimensional electrokinetic flows in simple or complex geometries. The accuracy and convergence of the consistent I2SPH are examined via comparison with analytical solutions, grid-based numerical solutions, or empirical models. The new method provides a framework to explore broader applications of SPH in microfluidics and complex fluids with charged objects, such as colloids and biomolecules, in arbitrary complex geometries.
International Parallel and Distributed Processing Symposium | 2016
Joshua Dennis Booth; Kyungjoo Kim; Sivasankaran Rajamanickam
All many-core systems require fine-grained shared-memory parallelism; however, the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data structures, and use of shared cache. While programming models provide high-level abstractions, such as data and task parallelism, it remains an open question how best to implement irregular algorithms, such as sparse factorizations, while taking these trade-offs into account. In this paper, we compare these performance trade-offs for task and data parallelism on several hardware architectures: Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of Tacho, a new task-parallel incomplete sparse Cholesky factorization, and Basker, a new data-parallel incomplete sparse LU factorization. Both solvers use the Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes, we demonstrate how high-level programming changes affect performance and overhead costs on multiple multicore and many-core systems. We find that Kokkos provides comparable performance with both parallel_for and tasks/futures on traditional x86 multicores. However, on many-core systems the choice of high-level abstraction depends on both the architecture and the input matrices.
Advances in Water Resources | 2016
Xiaofan Yang; Yashar Mehmani; William A. Perkins; Andrea Pasquali; Martin Schönherr; Kyungjoo Kim; Mauro Perego; Michael L. Parks; Nathaniel Trask; Matthew T. Balhoff; Marshall C. Richmond; Martin Geier; Manfred Krafczyk; Li-Shi Luo; Alexandre M. Tartakovsky; Timothy D. Scheibe
Computer Methods in Applied Mechanics and Engineering | 2015
Nathaniel Trask; Martin R. Maxey; Kyungjoo Kim; Mauro Perego; Michael L. Parks; Kai Yang; Jinchao Xu
International Parallel and Distributed Processing Symposium | 2018
Kyungjoo Kim; H. Carter Edwards; Sivasankaran Rajamanickam
Archive | 2016
Nathaniel Trask; Martin R. Maxey; Kyungjoo Kim; Mauro Perego; Michael L. Parks; Kai Yang; Jinchao Xu; Wenxiao Pan; Alex Tartakovsky
Archive | 2015
Nathaniel Trask; Kyungjoo Kim; Alexandre Tartakovsky; Mauro Perego; Michael L. Parks
Bulletin of the American Physical Society | 2015
Wenxiao Pan; Kyungjoo Kim; Mauro Perego; Alexandre M. Tartakovsky; Mike Parks