Nayong Kim
Louisiana State University
Publication
Featured research published by Nayong Kim.
grid computing | 2010
Soon-Heum Ko; Nayong Kim; Joohyun Kim; Abhinav Thota; Shantenu Jha
Coupled multi-physics simulations, such as hybrid CFD-MD simulations, represent an increasingly important class of scientific applications. The physical problems of interest often demand the use of high-end computers, such as TeraGrid resources, which are typically accessible only via batch queues. Batch-queue systems are not designed to natively support the coordinated scheduling of jobs, which in turn is required for the concurrent execution that coupled multi-physics simulations need. In this paper we develop and demonstrate a novel approach to overcome the lack of native support for the coordinated job submission that coupled runs require. We establish the performance advantages arising from our solution, which is a generalization of the Pilot-Job concept; the concept in and of itself is not new, but it is being applied to coupled simulations for the first time. Our solution not only overcomes the initial co-scheduling problem, but also provides a dynamic resource allocation mechanism. Support for such dynamic resources is critical for a load balancing mechanism, which we develop and demonstrate to be effective at reducing the total time-to-solution of the problem. We establish that the performance advantage of using Big Jobs is invariant with the size of the machine as well as the size of the physical model under investigation. The Pilot-Job abstraction is developed using SAGA, which provides an infrastructure-agnostic implementation and can seamlessly execute on and utilize distributed resources.
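The load balancing idea in this abstract can be illustrated with a minimal sketch. The abstract does not describe the actual mechanism, so everything below is an assumption: the function name `rebalance` is hypothetical, and the model assumes ideal strong scaling (the work of each task, cores times measured step time, stays constant when cores are moved). Under that assumption, the two coupled tasks finish a step at about the same time when cores are split in proportion to their work.

```python
def rebalance(total_cores, cores_a, time_a, cores_b, time_b):
    """Split total_cores between two coupled tasks so their step times roughly match.

    Hypothetical helper. Assumes ideal strong scaling, i.e. the product
    cores * time (the "work") of each task does not change with core count.
    """
    work_a = cores_a * time_a
    work_b = cores_b * time_b
    # Equal step times require core counts proportional to the work.
    new_a = round(total_cores * work_a / (work_a + work_b))
    # Keep at least one core on each side of the coupling.
    new_a = max(1, min(total_cores - 1, new_a))
    return new_a, total_cores - new_a


if __name__ == "__main__":
    # Task A took 10 s and task B took 30 s per step, each on 32 of 64 cores.
    print(rebalance(64, 32, 10.0, 32, 30.0))
```

Real codes rarely scale ideally, so a scheme like this would normally be applied repeatedly with fresh timings rather than once.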
BioMed Research International | 2014
Anjani Ragothaman; Sairam Chowdary Boddu; Nayong Kim; Wei P. Feinstein; Michal Brylinski; Shantenu Jha; Joohyun Kim
While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the predicted structural information could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes such as those of prokaryotes is large, typically many thousands, which prohibits their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present a runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, and is particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
advanced information networking and applications | 2015
Nayong Kim; Richard Platania; Wei Huang; Chris Knight; T. Keyes; Seung-Jong Park; Joohyun Kim
We present the latest development and experimental simulation studies of Statistical Temperature Molecular Dynamics (STMD) and its parallel tempering version, Replica Exchange Statistical Temperature Molecular Dynamics (RESTMD). Our main contributions are i) the introduction of a newly implemented STMD in LAMMPS, ii) the use of large-scale distributed cyberinfrastructure, including Amazon EC2 and the nationwide distributed computing infrastructure GENI, in addition to high-performance computing (HPC) cluster systems, and iii) benchmark and simulation results highlighting the advantages and potential of STMD and RESTMD for challenging large-scale biomolecular conformational search. In this work, we attempt to provide convincing evidence that RESTMD, combining two advanced sampling protocols, STMD and the replica exchange algorithm, offers various advantages over not only conventional, ineffective approaches but also other enhanced sampling methods. Interestingly, RESTMD has benefits over the most popular Replica Exchange Molecular Dynamics (REMD) as an application maximizing its capacity in HPC environments. For example, RESTMD alleviates the need for a large number of replicas, which is unavoidable in REMD, and is flexible enough to exploit the maximum amount of available computing power of a cluster system. Continuing our recent effort in which RESTMD was implemented with a community molecular dynamics package, CHARMM, and Hadoop MapReduce, we report the latest development outcomes in this work. First, we plugged the implementation of STMD into LAMMPS, one of the most popular public molecular dynamics packages. This is expected to make STMD and RESTMD appealing to investigators from a broad range of life science fields. Second, Hadoop MapReduce-based RESTMD is now able to run on Amazon EC2 and the nationwide network virtual organization, the GENI distributed computing environment. Third, in order to find optimized parameters for RESTMD simulations, simulation results using test systems, water and solvated crambin, were obtained and are presented. These results, despite relatively small sizes and short time-scale trajectories, serve to underscore the merits and potential of STMD and RESTMD with respect to algorithmic advantages as well as efficient utilization of distributed resources.
IWSG '14 Proceedings of the 2014 6th International Workshop on Science Gateways | 2014
Praveenkumar Kondikoppa; Richard Platania; Seung-Jong Park; T. Keyes; Jaegil Kim; Nayong Kim; Joohyun Kim; Shuju Bai
A novel implementation of Replica Exchange Statistical Temperature Molecular Dynamics (RESTMD), belonging to a generalized ensemble method and also known as parallel tempering, is presented. Our implementation employs a MapReduce (MR)-based iterative framework for launching RESTMD over high-performance computing (HPC) clusters, including our test bed system, Cyber-infrastructure for Reconfigurable Optical Networks (CRON), which simulates a network-connected distributed system. Our main contribution is a new implementation of STMD plugged into the well-known CHARMM molecular dynamics package, as well as a RESTMD implementation powered by Hadoop that scales out in a cluster and across distributed systems effectively. To address challenges in the use of Hadoop MapReduce, we examined the factors contributing to the performance of the proposed framework through various runtime analysis experiments with two biological systems that differ in size and over different types of HPC resources. The many advantages of RESTMD suggest its effectiveness for enhanced sampling, one of the grand challenges in areas of study ranging from chemical systems to statistical inference. Lastly, with its support for scale-across capacity over distributed computing infrastructure (DCI) and the use of Hadoop for coarse-grained task-level parallelism, MapReduce-based RESTMD is a good example of the next generation of applications that science gateway projects, in particular those backed by IaaS clouds, are increasingly asked to provide.
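For readers unfamiliar with replica exchange, the exchange step can be sketched as follows. The abstract does not give the RESTMD-specific acceptance rule, so this sketch uses the standard Metropolis criterion of conventional temperature replica exchange, accepting a swap between replicas i and j with probability min(1, exp((β_i − β_j)(E_i − E_j))). The function names and the list-based bookkeeping are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def swap_probability(beta_i, energy_i, beta_j, energy_j):
    """Metropolis acceptance probability for exchanging two replicas
    (standard temperature replica exchange, not the STMD-specific rule)."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_step(betas, energies, rng, offset=0):
    """Attempt swaps between neighbouring temperature slots (i, i + 1).

    energies[k] is the energy of the configuration that currently sits in
    slot k. Returns a list saying which configuration occupies each slot
    after the exchange attempts. Use offset 0 and 1 on alternating steps
    so that every neighbouring pair is tried over time.
    """
    order = list(range(len(betas)))
    for i in range(offset, len(betas) - 1, 2):
        p = swap_probability(betas[i], energies[order[i]],
                             betas[i + 1], energies[order[i + 1]])
        if rng.random() < p:
            order[i], order[i + 1] = order[i + 1], order[i]
    return order


if __name__ == "__main__":
    rng = random.Random(0)
    betas = [1.0, 0.8, 0.6, 0.4]        # inverse temperatures, cold to hot
    energies = [-10.0, -12.0, -7.0, -9.0]
    print(exchange_step(betas, energies, rng))
```

Because the molecular dynamics segments between exchange attempts are independent of each other, they map naturally onto coarse-grained parallel tasks, which is the property the abstract relies on.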
Infection, Genetics and Evolution | 2018
Rahul Sharma; Pushpendra Singh; Maria Pena; Ramesh Subramanian; Vladmir Chouljenko; Joohyun Kim; Nayong Kim; John Caskey; Marie A. Baudena; Linda B. Adams; Richard W. Truman
Leprosy (Hansen's disease) has occurred throughout human history, and persists today at a low prevalence in most populations. Caused by Mycobacterium leprae, the infection primarily involves the skin, mucosa and peripheral nerves. The susceptible host range for Mycobacterium leprae is quite narrow. Besides humans, nine-banded armadillos (Dasypus novemcinctus) and red squirrels (Sciurus vulgaris) are the only other natural hosts for M. leprae, but only armadillos recapitulate the disease as seen in humans. Armadillos across the Southern United States harbor a single predominant genotypic strain (SNP Type-3I) of M. leprae, which is also implicated in the zoonotic transmission of leprosy. We investigated whether the zoonotic strain (3I) has any notable growth advantages in armadillos over another genetically distant strain-type (SNP Type-4P) of M. leprae, and whether M. leprae strains manifest any notably different pathology among armadillos. We co-infected armadillos (n = 6) with 2 × 10^9 highly viable M. leprae of both strains and assessed the relative growth and dissemination of each strain in the animals. We also analyzed 12 additional armadillos, 6 each individually infected with the same quantity of either strain. The infections were allowed to fulminate and the clinical manifestations of the disease were noted. Animals were humanely sacrificed at the terminal stage of infection and the number of bacilli per gram of liver, spleen and lymph node tissue was enumerated by Q-PCR assay. The growth of M. leprae strain 4P was significantly higher (P < 0.05) than that of 3I when each strain was propagated individually in armadillos. Significantly (P < 0.0001) higher growth of the 4P strain was also confirmed among animals co-infected with both 3I and 4P strain types using whole genome sequencing. Interestingly, the zoonotic strain does not exhibit any growth advantage in these non-human hosts, but the varied proliferation of the two M. leprae strains within armadillos suggests there are notable pathological variations between M. leprae strain-types.
Concurrency and Computation: Practice and Experience | 2017
Richard Platania; Shayan Shams; Chui-Hui Chiu; Nayong Kim; Joohyun Kim; Seung-Jong Park
We present Hadoop‐based replica exchange (HaRE), a Hadoop‐based implementation of the replica exchange scheme developed primarily for replica exchange statistical temperature molecular dynamics, an example of a large‐scale, advanced sampling molecular dynamics simulation. By using Hadoop as a framework and the MapReduce model for driving replica exchange, an efficient task‐level parallelism is introduced to replica exchange statistical temperature molecular dynamics simulations. In order to demonstrate this, we investigate the performance of our application over various distributed cyberinfrastructures (DCI), including several high‐performance computing systems, our cyberinfrastructure for reconfigurable optical networks testbed, the global environment for network innovations testbed, and the CloudLab testbed. Scalability performance analysis is shown in terms of scale‐out and scale‐up over a single high‐performance computing cluster, EC2, and CloudLab and scale‐across with cyberinfrastructure for reconfigurable optical networks and global environment for network innovations. As a result, we demonstrate that HaRE is capable of efficient execution over both homogeneous and heterogeneous DCI of varying size and configuration. Contributing factors to performance are discussed in order to provide insight towards the effects of computing environment on the execution of HaRE. With these contributions, we propose that similar loosely coupled scientific applications can also take advantage of the scalable, task‐level parallelism Hadoop MapReduce provides over various DCI.
42nd AIAA Fluid Dynamics Conference and Exhibit | 2012
Nayong Kim; Soon-Heum Ko; Shantenu Jha; Brian Novak; Dorel Moldovan; Dimitris E. Nikitopoulos
The constrained Lagrangian dynamics modeling in the hybrid computational fluid dynamics (CFD) - molecular dynamics (MD) approach is improved for the simulation of multi-species polyatomic fluids. The primitive formulation of the classical Lagrangian dynamics equation is replaced by a conservative form to account for multi-species fluid systems. Also, the equation is applied to molecules instead of individual atoms, to preserve the linear momentum between the continuum and particle domains without encountering the unfavorable numerical breakdown of molecular bonding. We verify our hybrid CFD-MD simulation package by analyzing a nano-scale transient Couette flow of a single monatomic fluid. The multi-species polyatomic Lagrangian dynamics modeling has been evaluated by analyzing two different fluid models: a mixture of two monatomic fluids and a polyatomic molecular fluid under a short-range potential. These two applications verify the accuracy of the proposed model and evaluate the hybrid CFD-MD approach as a tool to describe the complex flow field near a solid obstacle.
international parallel and distributed processing symposium | 2016
Shayan Shams; Nayong Kim; Xiandong Meng; Ming Tai Ha; Shantenu Jha; Zhong Wang; Joohyun Kim
We introduce a pilot-based approach with which the scalable data analytics essential for a large RNA-seq data set are efficiently carried out. Major development mechanisms, designed to achieve the required scalability and targeting in particular cloud environments with on-demand computing, are presented. Using Amazon EC2 as an example and harnessing distributed and parallel computing implementations, our pipeline is able to optimally allocate computing resources to the tasks of a target workflow. Consequently, it can decrease time-to-completion (TTC) or cost, avoid failures due to the limited resources of a single node, and enable scalable data analysis with multiple options. Our developed pipeline benefits from the underlying pilot system, Radical Pilot, being readily amenable to scalable solutions over distributed heterogeneous computing resources and suitable for advanced workflows of dynamically adaptive executions. In order to provide insights on such features, benchmark experiments using two real data sets were carried out. The benchmark experiments focus on the most computationally expensive step, transcript assembly. Evaluation and comparison of transcript assembly accuracy using a single de novo assembler or a combination of multiple assemblers are also presented, underscoring the pipeline's potential as a platform to support multi-assembler multi-parameter methods or ensemble methods, which are statistically attractive and easily feasible with our scalable pipeline. The developed pipeline, as manifested by the results presented in this work, is built upon effective strategies that address major challenging issues and offer viable solutions toward an integrative and scalable method for large-scale RNA-seq data analysis, particularly maximizing the merits of Infrastructure as a Service (IaaS) clouds.
Genome Announcements | 2015
Rui Wang; Hasan C. Tekedar; Mark L. Lawrence; Vladimir N. Chouljenko; Joohyun Kim; Nayong Kim; Konstantin G. Kousoulas; John P. Hawke
Here, we report the draft genome sequences of Edwardsiella ictaluri strains LADL11-100 and LADL11-194, two isolates from natural outbreaks of edwardsiellosis in the zebrafish Danio rerio, as well as the sequences of the plasmids carried by the zebrafish strain of E. ictaluri.
ASME-JSME-KSME 2011 Joint Fluids Engineering Conference: Volume 1, Symposia – Parts A, B, C, and D | 2011
Soon-Heum Ko; Nayong Kim; Shantenu Jha
We propose numerical approaches to reduce the sampling noise of a hybrid computational fluid dynamics (CFD) - molecular dynamics (MD) solution. A hybrid CFD-MD approach provides a higher-resolution solution near a solid obstacle and better efficiency than a pure particle-based simulation technique. However, applications up to now have been limited to extreme velocity conditions, since the magnitude of the statistical error in sampling particles' velocity is very large compared to the continuum velocity. Considering the technical difficulty of indefinitely increasing the MD domain size, we propose and experiment with a number of numerical alternatives to suppress the excessive sampling noise when solving moderate-velocity flow fields. They are the sampling of multiple replicas, virtual stretching of sampling layers in space, and linear fitting of multiple temporal samples. We discuss the pros and cons of each technique in view of solution accuracy and computational cost.
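Two of the noise-reduction ideas named in this abstract, averaging over multiple replicas and linear fitting of multiple temporal samples, can be sketched with plain least squares. This is a minimal illustration under stated assumptions, not the authors' formulation: the function names are hypothetical, each replica is assumed to report the same sampled quantity at the same times, and the fit is an ordinary straight line.

```python
from statistics import fmean


def average_replicas(replica_samples):
    """Average the same sample over independent replicas.

    replica_samples is a list of equally long lists, one per replica.
    Averaging N independent replicas reduces the statistical noise of
    each sample by roughly a factor of sqrt(N).
    """
    return [fmean(values) for values in zip(*replica_samples)]


def linear_fit_value(times, values, t_eval):
    """Fit a least-squares line through (times, values) and evaluate it at t_eval."""
    t_mean = fmean(times)
    v_mean = fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return v_mean + slope * (t_eval - t_mean)


if __name__ == "__main__":
    # Two hypothetical replicas sampling a velocity at three successive times.
    replicas = [[0.9, 2.1, 2.9], [1.1, 1.9, 3.1]]
    smoothed = average_replicas(replicas)
    # Use the fitted line, rather than the last noisy sample, as the value at t = 3.
    print(linear_fit_value([1.0, 2.0, 3.0], smoothed, 3.0))
```

The trade-off mirrors the one the abstract mentions: more replicas cost more computation, while temporal fitting is cheap but assumes the quantity varies nearly linearly over the fitting window.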