Seung-Hwan Lim | Researchain

Publication

Featured research published by Seung-Hwan Lim.

IEEE International Conference on High Performance Computing Data and Analytics | 2015

Optimizing deep learning hyper-parameters through an evolutionary algorithm

Steven R. Young; Derek C. Rose; Thomas P. Karnowski; Seung-Hwan Lim; Robert M. Patton

There has been a recent surge of success in utilizing Deep Learning (DL) in imaging and speech applications, owing to its relatively automatic feature generation and, particularly for convolutional neural networks (CNNs), its high-accuracy classification. While these models learn their parameters through data-driven methods, model selection (i.e., architecture construction) through hyper-parameter choices remains a tedious, highly intuition-driven task. To address this, we propose Multi-node Evolutionary Neural Networks for Deep Learning (MENNDL), a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms.
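The abstract describes hyper-parameter optimization with a genetic algorithm. The sketch below illustrates that general technique (selection with elitism, uniform crossover, random mutation) on a small search space. It is not MENNDL's code: the search space, the population settings, and `toy_fitness` are hypothetical, and in a real run each candidate's fitness would be the validation accuracy of a CNN trained on a cluster node.

```python
import random

# Hypothetical search space: each CNN hyper-parameter maps to its allowed values.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
}


def random_individual(rng):
    # One candidate network = one value chosen per hyper-parameter.
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents at random.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    # With probability `rate`, replace a gene with a fresh random value.
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def evolve(fitness, generations=20, pop_size=12, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness (higher is better); the elite survive unchanged as parents.
        # These fitness evaluations are the expensive part a cluster would parallelize.
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(individual):
    # Hypothetical stand-in for validation accuracy; peaks at 3 layers, 64 filters, 5x5 kernels.
    return (
        -abs(individual["num_layers"] - 3)
        - abs(individual["filters"] - 64) / 16
        - abs(individual["kernel_size"] - 5)
    )


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the elite are carried over unchanged, the best fitness in the population never decreases from one generation to the next.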

International Symposium on Performance Analysis of Systems and Software | 2015

Graph Processing Platforms at Scale: Practices and Experiences

Seung-Hwan Lim; Sangkeun Lee; Gautam Ganesh; Tyler C. Brown; Sreenivas R. Sukumar

Graph analysis has revealed hidden patterns and relationships in data from a variety of domains, such as transportation networks, social networks, clinical pathways, and collaboration networks. As these networks grow in size, variety, and complexity, finding the right combination of tools and algorithm implementations to discover new insights from the data becomes a challenge. To address this challenge, our study presents an extensive empirical evaluation of three representative graph processing platforms: Pegasus, GraphX, and Urika. Each system represents a different combination of options in data model, processing paradigm, and infrastructure. We benchmark each platform using three popular graph mining operations (degree distribution, connected components, and PageRank) over real-world graphs. Our experiments show that each graph processing platform has particular strengths for different types of graph operations: Urika performs best on non-iterative operations such as degree distribution, whereas GraphX performs best on iterative operations such as connected components and PageRank. We conclude by discussing options for optimizing the performance of graph-theoretic operations on each platform for large-scale real-world graphs.
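For reference, here are minimal single-process Python versions of the three benchmark operations. They only illustrate what each operation computes and why two of them are iterative; they are not how Pegasus, GraphX, or Urika implement them. The edge-list input format and the defaults (`damping=0.85`, 50 iterations) are assumptions for illustration.

```python
from collections import defaultdict


def degree_distribution(edges):
    # Non-iterative: one pass to count degrees, one pass to histogram them.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    histogram = defaultdict(int)
    for d in degree.values():
        histogram[d] += 1
    return dict(histogram)


def connected_components(edges):
    # Iterative min-label propagation: repeat until no edge joins two different labels.
    label = {}
    for u, v in edges:
        label.setdefault(u, u)
        label.setdefault(v, v)
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            smallest = min(label[u], label[v])
            if label[u] != smallest or label[v] != smallest:
                label[u] = label[v] = smallest
                changed = True
    return label


def pagerank(edges, damping=0.85, iterations=50):
    # Iterative power method over directed edges u -> v; dangling mass is spread uniformly.
    nodes = {x for edge in edges for x in edge}
    n = len(nodes)
    out_links = defaultdict(list)
    for u, v in edges:
        out_links[u].append(v)
    rank = {x: 1.0 / n for x in nodes}
    for _ in range(iterations):
        dangling = sum(rank[x] for x in nodes if not out_links[x])
        new_rank = {x: (1 - damping) / n + damping * dangling / n for x in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank


edges = [(1, 2), (2, 3), (4, 5)]
print(degree_distribution(edges))   # {1: 4, 2: 1}
print(connected_components(edges))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Degree distribution needs a single pass over the edges, while connected components and PageRank repeat passes until convergence or a fixed iteration count, which is the non-iterative versus iterative distinction the results above turn on.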

International Conference on Data Engineering | 2015

Graph mining meets the Semantic Web

Sangkeun Lee; Sreenivas R. Sukumar; Seung-Hwan Lim

The Resource Description Framework (RDF) and the SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible, schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring, and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities on the Semantic Web has emerged. We address that need by implementing three popular iterative graph mining algorithms (triangle counting, connected component analysis, and PageRank) as SPARQL queries wrapped within Python scripts. We evaluate the performance of our implementation on six real-world data sets and show that graph mining algorithms with a linear-algebra formulation can indeed be unleashed on data represented as RDF graphs through the SPARQL query interface.
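Triangle counting is an example of an algorithm with a linear-algebra formulation: for an undirected adjacency matrix A, the number of triangles is trace(A^3) / 6. The sketch below checks that formula against the equivalent pattern-matching formulation, the three-way join that a SPARQL basic graph pattern would express. The `:edge` predicate in the comment is hypothetical, and this is not the authors' query.

```python
from itertools import combinations


def adjacency_matrix(edges, n):
    # Undirected 0/1 adjacency matrix for vertices 0..n-1.
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def triangles_linear_algebra(edges, n):
    # trace(A^3) counts closed walks of length 3; each triangle yields 6 of them
    # (3 starting vertices x 2 directions).
    A = adjacency_matrix(edges, n)
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(n)) // 6


def triangles_pattern_match(edges):
    # The same count as a three-way join over the edge set. Assuming each undirected
    # edge is stored in both directions, a SPARQL pattern along the lines of
    #   ?a :edge ?b . ?b :edge ?c . ?a :edge ?c .
    #   FILTER(STR(?a) < STR(?b) && STR(?b) < STR(?c))
    # would match each triangle once (":edge" is a hypothetical predicate).
    edge_set = {(min(u, v), max(u, v)) for u, v in edges}
    vertices = sorted({x for e in edge_set for x in e})
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(triangles_linear_algebra(square_with_diagonal, 4))  # 2
print(triangles_pattern_match(square_with_diagonal))      # 2
```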

International Conference on Big Data | 2016

Mini-apps for high performance data analysis

Sreenivas R. Sukumar; Michael A. Matheson; Ramakrishnan Kannan; Seung-Hwan Lim

Scaling up scientific data analysis and machine learning algorithms for data-driven discovery is a grand challenge today. Despite the growing need for analysis in science domains that generate ‘Big Data’ from instruments and simulations, building high-performance analytical workflows of data-intensive algorithms has been daunting because (i) the ‘Big Data’ hardware and software architecture landscape is constantly evolving, (ii) newer architectures impose new programming models, and (iii) the data-parallel kernels of analysis algorithms and their performance characteristics on different architectures are poorly understood. To address these problems, we have (i) identified scalable data-parallel kernels of popular data analysis algorithms, (ii) implemented ‘Mini-Apps’ of those kernels using different programming models (e.g., MapReduce and MPI), and (iii) benchmarked and validated the performance of the kernels on diverse architectures. In this paper, we discuss two of those Mini-Apps and show the execution of principal component analysis built as a workflow of Mini-Apps. We show that Mini-Apps enable scientists to (i) write domain-specific data analysis code that scales on most HPC hardware and (ii) analyze data sets 100 times larger than today's off-the-shelf desktops and workstations can handle, in most cases with more than a 10x speed-up.
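To make the idea of building PCA as a workflow of kernels concrete, the sketch below chains four small kernels (column means, centering, covariance, power iteration) to obtain the first principal component. It runs in a single Python process; the paper's Mini-Apps implement kernels with programming models such as MapReduce and MPI, and the kernel boundaries chosen here are an assumption.

```python
import math


def column_means(X):
    # Kernel 1: per-feature mean (a reduction over rows).
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def center(X, means):
    # Kernel 2: subtract the means (an embarrassingly parallel map over rows).
    return [[x - m for x, m in zip(row, means)] for row in X]


def covariance(Xc):
    # Kernel 3: sample covariance Xc^T Xc / (n - 1).
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def power_iteration(C, iterations=200):
    # Kernel 4: repeated matrix-vector products converge to the top eigenvector of C.
    d = len(C)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def first_principal_component(X):
    # The "workflow": chain the kernels.
    return power_iteration(covariance(center(X, column_means(X))))


print(first_principal_component([[1, 1], [2, 2], [3, 3]]))  # approximately [0.7071, 0.7071]
```

Starting power iteration from the all-ones vector fails if that vector happens to be orthogonal to the top eigenvector; a random start vector avoids this.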

Expert Systems With Applications | 2016

Enabling graph mining in RDF triplestores using SPARQL for holistic in-situ graph analysis

Sangkeun Lee; Sreenivas R. Sukumar; Seokyong Hong; Seung-Hwan Lim

We present implementations of six popular graph mining algorithms using SPARQL. We present guidelines for efficient graph mining algorithms using SPARQL. We released our implementation as a publicly available open-source project. We analyzed the performance of our implementation in various system environments.
Graph analysis is now considered a promising technique for discovering useful knowledge from data. We posit that there are two dimensions of graph analysis: OnLine Graph Analytic Processing (OLGAP) and Graph Mining (GM), which focus on subgraph pattern matching and automatic knowledge discovery, respectively. Because these two dimensions solve complex problems in complementary ways, holistic in-situ graph analysis, which covers both OLGAP and GM in a single system, is critical for minimizing the burden of operating multiple graph systems and transferring intermediate result sets between them. Nevertheless, most existing graph analysis systems support only one dimension of graph analysis. In this work, we enable GM capabilities (e.g., PageRank, connected-component analysis, and node eccentricity) in RDF triplestores, which were originally developed to store RDF datasets and provide OLGAP capability. Specifically, we implemented six representative graph mining algorithms using SPARQL. This approach makes a wide range of available RDF datasets directly usable for holistic graph analysis within a single system. To validate the approach, we evaluate the performance of our implementations on nine real-world datasets in three computing environments: a laptop computer, an Amazon EC2 instance, and a shared-memory Cray XMT2 URIKA-GD graph-processing appliance. The experimental results show that our implementation provides promising and scalable performance for real-world graph analysis in all tested environments.
The developed software is publicly available in an open-source project that we initiated.
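Node eccentricity, one of the algorithms named above, is the largest shortest-path distance from a vertex to any vertex it can reach. The sketch below computes it with breadth-first search. An iterative SPARQL implementation could plausibly expand the frontier with one query per hop, but the authors' actual queries may differ; the edge-list input is an assumption.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def eccentricity(adjacency, source):
    # BFS assigns each reachable vertex its hop distance from the source.
    # Restricted to the source's connected component (formally, eccentricity is
    # infinite when some vertex is unreachable).
    distance = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                frontier.append(v)
    return max(distance.values())


adjacency = build_adjacency([(1, 2), (2, 3), (3, 4)])
print({v: eccentricity(adjacency, v) for v in sorted(adjacency)})  # {1: 3, 2: 2, 3: 2, 4: 3}
```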

International Conference on Big Data | 2015

Table2Graph: A Scalable Graph Construction from Relational Tables Using Map-Reduce

Sangkeun Lee; Byung H. Park; Seung-Hwan Lim; Mallikarjun Shankar

Identifying correlations and relationships between entities within and across different data sets (or databases) is of great importance in many domains. Data warehouse-based integration, the most widely practiced approach, has proven inadequate for this goal. Instead, we explored an alternative that turns multiple disparate data sources into a single heterogeneous graph model, so that matching entities across different source data can be expedited by examining their linkages in the graph. We found, however, that while a graph-based model provides outstanding capabilities for this purpose, constructing such a model from relational source databases was time-consuming and largely left to ad hoc proprietary scripts. This led us to develop a reconfigurable and reusable graph construction tool designed to work at scale. In this paper, we introduce Table2Graph, a graph construction tool based on the Map-Reduce framework over Hadoop. We also discuss results from applying Table2Graph to integrate disparate healthcare databases.
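A minimal sketch of Map-Reduce graph construction from relational rows, simulated in plain Python: the map phase turns each row into a vertex plus one edge per foreign-key column, the shuffle groups emitted pairs by vertex, and the reduce phase builds adjacency lists. The `SPEC` mapping and all table and column names are hypothetical, not Table2Graph's configuration format; on Hadoop the map and reduce functions would run on separate workers.

```python
from collections import defaultdict

# Hypothetical mapping spec: for each table, the column identifying a row-vertex
# and the foreign-key columns that point at vertices of another table.
SPEC = {
    "visits": {"key": "visit_id", "links": {"patient_id": "patients", "hospital_id": "hospitals"}},
    "patients": {"key": "patient_id", "links": {}},
}


def map_row(table, row):
    # Map phase: emit (source_vertex, (edge_label, target_vertex)) per non-null link column.
    spec = SPEC[table]
    source = f"{table}:{row[spec['key']]}"
    yield source, None  # placeholder so the vertex exists even without edges
    for column, target_table in spec["links"].items():
        if row.get(column) is not None:
            yield source, (column, f"{target_table}:{row[column]}")


def shuffle(pairs):
    # Shuffle phase: group emitted values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_vertex(vertex, values):
    # Reduce phase: one sorted adjacency list per vertex, dropping placeholders.
    return vertex, sorted(v for v in values if v is not None)


def table2graph(tables):
    pairs = (pair for table, rows in tables.items() for row in rows for pair in map_row(table, row))
    return dict(reduce_vertex(k, vs) for k, vs in shuffle(pairs).items())


tables = {
    "patients": [{"patient_id": "p1"}],
    "visits": [
        {"visit_id": "v1", "patient_id": "p1", "hospital_id": "h7"},
        {"visit_id": "v2", "patient_id": "p1", "hospital_id": None},
    ],
}
print(table2graph(tables))
```

Because the row-to-graph mapping lives in `SPEC` rather than in the map function, pointing the same job at a different schema only means changing the spec, which is one way to read the "reconfigurable and reusable" goal.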

International Parallel and Distributed Processing Symposium | 2014

Analyzing Reliability of Virtual Machine Instances with Dynamic Pricing in the Public Cloud

Seung-Hwan Lim; Gautam S. Thakur; James Horey

This study presents a reliability analysis of virtual machine instances in public cloud environments under dynamic pricing. Unlike traditional fixed pricing, dynamic pricing allows the price to fluctuate over arbitrary periods of time according to external factors such as supply and demand or excess capacity. This pricing option introduces a new type of fault: virtual machine instances may be unexpectedly terminated because of a conflict between the original bid price and the current offered price. This new class of fault may dominate traditional faults in cloud computing environments, where resource availability with respect to traditional faults is often above 99.9%. To understand and address this new type of fault, we translated two classic reliability metrics, mean time between failures and availability, to the Amazon Web Services spot market using historical price data. We also validated our findings by submitting actual bids in the spot market. Overall, our historical analysis and experimental validation agreed well. Based on these results, we also provided suggestions and techniques to maximize the overall reliability of virtual machine instances under dynamic pricing.
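One plausible way to translate the two metrics to spot pricing, sketched under stated assumptions: the instance is up while the spot price is at or below the bid, the request is persistent (so the instance resumes once the price falls back), and each up-to-down transition counts as one failure. Availability is then the fraction of time the price stays at or below the bid, and MTBF is total up time divided by the number of failures. The paper's exact definitions may differ, and the price history below is made up.

```python
def spot_reliability(history, bid, end_time):
    """Return (mtbf, availability) for a bid against a piecewise-constant price history.

    history: list of (time, price) pairs sorted by time; each price holds until
    the next change, and the last one holds until end_time.
    """
    up_time = 0.0
    failures = 0
    previously_up = False
    for i, (t, price) in enumerate(history):
        t_next = history[i + 1][0] if i + 1 < len(history) else end_time
        up = price <= bid  # assumption: the instance runs while price <= bid
        if up:
            up_time += t_next - t
        elif previously_up:
            failures += 1  # out-of-bid termination
        previously_up = up
    availability = up_time / (end_time - history[0][0])
    mtbf = up_time / failures if failures else float("inf")
    return mtbf, availability


# Made-up price history (hours, $/hour) for illustration only.
history = [(0, 0.10), (10, 0.30), (12, 0.10), (20, 0.35), (25, 0.10)]
mtbf, availability = spot_reliability(history, bid=0.20, end_time=30)
print(mtbf, availability)  # MTBF of 11.5 hours, availability of 23/30
```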

International Conference on Big Data | 2016

Kernels for scalable data analysis in science: Towards an architecture-portable future

Sreenivas R. Sukumar; Ramakrishnan Kannan; Seung-Hwan Lim; Michael A. Matheson

In this paper, we pose and address some of the unique challenges in analyzing scientific Big Data on supercomputing platforms. Our approach identifies, implements, and scales numerical kernels that are critical to instantiating theory-inspired analytic workflows on modern computing architectures. We present the benefits of scalable kernels for constructing algorithms such as principal component analysis and non-negative matrix factorization on an image-analysis use case at the Oak Ridge Leadership Computing Facility (OLCF). Based on our experience with this use case, we conclude that piecing scalable analytic kernels together into user-defined analytic workflows is a flexible, modular, and agile way to enable architecture-portable productivity for the data-intensive sciences.
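As an illustration of an algorithm built from such kernels, here is non-negative matrix factorization using the standard Lee and Seung multiplicative updates, written in plain Python. It shows that the algorithm reduces to repeated matrix multiplications and elementwise operations; it is not the OLCF implementation, and the rank, iteration count, and `eps` guard are assumptions.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_error(V, W, H):
    # Squared Frobenius norm of V - WH.
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    # Multiplicative updates keep W and H non-negative if they start non-negative.
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)  # H <- H * (W^T V) / (W^T W H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))  # W <- W * (V H^T) / (W H H^T)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return W, H


V = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # an exactly rank-1 non-negative matrix
W, H = nmf(V, rank=1)
print(frobenius_error(V, W, H))  # close to zero
```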

International Conference on Big Data | 2016

Constellation: A science graph network for scalable data and knowledge discovery in extreme-scale scientific collaborations

Sudharshan S. Vazhkudai; John Harney; Raghul Gunasekaran; Dale Stansberry; Seung-Hwan Lim; Thomas E. Barron; Andrew W. Nash; Arvind Ramanathan

Constellation's overarching goal is to federate information from resources within an extreme-scale scientific collaboration to enable the scalable discovery of data and new knowledge pathways. The resource fabric comprises petascale supercomputers and storage systems, users, jobs, datasets, and lifecycle artifacts. For an extreme-scale supercomputing center, normal operations can generate hundreds of millions of data products and metadata entries describing the resource fabric. Constellation federates the information extracted from these resources using a custom, transformative science graph network; constructs rich metadata indexes and higher-order derived metadata from the extracted information; and conducts scalable graph analytics to unravel hidden data pathways. Our implementation and deployment at a production supercomputing facility show that the graph can scale to more than 750 million vertices, that its domain-agnostic indexing can answer interesting science queries, and that its analytics can aid structural, topological, and temporal analysis to identify usage hotspots.
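A toy version of federating resource records into one graph and deriving higher-order metadata from it: the sketch links users to the jobs they ran and jobs to the datasets they read, then counts distinct reading jobs per dataset as a simple usage-hotspot metric. The record format and the hotspot metric are assumptions for illustration, not Constellation's schema or analytics.

```python
from collections import Counter, defaultdict

# Hypothetical job-log records for illustration.
records = [
    {"user": "u1", "job": "j1", "reads": ["d1", "d2"]},
    {"user": "u2", "job": "j2", "reads": ["d1"]},
    {"user": "u1", "job": "j3", "reads": ["d1", "d3"]},
]


def build_graph(records):
    # Directed edges: user -> job (ran) and job -> dataset (read).
    edges = defaultdict(set)
    for rec in records:
        user, job = f"user:{rec['user']}", f"job:{rec['job']}"
        edges[user].add(job)
        for dataset in rec["reads"]:
            edges[job].add(f"dataset:{dataset}")
    return edges


def dataset_hotspots(edges, k=3):
    # Derived metadata: number of distinct jobs that read each dataset.
    counts = Counter(
        target for targets in edges.values() for target in targets if target.startswith("dataset:")
    )
    return counts.most_common(k)


print(dataset_hotspots(build_graph(records), k=1))  # [('dataset:d1', 3)]
```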

2016 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS) | 2016

Fatman vs. littleboy: scaling up linear algebraic operations in scale-out data platforms

Luna Xu; Seung-Hwan Lim; Ali Raza Butt; Sreenivas R. Sukumar; Ramakrishnan Kannan

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling such algorithms both up and out is highly desirable for efficient processing over millions of data points. To this end, we present a matrix manipulation approach that effectively scales up each node in a scale-out data-parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplication in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices and provides flexible control of acceleration based on matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication over large matrices with up to four billion elements. To connect the effectiveness of our approach to machine learning applications, we performed Gramian matrix computation via generalized matrix multiplication. Our experiments show that our approach achieves more than a 2× speed-up, and up to 96.1% computation improvement, compared to state-of-the-art Spark MLlib for dense matrices.
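Density-based control of acceleration can be sketched as a dispatch rule: measure the fraction of nonzeros and route dense inputs to one multiplication routine and sparse inputs to another; the Gramian A^T A is then a single such multiplication. This single-process sketch uses pure-Python routines as stand-ins. The threshold, the function names, and treating `dense_matmul` as the "accelerated" path are hypothetical and do not reflect the authors' Spark implementation.

```python
def density(A):
    # Fraction of nonzero entries.
    nonzeros = sum(1 for row in A for x in row if x != 0)
    return nonzeros / (len(A) * len(A[0]))


def dense_matmul(A, B):
    # Stand-in for an accelerated dense path (e.g., an optimized BLAS GEMM).
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]


def sparse_matmul(A, B):
    # Skips zero entries of A: row i of C accumulates A[i][k] * B[k] for nonzero A[i][k].
    p = len(B[0])
    C = []
    for row in A:
        out = [0] * p
        for k, a in enumerate(row):
            if a != 0:
                for j, b in enumerate(B[k]):
                    out[j] += a * b
        C.append(out)
    return C


def multiply(A, B, threshold=0.5):
    # Hypothetical policy: dense enough -> "accelerated" path, otherwise sparse path.
    return dense_matmul(A, B) if density(A) >= threshold else sparse_matmul(A, B)


def gramian(A, threshold=0.5):
    # Gramian G = A^T A computed as one generalized matrix multiplication.
    At = [list(col) for col in zip(*A)]
    return multiply(At, A, threshold)


A = [[1, 0], [0, 2], [3, 0]]
print(gramian(A))  # [[10, 0], [0, 4]]
```

Both paths return the same product; the threshold only decides which routine does the work, which is where a real system would weigh accelerator transfer costs against the savings from skipping zeros.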
