Publication


Featured research published by Hemanth Kolla.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

Janine C. Bennett; Hasan Abbasi; Peer-Timo Bremer; Ray W. Grout; Attila Gyulassy; Tong Jin; Scott Klasky; Hemanth Kolla; Manish Parashar; Valerio Pascucci; Philippe Pierre Pebay; David C. Thompson; Hongfeng Yu; Fan Zhang; Jacqueline H. Chen

With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process-centric to a concurrent approach based on either in-situ or in-transit processing. In this context, computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper, we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
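The in-situ/in-transit split described above can be illustrated with a small sketch (hypothetical function names; a worker thread stands in for the asynchronous transfer to secondary staging resources):

```python
import threading
import queue

def analysis(data):
    # Stand-in for a real analysis kernel (e.g., descriptive statistics).
    return sum(data) / len(data)

def run_in_situ(data):
    # In-situ: the analysis shares the simulation's primary compute resources
    # and runs synchronously between simulation timesteps.
    return analysis(data)

def run_in_transit(data, results: queue.Queue):
    # In-transit: data is shipped asynchronously to secondary resources;
    # here a worker thread plays the role of the staging nodes.
    def worker(payload):
        results.put(analysis(payload))
    t = threading.Thread(target=worker, args=(list(data),))  # copy = "transfer"
    t.start()
    return t

timestep_data = [1.0, 2.0, 3.0, 4.0]
in_situ_result = run_in_situ(timestep_data)

results = queue.Queue()
t = run_in_transit(timestep_data, results)
t.join()
assert in_situ_result == results.get() == 2.5
```

Either path computes the same result; the difference is purely where and when the work runs relative to the simulation.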


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

ISABELA-QA: query-driven analytics with ISABELA-compressed extreme-scale scientific data

Sriram Lakshminarasimhan; John Jenkins; Zhenhuan Gong; Hemanth Kolla; S. Ku; Stephane Ethier; J.H. Chen; Choong-Seock Chang; Scott Klasky; Robert Latham; Robert B. Ross; Nagiza F. Samatova

Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top priority. The increasing simulation output data sizes demand a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed - rather than original, full-size - data is a promising strategy for meeting storage- and I/O-bound application challenges. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-priors-driven analytical processing of spatio-temporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is up to eight-fold less than what existing state-of-the-art data management technologies, which require storing both the original data and the index, could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is lightweight; it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values, rather than spatial regions, for the variables that satisfy user-specified range queries - a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatio-temporal floating-point double- or single-precision data, while offering a lightweight memory and disk storage footprint with parallel, scalable, multi-node, multi-core, GPU-based query processing.
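The metadata-driven range-query idea can be sketched with a toy binned index (this is not ISABELA's actual metadata format; BIN_WIDTH and the helper names are assumptions for illustration):

```python
from collections import defaultdict

BIN_WIDTH = 10.0  # hypothetical bin width

def build_bins(values):
    # Group values into coarse bins by value; this stands in for the
    # compression metadata that the query engine indexes instead of raw data.
    bins = defaultdict(list)
    for i, v in enumerate(values):
        bins[int(v // BIN_WIDTH)].append((i, v))
    return bins

def range_query(bins, lo, hi):
    # Touch only bins that can overlap [lo, hi], then filter exactly,
    # returning the actual values (not just spatial regions).
    hits = []
    for b in range(int(lo // BIN_WIDTH), int(hi // BIN_WIDTH) + 1):
        hits.extend((i, v) for i, v in bins.get(b, []) if lo <= v <= hi)
    return hits

data = [3.2, 11.5, 27.0, 14.9, 42.1]
bins = build_bins(data)
assert range_query(bins, 10.0, 20.0) == [(1, 11.5), (3, 14.9)]
```

Because the index only records bin membership, it stays small relative to the data, at the cost of an exact filtering pass within candidate bins.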


High Performance Distributed Computing | 2012

ISOBAR hybrid compression-I/O interleaving for large-scale parallel I/O optimization

Eric R. Schendel; Saurabh V. Pendse; John Jenkins; David A. Boyuka; Zhenhuan Gong; Sriram Lakshminarasimhan; Qing Liu; Hemanth Kolla; J.H. Chen; Scott Klasky; Robert B. Ross; Nagiza F. Samatova

Current peta-scale data analytics frameworks suffer from a significant performance bottleneck due to an imbalance between their enormous computational power and limited I/O bandwidth. Using data compression schemes to reduce the amount of I/O activity is a promising approach to addressing this problem. In this paper, we propose a hybrid framework for interleaving I/O with data compression to achieve improved I/O throughput alongside reduced dataset size. We evaluate several interleaving strategies, present theoretical models, and evaluate the efficiency and scalability of our approach through comparative analysis. With our theoretical model, considering 19 real-world scientific datasets both from the public domain and from peta-scale simulations, we estimate that the hybrid method can yield a 12% to 46% increase in throughput on hard-to-compress scientific datasets. At the reported peak bandwidth of 60 GB/s of uncompressed data for a current leadership-class parallel I/O system, this translates into an effective gain of 7 to 28 GB/s in aggregate throughput.
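A minimal sketch of compression-I/O interleaving, assuming zlib as a stand-in compressor and an in-memory sink in place of the parallel file system:

```python
import io
import threading
import zlib

def write_interleaved(chunks, sink):
    # Hybrid interleaving: compress chunk i+1 while chunk i is being written,
    # so the CPU-bound and I/O-bound stages overlap instead of serializing.
    writer = None
    for chunk in chunks:
        compressed = zlib.compress(chunk)      # CPU-bound stage
        if writer is not None:
            writer.join()                      # wait for the previous write
        writer = threading.Thread(target=sink.write, args=(compressed,))
        writer.start()                         # I/O-bound stage, overlapped
    if writer is not None:
        writer.join()

chunks = [bytes([i]) * 4096 for i in range(4)]
sink = io.BytesIO()
write_interleaved(chunks, sink)
# The stream is the concatenation of per-chunk zlib frames, smaller than the input.
assert len(sink.getvalue()) < sum(len(c) for c in chunks)
```

Joining the previous writer before starting the next keeps chunks on the sink in order while still overlapping one write with one compression.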


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Exploring automatic, online failure recovery for scientific applications at extreme scales

Marc Gamell; Daniel S. Katz; Hemanth Kolla; Jacqueline H. Chen; Scott Klasky; Manish Parashar

Application resilience is a key challenge that must be addressed in order to realize the exascale vision. Process/node failures, an important class of failures, are typically handled today by terminating the job and restarting it from the last stored checkpoint. This approach is not expected to scale to exascale. In this paper we present Fenix, a framework for enabling recovery from process/node/blade/cabinet failures for MPI-based parallel applications in an online (i.e., without disrupting the job) and transparent manner. Fenix provides mechanisms for transparently capturing failures, re-spawning new processes, fixing failed communicators, restoring application state, and returning execution control back to the application. To enable automatic data recovery, Fenix relies on application-driven, diskless, implicitly coordinated checkpointing. Using the S3D combustion simulation running on the Titan Cray XK7 production system at ORNL, we experimentally demonstrate Fenix's ability to tolerate high failure rates (e.g., more than one per minute) with low overhead while sustaining performance.
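A single-process sketch of the rollback idea behind diskless checkpointing (this is not Fenix's API; the failure is injected, and deep copies stand in for in-memory checkpoints held on peer nodes):

```python
import copy

class TransientFailure(Exception):
    pass

def run_with_recovery(steps, fail_at=None):
    # Online recovery sketch: keep an in-memory ("diskless") checkpoint and,
    # on failure, restore state and resume instead of terminating the job.
    state = {"step": 0, "value": 0}
    checkpoint = copy.deepcopy(state)
    failed_once = False
    while state["step"] < steps:
        try:
            if state["step"] == fail_at and not failed_once:
                failed_once = True
                raise TransientFailure("injected process failure")
            state["value"] += state["step"]
            state["step"] += 1
            checkpoint = copy.deepcopy(state)   # implicitly coordinated checkpoint
        except TransientFailure:
            state = copy.deepcopy(checkpoint)   # roll back; the job keeps running
    return state["value"]

assert run_with_recovery(5, fail_at=3) == 0 + 1 + 2 + 3 + 4  # == 10
```

The injected failure at step 3 costs only a rollback to the last checkpoint, not a job restart.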


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

In-situ feature extraction of large scale combustion simulations using segmented merge trees

Aaditya G. Landge; Valerio Pascucci; Attila Gyulassy; Janine C. Bennett; Hemanth Kolla; Jacqueline H. Chen; Peer-Timo Bremer

The ever-increasing amount of data generated by scientific simulations, coupled with system I/O constraints, is fueling a need for in-situ analysis techniques. Of particular interest are approaches that produce reduced data representations while maintaining the ability to redefine, extract, and study features in a post-process to obtain scientific insights. This paper presents two variants of in-situ feature extraction techniques using segmented merge trees, which encode a wide range of threshold-based features. The first approach is a fast, low-communication-cost technique that generates an exact solution but has limited scalability. The second is a scalable, local approximation that is nevertheless guaranteed to correctly extract all features up to a predefined size. We demonstrate both variants using some of the largest combustion simulations available on leadership-class supercomputers. Our approach allows state-of-the-art, feature-based analysis to be performed in-situ at significantly higher frequency than currently possible and with negligible impact on the overall simulation runtime.
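A merge tree over a 1-D scalar field can be sketched with a union-find sweep (a toy illustration of the threshold-based features a merge tree encodes, not the paper's distributed segmented-merge-tree algorithm):

```python
def merge_events(field):
    # Sweep values from high to low with union-find. A sample with no
    # higher neighbour starts a feature (a maximum); a sample joining two
    # live components records a merge (a saddle).
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    order = sorted(range(len(field)), key=lambda i: -field[i])
    events = []
    for i in order:
        parent[i] = i
        comps = {find(n) for n in (i - 1, i + 1) if n in parent}
        if not comps:
            events.append(("birth", i))       # local maximum
        elif len(comps) == 2:
            events.append(("merge", i))       # two features join here
        for r in comps:
            parent[r] = i
    return events

field = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
assert merge_events(field) == [("birth", 1), ("birth", 4), ("merge", 3)]
```

Every superlevel-set feature for any threshold can be read off from these birth/merge events, which is what makes the tree a compact stand-in for the full field.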


IEEE International Conference on High Performance Computing, Data, and Analytics | 2013

Exploring power behaviors and trade-offs of in-situ data analytics

Marc Gamell; Ivan Rodero; Manish Parashar; Janine C. Bennett; Hemanth Kolla; Jacqueline H. Chen; Peer-Timo Bremer; Aaditya G. Landge; Attila Gyulassy; Patrick S. McCormick; Scott Pakin; Valerio Pascucci; Scott Klasky

As scientific applications target exascale, challenges related to data and energy are becoming dominant concerns. For example, coupled simulation workflows are increasingly adopting in-situ data processing and analysis techniques to address costs and overheads due to data movement and I/O. However, it is also critical to understand these overheads and associated trade-offs from an energy perspective. The goal of this paper is to explore data-related energy/performance trade-offs for end-to-end simulation workflows running at scale on current high-end computing systems. Specifically, this paper presents: (1) an analysis of the data-related behaviors of a combustion simulation workflow with an in-situ data analytics pipeline, running on the Titan system at ORNL; (2) a power model based on system power and data exchange patterns, which is empirically validated; and (3) the use of the model to characterize the energy behavior of the workflow and to explore energy/performance trade-offs on current as well as emerging systems.


International Conference on Cluster Computing | 2011

PIDX: Efficient Parallel I/O for Multi-resolution Multi-dimensional Scientific Datasets

Sidharth Kumar; Venkatram Vishwanath; Philip H. Carns; Brian Summa; Giorgio Scorzelli; Valerio Pascucci; Robert B. Ross; Jacqueline H. Chen; Hemanth Kolla; Ray W. Grout

The IDX data format provides efficient, cache oblivious, and progressive access to large-scale scientific datasets by storing the data in a hierarchical Z (HZ) order. Data stored in IDX format can be visualized in an interactive environment allowing for meaningful explorations with minimal resources. This technology enables real-time, interactive visualization and analysis of large datasets on a variety of systems ranging from desktops and laptop computers to portable devices such as iPhones/iPads and over the web. While the existing ViSUS API for writing IDX data is serial, there are obvious advantages of applying the IDX format to the output of large scale scientific simulations. We have therefore developed PIDX - a parallel API for writing data in an IDX format. With PIDX it is now possible to generate IDX datasets directly from large scale scientific simulations with the added advantage of real-time monitoring and visualization of the generated data. In this paper, we provide an overview of the IDX file format and how it is generated using PIDX. We then present a data model description and a novel aggregation strategy to enhance the scalability of the PIDX library. The S3D combustion application is used as an example to demonstrate the efficacy of PIDX for a real-world scientific simulation. S3D is used for fundamental studies of turbulent combustion requiring exceptionally high fidelity simulations. PIDX achieves up to 18 GiB/s I/O throughput at 8,192 processes for S3D to write data out in the IDX format. This allows for interactive analysis and visualization of S3D data, thus enabling in situ analysis of S3D simulation.
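The Z-order underlying HZ ordering amounts to bit interleaving; a minimal 2-D sketch (HZ additionally groups indices by resolution level, which is omitted here):

```python
def morton2(x, y, bits=16):
    # Z-order (Morton) index: interleave the bits of the coordinates so that
    # nearby points in 2-D land near each other in the 1-D ordering.
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)      # x bits go to even positions
        code |= ((y >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return code

# The four cells of a 2x2 block trace the familiar "Z" pattern.
assert [morton2(x, y) for y in (0, 1) for x in (0, 1)] == [0, 1, 2, 3]
```

Because the index is built from coordinate bits alone, it is cache oblivious: coarser resolutions are simply prefixes of the same ordering.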


International Parallel and Distributed Processing Symposium | 2015

Exploring Data Staging Across Deep Memory Hierarchies for Coupled Data Intensive Simulation Workflows

Tong Jin; Fan Zhang; Qian Sun; Hoang Bui; Melissa Romanus; Norbert Podhorszki; Scott Klasky; Hemanth Kolla; Jacqueline H. Chen; Robert Hager; Choong-Seock Chang; Manish Parashar

As applications target extreme scales, data staging and in-situ/in-transit data processing have been proposed to address the data challenges and improve scientific discovery. However, further research is necessary in order to understand how growing data sizes from data-intensive simulations, coupled with the limited DRAM capacity in high-end computing systems, will impact the effectiveness of this approach. In this paper, we explore how deep memory levels can be used for data staging, and develop a multi-tiered data staging method that spans both DRAM and solid-state disks (SSDs). This approach allows us to support both code coupling and data management for data-intensive simulation workflows. We also show how an adaptive, application-aware data placement mechanism can dynamically manage and optimize data placement across the DRAM and storage levels in this multi-tiered data staging method. We present an experimental evaluation of our approach on two platforms, an Infiniband cluster (Sith) and a Cray XK7 system (Titan), using combustion (S3D) and fusion (XGC1) simulations.
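The placement idea can be sketched as a two-tier store (the class, names, and eviction policy are illustrative, not the paper's mechanism):

```python
class TieredStager:
    # Sketch of multi-tiered staging: a small fast tier (DRAM) backed by a
    # larger slow tier (SSD). Placement is application-aware in spirit:
    # data marked "hot" (soon to be read by a coupled code) goes to DRAM,
    # spilling the oldest DRAM object to the SSD tier when capacity runs out.
    def __init__(self, dram_capacity):
        self.dram_capacity = dram_capacity
        self.dram, self.ssd = {}, {}

    def put(self, key, value, hot):
        tier = self.dram if hot else self.ssd
        tier[key] = value
        while len(self.dram) > self.dram_capacity:
            victim, data = next(iter(self.dram.items()))  # oldest-inserted
            del self.dram[victim]
            self.ssd[victim] = data

    def get(self, key):
        # Reads fall through from the fast tier to the slow tier.
        return self.dram.get(key, self.ssd.get(key))

stager = TieredStager(dram_capacity=2)
stager.put("t0", b"a", hot=True)
stager.put("t1", b"b", hot=True)
stager.put("t2", b"c", hot=True)   # evicts t0 to the SSD tier
assert "t0" in stager.ssd and "t2" in stager.dram
assert stager.get("t0") == b"a"
```

The coupled consumer sees one namespace; only latency differs depending on which tier currently holds the object.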


International Parallel and Distributed Processing Symposium | 2012

Multi-level Layout Optimization for Efficient Spatio-temporal Queries on ISABELA-compressed Data

Zhenhuan Gong; Sriram Lakshminarasimhan; John Jenkins; Hemanth Kolla; Stephane Ethier; J.H. Chen; Robert B. Ross; Scott Klasky; Nagiza F. Samatova

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O subsystems of their runtime environments, not only making I/O the primary bottleneck but also consuming space that strains the storage capacities of many computing facilities. These problems are exacerbated by the need to run data-intensive analytics applications, such as querying the dataset by variable and spatio-temporal constraints, for which current database technologies commonly build query indices larger than the raw data itself. To help solve these problems, we present a parallel query-processing engine that can handle both range queries and queries with spatio-temporal constraints on B-spline-compressed data with user-controlled accuracy. Our method adapts to widening gaps between computation and I/O performance by querying compressed metadata separated into bins by variable value, utilizing Hilbert space-filling curves to optimize for spatial constraints, and aggregating data accesses to improve the locality of per-bin stored data, substantially reducing the false-positive rate and latency-bound I/O operations (such as seeks). We show our method to be efficient with respect to storage, computation, and I/O compared to existing database technologies optimized for query processing on scientific data.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Efficient data restructuring and aggregation for I/O acceleration in PIDX

Sidharth Kumar; Venkatram Vishwanath; Philip H. Carns; Joshua A. Levine; Robert Latham; Giorgio Scorzelli; Hemanth Kolla; Ray W. Grout; Robert B. Ross; Michael E. Papka; Jacqueline H. Chen; Valerio Pascucci

Hierarchical, multiresolution data representations enable interactive analysis and visualization of large-scale simulations. One promising application of these techniques is to store high performance computing simulation output in a hierarchical Z (HZ) ordering that translates data from a Cartesian coordinate scheme to a one-dimensional array ordered by locality at different resolution levels. However, when the dimensions of the simulation data are not an even power of 2, parallel HZ ordering produces sparse memory and network access patterns that inhibit I/O performance. This work presents a new technique for parallel HZ ordering of simulation datasets that restructures simulation data into large (power of 2) blocks to facilitate efficient I/O aggregation. We perform both weak and strong scaling experiments using the S3D combustion application on both Cray-XE6 (65,536 cores) and IBM Blue Gene/P (131,072 cores) platforms. We demonstrate that data can be written in hierarchical, multiresolution format with performance competitive to that of native data-ordering methods.
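The power-of-two restructuring step can be sketched as follows (illustrative helper names; the real library also performs the data movement into those blocks):

```python
def next_pow2(n):
    # Smallest power of two >= n.
    p = 1
    while p < n:
        p <<= 1
    return p

def restructure(dims, block):
    # Round each dimension up to the next power of two and tile the padded
    # domain into power-of-two blocks, so each block maps to a dense,
    # contiguous run of HZ indices that an aggregator can write in one shot.
    padded = [next_pow2(d) for d in dims]
    blocks_per_dim = [p // block for p in padded]
    return padded, blocks_per_dim

# A 100x60 field padded to 128x64 and tiled into 32x32 blocks: 4x2 = 8 blocks.
padded, blocks = restructure((100, 60), block=32)
assert padded == [128, 64] and blocks == [4, 2]
```

Padding trades a little wasted space for dense access patterns, which is what restores I/O aggregation efficiency for non-power-of-two grids.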

Collaboration


Dive into Hemanth Kolla's collaborations.

Top Co-Authors

Jacqueline H. Chen, Sandia National Laboratories
Keita Teranishi, Sandia National Laboratories
Janine Camille Bennett, Lawrence Livermore National Laboratory
Janine C. Bennett, Sandia National Laboratories
Ray W. Grout, National Renewable Energy Laboratory
Evatt R. Hawkes, University of New South Wales
J.H. Chen, Sandia National Laboratories