Sidharth Kumar | Researchain

Publication

Featured research published by Sidharth Kumar.

international conference on cluster computing | 2011

PIDX: Efficient Parallel I/O for Multi-resolution Multi-dimensional Scientific Datasets

Sidharth Kumar; Venkatram Vishwanath; Philip H. Carns; Brian Summa; Giorgio Scorzelli; Valerio Pascucci; Robert B. Ross; Jacqueline H. Chen; Hemanth Kolla; Ray W. Grout

The IDX data format provides efficient, cache-oblivious, and progressive access to large-scale scientific datasets by storing the data in a hierarchical Z (HZ) order. Data stored in IDX format can be visualized in an interactive environment allowing for meaningful explorations with minimal resources. This technology enables real-time, interactive visualization and analysis of large datasets on a variety of systems ranging from desktop and laptop computers to portable devices such as iPhones/iPads and over the web. While the existing ViSUS API for writing IDX data is serial, there are obvious advantages of applying the IDX format to the output of large-scale scientific simulations. We have therefore developed PIDX, a parallel API for writing data in an IDX format. With PIDX it is now possible to generate IDX datasets directly from large-scale scientific simulations with the added advantage of real-time monitoring and visualization of the generated data. In this paper, we provide an overview of the IDX file format and how it is generated using PIDX. We then present a data model description and a novel aggregation strategy to enhance the scalability of the PIDX library. The S3D combustion application is used as an example to demonstrate the efficacy of PIDX for a real-world scientific simulation. S3D is used for fundamental studies of turbulent combustion requiring exceptionally high fidelity simulations. PIDX achieves up to 18 GiB/s I/O throughput at 8,192 processes for S3D to write data out in the IDX format. This allows for interactive analysis and visualization of S3D data, thus enabling in situ analysis of S3D simulation.

petascale data storage workshop | 2010

Towards parallel access of multi-dimensional, multi-resolution scientific data

Sidharth Kumar; Valerio Pascucci; Venkatram Vishwanath; Philip H. Carns; Mark Hereld; Robert Latham; Tom Peterka; Michael E. Papka; Robert B. Ross

Large-scale scientific simulations routinely produce data of increasing resolution. Analyzing this data is key to scientific discovery. A critical bottleneck facing data analysis is the I/O time to access the data due to the disparity between a simulation's data layout and the data layout requirements of analysis applications. One method of addressing this problem is to reorganize the data in a manner that makes it more amenable to analysis and visualization. The IDX file format is one example of this approach. It orders data points so that they can be accessed at multiple resolution levels with favorable spatial locality and caching properties. IDX has been used successfully in fields such as digital photography and visualization of large scientific data, and is a promising approach for analysis of HPC data. Unfortunately, the existing tools for writing data in this format only provide a serial interface. HPC applications must therefore either write all data from a single process or convert existing data as a post-processing step, in either case failing to utilize available parallel I/O resources. In this work, we provide an overview of the IDX file format and the existing ViSUS library that provides serial access to IDX data. We investigate methods for writing IDX data in parallel and demonstrate that it is possible for HPC applications to write data directly into IDX format with scalable performance. Our preliminary results demonstrate 60% of the peak I/O throughput when reorganizing and writing the data from 512 processes on an IBM BG/P system. We also analyze the performance bottlenecks and propose future work towards a flexible and efficient implementation.

ieee international conference on high performance computing data and analytics | 2012

Efficient data restructuring and aggregation for I/O acceleration in PIDX

Sidharth Kumar; Venkatram Vishwanath; Philip H. Carns; Joshua A. Levine; Robert Latham; Giorgio Scorzelli; Hemanth Kolla; Ray W. Grout; Robert B. Ross; Michael E. Papka; Jacqueline H. Chen; Valerio Pascucci

Hierarchical, multiresolution data representations enable interactive analysis and visualization of large-scale simulations. One promising application of these techniques is to store high performance computing simulation output in a hierarchical Z (HZ) ordering that translates data from a Cartesian coordinate scheme to a one-dimensional array ordered by locality at different resolution levels. However, when the dimensions of the simulation data are not an even power of 2, parallel HZ ordering produces sparse memory and network access patterns that inhibit I/O performance. This work presents a new technique for parallel HZ ordering of simulation datasets that restructures simulation data into large (power of 2) blocks to facilitate efficient I/O aggregation. We perform both weak and strong scaling experiments using the S3D combustion application on both Cray-XE6 (65,536 cores) and IBM Blue Gene/P (131,072 cores) platforms. We demonstrate that data can be written in hierarchical, multiresolution format with performance competitive to that of native data-ordering methods.
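
The translation from Cartesian coordinates to an HZ-ordered one-dimensional array mentioned in this abstract can be illustrated with a minimal sketch. It assumes a 2D grid whose side length is a power of 2 and uses a commonly described bit-manipulation formulation of HZ ordering (interleave the coordinate bits into a Z-order index, then strip trailing zeros). The function names are hypothetical, and the code is not taken from the PIDX or ViSUS sources.

```python
def morton_index_2d(x, y, bits):
    """Interleave the low `bits` bits of x and y into a Z-order index.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    (This bit assignment is an assumption made for the illustration.)
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def hz_index(z, total_bits):
    """Map a Z-order index with `total_bits` bits to an HZ index.

    Samples whose Z index has many trailing zeros belong to coarse
    resolution levels and therefore land near the front of the array.
    """
    z |= 1 << total_bits   # sentinel bit so that z is never zero
    z //= z & -z           # strip all trailing zero bits
    return z >> 1          # drop the lowest set bit


# Usage: HZ positions for every sample of a 4x4 grid (2 bits per axis).
bits = 2
layout = {
    (x, y): hz_index(morton_index_2d(x, y, bits), 2 * bits)
    for y in range(1 << bits)
    for x in range(1 << bits)
}
```

In this formulation the mapping is a permutation of the indices 0 to 15, so every sample receives a unique slot. Grids whose dimensions are not powers of 2 leave some slots unused, which is one way to picture the sparse access patterns the abstract refers to.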

ieee international conference on high performance computing data and analytics | 2013

Characterization and modeling of PIDX parallel I/O for performance optimization

Sidharth Kumar; Avishek Saha; Venkatram Vishwanath; Philip H. Carns; John A. Schmidt; Giorgio Scorzelli; Hemanth Kolla; Ray W. Grout; Robert Latham; Robert Ross; Michael E. Papka; Jacqueline H. Chen; Valerio Pascucci

Parallel I/O library performance can vary greatly in response to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time consuming and dependent on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction.

Archive | 2012

The ViSUS Visualization Framework

Valerio Pascucci; Giorgio Scorzelli; Brian Summa; Peer-Timo Bremer; Attila Gyulassy; Cameron Christensen; Sujin Philip; Sidharth Kumar

international conference on supercomputing | 2014

Fast Multiresolution Reads of Massive Simulation Datasets

Sidharth Kumar; Cameron Christensen; John A. Schmidt; Peer-Timo Bremer; Eric Brugger; Venkatram Vishwanath; Philip H. Carns; Hemanth Kolla; Ray W. Grout; Jacqueline H. Chen; Martin Berzins; Giorgio Scorzelli; Valerio Pascucci

Today's massively parallel simulation codes can produce output ranging up to many terabytes of data. Utilizing this data to support scientific inquiry requires analysis and visualization, yet the sheer size of the data makes it cumbersome or impossible to read without computational resources similar to the original simulation. We identify two broad classes of problems for reading data and present effective solutions for both. The first class of data reads depends on user requirements and available resources. Tasks such as visualization and user-guided analysis may be accomplished using only a subset of variables with a restricted spatial extent at a reduced resolution. The other class of reads requires full resolution multivariate data to be loaded, for example to restart a simulation. We show that utilizing the hierarchical multiresolution IDX data format enables scalable and efficient serial and parallel read access on a variety of hardware from supercomputers down to portable devices. We demonstrate interactive view-dependent visualization and analysis of massive scientific datasets using low-power commodity hardware, and we compare read performance with other parallel file formats for both full and partial resolution data.
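
The reduced-resolution reads described in this abstract follow from a property of hierarchical Z ordering: a prefix of the ordered array covers a regularly subsampled version of the grid. The sketch below illustrates that property under stated assumptions (a 2D grid with a power-of-2 side length and a commonly described bit-manipulation formulation of HZ ordering). The helper names are hypothetical, and this is not the IDX reader itself.

```python
def hz_from_xy(x, y, bits):
    """HZ index of grid point (x, y) on a (2**bits x 2**bits) grid."""
    z = 0
    for i in range(bits):                  # interleave bits (Z-order index)
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    z |= 1 << (2 * bits)                   # sentinel bit
    z //= z & -z                           # strip trailing zeros
    return z >> 1                          # drop the lowest set bit


def coarse_points(bits, level):
    """Grid points stored in the first 2**level slots of the HZ array."""
    side = 1 << bits
    limit = 1 << level
    return sorted(
        (x, y)
        for y in range(side)
        for x in range(side)
        if hz_from_xy(x, y, bits) < limit
    )


# Usage: the first 4 slots of a 4x4 grid hold every other sample per axis.
quarter_resolution = coarse_points(bits=2, level=2)
```

Reading a longer prefix refines the lattice, and reading the whole array recovers every sample, which is the behavior a progressive viewer would rely on.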

ieee international conference on high performance computing data and analytics | 2014

Efficient I/O and storage of adaptive-resolution data

Sidharth Kumar; John Edwards; Peer-Timo Bremer; Aaron Knoll; Cameron Christensen; Venkatram Vishwanath; Philip H. Carns; John A. Schmidt; Valerio Pascucci

We present an efficient, flexible, adaptive-resolution I/O framework that is suitable for both uniform and Adaptive Mesh Refinement (AMR) simulations. In an AMR setting, current solutions typically represent each resolution level as an independent grid, which often results in inefficient storage and performance. Our technique coalesces domain data into a unified, multiresolution representation with fast, spatially aggregated I/O. Furthermore, our framework easily extends to importance-driven storage of uniform grids, for example, by storing regions of interest at full resolution and nonessential regions at lower resolution for visualization or analysis. Our framework, which is an extension of the PIDX framework, achieves state-of-the-art disk usage and I/O performance regardless of resolution of the data, regions of interest, and the number of processes that generated the data. We demonstrate the scalability and efficiency of our framework using the Uintah and S3D large-scale combustion codes on the Mira and Edison supercomputers.

ieee international conference on high performance computing data and analytics | 2012

Scalable visualization and interactive analysis using massive data streams

Valerio Pascucci; Peer-Timo Bremer; Attila Gyulassy; Giorgio Scorzelli; Cameron Christensen; Brian Summa; Sidharth Kumar

Historically, data creation and storage have always outpaced the infrastructure for its movement and utilization. This trend is increasing now more than ever, with the ever-growing size of scientific simulations, increased resolution of sensors, and large mosaic images. Effective exploration of massive scientific models demands the combination of data management, analysis, and visualization techniques, working together in an interactive setting. The ViSUS application framework has been designed as an environment that allows the interactive exploration and analysis of massive scientific models in a cache-oblivious, hardware-agnostic manner, enabling processing and visualization of possibly geographically distributed data using many kinds of devices and platforms. For general purpose feature segmentation and exploration we discuss a new paradigm based on topological analysis. This approach enables the extraction of summaries of features present in the data through abstract models that are orders of magnitude smaller than the raw data, providing enough information to support general queries and perform a wide range of analyses without access to the original data.

cluster computing and the grid | 2016

Evaluation of In-Situ Analysis Strategies at Scale for Power Efficiency and Scalability

Ivan Rodero; Manish Parashar; Aaditya G. Landge; Sidharth Kumar; Valerio Pascucci; Peer-Timo Bremer

The increasing gap between available compute power and I/O capabilities is resulting in simulation pipelines running on leadership computing facilities being reformulated. In particular, in-situ processing is complementing conventional post-process analysis; however, it can be performed either by using the same compute resources as the simulation or by using secondary dedicated resources. In this paper, we focus on three different in-situ analysis strategies, which use the same compute resources as the ongoing simulation but different data movement strategies. We evaluate the costs incurred by these strategies in terms of run time, scalability and power/energy consumption. Furthermore, we extrapolate power behavior to peta-scale and investigate different design choices through projections. Experimental evaluation at full machine scale on Titan indicates that using fewer cores per node for in-situ analysis is the optimum choice in terms of scalability. Hence, further research effort should be devoted towards developing in-situ analysis techniques following this strategy in future high-end systems.

ieee international conference on high performance computing data and analytics | 2014

Big data from scientific simulations

John Edwards; Sidharth Kumar; Valerio Pascucci

Scientific simulations often generate massive amounts of data used for debugging, restarts, and scientific analysis and discovery. Challenges that practitioners face using these types of big data are unique. Of primary importance is speed of writing data during a simulation, but this need for fast I/O is at odds with other priorities, such as data access time for visualization and analysis, efficient storage, and portability across a variety of supercomputer topologies, configurations, file systems, and storage devices. The computational power of high-performance computing systems continues to increase according to Moore’s law, but the same is not true for I/O subsystems, creating a performance gap between computation and I/O. This chapter explores these issues, as well as possible optimization strategies, the use of in situ analytics, and a case study using the PIDX I/O library in a typical simulation.
