
Publications


Featured research published by Venkatram Vishwanath.


IEEE Symposium on Large Data Analysis and Visualization | 2011

Toward simulation-time data analysis and I/O acceleration on leadership-class systems

Venkatram Vishwanath; Mark Hereld; Michael E. Papka

The performance mismatch between the computing and I/O components of current-generation HPC systems has made I/O the critical bottleneck for scientific applications. It is therefore critical to make data movement as efficient as possible and to facilitate simulation-time data analysis and visualization in order to reduce the data written to storage. These capabilities will be of paramount importance in enabling us to glean novel insights from simulations. We present our work on GLEAN, a flexible framework for data analysis and I/O acceleration at extreme scale. GLEAN leverages the data semantics of applications and fully exploits diverse system topologies and characteristics. We discuss the performance of GLEAN for simulation-time analysis and I/O acceleration with simulations at scale on leadership-class systems.
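
The abstract summarizes GLEAN's design rather than its interfaces, but the core idea of decoupling a simulation from storage can be illustrated with a minimal asynchronous-staging sketch. Everything below is illustrative Python, not GLEAN's API; a bounded in-process queue stands in for GLEAN's dedicated staging resources:

```python
import queue
import threading

import numpy as np

# Minimal sketch (not GLEAN's API): a bounded queue decouples the simulation
# from storage, so compute proceeds while a background thread performs I/O.
stage_q = queue.Queue(maxsize=4)

def writer():
    while True:
        item = stage_q.get()
        if item is None:                   # sentinel: simulation finished
            return
        step, field = item
        np.save(f"snapshot_{step:06d}.npy", field)  # slow I/O, off the critical path

t = threading.Thread(target=writer)
t.start()

field = np.zeros((256, 256))
for step in range(10):
    field += np.random.rand(256, 256)      # stand-in for one simulation timestep
    stage_q.put((step, field.copy()))      # copy so the solver can keep mutating
stage_q.put(None)
t.join()
```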


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

GROPHECY: GPU performance projection from CPU code skeletons

Jiayuan Meng; Vitali A. Morozov; Kalyan Kumaran; Venkatram Vishwanath; Thomas D. Uram

We propose GROPHECY, a GPU performance projection framework that can estimate the performance benefit of GPU acceleration without actual GPU programming or hardware. Users need only skeletonize pieces of CPU code that are targets for GPU acceleration. Code skeletons are automatically transformed in various ways to mimic tuned GPU codes with characteristics resembling real implementations. The synthesized characteristics are used by an existing analytical model to project GPU performance. The cost and benefit of GPU development can then be estimated according to the transformed code skeleton that yields the best projected performance. With GROPHECY, users can leap toward GPU acceleration only when the cost-benefit tradeoff makes sense. The framework is validated using kernel benchmarks and data-parallel codes from legacy scientific applications. The measured performance of manually tuned codes deviates from the projected performance by a geometric mean of 17%.
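
GROPHECY's analytical model and skeleton transformations are far richer than can be shown here, but a roofline-style stand-in conveys the workflow: characterize a skeleton, project its kernel time on assumed hardware, and keep the transformation with the best projection. All classes, fields, and hardware numbers below are assumptions for illustration, not GROPHECY's real schema:

```python
from dataclasses import dataclass

@dataclass
class Skeleton:
    # Characteristics a code skeleton might expose (illustrative fields only).
    flops: float          # floating-point operations per kernel launch
    bytes_moved: float    # bytes read + written from device memory

def project_kernel_time(s: Skeleton,
                        peak_flops: float = 1.0e12,    # assumed GPU peak, FLOP/s
                        peak_bw: float = 1.5e11) -> float:  # assumed bandwidth, B/s
    """Roofline-style stand-in for an analytical GPU model: a kernel is
    bound by whichever of compute or memory traffic takes longer."""
    return max(s.flops / peak_flops, s.bytes_moved / peak_bw)

# Compare candidate transformations of the same skeleton and keep the best
# projection, mirroring GROPHECY's search over transformed skeletons.
candidates = [Skeleton(flops=2e9, bytes_moved=8e8),   # naive memory layout
              Skeleton(flops=2e9, bytes_moved=2e8)]   # coalesced layout
best = min(candidates, key=project_kernel_time)
print(f"best projected kernel time: {project_kernel_time(best) * 1e3:.2f} ms")
```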


International Conference on Cluster Computing | 2004

JuxtaView - a tool for interactive visualization of large imagery on scalable tiled displays

Naveen K. Krishnaprasad; Venkatram Vishwanath; Shalini Venkataraman; Arun G. Rao; Luc Renambot; Jason Leigh; Andrew E. Johnson; Brian Davis

JuxtaView is a cluster-based application for viewing ultra-high-resolution images on scalable tiled displays. We present in JuxtaView a new parallel computing and distributed-memory approach for out-of-core montage visualization using LambdaRAM, a software-based network-level cache system. The ultimate goal of JuxtaView is to enable a user to interactively roam through potentially terabytes of distributed, spatially referenced image data such as those from electron microscopes, satellites, and aerial photographs. Working toward this goal, we describe our first prototype, implemented over a local area network, in which the image is distributed using LambdaRAM across the memory of all nodes of a PC cluster driving a tiled display wall. Aggressive prefetching schemes employed by LambdaRAM help reduce the latency involved in remote memory access. We compare LambdaRAM with a more traditional memory-mapped file approach for out-of-core visualization.
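
LambdaRAM itself pools memory across cluster nodes over high-speed networks; a toy single-process sketch can still illustrate the prefetching idea: fetch the requested tile, then speculatively pull tiles ahead along the pan direction so fetch latency hides behind user interaction. The cache class and in-memory "remote" tile store below are hypothetical:

```python
from collections import OrderedDict

# A dict of "remote" tiles stands in for the network-level tile store.
REMOTE_TILES = {(x, y): f"tile-{x}-{y}" for x in range(64) for y in range(64)}

class TileCache:
    def __init__(self, capacity=128, lookahead=2):
        self.cache = OrderedDict()   # insertion order doubles as LRU order
        self.capacity = capacity
        self.lookahead = lookahead

    def _fetch(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return
        if key in REMOTE_TILES:
            self.cache[key] = REMOTE_TILES[key]  # simulated remote fetch
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used

    def get(self, x, y, dx=1, dy=0):
        """Return tile (x, y) and prefetch tiles ahead along the pan
        direction (dx, dy), hiding latency behind user interaction."""
        self._fetch((x, y))
        for i in range(1, self.lookahead + 1):
            self._fetch((x + i * dx, y + i * dy))
        return self.cache[(x, y)]

cache = TileCache()
for x in range(8):          # user pans to the right; tiles ahead are prefetched
    cache.get(x, 0, dx=1)
print(len(cache.cache), "tiles resident")
```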


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

Topology-aware data movement and staging for I/O acceleration on Blue Gene/P supercomputing systems

Venkatram Vishwanath; Mark Hereld; Vitali A. Morozov; Michael E. Papka

There is growing concern that I/O systems will be hard-pressed to satisfy the requirements of future leadership-class machines. Even current machines are I/O bound for some applications. In this paper, we identify existing performance bottlenecks in data movement for I/O on the IBM Blue Gene/P (BG/P) supercomputer currently deployed at several leadership computing facilities. We improve I/O performance by exploiting the network topology of BG/P for collective I/O, leveraging the data semantics of applications, and incorporating asynchronous data staging. We demonstrate the efficacy of our approaches with synthetic benchmarks and application-level benchmarks at scale on leadership computing systems.
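
The paper's algorithms are specific to Blue Gene/P; the sketch below illustrates only the general flavor of topology-aware aggregation, under assumed torus dimensions and aggregator placement: route each compute node's data to the aggregator reachable in the fewest torus hops.

```python
from itertools import product

DIMS = (8, 8, 8)   # toy 3-D torus shape (an assumption, not BG/P's partition size)

def torus_hops(a, b, dims=DIMS):
    # Hop count between two coordinates on a torus with wraparound links.
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

nodes = list(product(range(DIMS[0]), range(DIMS[1]), range(DIMS[2])))
aggregators = [(1, 1, 1), (1, 1, 6), (6, 6, 1), (6, 6, 6)]  # assumed placement

# Each node ships data to its nearest aggregator, minimizing network traversal.
assignment = {n: min(aggregators, key=lambda a: torus_hops(n, a)) for n in nodes}
avg_hops = sum(torus_hops(n, a) for n, a in assignment.items()) / len(nodes)
print(f"average hops to aggregator: {avg_hops:.2f}")
```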


Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities | 2011

Examples of in transit visualization

Kenneth Moreland; Ron A. Oldfield; Pat Marion; Sébastien Jourdain; Norbert Podhorszki; Venkatram Vishwanath; Nathan D. Fabian; Ciprian Docan; Manish Parashar; Mark Hereld; Michael E. Papka; Scott Klasky

One of the most pressing issues with petascale analysis is the transport of simulation results to meaningful analysis. Traditional workflow prescribes storing the simulation results to disk and later retrieving them for analysis and visualization. At petascale, however, storing the full results is prohibitive. A solution to this problem is to run the analysis and visualization concurrently with the simulation and bypass the storage of the full results. One mechanism for doing so is in transit visualization, in which analysis and visualization run on I/O nodes that receive the full simulation results but write out only information derived from the analysis, or provide run-time visualization. This paper describes work in progress on three in transit visualization solutions, each using a different transport mechanism.
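
As a minimal illustration of the in transit pattern (not any of the three transport mechanisms the paper describes), the sketch below runs analysis in a separate process standing in for an I/O node: it receives the full results over a pipe but emits only a small reduced product, so the full data never touches disk.

```python
import multiprocessing as mp

import numpy as np

def analysis(conn):
    # Stand-in for an I/O node: receive full fields, keep only a tiny summary.
    while True:
        msg = conn.recv()
        if msg is None:                        # simulation finished
            return
        step, field = msg
        print(f"step {step}: mean={field.mean():.4f} max={field.max():.4f}")

if __name__ == "__main__":
    parent, child = mp.Pipe()
    p = mp.Process(target=analysis, args=(child,))
    p.start()
    field = np.zeros((128, 128))
    for step in range(5):
        field += np.random.rand(128, 128)      # stand-in simulation step
        parent.send((step, field))             # ship full results "in transit"
    parent.send(None)
    p.join()
```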


IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

Accelerating I/O Forwarding in IBM Blue Gene/P Systems

Venkatram Vishwanath; Mark Hereld; Kamil Iskra; Dries Kimpe; Vitali A. Morozov; Michael E. Papka; Robert B. Ross; Kazutomo Yoshii

Current leadership-class machines suffer from a significant imbalance between their computational power and their I/O bandwidth. I/O forwarding is a paradigm that attempts to bridge the growing performance and scalability gap between the compute and I/O components of leadership-class machines, and to meet the requirements of data-intensive applications, by shipping I/O calls from compute nodes to dedicated I/O nodes. I/O forwarding is a critical component of the I/O subsystem of the IBM Blue Gene/P supercomputer currently deployed at several leadership computing facilities. In this paper, we evaluate the performance of the existing I/O forwarding mechanisms for BG/P and identify the performance bottlenecks in the current design. We augment I/O forwarding with two approaches: I/O scheduling using a work-queue model and asynchronous data staging. We evaluate the efficacy of our approaches using microbenchmarks and application-level benchmarks on leadership-class systems.
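
A rough sketch of the two augmentations in miniature, with threads and plain file writes standing in for I/O nodes and the parallel filesystem (all names below are illustrative): compute ranks enqueue requests and return immediately (asynchronous staging), while a pool of forwarder workers drains a shared work queue (work-queue scheduling).

```python
import queue
import threading

work_q = queue.Queue()

def forwarder(worker_id):
    # One worker on the "I/O node" pool: services queued I/O requests.
    while True:
        req = work_q.get()
        if req is None:                        # shutdown sentinel
            work_q.task_done()
            return
        rank, payload = req
        with open(f"rank{rank}.out", "a") as f:  # stand-in for filesystem I/O
            f.write(payload)
        work_q.task_done()

workers = [threading.Thread(target=forwarder, args=(i,)) for i in range(4)]
for w in workers:
    w.start()

for rank in range(16):                         # compute ranks issue writes
    work_q.put((rank, f"data from rank {rank}\n"))  # enqueue, return immediately

for _ in workers:                              # one sentinel per worker
    work_q.put(None)
work_q.join()
```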


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

Measuring Power Consumption on IBM Blue Gene/Q

Sean Wallace; Venkatram Vishwanath; Susan Coghlan; Zhiling Lan; Michael E. Papka

In addition to pushing what is possible computationally, state-of-the-art supercomputers are also pushing what is acceptable in terms of power consumption. Despite hardware manufacturers researching and developing efficient system components (e.g., processor, memory, etc.), the power consumption of a complete system remains an understudied research area. Because of the complexity and unpredictable workloads of these systems, estimating the power consumption of a full system is a nontrivial task. In this paper, we provide system-level power usage and temperature analysis from early access to Argonne's latest generation of IBM Blue Gene supercomputers, the Mira Blue Gene/Q system. The analysis is provided from the point of view of jobs running on the system. We describe the important implications these system-level measurements have, as well as the challenges they present. Using profiling code on benchmarks, we also examine the new tools this latest generation of supercomputer provides and gauge their usefulness and how well they match up against the environmental data.


International Conference on Cluster Computing | 2012

Evaluating Power-Monitoring Capabilities on IBM Blue Gene/P and Blue Gene/Q

Kazutomo Yoshii; Kamil Iskra; Rinku Gupta; Peter H. Beckman; Venkatram Vishwanath; Chenjie Yu; Susan Coghlan

Power consumption is becoming a critical factor as we continue our quest toward exascale computing. Yet the actual power utilization of a complete system is an insufficiently studied research area. Estimating the power consumption of a large-scale system is a nontrivial task because a large number of components are involved and because power requirements are affected by (unpredictable) workloads. Clearly needed is a power-monitoring infrastructure that can provide timely and accurate feedback to system developers and application writers so that they can optimize the use of this precious resource. Many existing large-scale installations do feature power-monitoring sensors; however, those sensors are part of environmental- and health-monitoring subsystems and were not designed with application-level power measurement in mind. In this paper, we evaluate the existing power monitoring of IBM Blue Gene systems, with the goal of understanding what capabilities are available and how they fare with respect to spatial and temporal resolution, accuracy, latency, and other characteristics. We find that with a careful choice of dedicated microbenchmarks, we can obtain meaningful power consumption data even on Blue Gene/P, where the interval between available data points is measured in minutes. We next evaluate the monitoring subsystem on Blue Gene/Q and are able to study the power characteristics of its FPU and memory subsystems. We find the monitoring subsystem capable of providing second-scale resolution of power data, conveniently separated by node component, with a latency of seven seconds. This represents a significant improvement in power-monitoring infrastructure, and we hope future systems will enable real-time power measurement in order to better understand application behavior at a finer granularity.
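
A synthetic example (not Blue Gene data) shows why the sampling interval matters in the way the paper describes: a short power spike that second-scale monitoring resolves is smeared out, or missed entirely, at minute-scale sampling.

```python
import numpy as np

# Synthetic 10-minute power trace at 1 Hz: 200 W baseline plus a 30 s spike
# from an FPU-heavy kernel. All numbers are illustrative.
power = np.full(600, 200.0)
power[100:130] = 400.0

def estimate_energy(period_s):
    # Sample every `period_s` seconds, assume constant power between samples.
    return power[::period_s].mean() * len(power)   # joules (1 s timesteps)

print(f"true energy : {power.sum():9.0f} J")
print(f"7 s samples : {estimate_energy(7):9.0f} J")   # ~BG/Q-scale resolution
print(f"60 s samples: {estimate_energy(60):9.0f} J")  # coarse, minute-scale data
```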


IEEE VGTC Conference on Visualization | 2016

In situ methods, infrastructures, and applications on high performance computing platforms

Andrew C. Bauer; Hasan Abbasi; James P. Ahrens; Hank Childs; Berk Geveci; Scott Klasky; Kenneth Moreland; Patrick O'Leary; Venkatram Vishwanath; Brad Whitlock; E.W. Bethel

The considerable interest in the high performance computing (HPC) community in analyzing and visualizing data without first writing it to disk, i.e., in situ processing, is due to several factors. First is the I/O cost savings: data is analyzed and visualized while being generated, without first being stored to a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose complex behavior missed by coarse temporal sampling. Third is the ability to use all available resources, CPUs and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers, and practitioners using in situ methods in extreme-scale HPC with the goal of presenting existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.


Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems | 2010

Multi-application inter-tile synchronization on ultra-high-resolution display walls

Sungwon Nam; Sachin Deshpande; Venkatram Vishwanath; Byungil Jeong; Luc Renambot; Jason Leigh

Ultra-high-resolution tiled-display walls are typically driven by a cluster of computers, each of which may drive one or more displays. Synchronization between the computers is necessary to ensure that animated imagery displayed on the wall appears seamless. Most tiled-display middleware systems are designed around the assumption that only a single application instance is running on the tiled display at a time, so synchronization can be achieved with a simple solution such as a networked barrier. When a tiled display has to support multiple applications at the same time, however, the simple networked-barrier approach does not scale. In this paper we propose and experimentally validate two synchronization algorithms to achieve low-latency inter-tile synchronization for multiple applications with independently varying frame rates. The two-phase algorithm is more generally applicable to various high-resolution tiled-display systems. The one-phase algorithm provides superior results but requires support for the Network Time Protocol and is more CPU-intensive.
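
A minimal sketch of the two-phase idea, with threads standing in for networked display nodes (the paper's actual protocol runs over the network and additionally handles multiple applications with independent frame rates): phase one waits until every tile has finished rendering; phase two releases all tiles to swap buffers at the same moment.

```python
import threading

NUM_TILES = 4
ready = threading.Barrier(NUM_TILES)  # phase 1: every tile finished rendering
swap = threading.Barrier(NUM_TILES)   # phase 2: all tiles swap together

def tile(tile_id):
    for frame in range(3):
        # ... render the frame into the back buffer ...
        ready.wait()   # phase 1: announce "frame ready"
        # In the real system, the barrier release is the master's "go" message.
        swap.wait()    # phase 2: everyone swaps simultaneously
        print(f"tile {tile_id} displayed frame {frame}")

threads = [threading.Thread(target=tile, args=(i,)) for i in range(NUM_TILES)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```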

Collaboration


Dive into Venkatram Vishwanath's collaborations.

Top Co-Authors

Mark Hereld (Argonne National Laboratory)
Joseph A. Insley (Argonne National Laboratory)
Luc Renambot (Electronic Visualization Laboratory)
Vitali A. Morozov (Argonne National Laboratory)
Andrew E. Johnson (University of Illinois at Chicago)
Jiayuan Meng (Argonne National Laboratory)
Kalyan Kumaran (Argonne National Laboratory)
Preeti Malakar (Argonne National Laboratory)