Publication


Featured research published by Seetharami R. Seelam.


Virtual Execution Environments | 2007

Virtual I/O scheduler: a scheduler of schedulers for performance virtualization

Seetharami R. Seelam; Patricia J. Teller

Virtualized storage systems must service concurrently executing workloads, with potentially diverse data delivery requirements, running under multiple operating systems. Although a number of algorithms have been developed for I/O performance virtualization among operating system (OS) instances and their applications, none achieves absolute performance virtualization. By absolute performance virtualization we mean that the performance experienced by applications of one operating system does not suffer due to variations in the I/O request stream characteristics of applications of other operating systems. Key requirements of I/O performance virtualization are fairness and performance isolation. In this paper, we present a novel virtual I/O scheduler (VIOS) that provides absolute performance virtualization by sharing I/O system resources fairly among operating systems and their applications and by providing performance isolation in the face of variations in the characteristics of I/O streams. The VIOS controls the coarse-grain allocation of disk time to the different operating system instances and is OS independent; optionally, a set of OS-dependent schedulers may determine the fine-grain interleaving of requests from the corresponding operating systems to the storage system.
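
To make the coarse-grain mechanism concrete, here is a minimal sketch of the scheduler-of-schedulers idea: each OS instance receives a disk-time quantum proportional to its configured share, so a bursty instance cannot starve a quiet one. This is an illustration of the concept only, not the authors' VIOS implementation; all names and the quantum policy are hypothetical.

```python
from collections import deque

class VirtualIOScheduler:
    """Toy scheduler-of-schedulers: each OS instance gets disk time in
    proportion to its share, regardless of how bursty other instances are."""

    def __init__(self, shares):
        self.shares = shares                      # {os_id: weight}
        self.queues = {os_id: deque() for os_id in shares}

    def submit(self, os_id, request_id, service_time):
        self.queues[os_id].append((request_id, service_time))

    def dispatch_round(self, base_quantum=10.0):
        """Drain up to one weighted quantum of disk time per OS instance;
        fine-grain ordering inside a queue is left to an OS-level scheduler."""
        schedule = []
        for os_id, weight in self.shares.items():
            budget = base_quantum * weight
            queue = self.queues[os_id]
            while queue and queue[0][1] <= budget:
                request_id, cost = queue.popleft()
                budget -= cost
                schedule.append((os_id, request_id))
        return schedule

vios = VirtualIOScheduler({"osA": 1.0, "osB": 1.0})
for i in range(5):
    vios.submit("osA", f"a{i}", service_time=4.0)   # bursty stream
vios.submit("osB", "b0", service_time=4.0)          # light stream
print(vios.dispatch_round())   # osA is capped at its quantum; osB still served
```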


International Parallel and Distributed Processing Symposium | 2010

Extreme scale computing: Modeling the impact of system noise in multicore clustered systems

Seetharami R. Seelam; Liana L. Fong; Asser N. Tantawi; John Lewars; John Divirgilio; Kevin J. Gildea

System noise, or jitter, is the activity of hardware, firmware, operating system, runtime system, and management software events. It has been shown to disproportionately impact application performance on current-generation large-scale clustered systems running general-purpose operating systems (GPOS). Jitter mitigation techniques, such as co-scheduling jitter events across operating systems, improve application performance, but their effectiveness on future petascale systems is unknown. To understand whether existing co-scheduling solutions enable scalable petascale performance, we construct two complementary jitter models based on detailed analysis of system noise from the nodes of a large-scale system running a GPOS. We validate these two models using experimental data from a system consisting of 128 GPOS instances with 4096 CPUs. Based on our models, we project a minimum slowdown of 2.1%, 5.9%, and 11.5% for applications executing on a similar one-petaflop system running 1024 GPOS instances and performing global synchronization operations once every 1000 msec, 100 msec, and 10 msec, respectively. Our projections indicate that additional system noise mitigation techniques are required to contain the impact of jitter on multi-petaflop systems, especially for tightly synchronized applications.
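
As a rough illustration of why the projected slowdown grows as synchronization intervals shrink, the toy model below assumes a single jitter event type with an illustrative rate and cost, and charges every global synchronization with the finish time of the slowest process. It is not either of the paper's two validated models; it only reproduces the qualitative trend.

```python
import random

def simulated_slowdown(num_procs, sync_interval_ms, noise_rate_per_ms=1e-5,
                       noise_ms=2.0, num_intervals=500, seed=0):
    """Toy bulk-synchronous noise model: jitter hits each process
    independently, and at each global synchronization every process
    waits for the slowest one, so one late process delays them all."""
    rng = random.Random(seed)
    p_hit = 1.0 - (1.0 - noise_rate_per_ms) ** sync_interval_ms
    total = 0.0
    for _ in range(num_intervals):
        delayed = any(rng.random() < p_hit for _ in range(num_procs))
        total += sync_interval_ms + (noise_ms if delayed else 0.0)
    ideal = sync_interval_ms * num_intervals
    return (total / ideal - 1.0) * 100.0

# More frequent synchronization amplifies the same per-node noise.
for interval_ms in (1000.0, 100.0, 10.0):
    pct = simulated_slowdown(num_procs=1024, sync_interval_ms=interval_ms)
    print(f"sync every {interval_ms:6.0f} ms -> ~{pct:.1f}% slowdown")
```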


International Parallel and Distributed Processing Symposium | 2008

Early experiences in application level I/O tracing on Blue Gene systems

Seetharami R. Seelam; I-Hsin Chung; Ding-Yong Hong; Hui-Fang Wen; Hao Yu

On today's massively parallel processing (MPP) supercomputers, it is increasingly important to understand the I/O performance of an application, both to guide scalable application development and to tune its performance. These two critical steps are often enabled by performance analysis tools that obtain performance data on thousands of processors in an MPP system. To this end, we present the design, implementation, and early experiences of an application-level I/O tracing library and the corresponding tool for analyzing and optimizing I/O performance on Blue Gene (BG) MPP systems. This effort was part of the IBM UPC Toolkit for BG systems. To our knowledge, this is the first comprehensive application-level I/O monitoring, playback, and optimization tool available on BG systems. Preliminary experiments on the popular NPB BTIO benchmark show that the tool is very useful for facilitating detailed I/O performance analysis.
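
The interposition idea behind application-level I/O tracing is easy to sketch. The example below wraps Python's built-in `open` to record each operation with a timestamp, byte count, and duration; it is purely illustrative (the paper's library traces the I/O calls of compiled MPP applications), and every name in it is hypothetical.

```python
import builtins
import time

io_trace = []                    # (timestamp, op, path, detail, duration)
_real_open = builtins.open

class TracedFile:
    """Thin wrapper recording the read/write calls an application makes."""
    def __init__(self, f, path):
        self._f, self._path = f, path
    def read(self, *args):
        t0 = time.perf_counter()
        data = self._f.read(*args)
        io_trace.append((t0, "read", self._path, len(data),
                         time.perf_counter() - t0))
        return data
    def write(self, data):
        t0 = time.perf_counter()
        n = self._f.write(data)
        io_trace.append((t0, "write", self._path, n,
                         time.perf_counter() - t0))
        return n
    def __getattr__(self, name):            # everything else passes through
        return getattr(self._f, name)
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self._f.close()

def traced_open(path, mode="r", **kwargs):
    io_trace.append((time.perf_counter(), "open", path, mode, 0.0))
    return TracedFile(_real_open(path, mode, **kwargs), path)

builtins.open = traced_open                 # interpose on the application

with open("demo.txt", "w") as f:            # application code is unmodified
    f.write("hello\n")
for record in io_trace:
    print(record)
```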


International Parallel and Distributed Processing Symposium | 2008

A framework for automated performance bottleneck detection

I-Hsin Chung; Guojing Cong; David J. Klepacki; Simone Sbaraglia; Seetharami R. Seelam; Hui-Fang Wen

In this paper, we present the architecture design and implementation of a framework for automated performance bottleneck detection. The framework analyzes the distribution of time spent in the application and discovers performance bottlenecks by using given bottleneck definitions. The user can query the application execution performance to identify performance problems. The design of the framework is flexible and extensible, so it can be tailored to the actual application execution environment and performance tuning requirements. To demonstrate the usefulness of the framework, we apply it to a practical DARPA application and show how it helps to identify performance bottlenecks. The framework helps to automate the performance tuning process and improves the user's productivity.
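
One plausible reading of "given bottleneck definitions" is a set of named predicates evaluated over the measured time-spent distribution. The sketch below takes that reading; the rule names and thresholds are invented for illustration and are not taken from the paper.

```python
# Each bottleneck definition is a named predicate over the time-spent
# distribution (fraction of total run time per category).
BOTTLENECK_RULES = {
    "communication-bound": lambda m: m.get("mpi", 0.0) > 0.30,
    "I/O-bound":           lambda m: m.get("io", 0.0) > 0.25,
    "memory-stalled":      lambda m: m.get("memory_stall", 0.0) > 0.40,
    "load-imbalanced":     lambda m: m.get("barrier_wait", 0.0) > 0.15,
}

def detect_bottlenecks(metrics):
    """Return the names of all bottleneck definitions the profile matches."""
    return [name for name, rule in BOTTLENECK_RULES.items() if rule(metrics)]

# Hypothetical time-spent distribution from one profiling run.
profile = {"compute": 0.45, "mpi": 0.35, "io": 0.05, "barrier_wait": 0.15}
print(detect_bottlenecks(profile))          # ['communication-bound']
```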


Quantitative Evaluation of Systems | 2007

A Productivity Centered Tools Framework for Application Performance Tuning

Hui-Fang Wen; Simone Sbaraglia; Seetharami R. Seelam; I-Hsin Chung; Guojing Cong; David J. Klepacki

Our productivity-centered performance tuning framework for HPC applications comprises three main components: (1) a versatile graphical user interface for visualizing and analyzing source code, performance metrics, and performance data; (2) a unique source code and binary instrumentation engine; and (3) an array of data collection facilities that gather performance data across various dimensions, including CPU, message passing, threads, memory, and I/O. We believe that the ability to decipher performance impacts at the source level, and the ability to probe the application with different tools at the same time at varying granularities while hiding the complications of binary instrumentation, leads to higher productivity for scientists in understanding and tuning the performance of the associated computing systems and applications.
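
As a sketch of how component (3), the array of data collection facilities, might be organized, the registry pattern below is one plausible structure; it is not the framework's actual API, and every name and number in it is hypothetical.

```python
from typing import Callable, Dict

collectors: Dict[str, Callable[[], dict]] = {}

def collector(dimension: str):
    """Register a data collection facility under a performance dimension."""
    def register(fn):
        collectors[dimension] = fn
        return fn
    return register

@collector("cpu")
def cpu_sample() -> dict:
    return {"cycles": 1_200_000, "instructions": 900_000}   # stand-in data

@collector("io")
def io_sample() -> dict:
    return {"bytes_read": 4096, "bytes_written": 1024}      # stand-in data

def gather_all() -> dict:
    """Take one snapshot across every registered dimension."""
    return {dim: fn() for dim, fn in collectors.items()}

print(gather_all())
```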


IEEE Conference on Mass Storage Systems and Technologies | 2012

vPFS: Bandwidth virtualization of parallel storage systems

Yiqi Xu; Dulcardo Arteaga; Ming Zhao; Yonggang Liu; Renato J. O. Figueiredo; Seetharami R. Seelam

Existing parallel file systems are unable to differentiate I/O requests from concurrent applications and meet per-application bandwidth requirements. This limitation prevents applications from achieving their desired Quality of Service (QoS) as high-performance computing (HPC) systems continue to scale up. This paper presents vPFS, a new solution that addresses this challenge through a bandwidth virtualization layer for parallel file systems. vPFS employs user-level parallel file system proxies to interpose requests between native clients and servers and to schedule parallel I/Os from different applications based on configurable bandwidth management policies. vPFS is designed to be generic enough to support various scheduling algorithms and parallel file systems. Its utility and performance are studied with a prototype that virtualizes PVFS2, a widely used parallel file system. Enhanced proportional-sharing schedulers are enabled based on the unique characteristics (parallel striped I/Os) and requirements (high throughput) of parallel storage systems. The enhancements include new threshold- and layout-driven scheduling synchronization schemes, which reduce global communication overhead while delivering total-service fairness. An experimental evaluation using typical HPC benchmarks (IOR, NPB BTIO) shows that the throughput overhead of vPFS is small (<3% for writes, <1% for reads). It also shows that vPFS achieves good proportional bandwidth sharing (>96% of the target sharing ratio) for competing applications with diverse I/O patterns.
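
vPFS's threshold- and layout-driven schedulers are specific to the paper, but the proportional-sharing baseline they enhance can be sketched with textbook start-time fair queuing. The simplified version below assumes continuously backlogged applications and omits global virtual-time resynchronization; it is not vPFS's implementation.

```python
import heapq

class ProportionalShareScheduler:
    """Simplified start-time fair queuing: applications receive bandwidth
    in proportion to their weights via per-request virtual-time tags."""

    def __init__(self, weights):
        self.weights = weights                    # {app: weight}
        self.vtime = {app: 0.0 for app in weights}
        self.heap = []                            # (start_tag, seq, app, req)
        self.seq = 0

    def submit(self, app, request, cost=1.0):
        start = self.vtime[app]
        self.vtime[app] = start + cost / self.weights[app]
        heapq.heappush(self.heap, (start, self.seq, app, request))
        self.seq += 1

    def next_request(self):
        """Dispatch the pending request with the smallest start tag."""
        _, _, app, request = heapq.heappop(self.heap)
        return app, request

sched = ProportionalShareScheduler({"appA": 2.0, "appB": 1.0})
for i in range(4):
    sched.submit("appA", f"A{i}")
    sched.submit("appB", f"B{i}")
order = [sched.next_request() for _ in range(8)]
print([f"{app}:{req}" for app, req in order])
# appA is dispatched about twice as often early on, matching its 2:1 share
```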


International Parallel and Distributed Processing Symposium | 2010

Masking I/O latency using application level I/O caching and prefetching on Blue Gene systems

Seetharami R. Seelam; I-Hsin Chung; John H. Bauer; Hui-Fang Wen

We present an application-level I/O caching and prefetching system with asynchronous write-back to hide the access latency experienced by HPC applications. Our user-controllable caching and prefetching system maintains a file-I/O cache in the user space of the application, analyzes the I/O access patterns, prefetches requests, and performs write-back of dirty data to storage asynchronously. As a result, the application does not pay the full I/O latency penalty of going to storage for the required data on every access. We have implemented this caching and asynchronous access system on the Blue Gene (BG/L and BG/P) systems. We present experimental results with the NAS BT, MADbench, and WRF benchmarks. The results on the BG/P system demonstrate that our method hides access latency, improves application I/O access time by as much as 100%, and improves WRF execution time by over 10%.
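
The prefetching half of such a system can be sketched with a toy block cache that recognizes a sequential scan and fetches ahead of it. Block granularity, the run detector, and all names below are illustrative assumptions, not the paper's design.

```python
class PrefetchingCache:
    """Toy user-space file cache: detects sequential reads and prefetches
    the next blocks so subsequent reads hit in memory."""

    def __init__(self, backing_store, prefetch_depth=2):
        self.store = backing_store               # block_id -> data
        self.cache = {}
        self.last_block = None
        self.prefetch_depth = prefetch_depth
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[block_id] = self.store[block_id]    # demand fetch
        if self.last_block is not None and block_id == self.last_block + 1:
            # sequential pattern detected: fetch ahead of the scan
            for b in range(block_id + 1, block_id + 1 + self.prefetch_depth):
                if b in self.store and b not in self.cache:
                    self.cache[b] = self.store[b]
        self.last_block = block_id
        return self.cache[block_id]

disk = {b: f"data{b}" for b in range(16)}      # stand-in for slow storage
cache = PrefetchingCache(disk)
for b in range(8):                             # a sequential scan
    cache.read(b)
print(f"hits={cache.hits} misses={cache.misses}")   # hits=6 misses=2
```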


International Parallel and Distributed Processing Symposium | 2009

Application level I/O caching on Blue Gene/P systems

Seetharami R. Seelam; I-Hsin Chung; John H. Bauer; Hao Yu; Hui-Fang Wen

In this paper, we present an aggressive application-level I/O caching and prefetching system to hide the I/O access latency experienced by out-of-core applications. Without application-level prefetching and caching capabilities, users of I/O-intensive applications need to rewrite them with asynchronous I/O calls or restructure their code with MPI-IO calls to use large-scale system resources efficiently. Our proposed user-controllable aggressive caching and prefetching system maintains a file-I/O cache in the user space of the application, analyzes the I/O access patterns, prefetches requests, and performs write-back of dirty data to storage asynchronously. As a result, the application does not pay the full I/O latency penalty of going to storage for the required data on every access. We have implemented this aggressive caching and asynchronous prefetching on the Blue Gene/P (BGP) system. A preliminary experiment evaluates the caching performance using the WRF benchmark. The results on the BGP system demonstrate that our method improves application I/O throughput.
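
The asynchronous write-back path can likewise be illustrated with a toy cache that absorbs writes in memory and flushes dirty blocks from a background thread. The threading design and the in-memory stand-in for storage below are assumptions for illustration, not the BGP implementation.

```python
import queue
import threading
import time

class AsyncWriteBackCache:
    """Toy write-back cache: writes complete at memory speed, and a
    background thread flushes dirty blocks to 'storage' asynchronously."""

    def __init__(self):
        self.cache = {}
        self.storage = {}                        # stand-in for the file system
        self.dirty = queue.Queue()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def write(self, block_id, data):
        self.cache[block_id] = data              # fast path: memory write
        self.dirty.put(block_id)                 # flushed off the critical path

    def _flush_loop(self):
        while True:
            block_id = self.dirty.get()
            time.sleep(0.01)                     # simulated storage latency
            self.storage[block_id] = self.cache[block_id]
            self.dirty.task_done()

    def sync(self):
        self.dirty.join()                        # wait for all dirty blocks

cache = AsyncWriteBackCache()
t0 = time.perf_counter()
for b in range(20):
    cache.write(b, f"data{b}")                   # returns immediately
print(f"20 writes issued in {time.perf_counter() - t0:.4f} s")
cache.sync()
print(f"{len(cache.storage)} blocks durable after sync")
```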


Job Scheduling Strategies for Parallel Processing | 2012

Partitioned Parallel Job Scheduling for Extreme Scale Computing

David P. Brelsford; George Chochia; Nathan Falk; Kailash N. Marthi; Ravindra R. Sure; Norman Bobroff; Liana Fong; Seetharami R. Seelam

Recent success in building extreme computing systems poses new challenges in job scheduling design to support cluster sizes that can execute millions of concurrent tasks. We show that for these extreme-scale clusters the resource demand at a centralized scheduler can exceed its capacity or limit its ability to perform well. This paper introduces partitioned scheduling, a hybrid centralized and distributed approach in which compute nodes are assigned to the job centrally, while task-to-local-resource assignments are performed subsequently at the assigned job nodes. This reduces the memory and processing growth at the central scheduler and improves the scaling behavior of scheduling time by enabling operations to be done in parallel at the job nodes. When local resource assignments must be distributed to all other job nodes, the partitioned approach trades central processing for increased network communication. Thus, we introduce features that improve communication, such as pipelining, which leverage the presence of the high-speed cluster network. The new system is evaluated for jobs with up to 50K tasks on clusters with 496 nodes and 128 tasks per node. The partitioned scheduling approach is demonstrated to reduce processor and memory usage at the central scheduler and to improve job scheduling and job dispatching times by up to an order of magnitude.
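
The division of labor (central node selection, then per-node task placement done in parallel) can be sketched as below, using threads as stand-ins for the job's nodes and round-robin task slicing; everything beyond that split is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def central_assign_nodes(tasks, nodes, tasks_per_node):
    """Central step: pick just enough nodes for the job; the central
    scheduler never decides per-task placement."""
    needed = -(-len(tasks) // tasks_per_node)    # ceiling division
    if needed > len(nodes):
        raise ValueError("not enough nodes for this job")
    return nodes[:needed]

def local_assign_tasks(node, tasks):
    """Distributed step, run on each assigned node: bind this node's
    slice of tasks to local resources (here, just slot indices)."""
    return [(node, slot, task) for slot, task in enumerate(tasks)]

def partitioned_schedule(tasks, nodes, tasks_per_node=4):
    chosen = central_assign_nodes(tasks, nodes, tasks_per_node)
    slices = [tasks[i::len(chosen)] for i in range(len(chosen))]
    # local placement happens in parallel, as it would on the job's nodes
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        results = pool.map(lambda ns: local_assign_tasks(*ns),
                           zip(chosen, slices))
    return [placement for node_result in results for placement in node_result]

job = [f"task{i}" for i in range(10)]
cluster = [f"node{i}" for i in range(8)]
for placement in partitioned_schedule(job, cluster):
    print(placement)
```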


European Conference on Parallel Processing | 2004

Profiling and Tracing OpenMP Applications with POMP Based Monitoring Libraries

Luiz DeRose; Bernd Mohr; Seetharami R. Seelam

In this paper we present a collection of tools based on the POMP performance monitoring interface for the analysis of OpenMP applications. These POMP-compliant libraries, POMProf and the KOJAK POMP library, provide profiling and tracing of OpenMP applications, respectively. In addition, we describe a new approach to computing the temporal overhead due to scheduling (load imbalance), synchronization (barrier time), and the runtime system. Finally, we exemplify the use of these libraries with performance measurement and visualization of the ASCI sPPM benchmark code. Our examples show that the information provided by both tools is consistent and provides data that helps users understand the sources of performance problems.
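
The load-imbalance part of that overhead computation reduces to a small worked example: in a work-sharing region, every thread waits at the closing barrier for the slowest one, so the wasted time is each thread's gap to the maximum. The per-thread times and function name below are hypothetical.

```python
def load_imbalance_overhead(per_thread_times):
    """Barrier-wait overhead: each thread waits (slowest - own time) at the
    closing barrier of a work-sharing region."""
    slowest = max(per_thread_times)
    waits = [slowest - t for t in per_thread_times]
    total_cpu = slowest * len(per_thread_times)
    return sum(waits), 100.0 * sum(waits) / total_cpu

# Measured times (seconds) for one parallel region on four threads.
region_times = [9.8, 10.0, 7.5, 8.1]
wasted, pct = load_imbalance_overhead(region_times)
print(f"barrier wait: {wasted:.1f} s total ({pct:.1f}% of region CPU time)")
```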
