Sekhar R. Sarukkai
Ames Research Center
Publications
Featured research published by Sekhar R. Sarukkai.
Parallel Computing | 1996
Jerry C. Yan; Sekhar R. Sarukkai
In this paper we describe how a performance-tuning tool set, AIMS, guides the user toward developing efficient and scalable production-level parallel programs by locating performance-improvement opportunities and determining optimization benefits. AIMS's Xisk helps identify potential optimizations by computing various pre-defined normalized performance indices from program traces. Inspection of these indices points to specific optimizations that may benefit program performance. After identifying and characterizing performance problems, AIMS's MK can provide quantitative estimates of performance benefits, helping the user avoid arduous optimizations that may not lead to the expected improvements. MK also helps identify potential pitfalls or benefits of changing various system parameters. Based on MK's performance projections, the user can make an informed decision about the most beneficial program optimizations or upgrades to the execution environment.
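As a rough illustration of the index idea, the sketch below derives a normalized communication index per procedure from trace records. The record layout and the index definition are simplifications invented for this example, not AIMS's actual trace format or formulas.

```python
# Hypothetical trace records: (procedure, processor, state, seconds).
# The "communication index" below (time spent communicating divided by
# total time in the procedure) is an illustrative definition only.
from collections import defaultdict

trace = [
    ("solve", 0, "compute", 4.0), ("solve", 0, "comm", 1.0),
    ("solve", 1, "compute", 2.5), ("solve", 1, "comm", 2.5),
    ("setup", 0, "compute", 0.5), ("setup", 1, "compute", 0.4),
]

busy = defaultdict(float)   # (procedure, state) -> accumulated time
total = defaultdict(float)  # procedure -> accumulated time
for proc, cpu, state, dt in trace:
    busy[(proc, state)] += dt
    total[proc] += dt

for proc in sorted(total):
    index = busy[(proc, "comm")] / total[proc]  # 0.0 = no comm overhead
    print(f"{proc}: communication index = {index:.2f}")
```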
Journal of Parallel and Distributed Computing | 1996
Rob F. Van der Wijngaart; Sekhar R. Sarukkai; Pankaj Mehra
Observations show that fine-grain software pipelines on MIMD parallel computers with asynchronous communication suffer from dynamic load imbalances, which cause delays in addition to the expected pipeline fill time. An analytical model that explains these load imbalances is presented. An optimization derived from the analysis leads to significant improvements in program performance. The results of applying this optimization to general pipeline algorithms on the Intel iPSC/860, Intel Paragon, and IBM SP/2, as well as to pipelined tri-diagonal equation solvers on the Intel Paragon and the IBM SP/2, are presented.
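For orientation, a first-order decomposition of pipeline time (my own simplification, assuming per-stage compute time t_c and per-message cost t_m; the paper's analytical model of the imbalance term is more detailed):

```latex
% Illustrative decomposition, not the paper's exact model:
T_{\mathrm{ideal}}(n,p) = \underbrace{(p-1)(t_c + t_m)}_{\text{pipeline fill}} + n\,t_c ,
\qquad
T_{\mathrm{obs}}(n,p) = T_{\mathrm{ideal}}(n,p) + D(n,p)
```

where D(n,p) >= 0 is the extra delay from dynamic load imbalance that the paper models and then removes through optimization.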
Conference on High Performance Computing (Supercomputing) | 1995
Robert J. Block; Sekhar R. Sarukkai; Pankaj Mehra
The increasing use of massively parallel supercomputers to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. Much work has been done to create simple models that represent important characteristics of parallel programs, such as latency, network contention, and communication volume. But many of these methods still require substantial manual effort to represent an application in the model's format. The MK toolkit described in this paper is the result of an ongoing effort to automate the formation of analytic expressions of program execution time, with a minimum of programmer assistance. In this paper we demonstrate the feasibility of our approach by extending previous work to detect and model communication patterns automatically, with and without overlapped computations. The predictions derived from these models agree, within reasonable limits, with execution times of programs measured on the Intel iPSC/860 and Paragon. Further, we demonstrate the use of MK in selecting optimal computational grain size and studying various scalability metrics.
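A minimal sketch of the end product of such automation: fitting coefficients of an assumed analytic form T(n,p) to measured timings by least squares. The term structure and the timings here are invented for illustration; MK derives its terms automatically from detected communication patterns.

```python
import numpy as np

# Hypothetical measurements: (n, p, seconds).
runs = [(1024, 2, 9.8), (1024, 4, 5.3), (2048, 4, 10.1),
        (2048, 8, 5.9), (4096, 8, 11.2), (4096, 16, 6.6)]

# Assumed model: T(n, p) ~ a + b*(n/p) + c*log2(p).
A = np.array([[1.0, n / p, np.log2(p)] for n, p, _ in runs])
t = np.array([s for _, _, s in runs])
(a, b, c), *_ = np.linalg.lstsq(A, t, rcond=None)
print(f"T(n,p) ~ {a:.3f} + {b:.5f}*(n/p) + {c:.3f}*log2(p)")

# Extrapolate to an unmeasured configuration.
n, p = 8192, 32
print("predicted:", a + b * n / p + c * np.log2(p))
```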
International Conference on Supercomputing | 1994
Sekhar R. Sarukkai; Jerry Yan; Jacob K. Gotwals
Existing tools for locating performance bottlenecks of message-passing parallel programs provide only visualizations or profiles of program executions; they do not highlight the cause of poor program performance. From the perspective of the application, the location and cause of performance problems in terms of procedures, processors, and data structures are all important. Identifying the cause of poor performance requires exposing how well the underlying algorithm has been mapped onto the parallel machine. In this paper, we present a suite of normalized performance indices that provide a convenient mechanism for focusing on a location with poor performance. These indices are complemented by additional indices that highlight the cause of the performance failure in terms of processor, procedure, and data-structure interactions. With the help of examples from the NAS benchmark suite, we show that the automatically generated indices help detect potential causes of poor performance. These indices are generated from execution traces (augmented with data-structure information) obtained by monitoring program executions on the Intel iPSC/860.
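To make the data-structure dimension concrete, the sketch below aggregates message traffic by (procedure, data structure), the kind of breakdown such indices enable. The event layout and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical message events annotated with the data structure moved:
# (procedure, src_processor, dst_processor, structure, bytes).
events = [
    ("exchange_halo", 0, 1, "grid", 8192),
    ("exchange_halo", 1, 0, "grid", 8192),
    ("reduce_norm",   1, 0, "residual", 8),
    ("exchange_halo", 0, 1, "grid", 8192),
]

volume = defaultdict(int)  # (procedure, structure) -> bytes moved
for proc, src, dst, struct, nbytes in events:
    volume[(proc, struct)] += nbytes

grand = sum(volume.values())
for (proc, struct), v in sorted(volume.items(), key=lambda kv: -kv[1]):
    print(f"{proc:14s} {struct:9s} {v:6d} B  ({v / grand:6.1%} of traffic)")
```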
International Conference on Supercomputing | 1996
Rob F. Van der Wijngaart; Sekhar R. Sarukkai; Pankaj Mehra
Pipelining is a common strategy for extracting parallelism from a collection of independent computational tasks, each of which is spread among a number of processors and has an implied data dependence. When implemented on MIMD parallel computers with finite process-interrupt times, pipeline algorithms suffer from slowdown, in addition to the expected pipeline fill time, due to a wave-like propagation of delays. This phenomenon, which has been observed experimentally using the performance monitoring system AIMS, is investigated analytically, and an optimal correction is derived to eliminate the wave. The efficiency gain from the correction is verified experimentally.
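A toy recurrence can reproduce the qualitative effect: charging interior processors a per-message interrupt cost makes the delay beyond fill time grow with the number of pipeline steps. This is my own crude stand-in for the paper's model; the optimal correction itself is not reproduced here.

```python
# finish[i][k] = time at which processor i completes pipeline step k.
# t_c: per-stage compute time, t_m: message cost, t_int: interrupt cost
# charged to interior processors per received message (all hypothetical).
def last_finish(p, n, t_c=1.0, t_m=0.1, t_int=0.0):
    finish = [[0.0] * (n + 1) for _ in range(p)]
    for k in range(1, n + 1):
        for i in range(p):
            ready = finish[i - 1][k] + t_m if i else 0.0
            cost = t_c + (t_int if i else 0.0)
            finish[i][k] = max(finish[i][k - 1], ready) + cost
    return finish[p - 1][n]

p, n = 8, 100
ideal = (p - 1) * (1.0 + 0.1) + n * 1.0   # fill time + steady-state steps
for t_int in (0.0, 0.05):
    extra = last_finish(p, n, t_int=t_int) - ideal
    print(f"t_int={t_int}: delay beyond fill time = {extra:.2f}")
```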
Modeling, Analysis, and Simulation on Computer and Telecommunication Systems | 1994
Sekhar R. Sarukkai
Tools that study the scalability of parallel programs, as the number of processors (p) executing the program and the problem size (n) being solved are increased, are a critical component of performance-debugging environments for parallel programs. Simulations and scalability metrics have been used to address this issue. Simulation can accurately predict the execution time of a program for a specific (n,p) pair; however, it suffers from the drawback that the program must be simulated for each (n,p) pair of interest. On the other hand, while scalability metrics express program performance as functions of n and p, they have been targeted to specific applications, and there are no tools to automatically obtain simple first-order scalability trends for generic parallel programs. We address the issue of automatically obtaining scalability trends for a class of data-independent, message-passing SPMD parallel programs. We validate our approach by considering example parallel programs executed on the Intel iPSC/860 hypercube, and we show that this approach yields insight into program scalability.
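One way such first-order trends get used, sketched under assumed coefficients (the model form and values below are hypothetical, e.g. obtained from a fit like the MK sketch above): evaluate speedup and efficiency as p varies for a fixed n.

```python
import math

# Hypothetical first-order model: T(n, p) = b*n/p + c*log2(p).
b, c = 0.005, 0.4

def T(n, p):
    return b * n / p + c * math.log2(p)

n = 4096
for p in (2, 4, 8, 16, 32, 64):
    speedup = T(n, 1) / T(n, p)
    print(f"p={p:3d}  speedup={speedup:5.2f}  efficiency={speedup / p:.2f}")
```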
Modeling, Analysis, and Simulation on Computer and Telecommunication Systems | 1996
Sekhar R. Sarukkai; Jerry C. Yan
In this paper we seek to demonstrate the importance of studying the effect of changes in execution-environment parameters on parallel applications executed on state-of-the-art multiprocessors. A comprehensive methodology for event-based analysis of program behavior is introduced. This methodology is used to study the performance significance of system parameters such as processor speed, message-buffer size, buffer-copy speed, network bandwidth, communication latency, and interrupt overheads. With the help of a few CFD examples, we illustrate the use of our technique in determining suitable execution-environment parameter values for three applications.
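As an illustration of the what-if style such event-based analysis supports, the sketch below re-costs the messages in a trace under different latency and bandwidth assumptions. The trace format, overlap model, and numbers are invented for this example.

```python
# Hypothetical per-message records: (bytes, fraction overlapped with compute).
messages = [(8192, 0.2), (1024, 0.0), (65536, 0.5)] * 100
compute_time = 4.0  # seconds of pure computation observed in the trace

def projected_time(latency, bandwidth):
    """Re-cost every message under a new latency (s) and bandwidth (B/s)."""
    comm = sum((latency + nb / bandwidth) * (1.0 - overlap)
               for nb, overlap in messages)
    return compute_time + comm

base = projected_time(50e-6, 100e6)
for lat, bw in [(50e-6, 100e6), (10e-6, 100e6), (50e-6, 400e6)]:
    t = projected_time(lat, bw)
    print(f"latency={lat * 1e6:4.0f} us, bw={bw / 1e6:4.0f} MB/s"
          f" -> {t:.3f} s ({base / t:.2f}x vs baseline)")
```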
International Parallel Processing Symposium | 1995
Sekhar R. Sarukkai; Jerry C. Yan; Melisa A. Schmidt
Writing efficient parallel programs is complicated by the need to select the right data-structure alignments and distributions, which determine the nature and volume of inter-processor communications. A large number of performance tools for parallel programs have been developed recently to expose the nature of these inter-processor communications. However, none of the tools support performance views or provide statistics in terms of inter-processor data-structure interactions. In this paper we discuss the use of compiler front-end tools for automatically tracking data-structure movements in message-passing programs, and low-overhead monitoring and postprocessing of such codes. In addition, our approach is compiler/pre-processor and platform independent. We demonstrate that robust instrumentation and low-overhead monitoring of inter-processor data-structure movements are possible, using a number of NAS benchmarks run on the iPSC/860 hypercube. We also show that the data so collected can be used efficiently by postprocessing tools that expose performance bottlenecks through graphical displays and performance statistics.
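A sketch of what the instrumentation side might look like: a registry maps buffers to source-level data-structure names (in the paper this mapping comes from compiler front-end tools), and a send wrapper logs each movement. All names and the logging format here are hypothetical.

```python
_registry = {}   # id(buffer) -> source-level data-structure name
trace_log = []   # (structure, src_processor, dst_processor, bytes)

def register_buffer(buf, name):
    """Bind a runtime buffer to its source-level data-structure name."""
    _registry[id(buf)] = name

def instrumented_send(buf, dst, src=0):
    """Wrapper a pre-processor could substitute for the real send call."""
    name = _registry.get(id(buf), "<unknown>")
    trace_log.append((name, src, dst, len(buf)))
    # ... the real message-passing send would happen here ...

halo = bytearray(8192)
register_buffer(halo, "grid.halo")
instrumented_send(halo, dst=1)
print(trace_log)  # [('grid.halo', 0, 1, 8192)]
```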
Hawaii International Conference on System Sciences | 1996
Sekhar R. Sarukkai; Andrew C. Beers
Monitoring the evolution of data structures in parallel and distributed programs is critical for debugging their semantics and performance. However, the current state of the art in tracking and presenting data-structure information in parallel and distributed environments is cumbersome and does not scale. We present a methodology and tool that automatically tracks the memory bindings (not the actual contents) of dynamic data structures in message-passing C programs, as well as inter-processor data-structure movement, using PVM on distributed environments. With the help of a number of examples, we show that in addition to determining the impact of memory-allocation overheads on program performance, graphical views can help in debugging many memory-access errors. Traditional debuggers in distributed environments rely on existing sequential debuggers on each machine and simply provide an interface for querying and controlling each processor's debugging session. To quickly locate the faulting processor and explain the reasons for an error, we instead resort to run-time checking and trace-based visualizations of memory-access behavior across all processors. To reduce trace-file size, only updates of pointer values and memory-management functions are captured.
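In the spirit of tracking bindings rather than contents, a toy postprocessor for a trace of memory-management events can already flag common errors. The event format below is hypothetical.

```python
# Hypothetical events: (processor, operation, address, size).
events = [
    (0, "malloc", 0x1000, 256),
    (0, "malloc", 0x2000, 512),
    (0, "free",   0x1000, 0),
    (0, "free",   0x1000, 0),   # double free -> flagged below
    (1, "malloc", 0x3000, 128), # never freed -> reported as still bound
]

live = {}  # (processor, address) -> size of the currently bound block
for cpu, op, addr, size in events:
    key = (cpu, addr)
    if op == "malloc":
        live[key] = size
    elif key in live:
        del live[key]
    else:
        print(f"p{cpu}: free of untracked block {addr:#x} (double/invalid free)")

for (cpu, addr), size in sorted(live.items()):
    print(f"p{cpu}: block {addr:#x} ({size} B) still bound at end of trace")
```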
Software: Practice and Experience | 1995
Jerry C. Yan; Sekhar R. Sarukkai; Pankaj Mehra