Publications


Featured research published by Mahesh Rajan.


Archive | 2009

Improving performance via mini-applications.

Sandia Report | Michael A. Heroux; Douglas W. Doerfler; Paul S. Crozier; James M. Willenbring; H. Carter Edwards; Alan B. Williams; Mahesh Rajan; Eric R. Keiter; Heidi K. Thornquist; Robert W. Numrich

Application performance is determined by a combination of many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, we find that the use of mini-applications - small self-contained proxies for real applications - is an excellent approach for rapidly exploring the parameter space of all these choices. Furthermore, use of mini-applications enriches the interaction between application, library and computer system developers by providing explicit functioning software and concrete performance results that lead to detailed, focused discussions of design trade-offs, algorithm choices and runtime performance issues. In this paper we discuss a collection of mini-applications and demonstrate how we use them to analyze and improve application performance on new and future computer platforms.
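
Purely as an illustration of what a "small self-contained proxy" can look like (this is not code from the report; the kernel, problem size, and iteration count are invented for the example), a minimal mini-application-style kernel in C might capture just the memory-access and compute pattern of a larger PDE code:

/* Hypothetical mini-app sketch: a 3-point stencil apply plus dot product,
 * standing in for the memory-bound kernel of a larger application.
 * Problem size and iteration count are illustrative, not from the report. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000   /* degrees of freedom */
#define ITERS 50     /* repeated applications, as a real solver would do */

int main(void) {
    double *x = malloc(N * sizeof *x);
    double *y = malloc(N * sizeof *y);
    if (!x || !y) return 1;
    for (long i = 0; i < N; i++) x[i] = 1.0;

    clock_t t0 = clock();
    double checksum = 0.0;
    for (int it = 0; it < ITERS; it++) {
        /* y = A*x for a 1D Laplacian: same access pattern as the full code */
        y[0] = 2.0 * x[0] - x[1];
        for (long i = 1; i < N - 1; i++)
            y[i] = 2.0 * x[i] - x[i - 1] - x[i + 1];
        y[N - 1] = 2.0 * x[N - 1] - x[N - 2];
        /* dot product, the other bandwidth-bound primitive of interest */
        double dot = 0.0;
        for (long i = 0; i < N; i++) dot += x[i] * y[i];
        checksum += dot;
    }
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
    printf("checksum %g, %.2f s, ~%.1f GB/s effective\n",
           checksum, sec, ITERS * 4.0 * N * sizeof(double) / sec / 1e9);
    free(x); free(y);
    return 0;
}

Because such a proxy is tiny, it can be recompiled and rerun across compilers, runtimes, and platforms far faster than the full application it stands in for.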


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Investigating the Impact of the Cielo Cray XE6 Architecture on Scientific Application Codes

Mahesh Rajan; Richard F. Barrett; Douglas Doerfler; Kevin Pedretti

Cielo, a Cray XE6, is the newest capability machine for the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket, eight-core AMD Magny-Cours compute nodes linked by Cray's Gemini interconnect. Its primary mission objective is to enable a suite of ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement over a successful architecture previously available to many of our codes, which provides a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, supplemented with micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.
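
As a sketch of the kind of micro-benchmark used to expose fundamental interconnect capabilities (the paper's actual benchmark suite is not reproduced here; message sizes and repetition counts below are illustrative), an MPI ping-pong latency/bandwidth test between two ranks might look like this:

/* Minimal MPI ping-pong sketch between ranks 0 and 1; message sizes and
 * repetition count are illustrative, not the benchmarks used in the paper. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    for (long bytes = 8; bytes <= (1 << 20); bytes *= 4) {
        char *buf = malloc(bytes);
        if (!buf) MPI_Abort(MPI_COMM_WORLD, 1);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
            printf("%8ld bytes: %.2f us one-way, %.2f MB/s\n", bytes,
                   dt / (2.0 * reps) * 1e6,
                   bytes / (dt / (2.0 * reps)) / 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}

Placing the two ranks on different nodes (one rank per node) measures the interconnect itself rather than shared-memory transfers within a node.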


International Conference on Cluster Computing | 2015

Toward Rapid Understanding of Production HPC Applications and Systems

Anthony Michael Agelastos; Benjamin A. Allan; James M. Brandt; Ann C. Gentile; Sophia Lefantzi; Stephen Todd Monk; Jeffrey Brandon Ogden; Mahesh Rajan; Joel O. Stevenson

A detailed understanding of HPC applications' resource needs and their complex interactions with each other and with HPC platform resources is critical to achieving scalability and performance. Such understanding has been difficult to achieve because typical application profiling tools do not capture the behaviors of codes under the potentially wide spectrum of actual production conditions, and because typical monitoring tools do not capture system resource usage information with high enough fidelity to gain sufficient insight into application performance and demands. In this paper we present both system and application profiling results based on data obtained through synchronized, system-wide monitoring on a production HPC cluster at Sandia National Laboratories (SNL). We demonstrate analytic and visualization techniques that we are using to characterize application and system resource usage under production conditions for better understanding of application resource needs. Our goals are to improve application performance (through understanding application-to-resource mapping and system throughput) and to ensure that future system capabilities match their intended workloads.
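
The monitoring data in the paper come from a synchronized, system-wide monitoring infrastructure (see the lightweight distributed metric service paper below), not from the toy code here; purely to illustrate the idea of low-overhead, fixed-interval node-level sampling, a sketch that periodically records MemFree from Linux /proc/meminfo could be:

/* Illustrative node-level sampler: periodically record MemFree from
 * /proc/meminfo. This is NOT the paper's monitoring infrastructure;
 * it only sketches the idea of low-overhead, fixed-interval sampling. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    for (int sample = 0; sample < 10; sample++) {   /* 10 samples, 1 s apart */
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) return 1;
        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "MemFree:", 8) == 0) {
                long kb;
                if (sscanf(line + 8, "%ld", &kb) == 1)
                    printf("sample %d: MemFree %ld kB\n", sample, kb);
                break;
            }
        }
        fclose(f);
        sleep(1);   /* fixed sampling interval; production monitors use finer */
    }
    return 0;
}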


Concurrency and Computation: Practice and Experience | 2012

Application-driven analysis of two generations of capability computing: the transition to multicore processors

Mahesh Rajan; Douglas Doerfler; Richard F. Barrett; Paul Lin; Kevin Pedretti; K. Scott Hemmert

Multicore processors form the basis of most traditional high performance parallel processing architectures. Early experiences with these computers showed significant performance problems, both with regard to computation and inter-process communication. The transition from Purple, an IBM POWER5-based machine, to Cielo, a Cray XE6, as the main capability computing platform for the United States Department of Energy's Advanced Simulation and Computing campaign provides an opportunity to reexamine these issues after experience with a few generations of multicore-based machines. Experiences with Purple identified some important characteristics that led to strong performance of complex scientific application programs at very large scales. Herein, we compare the performance of some Advanced Simulation and Computing mission-critical applications at capability scale across this transition to multicore processors.


International Journal of Distributed Systems and Technologies | 2010

Application Performance on the Tri-Lab Linux Capacity Cluster-TLCC

Douglas Doerfler; Marcus Epperson; Jeff Ogden; Mahesh Rajan

In a recent acquisition by DOE/NNSA, several large capacity computing clusters, collectively called TLCC, have been installed at the DOE labs SNL, LANL, and LLNL. The TLCC architecture, with ccNUMA, multi-socket, multi-core nodes and an InfiniBand interconnect, is representative of the trend in HPC architectures. This paper examines application performance on TLCC, contrasting it with Red Storm/Cray XT4. TLCC and Red Storm share similar AMD processors and memory DIMMs; Red Storm, however, has single-socket nodes and a custom interconnect. Micro-benchmarks and performance analysis tools help explain the causes of the observed performance differences. Control of processor and memory affinity on TLCC with the numactl utility is shown to result in significant performance gains and is essential to attenuate the detrimental impact of OS interference and cache-coherency overhead. While previous studies have investigated the impact of affinity control mostly in the context of small SMP systems, the focus of this paper is on highly parallel MPI applications.
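
The paper controls affinity with the numactl command-line utility (for example by wrapping the application executable in the job launch line); as an illustration of what binding execution and memory allocation to a NUMA node means, here is a hypothetical sketch using the libnuma API rather than numactl itself:

/* Sketch of programmatic processor/memory affinity with libnuma, i.e. the
 * API analogue of "numactl --cpunodebind=0 --membind=0 ./app".
 * This is an illustration; the paper itself uses the numactl utility. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }
    int node = 0;                         /* bind to NUMA node 0 */
    numa_run_on_node(node);               /* pin execution to that node */

    size_t bytes = 100UL << 20;           /* 100 MiB working set */
    double *a = numa_alloc_onnode(bytes, node);   /* node-local allocation */
    if (!a) return 1;

    size_t n = bytes / sizeof(double);
    for (size_t i = 0; i < n; i++) a[i] = 1.0;    /* first touch stays local */
    printf("allocated and touched %zu doubles on node %d\n", n, node);

    numa_free(a, bytes);
    return 0;
}
/* compile: cc affinity.c -lnuma */

The attraction of numactl for production work is that it imposes the same CPU and memory binding from outside the application, so unmodified MPI codes benefit without source changes.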


Archive | 2007

Supercomputer and Cluster Performance Modeling and Analysis Efforts: 2004-2006

Judith E. Sturtevant; Anand Ganti; Harold Edward Meyer; Joel O. Stevenson; Robert E. Benner; Susan Phelps Goudy; Douglas W. Doerfler; Stefan P. Domino; Mark A. Taylor; Robert Joseph Malins; Ryan T. Scott; Daniel Wayne Barnette; Mahesh Rajan; James Alfred Ang; Amalia Rebecca Black; Thomas William Laub; Brian Claude Franke

This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and on Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, during the period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

Unprecedented Scalability and Performance of the New NNSA Tri-Lab Linux Capacity Cluster 2

Mahesh Rajan; Douglas Doerfler; Paul Lin; Simon D. Hammond; Richard F. Barrett

As one of the largest users of supercomputing resources in the world, the NNSA (National Nuclear Security Administration) relies on capacity computing as a critical component of its Advanced Simulation and Computing (ASC) program. The latest acquisitions in this program, the Tri-Lab Linux Capacity Cluster 2 (TLCC2) machines, have recently been installed at the three NNSA laboratories: Sandia National Laboratories, Los Alamos National Laboratory, and Lawrence Livermore National Laboratory. In this paper we investigate performance on Chama, Sandia's 1,232-node cluster with dual-socket Intel Xeon Sandy Bridge processors connected using QLogic QDR InfiniBand. Production applications benchmarked on Chama reveal significant improvements in time to solution and scalability compared with our earlier generation capacity clusters and a petaflops-class capability machine.


ICNAAM 2010: International Conference of Numerical Analysis and Applied Mathematics 2010 | 2010

HPC application performance and scaling: understanding trends and future challenges with application benchmarks on past, present and future Tri-Lab computing systems.

Mahesh Rajan; Douglas Doerfler

In this paper, HPC architectural characteristics and their impact on application performance and scaling are investigated. Performance data gathered over several generations of very large HPC systems, such as ASC Red Storm, ASC Purple, and a large InfiniBand cluster (Red Sky), are analyzed. As the number of cache-coherent cores and the number of NUMA domains per compute node keep increasing, we analyze their impact with a few simple benchmarks and several applications. We identify bottlenecks and remedies by examining production applications. We conclude with preliminary early-hardware performance data from ASC Cielo, a petaflops-class future capability system.
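
The abstract does not list the simple benchmarks used; as one example of the kind of benchmark commonly used to expose per-node memory bandwidth as core and NUMA-domain counts grow, a STREAM-style triad with OpenMP and parallel first touch might look like this (array size and repetition count are illustrative):

/* STREAM-style triad sketch (a[i] = b[i] + s*c[i]) with OpenMP threading,
 * the kind of simple benchmark used to expose per-node memory bandwidth.
 * Array size and repetition count are illustrative. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 40000000L   /* ~0.96 GB total across the three arrays */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* parallel first touch so pages land in each thread's NUMA domain */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double best = 1e30, s = 3.0;
    for (int rep = 0; rep < 10; rep++) {
        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + s * c[i];
        double dt = omp_get_wtime() - t0;
        if (dt < best) best = dt;
    }
    /* triad moves three arrays of doubles per pass (two reads, one write) */
    printf("triad best: %.3f s, %.1f GB/s\n",
           best, 3.0 * N * sizeof(double) / best / 1e9);
    free(a); free(b); free(c);
    return 0;
}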


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

The lightweight distributed metric service: a scalable infrastructure for continuous monitoring of large scale computing systems and applications

Anthony Michael Agelastos; Benjamin A. Allan; Jim M. Brandt; Paul Cassella; Jeremy Enos; Joshi Fullop; Ann C. Gentile; Steve Monk; Nichamon Naksinehaboon; Jeff Ogden; Mahesh Rajan; Michael T. Showerman; Joel O. Stevenson; Narate Taerat; Thomas Tucker


Archive | 2011

Application-Driven Acceptance of Cielo, an XE6 Petascale Capability Platform.

Douglas W. Doerfler; Mahesh Rajan; Cindy Nuss; Cornell Wright; Tom Spelce

Collaboration


Dive into Mahesh Rajan's collaborations.

Top Co-Authors

Douglas W. Doerfler
Lawrence Berkeley National Laboratory

Douglas Doerfler
Sandia National Laboratories

Kevin Pedretti
Sandia National Laboratories

Joel O. Stevenson
Sandia National Laboratories

Paul Lin
Sandia National Laboratories

Simon D. Hammond
Sandia National Laboratories

Richard F. Barrett
Sandia National Laboratories

Ann C. Gentile
Sandia National Laboratories