Vaibhav Sundriyal | Researchain

Publication

Featured researches published by Vaibhav Sundriyal.

EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface | 2011

Per-call energy saving strategies in all-to-all communications

Vaibhav Sundriyal; Masha Sosonkina

With the increase in the peak performance of modern computing platforms, their energy consumption grows as well, which may lead to overwhelming operating costs and failure rates. Techniques such as Dynamic Voltage and Frequency Scaling (DVFS) and CPU Clock Modulation (throttling) are often used to reduce the power consumption of the compute nodes. However, these techniques should be used judiciously during application execution to avoid significant performance losses. In this work, two implementations of the all-to-all collective operations are studied as to their augmentation with energy saving strategies on a per-call basis. Experiments were performed on the OSU MPI benchmarks as well as on a few real-world problems from the CPMD and NAS suites, in which energy consumption was reduced by up to 10% and 15.7%, respectively, with little performance degradation.

International Workshop on Energy Efficient Supercomputing | 2013

Initial investigation of a scheme to use instantaneous CPU power consumption for energy savings format

Vaibhav Sundriyal; Masha Sosonkina

The drive to extract peak performance from modern computing platforms has led to a drastic increase in their energy and power consumption, thereby affecting operating costs and failure rates. Modern processors provide techniques, such as dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), to improve energy efficiency on the fly. Without careful application, however, DVFS and throttling may cause a significant performance loss due to system overhead. Much research attempts to use these techniques by choosing a performance loss for the application, under which the energy savings are to be obtained. This paper discusses potential pitfalls of the performance-loss approach and proposes a frequency scaling scheme that is based on instantaneous CPU power consumption and thus avoids the need for the user to predefine a performance tolerance. Preliminary experiments, performed on NAS benchmarks, show that the proposed scheme saves more energy than the approach based on the predefined performance loss.

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Dynamic Frequency Scaling and Energy Saving in Quantum Chemistry Applications

Vaibhav Sundriyal; Masha Sosonkina; Fang Liu; Michael W. Schmidt

Modern high-performance computing system design is becoming increasingly aware of energy-proportional computing as a way to lower operational costs and raise reliability. At the same time, high-performance application developers are taking proactive steps towards lower energy consumption without a significant performance loss. One way to accomplish this is to change the processor frequency dynamically during application execution. In this paper, a representative computationally intensive HPC application, GAMESS, is considered with the aim of investigating the energy saving potential of its various stages. GAMESS is a quantum chemistry software package used worldwide to perform ab initio electronic structure calculations. This paper presents energy consumption characteristics of two Self-Consistent Field method implementations in GAMESS, which differ radically in their use of computing resources. The dynamic frequency scaling optimization is applied to these implementations and serves as verification for the proposed general energy savings model. The developed model provides the minimum compute-node energy consumption under a given performance loss tolerance for various processor frequencies.
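The idea of minimizing node energy under a performance loss tolerance can be illustrated with a minimal Python sketch. It is not the model from the paper: the `FreqPoint` record, the `pick_frequency` helper, and the energy estimate (runtime multiplied by average power) are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreqPoint:
    """One hypothetical operating point of a compute node."""
    freq_ghz: float   # processor frequency
    runtime_s: float  # measured or predicted runtime at this frequency
    power_w: float    # average node power at this frequency


def pick_frequency(points, tolerance):
    """Return the lowest-energy point whose slowdown stays within tolerance.

    The slowdown is measured against the fastest point in `points`;
    `tolerance` is a fraction (0.05 means a 5% allowed performance loss).
    Energy is approximated as runtime * average power (an assumption).
    """
    if not points:
        raise ValueError("at least one operating point is required")
    baseline = min(p.runtime_s for p in points)
    limit = baseline * (1.0 + tolerance)
    # The fastest point always satisfies the limit, so this list is non-empty.
    feasible = [p for p in points if p.runtime_s <= limit]
    return min(feasible, key=lambda p: p.runtime_s * p.power_w)
```

With made-up measurements, a tighter tolerance keeps the frequency higher, while a looser one admits slower operating points that use less energy overall.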

Journal of Parallel and Distributed Computing | 2013

Energy saving strategies for parallel applications with point-to-point communication phases

Vaibhav Sundriyal; Masha Sosonkina; Alexander Gaenko; Zhao Zhang

Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers significantly affect their operating costs and failure rates. In modern microprocessor architectures, equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), the power consumption may be controlled in software. Additionally, a network interconnect, such as Infiniband, may be exploited to maximize energy savings, while the application performance loss and frequency switching overheads must be carefully balanced. This paper advocates for a runtime assessment of such overheads by grouping point-to-point communications into phases and then analyzing the time gaps between the communication calls. Certain communication and architectural parameters are taken into consideration in the three proposed frequency scaling strategies, which differ with respect to their treatment of the time gaps. The experimental results are presented for NAS parallel benchmark problems as well as for the realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. For the latter, three different process-to-core mappings were studied as to their energy savings under the proposed frequency scaling strategies and under the existing state-of-the-art techniques. Close to the maximum energy savings were obtained with a low performance loss of 2% on the given platform.
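Grouping communication calls into phases by the time gaps between them can be sketched as follows. This is only an illustration under stated assumptions, not one of the three strategies from the paper: the `max_gap` threshold, the function names, and the rule that a phase must last longer than twice the frequency switching overhead are all hypothetical.

```python
def group_into_phases(calls, max_gap):
    """Group communication calls into phases.

    `calls` is a list of (start, end) timestamps sorted by start time.
    A call joins the current phase when the gap since the end of the
    previous call is at most `max_gap`; otherwise it opens a new phase.
    """
    phases = []
    for start, end in calls:
        if phases and start - phases[-1][-1][1] <= max_gap:
            phases[-1].append((start, end))
        else:
            phases.append([(start, end)])
    return phases


def phases_worth_scaling(phases, switch_overhead):
    """Keep phases long enough to pay for switching down and back up.

    Assumed rule: the phase duration must exceed two frequency switches.
    """
    return [
        phase for phase in phases
        if phase[-1][1] - phase[0][0] > 2 * switch_overhead
    ]
```

For example, with a hypothetical trace `[(0.0, 1.0), (1.2, 2.0), (5.0, 5.1), (9.0, 12.0), (12.1, 13.0)]` and `max_gap=0.5`, the calls fall into three phases, and with `switch_overhead=0.2` only the first and last phases are long enough to be scaled.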

Concurrency and Computation: Practice and Experience | 2013

Achieving energy efficiency during collective communications

Vaibhav Sundriyal; Masha Sosonkina; Zhao Zhang

Energy consumption has become a major design constraint in modern computing systems. With the advent of petaflops architectures, power-efficient software stacks have become imperative for scalability. Techniques such as dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling) are often used to reduce the power consumption of the compute nodes. To avoid significant performance losses, these techniques should be used judiciously during parallel application execution. For example, the communication phases of an application may be good candidates for applying DVFS and CPU throttling without incurring a considerable performance loss. They are often treated as indivisible operations, and little attention has been devoted to the energy saving potential of their algorithmic steps. In this work, two important collective communication operations, all-to-all and allgather, are investigated as to their augmentation with energy saving strategies on a per-call basis. The experiments prove the viability of such a fine-grain approach. They also validate a theoretical power consumption estimate for multicore nodes proposed here. While keeping the performance loss low, the obtained energy savings were always significantly higher than those achieved when DVFS or throttling was switched on across the entire application run.

Symposium on Computer Architecture and High Performance Computing | 2012

Runtime Procedure for Energy Savings in Applications with Point-to-Point Communications

Vaibhav Sundriyal; Masha Sosonkina; Alexander Gaenko

Although high-performance computing has always been about efficient application execution, both energy and power consumption have become critical concerns owing to their effect on operating costs and failure rates of large-scale computing platforms. Modern microprocessors are equipped with capabilities to reduce their power consumption using techniques such as dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling). Without careful application, however, DVFS and throttling may cause a significant performance loss due to system overhead. This work presents design considerations for a runtime procedure that dynamically analyzes blocking point-to-point communications, groups them according to the proposed criteria, and applies frequency scaling based on both communication and architectural parameters without penalizing performance much. Experiments performed on NAS parallel benchmarks verify the proposed design by exhibiting energy savings of as much as 11% with a performance loss as low as 2%.

The Journal of Supercomputing | 2016