Balaji Subramaniam
Virginia Tech
Publications
Featured research published by Balaji Subramaniam.
International Conference on Performance Engineering | 2013
Balaji Subramaniam; Wu-chun Feng
Massive data centers housing thousands of computing nodes have become commonplace in enterprise computing, and the power consumption of such data centers is growing at an unprecedented rate. Adding to the problem is the inability of the servers to exhibit energy proportionality, i.e., provide energy-efficient execution under all levels of utilization, which diminishes the overall energy efficiency of the data center. It is imperative that we realize effective strategies to control the power consumption of the server and improve the energy efficiency of data centers. With the advent of Intel Sandy Bridge processors, we have the ability to specify a limit on power consumption during runtime, which creates opportunities to design new power-management techniques for enterprise workloads and make the systems that they run on more energy proportional. In this paper, we investigate whether it is possible to achieve energy proportionality for an enterprise-class server workload, namely the SPECpower_ssj2008 benchmark, by using Intel's Running Average Power Limit (RAPL) interfaces. First, we analyze the power consumption and characterize the instantaneous power profile of the SPECpower benchmark within different subsystems using the on-chip energy meters exposed via the RAPL interfaces. We then analyze the impact of RAPL power limiting on the performance, per-transaction response time, power consumption, and energy efficiency of the benchmark under different load levels. Our observations and results shed light on the efficacy of the RAPL interfaces and provide guidance for designing power-management techniques for enterprise-class workloads.
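For context on the RAPL interfaces discussed above, the sketch below (not from the paper) shows how package-level energy counters and power limits are typically exposed on Linux through the powercap sysfs interface; the paths and the 50 W cap are illustrative assumptions, and the original experiments may well have accessed RAPL through model-specific registers instead.

```python
# Minimal sketch of reading RAPL package energy and setting a power limit via
# the Linux powercap interface. Paths and the 50 W cap are assumptions for
# illustration; writing the limit requires root, and counter wraparound is ignored.
import time

RAPL_PKG = "/sys/class/powercap/intel-rapl:0"           # package 0 domain
ENERGY_FILE = f"{RAPL_PKG}/energy_uj"                   # cumulative energy in microjoules
LIMIT_FILE = f"{RAPL_PKG}/constraint_0_power_limit_uw"  # long-term power limit in microwatts

def read_energy_uj() -> int:
    with open(ENERGY_FILE) as f:
        return int(f.read())

def average_power_w(interval_s: float = 1.0) -> float:
    """Estimate average package power over a short interval."""
    e0 = read_energy_uj()
    time.sleep(interval_s)
    e1 = read_energy_uj()
    return (e1 - e0) / 1e6 / interval_s   # uJ -> J, then J/s = W

def set_power_limit_w(watts: float) -> None:
    with open(LIMIT_FILE, "w") as f:
        f.write(str(int(watts * 1e6)))    # W -> uW

if __name__ == "__main__":
    print(f"package power: {average_power_w():.1f} W")
    set_power_limit_w(50.0)               # illustrative 50 W package cap
```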
2013 International Green Computing Conference Proceedings | 2013
Balaji Subramaniam; Winston A. Saunders; Thomas R. W. Scogland; Wu-chun Feng
A recent study shows that computation per kilowatt-hour has doubled every 1.57 years, akin to Moore's Law. While this trend is encouraging, its implications for high-performance computing (HPC) are not yet clear. For instance, DARPA's target of a 20-MW exaflop system will require a 56.8-fold performance improvement with only a 2.4-fold increase in power consumption, which seems unachievable in light of the above trend. To provide a more comprehensive perspective, we analyze current trends in energy efficiency from the Green500 and project expectations for the near future. Specifically, we first provide an analysis of energy efficiency trends in HPC systems from the Green500. We then model and forecast the energy efficiency of future HPC systems. Next, we present Exascalar, a holistic metric to measure the distance from the exaflop goal. Finally, we discuss our efforts to standardize power measurement methodologies in order to provide the community with reliable and accurate efficiency data.
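As a back-of-the-envelope illustration of the gap the abstract quantifies, the figures above imply roughly a 23.7-fold efficiency improvement (56.8-fold performance at 2.4-fold power); the snippet below simply works that out and estimates how long it would take if the historical 1.57-year doubling rate continued to hold. This is illustrative arithmetic only, not the paper's forecasting model.

```python
import math

# Figures quoted in the abstract above.
perf_factor = 56.8        # required performance improvement for a 20-MW exaflop system
power_factor = 2.4        # allowed increase in power consumption
doubling_years = 1.57     # observed doubling period of computations per kWh

efficiency_factor = perf_factor / power_factor             # ~23.7x efficiency needed
years_needed = math.log2(efficiency_factor) * doubling_years

print(f"required efficiency improvement: {efficiency_factor:.1f}x")
print(f"years at the historical doubling rate: {years_needed:.1f}")
# ~23.7x and ~7.2 years; illustrative arithmetic only.
```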
Cluster Computing and the Grid | 2014
Balaji Subramaniam; Wu-chun Feng
The increasing demand for computation and the commensurate rise in the power density of data centers have led to increased costs associated with constructing and operating a data center. Exacerbating such costs, data centers are often over-provisioned to avoid costly outages associated with the potential overloading of electrical circuitry. However, such over-provisioning is often unnecessary since a data center rarely operates at its maximum capacity. It is imperative that we maximize the use of the available power budget in order to enhance the efficiency of data centers. On the other hand, introducing power constraints to improve the efficiency of a data center can cause unacceptable violation of performance agreements (i.e., throughput and response time constraints). As such, we present a thorough empirical study of performance under power constraints as well as a runtime system to set appropriate power constraints for meeting strict performance targets. In this paper, we design a runtime system based on a load prediction model and an optimization framework to set the appropriate power constraints to meet specific performance targets. We then present the effects of our runtime system on energy proportionality, average power, performance, and instantaneous power consumption of enterprise applications. Our results shed light on mechanisms to tune the power provisioned for a server under strict performance targets and opportunities to improve energy proportionality and instantaneous power consumption via power limiting.
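The abstract describes the runtime system only at a high level; the following is a minimal sketch, under assumed interfaces, of the kind of control loop it implies: predict the next interval's load, select the lowest power cap that a performance-under-power model says still meets the target, and apply it (for example, via RAPL). The predictor, candidate caps, and model signature here are placeholders rather than the authors' implementation.

```python
# Simplified sketch of a power-capping runtime: predict load, choose the
# smallest cap that a performance model says meets the target, apply it.
# The predictor, model, and candidate caps are placeholders, not the authors'
# actual load-prediction model or optimization framework.
from typing import Callable

CANDIDATE_CAPS_W = [40, 50, 60, 70, 80, 95]   # assumed package power caps

def predict_load(history: list[float]) -> float:
    """Placeholder load predictor: exponentially weighted moving average."""
    alpha, est = 0.5, history[0]
    for x in history[1:]:
        est = alpha * x + (1 - alpha) * est
    return est

def choose_cap(load: float,
               target_throughput: float,
               perf_model: Callable[[float, float], float]) -> float:
    """Pick the lowest cap whose modeled throughput meets the target."""
    for cap in CANDIDATE_CAPS_W:
        if perf_model(cap, load) >= target_throughput:
            return cap
    return CANDIDATE_CAPS_W[-1]               # fall back to the highest cap

def control_step(history, target, perf_model, apply_cap):
    load = predict_load(history)
    cap = choose_cap(load, target, perf_model)
    apply_cap(cap)                            # e.g., write the RAPL limit as sketched earlier
    return cap
```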
Computer Science - Research and Development | 2013
Thomas R. W. Scogland; Balaji Subramaniam; Wu-chun Feng
Energy efficiency is now a top priority. The first four years of the Green500 have seen the importance of energy efficiency in supercomputing grow from an afterthought to the forefront of innovation as we approach a point where systems become increasingly constrained by power consumption. Even so, the landscape of energy efficiency in supercomputing continues to shift, with new trends emerging and unexpected shifts in previous predictions. This paper offers an in-depth analysis of the new and shifting trends in the Green500. In addition, the analysis offers early indications of the path that we are taking towards exascale and what an exascale machine in 2018 is likely to look like. Lastly, we discuss the emerging efforts and collaborations toward designing and establishing better metrics, methodologies, and workloads for the measurement and analysis of energy-efficient supercomputing.
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011
Thomas R. W. Scogland; Balaji Subramaniam; Wu-chun Feng
It has been traditionally viewed that as the scale of a supercomputer increases, its energy efficiency decreases due to performance that scales sub-linearly and power consumption that scales at least linearly with size. However, based on the first three years of the Green500, this view does not hold true for the fastest supercomputers in the world. Many reasons for this counterintuitive trend have been proposed -- with improvements in feature size, more efficient networks, and larger numbers of slower cores being amongst the most prevalent. Consequently, this paper provides an analysis of emerging trends in the Green500 and delves more deeply into how larger-scale supercomputers compete with smaller-scale supercomputers with respect to energy efficiency. In addition, our analysis provides a compelling early indicator of the future of exascale computing. We then close with a discussion on the evolution of the Green500 based on community feedback.
International Parallel and Distributed Processing Symposium | 2008
Nagarajan Venkateswaran; Vinoth Krishnan Elangovan; Karthik Ganesan; T. R. S. Sagar; S. Aananthakrishanan; S. Ramalingam; Shyamsundar Gopalakrishnan; Madhavan Manivannan; Deepak Srinivasan; Viswanath Krishnamurthy; Karthik Chandrasekar; Viswanath Venkatesan; Balaji Subramaniam; V. Sangkar; Aravind Vasudevan; Shrikanth Ganapathy; Sriram Murali; M. Thyagarajan
In this paper, we present a novel cluster paradigm and silicon operating system. Our approach to a competent cluster design revolves around an execution model that supports the simultaneous execution of multiple independent applications on the cluster, leading to cost sharing across applications. The execution model envisages simultaneous execution of multiple applications (running traces of multiple independent applications in the same node at an instant, without time sharing) across all the partitions (of nodes) of a single cluster, without sacrificing the performance of individual applications, unlike in current cluster models. Performance scalability is achieved as we increase the number of nodes and the problem sizes of the individual independent applications: because the applications are independent of one another, larger problem sizes increase the number of non-dependent operations, which leads to better utilization of otherwise unused resources within a node. This execution model depends heavily on the node architecture for performance scalability. This would be a major initiative toward achieving cost-effective, high-performance supercomputing.
International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2014
Balaji Subramaniam; Wu-chun Feng
The computing community is facing several big data challenges due to the unprecedented growth in the volume and variety of data. Many large-scale Internet companies use distributed NoSQL data stores to mitigate these challenges. These NoSQL data-store installations require massive computing infrastructure, which consumes a significant amount of energy and contributes to operational costs. This cost is further aggravated by the lack of energy proportionality in servers.
Cluster Computing and the Grid | 2016
Vignesh Adhinarayanan; Balaji Subramaniam; Wu-chun Feng
Accurate power estimation at runtime is essential for the efficient functioning of a power management system. While years of research have yielded accurate power models for the online prediction of instantaneous power for CPUs, such power models for graphics processing units (GPUs) are lacking. GPUs rely on low-resolution power meters that only nominally support basic power management. To address this, we propose an instantaneous power model, and in turn a power estimator, that uses performance counters in a novel way so as to deliver accurate power estimation at runtime. Our power estimator runs on two real NVIDIA GPUs to show that accurate runtime estimation is possible without the need for the high-fidelity details that are assumed in simulation-based power models. To construct our power model, we first use correlation analysis to identify a concise set of performance counters that work well despite GPU device limitations. Next, we explore several statistical regression techniques and identify the best one. Then, to improve the prediction accuracy, we propose a novel application-dependent modeling technique, where the model is constructed online at runtime, based on the readings from a low-resolution, built-in GPU power meter. Our quantitative results show that a multi-linear model, which produces a mean absolute error of 6%, works best in practice. An application-specific quadratic model reduces the error to nearly 1%. We show that this model can be constructed with low overhead and high accuracy at runtime. To the best of our knowledge, this is the first work to model the instantaneous power of a real GPU system; earlier related work focused on average power.
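As a rough sketch of the modeling approach described above, the code below fits a multi-linear model mapping a handful of performance-counter readings to measured power with ordinary least squares. The counter names and sample data are hypothetical; the paper selects its counters through correlation analysis on real NVIDIA GPUs and calibrates against the built-in power meter.

```python
# Sketch of fitting a multi-linear power model from performance counters.
# Counter names and sample data are placeholders; the paper selects counters
# via correlation analysis and calibrates against the GPU's built-in meter.
import numpy as np

# Each row: readings of a few counters for one sample interval (hypothetical).
counters = np.array([
    # sm_activity, dram_util, inst_issued (normalized)
    [0.20, 0.10, 0.15],
    [0.55, 0.30, 0.50],
    [0.80, 0.60, 0.75],
    [0.95, 0.70, 0.90],
])
measured_power_w = np.array([45.0, 90.0, 140.0, 165.0])  # from the on-board meter

# Multi-linear model: P ~ w0 + w1*c1 + w2*c2 + w3*c3 (least squares).
X = np.hstack([np.ones((counters.shape[0], 1)), counters])
weights, *_ = np.linalg.lstsq(X, measured_power_w, rcond=None)

def estimate_power(sample: np.ndarray) -> float:
    return float(weights[0] + sample @ weights[1:])

print(estimate_power(np.array([0.6, 0.4, 0.55])))
# An application-specific refinement could add squared counter terms and
# refit online against the low-resolution built-in meter.
```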
Cluster Computing and the Grid | 2014
Balaji Subramaniam
Improving non-peak power efficiency has the potential to significantly enhance the efficiency of a data center and allows us to host more resources under a given power budget. In this paper, we use RAPL interfaces to analyze and model the performance (both throughput and response time) of the SPECweb benchmark under subsystem-level power limits. We show that performance under subsystem-level power limits can be modeled using simple and well-studied non-linear models. We then leverage a load prediction model and an optimization framework to create a runtime system for the power management of enterprise applications. Our work shows that effective subsystem-level power capping improves the energy proportionality of the server.
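The abstract does not name the non-linear models it uses; as one plausible illustration, the sketch below fits a simple saturating curve of throughput versus power cap and inverts it to find the smallest cap predicted to meet a throughput target. The functional form and all data points are assumptions for illustration.

```python
# Sketch of modeling throughput as a function of a subsystem power cap with a
# simple saturating non-linear model. The functional form and data are made up.
import numpy as np
from scipy.optimize import curve_fit

caps_w = np.array([30, 40, 50, 60, 70, 80, 95], dtype=float)              # CPU power caps
throughput = np.array([210, 340, 430, 480, 505, 515, 520], dtype=float)   # ops/s (illustrative)

def saturating(p, t_max, k):
    """Throughput rises with the cap and flattens out: T(p) = t_max * p / (p + k)."""
    return t_max * p / (p + k)

(t_max, k), _ = curve_fit(saturating, caps_w, throughput, p0=[600.0, 30.0])

def min_cap_for(target_ops: float) -> float:
    """Invert the fitted model: smallest cap predicted to reach the target."""
    return target_ops * k / (t_max - target_ops)

print(f"fitted t_max={t_max:.0f}, k={k:.1f}; cap for 450 ops/s ~ {min_cap_for(450):.1f} W")
```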
International Conference on Parallel Processing | 2012
Balaji Subramaniam; Wu-chun Feng
In the context of the rapid slowing of Dennard scaling, we characterize the efficacy of one power-management mechanism, namely concurrency throttling, which adapts the concurrency (i.e., number of active threads per core) of an application via simultaneous multithreading (SMT). SMT can potentially improve processor utilization, and thus the efficiency of the processor for parallel programs, by filling in unused issue slots and hiding memory latency on wide-issue superscalar architectures. However, the benefit of concurrency throttling is highly dependent on the workload. Moreover, previous work in this area was carried out on microprocessor platforms where the slowdown of Dennard scaling had yet to have a significant effect, thus providing the motivation for our work: understanding the efficacy of SMT on a modern multicore platform.
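To make the notion of concurrency throttling concrete, the sketch below sweeps the number of active threads for an OpenMP workload and compares runtime and package energy per configuration; the benchmark binary, thread counts, and RAPL path are placeholders, and this is not the measurement harness used in the paper.

```python
# Sketch of concurrency throttling: run an OpenMP workload at several thread
# counts and compare runtime and package energy. The benchmark command and
# thread counts are placeholders; energy comes from the RAPL counter as above.
import os
import subprocess
import time

ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj() -> int:
    with open(ENERGY_FILE) as f:
        return int(f.read())

def run_with_threads(cmd: list[str], nthreads: int) -> tuple[float, float]:
    """Return (seconds, joules) for one run at a given thread count."""
    env = dict(os.environ, OMP_NUM_THREADS=str(nthreads))
    e0, t0 = read_energy_uj(), time.time()
    subprocess.run(cmd, env=env, check=True)
    e1, t1 = read_energy_uj(), time.time()
    return t1 - t0, (e1 - e0) / 1e6       # counter wraparound ignored in this sketch

if __name__ == "__main__":
    for n in (8, 16, 32):                 # e.g., with and without SMT on a 16-core part
        secs, joules = run_with_threads(["./benchmark"], n)   # hypothetical binary
        print(f"{n:>2} threads: {secs:.1f} s, {joules:.0f} J")
```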