Sudarshan Srinivasan | Researchain

Publication

Featured research published by Sudarshan Srinivasan.

International Symposium on Quality Electronic Design | 2012

Functional test pattern generation for maximizing temperature in 3D IC chip stack

Sudarshan Srinivasan; Sandip Kundu

In a stacked 3D Integrated Circuit (IC), the total power dissipated per unit surface area typically exceeds that of 2D ICs. This creates a greater number of localized thermal hotspots in the individual dies of the 3D IC. The location and temperature of these hotspots depend on the actual workload executing on the 3D IC. Since the power dissipation pattern of the applied workload may vary over time, the location and intensity of thermal hotspots may vary with it. The applied workload may be construed as consisting of phases, where the spatial power dissipation pattern remains constant over a phase and changes only from one phase to another. In this paper we (i) develop a thermal modeling scheme that predicts the temperature profile at the end of a program phase, and (ii) use a novel Integer Linear Programming (ILP) formulation to arrange program phases to create the worst-case temperature at a target location. Experimental results show that, by taking the spatio-temporal effect into account, we can raise the temperature of a hotspot much higher than is possible from a purely functional trace. Hotspot temperature maximization is important in design verification and testing.

IEEE Transactions on Parallel and Distributed Systems | 2016

Exploring Heterogeneity within a Core for Improved Power Efficiency

Sudarshan Srinivasan; Nithesh Kurella; Israel Koren; Sandip Kundu

Asymmetric multi-core processors (AMPs) comprise cores with different sizes of micro-architectural resources, yielding very different performance and energy characteristics. Since the computational demands of workloads vary from one task to another, AMPs can often provide a higher power efficiency than symmetric multi-cores. Furthermore, as the computational demands of a task change during its course of execution, reassigning the task from one core to another, where it can run more efficiently, can further improve the overall power efficiency. However, too-frequent re-assignments of tasks to cores may result in high overhead. To greatly reduce this overhead, we propose a morphable core architecture that can dynamically adapt its resource sizes, operating frequency and voltage to assume one of four possible core configurations. Such a morphable architecture allows more frequent task-to-core configuration re-assignments for a better match between the current needs of the task and the available resources. To make the online morphing decisions, we have developed a runtime analysis scheme that uses hardware performance counters. Our results indicate that the proposed morphable architecture controlled by the runtime management scheme can improve the throughput/Watt of applications by 31 percent over executing on a static out-of-order core, while the previously proposed big/little morphable architecture achieves only a 17 percent improvement.

IEEE Computer Society Annual Symposium on VLSI | 2013

Program phase duration prediction and its application to fine-grain power management

Sudarshan Srinivasan; Raghavan Kumar; Sandip Kundu

To achieve energy-optimal computing, processor resources must be adjusted dynamically to the computing needs of a program. The computational needs of an application may change during its execution depending on the type and locality of the processed data. It has been previously suggested that while a processor waits for data on a cache miss, dynamic voltage and frequency scaling (DVFS) may be used to reduce the energy consumption. However, due to the overheads involved in DVFS, such as capacitor charging/discharging time and PLL locking time, fine-grain DVFS did not gain traction. In this paper, we present a fine-grain DVFS scheme based on the prediction of program execution behavior. If a program is predicted to stay in a low-IPC mode for a long period, it may be worthwhile to tolerate the PLL lock time overhead to achieve potential energy savings. The run-time prediction scheme is based on hardware-based dynamic program phase classification and next-phase duration estimation. The phase duration prediction scheme is based on a linear weighted least square estimation (WLSE) approach, which is fast and incurs very low hardware overhead. Based on the simulation of several memory-intensive SPEC2000 benchmarks, we show that an energy reduction of more than 7% can be achieved with the fine-grain DVFS scheme over the traditional DVFS approach.
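The abstract does not give the WLSE formulation, so the following is only a minimal sketch of the general idea: fit a weighted least-squares line to recent phase durations, extrapolate one step, and scale down only if the predicted duration comfortably exceeds the DVFS transition overhead. The function names, the exponential `decay` weighting, and the `margin` parameter are assumptions made for illustration and are not taken from the paper.

```python
def predict_next_duration(durations, decay=0.5):
    """Extrapolate the next phase duration with a weighted least-squares line.

    Assumption: recent observations get exponentially larger weights
    (weight = decay ** age). The paper's actual weighting is not stated.
    """
    n = len(durations)
    if n == 0:
        raise ValueError("need at least one observed duration")
    if n == 1:
        return float(durations[0])

    xs = list(range(n))
    ws = [decay ** (n - 1 - i) for i in xs]  # newest sample has weight 1
    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, durations)) / total_w

    # Weighted variance of x and weighted covariance of x and y.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(ws, xs, durations))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A duration cannot be negative, so clamp the extrapolated value.
    return max(0.0, intercept + slope * n)


def worth_scaling(predicted_cycles, overhead_cycles, margin=2.0):
    """Hypothetical policy: scale only if the phase outlasts the overhead."""
    return predicted_cycles > margin * overhead_cycles


history = [100, 200, 300, 400]          # made-up low-IPC phase lengths (cycles)
prediction = predict_next_duration(history)
decision = worth_scaling(prediction, overhead_cycles=150)
```

The closed-form fit needs only a handful of multiply-accumulate operations per prediction, which is in the spirit of the low hardware overhead the abstract mentions, although a real implementation would likely use fixed-point arithmetic.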

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2012

A Wavelet-Based Spatio-Temporal Heat Dissipation Model for Reordering of Program Phases to Produce Temperature Extremes in a Chip

Sudarshan Srinivasan; Kunal P. Ganeshpure; Sandip Kundu

Localized heating leads to generation of thermal hotspots that affect the performance and reliability of an integrated circuit (IC). Functional workloads determine the locations and temperatures of hotspots on a die. In this paper, we present a systematic approach for developing a synthetic workload to maximize the temperature of a target hotspot. Our approach is based on the observation that hotspot temperature is determined not only by the current activity in that region, but also by the past activities in the surrounding regions. Accordingly, we develop a wavelet-based canonical spatio-temporal heat dissipation model for program traces, and use a novel integer linear programming formulation to rearrange program phases to generate target worst case hotspot temperature. Program phase behavior is rooted in the static structure of programs. In this case, the initial set of program phases is extracted from the SPEC 2000 benchmark. We apply this formulation to target another well-known problem of maximizing the temperature between a pair of coordinates in an IC. Experimental results show that by taking the spatio-temporal effect into account, we can raise the temperature of a hotspot higher than what is otherwise possible. Hotspot temperature maximization is important in design verification and testing.

International Symposium on Quality Electronic Design | 2011

Maximizing hotspot temperature: Wavelet based modelling of heating and cooling profile of functional workloads

Sudarshan Srinivasan; Kunal P. Ganeshpure; Sandip Kundu

Localized heating leads to generation of thermal hotspots that affect the performance and reliability of a chip. Functional workloads determine the locations and temperatures of hotspots on a die. Programs are classified into phases based on their execution profile. During a phase, the spatial power dissipation pattern of an application remains unchanged. In this paper we present a systematic approach for developing a synthetic workload, formed by combining phases extracted from a functional workload, that maximizes the temperature of a hotspot. Hotspot temperature is determined not only by the current activity in that region, but also by the past activities in the surrounding regions. Therefore, if the surrounding areas were “pre-heated” with a different workload, then the target region may become hotter due to a slower rate of lateral heat dissipation. A wavelet-based canonical power dissipation model is developed to capture the temporal and spatial behavior of the power traces. This is followed by an Integer Linear Programming approach that determines the sequence of these program phases in order to create the worst-case temperature at the hotspot. The novel contributions of this paper are (i) a wavelet-based technique to model spatio-temporal power variation for the phases in the functional workload and (ii) a linear programming scheme that arranges program phases to create the worst-case temperature.
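The "pre-heating" effect can be illustrated with a toy model. The sketch below is not the paper's method: it swaps the wavelet-based power model for a two-node lumped thermal RC model and swaps the ILP for brute-force enumeration of phase orders. All constants and phase names are hypothetical and in arbitrary units.

```python
from itertools import permutations

# Hypothetical lumped parameters (arbitrary units), for illustration only.
G_AMBIENT = 0.1       # thermal conductance from each region to ambient
G_LATERAL = 0.2       # thermal conductance between target and neighbour
CAPACITANCE = 1.0     # thermal capacitance of each region
DT = 0.1              # forward-Euler time step
STEPS_PER_PHASE = 50  # every phase lasts the same number of steps

# Each phase is (power in target region, power in neighbouring region).
PHASES = {
    "heat_neighbour": (0.0, 1.0),
    "heat_target": (1.0, 0.0),
}


def final_target_temperature(order):
    """Simulate the phases in the given order; return the target's final rise."""
    target = neighbour = 0.0  # temperature rise above ambient
    for name in order:
        p_target, p_neighbour = PHASES[name]
        for _ in range(STEPS_PER_PHASE):
            # Heat flowing laterally from the target to the neighbour.
            lateral = G_LATERAL * (target - neighbour)
            d_target = (p_target - G_AMBIENT * target - lateral) / CAPACITANCE
            d_neighbour = (p_neighbour - G_AMBIENT * neighbour + lateral) / CAPACITANCE
            target += DT * d_target
            neighbour += DT * d_neighbour
    return target


def hottest_order(phase_names):
    """Brute-force stand-in for the ILP: try every order, keep the hottest."""
    return max(permutations(phase_names), key=final_target_temperature)


best = hottest_order(["heat_target", "heat_neighbour"])
```

In this toy setup the hottest order heats the neighbour first and the target last, because the warm neighbour reduces lateral heat loss from the target. Exhaustive enumeration grows factorially with the number of phases, which is one reason an optimization formulation such as ILP is attractive for realistic traces.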

International Conference on Computer Design | 2015

Online mechanism for reliability and power-efficiency management of a dynamically reconfigurable core

Sudarshan Srinivasan; Israel Koren; Sandip Kundu

Previous studies have shown that the best way to achieve high throughput/Watt for a single-threaded application is to run it on an asymmetric multicore processor (AMP). AMPs feature cores that are tuned for specific workload characteristics. To increase efficiency, the core that offers the best power-performance trade-off for the executing thread is chosen. To reduce the overhead of thread migration, we have previously proposed a morphable core that can morph into multiple core types. In this study, apart from power-performance efficiency, we also consider the reliability of the different core types as indicated by their vulnerability to soft errors. We show that the best core type for power efficiency may not be the best for reliability. Accordingly, we develop a multi-objective thread migration strategy to determine the best core type considering power efficiency and reliability. To support runtime decision making, we have developed online estimators for reliability and power efficiency based on performance monitoring counters. In keeping with the existing literature, we use the architectural vulnerability factor (AVF) as the metric for reliability and instructions-per-second²/Watt as the metric for power efficiency. For the multi-objective optimization we use a Cobb-Douglas production function. Our results indicate that the proposed runtime mechanism for reliability and power efficiency improves, on average, the throughput/Watt of applications by 24% and reduces the Soft-Error Rate (SER) by 12% compared to the best static execution.
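A Cobb-Douglas function combines two inputs as a product of powers, U = x^α · y^(1−α), so a core that is very poor on either objective scores low overall. The sketch below shows how such a score could rank core types. The abstract does not say how AVF enters the function, so using (1 − AVF) as the reliability term, the exponent `alpha`, the core names, and all numeric values are assumptions.

```python
def cobb_douglas_score(ips2_per_watt, avf, alpha=0.5):
    """Combine power efficiency and reliability into one score.

    Assumption: reliability is modelled as (1 - AVF), so lower vulnerability
    gives a higher score. alpha weights efficiency against reliability.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    reliability = 1.0 - avf
    return (ips2_per_watt ** alpha) * (reliability ** (1.0 - alpha))


def pick_core(estimates, alpha=0.5):
    """Return the core type with the highest combined score.

    estimates maps a core-type name to (ips2_per_watt, avf), as online
    estimators might report for the current program phase.
    """
    return max(
        estimates,
        key=lambda name: cobb_douglas_score(*estimates[name], alpha=alpha),
    )


# Made-up estimates for two hypothetical core configurations.
estimates = {
    "wide_core": (4.0, 0.5),    # efficient but more vulnerable
    "narrow_core": (3.0, 0.1),  # less efficient but more robust
}
balanced_choice = pick_core(estimates, alpha=0.5)
```

With `alpha=1.0` the choice depends only on power efficiency and with `alpha=0.0` only on reliability, which makes the trade-off between the two objectives explicit.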

International Conference on Parallel Architectures and Compilation Techniques | 2014

A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency

Sudarshan Srinivasan; Nithesh Kurella; Israel Koren; Rance Rodrigues; Sandip Kundu

Asymmetric multicore processors (AMPs) consist of cores executing the same ISA, but differing in microarchitectural resources, performance, and power consumption. As the computational bottleneck of a workload shifts from one resource to the next during its course of execution, reassigning it to the core where it runs most efficiently can improve the overall energy efficiency. Simulation studies show that the performance bottlenecks can shift frequently, often within a few thousand cycles. With frequent core hopping, the overhead of thread migration becomes significant. To mitigate this overhead, we propose a morphable core that can assume one of four possible configurations to address the dominant performance bottlenecks, while retaining the same cache and registers. This way the architectural state remains intact while the morphable core is reconfigured in resources and frequency. We then implement a runtime scheme to decide the best configuration to run on and switch configuration as necessary. Simulation results indicate that, on average, the proposed scheme results in a performance/Watt improvement of 41%.

International Conference on VLSI Design | 2016

Dynamic Reconfiguration vs. DVFS: A Comparative Study on Power Efficiency of Processors

Sudarshan Srinivasan; Nithesh Kurella; Israel Koren; Sandip Kundu

As processor manufacturers focus on designing performance- and power-efficient cores, asymmetric multicore processors (AMPs) have emerged as a viable option. AMPs include multiple cores with varying power and performance characteristics, allowing a workload to choose the best core for power efficiency, which varies during the course of execution. Naturally, this involves migration of execution across cores, which also entails performance and power overhead. This overhead limits the frequency of migration. Recent papers have proposed the implementation of microarchitectural heterogeneity within a core, which makes fine-grain switching possible by reducing migration overhead. Another way to implement heterogeneity in a core is the application of Dynamic Voltage and Frequency Scaling (DVFS). DVFS also involves a transition overhead during which no execution can take place. Recent designs report faster voltage transition and PLL relock times, thus enabling fine-grain DVFS. In this work, we study the power efficiency at fine-grain switching of a dynamically reconfigurable non-monotonic processor versus pure DVFS. The comparison is extended to a Big/Little processor with both fine- and coarse-grain switching support.

IEEE Computer Society Annual Symposium on VLSI | 2013

A study on polymorphing superscalar processor dynamically to improve power efficiency

Sudarshan Srinivasan; Rance Rodrigues; Arunachalam Annamalai; Israel Koren; Sandip Kundu

Asymmetric Multicore Processors (AMPs) have emerged as likely candidates to solve the performance/power conundrum in the current generation of processors. Most recent work in this area evaluates such multicores by considering large (usually out-of-order (OOO)) and small (usually in-order (InO)) cores on the same chip. Dynamic online swapping of threads between these cores is then facilitated whenever deemed beneficial. However, if threads are swapped too often, the overheads may negate the benefits of swapping. Hence, in most recent work, thread swapping decisions are made at coarse-grain instruction granularities, leaving out many opportunities. In this paper, we propose a scheme to mitigate the penalty imposed by thread swapping and yet achieve all the benefits of AMPs. Here, a single superscalar OOO core morphs itself into an InO core at runtime, whenever this is determined to be performance/Watt efficient. Certain Intel processors already have a similar mechanism to statically morph an OOO core to an InO core to facilitate debug. We extend this existing capability to perform dynamic core morphing at runtime with an orthogonal objective of improving power efficiency. Results indicate that, on average, performance/Watt benefits of 10% can be extracted by our proposed morphing scheme at a very small performance penalty of 3.8%. Since this scheme is based on existing mechanisms readily available in current microprocessors, it incurs no hardware overheads.

International Conference on Computer Design | 2016

Improving performance per Watt of non-monotonic Multicore Processors via bottleneck-based online program phase classification

Sudarshan Srinivasan; Israel Koren; Sandip Kundu

Heterogeneous architectures offer the promise of higher performance/Watt compared to symmetric multi-cores. Recent works have proposed the use of non-monotonic (NM) heterogeneous architectures with diverse core types, where each core has unique power and performance characteristics. However, the power and performance benefits achieved by NM architectures are highly dependent on the assignment of the application to the most suitable core type for all program phases. In this paper we propose a novel online program phase detection technique that is based on the frequency of cache misses and processor stalls, which correspond to core resource bottlenecks. We track performance monitors to formulate a Bottleneck Type Vector (BTV) that helps direct the application to the most appropriate core type for execution. We compare the proposed BTV-based core assignment method to prior online core assignment approaches and demonstrate as much as 22% improvement in average performance/Watt using Instructions per Second (IPS) as the performance metric.
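The abstract does not define how the BTV is encoded, so the following is a minimal sketch of one plausible reading: each vector entry flags whether a monitored event's rate per thousand instructions exceeds a threshold, and the resulting vector indexes a table of core types. The event names, thresholds, core names, and table contents are all hypothetical.

```python
# Hypothetical events and thresholds (events per 1000 instructions).
THRESHOLDS_PER_KILO_INSTR = {
    "l2_miss": 5.0,
    "rob_stall": 50.0,
    "iq_stall": 50.0,
    "lsq_stall": 50.0,
}


def bottleneck_type_vector(counters, instructions,
                           thresholds=THRESHOLDS_PER_KILO_INSTR):
    """Return a tuple of 0/1 flags, one per monitored bottleneck event.

    counters maps event names to raw counts sampled over an interval in
    which `instructions` instructions retired.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    flags = []
    for event, limit in thresholds.items():
        rate = 1000.0 * counters.get(event, 0) / instructions
        flags.append(1 if rate > limit else 0)
    return tuple(flags)


# Hypothetical mapping from a BTV to the core type that relieves it.
CORE_FOR_BTV = {
    (1, 0, 0, 0): "large_cache_core",
    (0, 1, 0, 0): "large_window_core",
    (0, 0, 0, 0): "narrow_core",
}


def choose_core(btv, table=CORE_FOR_BTV, default="average_core"):
    """Look up the core type for a BTV, falling back to a default."""
    return table.get(btv, default)


sample = {"l2_miss": 800, "rob_stall": 1000}
btv = bottleneck_type_vector(sample, instructions=100_000)
core = choose_core(btv)
```

Here the sampled interval has 8 L2 misses and 10 reorder-buffer stalls per thousand instructions, so only the first flag is set. A table lookup keeps the runtime decision cheap, at the cost of needing a sensible default for vectors the table does not cover.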
