Anil Kanduri
Information Technology University
Publication
Featured research published by Anil Kanduri.
international symposium on low power electronics and design | 2015
Amir-Mohammad Rahmani; Mohammad Hashem Haghbayan; Anil Kanduri; Awet Yemane Weldezion; Pasi Liljeberg; Juha Plosila; Axel Jantsch; Hannu Tenhunen
Power management of NoC-based many-core systems with runtime application mapping becomes more challenging in the dark silicon era. It necessitates a multi-objective control approach to consider an upper limit on total power consumption, dynamic behaviour of workloads, processing elements utilization, per-core power consumption, and load on network-on-chip. In this paper, we propose a multi-objective dynamic power management method that simultaneously considers all of these parameters. Fine-grained voltage and frequency scaling, including near-threshold operation, and per-core power gating are utilized to optimize the performance. In addition, a disturbance rejecter is designed that proactively scales down activity in running applications when a new application commences execution, to prevent sharp power budget violations. Simulations of dynamic workloads and mixed time-critical application profiles show that our method is effective in honoring the power budget while considerably boosting the system throughput and reducing power budget violation, compared to the state-of-the-art power management policies.
international conference on computer design | 2015
Anil Kanduri; Mohammad Hashem Haghbayan; Amir-Mohammad Rahmani; Pasi Liljeberg; Axel Jantsch; Hannu Tenhunen
Limitation on power budget in many-core systems leaves a fraction of on-chip resources inactive, referred to as dark silicon. In such systems, an efficient run-time application mapping approach can considerably enhance resource utilization and mitigate the dark silicon phenomenon. In this paper, we propose a dark silicon aware run-time application mapping approach that patterns active cores alongside the inactive cores in order to evenly distribute power density across the chip. This approach leverages dark silicon to balance the temperature of active cores to provide higher power budget and better resource utilization, within a safe peak operating temperature. In contrast with exhaustive search based mapping approaches, our agile heuristic approach has a negligible run-time overhead. Our patterning strategy yields a surplus power budget of up to 17% along with an improved throughput of up to 21% in comparison with other state-of-the-art run-time mapping strategies, while the surplus budget is as high as 40% compared to worst case scenarios.
parallel, distributed and network-based processing | 2014
Mohammad Fattah; Amir-Mohammad Rahmani; Thomas Canhao Xu; Anil Kanduri; Pasi Liljeberg; Juha Plosila; Hannu Tenhunen
Contiguous processor allocation improves both network and application performance by decreasing the probability of congestion among the communication of different applications. Consequently, the average, standard deviation, and worst-case latency of the network are decreased significantly. This makes contiguous allocation a good solution for time-critical applications with bounded deadlines. On the other hand, non-contiguous allocation increases the system throughput significantly: isolated nodes are utilized and more applications can finish their jobs per unit time. However, this leads to poor network metrics, unsuitable for real-time applications. In this work, we combine these two approaches in order to manage workloads with mixed-critical characteristics. Real-time applications are mapped contiguously, while non-critical applications are allowed to be dispersed over the available system nodes. Results show over 50% improvement in worst-case latency and 100 times improvement in deadline misses.
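A minimal sketch of the mixed allocation idea in the abstract above, not the authors' implementation. It assumes a 2D mesh, treats "contiguous" as a fully free rectangle, and lets non-critical applications take any free nodes; the function names and the first-fit search are hypothetical.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y) coordinate on an assumed 2D mesh


def allocate_contiguous(free: Set[Node], cols: int, rows: int,
                        w: int, h: int) -> Optional[List[Node]]:
    """First-fit search for a fully free w-by-h rectangle (assumed notion of contiguity)."""
    for x0, y0 in product(range(cols - w + 1), range(rows - h + 1)):
        block = [(x0 + dx, y0 + dy) for dx in range(w) for dy in range(h)]
        if all(node in free for node in block):
            free.difference_update(block)  # mark the nodes as occupied
            return block
    return None  # no contiguous region available


def allocate_dispersed(free: Set[Node], count: int) -> Optional[List[Node]]:
    """Take any `count` free nodes, regardless of where they sit on the mesh."""
    if len(free) < count:
        return None
    picked = sorted(free)[:count]
    free.difference_update(picked)
    return picked


def allocate(free: Set[Node], cols: int, rows: int,
             critical: bool, w: int, h: int) -> Optional[List[Node]]:
    """Real-time applications get a contiguous block; others may be dispersed."""
    if critical:
        return allocate_contiguous(free, cols, rows, w, h)
    return allocate_dispersed(free, w * h)
```

For example, on a 4x4 mesh with nodes (0, 0) and (1, 1) already occupied, a critical 2x2 request is placed on the first fully free rectangle, while a non-critical request of the same size simply consumes whichever free nodes remain.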
networks on chips | 2015
Mohammad Hashem Haghbayan; Anil Kanduri; Amir-Mohammad Rahmani; Pasi Liljeberg; Axel Jantsch; Hannu Tenhunen
Increasingly dynamic workloads running on NoC-based many-core systems necessitate efficient runtime mapping strategies. Given the unpredictable nature of application profiles, selecting a rational region to map an incoming application is an NP-hard problem in view of minimizing congestion and maximizing performance. In this paper, we propose a proactive region selection strategy which prioritizes nodes that offer lower congestion and dispersion. Our proposed strategy, MapPro, quantitatively represents the propagated impact of spatial availability and dispersion on the network with every newly mapped application. This allows us to identify a suitable region to accommodate an incoming application that results in minimal congestion and dispersion. We cluster the network into squares of different radii to suit applications of different sizes and proactively select a suitable square for a new application, eliminating the overhead caused by typical reactive mapping approaches. We evaluated our proposed strategy over different traffic patterns and observed gains of up to 41% in energy efficiency, 28% in congestion, and 21% in dispersion when compared to state-of-the-art region selection methods.
international soc design conference | 2013
Anil Kanduri; Amir-Mohammad Rahmani; Pasi Liljeberg; Kaiyu Wan; Ka Lok Man; Juha Plosila
Embedded systems took a leap when combining computational elements with physical systems led to many novel applications and gave rise to a new domain: Cyber-Physical Systems (CPS). The growing importance of CPS in industry has raised many challenges from a designer's perspective, ranging from computational methods, modeling platforms, and programming structures to relevant hardware systems. Ptolemy is a platform tailor-made for such full-scale design of networked and real-time systems. In an effort to explore the suitability of the Ptolemy II platform for CPS design, we chose an Unmanned Aerial Vehicle (UAV) application as a case study. In this paper, we model the UAV in Ptolemy II in a modular and hierarchical way such that the system meets the requirements of data flows and dependencies. Key parameters of a typical CPS, such as schedulability and predictability, were analyzed. Finally, to improve the performance of the UAV, computational tasks were mapped onto a network-on-chip based multicore system. Our experimental results show the efficiency of our high-level analysis and modeling and the extracted system requirements to enhance the system predictability.
international conference on computer aided design | 2016
Anil Kanduri; Mohammad Hashem Haghbayan; Amir-Mohammad Rahmani; Pasi Liljeberg; Axel Jantsch; Nikil D. Dutt; Hannu Tenhunen
Power capping techniques are used to restrict the power consumption of computer systems to a thermally safe limit. Current many-core systems employ dynamic voltage and frequency scaling (DVFS), power gating (PG), and scheduling methods as actuators for power capping. These knobs are oriented towards power actuation, while the need for performance and energy savings is increasing in the dark silicon era. To address this, we propose approximation (APPX) as another knob for closed-loop power management, lending performance and energy efficiency to existing power capping techniques. We use approximation in a proactive way for long-term performance-energy objectives, complementing the short-term reactive power objectives. We implement an approximation-enabled power management framework, APPEND, that dynamically chooses an application with an appropriate level of approximation from a set of variable-accuracy implementations. Subject to the system dynamics, our power manager chooses an effective combination of knobs (APPX, DVFS, and PG) in a hierarchical way to ensure power capping with performance and energy gains. Our proposed approach yields 1.5× higher throughput, latency improved by up to 5×, better performance per energy, and dark silicon mitigation compared to state-of-the-art power management techniques over a set of applications ranging from high to no error resilience.
Archive | 2017
Anil Kanduri; Amir M. Rahmani; Pasi Liljeberg; Ahmed Hemani; Axel Jantsch; Hannu Tenhunen
The possibilities to increase single-core performance have ended due to limited instruction-level parallelism and a high penalty when increasing frequency. This prompted designers to move toward multi-core paradigms [1], largely supported by transistor scaling [2]. Scaling down transistor gate length makes it possible to switch them faster at a lower power, as they have a low capacitance. In this context, an important consideration is power density—the power dissipated per unit area. Dennard’s scaling establishes that reducing physical parameters of transistors allows operating them at lower voltage and thus at lower power, because power consumption is proportional to the square of the applied voltage, keeping power density constant [3]. Dennard’s estimation of scaling effects and constant power density is shown in Table 1.1. Theoretically, scaling down further should result in more computational capacity per unit area. However, scaling is reaching its physical limits to an extent that voltage cannot be scaled down as much as transistor gate length leading to failure of Dennardian trend. This along with a rise in leakage current results in increased power density, rather than a constant power density. Higher power density implies more heat generated in a unit area and hence higher chip temperatures which have to be dissipated through cooling solutions, as increase in temperature beyond a certain level results in unreliable functionality, faster aging, and even permanent failure of the chip. To ensure a safe operation, it is essential for the chip to perform within a fixed power budget [4]. In order to avoid too high power dissipation, a certain part of the chip needs to remain inactive; the inactive part is termed dark silicon [5]. Hence, we have to operate working cores in a multi-core system at less than their full capacity, limiting the performance, resource utilization, and efficiency of the system.
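The constant power density argument above can be written out as a short derivation. This is a sketch under the classical constant-field scaling assumptions; the symbols (scaling factor \kappa, activity factor \alpha, capacitance C, voltage V, frequency f, area A) are notation introduced here for illustration and are not taken from Table 1.1.

```latex
% Assumed notation: \kappa > 1 is the scaling factor, \alpha the activity factor.
\begin{align*}
P  &= \alpha C V^{2} f
   && \text{(dynamic power before scaling)} \\
P' &= \alpha \,\frac{C}{\kappa} \left(\frac{V}{\kappa}\right)^{2} (\kappa f)
    = \frac{P}{\kappa^{2}}
   && \text{(capacitance and voltage shrink, frequency rises)} \\
A' &= \frac{A}{\kappa^{2}}
   && \text{(area shrinks with both dimensions)} \\
\frac{P'}{A'} &= \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A}
   && \text{(power density stays constant)}
\end{align*}
```

Under the same assumptions, if the voltage can no longer be scaled (V' = V) while capacitance and frequency still follow the trend, then P' = P and the power density grows to \kappa^{2} P / A, which is the breakdown of the Dennardian trend described above.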
2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip | 2015
Anil Kanduri; Amir-Mohammad Rahmani; Pasi Liljeberg; Hannu Tenhunen
Cyber Physical Systems (CPS) are typically implemented on multicore platforms to handle computational pressure and performance requirements. Complex data flows in those applications cause contention among networked processors, resulting in unpredictable performance. In this work, we explore the impact of application mapping on network contention and predictability. A mapping algorithm focusing on minimizing the number of shared paths while not worsening the total path count is proposed. This algorithm is verified against other recent works over synthetic and realistic traffic patterns. The proposed algorithm considerably improved predictability with 0% shared paths and reasonable latency constraints compared to the other recently proposed algorithms.
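A minimal sketch of how path sharing could be measured for a given mapping, not the algorithm proposed in the paper. It assumes a 2D mesh with dimension-ordered (XY) routing and counts directed links used by more than one flow; the function names and the link-level definition of a "shared path" are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Coord = Tuple[int, int]     # (x, y) position of a core on the mesh
Link = Tuple[Coord, Coord]  # directed link between neighbouring cores


def xy_route(src: Coord, dst: Coord) -> List[Link]:
    """Links traversed by XY routing: move along X first, then along Y."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def shared_link_count(flows: List[Tuple[str, str]],
                      mapping: Dict[str, Coord]) -> int:
    """Number of directed links carrying more than one flow under `mapping`."""
    usage: Counter = Counter()
    for src_task, dst_task in flows:
        usage.update(xy_route(mapping[src_task], mapping[dst_task]))
    return sum(1 for uses in usage.values() if uses > 1)
```

With two flows a→b and c→b, placing the three tasks in a row at (0, 0), (2, 0), and (1, 0) makes both flows cross the link from (1, 0) to (2, 0), whereas placing b at (1, 0) and c at (1, 1) leaves no link shared. A mapping search could use such a count as the quantity to minimize.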
design automation conference | 2018
Anil Kanduri; Antonio Miele; Amir M. Rahmani; Pasi Liljeberg; Nikil D. Dutt
Run-time resource management of heterogeneous multi-core systems is challenging due to i) dynamic workloads, that often result in ii) conflicting knob actuation decisions, which potentially iii) compromise on performance for thermal safety. We present a run-time resource management strategy for performance guarantees under power constraints using functionally approximate kernels that exploit accuracy-performance trade-offs within error-resilient applications. Our controller integrates approximation with power knobs (DVFS, CPU quota, task migration) in a coordinated manner to make performance-aware decisions on power management under variable workloads. Experimental results on Odroid XU3 show the effectiveness of this strategy in meeting performance requirements without power violations compared to existing solutions.
Archive | 2017
Anil Kanduri; Mohammad Hashem Haghbayan; Amir M. Rahmani; Pasi Liljeberg; Axel Jantsch; Hannu Tenhunen
An efficient run-time application mapping approach can considerably enhance resource utilization and mitigate the dark silicon phenomenon. In this chapter, we present a dark silicon aware run-time application mapping approach that patterns active cores alongside the inactive cores in order to evenly distribute power density across the chip. This approach leverages dark silicon to balance the temperature of active cores to provide higher power budget and better resource utilization, within a safe peak operating temperature. In contrast to exhaustive search based mapping techniques, the proposed agile heuristic approach has a negligible run-time overhead. This patterning strategy yields a surplus power budget of up to 17 % along with an improved throughput of up to 21 % in comparison with other state-of-the-art run-time mapping strategies, while the surplus budget is as high as 40 % compared to worst case scenarios.